# Xorbits Inference (Xinference)

This page demonstrates how to use [Xinference](https://github.com/xorbitsai/inference)
with LangChain.

`Xinference` is a powerful and versatile library designed to serve LLMs, 
speech recognition models, and multimodal models, even on your laptop. 
With Xorbits Inference, you can deploy and serve your own models or 
state-of-the-art built-in models with a single command.

## Installation and Setup

Xinference can be installed via pip from PyPI: 

```bash
pip install "xinference[all]"
```

## LLM

Xinference supports various models compatible with GGML, including chatglm, baichuan, whisper, 
vicuna, and orca. To view the built-in models, run:

```bash
xinference list --all
```


### Wrapper for Xinference

You can start a local instance of Xinference by running:

```bash
xinference
```

You can also deploy Xinference in a distributed cluster. To do so, first start an Xinference supervisor
on the server where you want it to run:

```bash
xinference-supervisor -H "${supervisor_host}"
```


Then, start an Xinference worker on each of the other servers where you want one to run:

```bash
xinference-worker -e "http://${supervisor_host}:9997"
```

Once Xinference is running, an endpoint is available for model management through the CLI or 
the Xinference client. 

For a local deployment, the endpoint is `http://localhost:9997`. 

For a cluster deployment, the endpoint is `http://${supervisor_host}:9997`.
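
The two endpoint forms above can be captured in a small helper. This is a minimal sketch using only the Python standard library; the function names `endpoint_url` and `is_reachable` are hypothetical and are not part of Xinference or LangChain. The reachability check only confirms that something accepts TCP connections on the port, not that it is an Xinference server.

```python
import socket
from typing import Optional
from urllib.parse import urlparse

DEFAULT_PORT = 9997  # port used by the endpoints described above


def endpoint_url(supervisor_host: Optional[str] = None, port: int = DEFAULT_PORT) -> str:
    """Return the local endpoint, or the cluster endpoint if a supervisor host is given."""
    host = supervisor_host or "localhost"
    return f"http://{host}:{port}"


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the endpoint's host and port succeeds."""
    parsed = urlparse(url)
    try:
        with socket.create_connection((parsed.hostname, parsed.port or 80), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `is_reachable(endpoint_url())` checks the local deployment, while `is_reachable(endpoint_url("10.0.0.5"))` checks a cluster whose supervisor runs on the (assumed) host `10.0.0.5`.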


Next, launch a model. You can specify the model name along with other attributes, 
such as `model_size_in_billions` and `quantization`. For example, using the command line 
interface (CLI):

```bash
xinference launch -n orca -s 3 -q q4_0
```

The command returns a model UID, which you pass to LangChain.
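
A model can also be launched from Python instead of the CLI. The sketch below relies on `RESTfulClient` and its `launch_model` method from `xinference.client`, as shown in the pull request that introduced this integration; the exact signature may differ between Xinference versions, so treat it as an assumption. The helper names `connect` and `launch_orca` are hypothetical, and the model attributes mirror the CLI example above.

```python
def connect(server_url: str = "http://localhost:9997"):
    """Create a RESTful client for a running Xinference endpoint."""
    # Imported lazily so this module loads even when xinference is not installed.
    from xinference.client import RESTfulClient

    return RESTfulClient(server_url)


def launch_orca(client) -> str:
    """Launch the same model as the CLI example and return its model UID."""
    return client.launch_model(
        model_name="orca",
        model_size_in_billions=3,
        quantization="q4_0",
    )
```

With a server running, `model_uid = launch_orca(connect())` yields the UID to use in the example that follows.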

Example usage:

```python
from langchain.llms import Xinference

# Replace the placeholder with the model UID returned when you launched the model.
model_uid = "<model_uid>"

llm = Xinference(
    server_url="http://0.0.0.0:9997",
    model_uid=model_uid,
)

llm(
    prompt="Q: where can we visit in the capital of France? A:",
    generate_config={"max_tokens": 1024, "stream": True},
)
```

### Usage

For more information and detailed examples, refer to the
[example for Xinference LLMs](/docs/integrations/llms/xinference.html).

### Embeddings

Xinference also supports embedding queries and documents. See the
[example for Xinference embeddings](/docs/integrations/text_embedding/xinference.html) 
for a more detailed demo.
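
As a brief illustration, the sketch below uses `XinferenceEmbeddings` from `langchain.embeddings` with its `embed_query` and `embed_documents` methods, following the usage shown in the pull request that introduced this integration. The helper names `make_embeddings` and `embed_examples` are hypothetical, and the server URL and model UID are placeholders you need to replace with your own.

```python
def make_embeddings(server_url: str, model_uid: str):
    """Build a LangChain embeddings object backed by a launched Xinference model."""
    # Imported lazily so this module loads even when langchain is not installed.
    from langchain.embeddings import XinferenceEmbeddings

    return XinferenceEmbeddings(server_url=server_url, model_uid=model_uid)


def embed_examples(embeddings):
    """Embed one query and two documents, returning both results."""
    query_result = embeddings.embed_query("This is a test query")
    doc_result = embeddings.embed_documents(["text A", "text B"])
    return query_result, doc_result
```

With a server running and a model launched, `embed_examples(make_embeddings("http://0.0.0.0:9997", model_uid))` returns one vector for the query and one vector per document.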