langchain/docs/ecosystem/huggingface.md

# Hugging Face

This page covers how to use the Hugging Face ecosystem (including the [Hugging Face Hub](https://huggingface.co)) within LangChain.
It is broken into two parts: installation and setup, and then references to specific Hugging Face wrappers.

## Installation and Setup

If you want to work with the Hugging Face Hub:
- Install the Hub client library with `pip install huggingface_hub`
- Create a Hugging Face account (it's free!)
- Create an [access token](https://huggingface.co/docs/hub/security-tokens) and set it as an environment variable (`HUGGINGFACEHUB_API_TOKEN`)

If you want work with the Hugging Face Python libraries:
- Install `pip install transformers` for working with models and tokenizers
- Install `pip install datasets` for working with datasets

## Wrappers

### LLM

There exists two Hugging Face LLM wrappers, one for a local pipeline and one for a model hosted on Hugging Face Hub.
Note that these wrappers only work for models that support the following tasks: [`text2text-generation`](https://huggingface.co/models?library=transformers&pipeline_tag=text2text-generation&sort=downloads), [`text-generation`](https://huggingface.co/models?library=transformers&pipeline_tag=text-classification&sort=downloads)

To use the local pipeline wrapper:
```python
from langchain.llms import HuggingFacePipeline
```

To use a the wrapper for a model hosted on Hugging Face Hub:
```python
from langchain.llms import HuggingFaceHub
```
For a more detailed walkthrough of the Hugging Face Hub wrapper, see [this notebook](../modules/llms/integrations/huggingface_hub.ipynb)


### Embeddings

There exists two Hugging Face Embeddings wrappers, one for a local model and one for a model hosted on Hugging Face Hub.
Note that these wrappers only work for [`sentence-transformers` models](https://huggingface.co/models?library=sentence-transformers&sort=downloads).

To use the local pipeline wrapper:
```python
from langchain.embeddings import HuggingFaceEmbeddings
```

To use a the wrapper for a model hosted on Hugging Face Hub:
```python
from langchain.embeddings import HuggingFaceHubEmbeddings
```
For a more detailed walkthrough of this, see [this notebook](../modules/indexes/examples/embeddings.ipynb)

### Tokenizer

There are several places you can use tokenizers available through the `transformers` package.
By default, it is used to count tokens for all LLMs.

You can also use it to count tokens when splitting documents with 
```python
from langchain.text_splitter import CharacterTextSplitter
CharacterTextSplitter.from_huggingface_tokenizer(...)
```
For a more detailed walkthrough of this, see [this notebook](../modules/indexes/examples/textsplitter.ipynb)


### Datasets

The Hugging Face Hub has lots of great [datasets](https://huggingface.co/datasets) that can be used to evaluate your LLM chains.

For a detailed walkthrough of how to use them to do so, see [this notebook](../use_cases/evaluation/huggingface_datasets.ipynb)
Docs refactor (#480) Big docs refactor! Motivation is to make it easier for people to find resources they are looking for. To accomplish this, there are now three main sections: - Getting Started: steps for getting started, walking through most core functionality - Modules: these are different modules of functionality that langchain provides. Each part here has a "getting started", "how to", "key concepts" and "reference" section (except in a few select cases where it didnt easily fit). - Use Cases: this is to separate use cases (like summarization, question answering, evaluation, etc) from the modules, and provide a different entry point to the code base. There is also a full reference section, as well as extra resources (glossary, gallery, etc) Co-authored-by: Shreya Rajpal <ShreyaR@users.noreply.github.com> 2023-01-02 16:24:09 +00:00			`# Hugging Face`

Add links to Hugging Face Hub docs (#518) This PR adds some tweaks to the Hugging Face docs, mostly with links to the Hub + relevant docs. 2023-01-03 15:43:57 +00:00			`This page covers how to use the Hugging Face ecosystem (including the [Hugging Face Hub](https://huggingface.co)) within LangChain.`
Docs refactor (#480) Big docs refactor! Motivation is to make it easier for people to find resources they are looking for. To accomplish this, there are now three main sections: - Getting Started: steps for getting started, walking through most core functionality - Modules: these are different modules of functionality that langchain provides. Each part here has a "getting started", "how to", "key concepts" and "reference" section (except in a few select cases where it didnt easily fit). - Use Cases: this is to separate use cases (like summarization, question answering, evaluation, etc) from the modules, and provide a different entry point to the code base. There is also a full reference section, as well as extra resources (glossary, gallery, etc) Co-authored-by: Shreya Rajpal <ShreyaR@users.noreply.github.com> 2023-01-02 16:24:09 +00:00			`It is broken into two parts: installation and setup, and then references to specific Hugging Face wrappers.`

			`## Installation and Setup`

			`If you want to work with the Hugging Face Hub:`
Add links to Hugging Face Hub docs (#518) This PR adds some tweaks to the Hugging Face docs, mostly with links to the Hub + relevant docs. 2023-01-03 15:43:57 +00:00			- Install the Hub client library with `pip install huggingface_hub`
			`- Create a Hugging Face account (it's free!)`
			- Create an [access token](https://huggingface.co/docs/hub/security-tokens) and set it as an environment variable (`HUGGINGFACEHUB_API_TOKEN`)
Docs refactor (#480) Big docs refactor! Motivation is to make it easier for people to find resources they are looking for. To accomplish this, there are now three main sections: - Getting Started: steps for getting started, walking through most core functionality - Modules: these are different modules of functionality that langchain provides. Each part here has a "getting started", "how to", "key concepts" and "reference" section (except in a few select cases where it didnt easily fit). - Use Cases: this is to separate use cases (like summarization, question answering, evaluation, etc) from the modules, and provide a different entry point to the code base. There is also a full reference section, as well as extra resources (glossary, gallery, etc) Co-authored-by: Shreya Rajpal <ShreyaR@users.noreply.github.com> 2023-01-02 16:24:09 +00:00
Add links to Hugging Face Hub docs (#518) This PR adds some tweaks to the Hugging Face docs, mostly with links to the Hub + relevant docs. 2023-01-03 15:43:57 +00:00			`If you want work with the Hugging Face Python libraries:`
Docs refactor (#480) Big docs refactor! Motivation is to make it easier for people to find resources they are looking for. To accomplish this, there are now three main sections: - Getting Started: steps for getting started, walking through most core functionality - Modules: these are different modules of functionality that langchain provides. Each part here has a "getting started", "how to", "key concepts" and "reference" section (except in a few select cases where it didnt easily fit). - Use Cases: this is to separate use cases (like summarization, question answering, evaluation, etc) from the modules, and provide a different entry point to the code base. There is also a full reference section, as well as extra resources (glossary, gallery, etc) Co-authored-by: Shreya Rajpal <ShreyaR@users.noreply.github.com> 2023-01-02 16:24:09 +00:00			- Install `pip install transformers` for working with models and tokenizers
			- Install `pip install datasets` for working with datasets

			`## Wrappers`

			`### LLM`

			`There exists two Hugging Face LLM wrappers, one for a local pipeline and one for a model hosted on Hugging Face Hub.`
Add links to Hugging Face Hub docs (#518) This PR adds some tweaks to the Hugging Face docs, mostly with links to the Hub + relevant docs. 2023-01-03 15:43:57 +00:00			Note that these wrappers only work for models that support the following tasks: [`text2text-generation`](https://huggingface.co/models?library=transformers&pipeline_tag=text2text-generation&sort=downloads), [`text-generation`](https://huggingface.co/models?library=transformers&pipeline_tag=text-classification&sort=downloads)
Docs refactor (#480) Big docs refactor! Motivation is to make it easier for people to find resources they are looking for. To accomplish this, there are now three main sections: - Getting Started: steps for getting started, walking through most core functionality - Modules: these are different modules of functionality that langchain provides. Each part here has a "getting started", "how to", "key concepts" and "reference" section (except in a few select cases where it didnt easily fit). - Use Cases: this is to separate use cases (like summarization, question answering, evaluation, etc) from the modules, and provide a different entry point to the code base. There is also a full reference section, as well as extra resources (glossary, gallery, etc) Co-authored-by: Shreya Rajpal <ShreyaR@users.noreply.github.com> 2023-01-02 16:24:09 +00:00
			`To use the local pipeline wrapper:`
			```python
			`from langchain.llms import HuggingFacePipeline`
			```

			`To use a the wrapper for a model hosted on Hugging Face Hub:`
			```python
			`from langchain.llms import HuggingFaceHub`
			```
			`For a more detailed walkthrough of the Hugging Face Hub wrapper, see [this notebook](../modules/llms/integrations/huggingface_hub.ipynb)`


			`### Embeddings`

			`There exists two Hugging Face Embeddings wrappers, one for a local model and one for a model hosted on Hugging Face Hub.`
Add links to Hugging Face Hub docs (#518) This PR adds some tweaks to the Hugging Face docs, mostly with links to the Hub + relevant docs. 2023-01-03 15:43:57 +00:00			Note that these wrappers only work for [`sentence-transformers` models](https://huggingface.co/models?library=sentence-transformers&sort=downloads).
Docs refactor (#480) Big docs refactor! Motivation is to make it easier for people to find resources they are looking for. To accomplish this, there are now three main sections: - Getting Started: steps for getting started, walking through most core functionality - Modules: these are different modules of functionality that langchain provides. Each part here has a "getting started", "how to", "key concepts" and "reference" section (except in a few select cases where it didnt easily fit). - Use Cases: this is to separate use cases (like summarization, question answering, evaluation, etc) from the modules, and provide a different entry point to the code base. There is also a full reference section, as well as extra resources (glossary, gallery, etc) Co-authored-by: Shreya Rajpal <ShreyaR@users.noreply.github.com> 2023-01-02 16:24:09 +00:00
			`To use the local pipeline wrapper:`
			```python
			`from langchain.embeddings import HuggingFaceEmbeddings`
			```

			`To use a the wrapper for a model hosted on Hugging Face Hub:`
			```python
			`from langchain.embeddings import HuggingFaceHubEmbeddings`
			```
improve docs for indexes (#1146) 2023-02-20 07:14:50 +00:00			`For a more detailed walkthrough of this, see [this notebook](../modules/indexes/examples/embeddings.ipynb)`
Docs refactor (#480) Big docs refactor! Motivation is to make it easier for people to find resources they are looking for. To accomplish this, there are now three main sections: - Getting Started: steps for getting started, walking through most core functionality - Modules: these are different modules of functionality that langchain provides. Each part here has a "getting started", "how to", "key concepts" and "reference" section (except in a few select cases where it didnt easily fit). - Use Cases: this is to separate use cases (like summarization, question answering, evaluation, etc) from the modules, and provide a different entry point to the code base. There is also a full reference section, as well as extra resources (glossary, gallery, etc) Co-authored-by: Shreya Rajpal <ShreyaR@users.noreply.github.com> 2023-01-02 16:24:09 +00:00
			`### Tokenizer`

			There are several places you can use tokenizers available through the `transformers` package.
			`By default, it is used to count tokens for all LLMs.`

			`You can also use it to count tokens when splitting documents with`
			```python
			`from langchain.text_splitter import CharacterTextSplitter`
			`CharacterTextSplitter.from_huggingface_tokenizer(...)`
			```
improve docs for indexes (#1146) 2023-02-20 07:14:50 +00:00			`For a more detailed walkthrough of this, see [this notebook](../modules/indexes/examples/textsplitter.ipynb)`
Docs refactor (#480) Big docs refactor! Motivation is to make it easier for people to find resources they are looking for. To accomplish this, there are now three main sections: - Getting Started: steps for getting started, walking through most core functionality - Modules: these are different modules of functionality that langchain provides. Each part here has a "getting started", "how to", "key concepts" and "reference" section (except in a few select cases where it didnt easily fit). - Use Cases: this is to separate use cases (like summarization, question answering, evaluation, etc) from the modules, and provide a different entry point to the code base. There is also a full reference section, as well as extra resources (glossary, gallery, etc) Co-authored-by: Shreya Rajpal <ShreyaR@users.noreply.github.com> 2023-01-02 16:24:09 +00:00

			`### Datasets`

Add links to Hugging Face Hub docs (#518) This PR adds some tweaks to the Hugging Face docs, mostly with links to the Hub + relevant docs. 2023-01-03 15:43:57 +00:00			`The Hugging Face Hub has lots of great [datasets](https://huggingface.co/datasets) that can be used to evaluate your LLM chains.`
Docs refactor (#480) Big docs refactor! Motivation is to make it easier for people to find resources they are looking for. To accomplish this, there are now three main sections: - Getting Started: steps for getting started, walking through most core functionality - Modules: these are different modules of functionality that langchain provides. Each part here has a "getting started", "how to", "key concepts" and "reference" section (except in a few select cases where it didnt easily fit). - Use Cases: this is to separate use cases (like summarization, question answering, evaluation, etc) from the modules, and provide a different entry point to the code base. There is also a full reference section, as well as extra resources (glossary, gallery, etc) Co-authored-by: Shreya Rajpal <ShreyaR@users.noreply.github.com> 2023-01-02 16:24:09 +00:00
			`For a detailed walkthrough of how to use them to do so, see [this notebook](../use_cases/evaluation/huggingface_datasets.ipynb)`