langchain/docs/integrations/huggingface.md
Leonid Ganeline e2d7677526
docs: compound ecosystem and integrations (#4870)
# Docs: compound ecosystem and integrations

**Problem statement:** We have a big overlap between the
References/Integrations and Ecosystem/LangChain Ecosystem pages. It
confuses users. It also means a new integration may be added to only
one of these pages, which creates even more confusion.
- removed the References/Integrations page (all its information will be
moved into the individual integration pages in the next PR).
- renamed Ecosystem/LangChain Ecosystem to Integrations/Integrations.
I like the Ecosystem term. It is more generic and semantically richer
than the Integration term. But it mentally overloads users. The
`integration` term is more concrete.
UPDATE: after discussion, Ecosystem is the term.
Ecosystem/Integrations is the page (in place of Ecosystem/LangChain
Ecosystem).

As a result, a user gets a single place to start with the individual
integration.
2023-05-18 09:29:57 -07:00

Hugging Face

This page covers how to use the Hugging Face ecosystem (including the Hugging Face Hub) within LangChain. It is broken into two parts: installation and setup, and then references to specific Hugging Face wrappers.

Installation and Setup

If you want to work with the Hugging Face Hub:

  • Install the Hub client library with pip install huggingface_hub
  • Create a Hugging Face account (it's free!)
  • Create an access token and set it as an environment variable (HUGGINGFACEHUB_API_TOKEN)
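
As a small sanity check for the last step, the sketch below tests whether the token variable is visible to Python before any Hub wrapper is created. The helper name `hub_token_is_set` is hypothetical and only for illustration; the variable name comes from the list above.

```python
import os

# Environment variable named in the setup steps above.
TOKEN_VAR = "HUGGINGFACEHUB_API_TOKEN"


def hub_token_is_set(env=None):
    """Return True if the Hub access token variable is present and non-empty.

    `env` defaults to the process environment; passing a plain dict makes
    the check easy to try out without touching real settings.
    """
    env = os.environ if env is None else env
    return bool(env.get(TOKEN_VAR, "").strip())
```

Calling `hub_token_is_set()` with no arguments checks the current process. If it returns False, export the variable in your shell (or set `os.environ[TOKEN_VAR]`) before using the Hub wrappers.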

If you want to work with the Hugging Face Python libraries:

  • Install transformers with pip install transformers to work with models and tokenizers
  • Install datasets with pip install datasets to work with datasets

Wrappers

LLM

There are two Hugging Face LLM wrappers: one for a local pipeline and one for a model hosted on the Hugging Face Hub. Note that these wrappers only work for models that support the following tasks: text2text-generation and text-generation.

To use the local pipeline wrapper:

from langchain.llms import HuggingFacePipeline
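
A minimal sketch of how the local wrapper might be built, assuming `transformers` is installed and that the `from_model_id` constructor is available in your LangChain version. The helper `build_local_llm` and the `gpt2` default are illustrative choices, not part of the original page; the task check mirrors the two supported tasks listed above.

```python
# Tasks the wrappers are documented to support (see the note above).
SUPPORTED_TASKS = ("text2text-generation", "text-generation")


def build_local_llm(model_id="gpt2", task="text-generation"):
    """Build a local pipeline LLM (hypothetical helper, illustrative defaults)."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(
            f"Unsupported task {task!r}; expected one of {SUPPORTED_TASKS}"
        )
    # Imported lazily so the task check works without the heavy dependencies.
    from langchain.llms import HuggingFacePipeline

    # Downloads the model on first use and runs it on the local machine.
    return HuggingFacePipeline.from_model_id(model_id=model_id, task=task)
```

The returned object can then be used like any other LangChain LLM, for example inside an `LLMChain`.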

To use the wrapper for a model hosted on the Hugging Face Hub:

from langchain.llms import HuggingFaceHub
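
The hosted wrapper could be configured along these lines. This is a sketch under assumptions: the repository id `google/flan-t5-xl`, the generation settings, and both helper names are illustrative, and the token is expected to be in the HUGGINGFACEHUB_API_TOKEN environment variable as described in the setup section.

```python
def hub_llm_settings(repo_id="google/flan-t5-xl", temperature=0.1, max_length=64):
    """Collect constructor arguments for the Hub wrapper (illustrative values)."""
    return {
        "repo_id": repo_id,
        "model_kwargs": {"temperature": temperature, "max_length": max_length},
    }


def build_hub_llm(**overrides):
    """Create the Hub-hosted LLM wrapper (hypothetical helper)."""
    # Imported lazily; requires langchain and huggingface_hub to be installed.
    from langchain.llms import HuggingFaceHub

    # The access token is read from the environment by the wrapper.
    return HuggingFaceHub(**hub_llm_settings(**overrides))
```

For example, `build_hub_llm(max_length=128)` would request longer completions from the same hosted model.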

For a more detailed walkthrough of the Hugging Face Hub wrapper, see this notebook

Embeddings

There are two Hugging Face Embeddings wrappers: one for a local model and one for a model hosted on the Hugging Face Hub. Note that these wrappers only work for sentence-transformers models.

To use the local model wrapper:

from langchain.embeddings import HuggingFaceEmbeddings

To use the wrapper for a model hosted on the Hugging Face Hub:

from langchain.embeddings import HuggingFaceHubEmbeddings

For a more detailed walkthrough of this, see this notebook
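
Both embeddings wrappers expose `embed_query` and `embed_documents`, so they can be swapped behind one helper. The sketch below assumes the `sentence-transformers/all-mpnet-base-v2` model and that the constructors accept `model_name` and `repo_id` respectively; the helper names and the cosine function are hypothetical additions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_embedder(local=True, model="sentence-transformers/all-mpnet-base-v2"):
    """Return the local or the Hub-hosted embeddings wrapper (illustrative model)."""
    if local:
        from langchain.embeddings import HuggingFaceEmbeddings

        return HuggingFaceEmbeddings(model_name=model)
    from langchain.embeddings import HuggingFaceHubEmbeddings

    # Needs HUGGINGFACEHUB_API_TOKEN to be set in the environment.
    return HuggingFaceHubEmbeddings(repo_id=model)


def similarity(embedder, query, document):
    """Embed a query and one document, then compare them."""
    query_vector = embedder.embed_query(query)
    document_vector = embedder.embed_documents([document])[0]
    return cosine_similarity(query_vector, document_vector)
```

With either wrapper, `similarity(build_embedder(), "a question", "a passage")` returns a score between -1 and 1.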

Tokenizer

Tokenizers available through the transformers package can be used in several places. By default, one of them is used to count tokens for all LLMs.

You can also use a tokenizer to count tokens when splitting documents:

from langchain.text_splitter import CharacterTextSplitter
CharacterTextSplitter.from_huggingface_tokenizer(...)
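
To make the call above concrete, here is a sketch that assumes the GPT-2 fast tokenizer from `transformers` and illustrative chunk settings. The helper names are hypothetical. `token_length` shows the idea of measuring text in tokens rather than characters, which is, as far as I understand it, how the splitter sizes its chunks.

```python
def token_length(tokenizer, text):
    """Length of `text` in tokens, for any tokenizer with an `encode` method."""
    return len(tokenizer.encode(text))


def split_by_tokens(text, chunk_size=100, chunk_overlap=0):
    """Split `text` into chunks measured in tokens (illustrative settings)."""
    # Imported lazily; both packages must be installed to run this part.
    from transformers import GPT2TokenizerFast
    from langchain.text_splitter import CharacterTextSplitter

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    splitter = CharacterTextSplitter.from_huggingface_tokenizer(
        tokenizer, chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    return splitter.split_text(text)
```

Note that `CharacterTextSplitter` still splits on its separator first, so a chunk can exceed `chunk_size` when the text has no separator to break on.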

For a more detailed walkthrough of this, see this notebook

Datasets

The Hugging Face Hub has lots of great datasets that can be used to evaluate your LLM chains.

For a detailed walkthrough of how to use them to do so, see this notebook
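
As a rough illustration of preparing Hub data for evaluation, the sketch below loads a few rows with the `datasets` library and reshapes them into question/answer pairs. The dataset name `truthful_qa`, its `generation` config, the `validation` split, and the column names are assumptions for the example, and the identifier may differ across `datasets` versions; both helper names are hypothetical.

```python
def load_eval_rows(name="truthful_qa", config="generation", split="validation", limit=5):
    """Load the first `limit` rows of a Hub dataset (illustrative defaults)."""
    # Imported lazily; requires `pip install datasets` and network access.
    from datasets import load_dataset

    dataset = load_dataset(name, config, split=split)
    return [dataset[i] for i in range(min(limit, len(dataset)))]


def to_qa_pairs(rows, question_key="question", answer_key="best_answer"):
    """Reshape dataset rows into the question/answer dicts a chain can be scored on."""
    return [
        {"question": row[question_key], "answer": row[answer_key]}
        for row in rows
    ]
```

Each question can then be passed to an LLM chain and the prediction compared with the stored answer.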