Chelsea E. Manning 2780d2d4dd
Extend OpenAIEmbeddings class to support non-tiktoken based embeddings (#13884)
- **Description:** This extends `OpenAIEmbeddings` to support
non-`tiktoken` based embeddings, specifically for use with the new
`text-generation-webui` API (`--extensions openai`), which does not
support `tiktoken` encodings but accepts plain strings instead.
- **Issue:** Not found
- **Dependencies:** Hugging Face `transformers.AutoTokenizer` is a new
dependency for running the model without `tiktoken`.
- **Tag maintainer:** @baskaryan, based on the last commit for the
`langchain-core` refactor
- **Twitter handle:** @xychelsea

Modified the tokenization process to be model-agnostic, allowing both
OpenAI and non-OpenAI model tokenization through a new `bool` flag,
`tiktoken_enabled`. Setting it to `False` uses Hugging Face's
AutoTokenizer instead and handles tokenization for models that require
different preprocessing, so that the request is sent as chunked strings
rather than as lists of integer token IDs.

Updated the embeddings generation process to accommodate non-OpenAI
models. This includes converting tokenized text into embeddings for both
OpenAI and Hugging Face model architectures.
2023-12-03 12:04:17 -08:00
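The "chunked string request" described in this commit can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the commit: `chunk_as_strings` is a hypothetical helper, and the whitespace tokenizer is a hypothetical stand-in so the sketch runs without downloading a model.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")  # token type: str for the toy tokenizer, int for real token IDs


def chunk_as_strings(
    text: str,
    encode: Callable[[str], Sequence[T]],
    decode: Callable[[Sequence[T]], str],
    chunk_size: int,
) -> List[str]:
    """Hypothetical helper: tokenize `text`, split the tokens into windows of
    at most `chunk_size`, and decode each window back to a string, so the
    request carries strings rather than lists of integers."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    tokens = encode(text)
    return [
        decode(tokens[i : i + chunk_size])
        for i in range(0, len(tokens), chunk_size)
    ]


# Hypothetical stand-in tokenizer (whitespace split), not a real model tokenizer.
def whitespace_encode(text: str) -> List[str]:
    return text.split()


def whitespace_decode(tokens: Sequence[str]) -> str:
    return " ".join(tokens)


print(chunk_as_strings("one two three four five", whitespace_encode, whitespace_decode, 2))
# ['one two', 'three four', 'five']
```

With `transformers` installed, a tokenizer from `AutoTokenizer.from_pretrained(...)` could plausibly supply `encode=lambda s: tok.encode(s, add_special_tokens=False)` and `decode=tok.decode`; which model name to load is left unspecified here.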

🦜🔗 LangChain

Building applications with LLMs through composability

License: MIT

Looking for the JS/TS library? Check out LangChain.js.

To help you ship LangChain apps to production faster, check out LangSmith. LangSmith is a unified developer platform for building, testing, and monitoring LLM applications. Fill out this form to get off the waitlist or speak with our sales team.

Quick Install

With pip:

pip install langchain

With conda:

conda install langchain -c conda-forge

🤔 What is LangChain?

LangChain is a framework for developing applications powered by language models. It enables applications that:

  • Are context-aware: connect a language model to sources of context (prompt instructions, few-shot examples, content to ground its response in, etc.)
  • Reason: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)

This framework consists of several parts.

  • LangChain Libraries: The Python and JavaScript libraries. They contain interfaces and integrations for a myriad of components, a basic runtime for combining these components into chains and agents, and off-the-shelf implementations of chains and agents.
  • LangChain Templates: A collection of easily deployable reference architectures for a wide variety of tasks.
  • LangServe: A library for deploying LangChain chains as a REST API.
  • LangSmith: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.

This repo contains the langchain, langchain-experimental, and langchain-cli Python packages, as well as LangChain Templates.

LangChain Stack

🧱 What can you build with LangChain?

Retrieval augmented generation

💬 Analyzing structured data

🤖 Chatbots

And much more! Head to the Use cases section of the docs for further examples.

🚀 How does LangChain help?

The main value props of the LangChain libraries are:

  1. Components: composable tools and integrations for working with language models. Components are modular and easy to use, whether or not you are using the rest of the LangChain framework.
  2. Off-the-shelf chains: built-in assemblages of components for accomplishing higher-level tasks

Off-the-shelf chains make it easy to get started. Components make it easy to customize existing chains and build new ones.
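The split between components and chains can be illustrated with a small composition sketch. It loosely mirrors the `|` style that LangChain's Runnables use, but `Step` is a hypothetical class written for this example, not LangChain's API, and `fake_model` is an assumed stand-in for a real LLM call.

```python
from typing import Callable, Generic, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


class Step(Generic[A, B]):
    """Hypothetical component: wraps one function from input A to output B."""

    def __init__(self, fn: Callable[[A], B]) -> None:
        self.fn = fn

    def __or__(self, other: "Step[B, C]") -> "Step[A, C]":
        # `a | b` builds a new component that runs `a`, then feeds its output to `b`.
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x: A) -> B:
        return self.fn(x)


PREFIX = "MODEL: "

# Three small components; `fake_model` stands in for a real LLM call.
prompt = Step(lambda topic: f"Tell me a fact about {topic}.")
fake_model = Step(lambda text: PREFIX + text)
parser = Step(lambda out: out[len(PREFIX):].strip())

# A ready-made chain is a fixed composition of components...
chain = prompt | fake_model | parser
print(chain.invoke("parrots"))  # Tell me a fact about parrots.

# ...and customizing it means swapping one component for another.
shout_chain = prompt | fake_model | Step(lambda out: out.upper())
```

The design choice here is that composition returns another component, so a chain can itself be reused inside a larger chain.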

Components fall into the following modules:

📃 Model I/O:

This includes prompt management, prompt optimization, a generic interface for all LLMs, and common utilities for working with LLMs.
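As a rough illustration of prompt management, the sketch below renders an instruction, a few worked examples, and a new query into one prompt string. `few_shot_prompt` is a hypothetical helper, not a LangChain class, and the `Input:`/`Output:` layout is an assumed convention.

```python
from typing import List, Tuple


def few_shot_prompt(instruction: str, examples: List[Tuple[str, str]], query: str) -> str:
    """Hypothetical helper: combine an instruction, worked examples, and the
    new query into a single prompt string that any LLM could receive."""
    blocks = [instruction]
    blocks += [f"Input: {inp}\nOutput: {out}" for inp, out in examples]
    blocks.append(f"Input: {query}\nOutput:")  # left open for the model to complete
    return "\n\n".join(blocks)


print(few_shot_prompt("Give the antonym.", [("hot", "cold"), ("up", "down")], "tall"))
```

Keeping the template separate from the examples lets the same prompt be reused with different example sets or queries.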

📚 Retrieval:

Data Augmented Generation involves specific types of chains that first interact with an external data source to fetch data for use in the generation step. Examples include summarization of long pieces of text and question answering over specific data sources.
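The two-step pattern above (fetch data first, then generate with it) can be sketched in plain Python. This is a toy under stated assumptions: `retrieve` and `answer` are hypothetical helpers, word-overlap scoring is a swapped-in stand-in for the embedding-based retrieval that real systems typically use, and `echo_llm` simply returns its prompt instead of calling a model.

```python
from typing import Callable, List


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Toy retriever: rank docs by how many lowercase words they share with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        docs,
        key=lambda d: len(query_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def answer(query: str, docs: List[str], llm: Callable[[str], str], k: int = 2) -> str:
    """Step 1: fetch data from the external source. Step 2: generate with it in the prompt."""
    context = "\n".join(retrieve(query, docs, k))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return llm(prompt)


docs = [
    "Paris is the capital of France",
    "Bananas are yellow",
    "Berlin is the capital of Germany",
]
echo_llm = lambda prompt: prompt  # stand-in for a real model: returns its prompt unchanged
print(answer("capital of France", docs, echo_llm, k=1))
```

Because `llm` is just a callable, the retrieval step stays independent of whichever model performs the generation step.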

🤖 Agents:

Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end-to-end agents.
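The Action, Observation, repeat loop described above can be sketched as follows. This is a minimal illustration, not LangChain's agent interface: `run_agent` is hypothetical, `scripted_policy` is an assumed stand-in for the LLM's decisions, and the single `add` tool is made up for the example.

```python
from typing import Callable, Dict, List, Tuple

Action = Tuple[str, str]   # (tool name or "finish", tool input or final answer)
Record = Tuple[Action, str]  # (action taken, observation it produced)


def run_agent(
    decide: Callable[[List[Record]], Action],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Choose an Action, take it, record the Observation, and repeat until the
    policy returns ("finish", answer) or max_steps is exhausted."""
    history: List[Record] = []
    for _ in range(max_steps):
        name, arg = decide(history)
        if name == "finish":
            return arg
        observation = tools[name](arg)
        history.append(((name, arg), observation))
    raise RuntimeError("agent did not finish within max_steps")


# Scripted stand-in for the LLM: call the tool once, then finish with its result.
def scripted_policy(history: List[Record]) -> Action:
    if not history:
        return ("add", "40 2")
    return ("finish", f"The answer is {history[-1][1]}")


tools = {"add": lambda s: str(sum(int(x) for x in s.split()))}
print(run_agent(scripted_policy, tools))  # The answer is 42
```

The `max_steps` cap is a common safeguard so that a policy which never decides to finish cannot loop forever.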

📖 Documentation

Please see here for full documentation, which includes:

💁 Contributing

As an open-source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.

For detailed information on how to contribute, see here.

🌟 Contributors

langchain contributors