Compare commits

...

65 Commits

Author SHA1 Message Date
blob42 84d7ad397d langchain-docker readme 1 year ago
blob42 de551d62a8 linting in docker and parallel make jobs
- linting can be run in docker in parallel with `make -j4 docker.lint`
1 year ago
blob42 d8fd0e790c enable test + lint on docker 1 year ago
blob42 97c2b31cc5 added all extra dependencies to dev image + customized builds
- downgraded to python 3.10 to accommodate installing all dependencies
- by default installs all dev + extra dependencies
- option to install only dev dependencies by customizing .env file
1 year ago
blob42 f1dc03d0cc docker development image and helper makefile
separate makefile and build env:

- separate makefile for docker
- only show docker commands when docker detected in system
- only rebuild container on change
- use an unprivileged user

builder image and base dev image:

- fully isolated environment inside container.
- all venv installed inside container shell and available as commands.
    - ex: `docker run IMG jupyter notebook` to launch notebook.
- pure python based container without poetry.
- custom motd to add a message displayed to users when they connect to
container.
- print environment versions (git, package, python) on login
- display help message when starting container
1 year ago
Harrison Chase f76e9eaab1 bump version (#1342) 1 year ago
Harrison Chase db2e9c2b0d partial variables (#1308) 1 year ago
Tim Asp d22651d82a Add new iFixit document loader (#1333)
iFixit is a wikipedia-like site that has a huge amount of open content
on how to fix things, questions/answers for common troubleshooting and
"things" related content that is more technical in nature. All content
is licensed under CC-BY-SA-NC 3.0

Adding docs from iFixit as context for user questions like "I dropped my
phone in water, what do I do?" or "My macbook pro is making a whining
noise, what's wrong with it?" can yield significantly better responses
than context-free responses from LLMs.
1 year ago
Matt Robinson c46478d70e feat: document loader for image files (#1330)
### Summary

Adds a document loader for image files such as `.jpg` and `.png` files.

### Testing

Run the following using the example document from the [`unstructured`
repo](https://github.com/Unstructured-IO/unstructured/tree/main/example-docs).

```python
from langchain.document_loaders.image import UnstructuredImageLoader

loader = UnstructuredImageLoader("layout-parser-paper-fast.jpg")
loader.load()
```
1 year ago
Eugene Yurtsev e3fcc72879 Documentation: Minor typo fixes (#1327)
Fixing a few minor typos in the documentation (and likely introducing other
ones in the process).
1 year ago
blob42 2fdb1d842b refactoring into submodules 1 year ago
blob42 c30ef7dbc4 drop network capabilities by default, example on using networking 1 year ago
blob42 8a7871ece3 add exec_attached: attach to running container and exec cmd 1 year ago
blob42 201ecdc9ee fix run and exec_run default commands, actually use gVisor
- run and exec_run need a separate default command. Run usually executes
  a script while exec_run simulates an interactive session. The image
  templates and run funcs have been upgraded to handle both
  types of commands.

- test: make docker tests run when docker is installed and docker lib
  available.
  - test that runsc runtime is used by default when gVisor is installed.
    (manually removing gVisor skips the test)
1 year ago
blob42 149fe0055e exec_run fixes to keep stdin open 1 year ago
blob42 096b82f2a1 update notebook for utility 1 year ago
blob42 87b5a84cfb update tests and docstrings 1 year ago
blob42 ed97aa65af exec_run: add timeout and delay params
- use `delay` to wait for the sent payload to finish
- use `timeout` to control how long to wait for output
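A hypothetical call shape for these parameters (the wrapper object and the rest of the signature are assumptions; only the parameter semantics come from the commit message):
```python
# hypothetical usage of this branch's docker wrapper; names are assumed
output = docker_wrapper.exec_run(
    "echo hello",
    delay=0.5,   # wait this long for the sent payload to finish
    timeout=10,  # how long to wait for output
)
```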
1 year ago
blob42 c9e6baf60d image templates, enhanced wrapper building with custom parameters
- quickly run or exec_run commands with sane defaults
- wip image templates with parameters for common docker images
- shell escaping logic
- capture stdout+stderr for exec commands
- added minimal testing
1 year ago
blob42 7cde1cbfc3 docker: attach to container's stdin
- wip image helper for optimized params with common images
- gVisor runtime checker
- make tests skipped if docker installed
1 year ago
blob42 17213209e0 stream stdin and stdout to container through docker API's socket 1 year ago
blob42 895f862662 docker wrapper tool for untrusted execution 1 year ago
Harrison Chase f61858163d
bump version to 0.0.95 (#1324) 1 year ago
Harrison Chase 0824d65a5c
Harrison/indexing pipeline (#1317) 1 year ago
Akshay a0bf856c70
Update agent_vectorstore.ipynb (#1318)
nitpicking, but I just thought I'd fix this typo I found when going
through the How-to 😄 (unless it was intentional). Also, it's amazing that
you added ReAct to LangChain!
1 year ago
Harrison Chase 166cda2cc6
Harrison/deeplake (#1316)
Co-authored-by: Davit Buniatyan <d@activeloop.ai>
1 year ago
Harrison Chase aaad6cc954
Harrison/atlas db (#1315)
Co-authored-by: Brandon Duderstadt <brandonduderstadt@gmail.com>
1 year ago
Marc Puig 3989c793fd
Making it possible to use "certainty" as a parameter for the weaviate similarity_search (#1218)
Checks whether the weaviate similarity_search kwargs contain "certainty" and
uses it accordingly. The minimum level of certainty must be a float, and
it is computed from the normalized distance.
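A minimal sketch of the kwarg handling described above (the surrounding method body is an assumption, not taken from the PR):
```python
# hypothetical sketch; only the "certainty" kwarg itself comes from the PR
def similarity_search(self, query: str, k: int = 4, **kwargs):
    content = {"concepts": [query]}
    if "certainty" in kwargs:
        # minimum certainty: a float computed from the normalized distance
        content["certainty"] = kwargs["certainty"]
    query_obj = self._client.query.get(self._index_name, [self._text_key])
    result = query_obj.with_near_text(content).with_limit(k).do()
    return result
```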
1 year ago
Alexander Hoyle 42b892c21b
Avoid IntegrityError for SQLiteCache updates (#1286)
While using a `SQLiteCache`, if there are duplicate `(prompt, llm, idx)`
tuples passed to
[`update_cache()`](c5dd491a21/langchain/llms/base.py (L39)),
then an `IntegrityError` is thrown. This can happen when there are
duplicated prompts within the same batch.

This PR changes the SQLAlchemy `session.add()` to a `session.merge()` in
`cache.py`, [following the solution from this SO
thread](https://stackoverflow.com/questions/10322514/dealing-with-duplicate-primary-keys-on-insert-in-sqlalchemy-declarative-style).
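A minimal sketch of the merge-based update, assuming the `FullLLMCache` model and the `update()` signature from `langchain.cache`:
```python
from sqlalchemy.orm import Session

# sketch of the fixed update(); surrounding details are assumed
def update(self, prompt: str, llm_string: str, return_val) -> None:
    for i, generation in enumerate(return_val):
        item = FullLLMCache(
            prompt=prompt, llm=llm_string, idx=i, response=generation.text
        )
        with Session(self.engine) as session, session.begin():
            # merge() looks up the primary key (prompt, llm, idx) and updates
            # the existing row if present; add() would INSERT blindly and
            # raise IntegrityError on a duplicate key
            session.merge(item)
```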
I believe this fixes #983, but not entirely sure since that also
involves async

Here's a minimal example of the error:
```python
from pathlib import Path

import langchain
from langchain.cache import SQLiteCache

llm = langchain.OpenAI(model_name="text-ada-001", openai_api_key=Path("/.openai_api_key").read_text().strip())
langchain.llm_cache = SQLiteCache("test_cache.db")
llm.generate(['a'] * 5)
```
```
>   IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: full_llm_cache.prompt, full_llm_cache.llm, full_llm_cache.idx
    [SQL: INSERT INTO full_llm_cache (prompt, llm, idx, response) VALUES (?, ?, ?, ?)]
    [parameters: ('a', "[('_type', 'openai'), ('best_of', 1), ('frequency_penalty', 0), ('logit_bias', {}), ('max_tokens', 256), ('model_name', 'text-ada-001'), ('n', 1), ('presence_penalty', 0), ('request_timeout', None), ('stop', None), ('temperature', 0.7), ('top_p', 1)]", 0, '\n\nA is for air.\n\nA is for atmosphere.')]
    (Background on this error at: https://sqlalche.me/e/14/gkpj)
```

After the change, we now have the following
```python
class Output:
    def __init__(self, text):
        self.text = text

# make dummy data
cache = SQLiteCache("test_cache_2.db")
cache.update(prompt="prompt_0", llm_string="llm_0", return_val=[Output("text_0")])
cache.engine.execute("SELECT * FROM full_llm_cache").fetchall()

# output
>   [('prompt_0', 'llm_0', 0, 'text_0')]
```

```python
#  update data, before change this would have thrown an `IntegrityError`
cache.update(prompt="prompt_0", llm_string="llm_0", return_val=[Output("text_0_new")])
cache.engine.execute("SELECT * FROM full_llm_cache").fetchall()

# output
>   [('prompt_0', 'llm_0', 0, 'text_0_new')]
```
1 year ago
Harrison Chase 81abcae91a
Harrison/banana fix (#1311)
Co-authored-by: Erik Dunteman <44653944+erik-dunteman@users.noreply.github.com>
1 year ago
Casey A. Fitzpatrick 648b3b3909
Fix use case sentence for bash util doc (#1295)
Thanks for all your hard work!

I noticed a small typo in the bash util doc so here's a quick update.
Additionally, my formatter caught some spacing in the `.md` as well.
Happy to revert that if it's an issue.

The main change is just
```
- A common use case this is for letting it interact with your local file system. 

+ A common use case for this is letting the LLM interact with your local file system.
```

## Testing

`make docs_build` succeeds locally and the changes show as expected ✌️ 
<img width="704" alt="image"
src="https://user-images.githubusercontent.com/17773666/221376160-e99e59a6-b318-49d1-a1d7-89f5c17cdab4.png">
1 year ago
Ingo Kleiber fd9975dad7
add CoNLL-U document loader (#1297)
I've added a simple
[CoNLL-U](https://universaldependencies.org/format.html) document
loader. CoNLL-U is a common format for NLP tasks and is used, for
example, in the Universal Dependencies treebank corpora. The loader
reads a single file in standard CoNLL-U format and returns a document.
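A usage sketch (the exact class name and import path are assumed from the PR, and the file name is illustrative):
```python
# hypothetical usage; class name and path assumed
from langchain.document_loaders import CoNLLULoader

loader = CoNLLULoader("example.conllu")
docs = loader.load()
```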
1 year ago
Harrison Chase d29f74114e
copy paste loader (#1302) 1 year ago
Harrison Chase ce441edd9c
improve docs (#1309) 1 year ago
Harrison Chase 6f30d68581
add example of using agent with vectorstores (#1285) 1 year ago
Harrison Chase 002da6edc0
ruff ruff (#1203) 1 year ago
Harrison Chase 0963096491
fix imports (#1288) 1 year ago
Harrison Chase c5dd491a21
bump version to 0094 (#1280) 1 year ago
Matt Robinson 2f15c11b87
feat: document loader for MS Word documents (#1282)
### Summary

Adds a document loader for MS Word Documents. Works with both `.docx`
and `.doc` files as long as the user has installed
`unstructured>=0.4.11`.

### Testing

The following workflow tests the loader for both `.doc` and `.docx` files
using example docs from the `unstructured` repo.

#### `.docx`

```python
from langchain.document_loaders import UnstructuredWordDocumentLoader

filename = "../unstructured/example-docs/fake.docx"
loader = UnstructuredWordDocumentLoader(filename)
loader.load()
```

#### `.doc`

```python
from langchain.document_loaders import UnstructuredWordDocumentLoader

filename = "../unstructured/example-docs/fake.doc"
loader = UnstructuredWordDocumentLoader(filename)
loader.load()
```
1 year ago
Harrison Chase 96db6ed073
cleanup (#1274) 1 year ago
Harrison Chase 7e8f832cd6
Harrison/cohere params (#1278)
Co-authored-by: Stefano Faraggi <40745694+stepp1@users.noreply.github.com>
1 year ago
Harrison Chase a8e88e1874
Harrison/logprobs (#1279)
Co-authored-by: Prateek Shah <97124740+prateekspanning@users.noreply.github.com>
1 year ago
Harrison Chase 42167a1e24
Harrison/fb loader (#1277)
Co-authored-by: Vairo Di Pasquale <vairo.dp@gmail.com>
1 year ago
Harrison Chase bb53d9722d
Harrison/errors (#1276)
Co-authored-by: Kevin Huo <5000881+kwhuo68@users.noreply.github.com>
1 year ago
Klein Tahiraj 8a0751dadd
adding .ipynb loader and documentation Fixes #1248 (#1252)
`NotebookLoader.load()` loads the `.ipynb` notebook file into a
`Document` object (see the usage sketch after the parameter list).

**Parameters**:

* `include_outputs` (bool): whether to include cell outputs in the
resulting document (default is False).
* `max_output_length` (int): the maximum number of characters to include
from each cell output (default is 10).
* `remove_newline` (bool): whether to remove newline characters from the
cell sources and outputs (default is False).
* `traceback` (bool): whether to include the full traceback (default is
False).
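A minimal usage sketch with the parameters documented above (the file name is illustrative):
```python
from langchain.document_loaders import NotebookLoader

# hypothetical input file; parameters are those listed in the PR
loader = NotebookLoader(
    "example.ipynb",
    include_outputs=True,
    max_output_length=20,
    remove_newline=True,
)
docs = loader.load()
```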
1 year ago
Harrison Chase 4b5d427421
Harrison/source docs (#1275)
Co-authored-by: Tushar Dhadiwal <tushardhadiwal@users.noreply.github.com>
1 year ago
Enrico Shippole 9becdeaadf
Add Writer, Banana, Modal, StochasticAI (#1270)
Add LLM wrappers and examples for Banana, Writer, Modal, Stochastic AI

Added rigid json format for Banana and Modal
1 year ago
blob42 5457d48416
searx: add `query_suffix` parameter (#1259)
- allows building tools that dynamically inject an extra searx suffix into
  the query. Example:
  `search.run("python library", query_suffix="site:github.com")`
 resulting query: `python library site:github.com`

Co-authored-by: blob42 <spike@w530>
1 year ago
Harrison Chase 9381005098
fix bug with length function (#1257) 1 year ago
Matt Robinson 10e73a3723
docs: remove nltk download steps (#1253)
### Summary

Updates the docs to remove the `nltk` download steps from
`unstructured`. As of `unstructured` `0.4.14`, this is handled
automatically in the relevant modules within `unstructured`.
1 year ago
Justin Torre 5bc6dc076e
added caching and properties docs (#1255) 1 year ago
Harrison Chase 6d37d089e9
bump version to 0093 (#1251) 1 year ago
Iskren Ivov Chernev 8e3cd3e0dd
Add DeepInfra LLM support (#1232)
DeepInfra is an Inference-as-a-Service provider. Add a simple wrapper
using HTTPS requests.
1 year ago
Dmitri Melikyan b7765a95a0
docs: add Graphsignal ecosystem page (#1228)
Adds a Graphsignal ecosystem page
1 year ago
Satoru Sakamoto d480330fae
fix to specific language transcript (#1231)
Currently the YouTube loader only seems to support English audio.
Changed it to load transcripts in the specified language.
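A hypothetical usage sketch (the `language` parameter is assumed from this PR's description; the URL is a placeholder):
```python
# hypothetical usage; the language parameter comes from this PR
from langchain.document_loaders import YoutubeLoader

loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=VIDEO_ID", language="de"
)
docs = loader.load()
```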
1 year ago
Harrison Chase 6085fe18d4
add ifttt tool (#1244) 1 year ago
Jon Luo 8a35811556
Don't instruct LLM to use the LIMIT clause, which is incompatible with SQL Server (#1242)
The current prompt specifically instructs the LLM to use the `LIMIT`
clause. This will cause issues with MS SQL Server, which uses `SELECT
TOP` instead of `LIMIT`. The generated SQL will use `LIMIT`; the
instruction to "always limit... using the LIMIT clause" seems to
override the "create a syntactically correct mssql query to run"
portion. Reported here:
https://github.com/hwchase17/langchain/issues/1103#issuecomment-1441144224

I don't have access to a SQL Server instance to test, but removing that
part of the prompt in OpenAI Playground results in the correct `SELECT
TOP` syntax, whereas keeping it in results in the `LIMIT` clause, even
when instructing it to generate syntactically correct mssql. It's also
still correctly using `LIMIT` in my MariaDB database. I think in this
case we can assume that the model will select the appropriate method
based on the dialect specified.

In general, it would be nice to be able to test a suite of SQL dialects
for things like dialect-specific syntax and other issues we've run into
in the past, but I'm not quite sure how to best approach that yet.
1 year ago
Harrison Chase 71709ad5d5
Update key_concepts.md (#1209) (#1237)
Added a link for easier navigation (it's not immediately clear where to find
more info on SimpleSequentialChain; it's 3 clicks away).

---------

Co-authored-by: Larry Fisherman <l4rryfisherman@protonmail.com>
1 year ago
Dennis Antela Martinez 53c67e04d4
add aleph alpha llm (#1207)
Integrate Aleph Alpha's client into LangChain to provide access to the
Luminous models - more info on the latest benchmarks here:
https://www.aleph-alpha.com/luminous-performance-benchmarks
1 year ago
Klein Tahiraj c6ab1bb3cb
Fixing typo in loading.py (#1235)
Just fixing a typo I found in loading.py
1 year ago
Ikko Eltociear Ashimine 334b553260
Update petals.md (#1225)
Huggingface -> Hugging Face
1 year ago
Jon Luo ac1320aae8
fix sqlite internal tables breaking table_info (#1224)
With the current method used to get the SQL table info, sqlite internal
schema tables are being included and are not being handled correctly by
sqlalchemy because the columns have no types. This is easy to see with
the Chinook database:
```python
db = SQLDatabase.from_uri("sqlite:///Chinook.db")
print(db.table_info)
```
```python
...
sqlalchemy.exc.CompileError: (in table 'sqlite_sequence', column 'name'): Can't generate DDL for NullType(); did you forget to specify a type on this Column?
```

SQLAlchemy 2.0 [ignores these by
default](63d90b0f44/lib/sqlalchemy/dialects/sqlite/base.py (L856-L880)):

63d90b0f44/lib/sqlalchemy/dialects/sqlite/base.py (L2096-L2123)
1 year ago
djacobs7 4e28982d2b
Fix typo in constitutional_ai base.py (#1216)
Found a typo in the documentation code for the constitutional_ai module
1 year ago
Sason cc7d2e5621
Correct typo in "Question Answering" How-To Guide (#1221) 1 year ago
blob42 424e71705d
searx: remove duplicate param (#1219)
Co-authored-by: blob42 <spike@w530>
1 year ago

@ -0,0 +1,144 @@
.vscode/
.idea/
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
notebooks/
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
.venvs
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# macOS display setting files
.DS_Store
# docker
docker/
!docker/assets/
.dockerignore
docker.build

2
.gitignore vendored

@ -106,6 +106,7 @@ celerybeat.pid
# Environments
.env
!docker/.env
.venv
.venvs
env/
@ -134,3 +135,4 @@ dmypy.json
# macOS display setting files
.DS_Store
docker.build

@ -151,6 +151,10 @@ poetry run jupyter notebook
When you run `poetry install`, the `langchain` package is installed as editable in the virtualenv, so your new logic can be imported into the notebook.
## Using Docker
Refer to [DOCKER.md](docker/DOCKER.md) for more information.
## Documentation
### Contribute Documentation

@ -1,5 +1,8 @@
.PHONY: all clean format lint test tests test_watch integration_tests help
GIT_HASH ?= $(shell git rev-parse --short HEAD)
LANGCHAIN_VERSION := $(shell grep '^version' pyproject.toml | cut -d '=' -f2 | tr -d '"')
all: help
coverage:
@ -21,19 +24,17 @@ docs_linkcheck:
format:
poetry run black .
poetry run isort .
poetry run ruff --select I --fix .
lint:
poetry run mypy .
poetry run black . --check
poetry run isort . --check
poetry run flake8 .
poetry run ruff .
test:
poetry run pytest tests/unit_tests
tests:
poetry run pytest tests/unit_tests
tests: test
test_watch:
poetry run ptw --now . -- tests/unit_tests
@ -47,8 +48,26 @@ help:
@echo 'docs_build - build the documentation'
@echo 'docs_clean - clean the documentation build artifacts'
@echo 'docs_linkcheck - run linkchecker on the documentation'
ifneq ($(shell command -v docker 2> /dev/null),)
@echo 'docker - build and run the docker dev image'
@echo 'docker.run - run the docker dev image'
@echo 'docker.jupyter - start a jupyter notebook inside container'
@echo 'docker.build - build the docker dev image'
@echo 'docker.force_build - force a rebuild'
@echo 'docker.test - run the unit tests in docker'
@echo 'docker.lint - run the linters in docker'
@echo 'docker.clean - remove the docker dev image'
endif
@echo 'format - run code formatters'
@echo 'lint - run linters'
@echo 'test - run unit tests'
@echo 'test_watch - run unit tests in watch mode'
@echo 'integration_tests - run integration tests'
# include the following makefile if the docker executable is available
ifeq ($(shell command -v docker 2> /dev/null),)
$(info Docker not found, skipping docker-related targets)
else
include docker/Makefile
endif

@ -1,11 +1,15 @@
# 🦜️🔗 LangChain
# 🦜️🔗 LangChain - Docker
⚡ Building applications with LLMs through composability ⚡
WIP: This is a fork of langchain focused on implementing a docker wrapper and
toolchain. The goal is to make it easy to use LLM chains running inside a
container, build custom docker-based tools and let agents run arbitrary
untrusted code inside.
[![lint](https://github.com/hwchase17/langchain/actions/workflows/lint.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/lint.yml) [![test](https://github.com/hwchase17/langchain/actions/workflows/test.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/test.yml) [![linkcheck](https://github.com/hwchase17/langchain/actions/workflows/linkcheck.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/linkcheck.yml) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Twitter](https://img.shields.io/twitter/url/https/twitter.com/langchainai.svg?style=social&label=Follow%20%40LangChainAI)](https://twitter.com/langchainai) [![](https://dcbadge.vercel.app/api/server/6adMQxSpJS?compact=true&style=flat)](https://discord.gg/6adMQxSpJS)
Currently exploring the following:
**Production Support:** As you move your LangChains into production, we'd love to offer more comprehensive support.
Please fill out [this form](https://forms.gle/57d8AmXBYp8PP8tZA) and we'll set up a dedicated support Slack channel.
- Docker wrapper for LLMs and chains
- Creating a toolchain for building docker based LLM tools.
- Building agents that can run arbitrary untrusted code inside a container.
## Quick Install

@ -0,0 +1,13 @@
# python env
PYTHON_VERSION=3.10
# -E flag is required
# comment the following line to only install dev dependencies
POETRY_EXTRA_PACKAGES="-E all"
# at least one group needed
POETRY_DEPENDENCIES="dev,test,lint,typing"
# langchain env. Warning: these variables will be baked into the docker image!
OPENAI_API_KEY=${OPENAI_API_KEY:-}
SERPAPI_API_KEY=${SERPAPI_API_KEY:-}

@ -0,0 +1,53 @@
# Using Docker
To quickly get started, run the command `make docker`.
If docker is installed, the Makefile will export extra targets in the format `docker.*` to build and run the docker image. Type `make` for a list of available tasks.
There is a basic `docker-compose.yml` in the docker directory.
## Building the development image
Using `make docker` will build the dev image if it does not exist, then drop
you inside the container with the langchain environment available in the shell.
### Customizing the image and installed dependencies
The image is built with a default python version and all extras and dev
dependencies. It can be customized by changing the variables in the [.env](/docker/.env)
file.
If you don't need all the `extra` dependencies, a slimmer image can be obtained by
commenting out `POETRY_EXTRA_PACKAGES` in the [.env](docker/.env) file.
### Image caching
The Dockerfile is optimized to cache the poetry install step. A rebuild is triggered when there is a change to the source code.
## Example Usage
All commands from langchain's python environment are available by default in the container.
A few examples:
```bash
# run jupyter notebook
docker run --rm -it IMG jupyter notebook
# run ipython
docker run --rm -it IMG ipython
# start web server
docker run --rm -p 8888:8888 IMG python -m http.server 8888
```
## Testing / Linting
Tests and lints are run using your local source directory, which is mounted on the /src volume.
Run unit tests in the container with `make docker.test`.
Run the linting and formatting checks with `make docker.lint`.
Note: this task can run in parallel using `make -j4 docker.lint`.

@ -0,0 +1,104 @@
# vim: ft=dockerfile
#
# see also: https://github.com/python-poetry/poetry/discussions/1879
# - with https://github.com/bneijt/poetry-lock-docker
# see https://github.com/thehale/docker-python-poetry
# see https://github.com/max-pfeiffer/uvicorn-poetry
# use the slim version of python by default
ARG PYTHON_IMAGE_TAG=slim
ARG PYTHON_VERSION=${PYTHON_VERSION:-3.11.2}
####################
# Base Environment
####################
FROM python:$PYTHON_VERSION-$PYTHON_IMAGE_TAG AS lchain-base
ARG UID=1000
ARG USERNAME=lchain
ENV USERNAME=$USERNAME
RUN groupadd -g ${UID} $USERNAME
RUN useradd -l -m -u ${UID} -g ${UID} $USERNAME
# used for mounting source code
RUN mkdir /src
VOLUME /src
#######################
## Poetry Builder Image
#######################
FROM lchain-base AS lchain-base-builder
ARG POETRY_EXTRA_PACKAGES=$POETRY_EXTRA_PACKAGES
ARG POETRY_DEPENDENCIES=$POETRY_DEPENDENCIES
ENV HOME=/root
ENV POETRY_HOME=/root/.poetry
ENV POETRY_VIRTUALENVS_IN_PROJECT=false
ENV POETRY_NO_INTERACTION=1
ENV CACHE_DIR=$HOME/.cache
ENV POETRY_CACHE_DIR=$CACHE_DIR/pypoetry
ENV PATH="$POETRY_HOME/bin:$PATH"
WORKDIR /root
RUN apt-get update && \
apt-get install -y \
build-essential \
git \
curl
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
RUN mkdir -p $CACHE_DIR
## setup poetry
RUN curl -sSL -o $CACHE_DIR/pypoetry-installer.py https://install.python-poetry.org/
RUN python3 $CACHE_DIR/pypoetry-installer.py
# Copy poetry files
COPY poetry.* pyproject.toml ./
RUN mkdir /pip-prefix
RUN poetry export $POETRY_EXTRA_PACKAGES --with $POETRY_DEPENDENCIES -f requirements.txt --output requirements.txt --without-hashes && \
pip install --no-cache-dir --disable-pip-version-check --prefix /pip-prefix -r requirements.txt
# add custom motd message
COPY docker/assets/etc/motd /tmp/motd
RUN cat /tmp/motd > /etc/motd
RUN printf "\n%s\n%s\n" "$(poetry version)" "$(python --version)" >> /etc/motd
###################
## Runtime Image
###################
FROM lchain-base AS lchain
#jupyter port
EXPOSE 8888
COPY docker/assets/entry.sh /entry
RUN chmod +x /entry
COPY --from=lchain-base-builder /etc/motd /etc/motd
COPY --from=lchain-base-builder /usr/bin/git /usr/bin/git
USER ${USERNAME:-lchain}
ENV HOME /home/$USERNAME
WORKDIR /home/$USERNAME
COPY --chown=lchain:lchain --from=lchain-base-builder /pip-prefix $HOME/.local/
COPY . .
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
RUN pip install --no-deps --disable-pip-version-check --no-cache-dir -e .
ENTRYPOINT ["/entry"]

@ -0,0 +1,84 @@
# Do not call this makefile directly; it is included in the main Makefile
.PHONY: docker docker.jupyter docker.run docker.force_build docker.clean \
docker.test docker.lint docker.lint.mypy docker.lint.black \
docker.lint.isort docker.lint.flake
# read python version from .env file ignoring comments
PYTHON_VERSION := $(shell grep PYTHON_VERSION docker/.env | cut -d '=' -f2)
POETRY_EXTRA_PACKAGES := $(shell grep '^[^#]*POETRY_EXTRA_PACKAGES' docker/.env | cut -d '=' -f2)
POETRY_DEPENDENCIES := $(shell grep 'POETRY_DEPENDENCIES' docker/.env | cut -d '=' -f2)
DOCKER_SRC := $(shell find docker -type f)
DOCKER_IMAGE_NAME = langchain/dev
# SRC is all files matched by the git ls-files command
SRC := $(shell git ls-files -- '*' ':!:docker/*')
# set DOCKER_BUILD_PROGRESS=plain to see detailed build progress
DOCKER_BUILD_PROGRESS ?= auto
# extra message to show when entering the docker container
DOCKER_MOTD := docker/assets/etc/motd
ROOTDIR := $(shell git rev-parse --show-toplevel)
DOCKER_LINT_CMD = docker run --rm -i -u lchain -v $(ROOTDIR):/src $(DOCKER_IMAGE_NAME):$(GIT_HASH)
docker: docker.run

docker.run: docker.build
	@echo "Docker image: $(DOCKER_IMAGE_NAME):$(GIT_HASH)"
	docker run --rm -it -u lchain -v $(ROOTDIR):/src $(DOCKER_IMAGE_NAME):$(GIT_HASH)

docker.jupyter: docker.build
	docker run --rm -it -v $(ROOTDIR):/src $(DOCKER_IMAGE_NAME):$(GIT_HASH) jupyter notebook

docker.build: $(SRC) $(DOCKER_SRC) $(DOCKER_MOTD)
ifdef DOCKER_BUILDKIT
	docker buildx build --build-arg PYTHON_VERSION=$(PYTHON_VERSION) \
		--build-arg POETRY_EXTRA_PACKAGES=$(POETRY_EXTRA_PACKAGES) \
		--build-arg POETRY_DEPENDENCIES=$(POETRY_DEPENDENCIES) \
		--progress=$(DOCKER_BUILD_PROGRESS) \
		$(BUILD_FLAGS) -f docker/Dockerfile -t $(DOCKER_IMAGE_NAME):$(GIT_HASH) .
else
	docker build --build-arg PYTHON_VERSION=$(PYTHON_VERSION) \
		--build-arg POETRY_EXTRA_PACKAGES=$(POETRY_EXTRA_PACKAGES) \
		--build-arg POETRY_DEPENDENCIES=$(POETRY_DEPENDENCIES) \
		$(BUILD_FLAGS) -f docker/Dockerfile -t $(DOCKER_IMAGE_NAME):$(GIT_HASH) .
endif
	docker tag $(DOCKER_IMAGE_NAME):$(GIT_HASH) $(DOCKER_IMAGE_NAME):latest
	@touch $@ # this prevents docker from rebuilding dependencies that have not
	@ # changed. Remove the file `docker/docker.build` to force a rebuild.

docker.force_build: $(DOCKER_SRC)
	@rm -f docker.build
	@$(MAKE) docker.build BUILD_FLAGS=--no-cache

docker.clean:
	docker rmi $(DOCKER_IMAGE_NAME):$(GIT_HASH) $(DOCKER_IMAGE_NAME):latest

docker.test: docker.build
	docker run --rm -it -u lchain -v $(ROOTDIR):/src $(DOCKER_IMAGE_NAME):$(GIT_HASH) \
		pytest /src/tests/unit_tests

# this assumes that the docker image has been built
docker.lint: docker.lint.mypy docker.lint.black docker.lint.isort \
	docker.lint.flake

# these can run in parallel with -j[njobs]
docker.lint.mypy:
	@$(DOCKER_LINT_CMD) mypy /src
	@printf "\t%s\n" "mypy ... "
docker.lint.black:
	@$(DOCKER_LINT_CMD) black /src --check
	@printf "\t%s\n" "black ... "
docker.lint.isort:
	@$(DOCKER_LINT_CMD) isort /src --check
	@printf "\t%s\n" "isort ... "
docker.lint.flake:
	@$(DOCKER_LINT_CMD) flake8 /src
	@printf "\t%s\n" "flake8 ... "

@ -0,0 +1,10 @@
#!/usr/bin/env bash
export PATH=$HOME/.local/bin:$PATH
if [ -z "$1" ]; then
    cat /etc/motd
    exec /bin/bash
fi
exec "$@"

@ -0,0 +1,8 @@
All dependencies have been installed in the current shell. There is no
virtualenv and no need for `poetry` inside the container.
Running the command `make docker.run` at the root directory of the project will
build the container the first time. On subsequent runs it will use the cached
image. A rebuild will happen when changes are made to the source code.
Your local source directory has been mounted to the /src directory.

@ -0,0 +1,17 @@
version: "3.7"

services:
  langchain:
    hostname: langchain
    image: langchain/dev:latest
    build:
      context: ../
      dockerfile: docker/Dockerfile
      args:
        PYTHON_VERSION: ${PYTHON_VERSION}
        POETRY_EXTRA_PACKAGES: ${POETRY_EXTRA_PACKAGES}
        POETRY_DEPENDENCIES: ${POETRY_DEPENDENCIES}
    restart: unless-stopped
    ports:
      - 127.0.0.1:8888:8888

@ -0,0 +1,25 @@
# AtlasDB
This page covers how to use Nomic's Atlas ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific Atlas wrappers.
## Installation and Setup
- Install the Python package with `pip install nomic`
- Nomic is also included in LangChain's poetry extras: `poetry install -E all`
## Wrappers
### VectorStore
There exists a wrapper around the Atlas neural database, allowing you to use it as a vectorstore.
This vectorstore also gives you full access to the underlying AtlasProject object, which will allow you to use the full range of Atlas map interactions, such as bulk tagging and automatic topic modeling.
Please see [the Nomic docs](https://docs.nomic.ai/atlas_api.html) for more detailed information.
To import this vectorstore:
```python
from langchain.vectorstores import AtlasDB
```
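A hypothetical instantiation sketch (the `from_texts` parameters shown are assumptions, not confirmed by this page):
```python
from langchain.vectorstores import AtlasDB

# hypothetical usage; parameter names are assumed
db = AtlasDB.from_texts(
    texts=["hello world"],
    name="my_project",            # assumed: Atlas project name
    api_key="YOUR_NOMIC_API_KEY", # assumed: Nomic API key
)
```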
For a more detailed walkthrough of the AtlasDB wrapper, see [this notebook](../modules/indexes/examples/vectorstores.ipynb)

@ -0,0 +1,79 @@
# Banana
This page covers how to use the Banana ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific Banana wrappers.
## Installation and Setup
- Install with `pip3 install banana-dev`
- Get a Banana API key and set it as an environment variable (`BANANA_API_KEY`)
## Define your Banana Template
If you want to use an available language model template you can find one [here](https://app.banana.dev/templates/conceptofmind/serverless-template-palmyra-base).
This template uses the Palmyra-Base model by [Writer](https://writer.com/product/api/).
You can check out an example Banana repository [here](https://github.com/conceptofmind/serverless-template-palmyra-base).
## Build the Banana app
Banana apps must include the "output" key in the returned JSON.
There is a rigid response structure.
```python
# Return the results as a dictionary
result = {'output': result}
```
An example inference function would be:
```python
def inference(model_inputs: dict) -> dict:
    global model
    global tokenizer

    # Parse out your arguments
    prompt = model_inputs.get('prompt', None)
    if prompt is None:
        return {'message': "No prompt provided"}

    # Run the model
    input_ids = tokenizer.encode(prompt, return_tensors='pt').cuda()
    output = model.generate(
        input_ids,
        max_length=100,
        do_sample=True,
        top_k=50,
        top_p=0.95,
        num_return_sequences=1,
        temperature=0.9,
        early_stopping=True,
        no_repeat_ngram_size=3,
        num_beams=5,
        length_penalty=1.5,
        repetition_penalty=1.5,
        bad_words_ids=[[tokenizer.encode(' ', add_prefix_space=True)[0]]]
    )
    result = tokenizer.decode(output[0], skip_special_tokens=True)

    # Return the results as a dictionary
    result = {'output': result}
    return result
```
You can find a full example of a Banana app [here](https://github.com/conceptofmind/serverless-template-palmyra-base/blob/main/app.py).
## Wrappers
### LLM
There exists a Banana LLM wrapper, which you can access with
```python
from langchain.llms import Banana
```
You need to provide a model key located in the dashboard:
```python
llm = Banana(model_key="YOUR_MODEL_KEY")
```

@ -0,0 +1,17 @@
# DeepInfra
This page covers how to use the DeepInfra ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific DeepInfra wrappers.
## Installation and Setup
- Get your DeepInfra API key from [here](https://deepinfra.com/).
- Set it as an environment variable (`DEEPINFRA_API_TOKEN`)
## Wrappers
### LLM
There exists a DeepInfra LLM wrapper, which you can access with
```python
from langchain.llms import DeepInfra
```
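A hypothetical usage sketch (the `model_id` parameter and the model name are assumptions):
```python
from langchain.llms import DeepInfra

# hypothetical usage; model_id and its value are assumed
llm = DeepInfra(model_id="google/flan-t5-xl")
print(llm("What is DeepInfra?"))
```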

@ -0,0 +1,25 @@
# Deep Lake
This page covers how to use the Deep Lake ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific Deep Lake wrappers. For more information:
1. Here are the [whitepaper](https://www.deeplake.ai/whitepaper) and [academic paper](https://arxiv.org/pdf/2209.10785.pdf) for Deep Lake
2. Here is a set of additional resources available for review: [Deep Lake](https://github.com/activeloopai/deeplake), [Getting Started](https://docs.activeloop.ai/getting-started) and [Tutorials](https://docs.activeloop.ai/hub-tutorials)
## Installation and Setup
- Install the Python package with `pip install deeplake`
## Wrappers
### VectorStore
There exists a wrapper around Deep Lake, a data lake for Deep Learning applications, allowing you to use it as a vectorstore (for now), whether for semantic search or example selection.
To import this vectorstore:
```python
from langchain.vectorstores import DeepLake
```
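A hypothetical usage sketch (the `from_texts` call shape mirrors LangChain's other vectorstore wrappers and is an assumption here):
```python
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake

# hypothetical usage; the call shape is assumed from other vectorstores
db = DeepLake.from_texts(["hello deep lake"], OpenAIEmbeddings())
docs = db.similarity_search("hello")
```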
For a more detailed walkthrough of the Deep Lake wrapper, see [this notebook](../modules/indexes/vectorstore_examples/deeplake.ipynb)

@ -0,0 +1,38 @@
# Graphsignal
This page covers how to use Graphsignal to trace and monitor LangChain.
## Installation and Setup
- Install the Python library with `pip install graphsignal`
- Create a free Graphsignal account [here](https://graphsignal.com)
- Get an API key and set it as an environment variable (`GRAPHSIGNAL_API_KEY`)
## Tracing and Monitoring
Graphsignal automatically instruments and starts tracing and monitoring chains. Traces, metrics and errors are then available in your [Graphsignal dashboard](https://app.graphsignal.com/). No prompts or other sensitive data are sent to Graphsignal cloud, only statistics and metadata.
Initialize the tracer by providing a deployment name:
```python
import graphsignal
graphsignal.configure(deployment='my-langchain-app-prod')
```
In order to trace full runs and see a breakdown by chains and tools, you can wrap the calling routine or use a decorator:
```python
with graphsignal.start_trace('my-chain'):
    chain.run("some initial text")
```
Optionally, enable profiling to record function-level statistics for each trace.
```python
with graphsignal.start_trace(
        'my-chain', options=graphsignal.TraceOptions(enable_profiling=True)):
    chain.run("some initial text")
```
See the [Quick Start](https://graphsignal.com/docs/guides/quick-start/) guide for complete setup instructions.

@ -19,3 +19,35 @@ export OPENAI_API_BASE="https://oai.hconeai.com/v1"
Now head over to [helicone.ai](https://helicone.ai/onboarding?step=2) to create your account, and add your OpenAI API key within our dashboard to view your logs.
![Helicone](../_static/HeliconeKeys.png)
## How to enable Helicone caching
```python
from langchain.llms import OpenAI
import openai
openai.api_base = "https://oai.hconeai.com/v1"
llm = OpenAI(temperature=0.9, headers={"Helicone-Cache-Enabled": "true"})
text = "What is a helicone?"
print(llm(text))
```
[Helicone caching docs](https://docs.helicone.ai/advanced-usage/caching)
## How to use Helicone custom properties
```python
from langchain.llms import OpenAI
import openai
openai.api_base = "https://oai.hconeai.com/v1"
llm = OpenAI(temperature=0.9, headers={
    "Helicone-Property-Session": "24",
    "Helicone-Property-Conversation": "support_issue_2",
    "Helicone-Property-App": "mobile",
})
text = "What is a helicone?"
print(llm(text))
```
[Helicone property docs](https://docs.helicone.ai/advanced-usage/custom-properties)

@ -0,0 +1,66 @@
# Modal
This page covers how to use the Modal ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific Modal wrappers.
## Installation and Setup
- Install with `pip install modal-client`
- Run `modal token new`
## Define your Modal Functions and Webhooks
You must include a prompt. There is a rigid response structure.
```python
class Item(BaseModel):
    prompt: str

@stub.webhook(method="POST")
def my_webhook(item: Item):
    return {"prompt": my_function.call(item.prompt)}
```
An example with GPT2:
```python
from pydantic import BaseModel

import modal

stub = modal.Stub("example-get-started")
volume = modal.SharedVolume().persist("gpt2_model_vol")
CACHE_PATH = "/root/model_cache"

@stub.function(
    gpu="any",
    image=modal.Image.debian_slim().pip_install(
        "tokenizers", "transformers", "torch", "accelerate"
    ),
    shared_volumes={CACHE_PATH: volume},
    retries=3,
)
def run_gpt2(text: str):
    from transformers import GPT2Tokenizer, GPT2LMHeadModel
    tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
    model = GPT2LMHeadModel.from_pretrained('gpt2')
    encoded_input = tokenizer(text, return_tensors='pt').input_ids
    output = model.generate(encoded_input, max_length=50, do_sample=True)
    return tokenizer.decode(output[0], skip_special_tokens=True)

class Item(BaseModel):
    prompt: str

@stub.webhook(method="POST")
def get_text(item: Item):
    return {"prompt": run_gpt2.call(item.prompt)}
```
## Wrappers
### LLM
There exists a Modal LLM wrapper, which you can access with
```python
from langchain.llms import Modal
```

@ -5,7 +5,7 @@ It is broken into two parts: installation and setup, and then references to spec
## Installation and Setup
- Install with `pip install petals`
- Get an Huggingface api key and set it as an environment variable (`HUGGINGFACE_API_KEY`)
- Get a Hugging Face api key and set it as an environment variable (`HUGGINGFACE_API_KEY`)
## Wrappers
@ -14,4 +14,4 @@ It is broken into two parts: installation and setup, and then references to spec
There exists an Petals LLM wrapper, which you can access with
```python
from langchain.llms import Petals
```
```

@ -0,0 +1,17 @@
# StochasticAI
This page covers how to use the StochasticAI ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific StochasticAI wrappers.
## Installation and Setup
- Install with `pip install stochasticx`
- Get a StochasticAI API key and set it as an environment variable (`STOCHASTICAI_API_KEY`)
## Wrappers
### LLM
There exists a StochasticAI LLM wrapper, which you can access with
```python
from langchain.llms import StochasticAI
```

@ -17,10 +17,6 @@ This page is broken into two parts: installation and setup, and then references
- `poppler-utils`
- `tesseract-ocr`
- `libreoffice`
- Run the following to install NLTK dependencies. `unstructured` will handle this automatically
soon.
- `python -c "import nltk; nltk.download('punkt')"`
- `python -c "import nltk; nltk.download('averaged_perceptron_tagger')"`
- If you are parsing PDFs, run the following to install the `detectron2` model, which
`unstructured` uses for layout detection:
- `pip install "detectron2@git+https://github.com/facebookresearch/detectron2.git@v0.6#egg=detectron2"`

@ -0,0 +1,16 @@
# Writer
This page covers how to use the Writer ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific Writer wrappers.
## Installation and Setup
- Get a Writer API key and set it as an environment variable (`WRITER_API_KEY`)
## Wrappers
### LLM
There exists a Writer LLM wrapper, which you can access with
```python
from langchain.llms import Writer
```

@ -2,7 +2,7 @@ Agents
==========================
Some applications will require not just a predetermined chain of calls to LLMs/other tools,
but potentially an unknown chain that depends on the user input.
but potentially an unknown chain that depends on the user's input.
In these types of chains, there is a “agent” which has access to a suite of tools.
Depending on the user input, the agent can then decide which, if any, of these tools to call.
@ -12,7 +12,7 @@ The following sections of documentation are provided:
- `Key Concepts <./agents/key_concepts.html>`_: A conceptual guide going over the various concepts related to agents.
- `How-To Guides <./agents/how_to_guides.html>`_: A collection of how-to guides. These highlight how to integrate various types of tools, how to work with different types of agent, and how to customize agents.
- `How-To Guides <./agents/how_to_guides.html>`_: A collection of how-to guides. These highlight how to integrate various types of tools, how to work with different types of agents, and how to customize agents.
- `Reference <../reference/modules/agents.html>`_: API reference documentation for all Agent classes.
@ -27,4 +27,4 @@ The following sections of documentation are provided:
./agents/getting_started.ipynb
./agents/key_concepts.md
./agents/how_to_guides.rst
Reference<../reference/modules/agents.rst>
Reference<../reference/modules/agents.rst>

@ -1,7 +1,7 @@
# Agents
Agents use an LLM to determine which actions to take and in what order.
An action can either be using a tool and observing its output, or returning to the user.
An action can either be using a tool and observing its output, or returning a response to the user.
For a list of easily loadable tools, see [here](tools.md).
Here are the agents available in LangChain.

@ -0,0 +1,494 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "68b24990",
"metadata": {},
"source": [
"# Agents and Vectorstores\n",
"\n",
"This notebook covers how to combine agents and vectorstores. The use case for this is that you've ingested your data into a vectorstore and want to interact with it in an agentic manner.\n",
"\n",
"The reccomended method for doing so is to create a VectorDBQAChain and then use that as a tool in the overall agent. Let's take a look at doing this below. You can do this with multiple different vectordbs, and use the agent as a way to route between them. There are two different ways of doing this - you can either let the agent use the vectorstores as normal tools, or you can set `return_direct=True` to really just use the agent as a router."
]
},
{
"cell_type": "markdown",
"id": "9b22020a",
"metadata": {},
"source": [
"## Create the Vectorstore"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "2e87c10a",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain import OpenAI, VectorDBQA\n",
"llm = OpenAI(temperature=0)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "f2675861",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n"
]
}
],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()\n",
"docsearch = Chroma.from_documents(texts, embeddings, collection_name=\"state-of-union\")"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "bc5403d4",
"metadata": {},
"outputs": [],
"source": [
"state_of_union = VectorDBQA.from_chain_type(llm=llm, chain_type=\"stuff\", vectorstore=docsearch)"
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "1431cded",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import WebBaseLoader"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "915d3ff3",
"metadata": {},
"outputs": [],
"source": [
"loader = WebBaseLoader(\"https://beta.ruff.rs/docs/faq/\")"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "96a2edf8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n"
]
}
],
"source": [
"docs = loader.load()\n",
"ruff_texts = text_splitter.split_documents(docs)\n",
"ruff_db = Chroma.from_documents(ruff_texts, embeddings, collection_name=\"ruff\")\n",
"ruff = VectorDBQA.from_chain_type(llm=llm, chain_type=\"stuff\", vectorstore=ruff_db)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "71ecef90",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "c0a6c031",
"metadata": {},
"source": [
"## Create the Agent"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "eb142786",
"metadata": {},
"outputs": [],
"source": [
"# Import things that are needed generically\n",
"from langchain.agents import initialize_agent, Tool\n",
"from langchain.tools import BaseTool\n",
"from langchain.llms import OpenAI\n",
"from langchain import LLMMathChain, SerpAPIWrapper"
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "850bc4e9",
"metadata": {},
"outputs": [],
"source": [
"tools = [\n",
" Tool(\n",
" name = \"State of Union QA System\",\n",
" func=state_of_union.run,\n",
" description=\"useful for when you need to answer questions about the most recent state of the union address. Input should be a fully formed question.\"\n",
" ),\n",
" Tool(\n",
" name = \"Ruff QA System\",\n",
" func=ruff.run,\n",
" description=\"useful for when you need to answer questions about ruff (a python linter). Input should be a fully formed question.\"\n",
" ),\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 45,
"id": "fc47f230",
"metadata": {},
"outputs": [],
"source": [
"# Construct the agent. We will use the default agent type here.\n",
"# See documentation for a full list of options.\n",
"agent = initialize_agent(tools, llm, agent=\"zero-shot-react-description\", verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 46,
"id": "10ca2db8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out what Biden said about Ketanji Brown Jackson in the State of the Union address.\n",
"Action: State of Union QA System\n",
"Action Input: What did Biden say about Ketanji Brown Jackson in the State of the Union address?\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3m Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\"Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\""
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\"What did biden say about ketanji brown jackson is the state of the union address?\")"
]
},
{
"cell_type": "code",
"execution_count": 47,
"id": "4e91b811",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out the advantages of using ruff over flake8\n",
"Action: Ruff QA System\n",
"Action Input: What are the advantages of using ruff over flake8?\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3m Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.'"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\"Why use ruff over flake8?\")"
]
},
{
"cell_type": "markdown",
"id": "787a9b5e",
"metadata": {},
"source": [
"## Use the Agent solely as a router"
]
},
{
"cell_type": "markdown",
"id": "9161ba91",
"metadata": {},
"source": [
"You can also set `return_direct=True` if you intend to use the agent as a router and just want to directly return the result of the VectorDBQaChain.\n",
"\n",
"Notice that in the above examples the agent did some extra work after querying the VectorDBQAChain. You can avoid that and just return the result directly."
]
},
{
"cell_type": "code",
"execution_count": 48,
"id": "f59b377e",
"metadata": {},
"outputs": [],
"source": [
"tools = [\n",
" Tool(\n",
" name = \"State of Union QA System\",\n",
" func=state_of_union.run,\n",
" description=\"useful for when you need to answer questions about the most recent state of the union address. Input should be a fully formed question.\",\n",
" return_direct=True\n",
" ),\n",
" Tool(\n",
" name = \"Ruff QA System\",\n",
" func=ruff.run,\n",
" description=\"useful for when you need to answer questions about ruff (a python linter). Input should be a fully formed question.\",\n",
" return_direct=True\n",
" ),\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 49,
"id": "8615707a",
"metadata": {},
"outputs": [],
"source": [
"agent = initialize_agent(tools, llm, agent=\"zero-shot-react-description\", verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 50,
"id": "36e718a9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out what Biden said about Ketanji Brown Jackson in the State of the Union address.\n",
"Action: State of Union QA System\n",
"Action Input: What did Biden say about Ketanji Brown Jackson in the State of the Union address?\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3m Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\" Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\""
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\"What did biden say about ketanji brown jackson in the state of the union address?\")"
]
},
{
"cell_type": "code",
"execution_count": 51,
"id": "edfd0a1a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out the advantages of using ruff over flake8\n",
"Action: Ruff QA System\n",
"Action Input: What are the advantages of using ruff over flake8?\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3m Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"' Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.'"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\"Why use ruff over flake8?\")"
]
},
{
"cell_type": "markdown",
"id": "49a0cbbe",
"metadata": {},
"source": [
"## Multi-Hop vectorstore reasoning\n",
"\n",
"Because vectorstores are easily usable as tools in agents, it is easy to use answer multi-hop questions that depend on vectorstores using the existing agent framework"
]
},
{
"cell_type": "code",
"execution_count": 57,
"id": "d397a233",
"metadata": {},
"outputs": [],
"source": [
"tools = [\n",
" Tool(\n",
" name = \"State of Union QA System\",\n",
" func=state_of_union.run,\n",
" description=\"useful for when you need to answer questions about the most recent state of the union address. Input should be a fully formed question, not referencing any obscure pronouns from the conversation before.\"\n",
" ),\n",
" Tool(\n",
" name = \"Ruff QA System\",\n",
" func=ruff.run,\n",
" description=\"useful for when you need to answer questions about ruff (a python linter). Input should be a fully formed question, not referencing any obscure pronouns from the conversation before.\"\n",
" ),\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 58,
"id": "06157240",
"metadata": {},
"outputs": [],
"source": [
"# Construct the agent. We will use the default agent type here.\n",
"# See documentation for a full list of options.\n",
"agent = initialize_agent(tools, llm, agent=\"zero-shot-react-description\", verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 59,
"id": "b492b520",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out what tool ruff uses to run over Jupyter Notebooks, and if the president mentioned it in the state of the union.\n",
"Action: Ruff QA System\n",
"Action Input: What tool does ruff use to run over Jupyter Notebooks?\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3m Ruff is integrated into nbQA, a tool for running linters and code formatters over Jupyter Notebooks. After installing ruff and nbqa, you can run Ruff over a notebook like so: > nbqa ruff Untitled.ipynb\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now need to find out if the president mentioned this tool in the state of the union.\n",
"Action: State of Union QA System\n",
"Action Input: Did the president mention nbQA in the state of the union?\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3m No, the president did not mention nbQA in the state of the union.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: No, the president did not mention nbQA in the state of the union.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'No, the president did not mention nbQA in the state of the union.'"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\"What tool does ruff use to run over Jupyter Notebooks? Did the president mention that tool in the state of the union?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b3b857d6",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
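
The agent above treats each vectorstore-backed QA chain as a plain `Tool` whose `func` is the chain's `run` method. As a rough sketch of how the `state_of_union` and `ruff` chains referenced in the tool definitions could be assembled (the file path and helper function are illustrative assumptions, not part of the notebook):

```python
from langchain.chains import VectorDBQA
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

llm = OpenAI(temperature=0)

def make_qa_chain(path: str) -> VectorDBQA:
    # Load, split, embed, and index a text file, then wrap the index in a
    # QA chain whose `run` method an agent Tool can call directly.
    documents = TextLoader(path).load()
    texts = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0).split_documents(documents)
    docsearch = Chroma.from_documents(texts, OpenAIEmbeddings())
    return VectorDBQA.from_chain_type(llm=llm, chain_type="stuff", vectorstore=docsearch)

state_of_union = make_qa_chain("../../state_of_the_union.txt")  # assumed path
```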

@ -7,6 +7,8 @@ The first category of how-to guides here cover specific parts of working with ag
`Custom Tools <./examples/custom_tools.html>`_: How to create custom tools that an agent can use.
`Agents With Vectorstores <./examples/agent_vectorstore.html>`_: How to use vectorstores with agents.
`Intermediate Steps <./examples/intermediate_steps.html>`_: How to access and use intermediate steps to get more visibility into the internals of an agent.
`Custom Agent <./examples/custom_agent.html>`_: How to create a custom agent (specifically, a custom LLM + prompt to drive that agent).

@ -2,8 +2,8 @@ Chains
==========================
Using an LLM in isolation is fine for some simple applications,
but many more complex ones require chaining LLMs - either with eachother or with other experts.
LangChain provides a standard interface for Chains, as well as some common implementations of chains for easy use.
but many more complex ones require chaining LLMs - either with each other or with other experts.
LangChain provides a standard interface for Chains, as well as some common implementations of chains for ease of use.
The following sections of documentation are provided:
@ -26,4 +26,4 @@ The following sections of documentation are provided:
./chains/getting_started.ipynb
./chains/how_to_guides.rst
./chains/key_concepts.rst
Reference<../reference/modules/chains.rst>
Reference<../reference/modules/chains.rst>

@ -9,13 +9,13 @@
"In this tutorial, we will learn about creating simple chains in LangChain. We will learn how to create a chain, add components to it, and run it.\n",
"\n",
"In this tutorial, we will cover:\n",
"- Using the simple LLM chain\n",
"- Using a simple LLM chain\n",
"- Creating sequential chains\n",
"- Creating a custom chain\n",
"\n",
"## Why do we need chains?\n",
"\n",
"Chains allow us to combine multiple components together to create a single, coherent application. For example, we can create a chain that takes user input, format it with a PromptTemplate, and then passes the formatted response to an LLM. We can build more complex chains by combining multiple chains together, or by combining chains with other components.\n"
"Chains allow us to combine multiple components together to create a single, coherent application. For example, we can create a chain that takes user input, formats it with a PromptTemplate, and then passes the formatted response to an LLM. We can build more complex chains by combining multiple chains together, or by combining chains with other components.\n"
]
},
{
@ -88,7 +88,7 @@
"source": [
"## Combine chains with the `SequentialChain`\n",
"\n",
"The next step after calling a language model is make a series of calls to a language model. We can do this using sequential chains, which are chains that execute their links in a predefined order. Specifically, we will use the `SimpleSequentialChain`. This is the simplest form of sequential chains, where each step has a singular input/output, and the output of one step is the input to the next.\n",
"The next step after calling a language model is to make a series of calls to a language model. We can do this using sequential chains, which are chains that execute their links in a predefined order. Specifically, we will use the `SimpleSequentialChain`. This is the simplest type of a sequential chain, where each step has a single input/output, and the output of one step is the input to the next.\n",
"\n",
"In this tutorial, our sequential chain will:\n",
"1. First, create a company name for a product. We will reuse the `LLMChain` we'd previously initialized to create this company name.\n",
@ -156,7 +156,7 @@
"source": [
"## Create a custom chain with the `Chain` class\n",
"\n",
"LangChain provides many chains out of the box, but sometimes you may want to create a custom chains for your specific use case. For this example, we will create a custom chain that concatenates the outputs of 2 `LLMChain`s.\n",
"LangChain provides many chains out of the box, but sometimes you may want to create a custom chain for your specific use case. For this example, we will create a custom chain that concatenates the outputs of 2 `LLMChain`s.\n",
"\n",
"In order to create a custom chain:\n",
"1. Start by subclassing the `Chain` class,\n",

@ -6,6 +6,6 @@ They vary greatly in complexity and are combination of generic, highly configura
## Sequential Chain
This is a specific type of chain where multiple other chains are run in sequence, with the outputs being added as inputs
to the next. A subtype of this type of chain is the `SimpleSequentialChain`, where all subchains have only one input and one output,
to the next. A subtype of this type of chain is the [`SimpleSequentialChain`](./generic/sequential_chains.html#simplesequentialchain), where all subchains have only one input and one output,
and the output of one is therefore used as sole input to the next chain.
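
A short sketch of the single-input/single-output composition described above (the prompts are illustrative):

```python
from langchain.chains import LLMChain, SimpleSequentialChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

llm = OpenAI(temperature=0.9)

name_chain = LLMChain(llm=llm, prompt=PromptTemplate(
    input_variables=["product"],
    template="What is a good name for a company that makes {product}?",
))
slogan_chain = LLMChain(llm=llm, prompt=PromptTemplate(
    input_variables=["company_name"],
    template="Write a catchphrase for the company {company_name}.",
))

# Each subchain has exactly one input and one output, so the company name
# produced by the first chain becomes the sole input of the second.
overall_chain = SimpleSequentialChain(chains=[name_chain, slogan_chain], verbose=True)
catchphrase = overall_chain.run("colorful socks")
```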

@ -0,0 +1,116 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "9f98a15e",
"metadata": {},
"source": [
"# CoNLL-U\n",
"This is an example of how to load a file in [CoNLL-U](https://universaldependencies.org/format.html) format. The whole file is treated as one document. The example data (`conllu.conllu`) is based on one of the standard UD/CoNLL-U examples."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d9b2e33e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import CoNLLULoader"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5b5eec48",
"metadata": {},
"outputs": [],
"source": [
"loader = CoNLLULoader(\"example_data/conllu.conllu\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "10f3f725",
"metadata": {},
"outputs": [],
"source": [
"document = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "acbb3579",
"metadata": {},
"outputs": [],
"source": [
"document"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -0,0 +1,102 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d9826810",
"metadata": {},
"source": [
"# Copy Paste\n",
"\n",
"This notebook covers how to load a document object from something you just want to copy and paste. In this case, you don't even need to use a DocumentLoader, but rather can just construct the Document directly."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "fd9e71a2",
"metadata": {},
"outputs": [],
"source": [
"from langchain.docstore.document import Document"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "f40d3f30",
"metadata": {},
"outputs": [],
"source": [
"text = \"..... put the text you copy pasted here......\""
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "d409bdba",
"metadata": {},
"outputs": [],
"source": [
"doc = Document(page_content=text)"
]
},
{
"cell_type": "markdown",
"id": "cc0eff72",
"metadata": {},
"source": [
"## Metadata\n",
"If you want to add metadata about the where you got this piece of text, you easily can with the metadata key."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "fe3aa5aa",
"metadata": {},
"outputs": [],
"source": [
"metadata = {\"source\": \"internet\", \"date\": \"Friday\"}"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "827d4e91",
"metadata": {},
"outputs": [],
"source": [
"doc = Document(page_content=text, metadata=metadata)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c986a43d",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -0,0 +1,8 @@
# sent_id = 1
# text = They buy and sell books.
1 They they PRON PRP Case=Nom|Number=Plur 2 nsubj 2:nsubj|4:nsubj _
2 buy buy VERB VBP Number=Plur|Person=3|Tense=Pres 0 root 0:root _
3 and and CONJ CC _ 4 cc 4:cc _
4 sell sell VERB VBP Number=Plur|Person=3|Tense=Pres 2 conj 0:root|2:conj _
5 books book NOUN NNS Number=Plur 2 obj 2:obj|4:obj SpaceAfter=No
6 . . PUNCT . _ 2 punct 2:punct _

@ -0,0 +1,64 @@
{
"participants": [{"name": "User 1"}, {"name": "User 2"}],
"messages": [
{"sender_name": "User 2", "timestamp_ms": 1675597571851, "content": "Bye!"},
{
"sender_name": "User 1",
"timestamp_ms": 1675597435669,
"content": "Oh no worries! Bye",
},
{
"sender_name": "User 2",
"timestamp_ms": 1675596277579,
"content": "No Im sorry it was my mistake, the blue one is not for sale",
},
{
"sender_name": "User 1",
"timestamp_ms": 1675595140251,
"content": "I thought you were selling the blue one!",
},
{
"sender_name": "User 1",
"timestamp_ms": 1675595109305,
"content": "Im not interested in this bag. Im interested in the blue one!",
},
{
"sender_name": "User 2",
"timestamp_ms": 1675595068468,
"content": "Here is $129",
},
{
"sender_name": "User 2",
"timestamp_ms": 1675595060730,
"photos": [
{"uri": "url_of_some_picture.jpg", "creation_timestamp": 1675595059}
],
},
{
"sender_name": "User 2",
"timestamp_ms": 1675595045152,
"content": "Online is at least $100",
},
{
"sender_name": "User 1",
"timestamp_ms": 1675594799696,
"content": "How much do you want?",
},
{
"sender_name": "User 2",
"timestamp_ms": 1675577876645,
"content": "Goodmorning! $50 is too low.",
},
{
"sender_name": "User 1",
"timestamp_ms": 1675549022673,
"content": "Hi! Im interested in your bag. Im offering $50. Let me know if you are interested. Thanks!",
},
],
"title": "User 1 and User 2 chat",
"is_still_participant": true,
"thread_path": "inbox/User 1 and User 2 chat",
"magic_words": [],
"image": {"uri": "image_of_the_chat.jpg", "creation_timestamp": 1675549016},
"joinable_mode": {"mode": 1, "link": ""},
}

@ -0,0 +1,83 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Notebook\n",
"\n",
"This notebook covers how to load data from an .ipynb notebook into a format suitable by LangChain."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import NotebookLoader"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loader = NotebookLoader(\"example_data/notebook.ipynb\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"`NotebookLoader.load()` loads the `.ipynb` notebook file into a `Document` object.\n",
"\n",
"**Parameters**:\n",
"\n",
"* `include_outputs` (bool): whether to include cell outputs in the resulting document (default is False).\n",
"* `max_output_length` (int): the maximum number of characters to include from each cell output (default is 10).\n",
"* `remove_newline` (bool): whether to remove newline characters from the cell sources and outputs (default is False).\n",
"* `traceback` (bool): whether to include full traceback (default is False)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loader.load(include_outputs=True, max_output_length=20, remove_newline=True)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.1"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "981b6680a42bdb5eb22187741e1607b3aae2cf73db800d1af1f268d1de6a1f70"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,77 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Facebook Chat\n",
"\n",
"This notebook covers how to load data from the Facebook Chats into a format that can be ingested into LangChain."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import FacebookChatLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"loader = FacebookChatLoader(\"example_data/facebook_chat.json\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='User 2 on 2023-02-05 12:46:11: Bye!\\n\\nUser 1 on 2023-02-05 12:43:55: Oh no worries! Bye\\n\\nUser 2 on 2023-02-05 12:24:37: No Im sorry it was my mistake, the blue one is not for sale\\n\\nUser 1 on 2023-02-05 12:05:40: I thought you were selling the blue one!\\n\\nUser 1 on 2023-02-05 12:05:09: Im not interested in this bag. Im interested in the blue one!\\n\\nUser 2 on 2023-02-05 12:04:28: Here is $129\\n\\nUser 2 on 2023-02-05 12:04:05: Online is at least $100\\n\\nUser 1 on 2023-02-05 11:59:59: How much do you want?\\n\\nUser 2 on 2023-02-05 07:17:56: Goodmorning! $50 is too low.\\n\\nUser 1 on 2023-02-04 23:17:02: Hi! Im interested in your bag. Im offering $50. Let me know if you are interested. Thanks!\\n\\n', lookup_str='', metadata={'source': 'docs/modules/document_loaders/examples/example_data/facebook_chat.json'}, lookup_index=0)]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loader.load()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.1"
},
"vscode": {
"interpreter": {
"hash": "384707f4965e853a82006e90614c2e1a578ea1f6eb0ee07a1dd78a657d37dd67"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}
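
For reference, the `Document` text above can be reproduced from the raw export by hand; this is a rough, simplified equivalent of what `FacebookChatLoader` does (the real loader lives in `langchain.document_loaders`):

```python
import datetime
import json

def flatten_chat(path: str) -> str:
    # Turn a Facebook chat export into "Sender on timestamp: content" lines.
    with open(path) as f:
        chat = json.load(f)
    lines = []
    for m in chat["messages"]:  # messages are newest-first in the export
        if "content" not in m:
            continue  # photo-only messages carry no text, as in the output above
        ts = datetime.datetime.fromtimestamp(m["timestamp_ms"] / 1000)
        lines.append(f"{m['sender_name']} on {ts:%Y-%m-%d %H:%M:%S}: {m['content']}\n\n")
    return "".join(lines)

print(flatten_chat("example_data/facebook_chat.json"))
```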

File diff suppressed because one or more lines are too long

@ -0,0 +1,145 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "f70e6118",
"metadata": {},
"source": [
"# Images\n",
"\n",
"This covers how to load images such as JPGs PNGs into a document format that we can use downstream."
]
},
{
"cell_type": "markdown",
"id": "09d64998",
"metadata": {},
"source": [
"## Using Unstructured"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "0cc0cd42",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders.image import UnstructuredImageLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "082d557c",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredImageLoader(\"layout-parser-paper-fast.jpg\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "df11c953",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "4284d44c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content=\"LayoutParser: A Unified Toolkit for Deep\\nLearning Based Document Image Analysis\\n\\n\\nZxjiang Shen' (F3}, Ruochen Zhang”, Melissa Dell*, Benjamin Charles Germain\\nLeet, Jacob Carlson, and Weining LiF\\n\\n\\nsugehen\\n\\nshangthrows, et\\n\\n“Abstract. Recent advanocs in document image analysis (DIA) have been\\npimarliy driven bythe application of neural networks dell roar\\n{uteomer could be aly deployed in production and extended fo farther\\n[nvetigtion. However, various factory ke lcely organize codebanee\\nsnd sophisticated modal cnigurations compat the ey ree of\\nerin! innovation by wide sence, Though there have been sng\\nHors to improve reuablty and simplify deep lees (DL) mode\\naon, sone of them ae optimized for challenge inthe demain of DIA,\\nThis roprscte a major gap in the extng fol, sw DIA i eal to\\nscademic research acon wie range of dpi in the social ssencee\\n[rary for streamlining the sage of DL in DIA research and appicn\\ntons The core LayoutFaraer brary comes with a sch of simple and\\nIntative interfaee or applying and eutomiing DI. odel fr Inyo de\\npltfom for sharing both protrined modes an fal document dist\\n{ation pipeline We demonutate that LayootPareer shea fr both\\nlightweight and lrgeseledgtieation pipelines in eal-word uae ces\\nThe leary pblely smal at Btspe://layost-pareergsthab So\\n\\n\\n\\nKeywords: Document Image Analysis» Deep Learning Layout Analysis\\nCharacter Renguition - Open Serres dary « Tol\\n\\n\\nIntroduction\\n\\n\\nDeep Learning(DL)-based approaches are the state-of-the-art for a wide range of\\ndoctiment image analysis (DIA) tea including document image clasiffeation [I]\\n\", lookup_str='', metadata={'source': 'layout-parser-paper-fast.jpg'}, lookup_index=0)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data[0]"
]
},
{
"cell_type": "markdown",
"id": "09957371",
"metadata": {},
"source": [
"### Retain Elements\n",
"\n",
"Under the hood, Unstructured creates different \"elements\" for different chunks of text. By default we combine those together, but you can easily keep that separation by specifying `mode=\"elements\"`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "0fab833b",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredImageLoader(\"layout-parser-paper-fast.jpg\", mode=\"elements\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "c3e8ff1b",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "43c23d2d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='LayoutParser: A Unified Toolkit for Deep\\nLearning Based Document Image Analysis\\n', lookup_str='', metadata={'source': 'layout-parser-paper-fast.jpg', 'filename': 'layout-parser-paper-fast.jpg', 'page_number': 1, 'category': 'Title'}, lookup_index=0)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data[0]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
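
Because `mode="elements"` yields one `Document` per element with its category in the metadata, downstream filtering is plain Python; a small sketch over the `data` list from the notebook above (the category names follow `unstructured` conventions and are an assumption here):

```python
# Keep only the title and narrative-text elements extracted from the image.
titles = [d for d in data if d.metadata.get("category") == "Title"]
narrative = [d for d in data if d.metadata.get("category") == "NarrativeText"]
print(len(titles), "title elements,", len(narrative), "narrative elements")
```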

@ -0,0 +1,98 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Notebook\n",
"\n",
"This notebook covers how to load data from an .ipynb notebook into a format suitable by LangChain."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import NotebookLoader"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"loader = NotebookLoader(\"example_data/notebook.ipynb\", include_outputs=True, max_output_length=20, remove_newline=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`NotebookLoader.load()` loads the `.ipynb` notebook file into a `Document` object.\n",
"\n",
"**Parameters**:\n",
"\n",
"* `include_outputs` (bool): whether to include cell outputs in the resulting document (default is False).\n",
"* `max_output_length` (int): the maximum number of characters to include from each cell output (default is 10).\n",
"* `remove_newline` (bool): whether to remove newline characters from the cell sources and outputs (default is False).\n",
"* `traceback` (bool): whether to include full traceback (default is False)."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='\\'markdown\\' cell: \\'[\\'# Notebook\\', \\'\\', \\'This notebook covers how to load data from an .ipynb notebook into a format suitable by LangChain.\\']\\'\\n\\n \\'code\\' cell: \\'[\\'from langchain.document_loaders import NotebookLoader\\']\\'\\n\\n \\'code\\' cell: \\'[\\'loader = NotebookLoader(\"example_data/notebook.ipynb\")\\']\\'\\n\\n \\'markdown\\' cell: \\'[\\'`NotebookLoader.load()` loads the `.ipynb` notebook file into a `Document` object.\\', \\'\\', \\'**Parameters**:\\', \\'\\', \\'* `include_outputs` (bool): whether to include cell outputs in the resulting document (default is False).\\', \\'* `max_output_length` (int): the maximum number of characters to include from each cell output (default is 10).\\', \\'* `remove_newline` (bool): whether to remove newline characters from the cell sources and outputs (default is False).\\', \\'* `traceback` (bool): whether to include full traceback (default is False).\\']\\'\\n\\n \\'code\\' cell: \\'[\\'loader.load(include_outputs=True, max_output_length=20, remove_newline=True)\\']\\'\\n\\n', lookup_str='', metadata={'source': 'example_data/notebook.ipynb'}, lookup_index=0)]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loader.load()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
},
"vscode": {
"interpreter": {
"hash": "981b6680a42bdb5eb22187741e1607b3aae2cf73db800d1af1f268d1de6a1f70"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,137 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "39af9ecd",
"metadata": {},
"source": [
"# Word Documents\n",
"\n",
"This covers how to load Word documents into a document format that we can use downstream."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "721c48aa",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import UnstructuredWordDocumentLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "9d3d0e35",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredWordDocumentLoader(\"fake.docx\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "06073f91",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "c9adc5cb",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': 'fake.docx'}, lookup_index=0)]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data"
]
},
{
"cell_type": "markdown",
"id": "525d6b67",
"metadata": {},
"source": [
"## Retain Elements\n",
"\n",
"Under the hood, Unstructured creates different \"elements\" for different chunks of text. By default we combine those together, but you can easily keep that separation by specifying `mode=\"elements\"`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "064f9162",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredWordDocumentLoader(\"fake.docx\", mode=\"elements\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "abefbbdb",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "a547c534",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': 'fake.docx', 'filename': 'fake.docx', 'category': 'Title'}, lookup_index=0)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data[0]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -57,6 +57,10 @@ There are a lot of different document loaders that LangChain supports. Below are
`Online PDF <./examples/online_pdf.html>`_: A walkthrough of how to load data from an online PDF.
`CoNLL-U <./examples/CoNLL-U.html>`_: A walkthrough of how to load data from a ConLL-U file.
`iFixit <./examples/ifixit.html>`_: A walkthrough of how to search and load data like guides, technical Q&A's, and device wikis from iFixit.com
.. toctree::
:maxdepth: 1
:glob:

@ -268,12 +268,48 @@
},
{
"cell_type": "markdown",
"id": "7fb44daa",
"metadata": {},
"source": [
"## Chat Vector DB with `search_distance`\n",
"If you are using a vector store that supports filtering by search distance, you can add a threshold value parameter."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"vectordbkwargs = {\"search_distance\": 0.9}"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"qa = ChatVectorDBChain.from_llm(OpenAI(temperature=0), vectorstore, return_source_documents=True)\n",
"chat_history = []\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"result = qa({\"question\": query, \"chat_history\": chat_history, \"vectordbkwargs\": vectordbkwargs})"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"## Chat Vector DB with `map_reduce`\n",
"We can also use different types of combine document chains with the Chat Vector DB chain."
]
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
@ -486,7 +522,7 @@
"source": [
"chat_history = [(query, result[\"answer\"])]\n",
"query = \"Did he mention who she suceeded\"\n",
"result = qa({\"question\": query, \"chat_history\": chat_history})"
"result = qa({\"question\": query, \"chat_history\": chat_history})\n"
]
}
],

@ -7,7 +7,7 @@
"source": [
"# Question Answering\n",
"\n",
"This notebook walks through how to use LangChain for question answering over a list of documents. It covers four different types of chaings: `stuff`, `map_reduce`, `refine`, `map-rerank`. For a more in depth explanation of what these chain types are, see [here](../combine_docs.md)."
"This notebook walks through how to use LangChain for question answering over a list of documents. It covers four different types of chains: `stuff`, `map_reduce`, `refine`, `map-rerank`. For a more in depth explanation of what these chain types are, see [here](../combine_docs.md)."
]
},
{
@ -30,29 +30,24 @@
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.docstore.document import Document\n",
"from langchain.prompts import PromptTemplate"
"from langchain.prompts import PromptTemplate\n",
"from langchain.indexes.vectorstore import VectorstoreIndexCreator"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "291f0117",
"id": "ef9305cc",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
"index_creator = VectorstoreIndexCreator()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "fd9666a9",
"execution_count": 3,
"id": "291f0117",
"metadata": {},
"outputs": [
{
@ -65,12 +60,14 @@
}
],
"source": [
"docsearch = Chroma.from_documents(texts, embeddings)"
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../../state_of_the_union.txt')\n",
"docsearch = index_creator.from_loaders([loader])"
]
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 4,
"id": "d1eaf6e6",
"metadata": {},
"outputs": [],

@ -2,45 +2,204 @@
"cells": [
{
"cell_type": "markdown",
"id": "07c1e3b9",
"id": "2244801b",
"metadata": {},
"source": [
"# Getting Started\n",
"\n",
"This example showcases question answering over a vector database.\n",
"We have chosen this as the example for getting started because it nicely combines a lot of different elements (Text splitters, embeddings, vectorstores) and then also shows how to use them in a chain."
"This example showcases question answering over documents.\n",
"We have chosen this as the example for getting started because it nicely combines a lot of different elements (Text splitters, embeddings, vectorstores) and then also shows how to use them in a chain.\n",
"\n",
"Question answering over documents consists of three steps:\n",
"\n",
"1. Create an index\n",
"2. Create a question answering chain\n",
"3. Ask questions!\n",
"\n",
"Each of the steps has multiple sub steps and potential configurations. In this notebook we will primarily focus on (1). We will start by showing the one-liner for doing so, but then break down what is actually going on.\n",
"\n",
"First, let's import some common classes we'll use no matter what."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "82525493",
"id": "8d369452",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain import OpenAI, VectorDBQA"
"from langchain.chains import VectorDBQA\n",
"from langchain.llms import OpenAI"
]
},
{
"cell_type": "markdown",
"id": "0b7adc54",
"id": "07c1e3b9",
"metadata": {},
"source": [
"Here we load in the documents we want to use to create our index."
"Next in the generic setup, let's specify the document loader we want to use."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "611e0c19",
"execution_count": 2,
"id": "33958a86",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../state_of_the_union.txt')\n",
"loader = TextLoader('../state_of_the_union.txt')"
]
},
{
"cell_type": "markdown",
"id": "489c74bb",
"metadata": {},
"source": [
"## One Line Index Creation\n",
"\n",
"To get started as quickly as possible, we can use the `VectorstoreIndexCreator`."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "403fc231",
"metadata": {},
"outputs": [],
"source": [
"from langchain.indexes import VectorstoreIndexCreator"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "57a8a199",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n"
]
}
],
"source": [
"index = VectorstoreIndexCreator().from_loaders([loader])"
]
},
{
"cell_type": "markdown",
"id": "f3493fa4",
"metadata": {},
"source": [
"Now that the index is created, we can use it to ask questions of the data! Note that under the hood this is actually doing a few steps as well, which we will cover later in this guide."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "23d0d234",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"index.query(query)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "ae46b239",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'question': 'What did the president say about Ketanji Brown Jackson',\n",
" 'answer': \" The president said that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson, one of the nation's top legal minds, to continue Justice Breyer's legacy of excellence, and that she has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\\n\",\n",
" 'sources': '../state_of_the_union.txt'}"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"index.query_with_sources(query)"
]
},
{
"cell_type": "markdown",
"id": "ff100212",
"metadata": {},
"source": [
"What is returned from the `VectorstoreIndexCreator` is `VectorStoreIndexWrapper`, which provides these nice `query` and `query_with_sources` functionality. If we just wanted to access the vectorstore directly, we can also do that."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "b04f3c10",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<langchain.vectorstores.chroma.Chroma at 0x113a3a700>"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"index.vectorstore"
]
},
{
"cell_type": "markdown",
"id": "2cb6d2eb",
"metadata": {},
"source": [
"## Walkthrough\n",
"\n",
"Okay, so what's actually going on? How is this index getting created?\n",
"\n",
"A lot of the magic is being hid in this `VectorstoreIndexCreator`. What is this doing?\n",
"\n",
"There are three main steps going on after the documents are loaded:\n",
"\n",
"1. Splitting documents into chunks\n",
"2. Creating embeddings for each document\n",
"3. Storing documents and embeddings in a vectorstore\n",
"\n",
"Let's walk through this in code"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "54270abc",
"metadata": {},
"outputs": [],
"source": [
"documents = loader.load()"
]
},
@ -54,11 +213,12 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 8,
"id": "afecb8cf",
"metadata": {},
"outputs": [],
"source": [
"from langchain.text_splitter import CharacterTextSplitter\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_documents(documents)"
]
@ -73,11 +233,12 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 10,
"id": "9eaaa735",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import OpenAIEmbeddings\n",
"embeddings = OpenAIEmbeddings()"
]
},
@ -91,7 +252,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 11,
"id": "5c7049db",
"metadata": {},
"outputs": [
@ -105,6 +266,7 @@
}
],
"source": [
"from langchain.vectorstores import Chroma\n",
"db = Chroma.from_documents(texts, embeddings)"
]
},
@ -113,12 +275,13 @@
"id": "30c4e5c6",
"metadata": {},
"source": [
"Finally, we create a chain and use it to answer questions!"
"So that's creating the index.\n",
"Then, as before, we create a chain and use it to answer questions!"
]
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 12,
"id": "3018f865",
"metadata": {},
"outputs": [],
@ -128,17 +291,17 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 13,
"id": "032a47f8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\" The President said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
"\" The President said that Ketanji Brown Jackson is one of the nation's top legal minds and a consensus builder, with a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. She is a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers.\""
]
},
"execution_count": 10,
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
@ -148,10 +311,40 @@
"qa.run(query)"
]
},
{
"cell_type": "markdown",
"id": "9464690e",
"metadata": {},
"source": [
"`VectorstoreIndexCreator` is just a wrapper around all this logic. It is configurable in the text splitter it uses, the embeddings it uses, and the vectorstore it uses. For example, you can configure it as below:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "4001bbc6",
"metadata": {},
"outputs": [],
"source": [
"index_creator = VectorstoreIndexCreator(\n",
" vectorstore_cls=Chroma, \n",
" embedding=OpenAIEmbeddings(),\n",
" text_splitter=CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
")"
]
},
{
"cell_type": "markdown",
"id": "78d8d143",
"metadata": {},
"source": [
"Hopefully this highlights what is going on under the hood of `VectorstoreIndexCreator`. While we think it's important to have a simple way to create indexes, we also think it's important to understand what's going on under the hood."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8b403637",
"id": "dd7257bf",
"metadata": {},
"outputs": [],
"source": []

@ -36,6 +36,8 @@ In the below guides, we cover different types of vectorstores and how to use the
`Chroma <./vectorstore_examples/chroma.html>`_: A walkthrough of how to use the Chroma vectorstore wrapper.
`DeepLake <./vectorstore_examples/deeplake.html>`_: A walkthrough of how to use the Deep Lake, data lake, wrapper.
`FAISS <./vectorstore_examples/faiss.html>`_: A walkthrough of how to use the FAISS vectorstore wrapper.
`Elastic Search <./vectorstore_examples/elasticsearch.html>`_: A walkthrough of how to use the ElasticSearch wrapper.

@ -0,0 +1,266 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"# AtlasDB\n",
"\n",
"This notebook shows you how to use functionality related to the AtlasDB"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"import time\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import SpacyTextSplitter\n",
"from langchain.vectorstores import AtlasDB\n",
"from langchain.document_loaders import TextLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Collecting en-core-web-sm==3.5.0\n",
" Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.5.0/en_core_web_sm-3.5.0-py3-none-any.whl (12.8 MB)\n",
"\u001B[2K \u001B[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001B[0m \u001B[32m12.8/12.8 MB\u001B[0m \u001B[31m90.8 MB/s\u001B[0m eta \u001B[36m0:00:00\u001B[0m00:01\u001B[0m00:01\u001B[0m\n",
"\u001B[?25hRequirement already satisfied: spacy<3.6.0,>=3.5.0 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from en-core-web-sm==3.5.0) (3.5.0)\n",
"Requirement already satisfied: packaging>=20.0 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (23.0)\n",
"Requirement already satisfied: wasabi<1.2.0,>=0.9.1 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (1.1.1)\n",
"Requirement already satisfied: langcodes<4.0.0,>=3.2.0 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (3.3.0)\n",
"Requirement already satisfied: srsly<3.0.0,>=2.4.3 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (2.4.5)\n",
"Requirement already satisfied: pathy>=0.10.0 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (0.10.1)\n",
"Requirement already satisfied: setuptools in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (67.4.0)\n",
"Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (4.64.1)\n",
"Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (1.0.4)\n",
"Requirement already satisfied: smart-open<7.0.0,>=5.2.1 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (6.3.0)\n",
"Requirement already satisfied: thinc<8.2.0,>=8.1.0 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (8.1.7)\n",
"Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (2.0.7)\n",
"Requirement already satisfied: typer<0.8.0,>=0.3.0 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (0.7.0)\n",
"Requirement already satisfied: requests<3.0.0,>=2.13.0 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (2.28.2)\n",
"Requirement already satisfied: jinja2 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (3.1.2)\n",
"Requirement already satisfied: pydantic!=1.8,!=1.8.1,<1.11.0,>=1.7.4 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (1.10.5)\n",
"Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (2.0.8)\n",
"Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.11 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (3.0.12)\n",
"Requirement already satisfied: numpy>=1.15.0 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (1.24.2)\n",
"Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (1.0.9)\n",
"Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (3.0.8)\n",
"Requirement already satisfied: typing-extensions>=4.2.0 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from pydantic!=1.8,!=1.8.1,<1.11.0,>=1.7.4->spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (4.5.0)\n",
"Requirement already satisfied: charset-normalizer<4,>=2 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from requests<3.0.0,>=2.13.0->spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (3.0.1)\n",
"Requirement already satisfied: idna<4,>=2.5 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from requests<3.0.0,>=2.13.0->spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (3.4)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from requests<3.0.0,>=2.13.0->spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (2022.12.7)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from requests<3.0.0,>=2.13.0->spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (1.26.14)\n",
"Requirement already satisfied: blis<0.8.0,>=0.7.8 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from thinc<8.2.0,>=8.1.0->spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (0.7.9)\n",
"Requirement already satisfied: confection<1.0.0,>=0.0.1 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from thinc<8.2.0,>=8.1.0->spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (0.0.4)\n",
"Requirement already satisfied: click<9.0.0,>=7.1.1 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from typer<0.8.0,>=0.3.0->spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (8.1.3)\n",
"Requirement already satisfied: MarkupSafe>=2.0 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from jinja2->spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (2.1.2)\n",
"\n",
"\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m A new release of pip is available: \u001B[0m\u001B[31;49m23.0\u001B[0m\u001B[39;49m -> \u001B[0m\u001B[32;49m23.0.1\u001B[0m\n",
"\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m To update, run: \u001B[0m\u001B[32;49mpip install --upgrade pip\u001B[0m\n",
"\u001B[38;5;2m✔ Download and installation successful\u001B[0m\n",
"You can now load the package via spacy.load('en_core_web_sm')\n"
]
}
],
"source": [
"!python -m spacy download en_core_web_sm"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"ATLAS_TEST_API_KEY = '7xDPkYXSYDc1_ErdTPIcoAR9RNd8YDlkS3nVNXcVoIMZ6'"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"loader = TextLoader('../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = SpacyTextSplitter(separator='|')\n",
"texts = []\n",
"for doc in text_splitter.split_documents(documents):\n",
" texts.extend(doc.page_content.split('|'))\n",
" \n",
"texts = [e.strip() for e in texts]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2023-02-24 16:13:49.696 | INFO | nomic.project:_create_project:884 - Creating project `test_index_1677255228.136989` in organization `Atlas Demo`\n",
"2023-02-24 16:13:51.087 | INFO | nomic.project:wait_for_project_lock:993 - test_index_1677255228.136989: Project lock is released.\n",
"2023-02-24 16:13:51.225 | INFO | nomic.project:wait_for_project_lock:993 - test_index_1677255228.136989: Project lock is released.\n",
"2023-02-24 16:13:51.481 | INFO | nomic.project:add_text:1351 - Uploading text to Atlas.\n",
"1it [00:00, 1.20it/s]\n",
"2023-02-24 16:13:52.318 | INFO | nomic.project:add_text:1422 - Text upload succeeded.\n",
"2023-02-24 16:13:52.628 | INFO | nomic.project:wait_for_project_lock:993 - test_index_1677255228.136989: Project lock is released.\n",
"2023-02-24 16:13:53.380 | INFO | nomic.project:create_index:1192 - Created map `test_index_1677255228.136989_index` in project `test_index_1677255228.136989`: https://atlas.nomic.ai/map/ee2354a3-7f9a-4c6b-af43-b0cda09d7198/db996d77-8981-48a0-897a-ff2c22bbf541\n"
]
}
],
"source": [
"db = AtlasDB.from_texts(texts=texts,\n",
" name='test_index_'+str(time.time()),\n",
" description='test_index',\n",
" api_key=ATLAS_TEST_API_KEY,\n",
" index_kwargs={'build_topic_model': True})"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2023-02-24 16:14:09.106 | INFO | nomic.project:wait_for_project_lock:993 - test_index_1677255228.136989: Project lock is released.\n"
]
}
],
"source": [
"with db.project.wait_for_project_lock():\n",
" time.sleep(1)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
" <strong><a href=\"https://atlas.nomic.ai/dashboard/project/ee2354a3-7f9a-4c6b-af43-b0cda09d7198\">test_index_1677255228.136989</strong></a>\n",
" <br>\n",
" A description for your project 508 datums inserted.\n",
" <br>\n",
" 1 index built.\n",
" <br><strong>Projections</strong>\n",
"<ul>\n",
"<li>test_index_1677255228.136989_index. Status Completed. <a target=\"_blank\" href=\"https://atlas.nomic.ai/map/ee2354a3-7f9a-4c6b-af43-b0cda09d7198/db996d77-8981-48a0-897a-ff2c22bbf541\">view online</a></li></ul><hr><script>\n",
" destroy = function() {\n",
" document.getElementById(\"iframedb996d77-8981-48a0-897a-ff2c22bbf541\").remove()\n",
" }\n",
" </script>\n",
"\n",
" <h4>Projection ID: db996d77-8981-48a0-897a-ff2c22bbf541</h4>\n",
" <div class=\"actions\">\n",
" <div id=\"hide\" class=\"action\" onclick=\"destroy()\">Hide embedded project</div>\n",
" <div class=\"action\" id=\"out\">\n",
" <a href=\"https://atlas.nomic.ai/map/ee2354a3-7f9a-4c6b-af43-b0cda09d7198/db996d77-8981-48a0-897a-ff2c22bbf541\" target=\"_blank\">Explore on atlas.nomic.ai</a>\n",
" </div>\n",
" </div>\n",
" \n",
" <iframe class=\"iframe\" id=\"iframedb996d77-8981-48a0-897a-ff2c22bbf541\" allow=\"clipboard-read; clipboard-write\" src=\"https://atlas.nomic.ai/map/ee2354a3-7f9a-4c6b-af43-b0cda09d7198/db996d77-8981-48a0-897a-ff2c22bbf541\">\n",
" </iframe>\n",
"\n",
" <style>\n",
" .iframe {\n",
" /* vh can be **very** large in vscode ipynb. */\n",
" height: min(75vh, 66vw);\n",
" width: 100%;\n",
" }\n",
" </style>\n",
" \n",
" <style>\n",
" .actions {\n",
" display: block;\n",
" }\n",
" .action {\n",
" min-height: 18px;\n",
" margin: 5px;\n",
" transition: all 500ms ease-in-out;\n",
" }\n",
" .action:hover {\n",
" cursor: pointer;\n",
" }\n",
" #hide:hover::after {\n",
" content: \" X\";\n",
" }\n",
" #out:hover::after {\n",
" content: \"\";\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"AtlasProject: <{'id': 'ee2354a3-7f9a-4c6b-af43-b0cda09d7198', 'owner': '9c29afbb-a002-4d49-958e-ecf5ae1351ac', 'project_name': 'test_index_1677255228.136989', 'creator': 'auth0|63efc4b5462246f4d9a6ecf2', 'description': 'A description for your project', 'opensearch_index_id': 'f61fb8dd-0abf-4f31-9130-41870e443902', 'is_public': True, 'project_fields': ['atlas_id', 'text'], 'unique_id_field': 'atlas_id', 'modality': 'text', 'total_datums_in_project': 508, 'created_timestamp': '2023-02-24T16:13:50.313363+00:00', 'atlas_indices': [{'id': 'b1b01833-0964-4597-a4bc-a2d60700949d', 'project_id': 'ee2354a3-7f9a-4c6b-af43-b0cda09d7198', 'index_name': 'test_index_1677255228.136989_index', 'indexed_field': 'text', 'created_timestamp': '2023-02-24T16:13:52.957101+00:00', 'updated_timestamp': '2023-02-24T16:14:03.469621+00:00', 'atoms': ['charchunk', 'document'], 'colorable_fields': [], 'embedders': [{'id': '7ec0868a-4eed-4414-a482-25cce9803e1b', 'atlas_index_id': 'b1b01833-0964-4597-a4bc-a2d60700949d', 'ready': True, 'model_name': 'NomicEmbed', 'hyperparameters': {'norm': 'both', 'batch_size': 20, 'polymerize_by': 'charchunk', 'dataset_buffer_size': 1000}}], 'nearest_neighbor_indices': [{'id': '86f8e3ff-e07c-4678-a4d7-144db4b0301d', 'index_name': 'NomicOrganize', 'ready': True, 'hyperparameters': {'dim': 384, 'space': 'l2'}, 'atom_strategies': ['document']}], 'projections': [{'id': 'db996d77-8981-48a0-897a-ff2c22bbf541', 'projection_name': 'NomicProject', 'ready': True, 'hyperparameters': {'spread': 1.0, 'n_epochs': 50, 'n_neighbors': 15}, 'atom_strategies': ['document'], 'created_timestamp': '2023-02-24T16:13:52.979561+00:00', 'updated_timestamp': '2023-02-24T16:14:03.466309+00:00'}]}], 'insert_update_delete_lock': False}>"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"db.project"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.4"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
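
The notebook stops after inspecting `db.project`; querying the index is one more call. A sketch, assuming `AtlasDB` implements the standard `similarity_search` interface:

```python
# Retrieve the passages closest to the query from the Atlas index built above.
docs = db.similarity_search("What did the president say about the economy?", k=4)
for doc in docs:
    print(doc.page_content[:100])
```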

@ -0,0 +1,234 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Deep Lake\n",
"\n",
"This notebook showcases basic functionality related to Deep Lake. While Deep Lake can store embeddings, it is capable of storing any type of data. It is a fully fledged serverless data lake with version control, query engine and streaming dataloader to deep learning frameworks. \n",
"\n",
"For more information, please see the Deep Lake [documentation](docs.activeloop.ai) or [api reference](docs.deeplake.ai)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import DeepLake\n",
"from langchain.document_loaders import TextLoader"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Evaluating ingest: 100%|██████████| 41/41 [00:00<00:00\n"
]
}
],
"source": [
"db = DeepLake.from_documents(docs, embeddings)\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = db.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n",
"\n",
"We cannot let this happen. \n",
"\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n"
]
}
],
"source": [
"print(docs[0].page_content)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deep Lake datasets on cloud or local\n",
"By default deep lake datasets are stored in memory, in case you want to persist locally or to any object storage you can simply provide path to the dataset. You can retrieve token from [app.activeloop.ai](https://app.activeloop.ai/)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/bin/bash: -c: line 0: syntax error near unexpected token `newline'\n",
"/bin/bash: -c: line 0: `activeloop login -t <token>'\n"
]
}
],
"source": [
"!activeloop login -t <token>"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Evaluating ingest: 100%|██████████| 4/4 [00:00<00:00\n"
]
}
],
"source": [
"# Embed and store the texts\n",
"dataset_path = \"hub://{username}/{dataset_name}\" # could be also ./local/path (much faster locally), s3://bucket/path/to/dataset, gcs://, etc.\n",
"\n",
"embedding = OpenAIEmbeddings()\n",
"vectordb = DeepLake.from_documents(documents=docs, embedding=embedding, dataset_path=dataset_path)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n",
"\n",
"We cannot let this happen. \n",
"\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n"
]
}
],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = db.similarity_search(query)\n",
"print(docs[0].page_content)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='./local/path', tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding generic (4, 1536) None None \n",
" ids text (4, 1) str None \n",
" metadata json (4, 1) str None \n",
" text text (4, 1) str None \n"
]
}
],
"source": [
"vectordb.ds.summary()"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"embeddings = vectordb.ds.embedding.numpy()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "base",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "7b14174bb6f9d4680b62ac2a6390e1ce94fbfabf172a10844870451d539c58d6"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -17,6 +17,14 @@ The examples here are all "how-to" guides for how to integrate with various LLM
`Goose AI <./integrations/gooseai_example.html>`_: Covers how to utilize the Goose AI wrapper.
`Writer <./integrations/writer.html>`_: Covers how to utilize the Writer wrapper.
`Banana <./integrations/banana.html>`_: Covers how to utilize the Banana wrapper.
`Modal <./integrations/modal.html>`_: Covers how to utilize the Modal wrapper.
`StochasticAI <./integrations/stochasticai.html>`_: Covers how to utilize the Stochastic AI wrapper.
`Cerebrium <./integrations/cerebriumai_example.html>`_: Covers how to utilize the Cerebrium AI wrapper.
`Petals <./integrations/petals_example.html>`_: Covers how to utilize the Petals wrapper.
@ -27,6 +35,8 @@ The examples here are all "how-to" guides for how to integrate with various LLM
`Anthropic <./integrations/anthropic_example.html>`_: Covers how to use Anthropic models with LangChain.
`DeepInfra <./integrations/deepinfra_example.html>`_: Covers how to utilize the DeepInfra wrapper.
`Self-Hosted Models (via Runhouse) <./integrations/self_hosted_examples.html>`_: Covers how to run models on existing or on-demand remote compute with LangChain.

@ -0,0 +1,108 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "9597802c",
"metadata": {},
"source": [
"# Aleph Alpha\n",
"This example goes over how to use LangChain to interact with Aleph Alpha models"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "6fb585dd",
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import AlephAlpha\n",
"from langchain import PromptTemplate, LLMChain"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "f81a230d",
"metadata": {},
"outputs": [],
"source": [
"template = \"\"\"Q: {question}\n",
"\n",
"A:\"\"\"\n",
"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "f0d26e48",
"metadata": {},
"outputs": [],
"source": [
"llm = AlephAlpha(model=\"luminous-extended\", maximum_tokens=20, stop_sequences=[\"Q:\"])"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "6811d621",
"metadata": {},
"outputs": [],
"source": [
"llm_chain = LLMChain(prompt=prompt, llm=llm)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "3058e63f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"' Artificial Intelligence (AI) is the simulation of human intelligence processes by machines, especially computer systems.\\n'"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"question = \"What is AI?\"\n",
"\n",
"llm_chain.run(question)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
},
"vscode": {
"interpreter": {
"hash": "2d002ec47225e662695b764370d7966aa11eeb4302edc2f497bbf96d49c8f899"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -0,0 +1,85 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Banana\n",
"This example goes over how to use LangChain to interact with Banana models"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from langchain.llms import Banana\n",
"from langchain import PromptTemplate, LLMChain\n",
"os.environ[\"BANANA_API_KEY\"] = \"YOUR_API_KEY\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm = Banana(model_key=\"YOUR_MODEL_KEY\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm_chain = LLMChain(prompt=prompt, llm=llm)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"question = \"What NFL team won the Super Bowl in the year Justin Beiber was born?\"\n",
"\n",
"llm_chain.run(question)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.12 ('palm')",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.9.12"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "a0a0263b650d907a3bfe41c0f8d6a63a071b884df3cfdc1579f00cdc1aed6b03"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,141 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# DeepInfra LLM Example\n",
"This notebook goes over how to use Langchain with [DeepInfra](https://deepinfra.com)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from langchain.llms import DeepInfra\n",
"from langchain import PromptTemplate, LLMChain"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set the Environment API Key\n",
"Make sure to get your API key from DeepInfra. You are given a 1 hour free of serverless GPU compute to test different models.\n",
"You can print your token with `deepctl auth token`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"DEEPINFRA_API_TOKEN\"] = \"YOUR_KEY_HERE\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create the DeepInfra instance\n",
"Make sure to deploy your model first via `deepctl deploy create -m google/flat-t5-xl` (for example)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm = DeepInfra(model_id=\"DEPLOYED MODEL ID\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a Prompt Template\n",
"We will create a prompt template for Question and Answer."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initiate the LLMChain"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm_chain = LLMChain(prompt=prompt, llm=llm)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run the LLMChain\n",
"Provide a question and run the LLMChain."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"question = \"What NFL team won the Super Bowl in 2015?\"\n",
"\n",
"llm_chain.run(question)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.12 ('palm')",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.9.12"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "a0a0263b650d907a3bfe41c0f8d6a63a071b884df3cfdc1579f00cdc1aed6b03"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,83 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Modal\n",
"This example goes over how to use LangChain to interact with Modal models"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import Modal\n",
"from langchain import PromptTemplate, LLMChain"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm = Modal(endpoint_url=\"YOUR_ENDPOINT_URL\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm_chain = LLMChain(prompt=prompt, llm=llm)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"question = \"What NFL team won the Super Bowl in the year Justin Beiber was born?\"\n",
"\n",
"llm_chain.run(question)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.12 ('palm')",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.9.12"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "a0a0263b650d907a3bfe41c0f8d6a63a071b884df3cfdc1579f00cdc1aed6b03"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -88,7 +88,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "Python 3.9.12 ('palm')",
"language": "python",
"name": "python3"
},
@ -102,7 +102,12 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.9.12"
},
"vscode": {
"interpreter": {
"hash": "a0a0263b650d907a3bfe41c0f8d6a63a071b884df3cfdc1579f00cdc1aed6b03"
}
}
},
"nbformat": 4,

@ -0,0 +1,83 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# StochasticAI\n",
"This example goes over how to use LangChain to interact with StochasticAI models"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import StochasticAI\n",
"from langchain import PromptTemplate, LLMChain"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm = StochasticAI(api_url=\"YOUR_API_URL\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm_chain = LLMChain(prompt=prompt, llm=llm)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"question = \"What NFL team won the Super Bowl in the year Justin Beiber was born?\"\n",
"\n",
"llm_chain.run(question)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.12 ('palm')",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.9.12"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "a0a0263b650d907a3bfe41c0f8d6a63a071b884df3cfdc1579f00cdc1aed6b03"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,83 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Writer\n",
"This example goes over how to use LangChain to interact with Writer models"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import Writer\n",
"from langchain import PromptTemplate, LLMChain"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm = Writer()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm_chain = LLMChain(prompt=prompt, llm=llm)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"question = \"What NFL team won the Super Bowl in the year Justin Beiber was born?\"\n",
"\n",
"llm_chain.run(question)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.12 ('palm')",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.9.12"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "a0a0263b650d907a3bfe41c0f8d6a63a071b884df3cfdc1579f00cdc1aed6b03"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,184 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "9355a547",
"metadata": {},
"source": [
"# Partial Prompt Templates\n",
"\n",
"A prompt template is a class with a `.format` method which takes in a key-value map and returns a string (a prompt) to pass to the language model. Like other methods, it can make sense to \"partial\" a prompt template - eg pass in a subset of the required values, as to create a new prompt template which expects only the remaining subset of values.\n",
"\n",
"LangChain supports this in two ways: we allow for partially formatted prompts (1) with string values, (2) with functions that return string values. These two different ways support different use cases. In the documentation below we go over the motivations for both use cases as well as how to do it in LangChain.\n",
"\n",
"## Partial With Strings\n",
"\n",
"One common use case for wanting to partial a prompt template is if you get some of the variables before others. For example, suppose you have a prompt template that requires two variables, `foo` and `baz`. If you get the `foo` value early on in the chain, but the `baz` value later, it can be annoying to wait until you have both variables in the same place to pass them to the prompt template. Instead, you can partial the prompt template with the `foo` value, and then pass the partialed prompt template along and just use that. Below is an example of doing this:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "643af5da",
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts import PromptTemplate"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "4080d8d7",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"foobaz\n"
]
}
],
"source": [
"prompt = PromptTemplate(template=\"{foo}{bar}\", input_variables=[\"foo\", \"bar\"])\n",
"partial_prompt = prompt.partial(foo=\"foo\");\n",
"print(partial_prompt.format(bar=\"baz\"))"
]
},
{
"cell_type": "markdown",
"id": "9986766e",
"metadata": {},
"source": [
"You can also just initialize the prompt with the partialed variables."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "e2ce95b3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"foobaz\n"
]
}
],
"source": [
"prompt = PromptTemplate(template=\"{foo}{bar}\", input_variables=[\"bar\"], partial_variables={\"foo\": \"foo\"})\n",
"print(prompt.format(bar=\"baz\"))"
]
},
{
"cell_type": "markdown",
"id": "a9c66f83",
"metadata": {},
"source": [
"## Partial With Functions\n",
"\n",
"The other common use is to partial with a function. The use case for this is when you have a variable you know that you always want to fetch in a common way. A prime example of this is with date or time. Imagine you have a prompt which you always want to have the current date. You can't hard code it in the prompt, and passing it along with the other input variables is a bit annoying. In this case, it's very handy to be able to partial the prompt with a function that always returns the current date."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "d0712d8a",
"metadata": {},
"outputs": [],
"source": [
"from datetime import datetime\n",
"\n",
"def _get_datetime():\n",
" now = datetime.now()\n",
" return now.strftime(\"%m/%d/%Y, %H:%M:%S\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "4cbcb666",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tell me a funny joke about the day 02/27/2023, 22:15:16\n"
]
}
],
"source": [
"prompt = PromptTemplate(\n",
" template=\"Tell me a {adjective} joke about the day {date}\", \n",
" input_variables=[\"adjective\", \"date\"]\n",
");\n",
"partial_prompt = prompt.partial(date=_get_datetime)\n",
"print(partial_prompt.format(adjective=\"funny\"))"
]
},
{
"cell_type": "markdown",
"id": "ffed6811",
"metadata": {},
"source": [
"You can also just initialize the prompt with the partialed variables, which often makes more sense in this workflow."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "96285b25",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tell me a funny joke about the day 02/27/2023, 22:15:16\n"
]
}
],
"source": [
"prompt = PromptTemplate(\n",
" template=\"Tell me a {adjective} joke about the day {date}\", \n",
" input_variables=[\"adjective\"],\n",
" partial_variables={\"date\": _get_datetime}\n",
");\n",
"print(prompt.format(adjective=\"funny\"))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4bff16f7",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -17,6 +17,8 @@ The user guide here shows more advanced workflows and how to use the library in
`Few Shot Prompt Examples <./examples/few_shot_examples.html>`_: Examples of Few Shot Prompt Templates.
`Partial Prompt Template <./examples/partial.html>`_: How to partial Prompt Templates.
.. toctree::

@ -1,85 +1,85 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "8f210ec3",
"metadata": {},
"source": [
"# Bash\n",
"It can often be useful to have an LLM generate bash commands, and then run them. A common use case for this is letting the LLM interact with your local file system. We provide an easy util to execute bash commands."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "f7b3767b",
"metadata": {},
"outputs": [],
"source": [
"from langchain.utilities import BashProcess"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "cf1c92f0",
"metadata": {},
"outputs": [],
"source": [
"bash = BashProcess()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "2fa952fc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"bash.ipynb\n",
"google_search.ipynb\n",
"python.ipynb\n",
"requests.ipynb\n",
"serpapi.ipynb\n",
"\n"
]
}
],
"source": [
"print(bash.run(\"ls\"))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "851fee9f",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -0,0 +1,180 @@
{
"cells": [
{
"cell_type": "code",
"metadata": {
"jukit_cell_id": "O4HPx3boF0"
},
"source": [],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {
"jukit_cell_id": "hqQkbPEwTJ"
},
"source": [
"# Using the DockerWrapper utility"
]
},
{
"cell_type": "code",
"metadata": {
"jukit_cell_id": "vCepuypaFH"
},
"source": [
"from langchain.utilities.docker import DockerWrapper"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "code",
"metadata": {
"jukit_cell_id": "BtYVqy2YtO"
},
"source": [
"d = DockerWrapper(image='shell')"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "code",
"metadata": {
"jukit_cell_id": "ELWWm03ptQ"
},
"source": [
"query = \"\"\"\n",
"for i in $(seq 1 10)\n",
"do\n",
" echo $i\n",
"done\n",
"\"\"\"\n",
"print(d.exec_run(query))"
],
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n"
}
],
"execution_count": 1
},
{
"cell_type": "code",
"metadata": {
"jukit_cell_id": "lGMqLz5sDo"
},
"source": [
"p = DockerWrapper(image='python')\n",
"\n",
"py_payload = \"\"\"\n",
"def hello_world():\n",
" return 'hello world'\n",
"\n",
"hello_world()\n",
"\"\"\""
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "code",
"metadata": {
"jukit_cell_id": "X04Wd6zbrk"
},
"source": [
"print(p.exec_run(py_payload))"
],
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "'hello world'\n"
}
],
"execution_count": 2
},
{
"cell_type": "code",
"metadata": {
"jukit_cell_id": "lKOfuDoJGk"
},
"source": [],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {
"jukit_cell_id": "eSzXtDrpqU"
},
"source": [
"## Passing custom parameters\n",
"\n",
"By default containers are run with a safe set of parameters. You can pass any parameters\n",
"that are accepted by the docker python sdk to the run and exec commands.\n",
"\n",
"### Using networking"
]
},
{
"cell_type": "code",
"metadata": {
"jukit_cell_id": "eWFGCxD9pv"
},
"source": [
"# by default containers don't have access to the network\n",
"print(d.run('ping -c 1 google.com'))"
],
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "STDERR: Command '/bin/sh -c 'ping -c 1 google.com'' in image 'alpine:latest' returned non-zero exit status 1: b\"ping: bad address 'google.com'\\n\"\n"
}
],
"execution_count": 3
},
{
"cell_type": "code",
"metadata": {
"jukit_cell_id": "Z0YkpuXVyL"
},
"source": [
"# using the network parameter\n",
"print(d.run('ping -c 1 google.com', network='bridge'))"
],
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "PING google.com (142.250.200.110): 56 data bytes\n64 bytes from 142.250.200.110: seq=0 ttl=42 time=13.695 ms\n\n--- google.com ping statistics ---\n1 packets transmitted, 1 packets received, 0% packet loss\nround-trip min/avg/max = 13.695/13.695/13.695 ms\n"
}
],
"execution_count": 4
},
{
"cell_type": "code",
"metadata": {
"jukit_cell_id": "3rMWzzuLHq"
},
"source": [],
"outputs": [],
"execution_count": null
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "python",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

@ -0,0 +1,121 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "16763ed3",
"metadata": {},
"source": [
"# IFTTT WebHooks\n",
"\n",
"This notebook shows how to use IFTTT Webhooks.\n",
"\n",
"From https://github.com/SidU/teams-langchain-js/wiki/Connecting-IFTTT-Services.\n",
"\n",
"# Creating a webhook\n",
"- Go to https://ifttt.com/create\n",
"\n",
"# Configuring the \"If This\"\n",
"- Click on the \"If This\" button in the IFTTT interface.\n",
"- Search for \"Webhooks\" in the search bar.\n",
"- Choose the first option for \"Receive a web request with a JSON payload.\"\n",
"- Choose an Event Name that is specific to the service you plan to connect to.\n",
"This will make it easier for you to manage the webhook URL.\n",
"For example, if you're connecting to Spotify, you could use \"Spotify\" as your\n",
"Event Name.\n",
"- Click the \"Create Trigger\" button to save your settings and create your webhook.\n",
"\n",
"# Configuring the \"Then That\"\n",
"- Tap on the \"Then That\" button in the IFTTT interface.\n",
"- Search for the service you want to connect, such as Spotify.\n",
"- Choose an action from the service, such as \"Add track to a playlist\".\n",
"- Configure the action by specifying the necessary details, such as the playlist name,\n",
"e.g., \"Songs from AI\".\n",
"- Reference the JSON Payload received by the Webhook in your action. For the Spotify\n",
"scenario, choose \"{{JsonPayload}}\" as your search query.\n",
"- Tap the \"Create Action\" button to save your action settings.\n",
"- Once you have finished configuring your action, click the \"Finish\" button to\n",
"complete the setup.\n",
"- Congratulations! You have successfully connected the Webhook to the desired\n",
"service, and you're ready to start receiving data and triggering actions 🎉\n",
"\n",
"# Finishing up\n",
"- To get your webhook URL go to https://ifttt.com/maker_webhooks/settings\n",
"- Copy the IFTTT key value from there. The URL is of the form\n",
"https://maker.ifttt.com/use/YOUR_IFTTT_KEY. Grab the YOUR_IFTTT_KEY value.\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "10a46e7e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.tools.ifttt import IFTTTWebhook"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "12003d72",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"key = os.environ[\"IFTTTKey\"]\n",
"url = f\"https://maker.ifttt.com/trigger/spotify/json/with/key/{key}\"\n",
"tool = IFTTTWebhook(name=\"Spotify\", description=\"Add a song to spotify playlist\", url=url)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "6e68f846",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"Congratulations! You've fired the spotify JSON event\""
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tool.run(\"taylor swift\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a7e599c9",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -1,29 +1,35 @@
# Key Concepts
## Python REPL
Sometimes, for complex calculations, rather than have an LLM generate the answer directly,
it can be better to have the LLM generate code to calculate the answer, and then run that code to get the answer.
In order to easily do that, we provide a simple Python REPL to execute commands in.
This interface will only return things that are printed -
therefore, if you want to use it to calculate an answer, make sure to have it print out the answer.
## Bash
It can often be useful to have an LLM generate bash commands, and then run them.
A common use case for this is letting the LLM interact with your local file system.
We provide an easy component to execute bash commands.
## Requests Wrapper
The web contains a lot of information that LLMs do not have access to.
In order to easily let LLMs interact with that information,
we provide a wrapper around the Python Requests module that takes in a URL and fetches data from that URL.
## Google Search
This uses the official Google Search API to look up information on the web.
## SerpAPI
This uses SerpAPI, a third party search API engine, to interact with Google Search.
## Searx Search
This uses the Searx (SearxNG fork) meta search engine API to look up information
on the web. It supports 139 search engines and is easy to self-host,
which makes it a good choice for privacy-conscious users.
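As a quick, hedged sketch of one of these components, here is how the Searx wrapper can be used (the host URL below is an assumption; point it at your own SearxNG instance):

```python
# A minimal sketch, assuming a self-hosted SearxNG instance at this URL and
# that SearxSearchWrapper is importable from langchain.utilities in your version.
from langchain.utilities import SearxSearchWrapper

search = SearxSearchWrapper(searx_host="http://localhost:8888")
print(search.run("What is a large language model?"))
```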

@ -50,6 +50,8 @@ The following use cases require specific installs and api keys:
- _OpenSearch_:
- Install requirements with `pip install opensearch-py`
- If you want to set up OpenSearch locally, see [here](https://opensearch.org/docs/latest/)
- _DeepLake_:
- Install requirements with `pip install deeplake` (see the sketch below)
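For example, once `deeplake` is installed, Deep Lake can be used like any other vector store. A minimal sketch (assumes `OPENAI_API_KEY` is set in the environment; the sample texts are placeholders):

```python
# Build an in-memory Deep Lake vector store from a couple of placeholder texts.
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake

db = DeepLake.from_texts(["foo", "bar"], OpenAIEmbeddings())
docs = db.similarity_search("foo")
```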
If you are using the `NLTKTextSplitter` or the `SpacyTextSplitter`, you will also need to install the appropriate models. For example, if you want to use the `SpacyTextSplitter`, you will need to install the `en_core_web_sm` model with `python -m spacy download en_core_web_sm`. Similarly, if you want to use the `NLTKTextSplitter`, you will need to install the `punkt` model with `python -m nltk.downloader punkt`.

@ -1,5 +1,41 @@
# Question Answering
Question answering in this context refers to question answering over your document data.
For question answering over other types of data, like [SQL databases](../modules/chains/examples/sqlite.html) or [APIs](../modules/chains/examples/api.html), please see [here](../modules/chains/utility_how_to.html).
For question answering over many documents, you almost always want to create an index over the data.
This can be used to smartly access the most relevant documents for a given question, allowing you to avoid having to pass all the documents to the LLM (saving you time and money).
See [this notebook](../modules/indexes/getting_started.ipynb) for a more detailed introduction to this, but for a super quick start the steps involved are:
**Load Your Documents**
```python
from langchain.document_loaders import TextLoader
loader = TextLoader('../state_of_the_union.txt')
```
See [here](../modules/document_loaders/how_to_guides.rst) for more information on how to get started with document loading.
**Create Your Index**
```python
from langchain.indexes import VectorstoreIndexCreator
index = VectorstoreIndexCreator().from_loaders([loader])
```
The best and most popular index by far at the moment is the VectorStore index.
**Query Your Index**
```python
query = "What did the president say about Ketanji Brown Jackson"
index.query(query)
```
Alternatively, use `query_with_sources` to also get back the sources involved:
```python
query = "What did the president say about Ketanji Brown Jackson"
index.query_with_sources(query)
```
Again, these high-level interfaces hide a lot of what is going on under the hood, so please see [this notebook](../modules/indexes/getting_started.ipynb) for a lower-level walkthrough.
## Document Question Answering
Question answering involves fetching multiple documents, and then asking a question of them.
The LLM response will contain the answer to your question, based on the content of the documents.
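One way to accomplish this is with a question answering chain. A rough sketch (assuming `docs` holds the fetched documents; the `stuff` chain type simply stuffs all documents into a single prompt):

```python
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI

chain = load_qa_chain(OpenAI(temperature=0), chain_type="stuff")
chain.run(input_documents=docs, question="What did the president say about Ketanji Brown Jackson")
```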
@ -15,7 +51,7 @@ The following resources exist:
- [Question Answering Notebook](/modules/indexes/chain_examples/question_answering.ipynb): A notebook walking through how to accomplish this task.
- [VectorDB Question Answering Notebook](/modules/indexes/chain_examples/vector_db_qa.ipynb): A notebook walking through how to do question answering over a vector database. This can often be useful for when you have a LOT of documents, and you don't want to pass them all to the LLM, but rather first want to do some semantic search over embeddings.
### Adding in sources
## Adding in sources
There is also a variant of this, where in addition to responding with the answer the language model will also cite its sources (e.g., which of the passed-in documents it used).
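A rough sketch of the with-sources variant (assuming `docs` and `query` from the steps above, and that each document carries a `source` field in its metadata):

```python
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.llms import OpenAI

chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type="stuff")
chain({"input_documents": docs, "question": query}, return_only_outputs=True)
```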
@ -31,7 +67,7 @@ The following resources exist:
- [QA With Sources Notebook](/modules/indexes/chain_examples/qa_with_sources.ipynb): A notebook walking through how to accomplish this task.
- [VectorDB QA With Sources Notebook](/modules/indexes/chain_examples/vector_db_qa_with_sources.ipynb): A notebook walking through how to do question answering with sources over a vector database. This can often be useful for when you have a LOT of documents, and you don't want to pass them all to the LLM, but rather first want to do some semantic search over embeddings.
### Additional Related Resources
## Additional Related Resources
Additional related resources include:
- [Utilities for working with Documents](/modules/utils/how_to_guides.rst): Guides on how to use several of the utilities which will prove helpful for this task, including Text Splitters (for splitting up long documents) and Embeddings & Vectorstores (useful for the above Vector DB example).

@ -24,13 +24,17 @@ from langchain.chains import (
from langchain.docstore import InMemoryDocstore, Wikipedia
from langchain.llms import (
Anthropic,
Banana,
CerebriumAI,
Cohere,
ForefrontAI,
GooseAI,
HuggingFaceHub,
Modal,
OpenAI,
Petals,
StochasticAI,
Writer,
)
from langchain.llms.huggingface_pipeline import HuggingFacePipeline
from langchain.prompts import (
@ -67,12 +71,16 @@ __all__ = [
"GoogleSerperAPIWrapper",
"WolframAlphaAPIWrapper",
"Anthropic",
"Banana",
"CerebriumAI",
"Cohere",
"ForefrontAI",
"GooseAI",
"Modal",
"OpenAI",
"Petals",
"StochasticAI",
"Writer",
"BasePromptTemplate",
"Prompt",
"FewShotPromptTemplate",

@ -179,7 +179,7 @@ _EXTRA_OPTIONAL_TOOLS = {
"bing-search": (_get_bing_search, ["bing_subscription_key", "bing_search_url"]),
"google-serper": (_get_google_serper, ["serper_api_key"]),
"serpapi": (_get_serpapi, ["serpapi_api_key", "aiosession"]),
"searx-search": (_get_searx_search, ["searx_host", "searx_host"]),
"searx-search": (_get_searx_search, ["searx_host"]),
}

@ -87,7 +87,7 @@ class SQLAlchemyCache(BaseCache):
prompt=prompt, llm=llm_string, response=generation.text, idx=i
)
with Session(self.engine) as session, session.begin():
session.add(item)
session.merge(item)
class SQLiteCache(SQLAlchemyCache):

@ -101,7 +101,7 @@ class ChatVectorDBChain(Chain, BaseModel):
else:
return {self.output_key: answer}
async def _acall(self, inputs: Dict[str, Any]) -> Dict[str, str]:
async def _acall(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
question = inputs["question"]
chat_history_str = _get_chat_history(inputs["chat_history"])
vectordbkwargs = inputs.get("vectordbkwargs", {})
@ -119,4 +119,7 @@ class ChatVectorDBChain(Chain, BaseModel):
new_inputs["question"] = new_question
new_inputs["chat_history"] = chat_history_str
answer, _ = await self.combine_docs_chain.acombine_docs(docs, **new_inputs)
return {self.output_key: answer}
if self.return_source_documents:
return {self.output_key: answer, "source_documents": docs}
else:
return {self.output_key: answer}

@ -16,7 +16,7 @@ class ConstitutionalChain(Chain):
.. code-block:: python
from langchain.llms import OpenAI
from langchian.chains import LLMChain, ConstitutionalChain
from langchain.chains import LLMChain, ConstitutionalChain
qa_prompt = PromptTemplate(
template="Q: {question} A:",

@ -2,7 +2,7 @@
from langchain.prompts.base import CommaSeparatedListOutputParser
from langchain.prompts.prompt import PromptTemplate
_DEFAULT_TEMPLATE = """Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer. Unless the user specifies in his question a specific number of examples he wishes to obtain, always limit your query to at most {top_k} results using the LIMIT clause. You can order the results by a relevant column to return the most interesting examples in the database.
_DEFAULT_TEMPLATE = """Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer. Unless the user specifies in his question a specific number of examples he wishes to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.
Never query for all the columns from a specific table, only ask for the few relevant columns given the question.

@ -3,10 +3,12 @@
from langchain.document_loaders.airbyte_json import AirbyteJSONLoader
from langchain.document_loaders.azlyrics import AZLyricsLoader
from langchain.document_loaders.college_confidential import CollegeConfidentialLoader
from langchain.document_loaders.conllu import CoNLLULoader
from langchain.document_loaders.directory import DirectoryLoader
from langchain.document_loaders.docx import UnstructuredDocxLoader
from langchain.document_loaders.email import UnstructuredEmailLoader
from langchain.document_loaders.evernote import EverNoteLoader
from langchain.document_loaders.facebook_chat import FacebookChatLoader
from langchain.document_loaders.gcs_directory import GCSDirectoryLoader
from langchain.document_loaders.gcs_file import GCSFileLoader
from langchain.document_loaders.gitbook import GitbookLoader
@ -14,7 +16,10 @@ from langchain.document_loaders.googledrive import GoogleDriveLoader
from langchain.document_loaders.gutenberg import GutenbergLoader
from langchain.document_loaders.hn import HNLoader
from langchain.document_loaders.html import UnstructuredHTMLLoader
from langchain.document_loaders.ifixit import IFixitLoader
from langchain.document_loaders.image import UnstructuredImageLoader
from langchain.document_loaders.imsdb import IMSDbLoader
from langchain.document_loaders.notebook import NotebookLoader
from langchain.document_loaders.notion import NotionDirectoryLoader
from langchain.document_loaders.obsidian import ObsidianLoader
from langchain.document_loaders.online_pdf import OnlinePDFLoader
@ -34,6 +39,7 @@ from langchain.document_loaders.unstructured import (
)
from langchain.document_loaders.url import UnstructuredURLLoader
from langchain.document_loaders.web_base import WebBaseLoader
from langchain.document_loaders.word_document import UnstructuredWordDocumentLoader
from langchain.document_loaders.youtube import YoutubeLoader
__all__ = [
@ -46,7 +52,9 @@ __all__ = [
"GoogleDriveLoader",
"UnstructuredHTMLLoader",
"UnstructuredPowerPointLoader",
"UnstructuredWordDocumentLoader",
"UnstructuredPDFLoader",
"UnstructuredImageLoader",
"ObsidianLoader",
"UnstructuredDocxLoader",
"UnstructuredEmailLoader",
@ -63,6 +71,7 @@ __all__ = [
"IMSDbLoader",
"AZLyricsLoader",
"CollegeConfidentialLoader",
"IFixitLoader",
"GutenbergLoader",
"PagedPDFSplitter",
"EverNoteLoader",
@ -71,4 +80,7 @@ __all__ = [
"PDFMinerLoader",
"TelegramChatLoader",
"SRTLoader",
"FacebookChatLoader",
"NotebookLoader",
"CoNLLULoader",
]

@ -0,0 +1,33 @@
"""Load CoNLL-U files."""
import csv
from typing import List
from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader
class CoNLLULoader(BaseLoader):
"""Load CoNLL-U files."""
def __init__(self, file_path: str):
"""Initialize with file path."""
self.file_path = file_path
def load(self) -> List[Document]:
"""Load from file path."""
with open(self.file_path, encoding="utf8") as f:
tsv = list(csv.reader(f, delimiter="\t"))
# If len(line) > 1, the line is not a comment
lines = [line for line in tsv if len(line) > 1]
text = ""
for i, line in enumerate(lines):
# Do not add a space after a punctuation mark or at the end of the sentence
if line[9] == "SpaceAfter=No" or i == len(lines) - 1:
text += line[1]
else:
text += line[1] + " "
metadata = {"source": self.file_path}
return [Document(page_content=text, metadata=metadata)]

@ -0,0 +1,57 @@
"""Loader that loads Facebook chat json dump."""
import datetime
import json
from pathlib import Path
from typing import List
from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader
def concatenate_rows(row: dict) -> str:
"""Combine message information in a readable format ready to be used."""
sender = row["sender_name"]
text = row["content"]
date = datetime.datetime.fromtimestamp(row["timestamp_ms"] / 1000).strftime(
"%Y-%m-%d %H:%M:%S"
)
return f"{sender} on {date}: {text}\n\n"
class FacebookChatLoader(BaseLoader):
"""Loader that loads Facebook messages json directory dump."""
def __init__(self, path: str):
"""Initialize with path."""
self.file_path = path
def load(self) -> List[Document]:
"""Load documents."""
try:
import pandas as pd
except ImportError:
raise ValueError(
"pandas is needed for Facebook chat loader, "
"please install with `pip install pandas`"
)
p = Path(self.file_path)
with open(p, encoding="utf8") as f:
d = json.load(f)
normalized_messages = pd.json_normalize(d["messages"])
df_normalized_messages = pd.DataFrame(normalized_messages)
# Only keep plain text messages
# (no services, nor links, hashtags, code, bold ...)
df_filtered = df_normalized_messages[
(df_normalized_messages.content.apply(lambda x: type(x) == str))
]
df_filtered = df_filtered[["timestamp_ms", "content", "sender_name"]]
text = df_filtered.apply(concatenate_rows, axis=1).str.cat(sep="")
metadata = {"source": str(p)}
return [Document(page_content=text, metadata=metadata)]

@ -0,0 +1,202 @@
"""Loader that loads iFixit data."""
from typing import List, Optional
import requests
from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader
from langchain.document_loaders.web_base import WebBaseLoader
IFIXIT_BASE_URL = "https://www.ifixit.com/api/2.0"
class IFixitLoader(BaseLoader):
"""Load iFixit repair guides, device wikis and answers.
iFixit is the largest open repair community on the web. The site contains nearly
100k repair manuals, 200k Questions & Answers on 42k devices, and all the data is
licensed under CC-BY.
This loader will allow you to download the text of a repair guide, text of Q&As
and wikis from devices on iFixit using their open APIs and web scraping.
"""
def __init__(self, web_path: str):
"""Initialize with web path."""
if not web_path.startswith("https://www.ifixit.com"):
raise ValueError("web path must start with 'https://www.ifixit.com'")
path = web_path.replace("https://www.ifixit.com", "")
allowed_paths = ["/Device", "/Guide", "/Answers", "/Teardown"]
""" TODO: Add /Wiki """
if not any(path.startswith(allowed_path) for allowed_path in allowed_paths):
raise ValueError(
"web path must start with /Device, /Guide, /Teardown or /Answers"
)
pieces = [x for x in path.split("/") if x]
"""Teardowns are just guides by a different name"""
self.page_type = pieces[0] if pieces[0] != "Teardown" else "Guide"
if self.page_type == "Guide" or self.page_type == "Answers":
self.id = pieces[2]
else:
self.id = pieces[1]
self.web_path = web_path
def load(self) -> List[Document]:
if self.page_type == "Device":
return self.load_device()
elif self.page_type == "Guide" or self.page_type == "Teardown":
return self.load_guide()
elif self.page_type == "Answers":
return self.load_questions_and_answers()
else:
raise ValueError("Unknown page type: " + self.page_type)
@staticmethod
def load_suggestions(query: str = "", doc_type: str = "all") -> List[Document]:
res = requests.get(
IFIXIT_BASE_URL + "/suggest/" + query + "?doctypes=" + doc_type
)
if res.status_code != 200:
raise ValueError(
'Could not load suggestions for "' + query + '"\n' + str(res.json())
)
data = res.json()
results = data["results"]
output = []
for result in results:
try:
loader = IFixitLoader(result["url"])
if loader.page_type == "Device":
output += loader.load_device(include_guides=False)
else:
output += loader.load()
except ValueError:
continue
return output
def load_questions_and_answers(
self, url_override: Optional[str] = None
) -> List[Document]:
loader = WebBaseLoader(self.web_path if url_override is None else url_override)
soup = loader.scrape()
output = []
title = soup.find("h1", "post-title").text
output.append("# " + title)
output.append(soup.select_one(".post-content .post-text").text.strip())
output.append("\n## " + soup.find("div", "post-answers-header").text.strip())
for answer in soup.select(".js-answers-list .post.post-answer"):
if answer.has_attr("itemprop") and "acceptedAnswer" in answer["itemprop"]:
output.append("\n### Accepted Answer")
elif "post-helpful" in answer["class"]:
output.append("\n### Most Helpful Answer")
else:
output.append("\n### Other Answer")
output += [
a.text.strip() for a in answer.select(".post-content .post-text")
]
output.append("\n")
text = "\n".join(output).strip()
metadata = {"source": self.web_path, "title": title}
return [Document(page_content=text, metadata=metadata)]
def load_device(
self, url_override: Optional[str] = None, include_guides: bool = True
) -> List[Document]:
documents = []
if url_override is None:
url = IFIXIT_BASE_URL + "/wikis/CATEGORY/" + self.id
else:
url = url_override
res = requests.get(url)
data = res.json()
text = "\n".join(
[
data[key]
for key in ["title", "description", "contents_raw"]
if key in data
]
).strip()
metadata = {"source": self.web_path, "title": data["title"]}
documents.append(Document(page_content=text, metadata=metadata))
if include_guides:
"""Load and return documents for each guide linked to from the device"""
guide_urls = [guide["url"] for guide in data["guides"]]
for guide_url in guide_urls:
documents.append(IFixitLoader(guide_url).load()[0])
return documents
def load_guide(self, url_override: Optional[str] = None) -> List[Document]:
if url_override is None:
url = IFIXIT_BASE_URL + "/guides/" + self.id
else:
url = url_override
res = requests.get(url)
if res.status_code != 200:
raise ValueError(
"Could not load guide: " + self.web_path + "\n" + res.json()
)
data = res.json()
doc_parts = ["# " + data["title"], data["introduction_raw"]]
doc_parts.append("\n\n###Tools Required:")
if len(data["tools"]) == 0:
doc_parts.append("\n - None")
else:
for tool in data["tools"]:
doc_parts.append("\n - " + tool["text"])
doc_parts.append("\n\n###Parts Required:")
if len(data["parts"]) == 0:
doc_parts.append("\n - None")
else:
for part in data["parts"]:
doc_parts.append("\n - " + part["text"])
for row in data["steps"]:
doc_parts.append(
"\n\n## "
+ (
row["title"]
if row["title"] != ""
else "Step {}".format(row["orderby"])
)
)
for line in row["lines"]:
doc_parts.append(line["text_raw"])
doc_parts.append(data["conclusion_raw"])
text = "\n".join(doc_parts)
metadata = {"source": self.web_path, "title": data["title"]}
return [Document(page_content=text, metadata=metadata)]

@ -0,0 +1,13 @@
"""Loader that loads image files."""
from typing import List
from langchain.document_loaders.unstructured import UnstructuredFileLoader
class UnstructuredImageLoader(UnstructuredFileLoader):
"""Loader that uses unstructured to load image files, such as PNGs and JPGs."""
def _get_elements(self) -> List:
from unstructured.partition.image import partition_image
return partition_image(filename=self.file_path)

@ -0,0 +1,109 @@
"""Loader that loads .ipynb notebook files."""
import json
from pathlib import Path
from typing import Any, List
from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader
def concatenate_cells(
cell: dict, include_outputs: bool, max_output_length: int, traceback: bool
) -> str:
"""Combine cells information in a readable format ready to be used."""
cell_type = cell["cell_type"]
source = cell["source"]
output = cell["outputs"]
if include_outputs and cell_type == "code" and output:
if "ename" in output[0].keys():
error_name = output[0]["ename"]
error_value = output[0]["evalue"]
if traceback:
traceback = output[0]["traceback"]
return (
f"'{cell_type}' cell: '{source}'\n, gives error '{error_name}',"
f" with description '{error_value}'\n"
f"and traceback '{traceback}'\n\n"
)
else:
return (
f"'{cell_type}' cell: '{source}'\n, gives error '{error_name}',"
f"with description '{error_value}'\n\n"
)
elif output[0]["output_type"] == "stream":
output = output[0]["text"]
min_output = min(max_output_length, len(output))
return (
f"'{cell_type}' cell: '{source}'\n with "
f"output: '{output[:min_output]}'\n\n"
)
else:
return f"'{cell_type}' cell: '{source}'\n\n"
return ""
def remove_newlines(x: Any) -> Any:
"""Remove recursively newlines, no matter the data structure they are stored in."""
import pandas as pd
if isinstance(x, str):
return x.replace("\n", "")
elif isinstance(x, list):
return [remove_newlines(elem) for elem in x]
elif isinstance(x, pd.DataFrame):
return x.applymap(remove_newlines)
else:
return x
class NotebookLoader(BaseLoader):
"""Loader that loads .ipynb notebook files."""
def __init__(
self,
path: str,
include_outputs: bool = False,
max_output_length: int = 10,
remove_newline: bool = False,
traceback: bool = False,
):
"""Initialize with path."""
self.file_path = path
self.include_outputs = include_outputs
self.max_output_length = max_output_length
self.remove_newline = remove_newline
self.traceback = traceback
def load(
self,
) -> List[Document]:
"""Load documents."""
try:
import pandas as pd
except ImportError:
raise ValueError(
"pandas is needed for Notebook Loader, "
"please install with `pip install pandas`"
)
p = Path(self.file_path)
with open(p, encoding="utf8") as f:
d = json.load(f)
data = pd.json_normalize(d["cells"])
filtered_data = data[["cell_type", "source", "outputs"]]
if self.remove_newline:
filtered_data = filtered_data.applymap(remove_newlines)
text = filtered_data.apply(
lambda x: concatenate_cells(
x, self.include_outputs, self.max_output_length, self.traceback
),
axis=1,
).str.cat(sep=" ")
metadata = {"source": str(p)}
return [Document(page_content=text, metadata=metadata)]

@ -0,0 +1,43 @@
"""Loader that loads word documents."""
import os
from typing import List
from langchain.document_loaders.unstructured import UnstructuredFileLoader
class UnstructuredWordDocumentLoader(UnstructuredFileLoader):
"""Loader that uses unstructured to load word documents."""
def _get_elements(self) -> List:
from unstructured.__version__ import __version__ as __unstructured_version__
from unstructured.file_utils.filetype import FileType, detect_filetype
unstructured_version = tuple(
[int(x) for x in __unstructured_version__.split(".")]
)
# NOTE(MthwRobinson) - magic will raise an import error if the libmagic
# system dependency isn't installed. If it's not installed, we'll just
# check the file extension
try:
import magic # noqa: F401
is_doc = detect_filetype(self.file_path) == FileType.DOC
except ImportError:
_, extension = os.path.splitext(self.file_path)
is_doc = extension == ".doc"
if is_doc and unstructured_version < (0, 4, 11):
raise ValueError(
f"You are on unstructured version {__unstructured_version__}. "
"Partitioning .doc files is only supported in unstructured>=0.4.11. "
"Please upgrade the unstructured package and try again."
)
if is_doc:
from unstructured.partition.doc import partition_doc
return partition_doc(filename=self.file_path)
else:
from unstructured.partition.docx import partition_docx
return partition_docx(filename=self.file_path)

@ -10,10 +10,13 @@ from langchain.document_loaders.base import BaseLoader
class YoutubeLoader(BaseLoader):
"""Loader that loads Youtube transcripts."""
def __init__(self, video_id: str, add_video_info: bool = False):
def __init__(
self, video_id: str, add_video_info: bool = False, language: str = "en"
):
"""Initialize with YouTube video ID."""
self.video_id = video_id
self.add_video_info = add_video_info
self.language = language
@classmethod
def from_youtube_url(cls, youtube_url: str, **kwargs: Any) -> YoutubeLoader:
@ -39,7 +42,9 @@ class YoutubeLoader(BaseLoader):
video_info = self._get_video_info()
metadata.update(video_info)
transcript_pieces = YouTubeTranscriptApi.get_transcript(self.video_id)
transcript_pieces = YouTubeTranscriptApi.get_transcript(
self.video_id, languages=(self.language,)
)
transcript = " ".join([t["text"].strip(" ") for t in transcript_pieces])
return [Document(page_content=transcript, metadata=metadata)]

@ -25,7 +25,7 @@ class CohereEmbeddings(BaseModel, Embeddings):
model: str = "large"
"""Model name to use."""
truncate: str = "NONE"
truncate: Optional[str] = None
"""Truncate embeddings that are too long from start or end ("NONE"|"START"|"END")"""
cohere_api_key: Optional[str] = None

@ -1,4 +1,5 @@
"""All index utils."""
from langchain.indexes.graph import GraphIndexCreator
from langchain.indexes.vectorstore import VectorstoreIndexCreator
__all__ = ["GraphIndexCreator"]
__all__ = ["GraphIndexCreator", "VectorstoreIndexCreator"]

@ -0,0 +1,69 @@
from typing import Any, List, Optional, Type
from pydantic import BaseModel, Extra, Field
from langchain.chains.qa_with_sources.vector_db import VectorDBQAWithSourcesChain
from langchain.chains.vector_db_qa.base import VectorDBQA
from langchain.document_loaders.base import BaseLoader
from langchain.embeddings.base import Embeddings
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms.base import BaseLLM
from langchain.llms.openai import OpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter, TextSplitter
from langchain.vectorstores.base import VectorStore
from langchain.vectorstores.chroma import Chroma
def _get_default_text_splitter() -> TextSplitter:
return RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
class VectorStoreIndexWrapper(BaseModel):
"""Wrapper around a vectorstore for easy access."""
vectorstore: VectorStore
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
arbitrary_types_allowed = True
def query(self, question: str, llm: Optional[BaseLLM] = None, **kwargs: Any) -> str:
"""Query the vectorstore."""
llm = llm or OpenAI(temperature=0)
chain = VectorDBQA.from_chain_type(llm, vectorstore=self.vectorstore, **kwargs)
return chain.run(question)
def query_with_sources(
self, question: str, llm: Optional[BaseLLM] = None, **kwargs: Any
) -> dict:
"""Query the vectorstore and get back sources."""
llm = llm or OpenAI(temperature=0)
chain = VectorDBQAWithSourcesChain.from_chain_type(
llm, vectorstore=self.vectorstore, **kwargs
)
return chain({chain.question_key: question})
class VectorstoreIndexCreator(BaseModel):
"""Logic for creating indexes."""
vectorstore_cls: Type[VectorStore] = Chroma
embedding: Embeddings = Field(default_factory=OpenAIEmbeddings)
text_splitter: TextSplitter = Field(default_factory=_get_default_text_splitter)
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
arbitrary_types_allowed = True
def from_loaders(self, loaders: List[BaseLoader]) -> VectorStoreIndexWrapper:
"""Create a vectorstore index from loaders."""
docs = []
for loader in loaders:
docs.extend(loader.load())
sub_docs = self.text_splitter.split_documents(docs)
vectorstore = self.vectorstore_cls.from_documents(sub_docs, self.embedding)
return VectorStoreIndexWrapper(vectorstore=vectorstore)
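The two classes above compose as follows; a minimal sketch, assuming a local text file, an OpenAI API key in the environment, and Chroma installed:

```python
from langchain.document_loaders import TextLoader
from langchain.indexes import VectorstoreIndexCreator

# "state_of_the_union.txt" is a placeholder document.
loader = TextLoader("state_of_the_union.txt")
index = VectorstoreIndexCreator().from_loaders([loader])
print(index.query("What is this document about?"))
```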

@@ -2,28 +2,38 @@
from typing import Dict, Type
from langchain.llms.ai21 import AI21
from langchain.llms.aleph_alpha import AlephAlpha
from langchain.llms.anthropic import Anthropic
from langchain.llms.bananadev import Banana
from langchain.llms.base import BaseLLM
from langchain.llms.cerebriumai import CerebriumAI
from langchain.llms.cohere import Cohere
from langchain.llms.deepinfra import DeepInfra
from langchain.llms.forefrontai import ForefrontAI
from langchain.llms.gooseai import GooseAI
from langchain.llms.huggingface_endpoint import HuggingFaceEndpoint
from langchain.llms.huggingface_hub import HuggingFaceHub
from langchain.llms.huggingface_pipeline import HuggingFacePipeline
from langchain.llms.modal import Modal
from langchain.llms.nlpcloud import NLPCloud
from langchain.llms.openai import AzureOpenAI, OpenAI
from langchain.llms.petals import Petals
from langchain.llms.promptlayer_openai import PromptLayerOpenAI
from langchain.llms.self_hosted import SelfHostedPipeline
from langchain.llms.self_hosted_hugging_face import SelfHostedHuggingFaceLLM
from langchain.llms.stochasticai import StochasticAI
from langchain.llms.writer import Writer
__all__ = [
"Anthropic",
"AlephAlpha",
"Banana",
"CerebriumAI",
"Cohere",
"DeepInfra",
"ForefrontAI",
"GooseAI",
"Modal",
"NLPCloud",
"OpenAI",
"Petals",
@@ -35,17 +45,23 @@ __all__ = [
"SelfHostedPipeline",
"SelfHostedHuggingFaceLLM",
"PromptLayerOpenAI",
"StochasticAI",
"Writer",
]
type_to_cls_dict: Dict[str, Type[BaseLLM]] = {
"ai21": AI21,
"aleph_alpha": AlephAlpha,
"anthropic": Anthropic,
"bananadev": Banana,
"cerebriumai": CerebriumAI,
"cohere": Cohere,
"deepinfra": DeepInfra,
"forefrontai": ForefrontAI,
"gooseai": GooseAI,
"huggingface_hub": HuggingFaceHub,
"huggingface_endpoint": HuggingFaceEndpoint,
"modal": Modal,
"nlpcloud": NLPCloud,
"openai": OpenAI,
"petals": Petals,
@@ -53,4 +69,6 @@ type_to_cls_dict: Dict[str, Type[BaseLLM]] = {
"azure": AzureOpenAI,
"self_hosted": SelfHostedPipeline,
"self_hosted_hugging_face": SelfHostedHuggingFaceLLM,
"stochasticai": StochasticAI,
"writer": Writer,
}
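The registry above lets an LLM class be looked up by its `_llm_type` string; a short sketch, assuming `OPENAI_API_KEY` is set:

```python
from langchain.llms import type_to_cls_dict

# Resolve a provider by name and instantiate it.
llm_cls = type_to_cls_dict["openai"]
llm = llm_cls(temperature=0)
```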

@@ -0,0 +1,236 @@
"""Wrapper around Aleph Alpha APIs."""
from typing import Any, Dict, List, Optional, Sequence
from pydantic import BaseModel, Extra, root_validator
from langchain.llms.base import LLM
from langchain.llms.utils import enforce_stop_tokens
from langchain.utils import get_from_dict_or_env
class AlephAlpha(LLM, BaseModel):
"""Wrapper around Aleph Alpha large language models.
To use, you should have the ``aleph_alpha_client`` python package installed, and the
environment variable ``ALEPH_ALPHA_API_KEY`` set with your API key, or pass
it as a named parameter to the constructor.
Parameters are explained more in depth here:
https://github.com/Aleph-Alpha/aleph-alpha-client/blob/c14b7dd2b4325c7da0d6a119f6e76385800e097b/aleph_alpha_client/completion.py#L10
Example:
.. code-block:: python
from langchain.llms import AlephAlpha
aleph_alpha = AlephAlpha(aleph_alpha_api_key="my-api-key")
"""
client: Any #: :meta private:
model: Optional[str] = "luminous-base"
"""Model name to use."""
maximum_tokens: int = 64
"""The maximum number of tokens to be generated."""
temperature: float = 0.0
"""A non-negative float that tunes the degree of randomness in generation."""
top_k: int = 0
"""Number of most likely tokens to consider at each step."""
top_p: float = 0.0
"""Total probability mass of tokens to consider at each step."""
presence_penalty: float = 0.0
"""Penalizes repeated tokens."""
frequency_penalty: float = 0.0
"""Penalizes repeated tokens according to frequency."""
repetition_penalties_include_prompt: Optional[bool] = False
"""Flag deciding whether presence penalty or frequency penalty are
updated from the prompt."""
use_multiplicative_presence_penalty: Optional[bool] = False
"""Flag deciding whether presence penalty is applied
multiplicatively (True) or additively (False)."""
penalty_bias: Optional[str] = None
"""Penalty bias for the completion."""
penalty_exceptions: Optional[List[str]] = None
"""List of strings that may be generated without penalty,
regardless of other penalty settings"""
penalty_exceptions_include_stop_sequences: Optional[bool] = None
"""Should stop_sequences be included in penalty_exceptions."""
best_of: Optional[int] = None
"""returns the one with the "best of" results
(highest log probability per token)
"""
n: int = 1
"""How many completions to generate for each prompt."""
logit_bias: Optional[Dict[int, float]] = None
"""The logit bias allows to influence the likelihood of generating tokens."""
log_probs: Optional[int] = None
"""Number of top log probabilities to be returned for each generated token."""
tokens: Optional[bool] = False
"""return tokens of completion."""
disable_optimizations: Optional[bool] = False
minimum_tokens: Optional[int] = 0
"""Generate at least this number of tokens."""
echo: bool = False
"""Echo the prompt in the completion."""
use_multiplicative_frequency_penalty: bool = False
sequence_penalty: float = 0.0
sequence_penalty_min_length: int = 2
use_multiplicative_sequence_penalty: bool = False
completion_bias_inclusion: Optional[Sequence[str]] = None
completion_bias_inclusion_first_token_only: bool = False
completion_bias_exclusion: Optional[Sequence[str]] = None
completion_bias_exclusion_first_token_only: bool = False
"""Only consider the first token for the completion_bias_exclusion."""
contextual_control_threshold: Optional[float] = None
"""If set to None, attention control parameters only apply to those tokens that have
explicitly been set in the request.
If set to a non-None value, control parameters are also applied to similar tokens.
"""
control_log_additive: Optional[bool] = True
"""True: apply control by adding the log(control_factor) to attention scores.
False: (attention_scores - attention_scores.min(-1)) * control_factor
"""
repetition_penalties_include_completion: bool = True
"""Flag deciding whether presence penalty or frequency penalty
are updated from the completion."""
raw_completion: bool = False
"""Force the raw completion of the model to be returned."""
aleph_alpha_api_key: Optional[str] = None
"""API key for Aleph Alpha API."""
stop_sequences: Optional[List[str]] = None
"""Stop sequences to use."""
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that api key and python package exists in environment."""
aleph_alpha_api_key = get_from_dict_or_env(
values, "aleph_alpha_api_key", "ALEPH_ALPHA_API_KEY"
)
try:
import aleph_alpha_client
values["client"] = aleph_alpha_client.Client(token=aleph_alpha_api_key)
except ImportError:
raise ValueError(
"Could not import aleph_alpha_client python package. "
"Please it install it with `pip install aleph_alpha_client`."
)
return values
@property
def _default_params(self) -> Dict[str, Any]:
"""Get the default parameters for calling the Aleph Alpha API."""
return {
"maximum_tokens": self.maximum_tokens,
"temperature": self.temperature,
"top_k": self.top_k,
"top_p": self.top_p,
"presence_penalty": self.presence_penalty,
"frequency_penalty": self.frequency_penalty,
"n": self.n,
"repetition_penalties_include_prompt": self.repetition_penalties_include_prompt, # noqa: E501
"use_multiplicative_presence_penalty": self.use_multiplicative_presence_penalty, # noqa: E501
"penalty_bias": self.penalty_bias,
"penalty_exceptions": self.penalty_exceptions,
"penalty_exceptions_include_stop_sequences": self.penalty_exceptions_include_stop_sequences, # noqa: E501
"best_of": self.best_of,
"logit_bias": self.logit_bias,
"log_probs": self.log_probs,
"tokens": self.tokens,
"disable_optimizations": self.disable_optimizations,
"minimum_tokens": self.minimum_tokens,
"echo": self.echo,
"use_multiplicative_frequency_penalty": self.use_multiplicative_frequency_penalty, # noqa: E501
"sequence_penalty": self.sequence_penalty,
"sequence_penalty_min_length": self.sequence_penalty_min_length,
"use_multiplicative_sequence_penalty": self.use_multiplicative_sequence_penalty, # noqa: E501
"completion_bias_inclusion": self.completion_bias_inclusion,
"completion_bias_inclusion_first_token_only": self.completion_bias_inclusion_first_token_only, # noqa: E501
"completion_bias_exclusion": self.completion_bias_exclusion,
"completion_bias_exclusion_first_token_only": self.completion_bias_exclusion_first_token_only, # noqa: E501
"contextual_control_threshold": self.contextual_control_threshold,
"control_log_additive": self.control_log_additive,
"repetition_penalties_include_completion": self.repetition_penalties_include_completion, # noqa: E501
"raw_completion": self.raw_completion,
}
@property
def _identifying_params(self) -> Dict[str, Any]:
"""Get the identifying parameters."""
return {**{"model": self.model}, **self._default_params}
@property
def _llm_type(self) -> str:
"""Return type of llm."""
return "alpeh_alpha"
def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
"""Call out to Aleph Alpha's completion endpoint.
Args:
prompt: The prompt to pass into the model.
stop: Optional list of stop words to use when generating.
Returns:
The string generated by the model.
Example:
.. code-block:: python
response = aleph_alpha("Tell me a joke.")
"""
from aleph_alpha_client import CompletionRequest, Prompt
params = self._default_params
if self.stop_sequences is not None and stop is not None:
raise ValueError(
"stop sequences found in both the input and default params."
)
elif self.stop_sequences is not None:
params["stop_sequences"] = self.stop_sequences
else:
params["stop_sequences"] = stop
request = CompletionRequest(prompt=Prompt.from_text(prompt), **params)
response = self.client.complete(model=self.model, request=request)
text = response.completions[0].completion
# If stop tokens are provided, Aleph Alpha's endpoint returns them.
# In order to make this consistent with other endpoints, we strip them.
if stop is not None or self.stop_sequences is not None:
text = enforce_stop_tokens(text, params["stop_sequences"])
return text
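A usage sketch for the wrapper above, assuming `aleph_alpha_client` is installed and `ALEPH_ALPHA_API_KEY` is set:

```python
from langchain.llms import AlephAlpha

llm = AlephAlpha(model="luminous-base", maximum_tokens=64, stop_sequences=["Q:"])
print(llm("Q: What is the capital of France?\nA:"))
```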

@@ -18,7 +18,7 @@ class Anthropic(LLM, BaseModel):
Example:
.. code-block:: python
import anthropic
from langchain import Anthropic
from langchain.llms import Anthropic
model = Anthropic(model="<model_name>", anthropic_api_key="my-api-key")
# Simplest invocation, automatically wrapped with HUMAN_PROMPT

@@ -0,0 +1,117 @@
"""Wrapper around Banana API."""
import logging
from typing import Any, Dict, List, Mapping, Optional
from pydantic import BaseModel, Extra, Field, root_validator
from langchain.llms.base import LLM
from langchain.llms.utils import enforce_stop_tokens
from langchain.utils import get_from_dict_or_env
logger = logging.getLogger(__name__)
class Banana(LLM, BaseModel):
"""Wrapper around Banana large language models.
To use, you should have the ``banana-dev`` python package installed,
and the environment variable ``BANANA_API_KEY`` set with your API key.
Any parameters that are valid to be passed to the call can be passed
in, even if not explicitly saved on this class.
Example:
.. code-block:: python
from langchain.llms import Banana
banana = Banana(model_key="")
"""
model_key: str = ""
"""model endpoint to use"""
model_kwargs: Dict[str, Any] = Field(default_factory=dict)
"""Holds any model parameters valid for `create` call not
explicitly specified."""
banana_api_key: Optional[str] = None
class Config:
"""Configuration for this pydantic config."""
extra = Extra.forbid
@root_validator(pre=True)
def build_extra(cls, values: Dict[str, Any]) -> Dict[str, Any]:
"""Build extra kwargs from additional params that were passed in."""
all_required_field_names = {field.alias for field in cls.__fields__.values()}
extra = values.get("model_kwargs", {})
for field_name in list(values):
if field_name not in all_required_field_names:
if field_name in extra:
raise ValueError(f"Found {field_name} supplied twice.")
logger.warning(
f"""{field_name} was transfered to model_kwargs.
Please confirm that {field_name} is what you intended."""
)
extra[field_name] = values.pop(field_name)
values["model_kwargs"] = extra
return values
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that api key and python package exists in environment."""
banana_api_key = get_from_dict_or_env(
values, "banana_api_key", "BANANA_API_KEY"
)
values["banana_api_key"] = banana_api_key
return values
@property
def _identifying_params(self) -> Mapping[str, Any]:
"""Get the identifying parameters."""
return {
**{"model_key": self.model_key},
**{"model_kwargs": self.model_kwargs},
}
@property
def _llm_type(self) -> str:
"""Return type of llm."""
return "banana"
def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
"""Call to Banana endpoint."""
try:
import banana_dev as banana
except ImportError:
raise ValueError(
"Could not import banana-dev python package. "
"Please install it with `pip install banana-dev`."
)
params = self.model_kwargs or {}
api_key = self.banana_api_key
model_key = self.model_key
model_inputs = {
# A JSON payload specific to your model.
"prompt": prompt,
**params,
}
response = banana.run(api_key, model_key, model_inputs)
try:
text = response["modelOutputs"][0]["output"]
except (KeyError, TypeError):
returned = response["modelOutputs"][0]
raise ValueError(
"Response should be of schema: {'output': 'text'}."
f"\nResponse was: {returned}"
"\nTo fix this:"
"\n- fork the source repo of the Banana model"
"\n- modify app.py to return the above schema"
"\n- deploy that as a custom repo"
)
if stop is not None:
# I believe this is required since the stop tokens
# are not enforced by the model parameters
text = enforce_stop_tokens(text, stop)
return text
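A sketch of calling a deployed Banana model; the model key is a placeholder, and any extra keyword arguments are swept into `model_kwargs` by `build_extra` above:

```python
from langchain.llms import Banana

# "your-model-key" is a placeholder; temperature is not a declared field,
# so build_extra moves it into model_kwargs.
llm = Banana(model_key="your-model-key", temperature=0.9)
print(llm("Tell me a joke."))
```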

@@ -22,7 +22,7 @@ class CerebriumAI(LLM, BaseModel):
Example:
.. code-block:: python
from langchain import CerebriumAI
from langchain.llms import CerebriumAI
cerebrium = CerebriumAI(endpoint_url="")
"""

@@ -21,7 +21,7 @@ class Cohere(LLM, BaseModel):
Example:
.. code-block:: python
from langchain import Cohere
from langchain.llms import Cohere
cohere = Cohere(model="gptd-instruct-tft", cohere_api_key="my-api-key")
"""
@@ -47,6 +47,10 @@ class Cohere(LLM, BaseModel):
presence_penalty: int = 0
"""Penalizes repeated tokens."""
truncate: Optional[str] = None
"""Specify how the client handles inputs longer than the maximum token
length: Truncate from START, END or NONE"""
cohere_api_key: Optional[str] = None
stop: Optional[List[str]] = None
@@ -83,6 +87,7 @@ class Cohere(LLM, BaseModel):
"p": self.p,
"frequency_penalty": self.frequency_penalty,
"presence_penalty": self.presence_penalty,
"truncate": self.truncate,
}
@property

@@ -0,0 +1,97 @@
"""Wrapper around DeepInfra APIs."""
from typing import Any, Dict, List, Mapping, Optional
import requests
from pydantic import BaseModel, Extra, root_validator
from langchain.llms.base import LLM
from langchain.llms.utils import enforce_stop_tokens
from langchain.utils import get_from_dict_or_env
DEFAULT_MODEL_ID = "google/flan-t5-xl"
class DeepInfra(LLM, BaseModel):
"""Wrapper around DeepInfra deployed models.
To use, you should have the ``requests`` python package installed, and the
environment variable ``DEEPINFRA_API_TOKEN`` set with your API token, or pass
it as a named parameter to the constructor.
Only supports `text-generation` and `text2text-generation` for now.
Example:
.. code-block:: python
from langchain.llms import DeepInfra
di = DeepInfra(model_id="google/flan-t5-xl",
deepinfra_api_token="my-api-key")
"""
model_id: str = DEFAULT_MODEL_ID
model_kwargs: Optional[dict] = None
deepinfra_api_token: Optional[str] = None
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that api key and python package exists in environment."""
deepinfra_api_token = get_from_dict_or_env(
values, "deepinfra_api_token", "DEEPINFRA_API_TOKEN"
)
values["deepinfra_api_token"] = deepinfra_api_token
return values
@property
def _identifying_params(self) -> Mapping[str, Any]:
"""Get the identifying parameters."""
return {
**{"model_id": self.model_id},
**{"model_kwargs": self.model_kwargs},
}
@property
def _llm_type(self) -> str:
"""Return type of llm."""
return "deepinfra"
def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
"""Call out to DeepInfra's inference API endpoint.
Args:
prompt: The prompt to pass into the model.
stop: Optional list of stop words to use when generating.
Returns:
The string generated by the model.
Example:
.. code-block:: python
response = di("Tell me a joke.")
"""
_model_kwargs = self.model_kwargs or {}
res = requests.post(
f"https://api.deepinfra.com/v1/inference/{self.model_id}",
headers={
"Authorization": f"bearer {self.deepinfra_api_token}",
"Content-Type": "application/json",
},
json={"input": prompt, **_model_kwargs},
)
if res.status_code != 200:
raise ValueError("Error raised by inference API")
text = res.json()[0]["generated_text"]
if stop is not None:
# I believe this is required since the stop tokens
# are not enforced by the model parameters
text = enforce_stop_tokens(text, stop)
return text
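A usage sketch, assuming `DEEPINFRA_API_TOKEN` is set in the environment:

```python
from langchain.llms import DeepInfra

# model_id defaults to google/flan-t5-xl; shown here for clarity.
llm = DeepInfra(model_id="google/flan-t5-xl")
print(llm("What is 2 + 2?"))
```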

@@ -18,7 +18,7 @@ class ForefrontAI(LLM, BaseModel):
Example:
.. code-block:: python
from langchain import ForefrontAI
from langchain.llms import ForefrontAI
forefrontai = ForefrontAI(endpoint_url="")
"""

@@ -21,7 +21,7 @@ class GooseAI(LLM, BaseModel):
Example:
.. code-block:: python
from langchain import GooseAI
from langchain.llms import GooseAI
gooseai = GooseAI(model_name="gpt-neo-20b")
"""

@@ -23,7 +23,7 @@ class HuggingFaceHub(LLM, BaseModel):
Example:
.. code-block:: python
from langchain import HuggingFaceHub
from langchain.llms import HuggingFaceHub
hf = HuggingFaceHub(repo_id="gpt2", huggingfacehub_api_token="my-api-key")
"""

@@ -25,14 +25,14 @@ class HuggingFacePipeline(LLM, BaseModel):
Example using from_model_id:
.. code-block:: python
from langchain.llms.huggingface_pipeline import HuggingFacePipeline
from langchain.llms import HuggingFacePipeline
hf = HuggingFacePipeline.from_model_id(
model_id="gpt2", task="text-generation"
)
Example passing pipeline in directly:
.. code-block:: python
from langchain.llms.huggingface_pipeline import HuggingFacePipeline
from langchain.llms import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
model_id = "gpt2"

@@ -0,0 +1,92 @@
"""Wrapper around Modal API."""
import logging
from typing import Any, Dict, List, Mapping, Optional
import requests
from pydantic import BaseModel, Extra, Field, root_validator
from langchain.llms.base import LLM
from langchain.llms.utils import enforce_stop_tokens
logger = logging.getLogger(__name__)
class Modal(LLM, BaseModel):
"""Wrapper around Modal large language models.
To use, you should have the ``modal-client`` python package installed.
Any parameters that are valid to be passed to the call can be passed
in, even if not explicitly saved on this class.
Example:
.. code-block:: python
from langchain.llms import Modal
modal = Modal(endpoint_url="")
"""
endpoint_url: str = ""
"""model endpoint to use"""
model_kwargs: Dict[str, Any] = Field(default_factory=dict)
"""Holds any model parameters valid for `create` call not
explicitly specified."""
class Config:
"""Configuration for this pydantic config."""
extra = Extra.forbid
@root_validator(pre=True)
def build_extra(cls, values: Dict[str, Any]) -> Dict[str, Any]:
"""Build extra kwargs from additional params that were passed in."""
all_required_field_names = {field.alias for field in cls.__fields__.values()}
extra = values.get("model_kwargs", {})
for field_name in list(values):
if field_name not in all_required_field_names:
if field_name in extra:
raise ValueError(f"Found {field_name} supplied twice.")
logger.warning(
f"""{field_name} was transfered to model_kwargs.
Please confirm that {field_name} is what you intended."""
)
extra[field_name] = values.pop(field_name)
values["model_kwargs"] = extra
return values
@property
def _identifying_params(self) -> Mapping[str, Any]:
"""Get the identifying parameters."""
return {
**{"endpoint_url": self.endpoint_url},
**{"model_kwargs": self.model_kwargs},
}
@property
def _llm_type(self) -> str:
"""Return type of llm."""
return "modal"
def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
"""Call to Modal endpoint."""
params = self.model_kwargs or {}
response = requests.post(
url=self.endpoint_url,
headers={
"Content-Type": "application/json",
},
json={"prompt": prompt, **params},
)
try:
response_json = response.json()
text = response_json["prompt"]
except KeyError:
raise ValueError("LangChain requires 'prompt' key in response.")
if stop is not None:
# I believe this is required since the stop tokens
# are not enforced by the model parameters
text = enforce_stop_tokens(text, stop)
return text
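A usage sketch; the endpoint URL is a placeholder for a deployed Modal web endpoint that accepts `{"prompt": ...}` and returns the generated text under a `"prompt"` key:

```python
from langchain.llms import Modal

# Placeholder URL for your own Modal deployment.
llm = Modal(endpoint_url="https://your-workspace--your-app.modal.run")
print(llm("Tell me a joke."))
```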

@@ -16,7 +16,7 @@ class NLPCloud(LLM, BaseModel):
Example:
.. code-block:: python
from langchain import NLPCloud
from langchain.llms import NLPCloud
nlpcloud = NLPCloud(model="gpt-neox-20b")
"""

@@ -75,7 +75,7 @@ class BaseOpenAI(BaseLLM, BaseModel):
Example:
.. code-block:: python
from langchain import OpenAI
from langchain.llms import OpenAI
openai = OpenAI(model_name="text-davinci-003")
"""
@@ -251,7 +251,9 @@ class BaseOpenAI(BaseLLM, BaseModel):
prompt=_prompts, **params
):
self.callback_manager.on_llm_new_token(
stream_resp["choices"][0]["text"], verbose=self.verbose
stream_resp["choices"][0]["text"],
verbose=self.verbose,
logprobs=stream_resp["choices"][0]["logprobs"],
)
_update_response(response, stream_resp)
choices.extend(response["choices"])
@@ -285,11 +287,15 @@ class BaseOpenAI(BaseLLM, BaseModel):
):
if self.callback_manager.is_async:
await self.callback_manager.on_llm_new_token(
stream_resp["choices"][0]["text"], verbose=self.verbose
stream_resp["choices"][0]["text"],
verbose=self.verbose,
logprobs=stream_resp["choices"][0]["logprobs"],
)
else:
self.callback_manager.on_llm_new_token(
stream_resp["choices"][0]["text"], verbose=self.verbose
stream_resp["choices"][0]["text"],
verbose=self.verbose,
logprobs=stream_resp["choices"][0]["logprobs"],
)
_update_response(response, stream_resp)
choices.extend(response["choices"])
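With `logprobs` now forwarded to `on_llm_new_token`, a streaming handler can read it from its keyword arguments. A sketch: `LogprobsHandler` is a hypothetical subclass of the stdout streaming handler, and requesting logprobs via `model_kwargs` is an assumption about how to enable them on the API side:

```python
from langchain.callbacks.base import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import OpenAI

class LogprobsHandler(StreamingStdOutCallbackHandler):  # hypothetical handler
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # logprobs arrives alongside each streamed token (None if not requested).
        print(token, kwargs.get("logprobs"))

llm = OpenAI(
    streaming=True,
    verbose=True,
    callback_manager=CallbackManager([LogprobsHandler()]),
    model_kwargs={"logprobs": 1},  # assumption: passed through to the API
)
llm("Tell me a joke.")
```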

@@ -22,7 +22,7 @@ class Petals(LLM, BaseModel):
Example:
.. code-block:: python
from langchain import petals
from langchain.llms import Petals
petals = Petals()
"""

@@ -23,7 +23,7 @@ class PromptLayerOpenAI(OpenAI, BaseModel):
Example:
.. code-block:: python
from langchain import OpenAI
from langchain.llms import OpenAI
openai = OpenAI(model_name="text-davinci-003")
"""

@@ -0,0 +1,130 @@
"""Wrapper around StochasticAI APIs."""
import logging
import time
from typing import Any, Dict, List, Mapping, Optional
import requests
from pydantic import BaseModel, Extra, Field, root_validator
from langchain.llms.base import LLM
from langchain.llms.utils import enforce_stop_tokens
from langchain.utils import get_from_dict_or_env
logger = logging.getLogger(__name__)
class StochasticAI(LLM, BaseModel):
"""Wrapper around StochasticAI large language models.
To use, you should have the environment variable ``STOCHASTICAI_API_KEY``
set with your API key.
Example:
.. code-block:: python
from langchain.llms import StochasticAI
stochasticai = StochasticAI(api_url="")
"""
api_url: str = ""
"""Model name to use."""
model_kwargs: Dict[str, Any] = Field(default_factory=dict)
"""Holds any model parameters valid for `create` call not
explicitly specified."""
stochasticai_api_key: Optional[str] = None
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
@root_validator(pre=True)
def build_extra(cls, values: Dict[str, Any]) -> Dict[str, Any]:
"""Build extra kwargs from additional params that were passed in."""
all_required_field_names = {field.alias for field in cls.__fields__.values()}
extra = values.get("model_kwargs", {})
for field_name in list(values):
if field_name not in all_required_field_names:
if field_name in extra:
raise ValueError(f"Found {field_name} supplied twice.")
logger.warning(
f"""{field_name} was transfered to model_kwargs.
Please confirm that {field_name} is what you intended."""
)
extra[field_name] = values.pop(field_name)
values["model_kwargs"] = extra
return values
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that api key exists in environment."""
stochasticai_api_key = get_from_dict_or_env(
values, "stochasticai_api_key", "STOCHASTICAI_API_KEY"
)
values["stochasticai_api_key"] = stochasticai_api_key
return values
@property
def _identifying_params(self) -> Mapping[str, Any]:
"""Get the identifying parameters."""
return {
**{"endpoint_url": self.api_url},
**{"model_kwargs": self.model_kwargs},
}
@property
def _llm_type(self) -> str:
"""Return type of llm."""
return "stochasticai"
def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
"""Call out to StochasticAI's complete endpoint.
Args:
prompt: The prompt to pass into the model.
stop: Optional list of stop words to use when generating.
Returns:
The string generated by the model.
Example:
.. code-block:: python
response = stochasticai("Tell me a joke.")
"""
params = self.model_kwargs or {}
response_post = requests.post(
url=self.api_url,
json={"prompt": prompt, "params": params},
headers={
"apiKey": f"{self.stochasticai_api_key}",
"Accept": "application/json",
"Content-Type": "application/json",
},
)
response_post.raise_for_status()
response_post_json = response_post.json()
completed = False
while not completed:
response_get = requests.get(
url=response_post_json["data"]["responseUrl"],
headers={
"apiKey": f"{self.stochasticai_api_key}",
"Accept": "application/json",
"Content-Type": "application/json",
},
)
response_get.raise_for_status()
response_get_json = response_get.json()["data"]
text = response_get_json.get("completion")
completed = text is not None
if not completed:
    time.sleep(0.5)
text = text[0]
if stop is not None:
# I believe this is required since the stop tokens
# are not enforced by the model parameters
text = enforce_stop_tokens(text, stop)
return text
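A usage sketch; the `api_url` is a placeholder for a deployed model's submit URL, and `STOCHASTICAI_API_KEY` is read from the environment:

```python
from langchain.llms import StochasticAI

# Placeholder inference URL for your deployed model.
llm = StochasticAI(api_url="https://example.stochastic.ai/your-model/submit")
print(llm("Tell me a joke."))
```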

@@ -0,0 +1,155 @@
"""Wrapper around Writer APIs."""
from typing import Any, Dict, List, Mapping, Optional
import requests
from pydantic import BaseModel, Extra, root_validator
from langchain.llms.base import LLM
from langchain.llms.utils import enforce_stop_tokens
from langchain.utils import get_from_dict_or_env
class Writer(LLM, BaseModel):
"""Wrapper around Writer large language models.
To use, you should have the environment variable ``WRITER_API_KEY``
set with your API key.
Example:
.. code-block:: python
from langchain.llms import Writer
writer = Writer(model_id="palmyra-base")
"""
model_id: str = "palmyra-base"
"""Model name to use."""
tokens_to_generate: int = 24
"""Max number of tokens to generate."""
logprobs: bool = False
"""Whether to return log probabilities."""
temperature: float = 1.0
"""What sampling temperature to use."""
length: int = 256
"""The maximum number of tokens to generate in the completion."""
top_p: float = 1.0
"""Total probability mass of tokens to consider at each step."""
top_k: int = 1
"""The number of highest probability vocabulary tokens to
keep for top-k-filtering."""
repetition_penalty: float = 1.0
"""Penalizes repeated tokens according to frequency."""
random_seed: int = 0
"""The model generates random results.
Changing the random seed alone will produce a different response
with similar characteristics. It is possible to reproduce results
by fixing the random seed (assuming all other hyperparameters
are also fixed)"""
beam_search_diversity_rate: float = 1.0
"""Only applies to beam search, i.e. when the beam width is >1.
A higher value encourages beam search to return a more diverse
set of candidates"""
beam_width: Optional[int] = None
"""The number of concurrent candidates to keep track of during
beam search"""
length_penalty: float = 1.0
"""Only applies to beam search, i.e. when the beam width is >1.
Larger values penalize long candidates more heavily, thus preferring
shorter candidates"""
writer_api_key: Optional[str] = None
stop: Optional[List[str]] = None
"""Sequences when completion generation will stop"""
base_url: Optional[str] = None
"""Base url to use, if None decides based on model name."""
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that api key exists in environment."""
writer_api_key = get_from_dict_or_env(
values, "writer_api_key", "WRITER_API_KEY"
)
values["writer_api_key"] = writer_api_key
return values
@property
def _default_params(self) -> Mapping[str, Any]:
"""Get the default parameters for calling Writer API."""
return {
"tokens_to_generate": self.tokens_to_generate,
"stop": self.stop,
"logprobs": self.logprobs,
"temperature": self.temperature,
"top_p": self.top_p,
"top_k": self.top_k,
"repetition_penalty": self.repetition_penalty,
"random_seed": self.random_seed,
"beam_search_diversity_rate": self.beam_search_diversity_rate,
"beam_width": self.beam_width,
"length_pentaly": self.length_pentaly,
}
@property
def _identifying_params(self) -> Mapping[str, Any]:
"""Get the identifying parameters."""
return {**{"model_id": self.model_id}, **self._default_params}
@property
def _llm_type(self) -> str:
"""Return type of llm."""
return "writer"
def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
"""Call out to Writer's complete endpoint.
Args:
prompt: The prompt to pass into the model.
stop: Optional list of stop words to use when generating.
Returns:
The string generated by the model.
Example:
.. code-block:: python
response = writer("Tell me a joke.")
"""
if self.base_url is not None:
base_url = self.base_url
else:
base_url = (
f"https://api.llm.writer.com/v1/models/{self.model_id}/completions"
)
response = requests.post(
url=base_url,
headers={
"Authorization": f"Bearer {self.writer_api_key}",
"Content-Type": "application/json",
"Accept": "application/json",
},
json={"prompt": prompt, **self._default_params},
)
text = response.text
if stop is not None:
# I believe this is required since the stop tokens
# are not enforced by the model parameters
text = enforce_stop_tokens(text, stop)
return text
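A usage sketch, assuming `WRITER_API_KEY` is set in the environment:

```python
from langchain.llms import Writer

# palmyra-base is the default model_id; shown here for clarity.
llm = Writer(model_id="palmyra-base", tokens_to_generate=24)
print(llm("Tell me a joke."))
```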

@@ -1,12 +1,14 @@
"""BasePrompt schema definition."""
from __future__ import annotations
import json
import re
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional, Union
from typing import Any, Callable, Dict, List, Mapping, Optional, Union
import yaml
from pydantic import BaseModel, Extra, root_validator
from pydantic import BaseModel, Extra, Field, root_validator
from langchain.formatting import formatter
@@ -117,6 +119,9 @@ class BasePromptTemplate(BaseModel, ABC):
"""A list of the names of the variables the prompt template expects."""
output_parser: Optional[BaseOutputParser] = None
"""How to parse the output of calling an LLM on this formatted prompt."""
partial_variables: Mapping[str, Union[str, Callable[[], str]]] = Field(
default_factory=dict
)
class Config:
"""Configuration for this pydantic object."""
@@ -132,8 +137,38 @@ class BasePromptTemplate(BaseModel, ABC):
"Cannot have an input variable named 'stop', as it is used internally,"
" please rename."
)
if "stop" in values["partial_variables"]:
raise ValueError(
"Cannot have an partial variable named 'stop', as it is used "
"internally, please rename."
)
overall = set(values["input_variables"]).intersection(
values["partial_variables"]
)
if overall:
raise ValueError(
f"Found overlapping input and partial variables: {overall}"
)
return values
def partial(self, **kwargs: Union[str, Callable[[], str]]) -> BasePromptTemplate:
"""Return a partial of the prompt template."""
prompt_dict = self.__dict__.copy()
prompt_dict["input_variables"] = list(
set(self.input_variables).difference(kwargs)
)
prompt_dict["partial_variables"] = {**self.partial_variables, **kwargs}
return type(self)(**prompt_dict)
def _merge_partial_and_user_variables(self, **kwargs: Any) -> Dict[str, Any]:
# Get partial params:
partial_kwargs = {
k: v if isinstance(v, str) else v()
for k, v in self.partial_variables.items()
}
return {**partial_kwargs, **kwargs}
@abstractmethod
def format(self, **kwargs: Any) -> str:
"""Format the prompt with the inputs.
@@ -173,6 +208,8 @@ class BasePromptTemplate(BaseModel, ABC):
prompt.save(file_path="path/prompt.yaml")
"""
if self.partial_variables:
raise ValueError("Cannot save prompt with partial variables.")
# Convert file to Path object.
if isinstance(file_path, str):
save_path = Path(file_path)
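The new `partial` method in action; a minimal sketch with a plain `PromptTemplate`:

```python
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(
    input_variables=["adjective", "content"],
    template="Tell me a {adjective} joke about {content}.",
)
# Bind one variable now; supply the rest at format time.
partial_prompt = prompt.partial(adjective="funny")
print(partial_prompt.format(content="chickens"))
# -> Tell me a funny joke about chickens.
```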

@@ -8,6 +8,10 @@ from langchain.prompts.example_selector.base import BaseExampleSelector
from langchain.prompts.prompt import PromptTemplate
def _get_length_based(text: str) -> int:
return len(re.split("\n| ", text))
class LengthBasedExampleSelector(BaseExampleSelector, BaseModel):
"""Select examples based on length."""
@@ -17,7 +21,7 @@ class LengthBasedExampleSelector(BaseExampleSelector, BaseModel):
example_prompt: PromptTemplate
"""Prompt template used to format the examples."""
get_text_length: Callable[[str], int] = lambda x: len(re.split("\n| ", x))
get_text_length: Callable[[str], int] = _get_length_based
"""Function to measure prompt length. Defaults to word count."""
max_length: int = 2048
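The extracted `_get_length_based` default can be swapped for any callable; a sketch that measures character length instead of word count:

```python
from langchain.prompts.example_selector import LengthBasedExampleSelector
from langchain.prompts.prompt import PromptTemplate

example_prompt = PromptTemplate(input_variables=["word"], template="{word}")
selector = LengthBasedExampleSelector(
    examples=[{"word": "happy"}, {"word": "melancholy"}],
    example_prompt=example_prompt,
    max_length=25,
    get_text_length=len,  # character count instead of the word-count default
)
```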
