langchain-docker readme

linting in docker and parallel make jobs
- linting can be run in docker in parallel with `make -j4 docker.lint`
2023-03-03 22:55:44 +01:00
40 changed files with 2499 additions and 39 deletions
--- a/.dockerignore
+++ b/.dockerignore
@ -0,0 +1,144 @@
 .vscode/
 .idea/
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]
 *$py.class
 # C extensions
 *.so
 # Distribution / packaging
 .Python
 build/
 develop-eggs/
 dist/
 downloads/
 eggs/
 .eggs/
 lib/
 lib64/
 parts/
 sdist/
 var/
 wheels/
 pip-wheel-metadata/
 share/python-wheels/
 *.egg-info/
 .installed.cfg
 *.egg
 MANIFEST
 # PyInstaller
 #  Usually these files are written by a python script from a template
 #  before PyInstaller builds the exe, so as to inject date/other infos into it.
 *.manifest
 *.spec
 # Installer logs
 pip-log.txt
 pip-delete-this-directory.txt
 # Unit test / coverage reports
 htmlcov/
 .tox/
 .nox/
 .coverage
 .coverage.*
 .cache
 nosetests.xml
 coverage.xml
 *.cover
 *.py,cover
 .hypothesis/
 .pytest_cache/
 # Translations
 *.mo
 *.pot
 # Django stuff:
 *.log
 local_settings.py
 db.sqlite3
 db.sqlite3-journal
 # Flask stuff:
 instance/
 .webassets-cache
 # Scrapy stuff:
 .scrapy
 # Sphinx documentation
 docs/_build/
 # PyBuilder
 target/
 # Jupyter Notebook
 .ipynb_checkpoints
 notebooks/
 # IPython
 profile_default/
 ipython_config.py
 # pyenv
 .python-version
 # pipenv
 #   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
 #   However, in case of collaboration, if having platform-specific dependencies or dependencies
 #   having no cross-platform support, pipenv may install dependencies that don't work, or not
 #   install all needed dependencies.
 #Pipfile.lock
 # PEP 582; used by e.g. github.com/David-OConnor/pyflow
 __pypackages__/
 # Celery stuff
 celerybeat-schedule
 celerybeat.pid
 # SageMath parsed files
 *.sage.py
 # Environments
 .env
 .venv
 .venvs
 env/
 venv/
 ENV/
 env.bak/
 venv.bak/
 # Spyder project settings
 .spyderproject
 .spyproject
 # Rope project settings
 .ropeproject
 # mkdocs documentation
 /site
 # mypy
 .mypy_cache/
 .dmypy.json
 dmypy.json
 # Pyre type checker
 .pyre/
 # macOS display setting files
 .DS_Store
 # docker
 docker/
 !docker/assets/
 .dockerignore
 docker.build
--- a/.gitignore
+++ b/.gitignore
@ -106,6 +106,7 @@ celerybeat.pid
 # Environments
 .env
 !docker/.env
 .venv
 .venvs
 env/
@ -134,3 +135,4 @@ dmypy.json
 # macOS display setting files
 .DS_Store
 docker.build
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@ -151,6 +151,10 @@ poetry run jupyter notebook
 When you run `poetry install`, the `langchain` package is installed as editable in the virtualenv, so your new logic can be imported into the notebook.
 ## Using Docker
 Refer to [DOCKER.md](docker/DOCKER.md) for more information.
 ## Documentation
 ### Contribute Documentation
--- a/Makefile
+++ b/Makefile
@ -1,5 +1,8 @@
 .PHONY: all clean format lint test tests test_watch integration_tests help
 GIT_HASH ?= $(shell git rev-parse --short HEAD)
 LANGCHAIN_VERSION := $(shell grep '^version' pyproject.toml | cut -d '=' -f2 | tr -d '"')
 all: help
 coverage:
@ -31,8 +34,7 @@ lint:
 test:
 	poetry run pytest tests/unit_tests
-tests:
+tests: test
 	poetry run pytest tests/unit_tests
 test_watch:
 	poetry run ptw --now . -- tests/unit_tests
@ -46,8 +48,26 @@ help:
 	@echo 'docs_build          - build the documentation'
 	@echo 'docs_clean          - clean the documentation build artifacts'
 	@echo 'docs_linkcheck      - run linkchecker on the documentation'
 ifneq ($(shell command -v docker 2> /dev/null),)
 	@echo 'docker              - build and run the docker dev image'
 	@echo 'docker.run          - run the docker dev image'
 	@echo 'docker.jupyter      - start a jupyter notebook inside container'
 	@echo 'docker.build        - build the docker dev image'
 	@echo 'docker.force_build  - force a rebuild'
 	@echo 'docker.test         - run the unit tests in docker'
 	@echo 'docker.lint         - run the linters in docker'
 	@echo 'docker.clean        - remove the docker dev image'
 endif
 	@echo 'format              - run code formatters'
 	@echo 'lint                - run linters'
 	@echo 'test                - run unit tests'
 	@echo 'test_watch          - run unit tests in watch mode'
 	@echo 'integration_tests   - run integration tests'
 # include the following makefile if the docker executable is available
 ifeq ($(shell command -v docker 2> /dev/null),)
 	$(info Docker not found, skipping docker-related targets)
 else
 include docker/Makefile
 endif
--- a/README.md
+++ b/README.md
@ -1,11 +1,15 @@
-# 🦜️🔗 LangChain
+# 🦜️🔗 LangChain - Docker
-⚡ Building applications with LLMs through composability ⚡
+WIP: This is a fork of langchain focused on implementing a docker wrapper and
 toolchain. The goal is to make it easy to use LLM chains running inside a
 container, build custom docker based tools and let agents run arbitrary
 untrusted code inside.
-[![lint](https://github.com/hwchase17/langchain/actions/workflows/lint.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/lint.yml) [![test](https://github.com/hwchase17/langchain/actions/workflows/test.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/test.yml) [![linkcheck](https://github.com/hwchase17/langchain/actions/workflows/linkcheck.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/linkcheck.yml) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Twitter](https://img.shields.io/twitter/url/https/twitter.com/langchainai.svg?style=social&label=Follow%20%40LangChainAI)](https://twitter.com/langchainai) [![](https://dcbadge.vercel.app/api/server/6adMQxSpJS?compact=true&style=flat)](https://discord.gg/6adMQxSpJS)
+Currently exploring the following:
-**Production Support:** As you move your LangChains into production, we'd love to offer more comprehensive support.
+-  Docker wrapper for LLMs and chains
-Please fill out [this form](https://forms.gle/57d8AmXBYp8PP8tZA) and we'll set up a dedicated support Slack channel.
+-  Creating a toolchain for building docker based LLM tools.
 -  Building agents that can run arbitrary untrusted code inside a container.
 ## Quick Install
--- a/docker/.env
+++ b/docker/.env
@ -0,0 +1,13 @@
 # python env
 PYTHON_VERSION=3.10
 # -E flag is required
 # comment the following line to only install dev dependencies
 POETRY_EXTRA_PACKAGES="-E all"
 # at least one group needed
 POETRY_DEPENDENCIES="dev,test,lint,typing"
 # langchain env. warning: these variables will be baked into the docker image !
 OPENAI_API_KEY=${OPENAI_API_KEY:-}
 SERPAPI_API_KEY=${SERPAPI_API_KEY:-}
--- a/docker/DOCKER.md
+++ b/docker/DOCKER.md
@ -0,0 +1,53 @@
 # Using Docker
 To quickly get started, run the command `make docker`.
 If docker is installed, the Makefile will export extra targets in the format `docker.*` to build and run the docker image. Type `make` for a list of available tasks.
 There is a basic `docker-compose.yml` in the docker directory.
 ## Building the development image
 Using `make docker` will build the dev image if it does not exist, then drop
 you inside the container with the langchain environment available in the shell.
 ### Customizing the image and installed dependencies
 The image is built with a default python version and all extras and dev
 dependencies. It can be customized by changing the variables in the [.env](/docker/.env)
 file. 
 If you don't need all the `extra` dependencies a slimmer image can be obtained by 
 commenting out `POETRY_EXTRA_PACKAGES` in the [.env](docker/.env) file.
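A minimal sketch of how these variables are picked up, assuming the same `grep`/`cut` approach used in `docker/Makefile`. The temporary file and the shell variable names below are illustrative only, and the `|| true` guards are an addition for strict shells:

```shell
# Illustrative only: a throwaway copy of docker/.env with the extras line commented out.
env_file=$(mktemp)
cat > "$env_file" <<'EOF'
# python env
PYTHON_VERSION=3.10
#POETRY_EXTRA_PACKAGES="-E all"
POETRY_DEPENDENCIES="dev,test,lint,typing"
EOF

# Same grep/cut pipelines as docker/Makefile; grep exits non-zero when nothing matches.
python_version=$(grep PYTHON_VERSION "$env_file" | cut -d '=' -f2 || true)
# The '^[^#]*' prefix skips lines where a '#' precedes the variable name.
extras=$(grep '^[^#]*POETRY_EXTRA_PACKAGES' "$env_file" | cut -d '=' -f2 || true)
# cut does not strip the surrounding double quotes from the value.
deps=$(grep 'POETRY_DEPENDENCIES' "$env_file" | cut -d '=' -f2 || true)

echo "python=$python_version extras=${extras:-<none>} deps=$deps"
rm -f "$env_file"
```

With the extras line commented out, `extras` ends up empty, so no `-E` flag would be passed on to `poetry export`.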
 ### Image caching
 The Dockerfile is optimized to cache the poetry install step. A rebuild is triggered when there is a change to the source code.
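The rebuild trigger can be pictured as a stamp-file check: `docker/Makefile` touches a `docker.build` file after a successful build, and make rebuilds when a tracked file is newer than it. A rough shell sketch of that comparison follows. The temporary directory, the file names and the `needs_rebuild` helper are hypothetical, and the real check is performed by make itself:

```shell
workdir=$(mktemp -d)
stamp="$workdir/docker.build"
src="$workdir/example.py"

needs_rebuild() {
  # Succeeds when the stamp is missing or any given file is newer than it.
  stamp_file=$1; shift
  [ -e "$stamp_file" ] || return 0
  for f in "$@"; do
    if [ "$f" -nt "$stamp_file" ]; then return 0; fi
  done
  return 1
}

touch -t 202301010000 "$src"      # source last changed on 2023-01-01
touch -t 202301020000 "$stamp"    # image built a day later
if needs_rebuild "$stamp" "$src"; then first=rebuild; else first=cached; fi

touch -t 202301030000 "$src"      # source edited after the build
if needs_rebuild "$stamp" "$src"; then second=rebuild; else second=cached; fi

echo "$first $second"             # prints "cached rebuild"
rm -rf "$workdir"
```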
 ## Example Usage
 All commands from langchain's python environment are available by default in the container.
 A few examples:
 ```bash
 # run jupyter notebook
 docker run --rm -it IMG jupyter notebook
 # run ipython
 docker run --rm -it IMG ipython
 # start web server
 docker run --rm -p 8888:8888 IMG python -m http.server 8888
 ```
 ## Testing / Linting
 Tests and lints are run using your local source directory that is mounted on the volume /src.
 Run unit tests in the container with `make docker.test`.
 Run the linting and formatting checks with `make docker.lint`.
 Note: this task can run in parallel using `make -j4 docker.lint`.
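For intuition, `make -j4 docker.lint` behaves roughly like starting the four linters as background jobs and waiting for all of them. The sketch below uses a hypothetical `run_linter` stand-in that only echoes the tool name; in the real targets each job is a `docker run` against the mounted `/src`:

```shell
run_linter() {
  # Hypothetical stand-in for: docker run --rm -i -u lchain -v "$PWD":/src IMAGE <tool> /src
  printf '\t%s\n' "$1 ... "
}

pids=""
for tool in mypy black isort flake8; do
  run_linter "$tool" &        # start each linter without waiting for the previous one
  pids="$pids $!"
done

status=0
for pid in $pids; do
  wait "$pid" || status=1     # any failing linter fails the whole run
done
echo "lint exit status: $status"
```

Because the jobs run concurrently, the order of the per-tool output lines is not guaranteed.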
--- a/docker/Dockerfile
+++ b/docker/Dockerfile
@ -0,0 +1,104 @@
 # vim: ft=dockerfile
 #
 # see also: https://github.com/python-poetry/poetry/discussions/1879
 #   - with https://github.com/bneijt/poetry-lock-docker
 # see https://github.com/thehale/docker-python-poetry
 # see https://github.com/max-pfeiffer/uvicorn-poetry
 # use by default the slim version of python
 ARG PYTHON_IMAGE_TAG=slim 
 ARG PYTHON_VERSION=${PYTHON_VERSION:-3.11.2}
 ####################
 # Base Environment
 ####################
 FROM python:$PYTHON_VERSION-$PYTHON_IMAGE_TAG AS lchain-base
 ARG UID=1000
 ARG USERNAME=lchain
 ENV USERNAME=$USERNAME
 RUN groupadd -g ${UID} $USERNAME
 RUN useradd -l -m -u ${UID} -g ${UID} $USERNAME
 # used for mounting source code
 RUN mkdir /src
 VOLUME /src
 #######################
 ## Poetry Builder Image
 #######################
 FROM lchain-base AS lchain-base-builder
 ARG POETRY_EXTRA_PACKAGES=$POETRY_EXTRA_PACKAGES
 ARG POETRY_DEPENDENCIES=$POETRY_DEPENDENCIES
 ENV HOME=/root
 ENV POETRY_HOME=/root/.poetry
 ENV POETRY_VIRTUALENVS_IN_PROJECT=false
 ENV POETRY_NO_INTERACTION=1
 ENV CACHE_DIR=$HOME/.cache
 ENV POETRY_CACHE_DIR=$CACHE_DIR/pypoetry
 ENV PATH="$POETRY_HOME/bin:$PATH"
 WORKDIR /root
 RUN apt-get update && \
    apt-get install -y \
    build-essential \
    git \
    curl
 SHELL ["/bin/bash", "-o", "pipefail", "-c"]
 RUN mkdir -p $CACHE_DIR
 ## setup poetry
 RUN curl -sSL -o $CACHE_DIR/pypoetry-installer.py https://install.python-poetry.org/
 RUN python3 $CACHE_DIR/pypoetry-installer.py
 # # Copy poetry files
 COPY poetry.* pyproject.toml ./
 RUN mkdir /pip-prefix
 RUN poetry export $POETRY_EXTRA_PACKAGES --with $POETRY_DEPENDENCIES -f requirements.txt --output requirements.txt --without-hashes && \
    pip install --no-cache-dir --disable-pip-version-check --prefix /pip-prefix -r requirements.txt
 # add custom motd message
 COPY docker/assets/etc/motd /tmp/motd
 RUN cat /tmp/motd > /etc/motd
 RUN printf "\n%s\n%s\n" "$(poetry version)" "$(python --version)" >> /etc/motd
 ###################
 ## Runtime Image
 ###################
 FROM lchain-base AS lchain
 #jupyter port
 EXPOSE 8888
 COPY docker/assets/entry.sh /entry
 RUN chmod +x /entry
 COPY --from=lchain-base-builder /etc/motd /etc/motd
 COPY --from=lchain-base-builder /usr/bin/git /usr/bin/git
 USER ${USERNAME:-lchain}
 ENV HOME=/home/$USERNAME
 WORKDIR /home/$USERNAME
 COPY --chown=lchain:lchain --from=lchain-base-builder /pip-prefix $HOME/.local/
 COPY . .
 SHELL ["/bin/bash", "-o", "pipefail", "-c"]
 RUN pip install --no-deps --disable-pip-version-check --no-cache-dir -e .
 ENTRYPOINT ["/entry"]
--- a/docker/Makefile
+++ b/docker/Makefile
@ -0,0 +1,84 @@
 #do not call this makefile it is included in the main Makefile
 .PHONY: docker docker.jupyter docker.run docker.force_build docker.clean \
 	docker.test docker.lint docker.lint.mypy docker.lint.black \
 	docker.lint.isort docker.lint.flake
 # read python version from .env file ignoring comments
 PYTHON_VERSION := $(shell grep PYTHON_VERSION docker/.env | cut -d '=' -f2)
 POETRY_EXTRA_PACKAGES := $(shell grep '^[^#]*POETRY_EXTRA_PACKAGES' docker/.env | cut -d '=' -f2)
 POETRY_DEPENDENCIES := $(shell grep 'POETRY_DEPENDENCIES' docker/.env | cut -d '=' -f2)
 DOCKER_SRC := $(shell find docker -type f)
 DOCKER_IMAGE_NAME = langchain/dev
 # SRC is all files matched by the git ls-files command
 SRC := $(shell git ls-files -- '*' ':!:docker/*')
 # set DOCKER_BUILD_PROGRESS=plain to see detailed build progress
 DOCKER_BUILD_PROGRESS ?= auto
 # extra message to show when entering the docker container
 DOCKER_MOTD := docker/assets/etc/motd
 ROOTDIR := $(shell git rev-parse --show-toplevel)
 DOCKER_LINT_CMD = docker run --rm -i -u lchain -v $(ROOTDIR):/src  $(DOCKER_IMAGE_NAME):$(GIT_HASH)
 docker: docker.run
 docker.run: docker.build
 	@echo "Docker image: $(DOCKER_IMAGE_NAME):$(GIT_HASH)"
 	docker run --rm -it -u lchain -v $(ROOTDIR):/src  $(DOCKER_IMAGE_NAME):$(GIT_HASH)
 docker.jupyter: docker.build
 	docker run --rm -it -v $(ROOTDIR):/src  $(DOCKER_IMAGE_NAME):$(GIT_HASH) jupyter notebook
 docker.build: $(SRC) $(DOCKER_SRC) $(DOCKER_MOTD)
 ifdef DOCKER_BUILDKIT
 	docker buildx build --build-arg PYTHON_VERSION=$(PYTHON_VERSION) \
 			--build-arg POETRY_EXTRA_PACKAGES=$(POETRY_EXTRA_PACKAGES) \
 			--build-arg POETRY_DEPENDENCIES=$(POETRY_DEPENDENCIES) \
 			--progress=$(DOCKER_BUILD_PROGRESS) \
 			$(BUILD_FLAGS) -f docker/Dockerfile -t $(DOCKER_IMAGE_NAME):$(GIT_HASH) .
 else
 	docker build --build-arg PYTHON_VERSION=$(PYTHON_VERSION) \
 			--build-arg POETRY_EXTRA_PACKAGES=$(POETRY_EXTRA_PACKAGES) \
 			--build-arg POETRY_DEPENDENCIES=$(POETRY_DEPENDENCIES) \
 			$(BUILD_FLAGS) -f docker/Dockerfile -t $(DOCKER_IMAGE_NAME):$(GIT_HASH) .
 endif
 	docker tag $(DOCKER_IMAGE_NAME):$(GIT_HASH) $(DOCKER_IMAGE_NAME):latest
 	@touch $@ # this prevents make from rebuilding the image when nothing has
 	@         #  changed. Remove the file `docker.build` to force a rebuild.
 docker.force_build: $(DOCKER_SRC)
 	@rm -f docker.build
 	@$(MAKE) docker.build BUILD_FLAGS=--no-cache
 docker.clean:
 	docker rmi $(DOCKER_IMAGE_NAME):$(GIT_HASH) $(DOCKER_IMAGE_NAME):latest
 docker.test: docker.build
 	docker run --rm -it -u lchain -v $(ROOTDIR):/src  $(DOCKER_IMAGE_NAME):$(GIT_HASH) \
 		pytest /src/tests/unit_tests
 # this assumes that the docker image has been built 
 docker.lint: docker.lint.mypy docker.lint.black docker.lint.isort \
 	docker.lint.flake
 # these can run in parallel with -j[njobs]
 docker.lint.mypy:
 	@$(DOCKER_LINT_CMD) mypy /src
 	@printf "\t%s\n" "mypy ... "
 docker.lint.black:
 	@$(DOCKER_LINT_CMD) black /src --check
 	@printf "\t%s\n" "black ... "
 docker.lint.isort:
 	@$(DOCKER_LINT_CMD) isort /src --check
 	@printf "\t%s\n" "isort ... "
 docker.lint.flake:
 	@$(DOCKER_LINT_CMD) flake8 /src
 	@printf "\t%s\n" "flake8 ... "
--- a/docker/assets/entry.sh
+++ b/docker/assets/entry.sh
@ -0,0 +1,10 @@
 #!/usr/bin/env bash
 export PATH=$HOME/.local/bin:$PATH
 if [ -z "$1" ]; then
    cat /etc/motd
    exec /bin/bash
 fi
 exec "$@"
--- a/docker/assets/etc/motd
+++ b/docker/assets/etc/motd
@ -0,0 +1,8 @@
 All dependencies have been installed in the current shell. There is no
 virtualenv or a need for `poetry` inside the container.
 Running the command `make docker.run` at the root directory of the project will
 build the container the first time. On the next runs it will use the cached
 image. A rebuild will happen when changes are made to the source code.
 Your local source directory has been mounted to the /src directory.
--- a/docker/docker-compose.yml
+++ b/docker/docker-compose.yml
@ -0,0 +1,17 @@
 version: "3.7"
 services:
  langchain:
    hostname: langchain
    image: langchain/dev:latest
    build:
      context: ../
      dockerfile: docker/Dockerfile
      args:
        PYTHON_VERSION: ${PYTHON_VERSION}
        POETRY_EXTRA_PACKAGES: ${POETRY_EXTRA_PACKAGES}
        POETRY_DEPENDENCIES: ${POETRY_DEPENDENCIES}
    restart: unless-stopped
    ports:
      - 127.0.0.1:8888:8888
--- a/docs/modules/agents.rst
+++ b/docs/modules/agents.rst
@ -2,7 +2,7 @@ Agents
 ==========================
 Some applications will require not just a predetermined chain of calls to LLMs/other tools,
-but potentially an unknown chain that depends on the user input.
+but potentially an unknown chain that depends on the user's input.
 In these types of chains, there is an “agent” which has access to a suite of tools.
 Depending on the user input, the agent can then decide which, if any, of these tools to call.
@ -12,7 +12,7 @@ The following sections of documentation are provided:
 - `Key Concepts <./agents/key_concepts.html>`_: A conceptual guide going over the various concepts related to agents.
- `How-To Guides <./agents/how_to_guides.html>`_: A collection of how-to guides. These highlight how to integrate various types of tools, how to work with different types of agent, and how to customize agents.
+- `How-To Guides <./agents/how_to_guides.html>`_: A collection of how-to guides. These highlight how to integrate various types of tools, how to work with different types of agents, and how to customize agents.
 - `Reference <../reference/modules/agents.html>`_: API reference documentation for all Agent classes.
--- a/docs/modules/agents/agents.md
+++ b/docs/modules/agents/agents.md
@ -1,7 +1,7 @@
 # Agents
 Agents use an LLM to determine which actions to take and in what order.
-An action can either be using a tool and observing its output, or returning to the user.
+An action can either be using a tool and observing its output, or returning a response to the user.
 For a list of easily loadable tools, see [here](tools.md).
 Here are the agents available in LangChain.
--- a/docs/modules/chains.rst
+++ b/docs/modules/chains.rst
@ -3,7 +3,7 @@ Chains
 Using an LLM in isolation is fine for some simple applications,
 but many more complex ones require chaining LLMs - either with each other or with other experts.
-LangChain provides a standard interface for Chains, as well as some common implementations of chains for easy use.
+LangChain provides a standard interface for Chains, as well as some common implementations of chains for ease of use.
 The following sections of documentation are provided:
--- a/docs/modules/chains/getting_started.ipynb
+++ b/docs/modules/chains/getting_started.ipynb
@ -9,13 +9,13 @@
    "In this tutorial, we will learn about creating simple chains in LangChain. We will learn how to create a chain, add components to it, and run it.\n",
    "\n",
    "In this tutorial, we will cover:\n",
-    "- Using the simple LLM chain\n",
+    "- Using a simple LLM chain\n",
    "- Creating sequential chains\n",
    "- Creating a custom chain\n",
    "\n",
    "## Why do we need chains?\n",
    "\n",
-    "Chains allow us to combine multiple components together to create a single, coherent application. For example, we can create a chain that takes user input, format it with a PromptTemplate, and then passes the formatted response to an LLM. We can build more complex chains by combining multiple chains together, or by combining chains with other components.\n"
+    "Chains allow us to combine multiple components together to create a single, coherent application. For example, we can create a chain that takes user input, formats it with a PromptTemplate, and then passes the formatted response to an LLM. We can build more complex chains by combining multiple chains together, or by combining chains with other components.\n"
   ]
  },
  {
@ -88,7 +88,7 @@
   "source": [
    "## Combine chains with the `SequentialChain`\n",
    "\n",
-    "The next step after calling a language model is make a series of calls to a language model. We can do this using sequential chains, which are chains that execute their links in a predefined order. Specifically, we will use the `SimpleSequentialChain`. This is the simplest form of sequential chains, where each step has a singular input/output, and the output of one step is the input to the next.\n",
+    "The next step after calling a language model is to make a series of calls to a language model. We can do this using sequential chains, which are chains that execute their links in a predefined order. Specifically, we will use the `SimpleSequentialChain`. This is the simplest type of a sequential chain, where each step has a single input/output, and the output of one step is the input to the next.\n",
    "\n",
    "In this tutorial, our sequential chain will:\n",
    "1. First, create a company name for a product. We will reuse the `LLMChain` we'd previously initialized to create this company name.\n",
@ -156,7 +156,7 @@
   "source": [
    "## Create a custom chain with the `Chain` class\n",
    "\n",
-    "LangChain provides many chains out of the box, but sometimes you may want to create a custom chains for your specific use case. For this example, we will create a custom chain that concatenates the outputs of 2 `LLMChain`s.\n",
+    "LangChain provides many chains out of the box, but sometimes you may want to create a custom chain for your specific use case. For this example, we will create a custom chain that concatenates the outputs of 2 `LLMChain`s.\n",
    "\n",
    "In order to create a custom chain:\n",
    "1. Start by subclassing the `Chain` class,\n",
--- a/docs/modules/document_loaders/examples/ifixit.ipynb
+++ b/docs/modules/document_loaders/examples/ifixit.ipynb
--- a/docs/modules/document_loaders/examples/image.ipynb
+++ b/docs/modules/document_loaders/examples/image.ipynb
@ -0,0 +1,145 @@
 {
 "cells": [
  {
   "cell_type": "markdown",
   "id": "f70e6118",
   "metadata": {},
   "source": [
    "# Images\n",
    "\n",
     "This covers how to load images such as JPGs and PNGs into a document format that we can use downstream."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "09d64998",
   "metadata": {},
   "source": [
    "## Using Unstructured"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "0cc0cd42",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders.image import UnstructuredImageLoader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "082d557c",
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = UnstructuredImageLoader(\"layout-parser-paper-fast.jpg\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "df11c953",
   "metadata": {},
   "outputs": [],
   "source": [
    "data = loader.load()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "4284d44c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Document(page_content=\"LayoutParser: A Unified Toolkit for Deep\\nLearning Based Document Image Analysis\\n\\n\\n‘Zxjiang Shen' (F3}, Ruochen Zhang”, Melissa Dell*, Benjamin Charles Germain\\nLeet, Jacob Carlson, and Weining LiF\\n\\n\\nsugehen\\n\\nshangthrows, et\\n\\n“Abstract. Recent advanocs in document image analysis (DIA) have been\\n‘pimarliy driven bythe application of neural networks dell roar\\n{uteomer could be aly deployed in production and extended fo farther\\n[nvetigtion. However, various factory ke lcely organize codebanee\\nsnd sophisticated modal cnigurations compat the ey ree of\\n‘erin! innovation by wide sence, Though there have been sng\\n‘Hors to improve reuablty and simplify deep lees (DL) mode\\n‘aon, sone of them ae optimized for challenge inthe demain of DIA,\\nThis roprscte a major gap in the extng fol, sw DIA i eal to\\nscademic research acon wie range of dpi in the social ssencee\\n[rary for streamlining the sage of DL in DIA research and appicn\\n‘tons The core LayoutFaraer brary comes with a sch of simple and\\nIntative interfaee or applying and eutomiing DI. odel fr Inyo de\\npltfom for sharing both protrined modes an fal document dist\\n{ation pipeline We demonutate that LayootPareer shea fr both\\nlightweight and lrgeseledgtieation pipelines in eal-word uae ces\\nThe leary pblely smal at Btspe://layost-pareergsthab So\\n\\n\\n\\n‘Keywords: Document Image Analysis» Deep Learning Layout Analysis\\n‘Character Renguition - Open Serres dary « Tol\\n\\n\\nIntroduction\\n\\n\\n‘Deep Learning(DL)-based approaches are the state-of-the-art for a wide range of\\ndoctiment image analysis (DIA) tea including document image clasiffeation [I]\\n\", lookup_str='', metadata={'source': 'layout-parser-paper-fast.jpg'}, lookup_index=0)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "09957371",
   "metadata": {},
   "source": [
    "### Retain Elements\n",
    "\n",
    "Under the hood, Unstructured creates different \"elements\" for different chunks of text. By default we combine those together, but you can easily keep that separation by specifying `mode=\"elements\"`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "0fab833b",
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = UnstructuredImageLoader(\"layout-parser-paper-fast.jpg\", mode=\"elements\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "c3e8ff1b",
   "metadata": {},
   "outputs": [],
   "source": [
    "data = loader.load()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "43c23d2d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Document(page_content='LayoutParser: A Unified Toolkit for Deep\\nLearning Based Document Image Analysis\\n', lookup_str='', metadata={'source': 'layout-parser-paper-fast.jpg', 'filename': 'layout-parser-paper-fast.jpg', 'page_number': 1, 'category': 'Title'}, lookup_index=0)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[0]"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
 }
--- a/docs/modules/document_loaders/how_to_guides.rst
+++ b/docs/modules/document_loaders/how_to_guides.rst
@ -59,6 +59,8 @@ There are a lot of different document loaders that LangChain supports. Below are
 `CoNLL-U <./examples/CoNLL-U.html>`_: A walkthrough of how to load data from a ConLL-U file.
 `iFixit <./examples/ifixit.html>`_: A walkthrough of how to search and load data like guides, technical Q&A's, and device wikis from iFixit.com
 .. toctree::
   :maxdepth: 1
   :glob:
--- a/docs/modules/prompts/examples/partial.ipynb
+++ b/docs/modules/prompts/examples/partial.ipynb
@ -0,0 +1,184 @@
 {
 "cells": [
  {
   "cell_type": "markdown",
   "id": "9355a547",
   "metadata": {},
   "source": [
    "# Partial Prompt Templates\n",
    "\n",
     "A prompt template is a class with a `.format` method which takes in a key-value map and returns a string (a prompt) to pass to the language model. Like other methods, it can make sense to \"partial\" a prompt template - e.g. pass in a subset of the required values, so as to create a new prompt template which expects only the remaining subset of values.\n",
    "\n",
    "LangChain supports this in two ways: we allow for partially formatted prompts (1) with string values, (2) with functions that return string values. These two different ways support different use cases. In the documentation below we go over the motivations for both use cases as well as how to do it in LangChain.\n",
    "\n",
    "## Partial With Strings\n",
    "\n",
     "One common use case for wanting to partial a prompt template is if you get some of the variables before others. For example, suppose you have a prompt template that requires two variables, `foo` and `bar`. If you get the `foo` value early on in the chain, but the `bar` value later, it can be annoying to wait until you have both variables in the same place to pass them to the prompt template. Instead, you can partial the prompt template with the `foo` value, and then pass the partialed prompt template along and just use that. Below is an example of doing this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "643af5da",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.prompts import PromptTemplate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "4080d8d7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "foobaz\n"
     ]
    }
   ],
   "source": [
    "prompt = PromptTemplate(template=\"{foo}{bar}\", input_variables=[\"foo\", \"bar\"])\n",
    "partial_prompt = prompt.partial(foo=\"foo\");\n",
    "print(partial_prompt.format(bar=\"baz\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9986766e",
   "metadata": {},
   "source": [
    "You can also just initialize the prompt with the partialed variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "e2ce95b3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "foobaz\n"
     ]
    }
   ],
   "source": [
    "prompt = PromptTemplate(template=\"{foo}{bar}\", input_variables=[\"bar\"], partial_variables={\"foo\": \"foo\"})\n",
    "print(prompt.format(bar=\"baz\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a9c66f83",
   "metadata": {},
   "source": [
    "## Partial With Functions\n",
    "\n",
    "The other common use is to partial with a function. The use case for this is when you have a variable you know that you always want to fetch in a common way. A prime example of this is with date or time. Imagine you have a prompt which you always want to have the current date. You can't hard code it in the prompt, and passing it along with the other input variables is a bit annoying. In this case, it's very handy to be able to partial the prompt with a function that always returns the current date."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "d0712d8a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from datetime import datetime\n",
    "\n",
    "def _get_datetime():\n",
    "    now = datetime.now()\n",
    "    return now.strftime(\"%m/%d/%Y, %H:%M:%S\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "4cbcb666",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Tell me a funny joke about the day 02/27/2023, 22:15:16\n"
     ]
    }
   ],
   "source": [
    "prompt = PromptTemplate(\n",
    "    template=\"Tell me a {adjective} joke about the day {date}\", \n",
    "    input_variables=[\"adjective\", \"date\"]\n",
    ");\n",
    "partial_prompt = prompt.partial(date=_get_datetime)\n",
    "print(partial_prompt.format(adjective=\"funny\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ffed6811",
   "metadata": {},
   "source": [
    "You can also just initialize the prompt with the partialed variables, which often makes more sense in this workflow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "96285b25",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Tell me a funny joke about the day 02/27/2023, 22:15:16\n"
     ]
    }
   ],
   "source": [
    "prompt = PromptTemplate(\n",
    "    template=\"Tell me a {adjective} joke about the day {date}\", \n",
    "    input_variables=[\"adjective\"],\n",
    "    partial_variables={\"date\": _get_datetime}\n",
    ");\n",
    "print(prompt.format(adjective=\"funny\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4bff16f7",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
 }
--- a/docs/modules/prompts/how_to_guides.rst
+++ b/docs/modules/prompts/how_to_guides.rst
@ -17,6 +17,8 @@ The user guide here shows more advanced workflows and how to use the library in
 `Few Shot Prompt Examples <./examples/few_shot_examples.html>`_: Examples of Few Shot Prompt Templates.
 `Partial Prompt Template <./examples/partial.html>`_: How to partial Prompt Templates.
 .. toctree::
--- a/docs/modules/utils/examples/docker.ipynb
+++ b/docs/modules/utils/examples/docker.ipynb
@ -0,0 +1,180 @@
 {
  "cells": [
    {
      "cell_type": "code",
      "metadata": {
        "jukit_cell_id": "O4HPx3boF0"
      },
      "source": [],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "jukit_cell_id": "hqQkbPEwTJ"
      },
      "source": [
        "# Using the DockerWrapper utility"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "jukit_cell_id": "vCepuypaFH"
      },
      "source": [
        "from langchain.utilities.docker import DockerWrapper"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "code",
      "metadata": {
        "jukit_cell_id": "BtYVqy2YtO"
      },
      "source": [
        "d = DockerWrapper(image='shell')"
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "code",
      "metadata": {
        "jukit_cell_id": "ELWWm03ptQ"
      },
      "source": [
        "query = \"\"\"\n",
        "for i in $(seq 1 10)\n",
        "do\n",
        "    echo $i\n",
        "done\n",
        "\"\"\"\n",
        "print(d.exec_run(query))"
      ],
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": "1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n"
        }
      ],
      "execution_count": 1
    },
    {
      "cell_type": "code",
      "metadata": {
        "jukit_cell_id": "lGMqLz5sDo"
      },
      "source": [
        "p = DockerWrapper(image='python')\n",
        "\n",
        "py_payload = \"\"\"\n",
        "def hello_world():\n",
        "    return 'hello world'\n",
        "\n",
        "hello_world()\n",
        "\"\"\""
      ],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "code",
      "metadata": {
        "jukit_cell_id": "X04Wd6zbrk"
      },
      "source": [
        "print(p.exec_run(py_payload))"
      ],
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": "'hello world'\n"
        }
      ],
      "execution_count": 2
    },
    {
      "cell_type": "code",
      "metadata": {
        "jukit_cell_id": "lKOfuDoJGk"
      },
      "source": [],
      "outputs": [],
      "execution_count": null
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "jukit_cell_id": "eSzXtDrpqU"
      },
      "source": [
        "## Passing custom parameters\n",
        "\n",
        "By default, containers run with a safe set of parameters. You can pass any parameter\n",
        "accepted by the Docker Python SDK to the run and exec commands.\n",
        "\n",
        "### Using networking"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "jukit_cell_id": "eWFGCxD9pv"
      },
      "source": [
        "# by default containers don't have access to the network\n",
        "print(d.run('ping -c 1 google.com'))"
      ],
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": "STDERR: Command '/bin/sh -c 'ping -c 1 google.com'' in image 'alpine:latest' returned non-zero exit status 1: b\"ping: bad address 'google.com'\\n\"\n"
        }
      ],
      "execution_count": 3
    },
    {
      "cell_type": "code",
      "metadata": {
        "jukit_cell_id": "Z0YkpuXVyL"
      },
      "source": [
        "# using the network parameter\n",
        "print(d.run('ping -c 1 google.com', network='bridge'))"
      ],
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": "PING google.com (142.250.200.110): 56 data bytes\n64 bytes from 142.250.200.110: seq=0 ttl=42 time=13.695 ms\n\n--- google.com ping statistics ---\n1 packets transmitted, 1 packets received, 0% packet loss\nround-trip min/avg/max = 13.695/13.695/13.695 ms\n"
        }
      ],
      "execution_count": 4
    },
    {
      "cell_type": "code",
      "metadata": {
        "jukit_cell_id": "3rMWzzuLHq"
      },
      "source": [],
      "outputs": [],
      "execution_count": null
    }
  ],
  "metadata": {
    "anaconda-cloud": {},
    "kernelspec": {
      "display_name": "python",
      "language": "python",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
 }
--- a/langchain/document_loaders/init.py
+++ b/langchain/document_loaders/init.py
@ -16,6 +16,8 @@ from langchain.document_loaders.googledrive import GoogleDriveLoader
 from langchain.document_loaders.gutenberg import GutenbergLoader
 from langchain.document_loaders.hn import HNLoader
 from langchain.document_loaders.html import UnstructuredHTMLLoader
 from langchain.document_loaders.ifixit import IFixitLoader
 from langchain.document_loaders.image import UnstructuredImageLoader
 from langchain.document_loaders.imsdb import IMSDbLoader
 from langchain.document_loaders.notebook import NotebookLoader
 from langchain.document_loaders.notion import NotionDirectoryLoader
@ -52,6 +54,7 @@ __all__ = [
    "UnstructuredPowerPointLoader",
    "UnstructuredWordDocumentLoader",
    "UnstructuredPDFLoader",
    "UnstructuredImageLoader",
    "ObsidianLoader",
    "UnstructuredDocxLoader",
    "UnstructuredEmailLoader",
@ -68,6 +71,7 @@ __all__ = [
    "IMSDbLoader",
    "AZLyricsLoader",
    "CollegeConfidentialLoader",
    "IFixitLoader",
    "GutenbergLoader",
    "PagedPDFSplitter",
    "EverNoteLoader",
--- a/langchain/document_loaders/ifixit.py
+++ b/langchain/document_loaders/ifixit.py
@ -0,0 +1,202 @@
 """Loader that loads iFixit data."""
 from typing import List, Optional
 import requests
 from langchain.docstore.document import Document
 from langchain.document_loaders.base import BaseLoader
 from langchain.document_loaders.web_base import WebBaseLoader
 IFIXIT_BASE_URL = "https://www.ifixit.com/api/2.0"
 class IFixitLoader(BaseLoader):
    """Load iFixit repair guides, device wikis and answers.
    iFixit is the largest, open repair community on the web. The site contains nearly
    100k repair manuals, 200k Questions & Answers on 42k devices, and all the data is
    licensed under CC-BY.
    This loader will allow you to download the text of a repair guide, text of Q&A's
    and wikis from devices on iFixit using their open APIs and web scraping.
    """
    def __init__(self, web_path: str):
        """Initialize with web path."""
        if not web_path.startswith("https://www.ifixit.com"):
            raise ValueError("web path must start with 'https://www.ifixit.com'")
        path = web_path.replace("https://www.ifixit.com", "")
        allowed_paths = ["/Device", "/Guide", "/Answers", "/Teardown"]
        # TODO: Add /Wiki
        if not any(path.startswith(allowed_path) for allowed_path in allowed_paths):
            raise ValueError(
                "web path must start with /Device, /Guide, /Teardown or /Answers"
            )
        pieces = [x for x in path.split("/") if x]
        # Teardowns are just guides by a different name.
        self.page_type = pieces[0] if pieces[0] != "Teardown" else "Guide"
        if self.page_type == "Guide" or self.page_type == "Answers":
            self.id = pieces[2]
        else:
            self.id = pieces[1]
        self.web_path = web_path
    def load(self) -> List[Document]:
        if self.page_type == "Device":
            return self.load_device()
        elif self.page_type == "Guide" or self.page_type == "Teardown":
            return self.load_guide()
        elif self.page_type == "Answers":
            return self.load_questions_and_answers()
        else:
            raise ValueError("Unknown page type: " + self.page_type)
    @staticmethod
    def load_suggestions(query: str = "", doc_type: str = "all") -> List[Document]:
        res = requests.get(
            IFIXIT_BASE_URL + "/suggest/" + query + "?doctypes=" + doc_type
        )
        if res.status_code != 200:
            raise ValueError(
                'Could not load suggestions for "' + query + '"\n' + res.text
            )
        data = res.json()
        results = data["results"]
        output = []
        for result in results:
            try:
                loader = IFixitLoader(result["url"])
                if loader.page_type == "Device":
                    output += loader.load_device(include_guides=False)
                else:
                    output += loader.load()
            except ValueError:
                continue
        return output
    def load_questions_and_answers(
        self, url_override: Optional[str] = None
    ) -> List[Document]:
        loader = WebBaseLoader(self.web_path if url_override is None else url_override)
        soup = loader.scrape()
        output = []
        title = soup.find("h1", "post-title").text
        output.append("# " + title)
        output.append(soup.select_one(".post-content .post-text").text.strip())
        output.append("\n## " + soup.find("div", "post-answers-header").text.strip())
        for answer in soup.select(".js-answers-list .post.post-answer"):
            if answer.has_attr("itemprop") and "acceptedAnswer" in answer["itemprop"]:
                output.append("\n### Accepted Answer")
            elif "post-helpful" in answer["class"]:
                output.append("\n### Most Helpful Answer")
            else:
                output.append("\n### Other Answer")
            output += [
                a.text.strip() for a in answer.select(".post-content .post-text")
            ]
            output.append("\n")
        text = "\n".join(output).strip()
        metadata = {"source": self.web_path, "title": title}
        return [Document(page_content=text, metadata=metadata)]
    def load_device(
        self, url_override: Optional[str] = None, include_guides: bool = True
    ) -> List[Document]:
        documents = []
        if url_override is None:
            url = IFIXIT_BASE_URL + "/wikis/CATEGORY/" + self.id
        else:
            url = url_override
        res = requests.get(url)
        data = res.json()
        text = "\n".join(
            [
                data[key]
                for key in ["title", "description", "contents_raw"]
                if key in data
            ]
        ).strip()
        metadata = {"source": self.web_path, "title": data["title"]}
        documents.append(Document(page_content=text, metadata=metadata))
        if include_guides:
            # Load and return documents for each guide linked to from the device.
            guide_urls = [guide["url"] for guide in data["guides"]]
            for guide_url in guide_urls:
                documents.append(IFixitLoader(guide_url).load()[0])
        return documents
    def load_guide(self, url_override: Optional[str] = None) -> List[Document]:
        if url_override is None:
            url = IFIXIT_BASE_URL + "/guides/" + self.id
        else:
            url = url_override
        res = requests.get(url)
        if res.status_code != 200:
            raise ValueError(
                "Could not load guide: " + self.web_path + "\n" + res.text
            )
        data = res.json()
        doc_parts = ["# " + data["title"], data["introduction_raw"]]
        doc_parts.append("\n\n###Tools Required:")
        if len(data["tools"]) == 0:
            doc_parts.append("\n - None")
        else:
            for tool in data["tools"]:
                doc_parts.append("\n - " + tool["text"])
        doc_parts.append("\n\n###Parts Required:")
        if len(data["parts"]) == 0:
            doc_parts.append("\n - None")
        else:
            for part in data["parts"]:
                doc_parts.append("\n - " + part["text"])
        for row in data["steps"]:
            doc_parts.append(
                "\n\n## "
                + (
                    row["title"]
                    if row["title"] != ""
                    else "Step {}".format(row["orderby"])
                )
            )
            for line in row["lines"]:
                doc_parts.append(line["text_raw"])
        doc_parts.append(data["conclusion_raw"])
        text = "\n".join(doc_parts)
        metadata = {"source": self.web_path, "title": data["title"]}
        return [Document(page_content=text, metadata=metadata)]
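
The URL handling in `IFixitLoader.__init__` above can be sketched as a standalone function. This is a minimal illustration, not the loader itself: `parse_ifixit_path` is a hypothetical name, the example URLs are made up, and the check on the first path segment is stricter than the original prefix match.

```python
from typing import Tuple

IFIXIT_PREFIX = "https://www.ifixit.com"
ALLOWED_TYPES = {"Device", "Guide", "Answers", "Teardown"}


def parse_ifixit_path(web_path: str) -> Tuple[str, str]:
    """Return (page_type, id) for an iFixit URL (hypothetical helper)."""
    if not web_path.startswith(IFIXIT_PREFIX):
        raise ValueError("web path must start with 'https://www.ifixit.com'")
    pieces = [p for p in web_path[len(IFIXIT_PREFIX):].split("/") if p]
    if not pieces or pieces[0] not in ALLOWED_TYPES:
        raise ValueError(
            "web path must start with /Device, /Guide, /Teardown or /Answers"
        )
    # Teardowns are treated as guides, as in the loader.
    page_type = "Guide" if pieces[0] == "Teardown" else pieces[0]
    # Guides and answers carry a title segment before the id; devices do not.
    index = 2 if page_type in ("Guide", "Answers") else 1
    if len(pieces) <= index:
        raise ValueError("web path is missing an id segment")
    return page_type, pieces[index]


# Made-up URLs, used only to exercise the function.
guide = parse_ifixit_path("https://www.ifixit.com/Guide/Example+Title/12345")
device = parse_ifixit_path("https://www.ifixit.com/Device/Example_Device")
teardown = parse_ifixit_path("https://www.ifixit.com/Teardown/Example/678")
```

Unlike the constructor above, this sketch raises a `ValueError` instead of an `IndexError` when the id segment is missing.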
--- a/langchain/document_loaders/image.py
+++ b/langchain/document_loaders/image.py
@ -0,0 +1,13 @@
 """Loader that loads image files."""
 from typing import List
 from langchain.document_loaders.unstructured import UnstructuredFileLoader
 class UnstructuredImageLoader(UnstructuredFileLoader):
    """Loader that uses unstructured to load image files, such as PNGs and JPGs."""
    def _get_elements(self) -> List:
        from unstructured.partition.image import partition_image
        return partition_image(filename=self.file_path)
--- a/langchain/prompts/base.py
+++ b/langchain/prompts/base.py
@ -1,12 +1,14 @@
 """BasePrompt schema definition."""
 from __future__ import annotations
 import json
 import re
 from abc import ABC, abstractmethod
 from pathlib import Path
-from typing import Any, Callable, Dict, List, Optional, Union
+from typing import Any, Callable, Dict, List, Mapping, Optional, Union
 import yaml
-from pydantic import BaseModel, Extra, root_validator
+from pydantic import BaseModel, Extra, Field, root_validator
 from langchain.formatting import formatter
@ -117,6 +119,9 @@ class BasePromptTemplate(BaseModel, ABC):
    """A list of the names of the variables the prompt template expects."""
    output_parser: Optional[BaseOutputParser] = None
    """How to parse the output of calling an LLM on this formatted prompt."""
    partial_variables: Mapping[str, Union[str, Callable[[], str]]] = Field(
        default_factory=dict
    )
    class Config:
        """Configuration for this pydantic object."""
@ -132,8 +137,38 @@ class BasePromptTemplate(BaseModel, ABC):
                "Cannot have an input variable named 'stop', as it is used internally,"
                " please rename."
            )
        if "stop" in values["partial_variables"]:
            raise ValueError(
                "Cannot have a partial variable named 'stop', as it is used "
                "internally, please rename."
            )
        overall = set(values["input_variables"]).intersection(
            values["partial_variables"]
        )
        if overall:
            raise ValueError(
                f"Found overlapping input and partial variables: {overall}"
            )
        return values
    def partial(self, **kwargs: Union[str, Callable[[], str]]) -> BasePromptTemplate:
        """Return a partial of the prompt template."""
        prompt_dict = self.__dict__.copy()
        prompt_dict["input_variables"] = list(
            set(self.input_variables).difference(kwargs)
        )
        prompt_dict["partial_variables"] = {**self.partial_variables, **kwargs}
        return type(self)(**prompt_dict)
    def _merge_partial_and_user_variables(self, **kwargs: Any) -> Dict[str, Any]:
        # Get partial params:
        partial_kwargs = {
            k: v if isinstance(v, str) else v()
            for k, v in self.partial_variables.items()
        }
        return {**partial_kwargs, **kwargs}
    @abstractmethod
    def format(self, **kwargs: Any) -> str:
        """Format the prompt with the inputs.
@ -173,6 +208,8 @@ class BasePromptTemplate(BaseModel, ABC):
            prompt.save(file_path="path/prompt.yaml")
        """
        if self.partial_variables:
            raise ValueError("Cannot save prompt with partial variables.")
        # Convert file to Path object.
        if isinstance(file_path, str):
            save_path = Path(file_path)
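
The merge performed by `_merge_partial_and_user_variables` can be illustrated outside the class. The sketch below is a simplified stand-in under stated assumptions: the free function `merge_partial_and_user_variables` and the sample values are hypothetical, and plain `str.format` replaces the library's formatter.

```python
from typing import Any, Callable, Dict, Mapping, Union

PartialValue = Union[str, Callable[[], str]]


def merge_partial_and_user_variables(
    partial_variables: Mapping[str, PartialValue], **kwargs: Any
) -> Dict[str, Any]:
    """Resolve partial values, then let user-supplied kwargs override them."""
    resolved = {
        # Strings are used as-is; callables are invoked at format time.
        name: value if isinstance(value, str) else value()
        for name, value in partial_variables.items()
    }
    return {**resolved, **kwargs}


merged = merge_partial_and_user_variables(
    {"date": lambda: "02/27/2023"}, adjective="funny"
)
text = "Tell me a {adjective} joke about the day {date}".format(**merged)
```

Because callables are resolved on every merge, a function such as `_get_datetime` yields a fresh value each time the prompt is formatted.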
--- a/langchain/prompts/few_shot.py
+++ b/langchain/prompts/few_shot.py
@ -68,7 +68,7 @@ class FewShotPromptTemplate(BasePromptTemplate, BaseModel):
            check_valid_template(
                values["prefix"] + values["suffix"],
                values["template_format"],
-                values["input_variables"],
+                values["input_variables"] + list(values["partial_variables"]),
            )
        return values
@ -101,6 +101,7 @@ class FewShotPromptTemplate(BasePromptTemplate, BaseModel):
            prompt.format(variable1="foo")
        """
        kwargs = self._merge_partial_and_user_variables(**kwargs)
        # Get the examples to use.
        examples = self._get_examples(**kwargs)
        # Format the examples.
@ -110,6 +111,7 @@ class FewShotPromptTemplate(BasePromptTemplate, BaseModel):
        # Create the overall template.
        pieces = [self.prefix, *example_strings, self.suffix]
        template = self.example_separator.join([piece for piece in pieces if piece])
        # Format the template with the input variables.
        return DEFAULT_FORMATTER_MAPPING[self.template_format](template, **kwargs)
--- a/langchain/prompts/few_shot_with_templates.py
+++ b/langchain/prompts/few_shot_with_templates.py
@ -60,15 +60,17 @@ class FewShotPromptWithTemplates(BasePromptTemplate, BaseModel):
    @root_validator()
    def template_is_valid(cls, values: Dict) -> Dict:
        """Check that prefix, suffix and input variables are consistent."""
        if values["validate_template"]:
            input_variables = values["input_variables"]
            expected_input_variables = set(values["suffix"].input_variables)
            expected_input_variables |= set(values["partial_variables"])
            if values["prefix"] is not None:
                expected_input_variables |= set(values["prefix"].input_variables)
            missing_vars = expected_input_variables.difference(input_variables)
            if missing_vars:
                raise ValueError(
-                f"Got input_variables={input_variables}, but based on prefix/suffix "
+                    f"Got input_variables={input_variables}, but based on "
-                f"expected {expected_input_variables}"
+                    f"prefix/suffix expected {expected_input_variables}"
                )
        return values
@ -101,6 +103,7 @@ class FewShotPromptWithTemplates(BasePromptTemplate, BaseModel):
            prompt.format(variable1="foo")
        """
        kwargs = self._merge_partial_and_user_variables(**kwargs)
        # Get the examples to use.
        examples = self._get_examples(**kwargs)
        # Format the examples.
--- a/langchain/prompts/prompt.py
+++ b/langchain/prompts/prompt.py
@ -60,14 +60,16 @@ class PromptTemplate(BasePromptTemplate, BaseModel):
            prompt.format(variable1="foo")
        """
        kwargs = self._merge_partial_and_user_variables(**kwargs)
        return DEFAULT_FORMATTER_MAPPING[self.template_format](self.template, **kwargs)
    @root_validator()
    def template_is_valid(cls, values: Dict) -> Dict:
        """Check that template and input variables are consistent."""
        if values["validate_template"]:
            all_inputs = values["input_variables"] + list(values["partial_variables"])
            check_valid_template(
-                values["template"], values["template_format"], values["input_variables"]
+                values["template"], values["template_format"], all_inputs
            )
        return values
--- a/langchain/utilities/docker/init.py
+++ b/langchain/utilities/docker/init.py
@ -0,0 +1,42 @@
 """Wrapper for untrusted code execution on docker."""
 # TODO:  pass payload to container via filesystem
 # TEST:  more tests for attach to running container
 # TODO:  embed file payloads in the call to run (in LLMChain)?
 # TODO:  [doc] image selection helper
 # TODO:  LLMChain decorator ?
 import docker
 from typing import Any
 import logging
 logger = logging.getLogger(__name__)
 GVISOR_WARNING = """Warning: gVisor runtime not available for {docker_host}.
 Running untrusted code in a container without gVisor is not recommended. Docker
 containers are not isolated. They can be abused to gain access to the host
 system. To mitigate this risk, gVisor can be used to run the container in a
 sandboxed environment. see: https://gvisor.dev/ for more info.
 """
 def gvisor_runtime_available(client: Any) -> bool:
    """Verify if gVisor runtime is available."""
    logger.debug("verifying availability of gVisor runtime...")
    info = client.info()
    if 'Runtimes' in info:
        return 'runsc' in info['Runtimes']
    return False
 def _check_gvisor_runtime():
    client = docker.from_env()
    docker_host = client.api.base_url
    if not gvisor_runtime_available(client):
        logger.warning(GVISOR_WARNING.format(docker_host=docker_host))
 _check_gvisor_runtime()
 from .tool import DockerWrapper
--- a/langchain/utilities/docker/images.py
+++ b/langchain/utilities/docker/images.py
@ -0,0 +1,103 @@
 """This module defines template images and helpers for common docker images."""
 from enum import Enum
 from typing import Optional, List, Type, Union
 from pydantic import BaseModel, Extra, validator
 class BaseImage(BaseModel, extra=Extra.forbid):
    """Base docker image template class."""
    tty: bool = False
    stdin_open: bool = True
    name: str
    tag: Optional[str] = 'latest'
    default_command: Optional[List[str]] = None
    stdin_command: Optional[List[str]] = None
    network: str = 'none'
    def dict(self, *args, **kwargs):
        """Override the dict method to add the image name."""
        d = super().dict(*args, **kwargs)
        del d['name']
        del d['tag']
        d['image'] = self.image_name
        return d
    @property
    def image_name(self) -> str:
        """Image name."""
        return f'{self.name}:{self.tag}'
 class ShellTypes(str, Enum):
    """Enum class for shell types."""
    bash = '/bin/bash'
    sh = '/bin/sh'
    zsh = '/bin/zsh'
 class Shell(BaseImage):
    """Shell image focused on running shell commands.
    A shell image can be created by passing a shell alias such as `sh` or `bash`
    or by passing the full path to the shell binary.
    """
    name: str = 'alpine'
    default_command: List[str] = [ShellTypes.sh.value, '-c']
    stdin_command: List[str] = [ShellTypes.sh.value, '-i']
    @validator('default_command')
    def validate_default_command(cls, value: str) -> str:
        """Validate shell type."""
        val = getattr(ShellTypes, value, None)
        if val:
            return val.value
        return value
    @validator('stdin_command')
    def validate_stdin_command(cls, value: str) -> str:
        """Validate shell type."""
        val = getattr(ShellTypes, value, None)
        if val:
            return val.value
        return value
 # example using base image to construct python image
 class Python(BaseImage):
    """Python image class.
        The python image needs to be launched using the `python3 -i` command to keep
        stdin open.
    """
    name: str = 'python'
    default_command: List[str] = ['python3', '-c']
    stdin_command: List[str] = ['python3', '-iq']
 def get_image_template(image_name: str = 'shell') -> Union[str, Type[BaseImage]]:
    """Helper to get an image template from a string.
    It tries to find a class with the same name as the image name and returns the
    class. If no class is found, it returns the image name.
    .. code-block:: python
            >>> image = get_image_template('python')
            >>> assert image is Python
    """
    import importlib
    import inspect
    classes = inspect.getmembers(importlib.import_module(__name__),
                                 lambda x: inspect.isclass(x) and x.__name__ == image_name.capitalize()
                                 )
    if classes:
        cls = classes[0][1]
        return cls
    else:
        return image_name
--- a/langchain/utilities/docker/socket_io.py
+++ b/langchain/utilities/docker/socket_io.py
@ -0,0 +1,110 @@
 """Low level socket IO for docker API."""
 import struct
 import logging
 from typing import Any
 logger = logging.getLogger(__name__)
 SOCK_BUF_SIZE = 1024
 class DockerSocket:
    """Wrapper around docker API's socket object. Can be used as a context manager."""
    _timeout: int = 5
    def __init__(self, socket, timeout: int = _timeout):
        self.socket = socket
        self.socket._sock.settimeout(timeout)
        # self.socket._sock.setblocking(False)
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
    def close(self):
        logger.debug("closing socket...")
        self.socket._sock.shutdown(2) # 2 = SHUT_RDWR
        self.socket._sock.close()
        self.socket.close()
    def sendall(self, data: bytes) -> None:
        self.socket._sock.sendall(data)
    def setblocking(self, flag: bool) -> None:
        self.socket._sock.setblocking(flag)
    def recv(self) -> Any:
        """Wrapper for socket.recv that does a buffered read."""
        # NOTE: this is optional as a bonus
        # TODO: Recv with TTY enabled
        #
        # When the TTY setting is enabled in POST /containers/create, the stream
        # is not multiplexed. The data exchanged over the hijacked connection is
        # simply the raw data from the process PTY and client's stdin.
        #
        # When the TTY setting is disabled, the stream is multiplexed and every
        # frame starts with an 8-byte header:
        # header := [8]byte{STREAM_TYPE, 0, 0, 0, SIZE1, SIZE2, SIZE3, SIZE4}
        # STREAM_TYPE can be:
        #
        # 0: stdin (is written on stdout)
        # 1: stdout
        # 2: stderr
        # SIZE1, SIZE2, SIZE3, SIZE4 are the four bytes of the uint32 size encoded as
        # big endian.
        #
        # Following the header is the payload, which is the specified number of bytes of
        # STREAM_TYPE.
        #
        # The simplest way to implement this protocol is the following:
        #
        # - Read 8 bytes.
        # - Choose stdout or stderr depending on the first byte.
        # - Extract the frame size from the last four bytes.
        # - Read the extracted size and output it on the correct output.
        # - Goto 1.
        chunks = []
        # try:
        #     self.socket._sock.recv(8)
        # except BlockingIOError as e:
        #     raise ValueError("incomplete read from container output")
        while True:
            header = b''
            try:
                # strip the header
                # the first recv is blocking to wait for the container to start
                header = self.socket._sock.recv(8)
            except BlockingIOError:
                # logger.debug("[header] blocking IO")
                break
            self.socket._sock.setblocking(False)
            if header == b'':
                break
            stream_type, size = struct.unpack("!BxxxI", header)
            payload = b''
            while size:
                chunk = b''
                try:
                    chunk = self.socket._sock.recv(min(size, SOCK_BUF_SIZE))
                except BlockingIOError:
                    # logger.debug("[body] blocking IO")
                    break
                if chunk == b'':
                    raise ValueError("incomplete read from container output")
                payload += chunk
                size -= len(chunk)
            chunks.append((stream_type, payload))
            # try:
            #     msg = self.socket._sock.recv(SOCK_BUF_SIZE)
            #     chunk += msg
            # except BlockingIOError as e:
            #     break
        return chunks
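
The frame format described in the comments of `recv` can be tried without a socket. The following is a minimal sketch that parses an in-memory byte string; `demux_frames` and `make_frame` are hypothetical helpers that are not part of this change, and the sample data is made up.

```python
import struct
from typing import List, Tuple

HEADER_SIZE = 8  # 1 byte stream type, 3 padding bytes, 4-byte big-endian size


def demux_frames(raw: bytes) -> List[Tuple[int, bytes]]:
    """Split a multiplexed byte string into (stream_type, payload) pairs."""
    frames = []
    offset = 0
    while offset + HEADER_SIZE <= len(raw):
        # Same format string as in recv(): type byte, 3 pad bytes, uint32 size.
        stream_type, size = struct.unpack_from("!BxxxI", raw, offset)
        offset += HEADER_SIZE
        payload = raw[offset:offset + size]
        if len(payload) < size:
            raise ValueError("incomplete frame in container output")
        frames.append((stream_type, payload))
        offset += size
    return frames


def make_frame(stream_type: int, payload: bytes) -> bytes:
    """Build one frame; used here only to create sample data."""
    return struct.pack("!BxxxI", stream_type, len(payload)) + payload


# One stdout frame (type 1) followed by one stderr frame (type 2).
sample = make_frame(1, b"hello\n") + make_frame(2, b"oops\n")
frames = demux_frames(sample)
```

Working on a complete buffer sidesteps the blocking and timeout handling that `recv` needs, so the sketch shows only the header parsing.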
--- a/langchain/utilities/docker/tool.py
+++ b/langchain/utilities/docker/tool.py
@ -0,0 +1,449 @@
 # TODO!: use pexpect with containers
 # TODO: add default expect pattern to image template
 # TODO: pass max reads parameters for read trials
 # NOTE: spawning with tty true or not gives slightly different stdout format
 # NOTE: echo=False works when tty is disabled and only stdin is connected
 import shlex
 import os
 import io
 import tarfile
 import time
 import pandas as pd  # type: ignore
 import docker
 import socket
 from typing import Any, Dict, Optional, Union, Type
 from pydantic import BaseModel, Extra, root_validator, Field
 from docker.errors import APIError, ContainerError  # type: ignore
 from .images import Shell, BaseImage, get_image_template
 from . import gvisor_runtime_available
 from .socket_io import DockerSocket
 import logging
 logger = logging.getLogger(__name__)
 _default_params = {
        # the only required parameter to be able to attach.
        'stdin_open': True,
        }
 def _get_command(query: str, **kwargs: Dict) -> str:
    """Build an escaped command from a query string and keyword arguments."""
    cmd = query
    if 'default_command' in kwargs:
        cmd = shlex.join([*kwargs.get('default_command'), query])  # type: ignore
    return cmd
 class DockerWrapper(BaseModel, extra=Extra.allow):
    """Executes arbitrary commands or payloads on containers and returns the output.
    Args:
        image (str | Type[BaseImage]): Docker image to use for execution. The
        image can be a string or a subclass of images.BaseImage.
        default_command (List[str]): Default command to use when creating the container.
    """
    _docker_client: docker.DockerClient = None  # type: ignore
    _params: Dict = Field(default_factory=Shell().dict, skip=True)
    image: Union[str, Type[BaseImage]] = Field(default_factory=Shell, skip=True)
    from_env: Optional[bool] = Field(default=True, skip=True)
    # @property
    # def image_name(self) -> str:
    #     """The image name that will be used when creating a container."""
    #     return self._params.image
    #
    def __init__(self, **kwargs):
        """Initialize docker client."""
        super().__init__(**kwargs)
        if self.from_env:
            self._docker_client = docker.from_env()
            if gvisor_runtime_available(docker.from_env()):
                self._params['runtime'] = 'runsc'
        # if not isinstance(self.image, str) and issubclass(self.image, BaseImage):
        #     self._params = {**self._params, **self.image().dict()}
        #
        # # if the user defined a custom image not pre-registered already we should
        # # not use the custom command
        # elif isinstance(self.image, str):
        #     self._params = {**_default_params(), **{'image': self.image}}
    @property
    def client(self) -> docker.DockerClient:  # type: ignore
        """Docker client."""
        return self._docker_client
    @property
    def info(self) -> Any:
        """Return the output of docker `info`."""
        return self._docker_client.info()
    # @validator("image", pre=True, always=True)
    # def validate_image(cls, value):
    #     if value is None:
    #         raise ValueError("image is required")
    #     if isinstance(value, str) :
    #         image = get_image(value)
    #         if isinstance(image, BaseImage):
    #             return image
    #         else:
    #             #set default params to base ones
    #     if issubclass(value, BaseImage):
    #         return value
    #     else:
    #         raise ValueError("image must be a string or a subclass of images.BaseImage")
    @root_validator()
    def validate_all(cls, values: Dict) -> Dict:
        """Validate environment."""
        image = values.get("image")
        if image is None:
            raise ValueError("image is required")
        if isinstance(image, str):
            # try to get image
            _image = get_image_template(image)
            if isinstance(_image, str):
                # user wants a custom image, we should use default params
                values["_params"] = {**_default_params, **{'image': image}}
            else:
                # user wants a pre registered image, we should use the image params
                values["_params"] = _image().dict()
        # image is a BaseImage instance
        elif isinstance(image, BaseImage):
            values["_params"] = image.dict()
        def field_filter(x):
            fields = cls.__fields__
            if x[0] == '_params':
                return False
            field = fields.get(x[0], None)
            if not field:
                return True
            return not field.field_info.extra.get('skip', False)
        filtered_fields: Dict[Any, Any] = dict(filter(field_filter, values.items()))  # type: ignore
        values["_params"] = {**values["_params"],
                             **filtered_fields}
        return values
    def _clean_kwargs(self, kwargs: dict) -> dict:
        kwargs.pop('default_command', None)
        kwargs.pop('stdin_command', None)
        return kwargs
    #FIX: default shell command should be different in run vs exec mode
    def run(self, query: str, **kwargs: Any) -> str:
        """Run arbitrary shell command inside a container.
        This method will concatenate the registered default command with the provided
        query.
        Args:
            query (str): The command to run.
            **kwargs: Pass extra parameters to DockerClient.containers.run.
        """
        kwargs = {**self._params, **kwargs}
        args = {
                'image': self._params.get('image'),
                'command': query,
                }
        del kwargs['image']
        cmd = _get_command(query, **kwargs)
        self._clean_kwargs(kwargs)
        args['command'] = cmd
        # print(f"args: {args}")
        # print(f"kwargs: {kwargs}")
        # return
        logger.debug(f"running command {args['command']}")
        logger.debug(f"with params {kwargs}")
        try:
            result = self._docker_client.containers.run(*(args.values()),
                                                        remove=True,
                                                        **kwargs)
            return result.decode('utf-8').strip()
        except ContainerError as e:
            return f"STDERR: {e}"
        # TODO: handle docker APIError ?
        except APIError as e:
            logger.debug(f"APIError: {e}")
            return "ERROR"
    def _flush_prompt(self, _socket):
        flush = _socket.recv()
        _socket.setblocking(True)
        logger.debug(f"flushed output: {flush}")
    def _massage_output_streams(self, output):
        df = pd.DataFrame(output, columns=['stream_type', 'payload'])
        df['payload'] = df['payload'].apply(lambda x: x.decode('utf-8'))
        df['stream_type'] = df['stream_type'].apply(
                lambda x: 'stdout' if x == 1 else 'stderr')
        payload = df.groupby('stream_type')['payload'].apply(''.join).to_dict()
        logger.debug(f"payload: {payload}")
        return payload
    # TODO: document the difference between run and exec_run
    def exec_run(self, query: str, timeout: int = 5,
                 delay: float = 0.5,
                 with_stderr: bool = False,
                 flush_prompt: bool = False,
                 **kwargs: Any) -> str:
        """Run a shell command inside an ephemeral container.
        This will create a container, run the command, and then remove the
        container. The input is sent to the container's stdin through a socket
        using the Docker API. It effectively simulates a tty session.
        Args:
            query (str): The command to execute.
            timeout (int): The timeout for receiving output from the attached socket.
            delay (float): The delay in seconds between sending the command and
                reading its output.
            with_stderr (bool): If True, stderr will be included in the output.
            flush_prompt (bool): If True, the prompt will be flushed before running the command.
            **kwargs: Pass extra parameters to DockerClient.containers.create.
        """
        # it is necessary to open stdin to keep the container running after it's started
        # the attach_socket will hold the connection open until the container is stopped or
        # the socket is closed.
        # NOTE: using tty=True to be able to simulate a tty session.
        # NOTE: some images like python need to be launched with custom
        # parameters to keep stdin open. For example python image needs to be
        # started with the command `python3 -i`
        # remove local variables from kwargs
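        # iterate over a copy of the keys: deleting from a dict while
        # iterating over it raises RuntimeError
        for arg in list(kwargs.keys()):
            if arg in locals():
                del kwargs[arg]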
        kwargs = {**self._params, **kwargs}
        kwargs = self._clean_kwargs(kwargs)
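        # exec_run requires flags for stdin so we use `stdin_command` as
        # a default command for creating the container
        # FIX: _clean_kwargs above already pops `stdin_command`, so this branch
        # is never taken as written
        if 'stdin_command' in kwargs: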
            assert isinstance(kwargs['stdin_command'], list)
            kwargs['command'] = shlex.join(kwargs['stdin_command'])
            del kwargs['stdin_command']
        # kwargs.pop('default_command', None)
        # kwargs['command'] = cmd
        # print(f"kwargs: {kwargs}")
        # return
        # TODO: handle both output mode for tty=True/False
        logger.debug(f"creating container with params {kwargs}")
        container = self._docker_client.containers.create(**kwargs)
        container.start()
        # get underlying socket
        # important to set 'stream' or attach API does not work
        _socket = container.attach_socket(params={'stdout': 1, 'stderr': 1,
                                                  'stdin': 1, 'stream': 1})
        # input()
        with DockerSocket(_socket, timeout=timeout) as _socket:
            # flush the output buffer (if any prompt)
            if flush_prompt: 
                self._flush_prompt(_socket)
            # TEST: make sure the container is ready ? use a blocking first call
            raw_input = f"{query}\n".encode('utf-8')
            _socket.sendall(raw_input)
            #NOTE: delay ensures that the command is executed after the input is sent
            time.sleep(delay)
            try:
                output = _socket.recv()
            except socket.timeout:
                return "ERROR: timeout"
        try:
            container.kill()
        except APIError:
            pass
        container.remove(force=True)
        if output is None:
            logger.warning("no output")
            return "ERROR"
        # output is stored in a list of tuples (stream_type, payload)
        payload = self._massage_output_streams(output)
        #NOTE: stderr might contain only the prompt
        if 'stdout' in payload and 'stderr' in payload and with_stderr:
            return f"STDOUT:\n {payload['stdout'].strip()}\nSTDERR:\n {payload['stderr']}"
        elif 'stderr' in payload and 'stdout' not in payload:
            return f"STDERR: {payload['stderr']}"
        else:
            return payload['stdout'].strip()
    def exec_attached(self, query: str, container: str,
                      delay: float = 0.5,
                      timeout: int = 5,
                      with_stderr: bool = False,
                      flush_prompt: bool = False,
                      **kwargs: Any) -> str:
        """Attach to container and exec query on it.
        This method is very similar to exec_run. It only differs in that it attaches to
        the specified existing container instead of creating a new one for each query.
        Args:
            query (str): The command to execute.
            container (str): The container to attach to.
            timeout (int): The timeout for receiving output from the attached socket.
            delay (float): The delay in seconds between sending the command and
                reading its output.
            with_stderr (bool): If True, stderr will be included in the output.
            **kwargs: Pass extra parameters to DockerClient.container.exec_run.
        """
        # remove local variables from kwargs
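        # iterate over a copy of the keys: deleting from a dict while
        # iterating over it raises RuntimeError
        for arg in list(kwargs.keys()):
            if arg in locals():
                del kwargs[arg]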
        kwargs = {**self._params, **kwargs}
        kwargs = self._clean_kwargs(kwargs)
        logger.debug(f"attaching to container {container} with params {kwargs}")
        try:
            _container = self._docker_client.containers.get(container)
        except Exception as e:
            logger.error(f"container {container}: {e}")
            return "ERROR"
        _socket = _container.attach_socket(params={'stdout': 1, 'stderr': 1,
                                                  'stdin': 1, 'stream': 1})
        with DockerSocket(_socket, timeout=timeout) as _socket:
            # flush the output buffer (if any prompt)
            if flush_prompt: 
                self._flush_prompt(_socket)
            raw_input = f"{query}\n".encode('utf-8')
            _socket.sendall(raw_input)
            #NOTE: delay ensures that the command is executed after the input is sent
            time.sleep(delay)
            try:
                output = _socket.recv()
            except socket.timeout:
                return "ERROR: timeout"
        if output is None:
            logger.warning("no output")
            return "ERROR"
        payload = self._massage_output_streams(output)
        logger.debug(f"exec_attached payload: {payload}")
        #NOTE: stderr might contain only the prompt
        if 'stdout' in payload and 'stderr' in payload and with_stderr:
            return f"STDOUT:\n {payload['stdout'].strip()}\nSTDERR:\n {payload['stderr']}"
        elif 'stderr' in payload and 'stdout' not in payload:
            return f"STDERR: {payload['stderr']}"
        else:
            return payload['stdout'].strip()
    #WIP method that will copy the given payload to the container filesystem then
    # invoke the command on the file and return the output
    def run_file(self, payload: bytes, filename: Optional[str] = None,
                 **kwargs: Any) -> str:
        """Run arbitrary shell command inside an ephemeral container on the
        specified input payload."""
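        # iterate over a copy of the keys: deleting from a dict while
        # iterating over it raises RuntimeError
        for arg in list(kwargs.keys()):
            if arg in locals():
                del kwargs[arg]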
        kwargs = {**self._params, **kwargs}
        self._clean_kwargs(kwargs)
        kwargs['command'] = '/bin/sh'
        k_file_location = '/tmp/payload'
        if filename is not None:
            # store at /tmp/file_name
            # strip all leading path components
            file_loc = os.path.basename(filename)
            k_file_location = f'/tmp/{file_loc}'
        # print(kwargs)
        # return
        # create a container with the given payload
        # container = self._docker_client.containers.create(**kwargs)
        # container.start()
        container = self._docker_client.containers.list()[0]
        print(container.short_id)
        # copy the payload to the container
        try:
            # put the data in tar archive at the path specified by k_file_location
            archive = io.BytesIO()
            with tarfile.TarFile(fileobj=archive, mode='w') as tar:
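                # name the member after k_file_location (relative to '/') so the
                # extracted file is the one executed below
                tarinfo = tarfile.TarInfo(name=k_file_location.lstrip('/'))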
                tarinfo.size = len(payload)
                tarinfo.mtime = int(time.time())
                tar.addfile(tarinfo, io.BytesIO(payload))
            archive.seek(0)
            # store archive on local host at /tmp/test
            # with open('/tmp/test', 'wb') as f:
            #     f.write(archive.read())
            container.put_archive(path='/', data=archive)
        except APIError as e:
            logger.error(f"Error: {e}")
            return "ERROR"
        #execute the command
        exit_code, out = container.exec_run(['sh', k_file_location])
        print(f"exit_code: {exit_code}")
        print(f"out: {out}")
        # try:
        #     container.kill()
        # except APIError:
        #     pass
        # container.remove(force=True)
        return ""
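Review note on `run_file`: the in-memory tar archive step can be exercised on its own, without a Docker daemon. The sketch below is only an illustration; `make_tar` and the member name `payload` are hypothetical names, not part of this diff. It builds the archive the same way `run_file` does and reads it back to check what a later `put_archive` call would extract.

```python
import io
import tarfile
import time


def make_tar(payload: bytes, member_name: str) -> io.BytesIO:
    """Hypothetical helper: wrap `payload` in an in-memory tar archive."""
    archive = io.BytesIO()
    with tarfile.open(fileobj=archive, mode="w") as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)  # size must be set before addfile
        info.mtime = int(time.time())
        tar.addfile(info, io.BytesIO(payload))
    archive.seek(0)  # rewind so the consumer reads from the start
    return archive


archive = make_tar(b"echo hello\n", "payload")

# Read the archive back to verify its contents.
with tarfile.open(fileobj=archive, mode="r") as tar:
    names = tar.getnames()
    content = tar.extractfile("payload").read()

print(names)
print(content)
```

With the Docker SDK, an archive built this way and passed as `container.put_archive(path='/tmp', data=archive)` would be extracted under `/tmp`, so the member name decides the final file name. The archive has to be rewound with `seek(0)` again if it was read in between.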
--- a/poetry.lock
+++ b/poetry.lock
@ -1140,6 +1140,28 @@ idna = ["idna (>=2.1,<4.0)"]
 trio = ["trio (>=0.14,<0.23)"]
 wmi = ["wmi (>=1.5.1,<2.0.0)"]
 [[package]]
 name = "docker"
 version = "6.0.1"
 description = "A Python library for the Docker Engine API."
 category = "main"
 optional = true
 python-versions = ">=3.7"
 files = [
    {file = "docker-6.0.1-py3-none-any.whl", hash = "sha256:dbcb3bd2fa80dca0788ed908218bf43972772009b881ed1e20dfc29a65e49782"},
    {file = "docker-6.0.1.tar.gz", hash = "sha256:896c4282e5c7af5c45e8b683b0b0c33932974fe6e50fc6906a0a83616ab3da97"},
 ]
 [package.dependencies]
 packaging = ">=14.0"
 pywin32 = {version = ">=304", markers = "sys_platform == \"win32\""}
 requests = ">=2.26.0"
 urllib3 = ">=1.26.0"
 websocket-client = ">=0.32.0"
 [package.extras]
 ssh = ["paramiko (>=2.4.3)"]
 [[package]]
 name = "docutils"
 version = "0.17.1"
@ -4851,7 +4873,7 @@ files = [
 name = "pywin32"
 version = "305"
 description = "Python for Window Extensions"
-category = "dev"
+category = "main"
 optional = false
 python-versions = "*"
 files = [
@ -6059,7 +6081,7 @@ files = [
 ]
 [package.dependencies]
-greenlet = {version = "!=0.4.17", markers = "python_version >= \"3\" and (platform_machine == \"aarch64\" or platform_machine == \"ppc64le\" or platform_machine == \"x86_64\" or platform_machine == \"amd64\" or platform_machine == \"AMD64\" or platform_machine == \"win32\" or platform_machine == \"WIN32\")"}
+greenlet = {version = "!=0.4.17", markers = "python_version >= \"3\" and platform_machine == \"aarch64\" or python_version >= \"3\" and platform_machine == \"ppc64le\" or python_version >= \"3\" and platform_machine == \"x86_64\" or python_version >= \"3\" and platform_machine == \"amd64\" or python_version >= \"3\" and platform_machine == \"AMD64\" or python_version >= \"3\" and platform_machine == \"win32\" or python_version >= \"3\" and platform_machine == \"WIN32\""}
 [package.extras]
 aiomysql = ["aiomysql", "greenlet (!=0.4.17)"]
@ -7183,7 +7205,7 @@ files = [
 name = "websocket-client"
 version = "1.5.1"
 description = "WebSocket client for Python with low level API options"
-category = "dev"
+category = "main"
 optional = false
 python-versions = ">=3.7"
 files = [
@ -7498,10 +7520,11 @@ docs = ["furo", "jaraco.packaging (>=9)", "jaraco.tidelift (>=1.4)", "rst.linker
 testing = ["flake8 (<5)", "func-timeout", "jaraco.functools", "jaraco.itertools", "more-itertools", "pytest (>=6)", "pytest-black (>=0.3.7)", "pytest-checkdocs (>=2.4)", "pytest-cov", "pytest-enabler (>=1.3)", "pytest-flake8", "pytest-mypy (>=0.9.1)"]
 [extras]
-all = ["anthropic", "cohere", "openai", "nlpcloud", "huggingface_hub", "manifest-ml", "elasticsearch", "opensearch-py", "google-search-results", "faiss-cpu", "sentence-transformers", "transformers", "spacy", "nltk", "wikipedia", "beautifulsoup4", "tiktoken", "torch", "jinja2", "pinecone-client", "weaviate-client", "redis", "google-api-python-client", "wolframalpha", "qdrant-client", "tensorflow-text", "pypdf", "networkx", "nomic"]
+all = ["anthropic", "cohere", "openai", "nlpcloud", "huggingface_hub", "manifest-ml", "elasticsearch", "opensearch-py", "google-search-results", "faiss-cpu", "sentence-transformers", "transformers", "spacy", "nltk", "wikipedia", "beautifulsoup4", "tiktoken", "torch", "jinja2", "pinecone-client", "weaviate-client", "redis", "google-api-python-client", "wolframalpha", "qdrant-client", "tensorflow-text", "pypdf", "networkx", "nomic", "docker"]
 docker = ["docker"]
 llms = ["anthropic", "cohere", "openai", "nlpcloud", "huggingface_hub", "manifest-ml", "torch", "transformers"]
 [metadata]
 lock-version = "2.0"
 python-versions = ">=3.8.1,<4.0"
-content-hash = "449d9958004f9b0af5667b02f866313913f9bd9c939870898873c0e3198a9cb4"
+content-hash = "e817b6b0f985c4178f4cd1bc5bea92130e79092e5fff0d41a03a5dbcfe1047cd"
--- a/pyproject.toml
+++ b/pyproject.toml
@ -1,6 +1,6 @@
 [tool.poetry]
 name = "langchain"
-version = "0.0.95"
+version = "0.0.96"
 description = "Building applications with LLMs through composability"
 authors = []
 license = "MIT"
@ -51,6 +51,7 @@ pypdf = {version = "^3.4.0", optional = true}
 networkx = {version="^2.6.3", optional = true}
 aleph-alpha-client = {version="^2.15.0", optional = true}
 deeplake = {version = "^3.2.9", optional = true}
 docker = {version = "^6.0.1", optional = true}
 [tool.poetry.group.docs.dependencies]
 autodoc_pydantic = "^1.8.0"
@ -95,8 +96,9 @@ jupyter = "^1.0.0"
 playwright = "^1.28.0"
 [tool.poetry.extras]
 docker = ["docker"]
 llms = ["anthropic", "cohere", "openai", "nlpcloud", "huggingface_hub", "manifest-ml", "torch", "transformers"]
-all = ["anthropic", "cohere", "openai", "nlpcloud", "huggingface_hub", "manifest-ml", "elasticsearch", "opensearch-py", "google-search-results", "faiss-cpu", "sentence_transformers", "transformers", "spacy", "nltk", "wikipedia", "beautifulsoup4", "tiktoken", "torch", "jinja2", "pinecone-client", "weaviate-client", "redis", "google-api-python-client", "wolframalpha", "qdrant-client", "tensorflow-text", "pypdf", "networkx", "nomic"]
+all = ["anthropic", "cohere", "openai", "nlpcloud", "huggingface_hub", "manifest-ml", "elasticsearch", "opensearch-py", "google-search-results", "faiss-cpu", "sentence_transformers", "transformers", "spacy", "nltk", "wikipedia", "beautifulsoup4", "tiktoken", "torch", "jinja2", "pinecone-client", "weaviate-client", "redis", "google-api-python-client", "wolframalpha", "qdrant-client", "tensorflow-text", "pypdf", "networkx", "nomic", "docker"]
 [tool.ruff]
 select = [
--- a/tests/integration_tests/document_loaders/init.py
+++ b/tests/integration_tests/document_loaders/init.py
@ -0,0 +1 @@
 """Test document loader integrations."""
--- a/tests/integration_tests/document_loaders/test_ifixit.py
+++ b/tests/integration_tests/document_loaders/test_ifixit.py
@ -0,0 +1,37 @@
 from langchain.document_loaders.ifixit import IFixitLoader
 def test_ifixit_loader() -> None:
    """Test iFixit loader."""
    web_path = "https://www.ifixit.com/Guide/iPad+9+Battery+Replacement/151279"
    loader = IFixitLoader(web_path)
    assert loader.page_type == "Guide"
    assert loader.id == "151279"
    assert loader.web_path == web_path
 def test_ifixit_loader_teardown() -> None:
    web_path = "https://www.ifixit.com/Teardown/Banana+Teardown/811"
    loader = IFixitLoader(web_path)
    """ Teardowns are just guides by a different name """
    assert loader.page_type == "Guide"
    assert loader.id == "811"
 def test_ifixit_loader_device() -> None:
    web_path = "https://www.ifixit.com/Device/Standard_iPad"
    loader = IFixitLoader(web_path)
    """ Device pages are identified by name rather than a numeric id """
    assert loader.page_type == "Device"
    assert loader.id == "Standard_iPad"
 def test_ifixit_loader_answers() -> None:
    web_path = (
        "https://www.ifixit.com/Answers/View/318583/My+iPhone+6+is+typing+and+"
        "opening+apps+by+itself"
    )
    loader = IFixitLoader(web_path)
    assert loader.page_type == "Answers"
    assert loader.id == "318583"
--- a/tests/unit_tests/prompts/test_few_shot.py
+++ b/tests/unit_tests/prompts/test_few_shot.py
@ -85,3 +85,92 @@ def test_few_shot_functionality() -> None:
        "Now you try to talk about party."
    )
    assert output == expected_output
 def test_partial_init_string() -> None:
    """Test prompt can be initialized with partial variables."""
    prefix = "This is a test about {content}."
    suffix = "Now you try to talk about {new_content}."
    examples = [
        {"question": "foo", "answer": "bar"},
        {"question": "baz", "answer": "foo"},
    ]
    prompt = FewShotPromptTemplate(
        suffix=suffix,
        prefix=prefix,
        input_variables=["new_content"],
        partial_variables={"content": "animals"},
        examples=examples,
        example_prompt=EXAMPLE_PROMPT,
        example_separator="\n",
    )
    output = prompt.format(new_content="party")
    expected_output = (
        "This is a test about animals.\n"
        "foo: bar\n"
        "baz: foo\n"
        "Now you try to talk about party."
    )
    assert output == expected_output
 def test_partial_init_func() -> None:
    """Test prompt can be initialized with partial variables."""
    prefix = "This is a test about {content}."
    suffix = "Now you try to talk about {new_content}."
    examples = [
        {"question": "foo", "answer": "bar"},
        {"question": "baz", "answer": "foo"},
    ]
    prompt = FewShotPromptTemplate(
        suffix=suffix,
        prefix=prefix,
        input_variables=["new_content"],
        partial_variables={"content": lambda: "animals"},
        examples=examples,
        example_prompt=EXAMPLE_PROMPT,
        example_separator="\n",
    )
    output = prompt.format(new_content="party")
    expected_output = (
        "This is a test about animals.\n"
        "foo: bar\n"
        "baz: foo\n"
        "Now you try to talk about party."
    )
    assert output == expected_output
 def test_partial() -> None:
    """Test prompt can be partialed."""
    prefix = "This is a test about {content}."
    suffix = "Now you try to talk about {new_content}."
    examples = [
        {"question": "foo", "answer": "bar"},
        {"question": "baz", "answer": "foo"},
    ]
    prompt = FewShotPromptTemplate(
        suffix=suffix,
        prefix=prefix,
        input_variables=["content", "new_content"],
        examples=examples,
        example_prompt=EXAMPLE_PROMPT,
        example_separator="\n",
    )
    new_prompt = prompt.partial(content="foo")
    new_output = new_prompt.format(new_content="party")
    expected_output = (
        "This is a test about foo.\n"
        "foo: bar\n"
        "baz: foo\n"
        "Now you try to talk about party."
    )
    assert new_output == expected_output
    output = prompt.format(new_content="party", content="bar")
    expected_output = (
        "This is a test about bar.\n"
        "foo: bar\n"
        "baz: foo\n"
        "Now you try to talk about party."
    )
    assert output == expected_output
--- a/tests/unit_tests/prompts/test_prompt.py
+++ b/tests/unit_tests/prompts/test_prompt.py
@ -108,3 +108,40 @@ def test_prompt_from_file() -> None:
    input_variables = ["question"]
    prompt = PromptTemplate.from_file(template_file, input_variables)
    assert prompt.template == "Question: {question}\nAnswer:"
 def test_partial_init_string() -> None:
    """Test prompt can be initialized with partial variables."""
    template = "This is a {foo} test."
    prompt = PromptTemplate(
        input_variables=[], template=template, partial_variables={"foo": 1}
    )
    assert prompt.template == template
    assert prompt.input_variables == []
    result = prompt.format()
    assert result == "This is a 1 test."
 def test_partial_init_func() -> None:
    """Test prompt can be initialized with partial variables."""
    template = "This is a {foo} test."
    prompt = PromptTemplate(
        input_variables=[], template=template, partial_variables={"foo": lambda: 2}
    )
    assert prompt.template == template
    assert prompt.input_variables == []
    result = prompt.format()
    assert result == "This is a 2 test."
 def test_partial() -> None:
    """Test prompt can be partialed."""
    template = "This is a {foo} test."
    prompt = PromptTemplate(input_variables=["foo"], template=template)
    assert prompt.template == template
    assert prompt.input_variables == ["foo"]
    new_prompt = prompt.partial(foo="3")
    new_result = new_prompt.format()
    assert new_result == "This is a 3 test."
    result = prompt.format(foo="foo")
    assert result == "This is a foo test."
--- a/tests/unit_tests/test_docker.py
+++ b/tests/unit_tests/test_docker.py
@ -0,0 +1,129 @@
 """Test the docker wrapper utility."""
 import pytest
 import importlib.util
 from langchain.utilities.docker import gvisor_runtime_available
 from langchain.utilities.docker.tool import DockerWrapper, _default_params
 from unittest.mock import MagicMock
 import subprocess
 import time
 def docker_installed() -> bool:
    """Check if docker is installed locally."""
    try:
        subprocess.run(['which', 'docker',], check=True)
    except subprocess.CalledProcessError:
        return False
    return True
 def gvisor_installed() -> bool:
    """Return True if the gVisor runtime is installed locally."""
    try:
        docker_lib = importlib.import_module('docker')
        client = docker_lib.from_env()
        return gvisor_runtime_available(client)
    except ImportError:
        return False
    return False
 def docker_lib_installed() -> bool:
    return importlib.util.find_spec('docker') is not None
 def skip_docker_tests() -> bool:
    return not docker_installed() or not docker_lib_installed()
@pytest.mark.skipif(skip_docker_tests(), reason="docker not installed")
 class TestDockerUtility:
    def test_default_image(self) -> None:
        """Test running a command with the default alpine image."""
        docker = DockerWrapper()
        output = docker.run('cat /etc/os-release')
        assert output.find('alpine') != -1
    def test_shell_escaping(self) -> None:
        docker = DockerWrapper()
        output = docker.run('echo "hello world" | sed "s/world/you/g"')
        assert output == 'hello you'
        # using embedded quotes
        output = docker.run("echo 'hello world' | awk '{print $2}'")
        assert output == 'world'
    def test_auto_pull_image(self) -> None:
        docker = DockerWrapper(image='golang:1.20')
        output = docker.run("go version")
        assert output.find('go1.20') != -1
        docker._docker_client.images.remove('golang:1.20')
    def test_inner_failing_command(self) -> None:
        """Test inner command with non zero exit"""
        docker = DockerWrapper()
        output = docker.run('ls /inner-failing-command')
        assert str(output).startswith("STDERR")
    def test_entrypoint_failure(self) -> None:
        """Test a command that cannot be executed inside the container."""
        docker = DockerWrapper()
        output = docker.run('todo handle APIError')
        assert str(output).startswith("STDERR") or str(output).startswith("ERROR")
    def test_check_gvisor_runtime(self) -> None:
        """test gVisor runtime verification using a mock docker client"""
        mock_client = MagicMock()
        mock_client.info.return_value = {'Runtimes': {'runsc': {'path': 'runsc'}}}
        assert gvisor_runtime_available(mock_client)
        mock_client.info.return_value = {'Runtimes': {'runc': {'path': 'runc'}}}
        assert not gvisor_runtime_available(mock_client)
    def test_exec_attached(self) -> None:
        """Test exec with attached mode."""
        # create a test container
        d = DockerWrapper()
        cont = d._docker_client.containers.run('alpine', '/bin/sh -s',
                                               detach=True,
                                               stdin_open=True)
        cont.start()
        # make sure the prompt is ready
        time.sleep(1)
        out = d.exec_attached("cat /etc/os-release", container=cont.id)
        assert out.find('alpine') != -1
        cont.kill()
        cont.remove(force=True)
    @pytest.mark.skipif(not gvisor_installed(), reason="gvisor not installed")
    def test_run_with_runtime_runsc(self) -> None:
        docker = DockerWrapper(image='shell')
        output = docker.run('dmesg')
        assert output.find('gVisor') != -1
    def test_socket_read_timeout(self) -> None:
        """Test socket read timeout."""
        docker = DockerWrapper(image='python', default_command=['python'])
        # this query should fail as python needs to be started with python3 -i
        output = docker.exec_run("test query", timeout=1)
        assert output == "ERROR: timeout"
 def test_get_image_template() -> None:
    """Test getting an image template instance from string."""
    from langchain.utilities.docker.images import get_image_template
    image = get_image_template("python")
    assert image.__name__ == "Python" #  type: ignore
 #FIX: failing split in two tests: with and without gvisor
 def test_default_params() -> None:
    """Test default container parameters."""
    docker = DockerWrapper(image="my_custom_image")
    assert docker._params == {**_default_params, "image": "my_custom_image"}
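Review note on `test_check_gvisor_runtime`: the mock-based check works because the runtime lookup only needs the dictionary returned by `client.info()`. The implementation of `gvisor_runtime_available` is not part of this section, so the sketch below uses a hypothetical stand-in, `runtime_available`, and assumes only that the `Runtimes` key has the shape used in the test.

```python
from unittest.mock import MagicMock


def runtime_available(client, runtime: str = "runsc") -> bool:
    """Hypothetical stand-in: True if `runtime` is listed in `docker info`."""
    runtimes = client.info().get("Runtimes") or {}
    return runtime in runtimes


mock_client = MagicMock()

# Daemon reporting gVisor's runsc runtime.
mock_client.info.return_value = {"Runtimes": {"runsc": {"path": "runsc"}}}
with_gvisor = runtime_available(mock_client)

# Daemon reporting only the default runc runtime.
mock_client.info.return_value = {"Runtimes": {"runc": {"path": "runc"}}}
without_gvisor = runtime_available(mock_client)

print(with_gvisor, without_gvisor)  # True False
```

Mocking `info()` keeps this test independent of the local Docker installation, which is why it can run even where gVisor is absent.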
Author	SHA1	Message	Date
blob42	84d7ad397d	langchain-docker readme	2023-03-03 22:55:44 +01:00
blob42	de551d62a8	linting in docker and parallel make jobs - linting can be run in docker in parallel with `make -j4 docker.lint`	2023-03-03 22:55:44 +01:00
blob42	d8fd0e790c	enable test + lint on docker	2023-03-03 22:55:44 +01:00
blob42	97c2b31cc5	added all extra dependencies to dev image + customized builds - downgraded to python 3.10 to accomadate installing all dependencies - by default installs all dev + extra dependencies - option to install only dev dependencies by customizing .env file	2023-03-03 22:55:44 +01:00
blob42	f1dc03d0cc	docker development image and helper makefile separate makefile and build env: - separate makefile for docker - only show docker commands when docker detected in system - only rebuild container on change - use an unpriviliged user builder image and base dev image: - fully isolated environment inside container. - all venv installed inside container shell and available as commands. - ex: `docker run IMG jupyter notebook` to launch notebook. - pure python based container without poetry. - custom motd to add a message displayed to users when they connect to container. - print environment versions (git, package, python) on login - display help message when starting container	2023-03-03 22:55:44 +01:00
Harrison Chase	f76e9eaab1	bump version (#1342 )	2023-03-03 22:55:44 +01:00
Harrison Chase	db2e9c2b0d	partial variables (#1308 )	2023-03-03 22:55:44 +01:00
Tim Asp	d22651d82a	Add new iFixit document loader (#1333 ) iFixit is a wikipedia-like site that has a huge amount of open content on how to fix things, questions/answers for common troubleshooting and "things" related content that is more technical in nature. All content is licensed under CC-BY-SA-NC 3.0 Adding docs from iFixit as context for user questions like "I dropped my phone in water, what do I do?" or "My macbook pro is making a whining noise, what's wrong with it?" can yield significantly better responses than context free response from LLMs.	2023-03-03 22:55:44 +01:00
Matt Robinson	c46478d70e	feat: document loader for image files (#1330 ) ### Summary Adds a document loader for image files such as `.jpg` and `.png` files. ### Testing Run the following using the example document from the [`unstructured` repo](https://github.com/Unstructured-IO/unstructured/tree/main/example-docs). ```python from langchain.document_loaders.image import UnstructuredImageLoader loader = UnstructuredImageLoader("layout-parser-paper-fast.jpg") loader.load() ```	2023-03-03 22:55:44 +01:00
Eugene Yurtsev	e3fcc72879	Documentation: Minor typo fixes (#1327 ) Fixing a few minor typos in the documentation (and likely introducing other ones in the process).	2023-03-03 22:55:44 +01:00
blob42	2fdb1d842b	refactoring into submodules	2023-03-03 22:55:15 +01:00
blob42	c30ef7dbc4	drop network capabilities by default, example on using networking	2023-03-03 21:59:22 +01:00
blob42	8a7871ece3	add exec_attached: attach to running container and exec cmd	2023-03-03 21:22:45 +01:00
blob42	201ecdc9ee	fix run and exec_run default commands, actually use gVisor - run and exec_run need a separate default command. Run usually executes a script while exec_run simulates an interactive session. The image templates and run funcs have been upgraded to handle both types of commands. - test: make docker tests run when docker is installed and docker lib avaialble. - test that runsc runtime is used by default when gVisor is installed. (manually removing gVisor skips the test)	2023-03-02 22:33:17 +01:00
blob42	149fe0055e	exec_run fixes to keep stdin open	2023-03-02 20:39:48 +01:00
blob42	096b82f2a1	update notebook for utility	2023-03-02 20:32:10 +01:00
blob42	87b5a84cfb	update tests and docstrings	2023-03-02 19:33:48 +01:00
blob42	ed97aa65af	exec_run: add timeout and delay params - use `delay` to wait for sent payload to finish - use `timeout` to control how long to wait for output	2023-03-02 19:11:58 +01:00
blob42	c9e6baf60d	image templates, enhanced wrapper building with custom prameters - quickly run or exec_run commands with sane defaults - wip image templates with parameters for common docker images - shell escaping logic - capture stdout+stderr for exec commands - added minimal testing	2023-03-02 04:23:59 +01:00
blob42	7cde1cbfc3	docker: attach to container's stdin - wip image helper for optimized params with common images - gVisor runtime checker - make tests skipped if docker installed	2023-02-27 18:31:06 +01:00
blob42	17213209e0	stream stdin and stdout to container through docker API's socket	2023-02-27 18:31:06 +01:00
blob42	895f862662	docker wrapper tool for untrusted execution	2023-02-27 18:31:06 +01:00
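Review note on the stdout+stderr capture introduced in c9e6baf60d: `_massage_output_streams` groups `(stream_type, payload)` tuples by stream using a pandas DataFrame. The same grouping can be sketched with the standard library only. This is an illustration under the assumptions visible in the diff (stream type `1` is stdout, anything else is treated as stderr, payloads are UTF-8 bytes); `group_streams` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def group_streams(frames: Iterable[Tuple[int, bytes]]) -> Dict[str, str]:
    """Hypothetical helper: join decoded payloads per stream, in arrival order."""
    grouped: Dict[str, str] = defaultdict(str)
    for stream_type, payload in frames:
        key = "stdout" if stream_type == 1 else "stderr"
        grouped[key] += payload.decode("utf-8")
    return dict(grouped)


frames = [(1, b"hello "), (2, b"warn\n"), (1, b"world\n")]
result = group_streams(frames)
print(result)
```

As in the pandas version, a stream that produced no frames is simply missing from the result, so callers still need to check for `'stdout'` and `'stderr'` before indexing.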