From c4341463e84fa032259ba0620c027de00263d7a8 Mon Sep 17 00:00:00 2001
From: eryk-dsai <142571618+eryk-dsai@users.noreply.github.com>
Date: Tue, 17 Oct 2023 05:28:32 +0200
Subject: [PATCH] Include information on the tools for creating gbnf grammar files in the llama-cpp notebook (#11764)

Hi,

I recently experimented with grammar-based sampling and discovered two methods for speeding up the creation of gbnf grammar files:

1. [Online grammar generator app](https://github.com/ggerganov/llama.cpp/discussions/2494) introduced [here](https://github.com/ggerganov/llama.cpp/discussions/2494)
2. [Script](https://github.com/ggerganov/llama.cpp/blob/master/examples/json-schema-to-grammar.py) for parsing json schema to gbnf grammar

I believe it is a good idea to include the information that leads to them in the `llama-cpp` notebook.

***

Codespell check fails but due to the unrelated script

Co-authored-by: Bagatur
---
 docs/docs/integrations/llms/llamacpp.ipynb | 30 ++++++++++++++++------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/docs/docs/integrations/llms/llamacpp.ipynb b/docs/docs/integrations/llms/llamacpp.ipynb
index 45df2ac8a8..cf9fa21bb5 100644
--- a/docs/docs/integrations/llms/llamacpp.ipynb
+++ b/docs/docs/integrations/llms/llamacpp.ipynb
@@ -189,7 +189,8 @@
    "outputs": [],
    "source": [
     "from langchain.llms import LlamaCpp\n",
-    "from langchain.prompts import PromptTemplate\nfrom langchain.chains import LLMChain\n",
+    "from langchain.prompts import PromptTemplate\n",
+    "from langchain.chains import LLMChain\n",
     "from langchain.callbacks.manager import CallbackManager\n",
     "from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler"
    ]
@@ -532,12 +533,20 @@
    "source": [
     "### Grammars\n",
     "\n",
+    "We can use [grammars](https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md) to constrain model outputs and sample tokens based on the rules defined in them.\n",
     "\n",
-    "We can specify [grammars](https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md) to constrain model outputs.\n",
+    "To demonstrate this concept, we've included [sample grammar files](https://github.com/langchain-ai/langchain/tree/master/libs/langchain/langchain/llms/grammars), that will be used in the examples below.\n",
     "\n",
-    "This will sample tokens according to the grammar.\n",
-    " \n",
-    "For example, supply the path to the specifed `json.gbnf` file in order to produce JSON."
+    "Creating gbnf grammar files can be time-consuming, but if you have a use-case where output schemas are important, there are two tools that can help:\n",
+    "- [Online grammar generator app](https://grammar.intrinsiclabs.ai/) that converts TypeScript interface definitions to gbnf file.\n",
+    "- [Python script](https://github.com/ggerganov/llama.cpp/blob/master/examples/json-schema-to-grammar.py) for converting json schema to gbnf file. You can for example create `pydantic` object, generate its JSON schema using `.schema_json()` method, and then use this script to convert it to gbnf file."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In the first example, supply the path to the specifed `json.gbnf` file in order to produce JSON:"
    ]
   },
   {
@@ -612,7 +621,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "We can also supply `list.gbnf` to return a list."
+    "We can also supply `list.gbnf` to return a list:"
    ]
   },
   {
@@ -667,7 +676,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
+   "display_name": "Python 3.10.12 ('langchain_venv': venv)",
    "language": "python",
    "name": "python3"
   },
@@ -681,7 +690,12 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.16"
+   "version": "3.10.12"
+  },
+  "vscode": {
+   "interpreter": {
+    "hash": "d1d3a3c58a58885896c5459933a599607cdbb9917d7e1ad7516c8786c51f2dd2"
+   }
   }
  },
  "nbformat": 4,
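
Note on the JSON-schema workflow mentioned in the new markdown cell: the snippet below is a minimal sketch of the first half of that workflow, not part of the patch. It writes a JSON schema to disk and builds the command that would pass it to `json-schema-to-grammar.py`. The schema is written by hand with the standard library so the snippet has no dependencies; a `pydantic` model's `.schema_json()` output could be written to the file in the same way. The helper names (`write_schema`, `conversion_command`), the file names, and the assumption that the script takes the schema path as a positional argument and prints the grammar to stdout are illustrative and should be checked against the script's `--help`.

```python
import json
import shlex
import tempfile
from pathlib import Path

# Hand-written JSON schema used for illustration only.
# A pydantic model's `.schema_json()` string could be saved instead.
PERSON_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}


def write_schema(schema: dict, path: Path) -> Path:
    """Serialize the schema to `path` as JSON and return the path."""
    path.write_text(json.dumps(schema, indent=2), encoding="utf-8")
    return path


def conversion_command(schema_path: Path,
                       script: str = "json-schema-to-grammar.py") -> list:
    """Build the argv for the converter script.

    Assumption: the script accepts the schema file as a positional
    argument and writes the gbnf grammar to stdout.
    """
    return ["python", script, str(schema_path)]


# Write the schema into a temporary directory and show the command to run.
schema_path = write_schema(
    PERSON_SCHEMA, Path(tempfile.mkdtemp()) / "person.schema.json"
)
print(shlex.join(conversion_command(schema_path)) + " > person.gbnf")
```

The resulting `person.gbnf` path (a hypothetical name) could then be supplied in the same way as the `json.gbnf` file in the notebook's first grammar example.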