Include information on the tools for creating gbnf grammar files in the llama-cpp notebook (#11764)

Hi,

I recently experimented with grammar-based sampling and discovered two
methods for speeding up the creation of gbnf grammar files:
1. [Online grammar generator app](https://grammar.intrinsiclabs.ai/) introduced [here](https://github.com/ggerganov/llama.cpp/discussions/2494)
2. [Script](https://github.com/ggerganov/llama.cpp/blob/master/examples/json-schema-to-grammar.py) for converting a JSON schema to a gbnf grammar
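
To give a rough idea of what a JSON-schema-to-gbnf conversion produces, here is a deliberately simplified sketch. It is not the llama.cpp script: the helper `flat_schema_to_gbnf` and the reduced primitive rules are hypothetical, and it only handles a flat object whose properties are strings, numbers, integers or booleans with simple ASCII keys (all treated as required). In practice the schema dict could come from pydantic's `.schema()` (v1) or `model_json_schema()` (v2), and you would run the real script on it.

```python
import json

# Simplified GBNF rules for JSON primitives. These are a reduced subset of
# llama.cpp's json.gbnf (fewer string escapes, no exponents in numbers).
PRIMITIVE_RULES = {
    "boolean": r'("true" | "false") ws',
    "integer": r'"-"? [0-9]+ ws',
    "number": r'"-"? [0-9]+ ("." [0-9]+)? ws',
    "string": r'"\"" ( [^"\\] | "\\" ["\\/bfnrt] )* "\"" ws',
}


def gbnf_literal(text: str) -> str:
    """Quote `text` as a GBNF string literal using JSON-style escaping."""
    return json.dumps(text)


def flat_schema_to_gbnf(schema: dict) -> str:
    """Hypothetical helper: convert a flat JSON schema to a GBNF grammar.

    Only a top-level object with primitive-typed properties is supported.
    Every property is treated as required and emitted in declaration order.
    """
    if schema.get("type") != "object":
        raise ValueError("only top-level object schemas are supported")
    parts = []
    used_types = set()
    for name, prop in schema.get("properties", {}).items():
        prop_type = prop.get("type")
        if prop_type not in PRIMITIVE_RULES:
            raise ValueError(f"unsupported type for {name!r}: {prop_type!r}")
        used_types.add(prop_type)
        # The key appears in the output as a quoted JSON string, so the
        # GBNF literal must contain the quotes too, e.g. "\"name\"".
        key = gbnf_literal(json.dumps(name))
        parts.append(f'{key} ws ":" ws {prop_type}')
    body = ' "," ws '.join(parts)
    # Doubled braces in the f-string emit the literal "{" and "}" tokens.
    lines = [f'root ::= "{{" ws {body} "}}" ws']
    for prop_type in sorted(used_types):
        lines.append(f"{prop_type} ::= {PRIMITIVE_RULES[prop_type]}")
    lines.append("ws ::= [ \\t\\n]*")
    return "\n".join(lines)


if __name__ == "__main__":
    schema = {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"},
        },
    }
    print(flat_schema_to_gbnf(schema))
```

For a schema with a string `name` and an integer `age`, the sketch emits `root ::= "{" ws "\"name\"" ws ":" ws string "," ws "\"age\"" ws ":" ws integer "}" ws` plus the `integer`, `string` and `ws` rules it references. The result could then be saved as a `.gbnf` file and supplied to `LlamaCpp` the same way as `json.gbnf`.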

I think it would be useful to point readers to them from the `llama-cpp` notebook.

***

The Codespell check fails, but because of an unrelated script.

Co-authored-by: Bagatur <baskaryan@gmail.com>
eryk-dsai 2023-10-17 05:28:32 +02:00 committed by GitHub
parent c15701eebf
commit c4341463e8

@@ -189,7 +189,8 @@
 "outputs": [],
 "source": [
 "from langchain.llms import LlamaCpp\n",
-"from langchain.prompts import PromptTemplate\nfrom langchain.chains import LLMChain\n",
+"from langchain.prompts import PromptTemplate\n",
+"from langchain.chains import LLMChain\n",
 "from langchain.callbacks.manager import CallbackManager\n",
 "from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler"
 ]
@@ -532,12 +533,20 @@
 "source": [
 "### Grammars\n",
 "\n",
+"We can use [grammars](https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md) to constrain model outputs and sample tokens based on the rules defined in them.\n",
 "\n",
-"We can specify [grammars](https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md) to constrain model outputs.\n",
+"To demonstrate this concept, we've included [sample grammar files](https://github.com/langchain-ai/langchain/tree/master/libs/langchain/langchain/llms/grammars), that will be used in the examples below.\n",
 "\n",
-"This will sample tokens according to the grammar.\n",
-" \n",
-"For example, supply the path to the specifed `json.gbnf` file in order to produce JSON."
+"Creating gbnf grammar files can be time-consuming, but if you have a use-case where output schemas are important, there are two tools that can help:\n",
+"- [Online grammar generator app](https://grammar.intrinsiclabs.ai/) that converts TypeScript interface definitions to gbnf file.\n",
+"- [Python script](https://github.com/ggerganov/llama.cpp/blob/master/examples/json-schema-to-grammar.py) for converting json schema to gbnf file. You can for example create `pydantic` object, generate its JSON schema using `.schema_json()` method, and then use this script to convert it to gbnf file."
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"In the first example, supply the path to the specifed `json.gbnf` file in order to produce JSON:"
 ]
 },
 {
@@ -612,7 +621,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We can also supply `list.gbnf` to return a list."
+"We can also supply `list.gbnf` to return a list:"
 ]
 },
 {
@@ -667,7 +676,7 @@
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "Python 3 (ipykernel)",
+"display_name": "Python 3.10.12 ('langchain_venv': venv)",
 "language": "python",
 "name": "python3"
 },
@@ -681,7 +690,12 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.9.16"
+"version": "3.10.12"
+},
+"vscode": {
+"interpreter": {
+"hash": "d1d3a3c58a58885896c5459933a599607cdbb9917d7e1ad7516c8786c51f2dd2"
+}
 }
 },
 "nbformat": 4,