docs[patch]: Improve Groq integration page (#22844)

Was bare bones and got marked by folks as unhelpful.

CC @efriis @colemccracken
pull/22843/head^2
Jacob Lee 4 weeks ago committed by GitHub
parent 12eff6a130
commit e01e5d5a91
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

@ -2,10 +2,15 @@
"cells": [
{
"cell_type": "raw",
"metadata": {},
"metadata": {
"vscode": {
"languageId": "raw"
}
},
"source": [
"---\n",
"sidebar_label: Groq\n",
"keywords: [chatgroq]\n",
"---"
]
},
@ -15,45 +20,67 @@
"source": [
"# Groq\n",
"\n",
"Install the langchain-groq package if not already installed:\n",
"\n",
"```bash\n",
"pip install langchain-groq\n",
"```\n",
"\n",
"Request an [API key](https://wow.groq.com) and set it as an environment variable:\n",
"LangChain supports integration with [Groq](https://groq.com/) chat models. Groq specializes in fast AI inference.\n",
"\n",
"```bash\n",
"export GROQ_API_KEY=<YOUR API KEY>\n",
"```\n",
"\n",
"Alternatively, you may configure the API key when you initialize ChatGroq."
"To get started, you'll first need to install the langchain-groq package:"
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Import the ChatGroq class and initialize it with a model:"
"%pip install -qU langchain-groq"
]
},
{
"cell_type": "code",
"execution_count": 27,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_groq import ChatGroq"
"Request an [API key](https://wow.groq.com) and set it as an environment variable:\n",
"\n",
"```bash\n",
"export GROQ_API_KEY=<YOUR API KEY>\n",
"```\n",
"\n",
"Alternatively, you may configure the API key when you initialize ChatGroq.\n",
"\n",
"Here's an example of it in action:"
]
},
{
"cell_type": "code",
"execution_count": 28,
"execution_count": 8,
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=\"Low latency is crucial for Large Language Models (LLMs) because it directly impacts the user experience, model performance, and overall efficiency. Here are some reasons why low latency is essential for LLMs:\\n\\n1. **Real-time Interaction**: LLMs are often used in applications that require real-time interaction, such as chatbots, virtual assistants, and language translation. Low latency ensures that the model responds quickly to user input, providing a seamless and engaging experience.\\n2. **Conversational Flow**: In conversational AI, latency can disrupt the natural flow of conversation. Low latency helps maintain a smooth conversation, allowing users to respond quickly and naturally, without feeling like they're waiting for the model to catch up.\\n3. **Model Performance**: High latency can lead to increased error rates, as the model may struggle to keep up with the input pace. Low latency enables the model to process information more efficiently, resulting in better accuracy and performance.\\n4. **Scalability**: As the number of users and requests increases, low latency becomes even more critical. It allows the model to handle a higher volume of requests without sacrificing performance, making it more scalable and efficient.\\n5. **Resource Utilization**: Low latency can reduce the computational resources required to process requests. By minimizing latency, you can optimize resource allocation, reduce costs, and improve overall system efficiency.\\n6. **User Experience**: High latency can lead to frustration, abandonment, and a poor user experience. Low latency ensures that users receive timely responses, which is essential for building trust and satisfaction.\\n7. **Competitive Advantage**: In applications like customer service or language translation, low latency can be a key differentiator. It can provide a competitive advantage by offering a faster and more responsive experience, setting your application apart from others.\\n8. **Edge Computing**: With the increasing adoption of edge computing, low latency is critical for processing data closer to the user. This reduces latency even further, enabling real-time processing and analysis of data.\\n9. **Real-time Analytics**: Low latency enables real-time analytics and insights, which are essential for applications like sentiment analysis, trend detection, and anomaly detection.\\n10. **Future-Proofing**: As LLMs continue to evolve and become more complex, low latency will become even more critical. By prioritizing low latency now, you'll be better prepared to handle the demands of future LLM applications.\\n\\nIn summary, low latency is vital for LLMs because it ensures a seamless user experience, improves model performance, and enables efficient resource utilization. By prioritizing low latency, you can build more effective, scalable, and efficient LLM applications that meet the demands of real-time interaction and processing.\", response_metadata={'token_usage': {'completion_tokens': 541, 'prompt_tokens': 33, 'total_tokens': 574, 'completion_time': 1.499777658, 'prompt_time': 0.008344704, 'queue_time': None, 'total_time': 1.508122362}, 'model_name': 'llama3-70b-8192', 'system_fingerprint': 'fp_87cbfbbc4d', 'finish_reason': 'stop', 'logprobs': None}, id='run-49dad960-ace8-4cd7-90b3-2db99ecbfa44-0')"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chat = ChatGroq(temperature=0, model_name=\"mixtral-8x7b-32768\")"
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_groq import ChatGroq\n",
"\n",
"chat = ChatGroq(\n",
" temperature=0,\n",
" model=\"llama3-70b-8192\",\n",
" # api_key=\"\" # Optional if not set as an environment variable\n",
")\n",
"\n",
"system = \"You are a helpful assistant.\"\n",
"human = \"{text}\"\n",
"prompt = ChatPromptTemplate.from_messages([(\"system\", system), (\"human\", human)])\n",
"\n",
"chain = prompt | chat\n",
"chain.invoke({\"text\": \"Explain the importance of low latency for LLMs.\"})"
]
},
{
@ -62,97 +89,206 @@
"source": [
"You can view the available models [here](https://console.groq.com/docs/models).\n",
"\n",
"If you do not want to set your API key in the environment, you can pass it directly to the client:\n",
"```python\n",
"chat = ChatGroq(temperature=0, groq_api_key=\"YOUR_API_KEY\", model_name=\"mixtral-8x7b-32768\")\n",
"## Tool calling\n",
"\n",
"```"
"Groq chat models support [tool calling](/docs/how_to/tool_calling/) to generate output matching a specific schema. The model may choose to call multiple tools or the same tool multiple times if appropriate.\n",
"\n",
"Here's an example:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'name': 'get_current_weather',\n",
" 'args': {'location': 'San Francisco', 'unit': 'Celsius'},\n",
" 'id': 'call_pydj'},\n",
" {'name': 'get_current_weather',\n",
" 'args': {'location': 'Tokyo', 'unit': 'Celsius'},\n",
" 'id': 'call_jgq3'}]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from typing import Optional\n",
"\n",
"from langchain_core.tools import tool\n",
"\n",
"\n",
"@tool\n",
"def get_current_weather(location: str, unit: Optional[str]):\n",
" \"\"\"Get the current weather in a given location\"\"\"\n",
" return \"Cloudy with a chance of rain.\"\n",
"\n",
"\n",
"tool_model = chat.bind_tools([get_current_weather], tool_choice=\"auto\")\n",
"\n",
"res = tool_model.invoke(\"What is the weather like in San Francisco and Tokyo?\")\n",
"\n",
"res.tool_calls"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Write a prompt and invoke ChatGroq to create completions:"
"### `.with_structured_output()`\n",
"\n",
"You can also use the convenience [`.with_structured_output()`](/docs/how_to/structured_output/#the-with_structured_output-method) method to coerce `ChatGroq` into returning a structured output.\n",
"Here is an example:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='Low Latency Large Language Models (LLMs) are a type of artificial intelligence model that can understand and generate human-like text. The term \"low latency\" refers to the model\\'s ability to process and respond to inputs quickly, with minimal delay.\\n\\nThe importance of low latency in LLMs can be explained through the following points:\\n\\n1. Improved user experience: In real-time applications such as chatbots, virtual assistants, and interactive games, users expect quick and responsive interactions. Low latency LLMs can provide instant feedback and responses, creating a more seamless and engaging user experience.\\n\\n2. Better decision-making: In time-sensitive scenarios, such as financial trading or autonomous vehicles, low latency LLMs can quickly process and analyze vast amounts of data, enabling faster and more informed decision-making.\\n\\n3. Enhanced accessibility: For individuals with disabilities, low latency LLMs can help create more responsive and inclusive interfaces, such as voice-controlled assistants or real-time captioning systems.\\n\\n4. Competitive advantage: In industries where real-time data analysis and decision-making are crucial, low latency LLMs can provide a competitive edge by enabling businesses to react more quickly to market changes, customer needs, or emerging opportunities.\\n\\n5. Scalability: Low latency LLMs can efficiently handle a higher volume of requests and interactions, making them more suitable for large-scale applications and services.\\n\\nIn summary, low latency is an essential aspect of LLMs, as it significantly impacts user experience, decision-making, accessibility, competitiveness, and scalability. By minimizing delays and response times, low latency LLMs can unlock new possibilities and applications for artificial intelligence in various industries and scenarios.')"
"Joke(setup='Why did the cat join a band?', punchline='Because it wanted to be the purr-cussionist!', rating=None)"
]
},
"execution_count": 29,
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"system = \"You are a helpful assistant.\"\n",
"human = \"{text}\"\n",
"prompt = ChatPromptTemplate.from_messages([(\"system\", system), (\"human\", human)])\n",
"from langchain_core.pydantic_v1 import BaseModel, Field\n",
"\n",
"chain = prompt | chat\n",
"chain.invoke({\"text\": \"Explain the importance of low latency LLMs.\"})"
"\n",
"class Joke(BaseModel):\n",
" \"\"\"Joke to tell user.\"\"\"\n",
"\n",
" setup: str = Field(description=\"The setup of the joke\")\n",
" punchline: str = Field(description=\"The punchline to the joke\")\n",
" rating: Optional[int] = Field(description=\"How funny the joke is, from 1 to 10\")\n",
"\n",
"\n",
"structured_llm = chat.with_structured_output(Joke)\n",
"\n",
"structured_llm.invoke(\"Tell me a joke about cats\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `ChatGroq` also supports async and streaming functionality:"
"Behind the scenes, this takes advantage of the above tool calling functionality.\n",
"\n",
"## Async"
]
},
{
"cell_type": "code",
"execution_count": 32,
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=\"There's a star that shines up in the sky,\\nThe Sun, that makes the day bright and spry.\\nIt rises and sets,\\nIn a daily, predictable bet,\\nGiving life to the world, oh my!\")"
"AIMessage(content='Here is a limerick about the sun:\\n\\nThere once was a sun in the sky,\\nWhose warmth and light caught the eye,\\nIt shone bright and bold,\\nWith a fiery gold,\\nAnd brought life to all, as it flew by.', response_metadata={'token_usage': {'completion_tokens': 51, 'prompt_tokens': 18, 'total_tokens': 69, 'completion_time': 0.144614022, 'prompt_time': 0.00585394, 'queue_time': None, 'total_time': 0.150467962}, 'model_name': 'llama3-70b-8192', 'system_fingerprint': 'fp_2f30b0b571', 'finish_reason': 'stop', 'logprobs': None}, id='run-e42340ba-f0ad-4b54-af61-8308d8ec8256-0')"
]
},
"execution_count": 32,
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chat = ChatGroq(temperature=0, model_name=\"mixtral-8x7b-32768\")\n",
"chat = ChatGroq(temperature=0, model=\"llama3-70b-8192\")\n",
"prompt = ChatPromptTemplate.from_messages([(\"human\", \"Write a Limerick about {topic}\")])\n",
"chain = prompt | chat\n",
"await chain.ainvoke({\"topic\": \"The Sun\"})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Streaming"
]
},
{
"cell_type": "code",
"execution_count": 33,
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The moon's gentle glow\n",
"Illuminates the night sky\n",
"Peaceful and serene"
"Silvery glow bright\n",
"Luna's gentle light shines down\n",
"Midnight's gentle queen"
]
}
],
"source": [
"chat = ChatGroq(temperature=0, model_name=\"llama2-70b-4096\")\n",
"chat = ChatGroq(temperature=0, model=\"llama3-70b-8192\")\n",
"prompt = ChatPromptTemplate.from_messages([(\"human\", \"Write a haiku about {topic}\")])\n",
"chain = prompt | chat\n",
"for chunk in chain.stream({\"topic\": \"The Moon\"}):\n",
" print(chunk.content, end=\"\", flush=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Passing custom parameters\n",
"\n",
"You can pass other Groq-specific parameters using the `model_kwargs` argument on initialization. Here's an example of enabling JSON mode:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='{ \"response\": \"That\\'s a tough question! There are eight species of bears found in the world, and each one is unique and amazing in its own way. However, if I had to pick one, I\\'d say the giant panda is a popular favorite among many people. Who can resist those adorable black and white markings?\", \"followup_question\": \"Would you like to know more about the giant panda\\'s habitat and diet?\" }', response_metadata={'token_usage': {'completion_tokens': 89, 'prompt_tokens': 50, 'total_tokens': 139, 'completion_time': 0.249032839, 'prompt_time': 0.011134497, 'queue_time': None, 'total_time': 0.260167336}, 'model_name': 'llama3-70b-8192', 'system_fingerprint': 'fp_2f30b0b571', 'finish_reason': 'stop', 'logprobs': None}, id='run-558ce67e-8c63-43fe-a48f-6ecf181bc922-0')"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chat = ChatGroq(\n",
" model=\"llama3-70b-8192\", model_kwargs={\"response_format\": {\"type\": \"json_object\"}}\n",
")\n",
"\n",
"system = \"\"\"\n",
"You are a helpful assistant.\n",
"Always respond with a JSON object with two string keys: \"response\" and \"followup_question\".\n",
"\"\"\"\n",
"human = \"{question}\"\n",
"prompt = ChatPromptTemplate.from_messages([(\"system\", system), (\"human\", human)])\n",
"\n",
"chain = prompt | chat\n",
"\n",
"chain.invoke({\"question\": \"what bear is best?\"})"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@ -171,7 +307,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
"version": "3.10.5"
}
},
"nbformat": 4,

@ -141,7 +141,7 @@ CHAT_MODEL_TEMPLATE = """\
---
sidebar_position: 0
sidebar_class_name: hidden
keywords: [compatibility, bind_tools, tool calling, function calling, structured output, with_structured_output, json mode, local model]
keywords: [compatibility]
custom_edit_url:
hide_table_of_contents: true
---

Loading…
Cancel
Save