docs[patch]: Lower temperature in chatbot usecase notebooks for consistency (#16750)

CC @baskaryan
Jacob Lee authored 5 months ago; committed by GitHub
parent 12d2b2ebcf
commit 4a027e622f
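The substantive change, repeated across both notebooks in this patch, is pinning the chat model's sampling temperature. A minimal before/after sketch of that one-line edit (the import, model name, and temperature value are taken verbatim from the hunks below; the comments are explanatory assumptions, not part of the patch):

from langchain_openai import ChatOpenAI

# Before: no explicit temperature, so the provider/library default applies
# and the example outputs in the notebook can vary from run to run.
chat = ChatOpenAI(model="gpt-3.5-turbo-1106")

# After: a low temperature keeps sampling close to greedy decoding, so the
# committed notebook outputs are easier to reproduce consistently.
chat = ChatOpenAI(model="gpt-3.5-turbo-1106", temperature=0.2)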

@@ -87,7 +87,7 @@
"source": [
"from langchain_openai import ChatOpenAI\n",
"\n",
"chat = ChatOpenAI(model=\"gpt-3.5-turbo-1106\")"
"chat = ChatOpenAI(model=\"gpt-3.5-turbo-1106\", temperature=0.2)"
]
},
{
@@ -105,7 +105,7 @@
{
"data": {
"text/plain": [
"AIMessage(content=\"J'aime programmer.\")"
"AIMessage(content=\"J'adore la programmation.\")"
]
},
"execution_count": 3,
@@ -140,7 +140,7 @@
{
"data": {
"text/plain": [
"AIMessage(content='I said: \"What did you just say?\"')"
"AIMessage(content='I said, \"What did you just say?\"')"
]
},
"execution_count": 4,
@@ -316,7 +316,7 @@
{
"data": {
"text/plain": [
"AIMessage(content='\"I love programming\" translates to \"J\\'adore programmer\" in French.')"
"AIMessage(content='The translation of \"I love programming\" in French is \"J\\'adore la programmation.\"')"
]
},
"execution_count": 9,
@@ -342,7 +342,7 @@
{
"data": {
"text/plain": [
"AIMessage(content='I said, \"I love programming\" translates to \"J\\'adore programmer\" in French.')"
"AIMessage(content='I said \"J\\'adore la programmation,\" which means \"I love programming\" in French.')"
]
},
"execution_count": 10,
@@ -535,7 +535,7 @@
{
"data": {
"text/plain": [
"'LangSmith can assist with testing in several ways. Firstly, it simplifies the construction of datasets for testing and evaluation, allowing you to quickly edit examples and add them to datasets. This helps expand the surface area of your evaluation sets and fine-tune your model for improved quality or reduced costs.\\n\\nAdditionally, LangSmith can be used to monitor your application in much the same way as it is used for debugging. You can log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise. This monitoring capability can be invaluable for testing the performance and reliability of your application as it moves towards production.'"
"'LangSmith can help with testing in several ways:\\n\\n1. Dataset Expansion: LangSmith allows you to quickly edit examples and add them to datasets, which expands the surface area of your evaluation sets. This helps in testing a wider range of scenarios and inputs.\\n\\n2. Model Fine-Tuning: You can use LangSmith to fine-tune a model for improved quality or reduced costs, which is essential for testing changes and ensuring optimized performance.\\n\\n3. Monitoring: LangSmith can be used to monitor your application by logging all traces, visualizing latency and token usage statistics, and troubleshooting specific issues as they arise. This monitoring capability is crucial for testing and evaluating the performance of your application in production.\\n\\n4. Construction of Datasets: LangSmith simplifies the process of constructing datasets, either by using existing datasets or by hand-crafting small datasets for rigorous testing of changes.\\n\\nOverall, LangSmith provides tools and capabilities that support thorough testing of applications and models, ultimately contributing to the reliability and performance of your systems.'"
]
},
"execution_count": 17,
@@ -606,7 +606,7 @@
" Document(page_content='inputs, and see what happens. At some point though, our application is performing\\nwell and we want to be more rigorous about testing changes. We can use a dataset\\nthat weve constructed along the way (see above). Alternatively, we could spend some\\ntime constructing a small dataset by hand. For these situations, LangSmith simplifies', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='Skip to main content🦜🛠 LangSmith DocsPython DocsJS/TS DocsSearchGo to AppLangSmithOverviewTracingTesting & EvaluationOrganizationsHubLangSmith CookbookOverviewOn this pageLangSmith Overview and User GuideBuilding reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.Over the past two months, we at LangChain', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='have been building and using LangSmith with the goal of bridging this gap. This is our tactical user guide to outline effective ways to use LangSmith and maximize its benefits.On by default\\u200bAt LangChain, all of us have LangSmiths tracing running in the background by default. On the Python side, this is achieved by setting environment variables, which we establish whenever we launch a virtual environment or open our bash shell and leave them set. The same principle applies to most JavaScript', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'})],\n",
" 'answer': 'LangSmith can aid in testing by providing the ability to quickly edit examples and add them to datasets, thereby expanding the range of evaluation sets. This feature enables you to fine-tune a model for improved quality or reduced costs. Additionally, LangSmith simplifies the process of constructing small datasets by hand, offering an alternative approach for rigorous testing of changes. It also facilitates monitoring of application performance, allowing you to log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise.'}"
" 'answer': 'LangSmith can help with testing by simplifying the process of constructing and using datasets for testing and evaluation. It allows you to quickly edit examples and add them to datasets, thereby expanding the surface area of your evaluation sets. This can be useful for fine-tuning a model for improved quality or reduced costs. Additionally, LangSmith enables monitoring of your application by allowing you to log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise. This helps in rigorously testing changes and ensuring that your application performs well in production.'}"
]
},
"execution_count": 19,
@@ -633,13 +633,13 @@
"data": {
"text/plain": [
"{'messages': [HumanMessage(content='how can langsmith help with testing?'),\n",
" AIMessage(content='LangSmith can aid in testing by providing the ability to quickly edit examples and add them to datasets, thereby expanding the range of evaluation sets. This feature enables you to fine-tune a model for improved quality or reduced costs. Additionally, LangSmith simplifies the process of constructing small datasets by hand, offering an alternative approach for rigorous testing of changes. It also facilitates monitoring of application performance, allowing you to log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise.'),\n",
" AIMessage(content='LangSmith can help with testing by simplifying the process of constructing and using datasets for testing and evaluation. It allows you to quickly edit examples and add them to datasets, thereby expanding the surface area of your evaluation sets. This can be useful for fine-tuning a model for improved quality or reduced costs. Additionally, LangSmith enables monitoring of your application by allowing you to log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise. This helps in rigorously testing changes and ensuring that your application performs well in production.'),\n",
" HumanMessage(content='tell me more about that!')],\n",
" 'context': [Document(page_content='however, there is still no complete substitute for human review to get the utmost quality and reliability from your application.', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='You can also quickly edit examples and add them to datasets to expand the surface area of your evaluation sets or to fine-tune a model for improved quality or reduced costs.Monitoring\\u200bAfter all this, your app might finally ready to go in production. LangSmith can also be used to monitor your application in much the same way that you used for debugging. You can log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise. Each run can also be', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content=\"against these known issues.Why is this so impactful? When building LLM applications, its often common to start without a dataset of any kind. This is part of the power of LLMs! They are amazing zero-shot learners, making it possible to get started as easily as possible. But this can also be a curse -- as you adjust the prompt, you're wandering blind. You dont have any examples to benchmark your changes against.LangSmith addresses this problem by including an “Add to Dataset” button for each\", metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='environments through process.env1.The benefit here is that all calls to LLMs, chains, agents, tools, and retrievers are logged to LangSmith. Around 90% of the time we dont even look at the traces, but the 10% of the time that we do… its so helpful. We can use LangSmith to debug:An unexpected end resultWhy an agent is loopingWhy a chain was slower than expectedHow many tokens an agent usedDebugging\\u200bDebugging LLMs, chains, and agents can be tough. LangSmith helps solve the following pain', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'})],\n",
" 'answer': 'LangSmith offers a feature that allows users to quickly edit examples and add them to datasets, which can help expand the surface area of evaluation sets or fine-tune a model for improved quality or reduced costs. This enables users to build small datasets by hand, providing a means for rigorous testing of changes or improvements to their application. Additionally, LangSmith facilitates the monitoring of application performance, allowing users to log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise. This comprehensive approach ensures thorough testing and monitoring of your application.'}"
" Document(page_content='playground. Here, you can modify the prompt and re-run it to observe the resulting changes to the output - as many times as needed!Currently, this feature supports only OpenAI and Anthropic models and works for LLM and Chat Model calls. We plan to extend its functionality to more LLM types, chains, agents, and retrievers in the future.What is the exact sequence of events?\\u200bIn complicated chains and agents, it can often be hard to understand what is going on under the hood. What calls are being', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'})],\n",
" 'answer': 'LangSmith facilitates testing by providing the capability to edit examples and add them to datasets, which in turn expands the range of scenarios for evaluating your application. This feature enables you to fine-tune your model for better quality and cost-effectiveness. Additionally, LangSmith allows for monitoring applications by logging traces, visualizing latency and token usage statistics, and troubleshooting specific issues as they arise. This comprehensive testing and monitoring functionality ensure that your application is robust and performs optimally in a production environment.'}"
]
},
"execution_count": 20,
@@ -676,7 +676,7 @@
{
"data": {
"text/plain": [
"'LangSmith offers the capability to edit examples and add them to datasets, which is beneficial for expanding the range of evaluation sets. This feature allows for the fine-tuning of models, enabling improved quality and reduced costs. Additionally, LangSmith simplifies the process of constructing small datasets by hand, providing an alternative approach for rigorous testing of changes. It also supports monitoring of application performance, including logging traces, visualizing latency and token usage statistics, and troubleshooting specific issues as they arise. This comprehensive functionality contributes to the effectiveness of testing and quality assurance processes.'"
"'LangSmith provides a convenient way to modify prompts and re-run them to observe the resulting changes to the output. This \"Add to Dataset\" feature allows you to iteratively adjust the prompt and observe the impact on the output, enabling you to test and refine your prompts as many times as needed. This is particularly valuable when working with language model applications, as it helps address the challenge of starting without a dataset and allows for benchmarking changes against existing examples. Currently, this feature supports OpenAI and Anthropic models for LLM and Chat Model calls, with plans to extend its functionality to more LLM types, chains, agents, and retrievers in the future. Overall, LangSmith\\'s testing capabilities provide a valuable tool for refining and optimizing language model applications.'"
]
},
"execution_count": 21,
@@ -742,7 +742,7 @@
"[Document(page_content='however, there is still no complete substitute for human review to get the utmost quality and reliability from your application.', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='You can also quickly edit examples and add them to datasets to expand the surface area of your evaluation sets or to fine-tune a model for improved quality or reduced costs.Monitoring\\u200bAfter all this, your app might finally ready to go in production. LangSmith can also be used to monitor your application in much the same way that you used for debugging. You can log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise. Each run can also be', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content=\"against these known issues.Why is this so impactful? When building LLM applications, its often common to start without a dataset of any kind. This is part of the power of LLMs! They are amazing zero-shot learners, making it possible to get started as easily as possible. But this can also be a curse -- as you adjust the prompt, you're wandering blind. You dont have any examples to benchmark your changes against.LangSmith addresses this problem by including an “Add to Dataset” button for each\", metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='environments through process.env1.The benefit here is that all calls to LLMs, chains, agents, tools, and retrievers are logged to LangSmith. Around 90% of the time we dont even look at the traces, but the 10% of the time that we do… its so helpful. We can use LangSmith to debug:An unexpected end resultWhy an agent is loopingWhy a chain was slower than expectedHow many tokens an agent usedDebugging\\u200bDebugging LLMs, chains, and agents can be tough. LangSmith helps solve the following pain', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'})]"
" Document(page_content='playground. Here, you can modify the prompt and re-run it to observe the resulting changes to the output - as many times as needed!Currently, this feature supports only OpenAI and Anthropic models and works for LLM and Chat Model calls. We plan to extend its functionality to more LLM types, chains, agents, and retrievers in the future.What is the exact sequence of events?\\u200bIn complicated chains and agents, it can often be hard to understand what is going on under the hood. What calls are being', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'})]"
]
},
"execution_count": 23,
@@ -763,7 +763,7 @@
},
{
"cell_type": "code",
"execution_count": 27,
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
@@ -772,7 +772,7 @@
"\n",
"# We need a prompt that we can pass into an LLM to generate a transformed search query\n",
"\n",
"chat = ChatOpenAI(model=\"gpt-3.5-turbo-1106\")\n",
"chat = ChatOpenAI(model=\"gpt-3.5-turbo-1106\", temperature=0.2)\n",
"\n",
"query_transform_prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
@@ -805,7 +805,7 @@
},
{
"cell_type": "code",
"execution_count": 28,
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
@@ -829,22 +829,22 @@
},
{
"cell_type": "code",
"execution_count": 29,
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'messages': [HumanMessage(content='how can langsmith help with testing?'),\n",
" AIMessage(content='LangSmith can help with testing by allowing you to quickly edit examples and add them to datasets to expand the surface area of your evaluation sets. This means you can fine-tune a model for improved quality or reduced costs. Additionally, LangSmith provides monitoring capabilities to log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise during testing. This allows you to monitor your application and ensure that it is performing as expected, making it easier to identify and address any issues that may arise during testing.')],\n",
" AIMessage(content='LangSmith can help with testing by providing tracing capabilities that allow you to monitor and log all traces, visualize latency, and track token usage statistics. This can be invaluable for testing the performance and reliability of your prompts, chains, and agents. Additionally, LangSmith enables you to troubleshoot specific issues as they arise during testing, allowing for more effective debugging and optimization of your applications.')],\n",
" 'context': [Document(page_content='You can also quickly edit examples and add them to datasets to expand the surface area of your evaluation sets or to fine-tune a model for improved quality or reduced costs.Monitoring\\u200bAfter all this, your app might finally ready to go in production. LangSmith can also be used to monitor your application in much the same way that you used for debugging. You can log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise. Each run can also be', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='Skip to main content🦜🛠 LangSmith DocsPython DocsJS/TS DocsSearchGo to AppLangSmithOverviewTracingTesting & EvaluationOrganizationsHubLangSmith CookbookOverviewOn this pageLangSmith Overview and User GuideBuilding reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.Over the past two months, we at LangChain', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='LangSmith Overview and User Guide | 🦜️🛠️ LangSmith', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='have been building and using LangSmith with the goal of bridging this gap. This is our tactical user guide to outline effective ways to use LangSmith and maximize its benefits.On by default\\u200bAt LangChain, all of us have LangSmiths tracing running in the background by default. On the Python side, this is achieved by setting environment variables, which we establish whenever we launch a virtual environment or open our bash shell and leave them set. The same principle applies to most JavaScript', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'})],\n",
" 'answer': 'LangSmith can help with testing by allowing you to quickly edit examples and add them to datasets to expand the surface area of your evaluation sets. This means you can fine-tune a model for improved quality or reduced costs. Additionally, LangSmith provides monitoring capabilities to log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise during testing. This allows you to monitor your application and ensure that it is performing as expected, making it easier to identify and address any issues that may arise during testing.'}"
" 'answer': 'LangSmith can help with testing by providing tracing capabilities that allow you to monitor and log all traces, visualize latency, and track token usage statistics. This can be invaluable for testing the performance and reliability of your prompts, chains, and agents. Additionally, LangSmith enables you to troubleshoot specific issues as they arise during testing, allowing for more effective debugging and optimization of your applications.'}"
]
},
"execution_count": 29,
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
@@ -863,23 +863,23 @@
},
{
"cell_type": "code",
"execution_count": 30,
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'messages': [HumanMessage(content='how can langsmith help with testing?'),\n",
" AIMessage(content='LangSmith can help with testing by allowing you to quickly edit examples and add them to datasets to expand the surface area of your evaluation sets. This means you can fine-tune a model for improved quality or reduced costs. Additionally, LangSmith provides monitoring capabilities to log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise during testing. This allows you to monitor your application and ensure that it is performing as expected, making it easier to identify and address any issues that may arise during testing.'),\n",
" AIMessage(content='LangSmith can help with testing by providing tracing capabilities that allow you to monitor and log all traces, visualize latency, and track token usage statistics. This can be invaluable for testing the performance and reliability of your prompts, chains, and agents. Additionally, LangSmith enables you to troubleshoot specific issues as they arise during testing, allowing for more effective debugging and optimization of your applications.'),\n",
" HumanMessage(content='tell me more about that!')],\n",
" 'context': [Document(page_content='You can also quickly edit examples and add them to datasets to expand the surface area of your evaluation sets or to fine-tune a model for improved quality or reduced costs.Monitoring\\u200bAfter all this, your app might finally ready to go in production. LangSmith can also be used to monitor your application in much the same way that you used for debugging. You can log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise. Each run can also be', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='inputs, and see what happens. At some point though, our application is performing\\nwell and we want to be more rigorous about testing changes. We can use a dataset\\nthat weve constructed along the way (see above). Alternatively, we could spend some\\ntime constructing a small dataset by hand. For these situations, LangSmith simplifies', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='datasets\\u200bLangSmith makes it easy to curate datasets. However, these arent just useful inside LangSmith; they can be exported for use in other contexts. Notable applications include exporting for use in OpenAI Evals or fine-tuning, such as with FireworksAI.To set up tracing in Deno, web browsers, or other runtime environments without access to the environment, check out the FAQs.↩PreviousLangSmithNextTracingOn by defaultDebuggingWhat was the exact input to the LLM?If I edit the prompt, how does', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='LangSmith makes it easy to manually review and annotate runs through annotation queues.These queues allow you to select any runs based on criteria like model type or automatic evaluation scores, and queue them up for human review. As a reviewer, you can then quickly step through the runs, viewing the input, output, and any existing tags before adding your own feedback.We often use this for a couple of reasons:To assess subjective qualities that automatic evaluators struggle with, like', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'})],\n",
" 'answer': 'Certainly! LangSmith simplifies the process of constructing and curating datasets, which can be used for testing purposes. You can use the datasets constructed in LangSmith or spend some time manually constructing a small dataset by hand. These datasets can then be used to rigorously test changes in your application.\\n\\nFurthermore, LangSmith allows for manual review and annotation of runs through annotation queues. This feature enables you to select runs based on criteria like model type or automatic evaluation scores and queue them up for human review. As a reviewer, you can quickly step through the runs, view the input, output, and any existing tags, and add your own feedback. This is particularly useful for assessing subjective qualities that automatic evaluators may struggle with during testing.\\n\\nOverall, LangSmith provides a comprehensive set of tools to aid in testing, from dataset construction to manual review and annotation, making the testing process more efficient and effective.'}"
" Document(page_content='environments through process.env1.The benefit here is that all calls to LLMs, chains, agents, tools, and retrievers are logged to LangSmith. Around 90% of the time we dont even look at the traces, but the 10% of the time that we do… its so helpful. We can use LangSmith to debug:An unexpected end resultWhy an agent is loopingWhy a chain was slower than expectedHow many tokens an agent usedDebugging\\u200bDebugging LLMs, chains, and agents can be tough. LangSmith helps solve the following pain', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='Skip to main content🦜🛠 LangSmith DocsPython DocsJS/TS DocsSearchGo to AppLangSmithOverviewTracingTesting & EvaluationOrganizationsHubLangSmith CookbookOverviewOn this pageLangSmith Overview and User GuideBuilding reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.Over the past two months, we at LangChain', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='have been building and using LangSmith with the goal of bridging this gap. This is our tactical user guide to outline effective ways to use LangSmith and maximize its benefits.On by default\\u200bAt LangChain, all of us have LangSmiths tracing running in the background by default. On the Python side, this is achieved by setting environment variables, which we establish whenever we launch a virtual environment or open our bash shell and leave them set. The same principle applies to most JavaScript', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'})],\n",
" 'answer': \"Certainly! LangSmith's tracing capabilities allow you to monitor and log all traces of your application, providing visibility into the behavior and performance of your prompts, chains, and agents during testing. This can help you identify any unexpected end results, performance bottlenecks, or excessive token usage. By visualizing latency and token usage statistics, you can gain insights into the efficiency and resource consumption of your application, enabling you to make informed decisions about optimization and fine-tuning. Additionally, LangSmith's troubleshooting features empower you to address specific issues that may arise during testing, ultimately contributing to the reliability and quality of your applications.\"}"
]
},
"execution_count": 30,
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}

@@ -71,7 +71,7 @@
"source": [
"from langchain_openai import ChatOpenAI\n",
"\n",
"chat = ChatOpenAI(model=\"gpt-3.5-turbo-1106\")"
"chat = ChatOpenAI(model=\"gpt-3.5-turbo-1106\", temperature=0.2)"
]
},
{
@@ -229,7 +229,7 @@
{
"data": {
"text/plain": [
"'Yes, LangSmith can help test and evaluate your LLM applications. It simplifies the initial setup, and you can also quickly edit examples and add them to datasets to expand the surface area of your evaluation sets or to fine-tune a model for improved quality or reduced costs. Additionally, LangSmith can be used to monitor your application, log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise.'"
"'Yes, LangSmith can help test and evaluate your LLM applications. It simplifies the initial setup, and you can use it to monitor your application, log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise.'"
]
},
"execution_count": 8,
@@ -265,7 +265,7 @@
{
"data": {
"text/plain": [
"\"I don't have enough information to answer your question.\""
"\"I don't know about LangSmith's specific capabilities for testing LLM applications. It's best to directly reach out to LangSmith or check their website for information on their services related to LLM application testing.\""
]
},
"execution_count": 9,
@@ -339,7 +339,7 @@
" Document(page_content='LangSmith Overview and User Guide | 🦜️🛠️ LangSmith', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='You can also quickly edit examples and add them to datasets to expand the surface area of your evaluation sets or to fine-tune a model for improved quality or reduced costs.Monitoring\\u200bAfter all this, your app might finally ready to go in production. LangSmith can also be used to monitor your application in much the same way that you used for debugging. You can log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise. Each run can also be', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content=\"does that affect the output?\\u200bSo you notice a bad output, and you go into LangSmith to see what's going on. You find the faulty LLM call and are now looking at the exact input. You want to try changing a word or a phrase to see what happens -- what do you do?We constantly ran into this issue. Initially, we copied the prompt to a playground of sorts. But this got annoying, so we built a playground of our own! When examining an LLM call, you can click the Open in Playground button to access this\", metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'})],\n",
" 'answer': 'Yes, LangSmith can help test and evaluate your LLM applications. It simplifies the initial setup, and you can use it to monitor your application, log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise.'}"
" 'answer': 'Yes, LangSmith can help test and evaluate your LLM applications. It simplifies the initial setup and provides features for monitoring your application, logging all traces, visualizing latency and token usage statistics, and troubleshooting specific issues. It can also be used to edit examples and add them to datasets to expand the surface area of your evaluation sets or to fine-tune a model for improved quality or reduced costs.'}"
]
},
"execution_count": 11,
@@ -537,7 +537,7 @@
" Document(page_content='Skip to main content🦜🛠 LangSmith DocsPython DocsJS/TS DocsSearchGo to AppLangSmithOverviewTracingTesting & EvaluationOrganizationsHubLangSmith CookbookOverviewOn this pageLangSmith Overview and User GuideBuilding reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.Over the past two months, we at LangChain', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='You can also quickly edit examples and add them to datasets to expand the surface area of your evaluation sets or to fine-tune a model for improved quality or reduced costs.Monitoring\\u200bAfter all this, your app might finally ready to go in production. LangSmith can also be used to monitor your application in much the same way that you used for debugging. You can log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise. Each run can also be', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='datasets\\u200bLangSmith makes it easy to curate datasets. However, these arent just useful inside LangSmith; they can be exported for use in other contexts. Notable applications include exporting for use in OpenAI Evals or fine-tuning, such as with FireworksAI.To set up tracing in Deno, web browsers, or other runtime environments without access to the environment, check out the FAQs.↩PreviousLangSmithNextTracingOn by defaultDebuggingWhat was the exact input to the LLM?If I edit the prompt, how does', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'})],\n",
" 'answer': 'Yes, LangSmith offers testing and evaluation features to help you assess and improve the performance of your LLM applications. It can assist in monitoring your application, logging traces, visualizing latency and token usage statistics, and troubleshooting specific issues as they arise.'}"
" 'answer': 'Yes, LangSmith can help test and evaluate your LLM applications. It provides tools for monitoring your application, logging traces, visualizing latency and token usage statistics, and troubleshooting specific issues as they arise. Additionally, you can quickly edit examples and add them to datasets to expand the surface area of your evaluation sets or to fine-tune a model for improved quality or reduced costs.'}"
]
},
"execution_count": 16,
@@ -570,7 +570,7 @@
" Document(page_content='You can also quickly edit examples and add them to datasets to expand the surface area of your evaluation sets or to fine-tune a model for improved quality or reduced costs.Monitoring\\u200bAfter all this, your app might finally ready to go in production. LangSmith can also be used to monitor your application in much the same way that you used for debugging. You can log all traces, visualize latency and token usage statistics, and troubleshoot specific issues as they arise. Each run can also be', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='Skip to main content🦜🛠 LangSmith DocsPython DocsJS/TS DocsSearchGo to AppLangSmithOverviewTracingTesting & EvaluationOrganizationsHubLangSmith CookbookOverviewOn this pageLangSmith Overview and User GuideBuilding reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.Over the past two months, we at LangChain', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'}),\n",
" Document(page_content='LangSmith makes it easy to manually review and annotate runs through annotation queues.These queues allow you to select any runs based on criteria like model type or automatic evaluation scores, and queue them up for human review. As a reviewer, you can then quickly step through the runs, viewing the input, output, and any existing tags before adding your own feedback.We often use this for a couple of reasons:To assess subjective qualities that automatic evaluators struggle with, like', metadata={'description': 'Building reliable LLM applications can be challenging. LangChain simplifies the initial setup, but there is still work needed to bring the performance of prompts, chains and agents up the level where they are reliable enough to be used in production.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/overview', 'title': 'LangSmith Overview and User Guide | 🦜️🛠️ LangSmith'})],\n",
" 'answer': 'LangSmith simplifies the initial setup for building reliable LLM applications, but it acknowledges that there is still work needed to bring the performance of prompts, chains, and agents up to the level where they are reliable enough to be used in production. It also provides the ability to manually review and annotate runs through annotation queues, allowing you to select runs based on criteria like model type or automatic evaluation scores, and queue them up for human review. As a reviewer, you can step through the runs, view the input, output, and any existing tags before adding your own feedback. This feature is particularly useful for assessing subjective qualities that automatic evaluators struggle with.'}"
" 'answer': 'LangSmith simplifies the initial setup for building reliable LLM applications. It provides features for manually reviewing and annotating runs through annotation queues, allowing you to select runs based on criteria like model type or automatic evaluation scores, and queue them up for human review. As a reviewer, you can quickly step through the runs, view the input, output, and any existing tags before adding your own feedback. This can be especially useful for assessing subjective qualities that automatic evaluators struggle with.'}"
]
},
"execution_count": 17,
@@ -628,50 +628,16 @@
"{'answer': ' L'}\n",
"{'answer': 'LM'}\n",
"{'answer': ' applications'}\n",
"{'answer': ','}\n",
"{'answer': ' but'}\n",
"{'answer': ' there'}\n",
"{'answer': ' is'}\n",
"{'answer': ' still'}\n",
"{'answer': ' work'}\n",
"{'answer': ' needed'}\n",
"{'answer': ' to'}\n",
"{'answer': ' bring'}\n",
"{'answer': ' the'}\n",
"{'answer': ' performance'}\n",
"{'answer': ' of'}\n",
"{'answer': ' prompts'}\n",
"{'answer': ','}\n",
"{'answer': ' chains'}\n",
"{'answer': ','}\n",
"{'answer': ' and'}\n",
"{'answer': ' agents'}\n",
"{'answer': ' up'}\n",
"{'answer': ' to'}\n",
"{'answer': ' the'}\n",
"{'answer': ' level'}\n",
"{'answer': ' where'}\n",
"{'answer': ' they'}\n",
"{'answer': ' are'}\n",
"{'answer': ' reliable'}\n",
"{'answer': ' enough'}\n",
"{'answer': ' to'}\n",
"{'answer': ' be'}\n",
"{'answer': ' used'}\n",
"{'answer': ' in'}\n",
"{'answer': ' production'}\n",
"{'answer': '.'}\n",
"{'answer': ' Lang'}\n",
"{'answer': 'Smith'}\n",
"{'answer': ' also'}\n",
"{'answer': ' It'}\n",
"{'answer': ' provides'}\n",
"{'answer': ' the'}\n",
"{'answer': ' capability'}\n",
"{'answer': ' to'}\n",
"{'answer': ' features'}\n",
"{'answer': ' for'}\n",
"{'answer': ' manually'}\n",
"{'answer': ' review'}\n",
"{'answer': ' reviewing'}\n",
"{'answer': ' and'}\n",
"{'answer': ' annotate'}\n",
"{'answer': ' annot'}\n",
"{'answer': 'ating'}\n",
"{'answer': ' runs'}\n",
"{'answer': ' through'}\n",
"{'answer': ' annotation'}\n",
@ -692,13 +658,47 @@
"{'answer': ' automatic'}\n",
"{'answer': ' evaluation'}\n",
"{'answer': ' scores'}\n",
"{'answer': ','}\n",
"{'answer': ' and'}\n",
"{'answer': ' queue'}\n",
"{'answer': ' them'}\n",
"{'answer': ' up'}\n",
"{'answer': ' for'}\n",
"{'answer': ' human'}\n",
"{'answer': ' review'}\n",
"{'answer': '.'}\n",
"{'answer': ' As'}\n",
"{'answer': ' a'}\n",
"{'answer': ' reviewer'}\n",
"{'answer': ','}\n",
"{'answer': ' you'}\n",
"{'answer': ' can'}\n",
"{'answer': ' quickly'}\n",
"{'answer': ' step'}\n",
"{'answer': ' through'}\n",
"{'answer': ' the'}\n",
"{'answer': ' runs'}\n",
"{'answer': ','}\n",
"{'answer': ' view'}\n",
"{'answer': ' the'}\n",
"{'answer': ' input'}\n",
"{'answer': ','}\n",
"{'answer': ' output'}\n",
"{'answer': ','}\n",
"{'answer': ' and'}\n",
"{'answer': ' any'}\n",
"{'answer': ' existing'}\n",
"{'answer': ' tags'}\n",
"{'answer': ' before'}\n",
"{'answer': ' adding'}\n",
"{'answer': ' your'}\n",
"{'answer': ' own'}\n",
"{'answer': ' feedback'}\n",
"{'answer': '.'}\n",
"{'answer': ' This'}\n",
"{'answer': ' feature'}\n",
"{'answer': ' is'}\n",
"{'answer': ' can'}\n",
"{'answer': ' be'}\n",
"{'answer': ' particularly'}\n",
"{'answer': ' useful'}\n",
"{'answer': ' for'}\n",
"{'answer': ' assessing'}\n",
