I made a couple of improvements to the Comet tracker:
* The Comet project name is configurable in several ways (code,
environment variable, or file). Having a default value in code meant
that users couldn't set the project name through an environment variable
or a file.
* I added error catching when `flush_tracker` is called, to avoid
crashing the whole process. Instead, we now display a warning or error
log message (`extra={"show_traceback": True}` is an internal convention
that forces the traceback to be displayed when using our own logger).
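As a rough illustration of the error catching described in the second
bullet, here is a minimal sketch. The helper name `flush_tracker_safely`
and the `flush` callable are hypothetical and are not the callback's
actual code; only the `extra={"show_traceback": True}` convention comes
from the description above.

```python
import logging

LOGGER = logging.getLogger(__name__)


def flush_tracker_safely(flush) -> bool:
    """Run `flush` and log any failure instead of letting it propagate.

    `flush` is a hypothetical zero-argument callable standing in for the
    real flush logic. Returns True on success, False on failure.
    """
    try:
        flush()
        return True
    except Exception:
        # The `extra` key follows the internal convention mentioned above;
        # the standard library simply attaches it to the log record.
        LOGGER.error(
            "Failed to flush the tracker",
            exc_info=True,
            extra={"show_traceback": True},
        )
        return False


def failing_flush():
    # Simulates the kind of failure shown in the traceback.
    raise NotImplementedError


flush_ok = flush_tracker_safely(lambda: None)
flush_failed = flush_tracker_safely(failing_flush)
```

With this shape, a failure while exporting the agent is reported in the
logs and the surrounding process keeps running.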
I decided to add the error catching after seeing the following error in
the third example of the notebook:
```
COMET ERROR: Failed to export agent or LLM to Comet
Traceback (most recent call last):
File "/home/lothiraldan/project/cometml/langchain/langchain/callbacks/comet_ml_callback.py", line 484, in _log_model
langchain_asset.save(langchain_asset_path)
File "/home/lothiraldan/project/cometml/langchain/langchain/agents/agent.py", line 591, in save
raise ValueError(
ValueError: Saving not supported for agent executors. If you are trying to save the agent, please use the `.save_agent(...)`
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/lothiraldan/project/cometml/langchain/langchain/callbacks/comet_ml_callback.py", line 449, in flush_tracker
self._log_model(langchain_asset)
File "/home/lothiraldan/project/cometml/langchain/langchain/callbacks/comet_ml_callback.py", line 488, in _log_model
langchain_asset.save_agent(langchain_asset_path)
File "/home/lothiraldan/project/cometml/langchain/langchain/agents/agent.py", line 599, in save_agent
return self.agent.save(file_path)
File "/home/lothiraldan/project/cometml/langchain/langchain/agents/agent.py", line 145, in save
agent_dict = self.dict()
File "/home/lothiraldan/project/cometml/langchain/langchain/agents/agent.py", line 119, in dict
_dict = super().dict()
File "pydantic/main.py", line 449, in pydantic.main.BaseModel.dict
File "pydantic/main.py", line 868, in _iter
File "pydantic/main.py", line 743, in pydantic.main.BaseModel._get_value
File "/home/lothiraldan/project/cometml/langchain/langchain/schema.py", line 381, in dict
output_parser_dict["_type"] = self._type
File "/home/lothiraldan/project/cometml/langchain/langchain/schema.py", line 376, in _type
raise NotImplementedError
NotImplementedError
```
I still need to investigate and try to fix it; it looks related to
saving an agent to a file.
## Use `index_id` over `app_id`
We made a major update to indexing and retrieval, which are now based on
Metal Indexes (instead of apps). With this change, each of our core APIs
accepts an index instead of an app. [More details
here](https://docs.getmetal.io/api-reference/core/indexing).
## What is this PR for:
* This PR adds a commented line of code in the documentation that shows
how someone can use the Pinecone client with an already existing
Pinecone index
* The documentation currently only shows how to create a Pinecone index
from LangChain documents, but not how to load one that already exists
Sometimes the LLM response (generated code) is missing the closing
ticks "```", which causes the text parsing to fail with a "not enough
values to unpack" error.
The two extra `_` placeholders don't add value and can cause errors. I
suggest replacing `_, action, _` with a single `action` obtained by
index.
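To make the failure mode concrete, here is a small self-contained
sketch. The function names and sample strings are made up for
illustration and are not the parser's actual code. It contrasts
three-way unpacking with indexing into the split result.

```python
FENCE = "`" * 3  # the triple-backtick delimiter, built here to keep the example readable


def extract_action_strict(text: str) -> str:
    # Three-way unpacking requires both an opening and a closing fence.
    _, action, _ = text.split(FENCE)
    return action.strip()


def extract_action(text: str) -> str:
    # Indexing only requires the opening fence, so a missing closing
    # fence no longer breaks parsing.
    return text.split(FENCE)[1].strip()


complete = "Action:\n" + FENCE + '\n{"action": "Search"}\n' + FENCE
truncated = "Action:\n" + FENCE + '\n{"action": "Search"}\n'

try:
    extract_action_strict(truncated)
    strict_failed = False
except ValueError:
    # "not enough values to unpack (expected 3, got 2)"
    strict_failed = True

parsed_complete = extract_action(complete)
parsed_truncated = extract_action(truncated)
```

Note that the indexed version still raises `IndexError` if the response
contains no fence at all, so that case would need separate handling.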
Fixes issue #3057
This pull request addresses the need to share a single `chromadb.Client`
instance across multiple instances of the `Chroma` class. By
implementing a shared client, we can maintain consistency and reduce
resource usage when multiple instances of the `Chroma` class are
created. This is especially relevant in a web app, where having multiple
`Chroma` instances with a `persist_directory` leads to these clients not
being synced.
This PR implements this option while keeping the rest of the
architecture unchanged.
**Changes:**
1. Add a client attribute to the `Chroma` class to store the shared
`chromadb.Client` instance.
2. Modify the `from_documents` method to accept an optional client
parameter.
3. Update the `from_documents` method to use the shared client if
provided or create a new client if not provided.
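The three changes above follow a common "optional shared client"
pattern. The sketch below illustrates that pattern with stand-in classes
(`FakeClient` and `Store` are hypothetical and do not reproduce the real
`chromadb.Client` or `Chroma` APIs).

```python
from typing import List, Optional


class FakeClient:
    """Hypothetical stand-in for a database client such as `chromadb.Client`."""

    def __init__(self) -> None:
        self.collections: dict = {}


class Store:
    """Hypothetical stand-in for a vector store wrapper such as `Chroma`."""

    def __init__(self, client: Optional[FakeClient] = None) -> None:
        # Use the shared client if one is provided, otherwise create a new one.
        self._client = client if client is not None else FakeClient()

    @classmethod
    def from_documents(
        cls, documents: List[str], client: Optional[FakeClient] = None
    ) -> "Store":
        store = cls(client=client)
        store._client.collections.setdefault("default", []).extend(documents)
        return store


shared = FakeClient()
first = Store.from_documents(["doc a"], client=shared)
second = Store.from_documents(["doc b"], client=shared)
standalone = Store.from_documents(["doc c"])
```

Here `first` and `second` see the same underlying data because they
share one client, while `standalone` keeps the previous behavior of
owning its own client.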
Let me know if anything needs to be modified - thanks again for your
work on this incredible repo
This PR builds on @jzluo's PR #2748, which addressed dialect-specific
issues with SQL prompts, and adds a prompt that uses backticks for
column names when querying BigQuery. See [GoogleSQL quoted
identifiers](https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical#quoted_identifiers).
Additionally, the SQL agent currently uses a generic prompt. Not sure
how best to adopt the same optional dialect-specific prompts as above,
but will consider making an issue and PR for that too. See
[langchain/agents/agent_toolkits/sql/prompt.py](langchain/agents/agent_toolkits/sql/prompt.py).
### Description
Pass kwargs to get OpenSearch client from `from_texts` function
### Issues Resolved
https://github.com/hwchase17/langchain/issues/2819
Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
`langchain.prompts.PromptTemplate` is unable to infer `input_variables`
from a jinja2 template.
```python
# Using langchain v0.0.141
from langchain.prompts import PromptTemplate

template_string = """\
Hello world
Your variable: {{ var }}
{# This will not get rendered #}
{% if verbose %}
Congrats! You just turned on verbose mode and got extra messages!
{% endif %}
"""
template = PromptTemplate.from_template(template_string, template_format="jinja2")
print(template.input_variables) # Output ['# This will not get rendered #', '% endif %', '% if verbose %']
```
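One plausible explanation for this output, offered as a guess rather
than a confirmed description of the implementation, is that the
variables are extracted with Python's format-string parser even when the
template is jinja2. The standard-library sketch below reproduces the
same list under that assumption; `format_string_fields` is a
hypothetical helper name.

```python
from string import Formatter

template_string = """\
Hello world
Your variable: {{ var }}
{# This will not get rendered #}
{% if verbose %}
Congrats! You just turned on verbose mode and got extra messages!
{% endif %}
"""


def format_string_fields(template: str) -> list:
    """Return the sorted field names seen by Python's format-string parser."""
    return sorted(
        {
            field
            for _, field, _, _ in Formatter().parse(template)
            if field is not None
        }
    )


fields = format_string_fields(template_string)
print(fields)
```

Under format-string rules, `{{ var }}` is an escaped pair of braces
rather than a field, while `{# ... #}` and `{% ... %}` look like fields,
which is why `var` is missing and the comment and control tags show up.
A jinja2-aware approach such as `jinja2.meta.find_undeclared_variables`
would report `var` and `verbose` instead.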
---------
Co-authored-by: engkheng <ongengkheng929@example.com>
- Updated `langchain/docs/modules/models/llms/integrations/` notebooks:
added links to the original sites, the install information, etc.
- Added the `nlpcloud` notebook.
- Removed "Example" from the titles of some notebooks, so all notebook
titles are consistent.
### https://github.com/hwchase17/langchain/issues/2997
Replaced `conversation.memory.store` with
`conversation.memory.entity_store.store`, since
`conversation.memory.store` doesn't exist, and re-ran the whole file.
Allows the user to catch the issue and handle it rather than failing
hard.
This happens more than you'd expect when using output parsers with
ChatGPT, especially if the temperature is anything but 0. Sometimes it
doesn't want to listen and just does its own thing.
Not sure what happened here, but part of the file got overwritten by
#2859, which broke the filtering logic.
This restores it to its previous state.
@hwchase17 can we expedite this if possible :-)
---------
Co-authored-by: Altay Sansal <altay.sansal@tgs.com>
- Most important - fixes the relevance_fn name in the notebook to align
with the docs
- Updates comments for the summary:
<img width="787" alt="image"
src="https://user-images.githubusercontent.com/130414180/232520616-2a99e8c3-a821-40c2-a0d5-3f3ea196c9bb.png">
- The new conversation is a bit better, though it's still unfortunate
that they try to schedule a follow-up.
- Remove the max dialogue turns argument from the conversation function
Add a time-weighted memory retriever and a notebook that approximates a
Generative Agent from https://arxiv.org/pdf/2304.03442.pdf
The "daily plan" components are removed for now since they are less
useful without a virtual world, but the memory is an interesting
component to build off.
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
### Background
Continuing to implement all the interface methods defined by the
`VectorStore` class. This PR pertains to implementation of the
`max_marginal_relevance_search` method.
### Changes
- a `max_marginal_relevance_search` method implementation has been added
in `weaviate.py`
- tests have been added for the new method
- vcr cassettes have been added for the weaviate tests
### Test Plan
Added tests for the `max_marginal_relevance_search` implementation
### Change Safety
- [x] I have added tests to cover my changes
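
For readers unfamiliar with maximal marginal relevance (MMR), here is a
minimal pure-Python sketch of the general technique. It is not the code
added in `weaviate.py`; the function names, the cosine similarity
measure, and the `lambda_mult` parameter name are assumptions made for
illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int = 4,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Greedily pick up to k candidate indices, trading relevance for diversity.

    lambda_mult=1.0 ranks purely by relevance to the query; lower values
    penalize candidates that are similar to ones already selected.
    """
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx = remaining[0]
        best_score = -math.inf
        for i in remaining:
            relevance = cosine(query, candidates[i])
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


query_vec = [1.0, 0.0]
docs = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
by_relevance = mmr_select(query_vec, docs, k=2, lambda_mult=1.0)
with_diversity = mmr_select(query_vec, docs, k=2, lambda_mult=0.3)
```

In this toy input the second document is nearly identical to the first,
so weighting diversity more heavily makes the selection skip it in favor
of the third document.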
- Modify SVMRetriever class to add an optional relevancy_threshold
- Modify SVMRetriever.get_relevant_documents method to filter out
documents with similarity scores below the relevancy threshold
- Normalized the similarities to be between 0 and 1 so the
relevancy_threshold makes more sense
- The number of results is limited to the top k documents or the
maximum number of relevant documents above the threshold, whichever is
smaller
This code will now return the top `self.k` results (or fewer, if there
are not enough results that meet the `self.relevancy_threshold`
criterion).
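The filtering described above can be sketched in plain Python as
follows. This is an illustration, not the retriever's code: the function
name is hypothetical, and min-max scaling is an assumption about how the
similarities are normalized to the 0 to 1 range.

```python
from typing import List, Optional, Tuple


def top_k_above_threshold(
    similarities: List[float],
    k: int = 4,
    relevancy_threshold: Optional[float] = None,
) -> List[Tuple[int, float]]:
    """Return (index, normalized score) pairs for the best-scoring items.

    Scores are min-max normalized (an assumption), the top k are taken,
    and anything below the threshold is dropped.
    """
    if not similarities:
        return []
    lowest, highest = min(similarities), max(similarities)
    # A small epsilon avoids division by zero when all scores are equal.
    denominator = (highest - lowest) + 1e-6
    normalized = [(s - lowest) / denominator for s in similarities]
    ranked = sorted(range(len(normalized)), key=lambda i: normalized[i], reverse=True)
    return [
        (i, normalized[i])
        for i in ranked[:k]
        if relevancy_threshold is None or normalized[i] >= relevancy_threshold
    ]


raw_scores = [2.0, -1.0, -0.5, 0.2, 1.0]
kept = [i for i, _ in top_k_above_threshold(raw_scores, k=4, relevancy_threshold=0.25)]
unfiltered = [i for i, _ in top_k_above_threshold(raw_scores, k=4)]
```

With a threshold, fewer than `k` items can come back, which matches the
"top k or fewer" behavior described above.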
The `svm.LinearSVC` implementation in scikit-learn is non-deterministic,
which means that, with a query of "foo",
`SVMRetriever.from_texts(["bar", "world", "foo", "hello", "foo bar"])`
could return `[3 0 5 4 2 1]` instead of `[0 3 5 4 2 1]`.
If you pass in multiple "foo" texts, the order could be different each
time. Here, we only care whether 0 is the first element; if it isn't,
the texts and similarities end up offset.
Example:
```python
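from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers import SVMRetriever

# Keep at most 4 results, dropping any below the relevancy threshold.
retriever = SVMRetriever.from_texts(
    ["foo", "bar", "world", "hello", "foo bar"],
    OpenAIEmbeddings(),
    k=4,
    relevancy_threshold=0.25,
)
result = retriever.get_relevant_documents("foo")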
```
yields
```python
[Document(page_content='foo', metadata={}), Document(page_content='foo bar', metadata={})]
```
---------
Co-authored-by: Brandon Sandoval <52767641+account00001@users.noreply.github.com>
Re:
https://github.com/hwchase17/langchain/issues/439#issuecomment-1510442791
I don't think it's polite for a library to use the root logger.
Both of these forms are also used:
```
logger = logging.getLogger(__name__)
logger = logging.getLogger(__file__)
```
I am not sure whether there is a reason for using one over the other (I
am guessing they were just contributed by different people).
It seems to me it would be better to consistently use
`logging.getLogger(__name__)`.
This makes it easier for consumers of the library to set up log
handlers, e.g. for everything with the `langchain.` prefix.
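To illustrate the last point, here is a small standard-library sketch.
The module name `langchain.some_module` and the `ListHandler` class are
made up for the example. A logger created with `__name__` inside a
package gets a dotted name, so a handler attached to the package-level
logger receives its records.

```python
import logging


class ListHandler(logging.Handler):
    """Collects log records in memory so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


# Inside a library module, logging.getLogger(__name__) would produce a
# dotted name like this one (illustrative).
module_logger = logging.getLogger("langchain.some_module")

# A consumer of the library configures only the package-level logger.
handler = ListHandler()
package_logger = logging.getLogger("langchain")
package_logger.addHandler(handler)
package_logger.setLevel(logging.INFO)

module_logger.info("hello from a submodule")
captured = [(r.name, r.getMessage()) for r in handler.records]
```

Loggers named with `__file__` do not share this hierarchy, since a file
path is not a dotted child of the `langchain` logger name.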