The detailed walkthrough of the Weaviate wrapper was pointing to the
getting-started notebook. Fixed it to point to the Weaviable notebook in
the examples folder.
This pull request adds a ChatGPT document loader to the document loaders
module in `langchain/document_loaders/chatgpt.py`. Additionally, it
includes an example Jupyter notebook in
`docs/modules/indexes/document_loaders/examples/chatgpt_loader.ipynb`
which uses fake sample data based on the original structure of the
`conversations.json` file.
The following files were added/modified:
- `langchain/document_loaders/__init__.py`
- `langchain/document_loaders/chatgpt.py`
- `docs/modules/indexes/document_loaders/examples/chatgpt_loader.ipynb`
-
`docs/modules/indexes/document_loaders/examples/example_data/fake_conversations.json`
This pull request was made in response to the recent release of ChatGPT
data exports by email:
https://help.openai.com/en/articles/7260999-how-do-i-export-my-chatgpt-history
Hi there!
I'm excited to open this PR to add support for using a fully Postgres
syntax compatible database 'AnalyticDB' as a vector.
As AnalyticDB has been proved can be used with AutoGPT,
ChatGPT-Retrieve-Plugin, and LLama-Index, I think it is also good for
you.
AnalyticDB is a distributed Alibaba Cloud-Native vector database. It
works better when data comes to large scale. The PR includes:
- [x] A new memory: AnalyticDBVector
- [x] A suite of integration tests verifies the AnalyticDB integration
I have read your [contributing
guidelines](72b7d76d79/.github/CONTRIBUTING.md).
And I have passed the tests below
- [x] make format
- [x] make lint
- [x] make coverage
- [x] make test
First cut of a supabase vectorstore loosely patterned on the langchainjs
equivalent. Doesn't support async operations which is a limitation of
the supabase python client.
---------
Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
I have noticed a typo error in the `custom_mrkl_agents.ipynb` document
while trying the example from the documentation page. As a result, I
have opened a pull request (PR) to address this minor issue, even though
it may seem insignificant 😂.
The following calls were throwing an exception:
575b717d10/docs/use_cases/evaluation/agent_vectordb_sota_pg.ipynb (L192)575b717d10/docs/use_cases/evaluation/agent_vectordb_sota_pg.ipynb (L239)
Exception:
```
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
Cell In[14], line 1
----> 1 chain_sota = RetrievalQA.from_chain_type(llm=OpenAI(temperature=0), chain_type="stuff", retriever=vectorstore_sota, input_key="question")
File ~/github/langchain/venv/lib/python3.9/site-packages/langchain/chains/retrieval_qa/base.py:89, in BaseRetrievalQA.from_chain_type(cls, llm, chain_type, chain_type_kwargs, **kwargs)
85 _chain_type_kwargs = chain_type_kwargs or {}
86 combine_documents_chain = load_qa_chain(
87 llm, chain_type=chain_type, **_chain_type_kwargs
88 )
---> 89 return cls(combine_documents_chain=combine_documents_chain, **kwargs)
File ~/github/langchain/venv/lib/python3.9/site-packages/pydantic/main.py:341, in pydantic.main.BaseModel.__init__()
ValidationError: 1 validation error for RetrievalQA
retriever
instance of BaseRetriever expected (type=type_error.arbitrary_type; expected_arbitrary_type=BaseRetriever)
```
The vectorstores had to be converted to retrievers:
`vectorstore_sota.as_retriever()` and `vectorstore_pg.as_retriever()`.
The PR also:
- adds the file `paul_graham_essay.txt` referenced by this notebook
- adds to gitignore *.pkl and *.bin files that are generated by this
notebook
Interestingly enough, the performance of the prediction greatly
increased (new version of langchain or ne version of OpenAI models since
the last run of the notebook): from 19/33 correct to 28/33 correct!
- Remove dynamic model creation in the `args()` property. _Only infer
for the decorator (and add an argument to NOT infer if someone wishes to
only pass as a string)_
- Update the validation example to make it less likely to be
misinterpreted as a "safe" way to run a repl
There is one example of "Multi-argument tools" in the custom_tools.ipynb
from yesterday, but we could add more. The output parsing for the base
MRKL agent hasn't been adapted to handle structured args at this point
in time
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
## Use `index_id` over `app_id`
We made a major update to index + retrieve based on Metal Indexes
(instead of apps). With this change, we accept an index instead of an
app in each of our respective core apis. [More details
here](https://docs.getmetal.io/api-reference/core/indexing).
## What is this PR for:
* This PR adds a commented line of code in the documentation that shows
how someone can use the Pinecone client with an already existing
Pinecone index
* The documentation currently only shows how to create a pinecone index
from langchain documents but not how to load one that already exists
- Updated `langchain/docs/modules/models/llms/integrations/` notebooks:
added links to the original sites, the install information, etc.
- Added the `nlpcloud` notebook.
- Removed "Example" from Titles of some notebooks, so all notebook
titles are consistent.
### https://github.com/hwchase17/langchain/issues/2997
Replaced `conversation.memory.store` to
`conversation.memory.entity_store.store`
As conversation.memory.store doesn't exist and re-ran the whole file.
- Most important - fixes the relevance_fn name in the notebook to align
with the docs
- Updates comments for the summary:
<img width="787" alt="image"
src="https://user-images.githubusercontent.com/130414180/232520616-2a99e8c3-a821-40c2-a0d5-3f3ea196c9bb.png">
- The new conversation is a bit better, still unfortunate they try to
schedule a followup.
- Rm the max dialogue turns argument to the conversation function
Add a time-weighted memory retriever and a notebook that approximates a
Generative Agent from https://arxiv.org/pdf/2304.03442.pdf
The "daily plan" components are removed for now since they are less
useful without a virtual world, but the memory is an interesting
component to build off.
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Use numexpr evaluate instead of the python REPL to avoid malicious code
injection.
Tested against the (limited) math dataset and got the same score as
before.
For more permissive tools (like the REPL tool itself), other approaches
ought to be provided (some combination of Sanitizer + Restricted python
+ unprivileged-docker + ...), but for a calculator tool, only
mathematical expressions should be permitted.
See https://github.com/hwchase17/langchain/issues/814
Last week I added the `PDFMinerPDFasHTMLLoader`. I am adding some
example code in the notebook to serve as a tutorial for how that loader
can be used to create snippets of a pdf that are structured within
sections. All the other loaders only provide the `Document` objects
segmented by pages but that's pretty loose given the amount of other
metadata that can be extracted.
With the new loader, one can leverage font-size of the text to decide
when a new sections starts and can segment the text more semantically as
shown in the tutorial notebook. The cell shows that we are able to find
the content of entire section under **Related Work** for the example pdf
which is spread across 2 pages and hence is stored as two separate
documents by other loaders
Add SVM retriever class, based on
https://github.com/karpathy/randomfun/blob/master/knn_vs_svm.ipynb.
Testing still WIP, but the logic is correct (I have a local
implementation outside of Langchain working).
---------
Co-authored-by: Lance Martin <122662504+PineappleExpress808@users.noreply.github.com>
Co-authored-by: rlm <31treehaus@31s-MacBook-Pro.local>
Minor cosmetic changes
- Activeloop environment cred authentication in notebooks with
`getpass.getpass` (instead of CLI which not always works)
- much faster tests with Deep Lake pytest mode on
- Deep Lake kwargs pass
Notes
- I put pytest environment creds inside `vectorstores/conftest.py`, but
feel free to suggest a better location. For context, if I put in
`test_deeplake.py`, `ruff` doesn't let me to set them before import
deeplake
---------
Co-authored-by: Davit Buniatyan <d@activeloop.ai>
Note to self: Always run integration tests, even on "that last minute
change you thought would be safe" :)
---------
Co-authored-by: Mike Lambert <mike.lambert@anthropic.com>
**About**
Specify encoding to avoid UnicodeDecodeError when reading .txt for users
who are following the tutorial.
**Reference**
```
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1205: character maps to <undefined>
```
**Environment**
OS: Win 11
Python: 3.8
Allows users to specify what files should be loaded instead of
indiscriminately loading the entire repo.
extends #2851
NOTE: for reviewers, `hide whitespace` option recommended since I
changed the indentation of an if-block to use `continue` instead so it
looks less like a Christmas tree :)
Mendable Seach Integration is Finally here!
Hey yall,
After various requests for Mendable in Python docs, we decided to get
our hands dirty and try to implement it.
Here is a version where we implement our **floating button** that sits
on the bottom right of the screen that once triggered (via press or CMD
K) will work the same as the js langchain docs.
Super excited about this and hopefully the community will be too.
@hwchase17 will send you the admin details via dm etc. The anon_key is
fine to be public.
Let me know if you need any further customization. I added the langchain
logo to it.
Fixes linting issue from #2835
Adds a loader for Slack Exports which can be a very valuable source of
knowledge to use for internal QA bots and other use cases.
```py
# Export data from your Slack Workspace first.
from langchain.document_loaders import SLackDirectoryLoader
SLACK_WORKSPACE_URL = "https://awesome.slack.com"
loader = ("Slack_Exports", SLACK_WORKSPACE_URL)
docs = loader.load()
```
Have seen questions about whether or not the `SQLDatabaseChain` supports
more than just sqlite, which was unclear in the docs, so tried to
clarify that and how to connect to other dialects.