This PR introduces a Blob data type and a Blob loader interface.
This is the first of a sequence of PRs that follows this proposal:
https://github.com/hwchase17/langchain/pull/2833
The primary goals of these abstraction are:
* Decouple content loading from content parsing code.
* Help duplicated content loading code from document loaders.
* Make lazy loading a default for langchain.
### Summary
Updates the `UnstructuredURLLoader` to include a "elements" mode that
retains additional metadata from `unstructured`. This makes
`UnstructuredURLLoader` consistent with other unstructured loaders,
which also support "elements" mode. Patched mode into the existing
`UnstructuredURLLoader` class instead of inheriting from
`UnstructuredBaseLoader` because it significantly simplified the
implementation.
### Testing
This should still work and show the url in the source for the metadata
```python
from langchain.document_loaders import UnstructuredURLLoader
urls = ["https://www.understandingwar.org/sites/default/files/Russian%20Offensive%20Campaign%20Assessment%2C%20April%2011%2C%202023.pdf"]
loader = UnstructuredURLLoader(urls=urls, headers={"Accept": "application/json"}, strategy="fast")
docs = loader.load()
print(docs[0].page_content[:1000])
docs[0].metadata
```
This should now work and show additional metadata from `unstructured`.
This should still work and show the url in the source for the metadata
```python
from langchain.document_loaders import UnstructuredURLLoader
urls = ["https://www.understandingwar.org/sites/default/files/Russian%20Offensive%20Campaign%20Assessment%2C%20April%2011%2C%202023.pdf"]
loader = UnstructuredURLLoader(urls=urls, headers={"Accept": "application/json"}, strategy="fast", mode="elements")
docs = loader.load()
print(docs[0].page_content[:1000])
docs[0].metadata
```
This PR
* Adds `clear` method for `BaseCache` and implements it for various
caches
* Adds the default `init_func=None` and fixes gptcache integtest
* Since right now integtest is not running in CI, I've verified the
changes by running `docs/modules/models/llms/examples/llm_caching.ipynb`
(until proper e2e integtest is done in CI)
This fixes the error when calling AzureOpenAI of gpt-35-turbo model.
The error is:
InvalidRequestError: logprobs, best_of and echo parameters are not
available on gpt-35-turbo model. Please remove the parameter and try
again. For more details, see
https://go.microsoft.com/fwlink/?linkid=2227346.
## Background
fixes#2695
## Changes
The `add_text` method uses the internal embedding function if one was
passes to the `Weaviate` constructor.
NOTE: the latest merge on the `Weaviate` class made the specification of
a `weaviate_api_key` mandatory which might not be desirable for all
users and connection methods (for example weaviate also support Embedded
Weaviate which I am happy to add support to here if people think it's
desirable). I wrapped the fetching of the api key into a try catch in
order to allow the `weaviate_api_key` to be unspecified. Do let me know
if this is unsatisfactory.
## Test Plan
added test for `add_texts` method.
This notebook showcases how to implement a multi-agent simulation
without a fixed schedule for who speaks when. Instead the agents decide
for themselves who speaks. We can implement this by having each agent
bid to speak. Whichever agent's bid is the highest gets to speak.
We will show how to do this in the example below that showcases a
fictitious presidential debate.
It makes sense to use `arxiv` as another source of the documents for
downloading.
- Added the `arxiv` document_loader, based on the
`utilities/arxiv.py:ArxivAPIWrapper`
- added tests
- added an example notebook
- sorted `__all__` in `__init__.py` (otherwise it is hard to find a
class in the very long list)
Tools for Bing, DDG and Google weren't consistent even though the
underlying implementations were.
All three services now have the same tools and implementations to easily
switch and experiment when building chains.
The following error gets returned when trying to launch
langchain-server:
ERROR: The Compose file
'/opt/homebrew/lib/python3.11/site-packages/langchain/docker-compose.yaml'
is invalid because:
services.langchain-db.expose is invalid: should be of the format
'PORT[/PROTOCOL]'
Solution:
Change line 28 from - 5432:5432 to - 5432
One of our users noticed a bug when calling streaming models. This is
because those models return an iterator. So, I've updated the Replicate
`_call` code to join together the output. The other advantage of this
fix is that if you requested multiple outputs you would get them all –
previously I was just returning output[0].
I also adjusted the demo docs to use dolly, because we're featuring that
model right now and it's always hot, so people won't have to wait for
the model to boot up.
The error that this fixes:
```
> llm = Replicate(model=“replicate/flan-t5-xl:eec2f71c986dfa3b7a5d842d22e1130550f015720966bec48beaae059b19ef4c”)
> llm(“hello”)
> Traceback (most recent call last):
File "/Users/charlieholtz/workspace/dev/python/main.py", line 15, in <module>
print(llm(prompt))
File "/opt/homebrew/lib/python3.10/site-packages/langchain/llms/base.py", line 246, in __call__
return self.generate([prompt], stop=stop).generations[0][0].text
File "/opt/homebrew/lib/python3.10/site-packages/langchain/llms/base.py", line 140, in generate
raise e
File "/opt/homebrew/lib/python3.10/site-packages/langchain/llms/base.py", line 137, in generate
output = self._generate(prompts, stop=stop)
File "/opt/homebrew/lib/python3.10/site-packages/langchain/llms/base.py", line 324, in _generate
text = self._call(prompt, stop=stop)
File "/opt/homebrew/lib/python3.10/site-packages/langchain/llms/replicate.py", line 108, in _call
return outputs[0]
TypeError: 'generator' object is not subscriptable
```
- added a few missing annotation for complex local variables.
- auto formatted.
- I also went through all other files in agent directory. no seeing any
other missing piece. (there are several prompt strings not annotated,
but I think it’s trivial. Also adding annotation will make it harder to
read in terms of indents.) Anyway, I think this is the last PR in
agent/annotation.
The sentence transformers was a dup of the HF one.
This is a breaking change (model_name vs. model) for anyone using
`SentenceTransformerEmbeddings(model="some/nondefault/model")`, but
since it was landed only this week it seems better to do this now rather
than doing a wrapper.
This notebook shows how the DialogueAgent and DialogueSimulator class
make it easy to extend the [Two-Player Dungeons & Dragons
example](https://python.langchain.com/en/latest/use_cases/agent_simulations/two_player_dnd.html)
to multiple players.
The main difference between simulating two players and multiple players
is in revising the schedule for when each agent speaks
To this end, we augment DialogueSimulator to take in a custom function
that determines the schedule of which agent speaks. In the example
below, each character speaks in round-robin fashion, with the
storyteller interleaved between each player.
Often an LLM will output a requests tool input argument surrounded by
single quotes. This triggers an exception in the requests library. Here,
we add a simple clean url function that strips any leading and trailing
single and double quotes before passing the URL to the underlying
requests library.
Co-authored-by: James Brotchie <brotchie@google.com>
I would like to contribute with a jupyter notebook example
implementation of an AI Sales Agent using `langchain`.
The bot understands the conversation stage (you can define your own
stages fitting your needs)
using two chains:
1. StageAnalyzerChain - takes context and LLM decides what part of sales
conversation is one in
2. SalesConversationChain - generate next message
Schema:
https://images-genai.s3.us-east-1.amazonaws.com/architecture2.png
my original repo: https://github.com/filip-michalsky/SalesGPT
This example creates a sales person named Ted Lasso who is trying to
sell you mattresses.
Happy to update based on your feedback.
Thanks, Filip
https://twitter.com/FilipMichalsky
Simplifies the [Two Agent
D&D](https://python.langchain.com/en/latest/use_cases/agent_simulations/two_player_dnd.html)
example with a cleaner, simpler interface that is extensible for
multiple agents.
`DialogueAgent`:
- `send()`: applies the chatmodel to the message history and returns the
message string
- `receive(name, message)`: adds the `message` spoken by `name` to
message history
The `DialogueSimulator` class takes a list of agents. At each step, it
performs the following:
1. Select the next speaker
2. Calls the next speaker to send a message
3. Broadcasts the message to all other agents
4. Update the step counter.
The selection of the next speaker can be implemented as any function,
but in this case we simply loop through the agents.
Update Alchemy Key URL in Blockchain Document Loader. I want to say
thank you for the incredible work the LangChain library creators have
done.
I am amazed at how seamlessly the Loader integrates with Ethereum
Mainnet, Ethereum Testnet, Polygon Mainnet, and Polygon Testnet, and I
am excited to see how this technology can be extended in the future.
@hwchase17 - Please let me know if I can improve or if I have missed any
community guidelines in making the edit? Thank you again for your hard
work and dedication to the open source community.
Ran into this issue In vectorstores/redis.py when trying to use the
AutoGPT agent with redis vector store. The error I received was
`
langchain/experimental/autonomous_agents/autogpt/agent.py", line 134, in
run
self.memory.add_documents([Document(page_content=memory_to_add)])
AttributeError: 'RedisVectorStoreRetriever' object has no attribute
'add_documents'
`
Added the needed function to the class RedisVectorStoreRetriever which
did not have the functionality like the base VectorStoreRetriever in
vectorstores/base.py that, for example, vectorstores/faiss.py has
This commit adds a new unit test for the _merge_splits function in the
text splitter. The new test verifies that the function merges text into
chunks of the correct size and overlap, using a specified separator. The
test passes on the current implementation of the function.
The Pandas agent fails to pass callback_manager forward, making it
impossible to use custom callbacks with it. Fix that.
Co-authored-by: Sami Liedes <sami.liedes@rocket-science.ch>