# Fix bilibili api import error
bilibili-api package is depracated and there is no sync module.
<!--
Thank you for contributing to LangChain! Your PR will appear in our next
release under the title you set. Please make sure it highlights your
valuable contribution.
Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.
After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.
-->
<!-- Remove if not applicable -->
Fixes#2673#2724
## Before submitting
<!-- If you're adding a new integration, include an integration test and
an example notebook showing its use! -->
## Who can review?
Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
@vowelparrot @liaokongVFX
<!-- For a quicker response, figure out the right person to tag with @
@hwchase17 - project lead
Tracing / Callbacks
- @agola11
Async
- @agola11
DataLoaders
- @eyurtsev
Models
- @hwchase17
- @agola11
Agents / Tools / Toolkits
- @vowelparrot
VectorStores / Retrievers / Memory
- @dev2049
-->
# TextLoader auto detect encoding and enhanced exception handling
- Add an option to enable encoding detection on `TextLoader`.
- The detection is done using `chardet`
- The loading is done by trying all detected encodings by order of
confidence or raise an exception otherwise.
### New Dependencies:
- `chardet`
Fixes#4479
## Before submitting
<!-- If you're adding a new integration, include an integration test and
an example notebook showing its use! -->
## Who can review?
Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
- @eyurtsev
---------
Co-authored-by: blob42 <spike@w530>
# Load specific file types from Google Drive (issue #4878)
Add the possibility to define what file types you want to load from
Google Drive.
```
loader = GoogleDriveLoader(
folder_id="1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5",
file_types=["document", "pdf"]
recursive=False
)
```
Fixes ##4878
## Who can review?
Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
DataLoaders
- @eyurtsev
Twitter: [@UmerHAdil](https://twitter.com/@UmerHAdil) | Discord:
RicChilligerDude#7589
---------
Co-authored-by: UmerHA <40663591+UmerHA@users.noreply.github.com>
#docs: text splitters improvements
Changes are only in the Jupyter notebooks.
- added links to the source packages and a short description of these
packages
- removed " Text Splitters" suffixes from the TOC elements (they made
the list of the text splitters messy)
- moved text splitters, based on the length function into a separate
list. They can be mixed with any classes from the "Text Splitters", so
it is a different classification.
## Who can review?
@hwchase17 - project lead
@eyurtsev
@vowelparrot
NOTE: please, check out the results of the `Python code` text splitter
example (text_splitters/examples/python.ipynb). It looks suboptimal.
# Added another helpful way for developers who want to set OpenAI API
Key dynamically
Previous methods like exporting environment variables are good for
project-wide settings.
But many use cases need to assign API keys dynamically, recently.
```python
from langchain.llms import OpenAI
llm = OpenAI(openai_api_key="OPENAI_API_KEY")
```
## Before submitting
```bash
export OPENAI_API_KEY="..."
```
Or,
```python
import os
os.environ["OPENAI_API_KEY"] = "..."
```
<hr>
Thank you.
Cheers,
Bongsang
# Documentation for Azure OpenAI embeddings model
- OPENAI_API_VERSION environment variable is needed for the endpoint
- The constructor does not work with model, it works with deployment.
I fixed it in the notebook.
(This is my first contribution)
## Who can review?
@hwchase17
@agola
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
# Docs: improvements in the `retrievers/examples/` notebooks
Its primary purpose is to make the Jupyter notebook examples
**consistent** and more suitable for first-time viewers.
- add links to the integration source (if applicable) with a short
description of this source;
- removed `_retriever` suffix from the file names (where it existed) for
consistency;
- removed ` retriever` from the notebook title (where it existed) for
consistency;
- added code to install necessary Python package(s);
- added code to set up the necessary API Key.
- very small fixes in notebooks from other folders (for consistency):
- docs/modules/indexes/vectorstores/examples/elasticsearch.ipynb
- docs/modules/indexes/vectorstores/examples/pinecone.ipynb
- docs/modules/models/llms/integrations/cohere.ipynb
- fixed misspelling in langchain/retrievers/time_weighted_retriever.py
comment (sorry, about this change in a .py file )
## Who can review
@dev2049
# Fixed typos (issues #4818 & #4668 & more typos)
- At some places, it said `model = ChatOpenAI(model='gpt-3.5-turbo')`
but should be `model = ChatOpenAI(model_name='gpt-3.5-turbo')`
- Fixes some other typos
Fixes#4818, #4668
## Who can review?
Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
Models
- @hwchase17
- @agola11
Agents / Tools / Toolkits
- @vowelparrot
# Removed usage of deprecated methods
Replaced `SQLDatabaseChain` deprecated direct initialisation with
`from_llm` method
## Who can review?
@hwchase17
@agola11
---------
Co-authored-by: imeckr <chandanroutray2012@gmail.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
- Installation of non-colab packages
- Get API keys
# Added dependencies to make notebook executable on hosted notebooks
## Who can review?
Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
@hwchase17
@vowelparrot
- Installation of non-colab packages
- Get API keys
- Get rid of warnings
# Cleanup and added dependencies to make notebook executable on hosted
notebooks
@hwchase17
@vowelparrot
The current example in
https://python.langchain.com/en/latest/modules/agents/plan_and_execute.html
has inconsistent reasoning step (observing 28 years and thinking it's 26
years):
```
Observation: 28 years
Thought:Based on my search, Gigi Hadid's current age is 26 years old.
Action:
{
"action": "Final Answer",
"action_input": "Gigi Hadid's current age is 26 years old."
}
```
Guessing this is model noise. Rerunning seems to give correct answer of
28 years.
# Fix Telegram API loader + add tests.
I was testing this integration and it was broken with next error:
```python
message_threads = loader._get_message_threads(df)
KeyError: False
```
Also, this particular loader didn't have any tests / related group in
poetry, so I added those as well.
@hwchase17 / @eyurtsev please take a look on this fix PR.
---------
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
# Cassandra support for chat history
### Description
- Store chat messages in cassandra
### Dependency
- cassandra-driver - Python Module
## Before submitting
- Added Integration Test
## Who can review?
@hwchase17
@agola11
# Your PR Title (What it does)
<!--
Thank you for contributing to LangChain! Your PR will appear in our next
release under the title you set. Please make sure it highlights your
valuable contribution.
Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.
After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.
-->
<!-- Remove if not applicable -->
Fixes # (issue)
## Before submitting
<!-- If you're adding a new integration, include an integration test and
an example notebook showing its use! -->
## Who can review?
Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
<!-- For a quicker response, figure out the right person to tag with @
@hwchase17 - project lead
Tracing / Callbacks
- @agola11
Async
- @agola11
DataLoaders
- @eyurtsev
Models
- @hwchase17
- @agola11
Agents / Tools / Toolkits
- @vowelparrot
VectorStores / Retrievers / Memory
- @dev2049
-->
Co-authored-by: Jinto Jose <129657162+jj701@users.noreply.github.com>
# Add GraphQL Query Support
This PR introduces a GraphQL API Wrapper tool that allows LLM agents to
query GraphQL databases. The tool utilizes the httpx and gql Python
packages to interact with GraphQL APIs and provides a simple interface
for running queries with LLM agents.
@vowelparrot
---------
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
### Adds a document loader for Docugami
Specifically:
1. Adds a data loader that talks to the [Docugami](http://docugami.com)
API to download processed documents as semantic XML
2. Parses the semantic XML into chunks, with additional metadata
capturing chunk semantics
3. Adds a detailed notebook showing how you can use additional metadata
returned by Docugami for techniques like the [self-querying
retriever](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/self_query_retriever.html)
4. Adds an integration test, and related documentation
Here is an example of a result that is not possible without the
capabilities added by Docugami (from the notebook):
<img width="1585" alt="image"
src="https://github.com/hwchase17/langchain/assets/749277/bb6c1ce3-13dc-4349-a53b-de16681fdd5b">
---------
Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>
Co-authored-by: Taqi Jaffri <tjaffri@gmail.com>
[OpenWeatherMapAPIWrapper](f70e18a5b3/docs/modules/agents/tools/examples/openweathermap.ipynb)
works wonderfully, but the _tool_ itself can't be used in master branch.
- added OpenWeatherMap **tool** to the public api, to be loadable with
`load_tools` by using "openweathermap-api" tool name (that name is used
in the existing
[docs](aff33d52c5/docs/modules/agents/tools/getting_started.md),
at the bottom of the page)
- updated OpenWeatherMap tool's **description** to make the input format
match what the API expects (e.g. `London,GB` instead of `'London,GB'`)
- added [ecosystem documentation page for
OpenWeatherMap](f9c41594fe/docs/ecosystem/openweathermap.md)
- added tool usage example to [OpenWeatherMap's
notebook](f9c41594fe/docs/modules/agents/tools/examples/openweathermap.ipynb)
Let me know if there's something I missed or something needs to be
updated! Or feel free to make edits yourself if that makes it easier for
you 🙂
[RELLM](https://github.com/r2d4/rellm) is a library that wraps local
HuggingFace pipeline models for structured decoding.
RELLM works by generating tokens one at a time. At each step, it masks
tokens that don't conform to the provided partial regular expression.
[JSONFormer](https://github.com/1rgs/jsonformer) is a bit different, where it sequentially adds the keys then decodes each value directly
**Problem statement:** the
[document_loaders](https://python.langchain.com/en/latest/modules/indexes/document_loaders.html#)
section is too long and hard to comprehend.
**Proposal:** group document_loaders by 3 classes: (see `Files changed`
tab)
UPDATE: I've completely reworked the document_loader classification.
Now this PR changes only one file!
FYI @eyurtsev @hwchase17
[Text Generation
Inference](https://github.com/huggingface/text-generation-inference) is
a Rust, Python and gRPC server for generating text using LLMs.
This pull request add support for self hosted Text Generation Inference
servers.
feature: #4280
---------
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
# Add option to `load_huggingface_tool`
Expose a method to load a huggingface Tool from the HF hub
---------
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
Thanks to @anna-charlotte and @jupyterjazz for the contribution! Made
few small changes to get it across the finish line
---------
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
Co-authored-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Co-authored-by: jupyterjazz <saba.sturua@jina.ai>
Co-authored-by: Saba Sturua <45267439+jupyterjazz@users.noreply.github.com>
# ODF File Loader
Adds a data loader for handling Open Office ODT files. Requires
`unstructured>=0.6.3`.
### Testing
The following should work using the `fake.odt` example doc from the
[`unstructured` repo](https://github.com/Unstructured-IO/unstructured).
```python
from langchain.document_loaders import UnstructuredODTLoader
loader = UnstructuredODTLoader(file_path="fake.odt", mode="elements")
loader.load()
loader = UnstructuredODTLoader(file_path="fake.odt", mode="single")
loader.load()
```
- added `Wikipedia` retriever. It is effectively a wrapper for
`WikipediaAPIWrapper`. It wrapps load() into get_relevant_documents()
- sorted `__all__` in the `retrievers/__init__`
- added integration tests for the WikipediaRetriever
- added an example (as Jupyter notebook) for the WikipediaRetriever
# Minor Wording Documentation Change
```python
agent_chain.run("When's my friend Eric's surname?")
# Answer with 'Zhu'
```
is change to
```python
agent_chain.run("What's my friend Eric's surname?")
# Answer with 'Zhu'
```
I think when is a residual of the old query that was "When’s my friends
Eric`s birthday?".
# Fix grammar in Text Splitters docs
Just a small fix of grammar in the documentation:
"That means there two different axes" -> "That means there are two
different axes"
Related: #4028, I opened a new PR because (1) I was unable to unstage
mistakenly committed files (I'm not familiar with git enough to resolve
this issue), (2) I felt closing the original PR and opening a new PR
would be more appropriate if I changed the class name.
This PR creates HumanInputLLM(HumanLLM in #4028), a simple LLM wrapper
class that returns user input as the response. I also added a simple
Jupyter notebook regarding how and why to use this LLM wrapper. In the
notebook, I went over how to use this LLM wrapper and showed example of
testing `WikipediaQueryRun` using HumanInputLLM.
I believe this LLM wrapper will be useful especially for debugging,
educational or testing purpose.
- Added the `Wikipedia` document loader. It is based on the existing
`unilities/WikipediaAPIWrapper`
- Added a respective ut-s and example notebook
- Sorted list of classes in __init__
- made notebooks consistent: titles, service/format descriptions.
- corrected short names to full names, for example, `Word` -> `Microsoft
Word`
- added missed descriptions
- renamed notebook files to make ToC correctly sorted
This implements a loader of text passages in JSON format. The `jq`
syntax is used to define a schema for accessing the relevant contents
from the JSON file. This requires dependency on the `jq` package:
https://pypi.org/project/jq/.
---------
Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>
In the example for creating a Python REPL tool under the Agent module,
the ".run" was omitted in the example. I believe this is required when
defining a Tool.
In the section `Get Message Completions from a Chat Model` of the quick
start guide, the HumanMessage doesn't need to include `Translate this
sentence from English to French.` when there is a system message.
Simplify HumanMessages in these examples can further demonstrate the
power of LLM.
* implemented arun, results, and aresults. Reuses aiosession if
available.
* helper tools GoogleSerperRun and GoogleSerperResults
* support for Google Images, Places and News (examples given) and
filtering based on time (e.g. past hour)
* updated docs
Single edit to: models/text_embedding/examples/openai.ipynb - Line 88:
changed from: "embeddings = OpenAIEmbeddings(model_name=\"ada\")" to
"embeddings = OpenAIEmbeddings()" as model_name is no longer part of the
OpenAIEmbeddings class.
Seems the pyllamacpp package is no longer the supported bindings from
gpt4all. Tested that this works locally.
Given that the older models weren't very performant, I think it's better
to migrate now without trying to include a lot of try / except blocks
---------
Co-authored-by: Nissan Pow <npow@users.noreply.github.com>
Co-authored-by: Nissan Pow <pownissa@amazon.com>
Modified Modern Treasury and Strip slightly so credentials don't have to
be passed in explicitly. Thanks @mattgmarcus for adding Modern Treasury!
---------
Co-authored-by: Matt Marcus <matt.g.marcus@gmail.com>
Haven't gotten to all of them, but this:
- Updates some of the tools notebooks to actually instantiate a tool
(many just show a 'utility' rather than a tool. More changes to come in
separate PR)
- Move the `Tool` and decorator definitions to `langchain/tools/base.py`
(but still export from `langchain.agents`)
- Add scene explain to the load_tools() function
- Add unit tests for public apis for the langchain.tools and langchain.agents modules
I have added a reddit document loader which fetches the text from the
Posts of Subreddits or Reddit users, using the `praw` Python package. I
have also added an example notebook reddit.ipynb in order to guide users
to use this dataloader.
This code was made in format similar to twiiter document loader. I have
run code formating, linting and also checked the code myself for
different scenarios.
This is my first contribution to an open source project and I am really
excited about this. If you want to suggest some improvements in my code,
I will be happy to do it. :)
Co-authored-by: Taaha Bajwa <taaha.s.bajwa@gmail.com>
This PR includes some minor alignment updates, including:
- metadata object extended to support contractAddress, blockchainType,
and tokenId
- notebook doc better aligned to standard langchain format
- startToken changed from int to str to support multiple hex value types
on the Alchemy API
The updated metadata will look like the below. It's possible for a
single contractAddress to exist across multiple blockchains (e.g.
Ethereum, Polygon, etc.) so it's important to include the
blockchainType.
```
metadata = {"source": self.contract_address,
"blockchain": self.blockchainType,
"tokenId": tokenId}
```
- Added links to the vectorstore providers
- Added installation code (it is not clear that we have to go to the
`LangChan Ecosystem` page to get installation instructions.)
Add other File Utilities, include
- List Directory
- Search for file
- Move
- Copy
- Remove file
Bundle as toolkit
Add a notebook that connects to the Chat Agent, which somewhat supports
multi-arg input tools
Update original read/write files to return the original dir paths and
better handle unsupported file paths.
Add unit tests
Adds a PlayWright web browser toolkit with the following tools:
- NavigateTool (navigate_browser) - navigate to a URL
- NavigateBackTool (previous_page) - wait for an element to appear
- ClickTool (click_element) - click on an element (specified by
selector)
- ExtractTextTool (extract_text) - use beautiful soup to extract text
from the current web page
- ExtractHyperlinksTool (extract_hyperlinks) - use beautiful soup to
extract hyperlinks from the current web page
- GetElementsTool (get_elements) - select elements by CSS selector
- CurrentPageTool (current_page) - get the current page URL
Alternate implementation of #3452 that relies on a generic query
constructor chain and language and then has vector store-specific
translation layer. Still refactoring and updating examples but general
structure is there and seems to work s well as #3452 on exampels
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>