I was testing out the WhatsApp Document loader, and noticed that
sometimes the date is of the following format (notice the additional
underscore):
```
3/24/23, 1:54_PM - +91 99999 99999 joined using this group's invite link
3/24/23, 6:29_PM - +91 99999 99999: When are we starting then?
```
Wierdly, the underscore is visible in Vim, but not on editors like
VSCode. I presume it is some unusual character/line terminator.
Nevertheless, I think handling this edge case will make the document
loader more robust.
Adds a loader for Slack Exports which can be a very valuable source of
knowledge to use for internal QA bots and other use cases.
```py
# Export data from your Slack Workspace first.
from langchain.document_loaders import SLackDirectoryLoader
SLACK_WORKSPACE_URL = "https://awesome.slack.com"
loader = ("Slack_Exports", SLACK_WORKSPACE_URL)
docs = loader.load()
```
---------
Co-authored-by: Mikhail Dubov <mikhail@chattermill.io>
This PR proposes
- An NLAToolkit method to instantiate from an AI Plugin URL
- A notebook that shows how to use that alongside an example of using a
Retriever object to lookup specs and route queries to them on the fly
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
I've added a bilibili loader, bilibili is a very active video site in
China and I think we need this loader.
Example:
```python
from langchain.document_loaders.bilibili import BiliBiliLoader
loader = BiliBiliLoader(
["https://www.bilibili.com/video/BV1xt411o7Xu/",
"https://www.bilibili.com/video/av330407025/"]
)
docs = loader.load()
```
Co-authored-by: 了空 <568250549@qq.com>
This PR adds a LangChain implementation of CAMEL role-playing example:
https://github.com/lightaime/camel.
I am sorry that I am not that familiar with LangChain. So I only
implement it in a naive way. There may be a better way to implement it.
**Description**
Add custom vector field name and text field name while indexing and
querying for OpenSearch
**Issues**
https://github.com/hwchase17/langchain/issues/2500
Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
Hi,
just wanted to mention that I added `langchain` to
[conda-forge](https://github.com/conda-forge/langchain-feedstock), so
that it can be installed with `conda`/`mamba` etc.
This makes it available to some corporate users with custom
conda-servers and people who like to manage their python envs with
conda.
Took me a bit to find the proper places to get the API keys. The link
earlier provided to setup search is still good, but why not provide
direct link to the Google cloud tools that give you ability to create
keys?
`combine_docs` does not go through the standard chain call path which
means that chain callbacks won't be triggered, meaning QA chains won't
be traced properly, this fixes that.
Also fix several errors in the chat_vector_db notebook
Adds a new pdf loader using the existing dependency on PDFMiner.
The new loader can be helpful for chunking texts semantically into
sections as the output html content can be parsed via `BeautifulSoup` to
get more structured and rich information about font size, page numbers,
pdf headers/footers, etc. which may not be available otherwise with
other pdf loaders
Improvements to Deep Lake Vector Store
- much faster view loading of embeddings after filters with
`fetch_chunks=True`
- 2x faster ingestion
- use np.float32 for embeddings to save 2x storage, LZ4 compression for
text and metadata storage (saves up to 4x storage for text data)
- user defined functions as filters
Docs
- Added retriever full example for analyzing twitter the-algorithm
source code with GPT4
- Added a use case for code analysis (please let us know your thoughts
how we can improve it)
---------
Co-authored-by: Davit Buniatyan <d@activeloop.ai>
## Why this PR?
Fixes#2624
There's a missing import statement in AzureOpenAI embeddings example.
## What's new in this PR?
- Import `OpenAIEmbeddings` before creating it's object.
## How it's tested?
- By running notebook and creating embedding object.
Signed-off-by: letmerecall <girishsharma001@gmail.com>
Right now, eval chains require an answer for every question. It's
cumbersome to collect this ground truth so getting around this issue
with 2 things:
* Adding a context param in `ContextQAEvalChain` and simply evaluating
if the question is answered accurately from context
* Adding chain of though explanation prompting to improve the accuracy
of this w/o GT.
This also gets to feature parity with openai/evals which has the same
contextual eval w/o GT.
TODO in follow-up:
* Better prompt inheritance. No need for seperate prompt for CoT
reasoning. How can we merge them together
---------
Co-authored-by: Vashisht Madhavan <vashishtmadhavan@Vashs-MacBook-Pro.local>
Evaluation so far has shown that agents do a reasonable job of emitting
`json` blocks as arguments when cued (instead of typescript), and `json`
permits the `strict=False` flag to permit control characters, which are
likely to appear in the response in particular.
This PR makes this change to the request and response synthesizer
chains, and fixes the temperature to the OpenAI agent in the eval
notebook. It also adds a `raise_error = False` flag in the notebook to
facilitate debugging
This still doesn't handle the following
- non-JSON media types
- anyOf, allOf, oneOf's
And doesn't emit the typescript definitions for referred types yet, but
that can be saved for a separate PR.
Also, we could have better support for Swagger 2.0 specs and OpenAPI
3.0.3 (can use the same lib for the latter) recommend offline conversion
for now.
### Features include
- Metadata based embedding search
- Choice of distance metric function (`L2` for Euclidean, `L1` for
Nuclear, `max` L-infinity distance, `cos` for cosine similarity, 'dot'
for dot product. Defaults to `L2`
- Returning scores
- Max Marginal Relevance Search
- Deleting samples from the dataset
### Notes
- Added numerous tests, let me know if you would like to shorten them or
make smarter
---------
Co-authored-by: Davit Buniatyan <d@activeloop.ai>
The specs used in chat-gpt plugins have only a few endpoints and have
unrealistically small specifications. By contrast, a spec like spotify's
has 60+ endpoints and is comprised 100k+ tokens.
Here are some impressive traces from gpt-4 that string together
non-trivial sequences of API calls. As noted in `planner.py`, gpt-3 is
not as robust but can be improved with i) better retry, self-reflect,
etc. logic and ii) better few-shots iii) etc. This PR's just a first
attempt probing a few different directions that eventually can be made
more core.
`make me a playlist with songs from kind of blue. call it machine
blues.`
```
> Entering new AgentExecutor chain...
Action: api_planner
Action Input: I need to find the right API calls to create a playlist with songs from Kind of Blue and name it Machine Blues
Observation: 1. GET /search to find the album ID for "Kind of Blue".
2. GET /albums/{id}/tracks to get the tracks from the "Kind of Blue" album.
3. GET /me to get the current user's ID.
4. POST /users/{user_id}/playlists to create a new playlist named "Machine Blues" for the current user.
5. POST /playlists/{playlist_id}/tracks to add the tracks from "Kind of Blue" to the newly created "Machine Blues" playlist.
Thought:I have a plan to create the playlist. Now, I will execute the API calls.
Action: api_controller
Action Input: 1. GET /search to find the album ID for "Kind of Blue".
2. GET /albums/{id}/tracks to get the tracks from the "Kind of Blue" album.
3. GET /me to get the current user's ID.
4. POST /users/{user_id}/playlists to create a new playlist named "Machine Blues" for the current user.
5. POST /playlists/{playlist_id}/tracks to add the tracks from "Kind of Blue" to the newly created "Machine Blues" playlist.
> Entering new AgentExecutor chain...
Action: requests_get
Action Input: {"url": "https://api.spotify.com/v1/search?q=Kind%20of%20Blue&type=album", "output_instructions": "Extract the id of the first album in the search results"}
Observation: 1weenld61qoidwYuZ1GESA
Thought:Action: requests_get
Action Input: {"url": "https://api.spotify.com/v1/albums/1weenld61qoidwYuZ1GESA/tracks", "output_instructions": "Extract the ids of all the tracks in the album"}
Observation: ["7q3kkfAVpmcZ8g6JUThi3o"]
Thought:Action: requests_get
Action Input: {"url": "https://api.spotify.com/v1/me", "output_instructions": "Extract the id of the current user"}
Observation: 22rhrz4m4kvpxlsb5hezokzwi
Thought:Action: requests_post
Action Input: {"url": "https://api.spotify.com/v1/users/22rhrz4m4kvpxlsb5hezokzwi/playlists", "data": {"name": "Machine Blues"}, "output_instructions": "Extract the id of the newly created playlist"}
Observation: 48YP9TMcEtFu9aGN8n10lg
Thought:Action: requests_post
Action Input: {"url": "https://api.spotify.com/v1/playlists/48YP9TMcEtFu9aGN8n10lg/tracks", "data": {"uris": ["spotify:track:7q3kkfAVpmcZ8g6JUThi3o"]}, "output_instructions": "Confirm that the tracks were added to the playlist"}
Observation: The tracks were added to the playlist. The snapshot_id is "Miw4NTdmMWUxOGU5YWMxMzVmYmE3ZWE5MWZlYWNkMTc2NGVmNTI1ZjY5".
Thought:I am finished executing the plan.
Final Answer: The tracks from the "Kind of Blue" album have been added to the newly created "Machine Blues" playlist. The playlist ID is 48YP9TMcEtFu9aGN8n10lg.
> Finished chain.
Observation: The tracks from the "Kind of Blue" album have been added to the newly created "Machine Blues" playlist. The playlist ID is 48YP9TMcEtFu9aGN8n10lg.
Thought:I am finished executing the plan and have created the playlist with songs from Kind of Blue, named Machine Blues.
Final Answer: I have created a playlist called "Machine Blues" with songs from the "Kind of Blue" album. The playlist ID is 48YP9TMcEtFu9aGN8n10lg.
> Finished chain.
```
or
`give me a song in the style of tobe nwige`
```
> Entering new AgentExecutor chain...
Action: api_planner
Action Input: I need to find the right API calls to get a song in the style of Tobe Nwigwe
Observation: 1. GET /search to find the artist ID for Tobe Nwigwe.
2. GET /artists/{id}/related-artists to find similar artists to Tobe Nwigwe.
3. Pick one of the related artists and use their artist ID in the next step.
4. GET /artists/{id}/top-tracks to get the top tracks of the chosen related artist.
Thought:
I'm ready to execute the API calls.
Action: api_controller
Action Input: 1. GET /search to find the artist ID for Tobe Nwigwe.
2. GET /artists/{id}/related-artists to find similar artists to Tobe Nwigwe.
3. Pick one of the related artists and use their artist ID in the next step.
4. GET /artists/{id}/top-tracks to get the top tracks of the chosen related artist.
> Entering new AgentExecutor chain...
Action: requests_get
Action Input: {"url": "https://api.spotify.com/v1/search?q=Tobe%20Nwigwe&type=artist", "output_instructions": "Extract the artist id for Tobe Nwigwe"}
Observation: 3Qh89pgJeZq6d8uM1bTot3
Thought:Action: requests_get
Action Input: {"url": "https://api.spotify.com/v1/artists/3Qh89pgJeZq6d8uM1bTot3/related-artists", "output_instructions": "Extract the ids and names of the related artists"}
Observation: [
{
"id": "75WcpJKWXBV3o3cfluWapK",
"name": "Lute"
},
{
"id": "5REHfa3YDopGOzrxwTsPvH",
"name": "Deante' Hitchcock"
},
{
"id": "6NL31G53xThQXkFs7lDpL5",
"name": "Rapsody"
},
{
"id": "5MbNzCW3qokGyoo9giHA3V",
"name": "EARTHGANG"
},
{
"id": "7Hjbimq43OgxaBRpFXic4x",
"name": "Saba"
},
{
"id": "1ewyVtTZBqFYWIcepopRhp",
"name": "Mick Jenkins"
}
]
Thought:Action: requests_get
Action Input: {"url": "https://api.spotify.com/v1/artists/75WcpJKWXBV3o3cfluWapK/top-tracks?country=US", "output_instructions": "Extract the ids and names of the top tracks"}
Observation: [
{
"id": "6MF4tRr5lU8qok8IKaFOBE",
"name": "Under The Sun (with J. Cole & Lute feat. DaBaby)"
}
]
Thought:I am finished executing the plan.
Final Answer: The top track of the related artist Lute is "Under The Sun (with J. Cole & Lute feat. DaBaby)" with the track ID "6MF4tRr5lU8qok8IKaFOBE".
> Finished chain.
Observation: The top track of the related artist Lute is "Under The Sun (with J. Cole & Lute feat. DaBaby)" with the track ID "6MF4tRr5lU8qok8IKaFOBE".
Thought:I am finished executing the plan and have the information the user asked for.
Final Answer: The song "Under The Sun (with J. Cole & Lute feat. DaBaby)" by Lute is in the style of Tobe Nwigwe.
> Finished chain.
```
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
This PR updates Qdrant to 1.1.1 and introduces local mode, so there is
no need to spin up the Qdrant server. By that occasion, the Qdrant
example notebooks also got updated, covering more cases and answering
some commonly asked questions. All the Qdrant's integration tests were
switched to local mode, so no Docker container is required to launch
them.
This pull request adds an enum class for the various types of agents
used in the project, located in the `agent_types.py` file. Currently,
the project is using hardcoded strings for the initialization of these
agents, which can lead to errors and make the code harder to maintain.
With the introduction of the new enums, the code will be more readable
and less error-prone.
The new enum members include:
- ZERO_SHOT_REACT_DESCRIPTION
- REACT_DOCSTORE
- SELF_ASK_WITH_SEARCH
- CONVERSATIONAL_REACT_DESCRIPTION
- CHAT_ZERO_SHOT_REACT_DESCRIPTION
- CHAT_CONVERSATIONAL_REACT_DESCRIPTION
In this PR, I have also replaced the hardcoded strings with the
appropriate enum members throughout the codebase, ensuring a smooth
transition to the new approach.