langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-06 03:20:49 +00:00

Author	SHA1	Message	Date
Harrison Chase	15cdfa9e7f	Harrison/table index (#2526 ) Co-authored-by: Alvaro Sevilla <alvaro@chainalysis.com>	2023-04-06 23:03:09 -07:00
Harrison Chase	704b0feb38	Harrison/allow org none (#2527 )	2023-04-06 23:00:42 -07:00
Alex Iribarren	aecd1c8ee3	Gitbook enhancements (#2279 ) The gitbook importer had some issues while trying to ingest a particular site; these commits allowed it to work as expected. The last commit (`06017ff`) opens the door to extending this class for other documentation formats (which will come in a future PR).	2023-04-06 22:55:07 -07:00
Harrison Chase	58a93f88da	Harrison/entity store (#2525 ) Co-authored-by: Alex Iribarren <alex.iribarren@gmail.com>	2023-04-06 22:54:38 -07:00
Vashisht Madhavan	aa439ac2ff	Adding an in-context QA evaluation chain + chain-of-thought reasoning chain for improved accuracy (#2444 ) Right now, eval chains require an answer for every question. Collecting this ground truth (GT) is cumbersome, so this PR works around the issue with two changes: * Adding a context param in `ContextQAEvalChain` and simply evaluating whether the question is answered accurately from the context * Adding chain-of-thought explanation prompting to improve the accuracy of this without GT. This also reaches feature parity with openai/evals, which has the same contextual eval without GT. TODO in follow-up: * Better prompt inheritance. There should be no need for a separate prompt for CoT reasoning. How can we merge them? --------- Co-authored-by: Vashisht Madhavan <vashishtmadhavan@Vashs-MacBook-Pro.local>	2023-04-06 22:32:41 -07:00
AeroXi	e131156805	set default embedding max token size (#2330 ) #991 already implemented this convenient feature to prevent exceeding the max token limit of the embedding model. > By default, this function is deactivated so as not to change the previous behavior. If you specify something like 8191 here, it will work as desired. According to the author, this is not set by default. Currently, the max token size of the default model in OpenAIEmbeddings is 8191 tokens, and no other OpenAI model has a larger token limit. So I believe it is better to set this as the default value; otherwise, users may encounter this error and find it hard to solve.	2023-04-06 22:32:24 -07:00
Fabian Venturini Cabau	0316900d2f	feat: implements similarity_search_by_vector on Weaviate (#2522 ) This PR implements `similarity_search_by_vector` in the Weaviate vectorstore.	2023-04-06 22:27:47 -07:00
Harrison Chase	5c64b86ba3	Harrison/weaviate retriever (#2524 ) Co-authored-by: Erika Cardenas <110841617+erika-cardenas@users.noreply.github.com>	2023-04-06 22:27:37 -07:00
Tiago De Gaspari	c2f21a519f	Add support to set up openai organizations (#2514 ) Add support for defining the organization of OpenAI, similarly to what is done in the reference code below:
```
import os
import openai

openai.organization = os.getenv("OPENAI_ORGANIZATION")
openai.api_key = os.getenv("OPENAI_API_KEY")
```	2023-04-06 22:23:16 -07:00
William FH	629fda3957	Use JSON rather than JSON5 (#2520 ) Evaluation so far has shown that agents do a reasonable job of emitting `json` blocks as arguments when cued (instead of typescript), and `json` accepts a `strict=False` flag that permits control characters, which are particularly likely to appear in the response. This PR makes this change to the request and response synthesizer chains, and fixes the temperature of the OpenAI agent in the eval notebook. It also adds a `raise_error = False` flag in the notebook to facilitate debugging.	2023-04-06 21:14:12 -07:00
William FH	f8e4048cd8	Add an Example Evaluation Notebook for the API Chain (#2516 ) Taking the Klarna API as an example, this uses evaluation chains to judge the quality of the request and response synthesizers based on a small set of curated queries. It also updates the chain's intermediate steps to emit a dict so each step can be keyed for lookup. ![image](https://user-images.githubusercontent.com/13333726/230505771-5cdb4de4-6fe7-4f54-b944-f29d438fa42c.png)	2023-04-06 15:58:41 -07:00
Alex Rad	bd780a8223	Add support for rwkv (#2422 ) This adds support for running RWKV with pytorch. https://github.com/hwchase17/langchain/issues/2398 This does not yet support rwkv.cpp	2023-04-06 14:41:06 -07:00
Harrison Chase	7149d33c71	max time limit for agent (#2513 )	2023-04-06 14:38:34 -07:00
William FH	f240651bd8	Add Request body (#2507 ) This still doesn't handle the following: - non-JSON media types - `anyOf`, `allOf`, `oneOf` It also doesn't emit the TypeScript definitions for referenced types yet, but that can be saved for a separate PR. Also, we could have better support for Swagger 2.0 specs and OpenAPI 3.0.3 (the same lib can be used for the latter); we recommend offline conversion for now.	2023-04-06 13:02:42 -07:00
Zach Jones	13d1df2140	Feature: AgentExecutor execution time limit (#2399 ) `AgentExecutor` already supports limiting the number of iterations. But the time taken by each iteration can vary quite a bit, so it is difficult to limit the execution time that way. This PR adds a new field `max_execution_time` to the `AgentExecutor` model. When called asynchronously, the agent loop is wrapped in an `asyncio.timeout()` context, which triggers the early stopping response if the time limit is reached. When called synchronously, the agent loop checks both the `max_iterations` limit and the time limit after each iteration. When used asynchronously, `max_execution_time` gives really tight control over the max time for an execution chain. When used synchronously, the chain can unfortunately exceed `max_execution_time`, but it still gives more control than trying to estimate the number of iterations needed to cap the execution time. --------- Co-authored-by: Zachary Jones <zjones@zetaglobal.com>	2023-04-06 12:54:32 -07:00
qued	5b34931948	docs: update unstructured detectron install instructions (#2498 ) Updated recommended `detectron2` version to install for use with `unstructured`. Should now match version in [Unstructured README](https://github.com/Unstructured-IO/unstructured/blob/main/README.md#eight_pointed_black_star-quick-start).	2023-04-06 12:48:19 -07:00
Timon Ruban	f0926bad9f	Fix docstring in indexes/getting-started (#2452 ) Fixed a letter. That's all.	2023-04-06 12:48:08 -07:00
Davit Buniatyan	b4914888a7	Deep Lake upgrade to include attribute search, distance metrics, returning scores and MMR (#2455 ) ### Features include - Metadata-based embedding search - Choice of distance metric function (`L2` for Euclidean, `L1` for Manhattan, `max` for L-infinity distance, `cos` for cosine similarity, `dot` for dot product). Defaults to `L2` - Returning scores - Max Marginal Relevance Search - Deleting samples from the dataset ### Notes - Added numerous tests; let me know if you would like to shorten them or make them smarter --------- Co-authored-by: Davit Buniatyan <d@activeloop.ai>	2023-04-06 12:47:33 -07:00
Sam Weaver	2ffb90b161	Extend opensearch to better support existing instances (#2500 ) (#2509 ) Closes #2500.	2023-04-06 12:45:56 -07:00
Matt Royer	ad87584c35	Fix 'embeddings is not defined' (#2468 ) Nothing major. The docs just give an error when you try to use `embeddings` instead of `llama`.	2023-04-06 12:45:45 -07:00
leo-gan	fd69cc7e42	Removed duplicate BaseModel dependencies (#2471 ) Removed duplicate BaseModel dependencies in class inheritance. Also sorted imports with `isort`.	2023-04-06 12:45:16 -07:00
felix-wang	b6a101d121	fix: add jina jupyter notebook (#2477 ) As the title says, this adds the missing link to the example notebook.	2023-04-06 12:42:01 -07:00
Tim Ellison	6f47133d8a	Minor doc typo (#2492 )	2023-04-06 12:41:40 -07:00
Jimmy Comfort	1dfb6a2a44	Update gpt4all example with model param (#2499 ) I am pretty sure that the documentation here should point to `model` instead of `model_path`, based on the source here: https://github.com/hwchase17/langchain/blob/master/langchain/llms/gpt4all.py#L26	2023-04-06 12:38:26 -07:00
Matt Robinson	270384fb44	fix: pass unstructured kwargs down in all unstructured loaders (#2506 ) ### Summary #1667 updated several Unstructured loaders to accept `unstructured_kwargs` in the `__init__` function. However, the previous PR did not add this functionality to every Unstructured loader. This PR ensures `unstructured_kwargs` are passed in all remaining Unstructured loaders.	2023-04-06 12:29:52 -07:00
Harrison Chase	c913acdb4c	bump version to 133 (#2503 )	2023-04-06 09:53:57 -07:00
Harrison Chase	1e19e004af	Harrison/openapi spec (#2474 ) Co-authored-by: William Fu-Hinthorn <13333726+hinthornw@users.noreply.github.com>	2023-04-06 09:47:37 -07:00
Luk Regarde	60c837c58a	Fix WhatsAppChatLoader regex pattern for 24 hour time format (#2458 ) Fixes a bug with the 24-hour time format. The WhatsApp regex can now parse either the 12- or 24-hour time format. Linked [issue](https://github.com/hwchase17/langchain/issues/2457).	2023-04-06 09:45:14 -07:00
Rostyslav Kinash	3acf423de0	Simple typo fix in openapi agent toolkit (#2502 ) Just a typo fix.	2023-04-06 09:44:26 -07:00
Harrison Chase	26314d7004	Harrison/openapi parser (#2461 ) Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>	2023-04-05 22:19:09 -07:00
Harrison Chase	a9e637b8f5	rfc: multi action agent (#2362 )	2023-04-05 15:28:48 -07:00
Matt Robinson	1140bd79a0	feat: adds support for MSFT Outlook files in `UnstructuredEmailLoader` (#2450 ) ### Summary Adds support for MSFT Outlook emails saved in `.msg` format to `UnstructuredEmailLoader`. Works if the user has `unstructured>=0.5.8` installed. ### Testing The following tests use the example files under `example-docs` in the Unstructured repo.
```python
from langchain.document_loaders import UnstructuredEmailLoader

loader = UnstructuredEmailLoader("fake-email.eml")
loader.load()

loader = UnstructuredEmailLoader("fake-email.msg")
loader.load()
```	2023-04-05 15:28:14 -07:00
William FH	007babb363	Add a mock server (#2443 ) It's useful to evaluate API Chains against a mock server. This PR adds an example "robot" server that exposes endpoints for the following: - Path, Query, and Request Body argument passing - GET, PUT, and DELETE endpoints - an exposed OpenAPI spec It relies on FastAPI + Uvicorn; I could add them to the dev dependencies list if you'd like.	2023-04-05 10:35:46 -07:00
William FH	c9ae0c5808	Add lint_diff command (#2449 ) It's helpful for developers to run the linter locally on just the changed files. This PR adds support for a `lint_diff` command. Ruff is still run over the entire directory since it's very fast.	2023-04-05 09:34:24 -07:00
Harrison Chase	3d871853df	bump version to 132 (#2441 )	2023-04-05 07:54:01 -07:00
Harrison Chase	00bc8df640	Harrison/tfidf retriever (#2440 )	2023-04-05 07:36:49 -07:00
researchonly	a63cfad558	fixed typo Teplate -> Template (#2433 ) fixed a typo in the documentation	2023-04-05 06:56:51 -07:00
Bill Chambers	f0d4f36219	Documentation Error - Typo in Docs - Update custom_mrkl_agent.ipynb (#2437 ) Just a small typo in the documentation.	2023-04-05 06:56:39 -07:00
sergerdn	b410dc76aa	fix: elasticsearch (#2402 ) - Create a new docker-compose file to start an Elasticsearch instance for integration tests. - Add new tests to `test_elasticsearch.py` to verify Elasticsearch functionality. - Include an optional group `test_integration` in the `pyproject.toml` file. This group should contain dependencies for integration tests and can be installed using the command `poetry install --with test_integration`. Any new dependencies should be added by running `poetry add some_new_deps --group "test_integration"`. Note: The new tests run in live mode, which involves end-to-end testing of the OpenAI API. In the future, adding `pytest-vcr` to record and replay all API requests would be a nice addition to the testing process. More info: https://pytest-vcr.readthedocs.io/en/latest/ Fixes https://github.com/hwchase17/langchain/issues/2386	2023-04-05 06:51:32 -07:00
Ankush Gola	4d730a9bbc	improve `AsyncCallbackManager` (#2410 )	2023-04-05 09:31:42 +02:00
Harrison Chase	af7f20fa42	Harrison/elastic search (#2419 )	2023-04-04 21:29:06 -07:00
Adam Gutglick	659c67e896	Don't create a new Pinecone index if it doesn't exist (#2414 ) If no Pinecone index is specified, or a wrong one is, do not create a new one. Creating new indexes can cause unexpected costs to users, and some code paths could cause a new one to be created on each invocation. This PR solves #2413.	2023-04-04 20:42:27 -07:00
Andrei	e519a81a05	Update LlamaCpp parameters (#2411 ) Add `n_batch` and `last_n_tokens_size` parameters to the LlamaCpp class. These parameters (especially `n_batch`) significantly affect performance. There's also a `verbose` flag that prints system timings on the `Llama` class, but I wasn't sure where to add it, as it conflicts with (should it be pulled from?) the LLM base class.	2023-04-04 19:52:33 -07:00
jerwelborn	b026a62bc4	hierarchical planning agent for multi-step queries against larger openapi specs (#2170 ) The specs used in ChatGPT plugins have only a few endpoints and are unrealistically small. By contrast, a spec like Spotify's has 60+ endpoints and comprises 100k+ tokens. Here are some impressive traces from GPT-4 that string together non-trivial sequences of API calls. As noted in `planner.py`, GPT-3 is not as robust, but it can be improved with i) better retry, self-reflection, etc. logic, ii) better few-shot examples, iii) etc. This PR is just a first attempt, probing a few different directions that can eventually be made more core. `make me a playlist with songs from kind of blue. call it machine blues.`
```
> Entering new AgentExecutor chain...
Action: api_planner
Action Input: I need to find the right API calls to create a playlist with songs from Kind of Blue and name it Machine Blues
Observation: 1. GET /search to find the album ID for "Kind of Blue".
2. GET /albums/{id}/tracks to get the tracks from the "Kind of Blue" album.
3. GET /me to get the current user's ID.
4. POST /users/{user_id}/playlists to create a new playlist named "Machine Blues" for the current user.
5. POST /playlists/{playlist_id}/tracks to add the tracks from "Kind of Blue" to the newly created "Machine Blues" playlist.
Thought:I have a plan to create the playlist. Now, I will execute the API calls.
Action: api_controller
Action Input: 1. GET /search to find the album ID for "Kind of Blue".
2. GET /albums/{id}/tracks to get the tracks from the "Kind of Blue" album.
3. GET /me to get the current user's ID.
4. POST /users/{user_id}/playlists to create a new playlist named "Machine Blues" for the current user.
5. POST /playlists/{playlist_id}/tracks to add the tracks from "Kind of Blue" to the newly created "Machine Blues" playlist.

> Entering new AgentExecutor chain...
Action: requests_get
Action Input: {"url": "https://api.spotify.com/v1/search?q=Kind%20of%20Blue&type=album", "output_instructions": "Extract the id of the first album in the search results"}
Observation: 1weenld61qoidwYuZ1GESA
Thought:Action: requests_get
Action Input: {"url": "https://api.spotify.com/v1/albums/1weenld61qoidwYuZ1GESA/tracks", "output_instructions": "Extract the ids of all the tracks in the album"}
Observation: ["7q3kkfAVpmcZ8g6JUThi3o"]
Thought:Action: requests_get
Action Input: {"url": "https://api.spotify.com/v1/me", "output_instructions": "Extract the id of the current user"}
Observation: 22rhrz4m4kvpxlsb5hezokzwi
Thought:Action: requests_post
Action Input: {"url": "https://api.spotify.com/v1/users/22rhrz4m4kvpxlsb5hezokzwi/playlists", "data": {"name": "Machine Blues"}, "output_instructions": "Extract the id of the newly created playlist"}
Observation: 48YP9TMcEtFu9aGN8n10lg
Thought:Action: requests_post
Action Input: {"url": "https://api.spotify.com/v1/playlists/48YP9TMcEtFu9aGN8n10lg/tracks", "data": {"uris": ["spotify:track:7q3kkfAVpmcZ8g6JUThi3o"]}, "output_instructions": "Confirm that the tracks were added to the playlist"}
Observation: The tracks were added to the playlist. The snapshot_id is "Miw4NTdmMWUxOGU5YWMxMzVmYmE3ZWE5MWZlYWNkMTc2NGVmNTI1ZjY5".
Thought:I am finished executing the plan.
Final Answer: The tracks from the "Kind of Blue" album have been added to the newly created "Machine Blues" playlist. The playlist ID is 48YP9TMcEtFu9aGN8n10lg.

> Finished chain.

Observation: The tracks from the "Kind of Blue" album have been added to the newly created "Machine Blues" playlist. The playlist ID is 48YP9TMcEtFu9aGN8n10lg.
Thought:I am finished executing the plan and have created the playlist with songs from Kind of Blue, named Machine Blues.
Final Answer: I have created a playlist called "Machine Blues" with songs from the "Kind of Blue" album. The playlist ID is 48YP9TMcEtFu9aGN8n10lg.

> Finished chain.
```
or `give me a song in the style of tobe nwige`
```
> Entering new AgentExecutor chain...
Action: api_planner
Action Input: I need to find the right API calls to get a song in the style of Tobe Nwigwe
Observation: 1. GET /search to find the artist ID for Tobe Nwigwe.
2. GET /artists/{id}/related-artists to find similar artists to Tobe Nwigwe.
3. Pick one of the related artists and use their artist ID in the next step.
4. GET /artists/{id}/top-tracks to get the top tracks of the chosen related artist.
Thought: I'm ready to execute the API calls.
Action: api_controller
Action Input: 1. GET /search to find the artist ID for Tobe Nwigwe.
2. GET /artists/{id}/related-artists to find similar artists to Tobe Nwigwe.
3. Pick one of the related artists and use their artist ID in the next step.
4. GET /artists/{id}/top-tracks to get the top tracks of the chosen related artist.

> Entering new AgentExecutor chain...
Action: requests_get
Action Input: {"url": "https://api.spotify.com/v1/search?q=Tobe%20Nwigwe&type=artist", "output_instructions": "Extract the artist id for Tobe Nwigwe"}
Observation: 3Qh89pgJeZq6d8uM1bTot3
Thought:Action: requests_get
Action Input: {"url": "https://api.spotify.com/v1/artists/3Qh89pgJeZq6d8uM1bTot3/related-artists", "output_instructions": "Extract the ids and names of the related artists"}
Observation: [
  { "id": "75WcpJKWXBV3o3cfluWapK", "name": "Lute" },
  { "id": "5REHfa3YDopGOzrxwTsPvH", "name": "Deante' Hitchcock" },
  { "id": "6NL31G53xThQXkFs7lDpL5", "name": "Rapsody" },
  { "id": "5MbNzCW3qokGyoo9giHA3V", "name": "EARTHGANG" },
  { "id": "7Hjbimq43OgxaBRpFXic4x", "name": "Saba" },
  { "id": "1ewyVtTZBqFYWIcepopRhp", "name": "Mick Jenkins" }
]
Thought:Action: requests_get
Action Input: {"url": "https://api.spotify.com/v1/artists/75WcpJKWXBV3o3cfluWapK/top-tracks?country=US", "output_instructions": "Extract the ids and names of the top tracks"}
Observation: [
  { "id": "6MF4tRr5lU8qok8IKaFOBE", "name": "Under The Sun (with J. Cole & Lute feat. DaBaby)" }
]
Thought:I am finished executing the plan.
Final Answer: The top track of the related artist Lute is "Under The Sun (with J. Cole & Lute feat. DaBaby)" with the track ID "6MF4tRr5lU8qok8IKaFOBE".

> Finished chain.

Observation: The top track of the related artist Lute is "Under The Sun (with J. Cole & Lute feat. DaBaby)" with the track ID "6MF4tRr5lU8qok8IKaFOBE".
Thought:I am finished executing the plan and have the information the user asked for.
Final Answer: The song "Under The Sun (with J. Cole & Lute feat. DaBaby)" by Lute is in the style of Tobe Nwigwe.

> Finished chain.
```
--------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-04 19:49:42 -07:00
jerwelborn	d6d6f322a9	Fix requests wrapper refactor (#2417 ) https://github.com/hwchase17/langchain/pull/2367	2023-04-04 18:22:35 -07:00
Harrison Chase	41832042cc	Harrison/pinecone hybrid (#2405 )	2023-04-04 14:09:57 -07:00
Harrison Chase	2b975de94d	add metal retriever (#2244 )	2023-04-04 12:17:13 -07:00
Harrison Chase	1f88b11c99	replicate cleanup (#2394 )	2023-04-04 12:15:03 -07:00
Harrison Chase	f5da9a5161	cr	2023-04-04 07:26:47 -07:00
Harrison Chase	8a4709582f	cr	2023-04-04 07:25:28 -07:00
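
The `strict=False` flag mentioned in #2520 belongs to Python's standard `json` module. The following is a minimal sketch of what it permits, assuming an agent response whose string value contains a raw newline. The `parse_agent_json` helper and the sample payload are hypothetical and are not code from the PR.

```python
import json

# Hypothetical agent output: the "note" value contains a literal newline,
# i.e. an unescaped control character, which strict JSON rejects.
raw = '{"action": "requests_get", "note": "first line\nsecond line"}'


def parse_agent_json(text: str) -> dict:
    """Parse agent-emitted JSON, tolerating control characters inside strings."""
    return json.loads(text, strict=False)


# Default (strict) parsing fails on the unescaped newline.
try:
    json.loads(raw)
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False

parsed = parse_agent_json(raw)
print(strict_ok)                    # False
print(parsed["note"].splitlines())  # ['first line', 'second line']
```

With `strict=False`, control characters inside strings are kept in the decoded value rather than rejected. All other JSON syntax rules are still enforced.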
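
The synchronous behaviour described in #2399 checks limits after each iteration, so a slow step can overshoot `max_execution_time`. That behaviour can be sketched without LangChain. In the sketch below, `run_agent_loop`, the `step` callback, and the stop messages are hypothetical stand-ins, not `AgentExecutor` internals. The async path, which the PR describes as wrapping the loop in `asyncio.timeout()` (in the standard library from Python 3.11), is not shown.

```python
import time
from typing import Callable, Optional


def run_agent_loop(
    step: Callable[[int], Optional[str]],
    max_iterations: Optional[int] = 15,
    max_execution_time: Optional[float] = None,
) -> str:
    """Call `step` until it returns a final answer or a limit is reached.

    Both limits are checked only between iterations, so a slow step can push
    the total run time past `max_execution_time`.
    """
    start = time.monotonic()
    iterations = 0
    while True:
        result = step(iterations)  # None means "not finished yet"
        iterations += 1
        if result is not None:
            return result
        if max_iterations is not None and iterations >= max_iterations:
            return "Agent stopped due to iteration limit."
        elapsed = time.monotonic() - start
        if max_execution_time is not None and elapsed >= max_execution_time:
            return "Agent stopped due to time limit."


def slow_step(iteration: int) -> Optional[str]:
    """A step that never finishes and takes at least ~50 ms per call."""
    time.sleep(0.05)
    return None


# The time limit is hit after a few iterations, long before 100 iterations.
print(run_agent_loop(slow_step, max_iterations=100, max_execution_time=0.12))
# Agent stopped due to time limit.
```

Checking elapsed time with `time.monotonic()` avoids surprises from wall-clock adjustments while the agent runs.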
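
The metric names listed in #2455 correspond to standard vector comparisons. Here is a stdlib sketch, assuming plain Python lists as embeddings. `rank` and `METRICS` are hypothetical helpers, not the Deep Lake or LangChain API. `L2`, `L1` and `max` are distances (smaller is closer), while `cos` and `dot` are similarities (larger is closer), so the sort direction depends on the metric.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l1(a: Vector, b: Vector) -> float:
    """Manhattan (taxicab) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def linf(a: Vector, b: Vector) -> float:
    """L-infinity (Chebyshev) distance: the largest per-coordinate gap."""
    return max(abs(x - y) for x, y in zip(a, b))


def dot(a: Vector, b: Vector) -> float:
    """Dot-product similarity."""
    return sum(x * y for x, y in zip(a, b))


def cos(a: Vector, b: Vector) -> float:
    """Cosine similarity (assumes neither vector is all zeros)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Keys mirror the metric names in the commit message.
METRICS: Dict[str, Callable[[Vector, Vector], float]] = {
    "L2": l2, "L1": l1, "max": linf, "cos": cos, "dot": dot,
}
SIMILARITIES = {"cos", "dot"}  # for these, a higher score means more similar


def rank(query: Vector, vectors: List[Vector], metric: str = "L2", k: int = 2) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k vectors closest to `query`."""
    score = METRICS[metric]
    scored = [(i, score(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=metric in SIMILARITIES)
    return scored[:k]


vectors = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
query = [1.0, 0.0]
print([i for i, _ in rank(query, vectors, "L2")])   # [0, 2]
print([i for i, _ in rank(query, vectors, "dot")])  # [2, 0]
```

The `dot` ranking prefers the longer vector `[2.0, 0.0]` because the dot product grows with magnitude. Cosine similarity normalises that away and scores `[1.0, 0.0]` and `[2.0, 0.0]` equally.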
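
The #2458 fix lets the time component of an exported WhatsApp line be either 12-hour (with AM/PM) or 24-hour. The hypothetical pattern below illustrates the idea and is not the regex actually used in `WhatsAppChatLoader`. The sample lines assume the common `date, time - sender: message` export layout and its bracketed `[date, time] sender: message` variant.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern. The AM/PM suffix is optional, so both 12-hour
# ("3:19 PM") and 24-hour ("15:19") timestamps match; seconds are optional too.
LINE_RE = re.compile(
    r"^\[?(\d{1,2}[/.]\d{1,2}[/.]\d{2,4}),\s"        # date, e.g. 1/23/23 or 23.01.2023
    r"(\d{1,2}:\d{2}(?::\d{2})?(?:\s?[AaPp][Mm])?)"  # time, optional seconds and AM/PM
    r"\]?[\s-]*"                                     # optional "]" and " - " separator
    r"([^:]+):\s(.+)$"                               # sender, then message text
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Split one exported chat line into fields, or return None if it doesn't match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    date, time, sender, text = match.groups()
    return {"date": date, "time": time, "sender": sender.strip(), "text": text}


print(parse_line("1/23/23, 3:19 PM - User 1: Hi there"))
print(parse_line("23/01/2023, 15:19 - User 2: Hello"))
print(parse_line("[1/23/23, 3:19:05 PM] User 1: Hi"))
```

Real exports vary by locale and app version, so a production pattern would need testing against actual chat files.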

1149 Commits