langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-08 07:10:35 +00:00

Author	SHA1	Message	Date
Joseph McElroy	eac4ddb4bb	Elasticsearch Store Improvements (#8636 )

Todo:
- [x] Connection options (cloud, localhost url, es_connection) support
- [x] Logging support
- [x] Customisable field support
- [x] Distance Similarity support
- [x] Metadata support
- [x] Metadata Filter support
- [x] Retrieval Strategies
  - [x] Approx
  - [x] Approx with Hybrid
  - [x] Exact
  - [x] Custom
  - [x] ELSER (excluding hybrid as we are working on RRF support)
- [x] integration tests
- [x] Documentation

👋 this is a contribution to improve Elasticsearch integration with Langchain. It's based loosely on the changes that are in master but with some notable changes:

## Package name & design improvements

The import name is now `ElasticsearchStore`, to aid discoverability of the VectorStore.

```py
## Before
from langchain.vectorstores.elastic_vector_search import ElasticVectorSearch, ElasticKnnSearch

## Now
from langchain.vectorstores.elasticsearch import ElasticsearchStore
```

## Retrieval Strategy support

Before, we had a number of classes, depending on the strategy you wanted: `ElasticKnnSearch` for approx, `ElasticVectorSearch` for exact / brute force.

With `ElasticsearchStore` we have retrieval strategies:

### Approx Example

The default strategy. The vast majority of developers who use Elasticsearch will infer the embeddings outside of Elasticsearch. Uses the KNN functionality of _search.

```py
texts = ["foo", "bar", "baz"]
docsearch = ElasticsearchStore.from_texts(
    texts,
    FakeEmbeddings(),
    es_url="http://localhost:9200",
    index_name="sample-index"
)
output = docsearch.similarity_search("foo", k=1)
```

### Approx, with hybrid

For developers who want to search using both the embedding and the text bm25 match. It's simple to enable.

```py
texts = ["foo", "bar", "baz"]
docsearch = ElasticsearchStore.from_texts(
    texts,
    FakeEmbeddings(),
    es_url="http://localhost:9200",
    index_name="sample-index",
    strategy=ElasticsearchStore.ApproxRetrievalStrategy(hybrid=True)
)
output = docsearch.similarity_search("foo", k=1)
```

### Approx, with `query_model_id`

For developers who want to infer within Elasticsearch, using the model loaded in the ml node.

This relies on the developer to set up the pipeline and index if they wish to embed the text in Elasticsearch. See the test for an example.

```py
texts = ["foo", "bar", "baz"]
docsearch = ElasticsearchStore.from_texts(
    texts,
    FakeEmbeddings(),
    es_url="http://localhost:9200",
    index_name="sample-index",
    strategy=ElasticsearchStore.ApproxRetrievalStrategy(
        query_model_id="sentence-transformers__all-minilm-l6-v2"
    ),
)
output = docsearch.similarity_search("foo", k=1)
```

### I want to provide my own custom Elasticsearch Query

You might want to have more control over the query, to perform multi-phase retrieval such as LTR, or to linearly boost on document parameters like recently updated or geo-distance. You can do this with the `custom_query` argument.

```py
def my_custom_query(query_body: dict, query: str) -> dict:
    return {"query": {"match": {"text": {"query": "bar"}}}}

# elasticsearch_connection and index_name are assumed to be defined elsewhere
texts = ["foo", "bar", "baz"]
docsearch = ElasticsearchStore.from_texts(
    texts, FakeEmbeddings(), **elasticsearch_connection, index_name=index_name
)
docsearch.similarity_search("foo", k=1, custom_query=my_custom_query)
```

### Exact Example

For developers who have a small dataset in Elasticsearch and don't want the cost of indexing the dims, accepting a higher cost at query time instead. Uses script_score.

```py
texts = ["foo", "bar", "baz"]
docsearch = ElasticsearchStore.from_texts(
    texts,
    FakeEmbeddings(),
    es_url="http://localhost:9200",
    index_name="sample-index",
    strategy=ElasticsearchStore.ExactRetrievalStrategy(),
)
output = docsearch.similarity_search("foo", k=1)
```

### ELSER Example

Elastic provides its own sparse vector model called ELSER. With these changes, it's really easy to use. The vector store creates a pipeline and index that's set up for ELSER. All the developer needs to do is configure, ingest and query via langchain tooling.

```py
texts = ["foo", "bar", "baz"]
docsearch = ElasticsearchStore.from_texts(
    texts,
    FakeEmbeddings(),
    es_url="http://localhost:9200",
    index_name="sample-index",
    strategy=ElasticsearchStore.SparseVectorStrategy(),
)
output = docsearch.similarity_search("foo", k=1)
```

## Architecture

In future, we can introduce new strategies without breaking backwards compatibility (bwc) as we evolve the index / query strategy.

## Credit

On release, could you credit @elastic and @phoey1 please? Thank you!

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-14 23:42:35 -07:00
Harrison Chase	71d5b7c9bf	Harrison/fallbacks (#9233 ) Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-14 18:27:38 -07:00
Lance Martin	41279a3ae1	Move self-check use case to "more" section (#9137 ) Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-14 18:27:28 -07:00
Lance Martin	22858d99b5	Move code-writing use case to "more" section (#9134 ) Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-14 18:27:19 -07:00
Bagatur	249d7d06a2	adapter doc nit (#9234 )	2023-08-14 18:26:37 -07:00
Divyansh Garg	9529483c2a	Improve MultiOn client toolkit prompts (#9222 ) - Updated prompts for the MultiOn toolkit for better functionality - Non-blocking but good to have it merged to improve the overall performance for the toolkit @hinthornw @hwchase17 --------- Co-authored-by: Naman Garg <ngarg3@binghamton.edu>	2023-08-14 17:39:51 -07:00
Lance Martin	969e1683de	Move graph use case to "more" section (#8997 ) Clean `use_cases` by moving the `GraphDB` to `integrations`. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-14 17:20:38 -07:00
William FH	c478fc208e	Default On Retry (#9230 ) Base callbacks don't have a default on retry event Fix #8542 --------- Co-authored-by: landonsilla <landon.silla@stepstone.com>	2023-08-14 16:45:17 -07:00
Lance Martin	d0a0d560ad	Minor formatting on Web Research Use Case (#9221 )	2023-08-14 16:29:36 -07:00
Leonid Ganeline	93dd499997	docstrings: `document_loaders` consistency 3 (#9216 ) Updated docstrings into the consistent format (probably the last update for the `document_loaders`).	2023-08-14 16:28:39 -07:00
Kshitij Wadhwa	a69cb95850	track langchain usage for Rockset (#9229 ) Add ability to track langchain usage for Rockset. Rockset's new python client allows setting this. To prevent old clients from failing, it ignores the error if the setting throws an exception (we can't track old versions). Tested locally with old and new Rockset python clients. cc @baskaryan	2023-08-14 16:27:34 -07:00
Leonid Ganeline	7810ea5812	docstrings: `chat_models` consistency (#9227 ) Updated docstrings into the consistent format.	2023-08-14 16:15:56 -07:00
William FH	b0896210c7	Return feedback with failed response if there's an error (#9223 ) In Evals	2023-08-14 15:59:16 -07:00
William FH	7124f2ebfa	Parent Doc Retriever (#9214 ) 2 things: - Implement the private method rather than the public one so callbacks are handled properly - Add search_kwargs (Open to not adding this if we are trying to deprecate this UX, but as a user I'd assume similar args to the vector store retriever. In fact some may assume this implements the same interface, but I'm not dealing with that here)	2023-08-14 15:41:53 -07:00
Lance Martin	17ae2998e7	Update Ollama docs (#9220 ) Based on discussion w/ team.	2023-08-14 13:56:16 -07:00
Harrison Chase	3f601b5809	add async method in (#9204 )	2023-08-14 11:04:31 -07:00
Clark	03ea0762a1	fix(jinachat): related to #9197 (#9200 ) related to: https://github.com/langchain-ai/langchain/issues/9197 --------- Co-authored-by: qianjun.wqj <qianjun.wqj@alibaba-inc.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-14 11:04:20 -07:00
Eugene Yurtsev	4f1feaca83	Wrap OpenAPI features in conditionals for pydantic v2 compatibility (#9205 ) Wrap OpenAPI in conditionals for pydantic v2 compatibility.	2023-08-14 13:40:58 -04:00
Glauco Custódio	89be10f6b4	add ttl to RedisCache (#9068 ) Add `ttl` (time to live) to `RedisCache`	2023-08-14 12:59:18 -04:00
Eugene Yurtsev	04bc5f3b18	Conditionally add pydantic v1 to namespace (#9202 ) Conditionally add pydantic_v1 to namespace.	2023-08-14 11:26:45 -04:00
shibuiwilliam	feec422bf7	fix logging to logger (#9192 ) # What - fix logging to logger	2023-08-14 08:21:09 -07:00
Bagatur	5935767056	bump lc 246, lce 9 (#9207 )	2023-08-14 08:14:37 -07:00
Bagatur	b5a57acf6c	lite llm lint (#9208 )	2023-08-14 11:03:06 -04:00
Krish Dholakia	49f1d8477c	Adding ChatLiteLLM model (#9020 ) Description: Adding a langchain integration for the LiteLLM library Tag maintainer: @hwchase17, @baskaryan Twitter handle: @krrish_dh / @Berri_AI --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-14 07:43:40 -07:00
Emmanuel Gautier	f11e5442d6	docs: update LlamaCpp input args (#9173 ) This PR only updates the LlamaCpp args documentation. The input arg has been flattened.	2023-08-14 07:42:03 -07:00
Eugene Yurtsev	72f9150a50	Update 2 more pydantic imports (#9203 ) Update two more pydantic imports to use v1 explicitly	2023-08-14 10:11:30 -04:00
Eugene Yurtsev	c172f972ea	Create pydantic v1 namespace, add partial compatibility for pydantic v2 (#9123 ) First of a few PRs to add full compatibility to both pydantic v1 and v2. This PR creates pydantic v1 namespace and adds it to sys.modules. Upcoming changes: 1. Handle `openapi-schema-pydantic = "^1.2"` and dependent chains/tools 2. bump dependencies to versions that are cross compatible for pydantic or remove them (see below) 3. Add tests to github workflows to test with pydantic v1 and v2 Dependencies From a quick look (could be wrong since it was done manually), dependencies pinning pydantic below 2 (some of these can be bumped to newer versions that provide cross-compatible code): anthropic bentoml confection fastapi langsmith octoai-sdk openapi-schema-pydantic qdrant-client spacy steamship thinc zep-python Unpinned: marqo () nomic () xinference(*)	2023-08-14 09:37:32 -04:00
Evan Schultz	8189dea0d8	Fixes typing issues in BaseOpenAI (#9183 ) ## Description: Sets default values for `client` and `model` attributes in the BaseOpenAI class to fix Pylance Typing issue. - Issue: #9182. - Twitter handle: @evanmschultz	2023-08-13 23:03:28 -07:00
Massimiliano Pronesti	d95eeaedbe	feat(llms): support vLLM's OpenAI-compatible server (#9179 ) This PR aims at supporting [vLLM's OpenAI-compatible server feature](https://vllm.readthedocs.io/en/latest/getting_started/quickstart.html#openai-compatible-server), i.e. allowing vLLM's LLMs to be called as if they were OpenAI's. I've also updated the related notebook to provide an example usage. At the moment, vLLM only supports the `Completion` API.	2023-08-13 23:03:05 -07:00
Michael Goin	621da3c164	Adds DeepSparse as an LLM (#9184 ) Adds [DeepSparse](https://github.com/neuralmagic/deepsparse) as an LLM backend. DeepSparse supports running various open-source sparsified models hosted on [SparseZoo](https://sparsezoo.neuralmagic.com/) for performance gains on CPUs. Twitter handles: @mgoin_ @neuralmagic --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-13 22:35:58 -07:00
Bagatur	0fa69d8988	Bagatur/zep python 1.0 (#9186 ) Co-authored-by: Daniel Chalef <131175+danielchalef@users.noreply.github.com>	2023-08-13 21:52:53 -07:00
Eugene Yurtsev	9b24f0b067	Enhance deprecation decorator to modify docs with sphinx directives (#9069 ) Enhance deprecation decorator	2023-08-13 15:35:01 -04:00
Harrison Chase	8d69dacdf3	multiple retrieval in parallel (#9174 )	2023-08-13 10:03:54 -07:00
Bagatur	cdfe2c96c5	bump 263 (#9156 )	2023-08-12 12:36:44 -07:00
Leonid Ganeline	19f504790e	docstrings: document_loaders consistency 2 (#9148 ) This is Part 2. See #9139 (Part 1).	2023-08-11 16:25:40 -07:00
Harrison Chase	1b58460fe3	update keys for chain (#5164 ) Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-11 16:25:13 -07:00
Eugene Yurtsev	aca8cb5fba	API Reference: Do not document private modules (#9042 ) This PR prevents documentation of private modules in the API reference	2023-08-11 15:58:14 -07:00
胡亮	7edf4ca396	Support multi gpu inference for HuggingFaceEmbeddings (#4732 ) Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-11 15:55:44 -07:00
UmerHA	8aab39e3ce	Added SmartGPT workflow (issue #4463 ) (#4816 )

# Added SmartGPT workflow by providing SmartLLM wrapper around LLMs

Edit: As @hwchase17 suggested, this should be a chain, not an LLM. I have adapted the PR.

It is used like this:

```
from langchain.prompts import PromptTemplate
from langchain.chains import SmartLLMChain
from langchain.chat_models import ChatOpenAI

hard_question = "I have a 12 liter jug and a 6 liter jug. I want to measure 6 liters. How do I do it?"
hard_question_prompt = PromptTemplate.from_template(hard_question)

llm = ChatOpenAI(model_name="gpt-4")
prompt = PromptTemplate.from_template(hard_question)
chain = SmartLLMChain(llm=llm, prompt=prompt, verbose=True)

chain.run({})
```

Original text:

Added SmartLLM wrapper around LLMs to allow for SmartGPT workflow (as in https://youtu.be/wVzuvf9D9BU). SmartLLM can be used wherever LLM can be used. E.g:

```
smart_llm = SmartLLM(llm=OpenAI())
smart_llm("What would be a good company name for a company that makes colorful socks?")
```

or

```
smart_llm = SmartLLM(llm=OpenAI())
prompt = PromptTemplate(
    input_variables=["product"],
    template="What is a good name for a company that makes {product}?",
)
chain = LLMChain(llm=smart_llm, prompt=prompt)
chain.run("colorful socks")
```

SmartGPT consists of 3 steps:

1. Ideate - generate n possible solutions ("ideas") to user prompt
2. Critique - find flaws in every idea & select best one
3. Resolve - improve upon best idea & return it

Fixes #4463

## Who can review?

Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested:

- @hwchase17
- @agola11

Twitter: [@UmerHAdil](https://twitter.com/@UmerHAdil) | Discord: RicChilligerDude#7589

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-11 15:44:27 -07:00
Lucas Pickup	1d3735a84c	Ensure deployment_id is set to provided deployment, required for Azure OpenAI. (#5002 ) # Ensure deployment_id is set to provided deployment, required for Azure OpenAI. --------- Co-authored-by: Lucas Pickup <lupickup@microsoft.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-11 15:43:01 -07:00
Bagatur	45741bcc1b	Bagatur/vectara nit (#9140 ) Co-authored-by: Ofer Mendelevitch <ofer@vectara.com>	2023-08-11 15:32:03 -07:00
Dominick DEV	9b64932e55	Add LangChain utility for real-time crypto exchange prices (#4501 ) This commit adds the LangChain utility which allows for the real-time retrieval of cryptocurrency exchange prices. With LangChain, users can easily access up-to-date pricing information by running the command ".run(from_currency, to_currency)". This new feature provides a convenient way to stay informed on the latest exchange rates and make informed decisions when trading crypto. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-11 14:45:06 -07:00
Joshua Sundance Bailey	eaa505fb09	Create ArcGISLoader & example notebook (#8873 ) - Description: Adds the ArcGISLoader class to `langchain.document_loaders` - Allows users to load data from ArcGIS Online, Portal, and similar - Users can authenticate with `arcgis.gis.GIS` or retrieve public data anonymously - Uses the `arcgis.features.FeatureLayer` class to retrieve the data - Defines the most relevant keywords arguments and accepts `**kwargs` - Dependencies: Using this class requires `arcgis` and, optionally, `bs4.BeautifulSoup`. Tagging maintainers: - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-11 14:33:40 -07:00
Bagatur	e21152358a	fix (#9145 )	2023-08-11 13:58:23 -07:00
Leonid Ganeline	edb585228d	docstrings: document_loaders consistency (#9139 ) Formatted docstrings from different formats to a consistent format, like: >Loads processed docs from Docugami. "Load from `Docugami`." >Loader that uses Unstructured to load HTML files. "Load `HTML` files using `Unstructured`." >Load documents from a directory. "Load from a directory." - `Load` - no `Loads` - DocumentLoader always loads Documents, so no more "documents/docs/texts/ etc" - integrated systems and APIs enclosed in backticks.	2023-08-11 13:09:31 -07:00
Aashish Saini	0aabded97f	Updating interactive walkthrough link in index.md to resolve 404 error (#9063 ) Updated interactive walkthrough link in index.md to resolve 404 error. Also, expressing deep gratitude to LangChain library developers for their exceptional efforts 🥇 . --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-11 13:08:56 -07:00
Markus Schiffer	00bf472265	Fix for SVM retriever discarding document metadata (#9141 ) As stated in the title the SVM retriever discarded the metadata of passed in docs. This code fixes that. I also added one unit test that should test that. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-11 13:08:17 -07:00
Bagatur	bace17e0aa	rm integration deps (#9142 )	2023-08-11 12:43:08 -07:00
Eugene Yurtsev	44bc89b7bf	Support a few list like operations on ChatPromptTemplate (#9077 ) Make it easier to work with chat prompt template	2023-08-11 14:49:51 -04:00
Hai The Dude	e4418d1b7e	Added new use case docs for Web Scraping, Chromium loader, BS4 transformer (#8732 ) - Description: Added a new use case category called "Web Scraping", and a tutorial to scrape websites using OpenAI Functions Extraction chain to the docs. - Tag maintainer: @baskaryan @hwchase17 - Twitter handle: https://www.linkedin.com/in/haiphunghiem/ (I'm on LinkedIn mostly) --------- Co-authored-by: Lance Martin <lance@langchain.dev>	2023-08-11 11:46:59 -07:00


3935 Commits