langchain

Author	SHA1	Message	Date
Eugene Yurtsev	5cfa72a130	Bibtex integration for document loader and retriever (#5137 ) # Bibtex integration Wrap bibtexparser to retrieve a list of docs from a bibtex file. * Get the metadata from the bibtex entries * `page_content` get from the local pdf referenced in the `file` field of the bibtex entry using `pymupdf` * If no valid pdf file, `page_content` set to the `abstract` field of the bibtex entry * Support Zotero flavour using regex to get the file path * Added usage example in `docs/modules/indexes/document_loaders/examples/bibtex.ipynb` --------- Co-authored-by: Sébastien M. Popoff <sebastien.popoff@espci.fr> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-25 00:21:31 -07:00
Ati Sharma	40b086d6e8	Allow to specify ID when adding to the FAISS vectorstore. (#5190 ) # Allow to specify ID when adding to the FAISS vectorstore This change allows unique IDs to be specified when adding documents / embeddings to a faiss vectorstore. - This reflects the current approach with the chroma vectorstore. - It allows rejection of inserts on duplicate IDs - will allow deletion / update by searching on deterministic ID (such as a hash). - If not specified, a random UUID is generated (as per previous behaviour, so non-breaking). This commit fixes #5065 and #3896 and should fix #2699 indirectly. I've tested adding and merging. Kindly tagging @Xmaster6y @dev2049 for review. --------- Co-authored-by: Ati Sharma <ati@agalmic.ltd> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-05-24 22:26:46 -07:00
Nicholas Liu	f0ea093de8	Change Default GoogleDriveLoader Behavior to not Load Trashed Files (issue #5104 ) (#5220 ) # Change Default GoogleDriveLoader Behavior to not Load Trashed Files (issue #5104) Fixes #5104 If the previous behavior of loading files that used to live in the folder, but are now trashed, you can use the `load_trashed_files` parameter: ``` loader = GoogleDriveLoader( folder_id="1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5", recursive=False, load_trashed_files=True ) ``` As not loading trashed files should be expected behavior, should we 1. even provide the `load_trashed_files` parameter? 2. add documentation? Feels most users will stick with default behavior ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: DataLoaders - @eyurtsev Twitter: [@nicholasliu77](https://twitter.com/nicholasliu77)	2023-05-24 22:26:17 -07:00
Keno	eff31a3361	Remove API key from docs (#5223 ) I found an API key for `serpapi_api_key` while reading the docs. It seems to have been modified very recently. Removed it in this PR @hwchase17 - project lead	2023-05-24 22:25:39 -07:00
maspotts	95c9aa1ccb	Create async copy of from_text() inside GraphIndexCreator. (#5214 ) Copies `GraphIndexCreator.from_text()` to make an async version called `GraphIndexCreator.afrom_text()`. This is (should be) a trivial change: it just adds a copy of `GraphIndexCreator.from_text()` which is async and awaits a call to `chain.apredict()` instead of `chain.predict()`. There is no unit test for GraphIndexCreator, and I did not create one, but this code works for me locally. @agola11 @hwchase17	2023-05-24 21:54:12 -07:00
Leonid Ganeline	2ad29f410d	fix a mistake in concepts.md (#5222 ) # fix a mistake in concepts.md	2023-05-24 21:47:22 -07:00
Harrison Chase	a775aa6389	Harrison/vertex (#5049 ) Co-authored-by: Leonid Kuligin <kuligin@google.com> Co-authored-by: Leonid Kuligin <lkuligin@yandex.ru> Co-authored-by: sasha-gitg <44654632+sasha-gitg@users.noreply.github.com> Co-authored-by: Justin Flick <Justinjayflick@gmail.com> Co-authored-by: Justin Flick <jflick@homesite.com>	2023-05-24 15:51:12 -07:00
Zander Chase	e6c4571191	Add 'status' command to get server status (#5197 ) Example: ``` $ langchain plus start --expose ... $ langchain plus status The LangChainPlus server is currently running. Service Status Published Ports langchain-backend Up 40 seconds 1984 langchain-db Up 41 seconds 5433 langchain-frontend Up 40 seconds 80 ngrok Up 41 seconds 4040 To connect, set the following environment variables in your LangChain application: LANGCHAIN_TRACING_V2=true LANGCHAIN_ENDPOINT=https://5cef-70-23-89-158.ngrok.io $ langchain plus stop $ langchain plus status The LangChainPlus server is not running. $ langchain plus start The LangChainPlus server is currently running. Service Status Published Ports langchain-backend Up 5 seconds 1984 langchain-db Up 6 seconds 5433 langchain-frontend Up 5 seconds 80 To connect, set the following environment variables in your LangChain application: LANGCHAIN_TRACING_V2=true LANGCHAIN_ENDPOINT=http://localhost:1984 ```	2023-05-24 21:43:16 +00:00
Zander Chase	e76e68b211	Add Delete Session Method (#5193 )	2023-05-24 21:06:03 +00:00
Zander Chase	66113c2a62	Log warning (#5192 ) Changes debug log to warning log when LC Tracer fails to instantiate	2023-05-24 21:05:13 +00:00
Ankush Gola	b7fcb35a39	add option to pass openai key to langchain plus command (#5213 )	2023-05-24 21:05:03 +00:00
Davis Chase	dcee8936c1	nit (#5208 )	2023-05-24 12:52:20 -07:00
Alon Diament	44abe925df	Add Joplin document loader (#5153 ) # Add Joplin document loader [Joplin](https://joplinapp.org/) is an open source note-taking app. Joplin has a [REST API](https://joplinapp.org/api/references/rest_api/) for accessing its local database. The proposed `JoplinLoader` uses the API to retrieve all notes in the database and their metadata. Joplin needs to be installed and running locally, and an access token is required. - The PR includes an integration test. - The PR includes an example notebook. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-24 12:31:55 -07:00
Rodrigo Siqueira	f10be072ff	Add Iugu document loader (#5162 ) Create IUGU loader --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-24 11:47:01 -07:00
ByronHsu	f0730c6489	Allow readthedoc loader to pass custom html tag (#5175 ) ## Description The html structure of readthedocs can differ. Currently, the html tag is hardcoded in the reader, and unable to fit into some cases. This pr includes the following changes: 1. Replace `find_all` with `find` because we just want one tag. 2. Provide `custom_html_tag` to the loader. 3. Add tests for readthedoc loader 4. Refactor code ## Issues See more in https://github.com/hwchase17/langchain/pull/2609. The problem was not completely fixed in that pr. --------- Signed-off-by: byhsu <byhsu@linkedin.com> Co-authored-by: byhsu <byhsu@linkedin.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-24 10:40:27 -07:00
Alexander Dibrov	d8eed6018f	Output parsing variation allowance (#5178 ) # Output parsing variation allowance for self-ask with search This change makes self-ask with search easier for Llama models to follow, as they tend toward returning 'Followup:' instead of 'Follow up:' despite an otherwise valid remaining output. Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-24 10:39:09 -07:00
Matt Wells	c173bf1c62	Fixes scope of query Session in PGVector (#5194 ) `vectorstore.PGVector`: The transactional boundary should be increased to cover the query itself Currently, within the `similarity_search_with_score_by_vector` the transactional boundary (created via the `Session` call) does not include the select query being made. This can result in un-intended consequences when interacting with the PGVector instance methods directly --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-24 10:37:45 -07:00
Tommaso De Lorenzo	52714cedd4	fixing total cost finetuned model giving zero (#5144 ) # OpenAI finetuned model giving zero tokens cost Very simple fix to the previously committed solution for allowing finetuned OpenAI models. Improves #5127 --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-24 10:04:08 -07:00
Harrison Chase	94cf391ef1	standardize json parsing (#5168 ) Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-24 10:03:53 -07:00
Davis Chase	2b2176a3c1	tfidf retriever (#5114 ) Co-authored-by: vempaliakhil96 <vempaliakhil96@gmail.com>	2023-05-24 10:02:09 -07:00
Shukri	b00c77dc62	Improve weaviate vectorstore docs (#5201 ) # Improve weaviate vectorstore docs	2023-05-24 09:31:48 -07:00
Tomaz Bratanic	fd866d1801	Update Cypher QA prompt (#5173 ) # Improve Cypher QA prompt The current QA prompt is optimized for networkX answer generation, which returns all the possible triples. However, Cypher search is a bit more focused and doesn't necessarily return all the context information. For that reason, the model sometimes refuses to generate an answer even though the information is provided: ![Screenshot from 2023-05-24 08-36-23](https://github.com/hwchase17/langchain/assets/19948365/351cf9c1-2567-447c-91fd-284ae3fa1ccf) To fix this issue, I have updated the prompt. Interestingly, I tried many variations with fewer instructions and they didn't work properly. However, the current fix works nicely. ![Screenshot from 2023-05-24 08-37-25](https://github.com/hwchase17/langchain/assets/19948365/fc830603-e6ec-4a23-8a86-eaf572996014)	2023-05-24 08:31:30 -07:00
Zach Schillaci	aa14e223ee	Reuse `length_func` in `MapReduceDocumentsChain` (#5181 ) # Reuse `length_func` in `MapReduceDocumentsChain` Pretty straightforward refactor in `MapReduceDocumentsChain`. Reusing the local variable `length_func`, instead of the longer alternative `self.combine_document_chain.prompt_length`. @hwchase17	2023-05-24 08:28:37 -07:00
Harrison Chase	11c26ebb55	Harrison/modelscope (#5156 ) Co-authored-by: thomas-yanxin <yx20001210@163.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-24 08:06:45 -07:00
Davis Chase	2d5588c5f0	bump 179 (#5200 )	2023-05-24 07:55:27 -07:00
Saba Sturua	47e4ee4370	adjust docarray docstrings (#5185 ) Follow up of https://github.com/hwchase17/langchain/pull/5015 Thanks for catching this! Just a small PR to adjust couple of strings to these changes Signed-off-by: jupyterjazz <saba.sturua@jina.ai>	2023-05-24 07:50:35 -07:00
Jeff Vestal	cf19a2a59f	example usage (#5182 ) Adding example usage for elasticsearch knn embeddings [per](https://github.com/hwchase17/langchain/pull/3401#issuecomment-1548518389) https://github.com/hwchase17/langchain/blob/master/langchain/embeddings/elasticsearch.py	2023-05-24 07:47:15 -07:00
Ikko Eltociear Ashimine	fff21a0b35	Update rellm_experimental.ipynb (#5189 ) HuggingFace -> Hugging Face	2023-05-24 11:41:00 +00:00
Nolan Tremelling	faa26650c9	Beam (#4996 ) # Beam Calls the Beam API wrapper to deploy and make subsequent calls to an instance of the gpt2 LLM in a cloud deployment. Requires installation of the Beam library and registration of Beam Client ID and Client Secret. Additional calls can then be made through the instance of the large language model in your code or by calling the Beam API. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-24 01:25:18 -07:00
Ofer Mendelevitch	c81fb88035	Vectara (#5069 ) # Vectara Integration This PR provides integration with Vectara. Implemented here are: * langchain/vectorstore/vectara.py * tests/integration_tests/vectorstores/test_vectara.py * langchain/retrievers/vectara_retriever.py And two IPYNB notebooks to do more testing: * docs/modules/chains/index_examples/vectara_text_generation.ipynb * docs/modules/indexes/vectorstores/examples/vectara.ipynb --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-24 01:24:58 -07:00
Jason Bosco	9c4b43b494	Add Typesense vector store (#1674 ) Closes #931. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-23 23:20:45 -07:00
Leonid Ganeline	33929489b9	docs: added missed `document_loaders` examples (#5150 ) # DOCS added missed document_loader examples Added missed examples: `JSON`, `Open Document Format (ODT)`, `Wikipedia`, `tomarkdown`. Updated them to a consistent format. ## Who can review? @hwchase17 @dev2049	2023-05-23 21:56:41 -07:00
Daniel Quinteros	c111134a55	Clarification of the reference to the "get_text_length" function in ge… (#5154 ) # Clarification of the reference to the "get_text_length" function in getting_started.md Reference to the function "get_text_length" in the documentation did not make sense. Comment added for clarification. @hwchase17	2023-05-23 20:43:38 -07:00
Daniel Quinteros	de4ef24f75	Docs: updated getting_started.md (#5151 ) # Docs: updated getting_started.md Just accommodating some unnecessary spaces in the example of "pass few shot examples to a prompt template". @vowelparrot	2023-05-23 20:43:26 -07:00
mbchang	b1b7f3541c	fix: fix current_time=Now bug for aadd_documents in TimeWeightedRetriever (#5155 ) # Same as PR #5045, but for async Fixes #4825 I had forgotten to update the asynchronous counterpart `aadd_documents` with the bug fix from PR #5045, so this PR also fixes `aadd_documents` too. ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: @dev2049	2023-05-23 20:31:45 -07:00
Jeremiah Lowin	925dd3e59e	Add async versions of predict() and predict_messages() (#4867 ) # Add async versions of predict() and predict_messages() #4615 introduced a unifying interface for "base" and "chat" LLM models via the new `predict()` and `predict_messages()` methods that allow both types of models to operate on string and message-based inputs, respectively. This PR adds async versions of the same (`apredict()` and `apredict_messages()`) that are identical except for their use of `agenerate()` in place of `generate()`, which means they repurpose all existing work on the async backend. ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: @hwchase17 (follows his work on #4615) @agola11 (async) --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-05-23 17:22:49 -07:00
Junlin Zhou	9242998db1	Empty check before pop (#4929 ) # Check whether 'other' is empty before popping This PR could fix a potential 'popping empty set' error. Co-authored-by: Junlin Zhou <jlzhou@zjuici.com>	2023-05-23 16:46:50 -07:00
Daniel King	de6e6c764e	Add MosaicML inference endpoints (#4607 ) # Add MosaicML inference endpoints This PR adds support in langchain for MosaicML inference endpoints. We both serve a select few open source models, and allow customers to deploy their own models using our inference service. Docs are here (https://docs.mosaicml.com/en/latest/inference.html), and sign up form is here (https://forms.mosaicml.com/demo?utm_source=langchain). I'm not intimately familiar with the details of langchain, or the contribution process, so please let me know if there is anything that needs fixing or this is the wrong way to submit a new integration, thanks! I'm also not sure what the procedure is for integration tests. I have tested locally with my api key. ## Who can review? @hwchase17 --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-05-23 15:59:08 -07:00
Adheeban Manoharan	68f0d45485	Adding Weather Loader (#5056 ) Co-authored-by: Tyler Hutcherson <tyler.hutcherson@redis.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-23 15:57:33 -07:00
Jeff Vestal	0b542a9706	Add ElasticsearchEmbeddings class for generating embeddings using Elasticsearch models (#3401 ) This PR introduces a new module, `elasticsearch_embeddings.py`, which provides a wrapper around Elasticsearch embedding models. The new ElasticsearchEmbeddings class allows users to generate embeddings for documents and query texts using a [model deployed in an Elasticsearch cluster](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-model-ref.html#ml-nlp-model-ref-text-embedding). ### Main features: 1. The ElasticsearchEmbeddings class initializes with an Elasticsearch connection object and a model_id, providing an interface to interact with the Elasticsearch ML client through [infer_trained_model](https://elasticsearch-py.readthedocs.io/en/v8.7.0/api.html?highlight=trained%20model%20infer#elasticsearch.client.MlClient.infer_trained_model) . 2. The `embed_documents()` method generates embeddings for a list of documents, and the `embed_query()` method generates an embedding for a single query text. 3. The class supports custom input text field names in case the deployed model expects a different field name than the default `text_field`. 4. The implementation is compatible with any model deployed in Elasticsearch that generates embeddings as output. ### Benefits: 1. Simplifies the process of generating embeddings using Elasticsearch models. 2. Provides a clean and intuitive interface to interact with the Elasticsearch ML client. 3. Allows users to easily integrate Elasticsearch-generated embeddings. Related issue https://github.com/hwchase17/langchain/issues/3400 --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-23 14:50:33 -07:00
Theodore Rolle	754b5133e9	Improve PlanningOutputParser whitespace handling (#5143 ) Some LLM's will produce numbered lists with leading whitespace, i.e. in response to "What is the sum of 2 and 3?": ``` Plan: 1. Add 2 and 3. 2. Given the above steps taken, please respond to the users original question. ``` This commit updates the PlanningOutputParser regex to ignore leading whitespace before the step number, enabling it to correctly parse this format.	2023-05-23 12:47:26 -07:00
Tommaso De Lorenzo	5002f3ae35	solving #2887 (#5127 ) # Allowing openAI fine-tuned models Very simple fix that checks whether a openAI `model_name` is a fine-tuned model when loading `context_size` and when computing call's cost in the `openai_callback`. Fixes #2887 --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-23 11:18:03 -07:00
Myeongseop Kim	7a75bb2121	docs: fix minor typo + add wikipedia package installation part in human_input_llm.ipynb (#5118 ) # Fix typo + add wikipedia package installation part in human_input_llm.ipynb This PR 1. Fixes a typo ("the the human input LLM"), 2. Adds a wikipedia package installation part (in accordance with the `WikipediaQueryRun` [documentation](https://python.langchain.com/en/latest/modules/agents/tools/examples/wikipedia.html)) in `human_input_llm.ipynb` (`docs/modules/models/llms/examples/human_input_llm.ipynb`)	2023-05-23 10:59:30 -07:00
Davis Chase	753f4cfc26	bump 178 (#5130 )	2023-05-23 07:43:56 -07:00
Ayan Bandyopadhyay	5c87dbf5a8	Add link to Psychic from document loaders documentation page (#5115 ) # Add link to Psychic from document loaders documentation page In my previous PR I forgot to update `document_loaders.rst` to link to `psychic.ipynb` to make it discoverable from the main documentation.	2023-05-23 06:47:23 -07:00
Tian Wei	d7f807b71f	Add AzureCognitiveServicesToolkit to call Azure Cognitive Services API (#5012 ) # Add AzureCognitiveServicesToolkit to call Azure Cognitive Services API: achieve some multimodal capabilities This PR adds a toolkit named AzureCognitiveServicesToolkit which bundles the following tools: - AzureCogsImageAnalysisTool: calls Azure Cognitive Services image analysis API to extract caption, objects, tags, and text from images. - AzureCogsFormRecognizerTool: calls Azure Cognitive Services form recognizer API to extract text, tables, and key-value pairs from documents. - AzureCogsSpeech2TextTool: calls Azure Cognitive Services speech to text API to transcribe speech to text. - AzureCogsText2SpeechTool: calls Azure Cognitive Services text to speech API to synthesize text to speech. This toolkit can be used to process image, document, and audio inputs. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-23 06:45:48 -07:00
Jamie Broomall	d4fd589638	WhyLabs callback (#4906 ) # Add a WhyLabs callback handler * Adds a simple WhyLabsCallbackHandler * Add required dependencies as optional * protect against missing modules with imports * Add docs/ecosystem basic example based on initial prototype from @andrewelizondo > this integration gathers privacy preserving telemetry on text with whylogs and sends statistical profiles to the WhyLabs platform to monitor these metrics over time. For more information on what WhyLabs is see: https://whylabs.ai After you run the notebook (if you have env variables set for the API Keys, org_id and dataset_id) you get something like this in WhyLabs: ![Screenshot (443)](https://github.com/hwchase17/langchain/assets/88007022/6bdb3e1c-4243-4ae8-b974-23a8bb12edac) Co-authored-by: Andre Elizondo <andre@whylabs.ai> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-22 20:29:47 -07:00
Eugene Yurtsev	d56313acba	Improve efficiency of TextSplitter.split_documents, iterate once (#5111 ) # Improve TextSplitter.split_documents, collect page_content and metadata in one iteration ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: @eyurtsev In the case where documents is a generator that can only be iterated once, making this change is a huge help. Otherwise a silent issue happens where metadata is empty for all documents when documents is a generator. So we expand the argument from `List[Document]` to `Union[Iterable[Document], Sequence[Document]]` --------- Co-authored-by: Steven Tartakovsky <tartakovsky.developer@gmail.com>	2023-05-22 23:00:24 -04:00
Jettro Coenradie	b950022894	Fixes issue #5072 - adds additional support to Weaviate (#5085 ) Implementation is similar to search_distance and where_filter # adds 'additional' support to Weaviate queries Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-22 18:57:10 -07:00
Zander Chase	87bba2e8d3	Pass Dataset Name by Name not Position (#5108 ) Pass dataset name by name	2023-05-23 01:21:39 +00:00
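
Note on commit d56313acba (TextSplitter.split_documents, iterate once): the silent failure described in that message can be reproduced with plain Python. The sketch below is an illustration only, not the LangChain implementation. `Doc`, `collect_two_passes`, `collect_one_pass`, and `make_docs` are hypothetical names introduced here, and the two-pass version is an assumption about how such a bug typically arises.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass
class Doc:
    """Hypothetical stand-in for a document; not LangChain's Document class."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def collect_two_passes(docs: Iterable[Doc]) -> Tuple[List[str], List[Dict[str, str]]]:
    # Two separate loops over the same iterable. A generator is exhausted
    # by the first loop, so the second loop silently produces nothing.
    texts = [d.page_content for d in docs]
    metadatas = [d.metadata for d in docs]
    return texts, metadatas


def collect_one_pass(docs: Iterable[Doc]) -> Tuple[List[str], List[Dict[str, str]]]:
    # A single loop gathers both fields, so any iterable works,
    # including one that can only be consumed once.
    texts: List[str] = []
    metadatas: List[Dict[str, str]] = []
    for d in docs:
        texts.append(d.page_content)
        metadatas.append(d.metadata)
    return texts, metadatas


def make_docs() -> Iterator[Doc]:
    # A generator: it can be iterated exactly once.
    yield Doc("first", {"source": "a.txt"})
    yield Doc("second", {"source": "b.txt"})


if __name__ == "__main__":
    print(collect_two_passes(make_docs()))  # metadata list comes back empty
    print(collect_one_pass(make_docs()))    # metadata list is complete
```

With a list as input both functions return the same result. The difference only shows up for one-shot iterables, which is why the failure is easy to miss.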

2168 Commits