langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-08 07:10:35 +00:00

Author	SHA1	Message	Date
Leonid Ganeline	1b3ea1eeb4	docstrings: `chat_loaders` (#10307 ) Updated docstrings. Made them consistent across the module.	2023-09-07 19:35:34 -07:00
Bagatur	8826293c88	Add multilingual data anon chain (#10346 )	2023-09-07 15:15:08 -07:00
Greg Richardson	300559695b	Supabase vector self querying retriever (#10304 ) ## Description Adds Supabase Vector as a self-querying retriever. - Designed to be backwards compatible with existing `filter` logic on `SupabaseVectorStore`. - Adds new filter `postgrest_filter` to `SupabaseVectorStore` `similarity_search()` methods - Supports entire PostgREST [filter query language](https://postgrest.org/en/stable/references/api/tables_views.html#read) (used by self-querying retriever, but also works as an escape hatch for more query control) - `SupabaseVectorTranslator` converts Langchain filter into the above PostgREST query - Adds Jupyter Notebook for the self-querying retriever - Adds tests ## Tag maintainer @hwchase17 ## Twitter handle [@ggrdson](https://twitter.com/ggrdson)	2023-09-07 15:03:26 -07:00
Tze Min	20c742d8a2	Enhancement: add parameter boto3_session for AWS DynamoDB cross account use cases (#10326 ) - Description: to allow boto3 assume role for AWS cross account use cases to read and update the chat history, - Issue: use case I faced in my company, - Dependencies: no - Tag maintainer: @baskaryan , - Twitter handle: @tmin97 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-07 14:58:28 -07:00
kcocco	b1d40b8626	Fix colab link(missing graph in url) and comment to match the code fo… (#10344 ) - Description: Fixing Colab broken link and comment correction to align with the code that uses Warren Buffet for wiki query - Issue: None open - Dependencies: none - Tag maintainer: n/a - Twitter handle: Not a PR change but: kcocco	2023-09-07 14:57:27 -07:00
Bagatur	49e0c83126	Split LCEL cookbook (#10342 )	2023-09-07 14:56:38 -07:00
Bagatur	41a2548611	Fix presidio docs Colab links	2023-09-07 14:47:09 -07:00
Bagatur	1d2b6c3c67	Reorganize presidio anonymization docs	2023-09-07 14:45:07 -07:00
maks-operlejn-ds	274c3dc3a8	Multilingual anonymization (#10327 ) ### Description Add multiple language support to Anonymizer PII detection in Microsoft Presidio relies on several components - in addition to the usual pattern matching (e.g. using regex), the analyser uses a model for Named Entity Recognition (NER) to extract entities such as: - `PERSON` - `LOCATION` - `DATE_TIME` - `NRP` - `ORGANIZATION` [[Source]](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/predefined_recognizers/spacy_recognizer.py) To handle NER in specific languages, we utilize unique models from the `spaCy` library, recognized for its extensive selection covering multiple languages and sizes. However, it's not restrictive, allowing for integration of alternative frameworks such as [Stanza](https://microsoft.github.io/presidio/analyzer/nlp_engines/spacy_stanza/) or [transformers](https://microsoft.github.io/presidio/analyzer/nlp_engines/transformers/) when necessary. ### Future works - automatic language detection - instead of passing the language as a parameter in `anonymizer.anonymize`, we could detect the language/s beforehand and then use the corresponding NER model. We have discussed this internally and @mateusz-wosinski-ds will look into a standalone language detection tool/chain for LangChain 😄 ### Twitter handle @deepsense_ai / @MaksOpp ### Tag maintainer @baskaryan @hwchase17 @hinthornw	2023-09-07 14:42:24 -07:00
mateusz.wosinski	f23fed34e8	Added TYPE_CHECKING	2023-09-07 20:00:04 +02:00
mateusz.wosinski	ff1c6de86c	TYPE_CHECKING added	2023-09-07 19:56:53 +02:00
mateusz.wosinski	868db99b17	Merge branch 'master' into deepsense/text-to-speech	2023-09-07 19:43:03 +02:00
Ofer Mendelevitch	a9eb7c6cfc	Adding Self-querying for Vectara (#10332 ) - Description: Adding support for self-querying to Vectara integration - Issue: per customer request - Tag maintainer: @rlancemartin @baskaryan - Twitter handle: @ofermend Also updated some documentation, added self-query testing, and a demo notebook with self-query example.	2023-09-07 10:24:50 -07:00
Bagatur	25ec655e4f	supabase embedding usage fix (#10335 ) Should be calling Embeddings.embed_query instead of embed_documents when searching	2023-09-07 10:04:49 -07:00
Bagatur	f0ccce76fe	nuclia db nit (#10334 )	2023-09-07 09:48:56 -07:00
Bagatur	205f406485	nuclia nb nit (#10331 )	2023-09-07 08:49:33 -07:00
Bagatur	672907bbbb	bump 284 (#10330 )	2023-09-07 08:45:42 -07:00
maks-operlejn-ds	f747e76b73	Fixed link to colab notebook (#10320 ) small fix to anonymizer documentation	2023-09-07 08:42:04 -07:00
maks-operlejn-ds	4cc4534d81	Data deanonymization (#10093 ) ### Description The feature for pseudonymizing data with ability to retrieve original text (deanonymization) has been implemented. In order to protect private data, such as when querying external APIs (OpenAI), it is worth pseudonymizing sensitive data to maintain full privacy. But then, after the model response, it would be good to have the data in the original form. I implemented the `PresidioReversibleAnonymizer`, which consists of two parts: 1. anonymization - it works the same way as `PresidioAnonymizer`, plus the object itself stores a mapping of made-up values to original ones, for example: ``` { "PERSON": { "<anonymized>": "<original>", "John Doe": "Slim Shady" }, "PHONE_NUMBER": { "111-111-1111": "555-555-5555" } ... } ``` 2. deanonymization - using the mapping described above, it matches fake data with original data and then substitutes it. Between anonymization and deanonymization user can perform different operations, for example, passing the output to LLM. ### Future works - instance anonymization - at this point, each occurrence of PII is treated as a separate entity and separately anonymized. Therefore, two occurrences of the name John Doe in the text will be changed to two different names. It is therefore worth introducing support for full instance detection, so that repeated occurrences are treated as a single object. - better matching and substitution of fake values for real ones - currently the strategy is based on matching full strings and then substituting them. Due to the indeterminism of language models, it may happen that the value in the answer is slightly changed (e.g. John Doe -> John or Main St, New York -> New York) and such a substitution is then no longer possible. Therefore, it is worth adjusting the matching for your needs. 
- Q&A with anonymization - when I'm done writing all the functionality, I thought it would be a cool resource in documentation to write a notebook about retrieval from documents using anonymization. An iterative process, adding new recognizers to fit the data, lessons learned and what to look out for ### Twitter handle @deepsense_ai / @MaksOpp --------- Co-authored-by: MaksOpp <maks.operlejn@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-06 21:33:24 -07:00
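The deanonymization step described in this commit (full-string matching against the stored mapping) can be illustrated with a minimal sketch. This is not the `PresidioReversibleAnonymizer` implementation; the `deanonymize` helper below is hypothetical and only assumes the nested mapping layout quoted in the commit message.

```python
from typing import Dict

# Entity type -> {fake value: original value}, as in the commit message.
Mapping = Dict[str, Dict[str, str]]


def deanonymize(text: str, mapping: Mapping) -> str:
    """Replace every fake value found in `text` with its original value.

    Hypothetical helper: uses exact substring matching only.
    """
    for fake_to_original in mapping.values():
        for fake, original in fake_to_original.items():
            text = text.replace(fake, original)
    return text


mapping = {
    "PERSON": {"John Doe": "Slim Shady"},
    "PHONE_NUMBER": {"111-111-1111": "555-555-5555"},
}

restored = deanonymize("John Doe can be reached at 111-111-1111.", mapping)

# A shortened name is not an exact match, so it is left untouched.
partial = deanonymize("John will call back.", mapping)
```

The second call shows the limitation listed under "Future works": if the model shortens `John Doe` to `John`, exact matching no longer finds anything to substitute.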
Bagatur	67696fe3ba	Add myscale vector sql retriever chain (#10305 )	2023-09-06 17:30:58 -07:00
Bagatur	f4f9254dad	Move Myscale SQL vector retrieval nb	2023-09-06 17:09:40 -07:00
刘方瑞	890ed775a3	Resolve: VectorSearch enabled SQLChain? (#10177) Squashed from #7454 with updated features We have separated the `SQLDatabaseChain` from `VectorSQLDatabaseChain` and put everything into `experimental/`. Below is the original PR message from #7454. ------- We have been working on features to bridge the gap between SQL, vector search and LLM applications. Some inspiring works like self-query retrievers for VectorStores (for example [Weaviate](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/weaviate_self_query.html) and [others](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/self_query.html)) really turn those vector search databases into a powerful knowledge base! 🚀🚀 We are wondering whether we can merge it all into one, combining SQL, vector search and LLMChains, and making this SQL vector database memory the only source of your data. Here are some benefits we can think of for now, maybe you have more 👀: With ALL data you have: since you store all your data in the database, you don't need to worry about the foreign keys or links between names from other data sources. Flexible data structure: Even if you have changed your schema, for example added a table, the LLM will know how to JOIN those tables and use those as filters. SQL compatibility: We found that vector databases that support SQL in the marketplace have similar interfaces, which means you can change your backend with no pain, just change the name of the distance function in your DB solution and you are ready to go! 
### Issue resolved: - [Feature Proposal: VectorSearch enabled SQLChain?](https://github.com/hwchase17/langchain/issues/5122) ### Change made in this PR: - An improved schema handling that ignore `types.NullType` columns - A SQL output Parser interface in `SQLDatabaseChain` to enable Vector SQL capability and further more - A Retriever based on `SQLDatabaseChain` to retrieve data from the database for RetrievalQAChains and many others - Allow `SQLDatabaseChain` to retrieve data in python native format - Includes PR #6737 - Vector SQL Output Parser for `SQLDatabaseChain` and `SQLDatabaseChainRetriever` - Prompts that can implement text to VectorSQL - Corresponding unit-tests and notebook ### Twitter handle: - @MyScaleDB ### Tag Maintainer: Prompts / General: @hwchase17, @baskaryan DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev ### Dependencies: No dependency added	2023-09-06 17:08:12 -07:00
Bagatur	849e345371	Bagatur/nuclia vector (#10301 )	2023-09-06 16:40:47 -07:00
Bagatur	0c760f184c	Update NucliaDB vecstore deps	2023-09-06 16:29:10 -07:00
Eric BREHAULT	19b4ecdc39	Implement NucliaDB vector store (#10236) # Description This pull request allows [NucliaDB](https://docs.nuclia.dev/docs/docs/nucliadb/intro) to be used as a vector store in LangChain. It works with either a [local NucliaDB instance](https://docs.nuclia.dev/docs/docs/nucliadb/deploy/basics) or [Nuclia Cloud](https://nuclia.cloud). # Dependencies It requires an up-to-date version of the `nuclia` Python package. @rlancemartin, @eyurtsev, @hinthornw, please review it when you have a moment :) Note: our Twitter handle is `@NucliaAI`	2023-09-06 16:26:14 -07:00
cccs-eric	b64a443f72	Fix SQL search_path for Trino query engine (#10248 ) This PR replaces the generic `SET search_path TO` statement by `USE` for the Trino dialect since Trino does not support `SET search_path`. Official Trino documentation can be found [here](https://trino.io/docs/current/sql/use.html). With this fix, the `SQLdatabase` will now be able to set the current schema and execute queries using the Trino engine. It will use the catalog set as default by the connection uri.	2023-09-06 16:19:37 -07:00
Bagatur	1fb7bdd595	Split sql use case docs (#10257 ) Split sql use case into directory so we can add other structured data pages	2023-09-06 16:19:21 -07:00
Bagatur	763212eafd	Add use case nb position (#10299 )	2023-09-06 15:46:33 -07:00
Ikko Eltociear Ashimine	ea5d29a702	Update amazon_comprehend_chain.ipynb (#10246 ) Huggingface, HuggingFace -> Hugging Face	2023-09-06 15:38:37 -07:00
Brian Antonelli	4df101cf77	Don't hardcode PGVector distance strategies (#10265 ) - Description: Remove hardcoded/duplicated distance strategies in the PGVector store. - Issue: NA - Dependencies: NA - Tag maintainer: for a quicker response, tag the relevant maintainer (see below), - Twitter handle: @archmonkeymojo --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-06 15:20:44 -07:00
captivus	86cb9da735	Updated Additional Resources section of documentation (#10260 ) - Description: Updated Additional Resources section of documentation and added to YouTube videos with excellent playlist of Langchain content from Sam Witteveen - Issue: None -- updating documentation - Dependencies: None - Tag maintainer: @baskaryan	2023-09-06 15:10:43 -07:00
JaéGeR	b8669b249e	Added Hugging face inference api (#10280 ) Embed documents without locally downloading the HF model --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-06 14:55:48 -07:00
Ilya	6e6f15df24	Add strip text splits flag (#10295 ) #10085 --------- Co-authored-by: codesee-maps[bot] <86324825+codesee-maps[bot]@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-06 14:06:12 -07:00
Randy	1690013711	Doc: openai_functions_agent.mdx import (#10282) Fix the import in documentation --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-06 14:00:39 -07:00
William FH	13c5951e26	Add LCEL cookbook examples (#10290 ) 1. For passing config to runnable lambda 2. For branching and merging	2023-09-06 13:50:43 -07:00
ParamdeepSinghShorthillsAI	3cc242b591	Update rwkv.py import error (#10293 ) I have updated the code to ensure consistent error handling for ImportError. Instead of relying on ValueError as before, I've followed the standard practice of raising ImportError while also including detailed error messages. This modification improves code clarity and explicitly indicates that any issues are related to module imports.	2023-09-06 13:50:21 -07:00
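The error-handling pattern this commit describes (re-raising `ImportError` with a descriptive message instead of `ValueError`) might look like the sketch below. It is not the code from `rwkv.py`; `import_optional` and its parameters are hypothetical names used for illustration.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, pip_name: str) -> ModuleType:
    """Import an optional dependency, or raise ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Keep the exception type as ImportError so callers can tell a
        # missing dependency apart from a bad argument (ValueError).
        raise ImportError(
            f"Could not import the `{module_name}` package. "
            f"Please install it with `pip install {pip_name}`."
        ) from exc


# Standard-library modules import normally.
json_module = import_optional("json", "json")
```

Chaining with `from exc` keeps the original traceback available for debugging.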
Pihplipe Oegr	bce38b7163	Add notebook example to use sqlite-vss as a vector store. (#10292 ) Follow-up PR for https://github.com/langchain-ai/langchain/pull/10047, simply adding a notebook quickstart example for the vector store with SQLite, using the class SQLiteVSS. Maintainer tag @baskaryan Co-authored-by: Philippe Oger <philippe.oger@adevinta.com>	2023-09-06 13:46:59 -07:00
Tomaz Bratanic	db73c9d5b5	Diffbot Graph Transformer / Neo4j Graph document ingestion (#9979 ) Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-06 13:32:59 -07:00
Predrag Gruevski	ccb9e3ee2d	Install dev, lint, test, typing extra deps for linting steps. (#10249 ) `mypy` cannot type-check code that relies on dependencies that aren't installed. Eventually we'll probably want to install as many optional dependencies as possible. However, the full "extended deps" setup for langchain creates a 3GB cache file and takes a while to unpack and install. We'll probably want something a bit more targeted. This is a first step toward something better.	2023-09-06 11:15:28 -04:00
Predrag Gruevski	82d5d4d0ae	Deny creating files as a result of test runs. (#10253 ) A test file was accidentally dropping a `results.json` file in the current working directory as a result of running `make test`. This is undesirable, since we don't want to risk accidentally adding stray files into the repo if we run tests locally and then do `git add .` without inspecting the file list very closely.	2023-09-06 11:15:16 -04:00
Predrag Gruevski	8d5bf1fb20	Fix langchain lint on `master`. (#10289 )	2023-09-06 16:01:13 +01:00
Nik	49341483da	Update Banana.dev docs to latest correct usage (#10183) - Description: this PR updates all Banana.dev-related docs to match the latest client usage. The code in the docs before this PR was out of date and would never run. - Issue: [#6404](https://github.com/langchain-ai/langchain/issues/6404) - Dependencies: - - Tag maintainer: - Twitter handle: [BananaDev_](https://twitter.com/BananaDev_)	2023-09-06 07:46:17 -07:00
Bagatur	9e839d4977	bump 283 (#10287 )	2023-09-06 07:33:03 -07:00
William FH	ffca5e7eea	Allow config propagation, Add default lambda name, Improve ergonomics of config passed in (#10273) Makes it easier to do recursion using regular python compositional patterns
```py
def lambda_decorator(func):
    """Decorate function as a RunnableLambda"""
    return runnable.RunnableLambda(func)

@lambda_decorator
def fibonacci(a, config: runnable.RunnableConfig) -> int:
    if a <= 1:
        return a
    else:
        return fibonacci.invoke(
            a - 1, config
        ) + fibonacci.invoke(a - 2, config)

fibonacci.invoke(10)
```
https://smith.langchain.com/public/cb98edb4-3a09-4798-9c22-a930037faf88/r Also makes it more natural to do things like error handle and call other langchain objects in ways we probably don't want to support in `with_fallbacks()`
```py
@lambda_decorator
def handle_errors(a, config: runnable.RunnableConfig) -> int:
    try:
        return my_chain.invoke(a, config)
    except MyExceptionType as exc:
        return my_other_chain.invoke({"original": a, "error": exc}, config)
```
In this case, the next chain takes in the exception object. Maybe this could be something we toggle in `with_fallbacks` but I fear we'll get into uglier APIs + heavier cognitive load if we try to do too much there --------- Co-authored-by: Nuno Campos <nuno@boringbits.io>	2023-09-06 05:54:38 -07:00
mateusz.wosinski	7b7bea5424	Fix linters, update notebook	2023-09-06 10:22:42 +02:00
Bagatur	c732d8fffd	use case docs reorder (#10074 )	2023-09-05 15:11:16 -07:00
Mario Scrocca	334bd8ebbe	Fix bug in SPARQL intent selection (#8521 ) - Description: Fix bug in SPARQL intent selection - Issue: After the change in #7758 the intent is always set to "UPDATE". Indeed, if the answer to the prompt contains only "SELECT" the `find("SELECT")` operation returns a higher value w.r.t. `-1` returned by `find("UPDATE")`. - Dependencies: None, - Tag maintainer: @baskaryan @aditya-29 - Twitter handle: @mario_scrock	2023-09-05 14:37:02 -07:00
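The bug described in this commit comes from how `str.find` reports a missing substring: it returns `-1`, which compares lower than any real position. The sketch below reproduces that pitfall and one possible fix. Both functions are hypothetical illustrations, not the code from the SPARQL chain, and choosing the earliest keyword when both are present is an assumption.

```python
def pick_intent_buggy(answer: str) -> str:
    """Pick the intent whose keyword appears first (buggy version)."""
    select_pos = answer.find("SELECT")
    update_pos = answer.find("UPDATE")
    # Bug: a missing keyword yields -1, which "wins" the comparison.
    return "SELECT" if select_pos < update_pos else "UPDATE"


def pick_intent_fixed(answer: str) -> str:
    """Pick the intent whose keyword appears first, ignoring missing keywords."""
    positions = {kw: answer.find(kw) for kw in ("SELECT", "UPDATE")}
    found = {kw: pos for kw, pos in positions.items() if pos != -1}
    if not found:
        raise ValueError("Answer contains neither SELECT nor UPDATE.")
    # Earliest occurrence among the keywords that are actually present.
    return min(found, key=found.get)


buggy = pick_intent_buggy("SELECT")
fixed = pick_intent_fixed("SELECT")
```

With the answer `"SELECT"`, `find("SELECT")` is `0` and `find("UPDATE")` is `-1`, so the buggy comparison `0 < -1` is false and the intent falls through to `UPDATE`.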
Predrag Gruevski	7fe8bf03a0	Final poetry action fix: manually recreate softlinks broken by caching. (#10250 ) It seems the caching action was not always correctly recreating softlinks. At first glance, the softlinks it created seemed fine, but they didn't always work. Possibly hitting some kind of underlying bug, but not particularly worth debugging in depth -- we can manually create the soft links we need.	2023-09-05 15:47:58 -04:00
Predrag Gruevski	619516260d	Re-enable poetry binary caching with fix and more logging. (#10244 ) - Revert "Temporarily disable step that seems to be transiently failing. (#10234)" - Refresh shell hashtable and show poetry/python location and version.	2023-09-05 14:03:03 -04:00
Predrag Gruevski	803be5b986	Run CI when CI infra itself has changed. (#10239 ) Make sure that changes to CI infrastructure get tested on CI before being merged. Without this PR, changes to the poetry setup action don't trigger a CI run and in principle could break `master` when merged.	2023-09-05 13:08:19 -04:00

4623 Commits