langchain

Commit Graph

Author	SHA1	Message	Date
Bagatur	672907bbbb	bump 284 (#10330)	1 year ago
maks-operlejn-ds	f747e76b73	Fixed link to colab notebook (#10320) small fix to anonymizer documentation	1 year ago
maks-operlejn-ds	4cc4534d81	Data deanonymization (#10093) ### Description The feature for pseudonymizing data with ability to retrieve original text (deanonymization) has been implemented. In order to protect private data, such as when querying external APIs (OpenAI), it is worth pseudonymizing sensitive data to maintain full privacy. But then, after the model response, it would be good to have the data in the original form. I implemented the `PresidioReversibleAnonymizer`, which consists of two parts: 1. anonymization - it works the same way as `PresidioAnonymizer`, plus the object itself stores a mapping of made-up values to original ones, for example: ``` { "PERSON": { "<anonymized>": "<original>", "John Doe": "Slim Shady" }, "PHONE_NUMBER": { "111-111-1111": "555-555-5555" } ... } ``` 2. deanonymization - using the mapping described above, it matches fake data with original data and then substitutes it. Between anonymization and deanonymization user can perform different operations, for example, passing the output to LLM. ### Future works - instance anonymization - at this point, each occurrence of PII is treated as a separate entity and separately anonymized. Therefore, two occurrences of the name John Doe in the text will be changed to two different names. It is therefore worth introducing support for full instance detection, so that repeated occurrences are treated as a single object. - better matching and substitution of fake values for real ones - currently the strategy is based on matching full strings and then substituting them. Due to the indeterminism of language models, it may happen that the value in the answer is slightly changed (e.g. John Doe -> John or Main St, New York -> New York) and such a substitution is then no longer possible. Therefore, it is worth adjusting the matching for your needs.
- Q&A with anonymization - when I'm done writing all the functionality, I thought it would be a cool resource in documentation to write a notebook about retrieval from documents using anonymization. An iterative process, adding new recognizers to fit the data, lessons learned and what to look out for ### Twitter handle @deepsense_ai / @MaksOpp --------- Co-authored-by: MaksOpp <maks.operlejn@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Bagatur	67696fe3ba	Add myscale vector sql retriever chain (#10305)	1 year ago
Bagatur	f4f9254dad	Move Myscale SQL vector retrieval nb	1 year ago
刘方瑞	890ed775a3	Resolve: VectorSearch enabled SQLChain? (#10177) Squashed from #7454 with updated features We have separated the `SQLDatabseChain` from `VectorSQLDatabseChain` and put everything into `experimental/`. Below is the original PR message from #7454. ------- We have been working on features to fill up the gap among SQL, vector search and LLM applications. Some inspiring works like self-query retrievers for VectorStores (for example [Weaviate](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/weaviate_self_query.html) and [others](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/self_query.html)) really turn those vector search databases into a powerful knowledge base! 🚀🚀 We are thinking if we can merge all in one, like SQL and vector search and LLMChains, making this SQL vector database memory as the only source of your data. Here are some benefits we can think of for now, maybe you have more 👀: With ALL data you have: since you store all your pasta in the database, you don't need to worry about the foreign keys or links between names from other data source. Flexible data structure: Even if you have changed your schema, for example added a table, the LLM will know how to JOIN those tables and use those as filters. SQL compatibility: We found that vector databases that supports SQL in the marketplace have similar interfaces, which means you can change your backend with no pain, just change the name of the distance function in your DB solution and you are ready to go!
### Issue resolved: - [Feature Proposal: VectorSearch enabled SQLChain?](https://github.com/hwchase17/langchain/issues/5122) ### Change made in this PR: - An improved schema handling that ignore `types.NullType` columns - A SQL output Parser interface in `SQLDatabaseChain` to enable Vector SQL capability and further more - A Retriever based on `SQLDatabaseChain` to retrieve data from the database for RetrievalQAChains and many others - Allow `SQLDatabaseChain` to retrieve data in python native format - Includes PR #6737 - Vector SQL Output Parser for `SQLDatabaseChain` and `SQLDatabaseChainRetriever` - Prompts that can implement text to VectorSQL - Corresponding unit-tests and notebook ### Twitter handle: - @MyScaleDB ### Tag Maintainer: Prompts / General: @hwchase17, @baskaryan DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev ### Dependencies: No dependency added	1 year ago
Bagatur	849e345371	Bagatur/nuclia vector (#10301)	1 year ago
Bagatur	0c760f184c	Update NucliaDB vecstore deps	1 year ago
Eric BREHAULT	19b4ecdc39	Implement NucliaDB vector store (#10236) # Description This pull request allows to use the [NucliaDB](https://docs.nuclia.dev/docs/docs/nucliadb/intro) as a vector store in LangChain. It works with both a [local NucliaDB instance](https://docs.nuclia.dev/docs/docs/nucliadb/deploy/basics) or with [Nuclia Cloud](https://nuclia.cloud). # Dependencies It requires an up-to-date version of the `nuclia` Python package. @rlancemartin, @eyurtsev, @hinthornw, please review it when you have a moment :) Note: our Twitter handler is `@NucliaAI`	1 year ago
cccs-eric	b64a443f72	Fix SQL search_path for Trino query engine (#10248) This PR replaces the generic `SET search_path TO` statement by `USE` for the Trino dialect since Trino does not support `SET search_path`. Official Trino documentation can be found [here](https://trino.io/docs/current/sql/use.html). With this fix, the `SQLdatabase` will now be able to set the current schema and execute queries using the Trino engine. It will use the catalog set as default by the connection uri.	1 year ago
Bagatur	1fb7bdd595	Split sql use case docs (#10257) Split sql use case into directory so we can add other structured data pages	1 year ago
Bagatur	763212eafd	Add use case nb position (#10299)	1 year ago
Ikko Eltociear Ashimine	ea5d29a702	Update amazon_comprehend_chain.ipynb (#10246) Huggingface, HuggingFace -> Hugging Face	1 year ago
Brian Antonelli	4df101cf77	Don't hardcode PGVector distance strategies (#10265) - Description: Remove hardcoded/duplicated distance strategies in the PGVector store. - Issue: NA - Dependencies: NA - Tag maintainer: for a quicker response, tag the relevant maintainer (see below), - Twitter handle: @archmonkeymojo --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
captivus	86cb9da735	Updated Additional Resources section of documentation (#10260) - Description: Updated Additional Resources section of documentation and added to YouTube videos with excellent playlist of Langchain content from Sam Witteveen - Issue: None -- updating documentation - Dependencies: None - Tag maintainer: @baskaryan	1 year ago
JaéGeR	b8669b249e	Added Hugging face inference api (#10280) Embed documents without locally downloading the HF model --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Ilya	6e6f15df24	Add strip text splits flag (#10295) #10085 --------- Co-authored-by: codesee-maps[bot] <86324825+codesee-maps[bot]@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Randy	1690013711	Doc: openai_functions_agent.mdx import (#10282) Fix the import in docmention --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
William FH	13c5951e26	Add LCEL cookbook examples (#10290) 1. For passing config to runnable lambda 2. For branching and merging	1 year ago
ParamdeepSinghShorthillsAI	3cc242b591	Update rwkv.py import error (#10293) I have updated the code to ensure consistent error handling for ImportError. Instead of relying on ValueError as before, I've followed the standard practice of raising ImportError while also including detailed error messages. This modification improves code clarity and explicitly indicates that any issues are related to module imports.	1 year ago
Pihplipe Oegr	bce38b7163	Add notebook example to use sqlite-vss as a vector store. (#10292) Follow-up PR for https://github.com/langchain-ai/langchain/pull/10047, simply adding a notebook quickstart example for the vector store with SQLite, using the class SQLiteVSS. Maintainer tag @baskaryan Co-authored-by: Philippe Oger <philippe.oger@adevinta.com>	1 year ago
Tomaz Bratanic	db73c9d5b5	Diffbot Graph Transformer / Neo4j Graph document ingestion (#9979) Co-authored-by: Bagatur <baskaryan@gmail.com>	1 year ago
Predrag Gruevski	ccb9e3ee2d	Install dev, lint, test, typing extra deps for linting steps. (#10249) `mypy` cannot type-check code that relies on dependencies that aren't installed. Eventually we'll probably want to install as many optional dependencies as possible. However, the full "extended deps" setup for langchain creates a 3GB cache file and takes a while to unpack and install. We'll probably want something a bit more targeted. This is a first step toward something better.	1 year ago
Predrag Gruevski	82d5d4d0ae	Deny creating files as a result of test runs. (#10253) A test file was accidentally dropping a `results.json` file in the current working directory as a result of running `make test`. This is undesirable, since we don't want to risk accidentally adding stray files into the repo if we run tests locally and then do `git add .` without inspecting the file list very closely.	1 year ago
Predrag Gruevski	8d5bf1fb20	Fix langchain lint on `master`. (#10289)	1 year ago
Nik	49341483da	Update Banana.dev docs to latest correct usage (#10183) - Description: this PR updates all Banana.dev-related docs to match the latest client usage. The code in the docs before this PR were out of date and would never run. - Issue: [#6404](https://github.com/langchain-ai/langchain/issues/6404) - Dependencies: - - Tag maintainer: - Twitter handle: [BananaDev_](https://twitter.com/BananaDev_)	1 year ago
Bagatur	9e839d4977	bump 283 (#10287)	1 year ago
William FH	ffca5e7eea	Allow config propagation, Add default lambda name, Improve ergonomics of config passed in (#10273) Makes it easier to do recursion using regular python compositional patterns ```py def lambda_decorator(func): """Decorate function as a RunnableLambda""" return runnable.RunnableLambda(func) @lambda_decorator def fibonacci(a, config: runnable.RunnableConfig) -> int: if a <= 1: return a else: return fibonacci.invoke( a - 1, config ) + fibonacci.invoke(a - 2, config) fibonacci.invoke(10) ``` https://smith.langchain.com/public/cb98edb4-3a09-4798-9c22-a930037faf88/r Also makes it more natural to do things like error handle and call other langchain objects in ways we probably don't want to support in `with_fallbacks()` ```py @lambda_decorator def handle_errors(a, config: runnable.RunnableConfig) -> int: try: return my_chain.invoke(a, config) except MyExceptionType as exc: return my_other_chain.invoke({"original": a, "error": exc}, config) ``` In this case, the next chain takes in the exception object. Maybe this could be something we toggle in `with_fallbacks` but I fear we'll get into uglier APIs + heavier cognitive load if we try to do too much there --------- Co-authored-by: Nuno Campos <nuno@boringbits.io>	1 year ago
Bagatur	c732d8fffd	use case docs reorder (#10074)	1 year ago
Mario Scrocca	334bd8ebbe	Fix bug in SPARQL intent selection (#8521) - Description: Fix bug in SPARQL intent selection - Issue: After the change in #7758 the intent is always set to "UPDATE". Indeed, if the answer to the prompt contains only "SELECT" the `find("SELECT")` operation returns a higher value w.r.t. `-1` returned by `find("UPDATE")`. - Dependencies: None, - Tag maintainer: @baskaryan @aditya-29 - Twitter handle: @mario_scrock	1 year ago
Predrag Gruevski	7fe8bf03a0	Final poetry action fix: manually recreate softlinks broken by caching. (#10250) It seems the caching action was not always correctly recreating softlinks. At first glance, the softlinks it created seemed fine, but they didn't always work. Possibly hitting some kind of underlying bug, but not particularly worth debugging in depth -- we can manually create the soft links we need.	1 year ago
Predrag Gruevski	619516260d	Re-enable poetry binary caching with fix and more logging. (#10244) - Revert "Temporarily disable step that seems to be transiently failing. (#10234)" - Refresh shell hashtable and show poetry/python location and version.	1 year ago
Predrag Gruevski	803be5b986	Run CI when CI infra itself has changed. (#10239) Make sure that changes to CI infrastructure get tested on CI before being merged. Without this PR, changes to the poetry setup action don't trigger a CI run and in principle could break `master` when merged.	1 year ago
olgavrou	514857c10e	Merge pull request #13 from VowpalWabbit/small_dep_fixes fixes	1 year ago
olgavrou	15d33a144d	Merge pull request #14 from VowpalWabbit/notebook_fix Notebook fix	1 year ago
olgavrou	235dacc74a	Merge branch 'langchain-ai:master' into master	1 year ago
Bagatur	c8d7ee62ba	bump 282 (#10233)	1 year ago
Predrag Gruevski	e34ad6fefd	Temporarily disable step that seems to be transiently failing. (#10234)	1 year ago
Nuno Campos	5d8673a3c1	Fix usage of AsyncHtmlLoader with an already running event loop (#10220)	1 year ago
olgavrou	3a4c895280	Merge pull request #11 from VowpalWabbit/add_notebook add random policy and notebook example	1 year ago
vintro	ac2310a405	add NumberedListOutputParser to output_parser init (#10204) `from langchain.output_parsers import NumberedListOutputParser` did not work, needed to add it to the init file	1 year ago
Junlin Zhou	8b95dabfe3	update(llms/TGI): Allow None as temperature value (#10212) Text Generation Inference's client permits the use of a None temperature as seen [here](`033230ae66/clients/python/text_generation/client.py (L71C9-L71C20)`). While I haved dived into TGI's server code and don't know about the implications of using None as a temperature setting, I think we should grant users the option to pass None as a temperature parameter to TGI.	1 year ago
olgavrou	327ea43c67	Empty-Commit	1 year ago
olgavrou	1d4e73b9f8	Merge remote-tracking branch 'origin' into small_dep_fixes	1 year ago
olgavrou	d6320cc2c0	..	1 year ago
olgavrou	7a4387c60d	notebook fix	1 year ago
olgavrou	e1791225ae	Merge remote-tracking branch 'origin' into small_dep_fixes	1 year ago
olgavrou	fdb611cc42	update poetry	1 year ago
olgavrou	8d3a8fbefe	fixes	1 year ago
William FH	be152b6a56	Better ls info (#10202)	1 year ago
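Commit 4cc4534d81 describes reversible anonymization as two steps: while anonymizing, keep a per-entity mapping from made-up values to the originals; afterwards, restore the originals by matching full strings. The Python sketch below illustrates only that bookkeeping, under stated assumptions. PII detection is out of scope (the caller passes `(entity_type, value)` pairs, where the real feature relies on Presidio), and the class name `ReversibleAnonymizerSketch` and the `<PERSON_1>` placeholder format are hypothetical; this is not the `PresidioReversibleAnonymizer` API.

```python
import itertools
from collections import defaultdict


class ReversibleAnonymizerSketch:
    """Hypothetical illustration of reversible pseudonymization bookkeeping.

    Assumption: entities are already detected and supplied by the caller;
    no Presidio or Faker calls are made here.
    """

    def __init__(self) -> None:
        # entity_type -> {fake_value: original_value}, the same shape as the
        # mapping shown in the commit message.
        self.mapping = defaultdict(dict)
        self._counter = itertools.count(1)

    def _fake(self, entity_type: str) -> str:
        # Placeholder fakes like "<PERSON_1>"; a real anonymizer would
        # generate realistic-looking values instead.
        return f"<{entity_type}_{next(self._counter)}>"

    def anonymize(self, text: str, entities: list[tuple[str, str]]) -> str:
        for entity_type, original in entities:
            fake = self._fake(entity_type)
            self.mapping[entity_type][fake] = original
            # Replace only the first remaining occurrence, so each listed
            # occurrence gets its own fake (the per-occurrence behaviour the
            # commit lists under "Future works").
            text = text.replace(original, fake, 1)
        return text

    def deanonymize(self, text: str) -> str:
        # Exact full-string matching: a fake the LLM paraphrased is not restored.
        for fakes in self.mapping.values():
            for fake, original in fakes.items():
                text = text.replace(fake, original)
        return text


anonymizer = ReversibleAnonymizerSketch()
masked = anonymizer.anonymize(
    "Call Slim Shady at 555-555-5555.",
    [("PERSON", "Slim Shady"), ("PHONE_NUMBER", "555-555-5555")],
)
# masked == "Call <PERSON_1> at <PHONE_NUMBER_2>."; this is what an LLM would see.
restored = anonymizer.deanonymize(masked)
# restored == "Call Slim Shady at 555-555-5555."
```

Because matching is by full string, a fake that the model rewrites (for example `<PERSON_1>` shortened to `PERSON_1`) is left untouched, which is the matching limitation the commit calls out.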

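Commit ffca5e7eea shows recursion by decorating a function as a `RunnableLambda` and handing the same `config` to every recursive `invoke`. The sketch below imitates that pattern without LangChain so it runs on its own; it is not LangChain's implementation. `LambdaSketch` is a hypothetical wrapper that forwards `config` only when the wrapped function declares a `config` parameter, and the call counter kept in the config is purely illustrative shared state.

```python
import inspect
from typing import Any, Callable, Optional


class LambdaSketch:
    """Hypothetical RunnableLambda-like wrapper (not LangChain's code)."""

    def __init__(self, func: Callable[..., Any]) -> None:
        self.func = func
        # Check once whether the wrapped function declares a `config` parameter.
        self._wants_config = "config" in inspect.signature(func).parameters

    def invoke(self, value: Any, config: Optional[dict] = None) -> Any:
        # Create a config only when none is supplied; recursive calls pass
        # the existing dict along, so all of them share it.
        if config is None:
            config = {}
        if self._wants_config:
            return self.func(value, config=config)
        return self.func(value)


def lambda_decorator(func: Callable[..., Any]) -> LambdaSketch:
    """Wrap a plain function so it can be called as .invoke(value, config)."""
    return LambdaSketch(func)


@lambda_decorator
def fibonacci(a: int, config: dict) -> int:
    # Illustrative shared state: count every call made during one invoke.
    config["calls"] = config.get("calls", 0) + 1
    if a <= 1:
        return a
    return fibonacci.invoke(a - 1, config) + fibonacci.invoke(a - 2, config)
```

Because every recursive call receives the same dict, `fibonacci.invoke(10, cfg)` returns 55 and leaves `cfg["calls"]` at 177, the number of calls naive recursion makes for n = 10.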
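Commit 334bd8ebbe traces an intent-selection bug to `str.find` returning `-1` for a missing keyword. The code below is a hypothetical reconstruction for illustration, not the code touched by that commit: comparing the two positions directly lets the `-1` of an absent "UPDATE" win over a real "SELECT" position, whereas the corrected function checks for absence before comparing. Raising `ValueError` when neither keyword appears is an assumed policy.

```python
def select_intent_buggy(answer: str) -> str:
    # Hypothetical flawed logic: "the keyword that appears first wins".
    # str.find returns -1 when a keyword is absent, so an answer containing
    # only "SELECT" compares e.g. 0 < -1, which is False, and yields "UPDATE".
    return "SELECT" if answer.find("SELECT") < answer.find("UPDATE") else "UPDATE"


def select_intent(answer: str) -> str:
    select_pos = answer.find("SELECT")
    update_pos = answer.find("UPDATE")
    # Handle the -1 sentinel explicitly before comparing positions.
    if select_pos == -1 and update_pos == -1:
        raise ValueError("answer mentions neither SELECT nor UPDATE")
    if update_pos == -1:
        return "SELECT"
    if select_pos == -1:
        return "UPDATE"
    # Both keywords present: keep the one mentioned first.
    return "SELECT" if select_pos < update_pos else "UPDATE"
```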

4524 Commits (534f1b63c5a6341c358ef4135ed15e865fe518e3)