langchain

Commit Graph

Author	SHA1	Message	Date
blob42	f2fef44fef	docker wrapper tool for untrusted execution	1 year ago
Harrison Chase	d19f269a34	bump version to 0.0.95 (#1324 )	1 year ago
Harrison Chase	88e862e862	Harrison/indexing pipeline (#1317 )	1 year ago
Harrison Chase	62749f02c9	Harrison/deeplake (#1316 ) Co-authored-by: Davit Buniatyan <d@activeloop.ai>	1 year ago
Harrison Chase	fc581717b5	Harrison/atlas db (#1315 ) Co-authored-by: Brandon Duderstadt <brandonduderstadt@gmail.com>	1 year ago
Marc Puig	8f74a91ca4	Making it possible to use "certainty" as a parameter for the weaviate similarity_search (#1218 ) Checking if weaviate similarity_search kwargs contains "certainty" and use it accordingly. The minimal level of certainty must be a float, and it is computed by normalized distance.	1 year ago
Alexander Hoyle	385d88f646	Avoid IntegrityError for SQLiteCache updates (#1286 ) While using a `SQLiteCache`, if there are duplicate `(prompt, llm, idx)` tuples passed to [`update_cache()`](`c5dd491a21/langchain/llms/base.py (L39)`), then an `IntegrityError` is thrown. This can happen when there are duplicated prompts within the same batch. This PR changes the SQLAlchemy `session.add()` to a `session.merge()` in `cache.py`, [following the solution from this SO thread](https://stackoverflow.com/questions/10322514/dealing-with-duplicate-primary-keys-on-insert-in-sqlalchemy-declarative-style). I believe this fixes #983, but not entirely sure since that also involves async Here's a minimal example of the error: ```python from pathlib import Path import langchain from langchain.cache import SQLiteCache llm = langchain.OpenAI(model_name="text-ada-001", openai_api_key=Path("/.openai_api_key").read_text().strip()) langchain.llm_cache = SQLiteCache("test_cache.db") llm.generate(['a'] * 5) ``` ``` > IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: full_llm_cache.prompt, full_llm_cache.llm, full_llm_cache.idx [SQL: INSERT INTO full_llm_cache (prompt, llm, idx, response) VALUES (?, ?, ?, ?)] [parameters: ('a', "[('_type', 'openai'), ('best_of', 1), ('frequency_penalty', 0), ('logit_bias', {}), ('max_tokens', 256), ('model_name', 'text-ada-001'), ('n', 1), ('presence_penalty', 0), ('request_timeout', None), ('stop', None), ('temperature', 0.7), ('top_p', 1)]", 0, '\n\nA is for air.\n\nA is for atmosphere.')] (Background on this error at: https://sqlalche.me/e/14/gkpj) ``` After the change, we now have the following ```python class Output: def __init__(self, text): self.text = text # make dummy data cache = SQLiteCache("test_cache_2.db") cache.update(prompt="prompt_0", llm_string="llm_0", return_val=[Output("text_0")]) cache.engine.execute("SELECT * FROM full_llm_cache").fetchall() # output > [('prompt_0', 'llm_0', 0, 'text_0')] ``` ```python # update data, before change this would have thrown an `IntegrityError` cache.update(prompt="prompt_0", llm_string="llm_0", return_val=[Output("text_0_new")]) cache.engine.execute("SELECT * FROM full_llm_cache").fetchall() # output > [('prompt_0', 'llm_0', 0, 'text_0_new')] ```	1 year ago
Harrison Chase	0f6ce902bb	Harrison/banana fix (#1311 ) Co-authored-by: Erik Dunteman <44653944+erik-dunteman@users.noreply.github.com>	1 year ago
Ingo Kleiber	9159e3ec56	add CoNLL-U document loader (#1297 ) I've added a simple [CoNLL-U](https://universaldependencies.org/format.html) document loader. CoNLL-U is a common format for NLP tasks and is used, for example, in the Universal Dependencies treebank corpora. The loader reads a single file in standard CoNLL-U format and returns a document.	1 year ago
Harrison Chase	5ba1c7b601	ruff ruff (#1203 )	1 year ago
Harrison Chase	429af93cab	fix imports (#1288 )	1 year ago
Matt Robinson	3830d900bf	feat: document loader for MS Word documents (#1282 ) ### Summary Adds a document loader for MS Word Documents. Works with both `.docx` and `.doc` files as longer as the user has installed `unstructured>=0.4.11`. ### Testing The follow workflow test the loader for both `.doc` and `.docx` files using example docs from the `unstructured` repo. #### `.docx` ```python from langchain.document_loaders import UnstructuredWordDocumentLoader filename = "../unstructured/example-docs/fake.docx" loader = UnstructuredWordDocumentLoader(filename) loader.load() ``` #### `.doc` ```python from langchain.document_loaders import UnstructuredWordDocumentLoader filename = "../unstructured/example-docs/fake.doc" loader = UnstructuredWordDocumentLoader(filename) loader.load() ```	1 year ago
Harrison Chase	cc10f1fe9c	cleanup (#1274 )	1 year ago
Harrison Chase	cf8cb58b3a	Harrison/cohere params (#1278 ) Co-authored-by: Stefano Faraggi <40745694+stepp1@users.noreply.github.com>	1 year ago
Harrison Chase	42ddf44f86	Harrison/logprobs (#1279 ) Co-authored-by: Prateek Shah <97124740+prateekspanning@users.noreply.github.com>	1 year ago
Harrison Chase	b592b4d3f6	Harrison/fb loader (#1277 ) Co-authored-by: Vairo Di Pasquale <vairo.dp@gmail.com>	1 year ago
Harrison Chase	91584382e3	Harrison/errors (#1276 ) Co-authored-by: Kevin Huo <5000881+kwhuo68@users.noreply.github.com>	1 year ago
Klein Tahiraj	24f02aa39a	adding .ipynb loader and documentation Fixes #1248 (#1252 ) `NotebookLoader.load()` loads the `.ipynb` notebook file into a `Document` object. Parameters: * `include_outputs` (bool): whether to include cell outputs in the resulting document (default is False). * `max_output_length` (int): the maximum number of characters to include from each cell output (default is 10). * `remove_newline` (bool): whether to remove newline characters from the cell sources and outputs (default is False). * `traceback` (bool): whether to include full traceback (default is False).	1 year ago
Harrison Chase	6cd37e308f	Harrison/source docs (#1275 ) Co-authored-by: Tushar Dhadiwal <tushardhadiwal@users.noreply.github.com>	1 year ago
Enrico Shippole	4dbea7e9e4	Add Writer, Banana, Modal, StochasticAI (#1270 ) Add LLM wrappers and examples for Banana, Writer, Modal, Stochastic AI Added rigid json format for Banana and Modal	1 year ago
blob42	04bdd22b9f	searx: add `query_suffix` parameter (#1259 ) - allows to build tools and dynamically inject extra searxh suffix in the query. example: `search.run("python library", query_suffix="site:github.com")` resulting query: `python library site:github.com` Co-authored-by: blob42 <spike@w530>	1 year ago
Harrison Chase	df37dd1be4	fix bug with length function (#1257 )	1 year ago
Iskren Ivov Chernev	2ab4def6d3	Add DeepInfra LLM support (#1232 ) DeepInfra is an Inference-as-a-Service provider. Add a simple wrapper using HTTPS requests.	1 year ago
Satoru Sakamoto	b248037053	fix to specific language transcript (#1231 ) Currently youtube loader only seems to support English audio. Changed to load videos in the specified language.	1 year ago
Harrison Chase	fcfb409dd3	add ifttt tool (#1244 )	1 year ago
Jon Luo	aec2bb84a8	Don't instruct LLM to use the LIMIT clause, which is incompatible with SQL Server (#1242 ) The current prompt specifically instructs the LLM to use the `LIMIT` clause. This will cause issues with MS SQL Server, which uses `SELECT TOP` instead of `LIMIT`. The generated SQL will use `LIMIT`; the instruction to "always limit... using the LIMIT clause" seems to override the "create a syntactically correct mssql query to run" portion. Reported here: https://github.com/hwchase17/langchain/issues/1103#issuecomment-1441144224 I don't have access to a SQL Server instance to test, but removing that part of the prompt in OpenAI Playground results in the correct `SELECT TOP` syntax, whereas keeping it in results in the `LIMIT` clause, even when instructing it to generate syntactically correct mssql. It's also still correctly using `LIMIT` in my MariaDB database. I think in this case we can assume that the model will select the appropriate method based on the dialect specified. In general, it would be nice to be able to test a suite of SQL dialects for things like dialect-specific syntax and other issues we've run into in the past, but I'm not quite sure how to best approach that yet.	1 year ago
Dennis Antela Martinez	985f36eb3f	add aleph alpha llm (#1207 ) Integrate Aleph Alpha's client into Langchain to provide access to the luminous models - more info on latest benchmarks here: https://www.aleph-alpha.com/luminous-performance-benchmarks	1 year ago
Klein Tahiraj	ebb9e4087c	Fixing typo in loading.py (#1235 ) Just fixing a typo I found in loading.py	1 year ago
Jon Luo	8cad8c34cb	fix sqlite internal tables breaking table_info (#1224 ) With the current method used to get the SQL table info, sqlite internal schema tables are being included and are not being handled correctly by sqlalchemy because the columns have no types. This is easy to see with the Chinook database: ```python db = SQLDatabase.from_uri("sqlite:///Chinook.db") print(db.table_info) ``` ```python ... sqlalchemy.exc.CompileError: (in table 'sqlite_sequence', column 'name'): Can't generate DDL for NullType(); did you forget to specify a type on this Column? ``` SQLAlchemy 2.0 [ignores these by default](`63d90b0f44/lib/sqlalchemy/dialects/sqlite/base.py (L856-L880)`): `63d90b0f44/lib/sqlalchemy/dialects/sqlite/base.py (L2096-L2123)`	1 year ago
djacobs7	043ce02906	Fix typo in constitutional_ai base.py (#1216 ) Found a typo in the documentation code for the constitutional_ai module	1 year ago
blob42	ffeb00c82b	searx: remove duplicate param (#1219 ) Co-authored-by: blob42 <spike@w530>	1 year ago
Harrison Chase	f3c92172f2	Harrison/unstructured io (#1200 )	1 year ago
Harrison Chase	df84f69b6c	Harrison/updating docs (#1196 )	1 year ago
Harrison Chase	5f3550437f	rfc: callback changes (#1165 ) conceptually, no reason a tool should know what an "agent action" is unless any objections, can change in all callback handlers	1 year ago
Harrison Chase	0008c431fc	catch networkx error (#1201 )	1 year ago
Harrison Chase	529df2a2dd	move serpapi wrapper (#1199 ) Co-authored-by: Tim Asp <707699+timothyasp@users.noreply.github.com>	1 year ago
Konstantin Hebenstreit	bfe6038c3b	HuggingFaceEndpoint: Correct Example for ImportError (#1176 ) When I try to import the Class HuggingFaceEndpoint I get an Import Error: cannot import name 'HuggingFaceEndpoint' from 'langchain'. (langchain version 0.0.88) These two imports work fine: from langchain import HuggingFacePipeline and from langchain import HuggingFaceHub. So I corrected the import statement in the example. There is probably a better solution to this, but this fixes the Error for me.	1 year ago
Harrison Chase	8cc3fd424f	Harrison/add documents (#1197 ) Co-authored-by: OmriNach <32659330+OmriNach@users.noreply.github.com>	1 year ago
Francisco Ingham	f5c83eaef4	added ability to override default verbose and memory when load chain … (#1153 ) It is useful to be able to specify `verbose` or `memory` while still keeping the chain's overall structure. --------- Co-authored-by: Francisco Ingham <>	1 year ago
Anton Troynikov	28ffe63136	Default Chroma collection name (#1198 ) For persistence, it's convenient to have a default collection name which gets used everywhere.	1 year ago
Dennis Antela Martinez	1053c94f17	add gitbook document loader (#1180 ) Added a GitBook document loader. It lets you both, (1) fetch text from any single GitBook page, or (2) fetch all relative paths and return their respective content in Documents. I've modified the `scrape` method in the `WebBaseLoader` to accept custom web paths if given, but happy to remove it and move that logic into the `GitbookLoader` itself.	1 year ago
William FH	fb3c992749	Add a StdIn "Interaction" Tool (#1193 ) Lets a chain prompt the user for more input as a part of its execution.	1 year ago
Naveen Tatikonda	8e2152c1d6	Add Support for OpenSearch Vector database (#1191 ) ### Description This PR adds a wrapper which adds support for the OpenSearch vector database. Using opensearch-py client we are ingesting the embeddings of given text into opensearch cluster using Bulk API. We can perform the `similarity_search` on the index using the 3 popular searching methods of OpenSearch k-NN plugin: - `Approximate k-NN Search` use approximate nearest neighbor (ANN) algorithms from the [nmslib](https://github.com/nmslib/nmslib), [faiss](https://github.com/facebookresearch/faiss), and [Lucene](https://lucene.apache.org/) libraries to power k-NN search. - `Script Scoring` extends OpenSearch’s script scoring functionality to execute a brute force, exact k-NN search. - `Painless Scripting` adds the distance functions as painless extensions that can be used in more complex combinations. Also, supports brute force, exact k-NN search like Script Scoring. ### Issues Resolved https://github.com/hwchase17/langchain/issues/1054 --------- Signed-off-by: Naveen Tatikonda <navtat@amazon.com>	1 year ago
Andrew White	f0f9f276bd	Allow k to be higher than doc size in max_marginal_relevance_search (#1187 ) Fixes issue #1186. For some reason, #1117 didn't seem to fix it.	1 year ago
Zach Schillaci	06d1af114a	Refactor some loops into list comprehensions (#1185 )	1 year ago
Harrison Chase	c42f19d681	clean up loaders (#1178 )	1 year ago
blob42	9962bda70b	searx_search: docs updates (#1175 ) - fix notebook formatting, remove empty cells and add scrolling for long text --------- Co-authored-by: blob42 <spike@w530>	1 year ago
Harrison Chase	28781a6213	Harrison/markdown splitter (#1169 ) Co-authored-by: Michael Chen <flamingdescent@gmail.com> Co-authored-by: Michael Chen <michaelchen@stripe.com>	1 year ago
Harrison Chase	37dd34bea5	fix path (#1168 )	1 year ago
Ji	ed37fbaeff	for ChatVectorDBChain, add top_k_docs_for_context to allow control how many chunks of context will be retrieved (#1155 ) given that we allow user define chunk size, think it would be useful for user to define how many chunks of context will be retrieved.	1 year ago

1 2 3 4 5 ...

470 Commits (f2fef44fef8e7572eab34d8cd5763d59d8ff7142)