langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-08 07:10:35 +00:00

Author	SHA1	Message	Date
William FH	18005c6384	Disable trace_on_chain_group auto-tracing (#12807 ) Previously we treated trace_on_chain_group as a command to always start tracing. This is unintuitive (makes the function do 2 things), and makes it harder to toggle tracing	2023-11-03 10:05:09 -07:00
Erick Friis	0da75b9ebd	Autopopulate module name in cli init (#12814 )	2023-11-02 23:45:38 -07:00
William FH	98aff29fbd	Add Dataset Page to printout (#12816 )	2023-11-02 20:36:56 -07:00
Manuel Rech	2e2b9c76d9	Keep also original query - multi_query.py (#12696 ) When you use a MultiQuery it might be useful to use the original query as well as the newly generated ones to maximise the changes to retriever the correct document. I haven't created an issue, it seems a very small and easy thing. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-11-02 18:15:02 -07:00
Bagatur	658a3a8607	FEAT: Merge TileDB vecstore (#12811 )	2023-11-02 17:40:32 -07:00
Akio Nishimura	c04647bb4e	Correct number of elements in config list in `batch()` and `abatch()` of `BaseLLM` (#12713 ) - Description: Correct number of elements in config list in `batch()` and `abatch()` of `BaseLLM` in case `max_concurrency` is not None. - Issue: #12643 - Twitter handle: @akionux --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-11-02 17:28:48 -07:00
James Braza	88b506b321	Adds missing `urllib.parse` for IDE warning of `PubMedAPIWrapper` (#12808 ) Resolves an IDE (PyCharm 2023.2.3 PE) warning around `urllib.parse.quote`, also enabling CTRL-click	2023-11-02 17:27:25 -07:00
Bagatur	a2bb0dd445	TileDB update import unit tests	2023-11-02 17:24:22 -07:00
Nikos Papailiou	2fdaa1e5fd	Add TileDB vectorstore implementation (#12624 ) - Description: Add [TileDB](https://tiledb.com) vectorstore implementation. TileDB offers ANN search capabilities using the [TileDB-Vector-Search](https://github.com/TileDB-Inc/TileDB-Vector-Search) module. It provides serverless execution of ANN queries and storage of vector indexes both on local disk and cloud object stores (i.e. AWS S3). More details in: - [Why TileDB as a Vector Database](https://tiledb.com/blog/why-tiledb-as-a-vector-database) - [TileDB 101: Vector Search](https://tiledb.com/blog/tiledb-101-vector-search) - Twitter handle: @tiledb	2023-11-02 17:21:03 -07:00
盐粒 Yanli	1b233798a0	feat: Supprt pgvecto.rs as a VectorStore (#12718 ) Supprt [pgvecto.rs](https://github.com/tensorchord/pgvecto.rs) as a new VectorStore type. This introduces a new dependency [pgvecto_rs](https://pypi.org/project/pgvecto_rs/) and upgrade SQLAlchemy to ^2. Relate to https://github.com/tensorchord/pgvecto.rs/issues/11 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-11-02 17:16:04 -07:00
Daniel Chalef	0cbdba6a9b	zep: VectorStore: Use Native MMR (#12690 ) - refactor to use Zep's native MMR; update example - @baskaryan @eyurtsev	2023-11-02 16:45:42 -07:00
Daniel Chalef	cc3d3920e3	Zep: Summary Search and Example (#12686 ) Zep now has the ability to search over chat history summaries. This PR adds support for doing so. More here: https://blog.getzep.com/zep-v0-17/ @baskaryan @eyurtsev	2023-11-02 16:31:11 -07:00
Bagatur	526313002c	add import tests to all modules (#12806 )	2023-11-02 15:32:55 -07:00
Harrison Chase	6609a6033f	fix vectorstore imports (#12804 ) Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-11-02 15:32:31 -07:00
Nuno Campos	f66a9d2adf	Automatically add configurable key to config_schema if config_specs i… (#12798 ) …s present <!-- Thank you for contributing to LangChain! Replace this entire comment with: - Description: a description of the change, - Issue: the issue # it fixes (if applicable), - Dependencies: any dependencies required for this change, - Tag maintainer: for a quicker response, tag the relevant maintainer (see below), - Twitter handle: we announce bigger features on Twitter. If your PR gets announced, and you'd like a mention, we'll gladly shout you out! Please make sure your PR is passing linting and testing before submitting. Run `make format`, `make lint` and `make test` to check this locally. See contribution guidelines for more information on how to write/run tests, lint, etc: https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md If you're adding a new integration, please include: 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/extras` directory. If no one reviews your PR within a few days, please @-mention one of @baskaryan, @eyurtsev, @hwchase17. -->	2023-11-02 21:46:15 +00:00
Praveen Venkateswaran	21eeba075c	enable the device_map parameter in huggingface pipeline (#12731 ) ### Enabling `device_map` in HuggingFacePipeline For multi-gpu settings with large models, the [accelerate](https://huggingface.co/docs/accelerate/usage_guides/big_modeling#using--accelerate) library provides the `device_map` parameter to automatically distribute the model across GPUs / disk. The [Transformers pipeline](`3520e37e86/src/transformers/pipelines/__init__.py (L543)`) enables users to specify `device` (or) `device_map`, and handles cases (with warnings) when both are specified. However, Langchain's HuggingFacePipeline only supports specifying `device` when calling transformers which limits large models and multi-gpu use-cases. Additionally, the [default value](`8bd3ce59cd/libs/langchain/langchain/llms/huggingface_pipeline.py (L72)`) of `device` is initialized to `-1` , which is incompatible with the transformers pipeline when `device_map` is specified. This PR addresses the addition of `device_map` as a parameter , and solves the incompatibility of `device = -1` when `device_map` is also specified. An additional test has been added for this feature. Additionally, some existing tests no longer work since 1. `max_new_tokens` has to be specified under `pipeline_kwargs` and not `model_kwargs` 2. The GPT2 tokenizer raises a `ValueError: Pipeline with tokenizer without pad_token cannot do batching`, since the `tokenizer.pad_token` is `None` ([related issue](https://github.com/huggingface/transformers/issues/19853) on the transformers repo). This PR handles fixing these tests as well. Co-authored-by: Praveen Venkateswaran <praveen.venkateswaran@ibm.com>	2023-11-02 14:29:06 -07:00
Mark Bell	3276aa3e17	__getattr__ should rase AttributeError not ImportError on missing attributes (#12801 ) [The python spec](https://docs.python.org/3/reference/datamodel.html#object.__getattr__) requires that `__getattr__` throw `AttributeError` for missing attributes but there are several places throwing `ImportError` in the current code base. This causes a specific problem with `hasattr` since it calls `__getattr__` then looks only for `AttributeError` exceptions. At present, calling `hasattr` on any of these modules will raise an unexpected exception that most code will not handle as `hasattr` throwing exceptions is not expected. In our case this is triggered by an exception tracker (Airbrake) that attempts to collect the version of all installed modules with code that looks like: `if hasattr(mod, "__version__"):`. With `HEAD` this is causing our exception tracker to fail on all exceptions. I only changed instances of unknown attributes raising `ImportError` and left instances of known attributes raising `ImportError`. It feels a little weird but doesn't seem to break anything.	2023-11-02 17:08:54 -04:00
Illia	71d1a48b66	Use data from all Google search results in SerpApi.com wrapper (#12770 ) - Description: Use all Google search results data in SerpApi.com wrapper instead of the first one only - Tag maintainer: @hwchase17 _P.S. `libs/langchain/tests/integration_tests/utilities/test_serpapi.py` are not executed during the `make test`._	2023-11-02 13:31:27 -07:00
Nuno Campos	c4fdf78d03	Fix AddableDict raising exception when used with non-addable values (#12785 ) <!-- Thank you for contributing to LangChain! Replace this entire comment with: - Description: a description of the change, - Issue: the issue # it fixes (if applicable), - Dependencies: any dependencies required for this change, - Tag maintainer: for a quicker response, tag the relevant maintainer (see below), - Twitter handle: we announce bigger features on Twitter. If your PR gets announced, and you'd like a mention, we'll gladly shout you out! Please make sure your PR is passing linting and testing before submitting. Run `make format`, `make lint` and `make test` to check this locally. See contribution guidelines for more information on how to write/run tests, lint, etc: https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md If you're adding a new integration, please include: 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/extras` directory. If no one reviews your PR within a few days, please @-mention one of @baskaryan, @eyurtsev, @hwchase17. -->	2023-11-02 18:56:29 +00:00
Erick Friis	49e283a0cd	CLI 0.0.13, Configurable Template Demo (#12796 )	2023-11-02 11:42:57 -07:00
Nuno Campos	d1c6ad7769	Fix on_llm_new_token(chunk=) for some chat models (#12784 ) It was passing in message instead of generation <!-- Thank you for contributing to LangChain! Replace this entire comment with: - Description: a description of the change, - Issue: the issue # it fixes (if applicable), - Dependencies: any dependencies required for this change, - Tag maintainer: for a quicker response, tag the relevant maintainer (see below), - Twitter handle: we announce bigger features on Twitter. If your PR gets announced, and you'd like a mention, we'll gladly shout you out! Please make sure your PR is passing linting and testing before submitting. Run `make format`, `make lint` and `make test` to check this locally. See contribution guidelines for more information on how to write/run tests, lint, etc: https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md If you're adding a new integration, please include: 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/extras` directory. If no one reviews your PR within a few days, please @-mention one of @baskaryan, @eyurtsev, @hwchase17. -->	2023-11-02 16:33:44 +00:00
Erick Friis	070823f294	CLI 0.0.12 (#12787 )	2023-11-02 08:29:27 -07:00
Bagatur	979501c0ca	bump 329 (#12778 )	2023-11-02 06:02:43 -07:00
Erick Friis	da821320d3	Fixes 'Nonetype' not iterable for ObsidianLoader (#12751 ) Implements #12726 from @Di3mex	2023-11-01 16:07:09 -07:00
Eugene Yurtsev	b1caae62fd	APIChain add restrictions to domains (CVE-2023-32786) (#12747 ) * Restrict the chain to specific domains by default * This is a breaking change, but it will fail loudly upon object instantiation -- so there should be no silent errors for users * Resolves CVE-2023-32786	2023-11-01 18:50:34 -04:00
Erick Friis	4421ba46d7	Demo Server, Fix Timescale (#12746 ) - improve demo server - missing deps	2023-11-01 15:29:34 -07:00
Eugene Yurtsev	0e1aedb9f4	Use jinja2 sandboxing by default (#12733 ) * This is an opt-in feature, so users should be aware of risks if using jinja2. * Regardless we'll add sandboxing by default to jinja2 templates -- this sandboxing is a best effort basis. * Best strategy is still to make sure that jinja2 templates are only loaded from trusted sources.	2023-11-01 14:54:01 -07:00
Erick Friis	14340ee7cd	use http.client instead of urllib3 (#12660 ) dep problems with requests cloudflare debugging not worth it with urllib	2023-11-01 11:15:05 -07:00
Bagatur	eee5181b7a	bump 328, exp 37 (#12722 )	2023-11-01 10:27:39 -07:00
Erick Friis	3405dbbc64	dash not underscore (#12716 ) template names are auto-populating with the wrong convention (with underscores)	2023-11-01 09:48:37 -07:00
123-fake-st	8bd3ce59cd	PyPDFLoader use url in metadata source if file is a web path (#12092 ) Description: Update `langchain.document_loaders.pdf.PyPDFLoader` to store url in metadata (instead of a temporary file path) if user provides a web path to a pdf - Issue: Related to #7034; the reporter on that issue submitted a PR updating `PyMuPDFParser` for this behavior, but it has unresolved merge issues as of 20 Oct 2023 #7077 - In addition to `PyPDFLoader` and `PyMuPDFParser`, these other classes in `langchain.document_loaders.pdf` exhibit similar behavior and could benefit from an update: `PyPDFium2Loader`, `PDFMinerLoader`, `PDFMinerPDFasHTMLLoader`, `PDFPlumberLoader` (I'm happy to contribute to some/all of that, including assisting with `PyMuPDFParser`, if my work is agreeable) - The root cause is that the underlying pdf parser classes, e.g. `langchain.document_loaders.parsers.pdf.PyPDFParser`, never receive information about the url; the parsers receive a `langchain.document_loaders.blob_loaders.blob`, which contains the pdf contents and local file path, but not the url - This update passes the web path directly to the parser since it's minimally invasive and doesn't require further changes to maintain existing behavior for local files... bigger picture, I'd consider extending `blob` so that extra information like this can be communicated, but that has much bigger implications on the codebase which I think warrants maintainer input - Dependencies: None ```python # old behavior >>> from langchain.document_loaders import PyPDFLoader >>> loader = PyPDFLoader('https://arxiv.org/pdf/1706.03762.pdf') >>> docs = loader.load() >>> docs[0].metadata {'source': '/var/folders/w2/zx77z1cs01s1thx5dhshkd58h3jtrv/T/tmpfgrorsi5/tmp.pdf', 'page': 0} # new behavior >>> from langchain.document_loaders import PyPDFLoader >>> loader = PyPDFLoader('https://arxiv.org/pdf/1706.03762.pdf') >>> docs = loader.load() >>> docs[0].metadata {'source': 'https://arxiv.org/pdf/1706.03762.pdf', 'page': 0} ```	2023-11-01 11:27:00 -04:00
Dave Kwon	b1954aab13	feat: Add page metadata on PDFMinerLoader (#12277 ) - Description: #12273 's suggestion PR Like other PDFLoader, loading pdf per each page and giving page metadata. - Issue: #12273 - Twitter handle: @blue0_0hope --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-11-01 11:25:37 -04:00
Duda Nogueira	7148f3e1fe	Weaviate - Fix schema existence check (#12711 ) This will allow you create the schema beforehand. The check was failing and preventing importing into existing classes. <!-- Thank you for contributing to LangChain! Replace this entire comment with: - Description: a description of the change, - Issue: the issue # it fixes (if applicable), - Dependencies: any dependencies required for this change, - Tag maintainer: for a quicker response, tag the relevant maintainer (see below), - Twitter handle: we announce bigger features on Twitter. If your PR gets announced, and you'd like a mention, we'll gladly shout you out! Please make sure your PR is passing linting and testing before submitting. Run `make format`, `make lint` and `make test` to check this locally. See contribution guidelines for more information on how to write/run tests, lint, etc: https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md If you're adding a new integration, please include: 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/extras` directory. If no one reviews your PR within a few days, please @-mention one of @baskaryan, @eyurtsev, @hwchase17. -->	2023-11-01 08:22:15 -07:00
Aidos Kanapyanov	ae63c186af	Mask API key for Anyscale LLM (#12406 ) Description: Add masking of API Key for Anyscale LLM when printed. Issue: #12165 Dependencies: None Tag maintainer: @eyurtsev --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-11-01 10:22:26 -04:00
Predrag Gruevski	5ae51a8a85	Fix typo highlighted by `ruff` autoformatter. (#12691 ) H/t @MichaReiser for spotting it: https://github.com/langchain-ai/langchain/pull/12585/files#r1378253045	2023-10-31 22:16:06 -04:00
Erick Friis	44c8b159b9	properly increment version in cli (#12685 ) Went from 0.0.9 -> 0.0.11 without releasing. Back to 10, then release.	2023-10-31 17:27:43 -07:00
Leonid Ganeline	ddcec005bc	fix for `YahooFinanceNewsTool` (#12665 ) Added YahooFinanceNewsTool to the __init__.py It was missed here.	2023-10-31 14:58:09 -07:00
Predrag Gruevski	01a3c9b94e	Use an in-project virtualenv in the CLI package. (#12678 ) Keeping it in sync with how our other packages are configured.	2023-10-31 14:51:24 -07:00
Jacob Lee	bd668fcea1	Adds version CLI command (#12619 ) Will be automatically bumped with `poetry version patch`. @efriis @hwchase17 --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2023-10-31 14:50:04 -07:00
Frank	bf5805bb32	Add quip loader (#12259 ) - Description: implement [quip](https://quip.com) loader - Issue: https://github.com/langchain-ai/langchain/issues/10352 - Dependencies: No - pass make format, make lint, make test --------- Co-authored-by: Hao Fan <h_fan@apple.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-10-31 14:11:24 -07:00
Roman Vasilyev	c9a6940d58	PGVector fix (#12592 ) latest release broken, this fixes it --------- Co-authored-by: Roman Vasilyev <rvasilyev@mozilla.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-10-31 17:01:15 -04:00
Predrag Gruevski	e8b99364b3	Use `ruff` for both linting and formatting in `langchain-cli`. (#12672 ) Prior to this PR, `ruff` was used only for linting and not for formatting, despite the names of the commands. This PR makes it be used for both linting code and autoformatting it.	2023-10-31 13:52:25 -07:00
Margaret Qian	acfc485808	Update MosaicML Embedding Input Key (#12657 ) This input key was missed in the last update PR: https://github.com/langchain-ai/langchain/pull/7391 The input/output formats are intended to be like this: ``` {"inputs": [<prompt>]} {"outputs": [<output_text>]} ```	2023-10-31 14:43:30 -04:00
Predrag Gruevski	c871cc5055	Remove `print()` statements which seemed leftover from debugging. (#12648 ) Added in #12159 presumably during debugging. Right now they cause a bit of visual noise.	2023-10-31 13:45:48 -04:00
Noam Gat	14e8c74736	LM Format Enforcer Integration + Sample Notebook (#12625 ) ## Description This PR adds support for [lm-format-enforcer](https://github.com/noamgat/lm-format-enforcer) to LangChain. ![image](https://raw.githubusercontent.com/noamgat/lm-format-enforcer/main/docs/Intro.webp) The library is similar to jsonformer / RELLM which are supported in Langchain, but has several advantages such as - Batching and Beam search support - More complete JSON Schema support - LLM has control over whitespace, improving quality - Better runtime performance due to only calling the LLM's generate() function once per generate() call. The integration is loosely based on the jsonformer integration in terms of project structure. ## Dependencies No compile-time dependency was added, but if `lm-format-enforcer` is not installed, a runtime error will occur if it is trying to be used. ## Tests Due to the integration modifying the internal parameters of the underlying huggingface transformer LLM, it is not possible to test without building a real LM, which requires internet access. So, similar to the jsonformer and RELLM integrations, the testing is via the notebook. ## Twitter Handle [@noamgat](https://twitter.com/noamgat) Looking forward to hearing feedback! --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-10-31 09:49:01 -07:00
Erick Friis	7f6e751a3d	template updates (#12646 )	2023-10-31 09:13:58 -07:00
Predrag Gruevski	f94e24dfd7	Install and use `ruff format` instead of black for code formatting. (#12585 ) Best to review one commit at a time, since two of the commits are 100% autogenerated changes from running `ruff format`: - Install and use `ruff format` instead of black for code formatting. - Output of `ruff format .` in the `langchain` package. - Use `ruff format` in experimental package. - Format changes in experimental package by `ruff format`. - Manual formatting fixes to make `ruff .` pass.	2023-10-31 10:53:12 -04:00
William FH	bfd719f9d8	bind_functions convenience method (#12518 ) I always take 20-30 seconds to re-discover where the `convert_to_openai_function` wrapper lives in our codebase. Chat langchain [has no clue](https://smith.langchain.com/public/3989d687-18c7-4108-958e-96e88803da86/r) what to do either. There's the older `create_openai_fn_chain` , but we haven't been recommending it in LCEL. The example we show in the [cookbook](https://python.langchain.com/docs/expression_language/how_to/binding#attaching-openai-functions) is really verbose. General function calling should be as simple as possible to do, so this seems a bit more ergonomic to me (feel free to disagree). Another option would be to directly coerce directly in the class's init (or when calling invoke), if provided. I'm not 100% set against that. That approach may be too easy but not simple. This PR feels like a decent compromise between simple and easy. ``` from enum import Enum from typing import Optional from pydantic import BaseModel, Field class Category(str, Enum): """The category of the issue.""" bug = "bug" nit = "nit" improvement = "improvement" other = "other" class IssueClassification(BaseModel): """Classify an issue.""" category: Category other_description: Optional[str] = Field( description="If classified as 'other', the suggested other category" ) from langchain.chat_models import ChatOpenAI llm = ChatOpenAI().bind_functions([IssueClassification]) llm.invoke("This PR adds a convenience wrapper to the bind argument") # AIMessage(content='', additional_kwargs={'function_call': {'name': 'IssueClassification', 'arguments': '{\n "category": "improvement"\n}'}}) ```	2023-10-31 07:15:37 -07:00
Nuno Campos	3143324984	Improve Runnable type inference for input_schemas (#12630 ) - Prefer lambda type annotations over inferred dict schema - For sequences that start with RunnableAssign infer seq input type as "input type of 2nd item in sequence - output type of runnable assign"	2023-10-31 13:22:54 +00:00
Nuno Campos	2f563cee20	Add Runnable.with_listeners() (#12549 ) - This binds start/end/error listeners to a runnable, which will be called with the Run object	2023-10-31 11:04:51 +00:00

1 2 3 4 5 ...

1682 Commits