langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-04 06:00:26 +00:00

Author	SHA1	Message	Date
Bagatur	ff43cd6701	OpenAI remove httpx typing (#13154 ) Addresses #13124	2023-11-09 14:32:09 -08:00
Bagatur	8b2a82b5ce	Bagatur/docs smith context (#13139 )	2023-11-09 10:22:49 -08:00
Bagatur	f04cc4b7e1	bump 333 (#13131 )	2023-11-09 07:33:15 -08:00
billytrend-cohere	b346d4a455	Add message to documents (#12552 ) This adds the response message as a document to the rag retriever so users can choose to use this. Also drops document limit. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-11-09 07:30:48 -08:00
Harrison Chase	5f38770161	Support oai tool call (#13110 ) Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Nuno Campos <nuno@boringbits.io>	2023-11-09 07:29:29 -08:00
Holt Skinner	0fc8fd12bd	feat: Vertex AI Search - Add Snippet Retrieval for Non-Advanced Website Data Stores (#13020 ) https://cloud.google.com/generative-ai-app-builder/docs/snippets#snippets --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-11-08 21:52:50 -05:00
Jacob Lee	76283e9625	Adds embeddings filter option to return scores in state (#12489 ) CC @baskaryan @assafelovic	2023-11-08 17:50:06 -08:00
jakerachleff	18601bd4c8	Get project from langchain sdk (#13100 ) ## Description We need to centralize the API we use to get the project name for our tracers. This PR makes it so we always get this from a shared function in the langsmith sdk. ## Dependencies Upgraded langsmith from 0.52 to 0.62 to include the new API `get_tracer_project`	2023-11-08 17:10:12 -08:00
Bagatur	72e12f6bcf	update more azure docs (#13093 )	2023-11-08 14:11:16 -08:00
Bagatur	1703f132c6	update azure embedding docs (#13091 )	2023-11-08 13:39:31 -08:00
Bagatur	9fdfac22c2	bump 332 (#13089 )	2023-11-08 13:23:16 -08:00
Bagatur	1f85ec34d5	bump 331rc3 exp 39 (#13086 )	2023-11-08 13:00:13 -08:00
Anton Troynikov	9f077270c8	Don't pass EF to chroma (#13085 ) - Description: Recently Chroma rolled out a breaking change on the way we handle embedding functions, in order to support multi-modal collections. This broke the way LangChain's `Chroma` objects get created, because we were passing the EF down into the Chroma collection: https://docs.trychroma.com/migration#migration-to-0416---november-7-2023 However, internally, we are never actually using embeddings on the chroma collection - LangChain's `Chroma` object calls it instead. Thus we just don't pass an `embedding_function` to Chroma itself, which fixes the issue.	2023-11-08 12:55:35 -08:00
Erick Friis	f15f8e01cf	Azure OpenAI Embeddings (#13039 ) Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-11-08 12:37:17 -08:00
David Peterson	37561d8986	Add Proper Import Error (#13042 ) - Description: The issue was not listing the proper import error for amazon textract loader. - Issue: Time wasted trying to figure out what to install... (langchain docs don't list the dependency either) - Dependencies: N/A - Tag maintainer: @sbusso - Twitter handle: @h9ste --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-11-08 10:29:08 -08:00
Eugene Yurtsev	06c503f672	Add RunnableRetry Documentation (#13074 )	2023-11-08 18:20:18 +00:00
Bagatur	55aeff6777	oai assistant multiple actions (#13068 )	2023-11-08 08:25:37 -08:00
Erick Friis	a9b70baef9	cli updates, 0.0.16 (#13034 ) - confirm flags, serve detection - 0.0.16 - always gen code - pip bool	2023-11-08 07:47:30 -08:00
Erick Friis	506f81563f	Update Deps in Experimental (#13029 )	2023-11-07 15:15:09 -08:00
Stefano Lottini	4f4b020582	Add "Astra DB" vector store integration (#12966 ) # Astra DB Vector store integration - Description: This PR adds a `VectorStore` implementation for DataStax Astra DB using its HTTP API - Issue: (no related issue) - Dependencies: A new required dependency is `astrapy` (`>=0.5.3`) which was added to pyproject.toml, optional, as per guidelines - Tag maintainer: I recently mentioned to @baskaryan this integration was coming - Twitter handle: `@rsprrs` if you want to mention me This PR introduces the `AstraDB` vector store class, extensive integration test coverage, a reworking of the documentation which conflates Cassandra and Astra DB on a single "provider" page and a new, completely reworked vector-store example notebook (common to the Cassandra store, since parts of the flow are shared by the two APIs). I also took care in ensuring docs (and redirects therein) are behaving correctly. All style, linting, typechecks and tests pass as far as the `AstraDB` integration is concerned. I could build the documentation and check it all right (but ran into trouble with the `api_docs_build` makefile target which I could not verify: `Error: Unable to import module 'plan_and_execute.agent_executor' with error: No module named 'langchain_experimental'` was the first of many similar errors) Thank you for a review! Stefano --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2023-11-07 14:45:33 -08:00
Yang, Bo	600caff03c	Add `Memorize` tool (#11722 ) - Description: Add `Memorize` tool - Tag maintainer: @hwchase17 This PR added a new tool `Memorize` so that an agent can use it to fine-tune itself. This tool requires `TrainableLLM` introduced in #11721 DEMO: `6a9003d5db` ![image](https://github.com/langchain-ai/langchain/assets/601530/d6f0cb45-54df-4dcf-b143-f8aefb1e76e3)	2023-11-07 12:42:10 -08:00
Bagatur	cf481c9418	bump exp 38 (#13016 )	2023-11-07 11:49:23 -08:00
Bagatur	57e19989f6	Bagatur/oai assistant (#13010 )	2023-11-07 11:44:53 -08:00
Erick Friis	74134dd7e1	cli pyproject updating (#12945 ) `langchain app add` and `langchain app remove` will now keep the dependencies list updated. --------- Co-authored-by: Nuno Campos <nuno@boringbits.io>	2023-11-07 11:06:08 -08:00
Bagatur	6175dc30aa	bump 331rc2 (#13006 )	2023-11-07 08:52:17 -08:00
Erick Friis	0c81cd923e	oai v1 embeddings (#12969 ) Initial PR to get OpenAIEmbeddings working with the new sdk fyi @rlancemartin Fixes #12943 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-11-06 18:52:33 -08:00
Bagatur	fdbb45d79e	bump 331rc1 (#12965 )	2023-11-06 15:36:43 -08:00
Bagatur	3bb8030a6e	fix max_tokens (#12964 )	2023-11-06 15:36:05 -08:00
Bagatur	a9002a82b8	bump 331rc0 (#12963 )	2023-11-06 15:19:33 -08:00
Harrison Chase	c27400efeb	Support multimodal messages (#11320 ) Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-11-06 15:14:18 -08:00
Bagatur	4f7dff9d66	Record system fingerprint chat openai (#12960 )	2023-11-06 14:25:53 -08:00
Bagatur	8e0cb2eb84	ChatOpenAI and AzureChatOpenAI openai>=1 compatible (#12948 )	2023-11-06 13:24:18 -08:00
Kacper Łukawski	52d0055a91	Add support of Cohere Embed v3 (#12940 ) Cohere released the new embedding API (Embed v3: https://txt.cohere.com/introducing-embed-v3/) that treats document and query embeddings differently. This PR updated the `CohereEmbeddings` to use them appropriately. It also works with the old models.	2023-11-06 15:06:58 -05:00
Praveen Venkateswaran	8e0dcb37d2	Add SecretStr for Symbl.ai Nebula API (#12896 ) Description: This PR masks API key secrets for the Nebula model from Symbl.ai Issue: #12165 Maintainer: @eyurtsev --------- Co-authored-by: Praveen Venkateswaran <praveen.venkateswaran@ibm.com>	2023-11-06 14:13:59 -05:00
Vinzenz Klass	59d0bd2150	feat: acquire advisory lock before creating extension in pgvector (#12935 ) - Description: Acquire advisory lock before attempting to create extension on postgres server, preventing errors in concurrent executions. - Issue: #12933 - Dependencies: None --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-11-06 14:00:39 -05:00
Eugene Yurtsev	b376854b26	Fix for anyscale chat model api key (#12938 ) * ChatAnyscale was missing coercion to SecretStr for anyscale api key * The model inherits from ChatOpenAI so it should not force the openai api key to be secret str until openai model has the same changes https://github.com/langchain-ai/langchain/issues/12841	2023-11-06 13:28:02 -05:00
hmasdev	622bf12c2e	fix regex pattern of structured output parser (#12929 ) - Description: fix the regex pattern of [StructuredChatOutputParser](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/agents/structured_chat/output_parser.py#L18) and add unit tests for the code change. - Issue: #12158 #12922 - Dependencies: None - Tag maintainer: - Twitter handle: @hmdev3 - NOTE: This PR conflicts #7495 . After #7495 is merged, I am going to update PR.	2023-11-06 07:53:14 -08:00
wemysschen	8d7144e6a6	fix baiducloud directory loader import file loader (#12924 ) Issue: fix baiducloud BOS directory loader imports its file loader --------- Co-authored-by: wemysschen <root@icoding-cwx.bcc-szzj.baidu.com>	2023-11-06 07:52:31 -08:00
Kacper Łukawski	621419f71e	Fix normalizing the cosine distance in Qdrant (#12934 ) Qdrant was incorrectly calculating the cosine similarity and returning `0.0` for the best match, instead of `1.0`. Internally Qdrant returns a cosine score from `-1.0` (worst match) to `1.0` (best match), and the current formula reflects it.	2023-11-06 07:36:59 -08:00
Hech	8fe6bcc662	Fix return metadata when searching for DingoDB (#12937 )	2023-11-06 07:35:36 -08:00
Jakub Novák	ada3d2cbd1	Add possibility to pass on_artifacts for a specific conversation (#12687 ) Possibility to pass on_artifacts to a conversation. It can be then achieved by adding this way: ```python result = agent.run( input=message.text, metadata={ "on_artifact": CALLBACK_FUNCTION }, ) ```	2023-11-06 07:29:47 -08:00
Bagatur	53f453f01a	bump 331 (#12932 )	2023-11-06 05:58:12 -08:00
Erick Friis	5000c7308e	cli template gitignores (#12914 ) - ap gitignore - package	2023-11-05 22:34:45 -08:00
Harrison Chase	aba407f774	use keys not items (#12918 )	2023-11-05 22:08:29 -08:00
wemysschen	e14aa37d59	fix bes vector store search (#12828 ) Issue: fix search body in baidu cloud vectorsearch --------- Co-authored-by: wemysschen <root@icoding-cwx.bcc-szzj.baidu.com>	2023-11-03 15:39:19 -07:00
Lance Martin	ea1ab391d4	Open Clip multimodal embeddings (#12754 )	2023-11-03 13:33:36 -07:00
Bagatur	ebee616822	bump 330 (#12853 )	2023-11-03 13:26:41 -07:00
Erick Friis	6c237716c4	Update readmes with new cli install (#12847 ) Old command still works. Just simplifying. Merge after releasing CLI 0.0.15	2023-11-03 12:10:32 -07:00
Erick Friis	7db49d3842	Confirm sys.path includes current dir for app serve (#12851 ) - Make sure sys.path is set properly for langchain app serve - bump	2023-11-03 11:37:20 -07:00
Erick Friis	1bc35f61cb	CLI 0.0.14, Uvicorn update and no more [serve] (#12845 ) Calls uvicorn directly from cli: Reload works if you define app by import string instead of object. (was doing subprocess in order to get reloading) Version bump to 0.0.14 Remove the need for [serve] for simplicity. Readmes are updated in #12847 to avoid cluttering this PR	2023-11-03 11:05:52 -07:00
William FH	18005c6384	Disable trace_on_chain_group auto-tracing (#12807 ) Previously we treated trace_on_chain_group as a command to always start tracing. This is unintuitive (makes the function do 2 things), and makes it harder to toggle tracing	2023-11-03 10:05:09 -07:00
Erick Friis	0da75b9ebd	Autopopulate module name in cli init (#12814 )	2023-11-02 23:45:38 -07:00
William FH	98aff29fbd	Add Dataset Page to printout (#12816 )	2023-11-02 20:36:56 -07:00
Manuel Rech	2e2b9c76d9	Keep also original query - multi_query.py (#12696 ) When you use a MultiQuery it might be useful to use the original query as well as the newly generated ones to maximise the chances of retrieving the correct document. I haven't created an issue, it seems a very small and easy thing. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-11-02 18:15:02 -07:00
Bagatur	658a3a8607	FEAT: Merge TileDB vecstore (#12811 )	2023-11-02 17:40:32 -07:00
Akio Nishimura	c04647bb4e	Correct number of elements in config list in `batch()` and `abatch()` of `BaseLLM` (#12713 ) - Description: Correct number of elements in config list in `batch()` and `abatch()` of `BaseLLM` in case `max_concurrency` is not None. - Issue: #12643 - Twitter handle: @akionux --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-11-02 17:28:48 -07:00
James Braza	88b506b321	Adds missing `urllib.parse` for IDE warning of `PubMedAPIWrapper` (#12808 ) Resolves an IDE (PyCharm 2023.2.3 PE) warning around `urllib.parse.quote`, also enabling CTRL-click	2023-11-02 17:27:25 -07:00
Bagatur	a2bb0dd445	TileDB update import unit tests	2023-11-02 17:24:22 -07:00
Nikos Papailiou	2fdaa1e5fd	Add TileDB vectorstore implementation (#12624 ) - Description: Add [TileDB](https://tiledb.com) vectorstore implementation. TileDB offers ANN search capabilities using the [TileDB-Vector-Search](https://github.com/TileDB-Inc/TileDB-Vector-Search) module. It provides serverless execution of ANN queries and storage of vector indexes both on local disk and cloud object stores (i.e. AWS S3). More details in: - [Why TileDB as a Vector Database](https://tiledb.com/blog/why-tiledb-as-a-vector-database) - [TileDB 101: Vector Search](https://tiledb.com/blog/tiledb-101-vector-search) - Twitter handle: @tiledb	2023-11-02 17:21:03 -07:00
盐粒 Yanli	1b233798a0	feat: Support pgvecto.rs as a VectorStore (#12718 ) Support [pgvecto.rs](https://github.com/tensorchord/pgvecto.rs) as a new VectorStore type. This introduces a new dependency [pgvecto_rs](https://pypi.org/project/pgvecto_rs/) and upgrades SQLAlchemy to ^2. Related to https://github.com/tensorchord/pgvecto.rs/issues/11 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-11-02 17:16:04 -07:00
Daniel Chalef	0cbdba6a9b	zep: VectorStore: Use Native MMR (#12690 ) - refactor to use Zep's native MMR; update example - @baskaryan @eyurtsev	2023-11-02 16:45:42 -07:00
Daniel Chalef	cc3d3920e3	Zep: Summary Search and Example (#12686 ) Zep now has the ability to search over chat history summaries. This PR adds support for doing so. More here: https://blog.getzep.com/zep-v0-17/ @baskaryan @eyurtsev	2023-11-02 16:31:11 -07:00
Bagatur	526313002c	add import tests to all modules (#12806 )	2023-11-02 15:32:55 -07:00
Harrison Chase	6609a6033f	fix vectorstore imports (#12804 ) Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-11-02 15:32:31 -07:00
Nuno Campos	f66a9d2adf	Automatically add configurable key to config_schema if config_specs i… (#12798 ) …s present	2023-11-02 21:46:15 +00:00
Praveen Venkateswaran	21eeba075c	enable the device_map parameter in huggingface pipeline (#12731 ) ### Enabling `device_map` in HuggingFacePipeline For multi-gpu settings with large models, the [accelerate](https://huggingface.co/docs/accelerate/usage_guides/big_modeling#using--accelerate) library provides the `device_map` parameter to automatically distribute the model across GPUs / disk. The [Transformers pipeline](`3520e37e86/src/transformers/pipelines/__init__.py (L543)`) enables users to specify `device` (or) `device_map`, and handles cases (with warnings) when both are specified. However, Langchain's HuggingFacePipeline only supports specifying `device` when calling transformers which limits large models and multi-gpu use-cases. Additionally, the [default value](`8bd3ce59cd/libs/langchain/langchain/llms/huggingface_pipeline.py (L72)`) of `device` is initialized to `-1` , which is incompatible with the transformers pipeline when `device_map` is specified. This PR addresses the addition of `device_map` as a parameter , and solves the incompatibility of `device = -1` when `device_map` is also specified. An additional test has been added for this feature. Additionally, some existing tests no longer work since 1. `max_new_tokens` has to be specified under `pipeline_kwargs` and not `model_kwargs` 2. The GPT2 tokenizer raises a `ValueError: Pipeline with tokenizer without pad_token cannot do batching`, since the `tokenizer.pad_token` is `None` ([related issue](https://github.com/huggingface/transformers/issues/19853) on the transformers repo). This PR handles fixing these tests as well. Co-authored-by: Praveen Venkateswaran <praveen.venkateswaran@ibm.com>	2023-11-02 14:29:06 -07:00
Mark Bell	3276aa3e17	__getattr__ should raise AttributeError not ImportError on missing attributes (#12801 ) [The python spec](https://docs.python.org/3/reference/datamodel.html#object.__getattr__) requires that `__getattr__` throw `AttributeError` for missing attributes but there are several places throwing `ImportError` in the current code base. This causes a specific problem with `hasattr` since it calls `__getattr__` then looks only for `AttributeError` exceptions. At present, calling `hasattr` on any of these modules will raise an unexpected exception that most code will not handle as `hasattr` throwing exceptions is not expected. In our case this is triggered by an exception tracker (Airbrake) that attempts to collect the version of all installed modules with code that looks like: `if hasattr(mod, "__version__"):`. With `HEAD` this is causing our exception tracker to fail on all exceptions. I only changed instances of unknown attributes raising `ImportError` and left instances of known attributes raising `ImportError`. It feels a little weird but doesn't seem to break anything.	2023-11-02 17:08:54 -04:00
Illia	71d1a48b66	Use data from all Google search results in SerpApi.com wrapper (#12770 ) - Description: Use all Google search results data in SerpApi.com wrapper instead of the first one only - Tag maintainer: @hwchase17 _P.S. `libs/langchain/tests/integration_tests/utilities/test_serpapi.py` are not executed during the `make test`._	2023-11-02 13:31:27 -07:00
Nuno Campos	c4fdf78d03	Fix AddableDict raising exception when used with non-addable values (#12785 )	2023-11-02 18:56:29 +00:00
Erick Friis	49e283a0cd	CLI 0.0.13, Configurable Template Demo (#12796 )	2023-11-02 11:42:57 -07:00
Nuno Campos	d1c6ad7769	Fix on_llm_new_token(chunk=) for some chat models (#12784 ) It was passing in message instead of generation	2023-11-02 16:33:44 +00:00
Erick Friis	070823f294	CLI 0.0.12 (#12787 )	2023-11-02 08:29:27 -07:00
Bagatur	979501c0ca	bump 329 (#12778 )	2023-11-02 06:02:43 -07:00
Erick Friis	da821320d3	Fixes 'Nonetype' not iterable for ObsidianLoader (#12751 ) Implements #12726 from @Di3mex	2023-11-01 16:07:09 -07:00
Eugene Yurtsev	b1caae62fd	APIChain add restrictions to domains (CVE-2023-32786) (#12747 ) * Restrict the chain to specific domains by default * This is a breaking change, but it will fail loudly upon object instantiation -- so there should be no silent errors for users * Resolves CVE-2023-32786	2023-11-01 18:50:34 -04:00
Erick Friis	4421ba46d7	Demo Server, Fix Timescale (#12746 ) - improve demo server - missing deps	2023-11-01 15:29:34 -07:00
Eugene Yurtsev	0e1aedb9f4	Use jinja2 sandboxing by default (#12733 ) * This is an opt-in feature, so users should be aware of risks if using jinja2. * Regardless we'll add sandboxing by default to jinja2 templates -- this sandboxing is a best effort basis. * Best strategy is still to make sure that jinja2 templates are only loaded from trusted sources.	2023-11-01 14:54:01 -07:00
Erick Friis	14340ee7cd	use http.client instead of urllib3 (#12660 ) dep problems with requests cloudflare debugging not worth it with urllib	2023-11-01 11:15:05 -07:00
Bagatur	eee5181b7a	bump 328, exp 37 (#12722 )	2023-11-01 10:27:39 -07:00
Erick Friis	3405dbbc64	dash not underscore (#12716 ) template names are auto-populating with the wrong convention (with underscores)	2023-11-01 09:48:37 -07:00
123-fake-st	8bd3ce59cd	PyPDFLoader use url in metadata source if file is a web path (#12092 ) Description: Update `langchain.document_loaders.pdf.PyPDFLoader` to store url in metadata (instead of a temporary file path) if user provides a web path to a pdf - Issue: Related to #7034; the reporter on that issue submitted a PR updating `PyMuPDFParser` for this behavior, but it has unresolved merge issues as of 20 Oct 2023 #7077 - In addition to `PyPDFLoader` and `PyMuPDFParser`, these other classes in `langchain.document_loaders.pdf` exhibit similar behavior and could benefit from an update: `PyPDFium2Loader`, `PDFMinerLoader`, `PDFMinerPDFasHTMLLoader`, `PDFPlumberLoader` (I'm happy to contribute to some/all of that, including assisting with `PyMuPDFParser`, if my work is agreeable) - The root cause is that the underlying pdf parser classes, e.g. `langchain.document_loaders.parsers.pdf.PyPDFParser`, never receive information about the url; the parsers receive a `langchain.document_loaders.blob_loaders.blob`, which contains the pdf contents and local file path, but not the url - This update passes the web path directly to the parser since it's minimally invasive and doesn't require further changes to maintain existing behavior for local files... bigger picture, I'd consider extending `blob` so that extra information like this can be communicated, but that has much bigger implications on the codebase which I think warrants maintainer input - Dependencies: None ```python # old behavior >>> from langchain.document_loaders import PyPDFLoader >>> loader = PyPDFLoader('https://arxiv.org/pdf/1706.03762.pdf') >>> docs = loader.load() >>> docs[0].metadata {'source': '/var/folders/w2/zx77z1cs01s1thx5dhshkd58h3jtrv/T/tmpfgrorsi5/tmp.pdf', 'page': 0} # new behavior >>> from langchain.document_loaders import PyPDFLoader >>> loader = PyPDFLoader('https://arxiv.org/pdf/1706.03762.pdf') >>> docs = loader.load() >>> docs[0].metadata {'source': 'https://arxiv.org/pdf/1706.03762.pdf', 'page': 0} ```	2023-11-01 11:27:00 -04:00
Dave Kwon	b1954aab13	feat: Add page metadata on PDFMinerLoader (#12277 ) - Description: #12273 's suggestion PR Like other PDFLoader, loading pdf per each page and giving page metadata. - Issue: #12273 - Twitter handle: @blue0_0hope --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-11-01 11:25:37 -04:00
Duda Nogueira	7148f3e1fe	Weaviate - Fix schema existence check (#12711 ) This will allow you to create the schema beforehand. The check was failing and preventing importing into existing classes.	2023-11-01 08:22:15 -07:00
Aidos Kanapyanov	ae63c186af	Mask API key for Anyscale LLM (#12406 ) Description: Add masking of API Key for Anyscale LLM when printed. Issue: #12165 Dependencies: None Tag maintainer: @eyurtsev --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-11-01 10:22:26 -04:00
Predrag Gruevski	5ae51a8a85	Fix typo highlighted by `ruff` autoformatter. (#12691 ) H/t @MichaReiser for spotting it: https://github.com/langchain-ai/langchain/pull/12585/files#r1378253045	2023-10-31 22:16:06 -04:00
Erick Friis	44c8b159b9	properly increment version in cli (#12685 ) Went from 0.0.9 -> 0.0.11 without releasing. Back to 10, then release.	2023-10-31 17:27:43 -07:00
Leonid Ganeline	ddcec005bc	fix for `YahooFinanceNewsTool` (#12665 ) Added YahooFinanceNewsTool to the __init__.py It was missed here.	2023-10-31 14:58:09 -07:00
Predrag Gruevski	01a3c9b94e	Use an in-project virtualenv in the CLI package. (#12678 ) Keeping it in sync with how our other packages are configured.	2023-10-31 14:51:24 -07:00
Jacob Lee	bd668fcea1	Adds version CLI command (#12619 ) Will be automatically bumped with `poetry version patch`. @efriis @hwchase17 --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2023-10-31 14:50:04 -07:00
Frank	bf5805bb32	Add quip loader (#12259 ) - Description: implement [quip](https://quip.com) loader - Issue: https://github.com/langchain-ai/langchain/issues/10352 - Dependencies: No - pass make format, make lint, make test --------- Co-authored-by: Hao Fan <h_fan@apple.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-10-31 14:11:24 -07:00
Roman Vasilyev	c9a6940d58	PGVector fix (#12592 ) latest release broken, this fixes it --------- Co-authored-by: Roman Vasilyev <rvasilyev@mozilla.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-10-31 17:01:15 -04:00
Predrag Gruevski	e8b99364b3	Use `ruff` for both linting and formatting in `langchain-cli`. (#12672 ) Prior to this PR, `ruff` was used only for linting and not for formatting, despite the names of the commands. This PR makes it be used for both linting code and autoformatting it.	2023-10-31 13:52:25 -07:00
Margaret Qian	acfc485808	Update MosaicML Embedding Input Key (#12657 ) This input key was missed in the last update PR: https://github.com/langchain-ai/langchain/pull/7391 The input/output formats are intended to be like this: ``` {"inputs": [<prompt>]} {"outputs": [<output_text>]} ```	2023-10-31 14:43:30 -04:00
Predrag Gruevski	c871cc5055	Remove `print()` statements which seemed leftover from debugging. (#12648 ) Added in #12159 presumably during debugging. Right now they cause a bit of visual noise.	2023-10-31 13:45:48 -04:00
Noam Gat	14e8c74736	LM Format Enforcer Integration + Sample Notebook (#12625 ) ## Description This PR adds support for [lm-format-enforcer](https://github.com/noamgat/lm-format-enforcer) to LangChain. ![image](https://raw.githubusercontent.com/noamgat/lm-format-enforcer/main/docs/Intro.webp) The library is similar to jsonformer / RELLM which are supported in Langchain, but has several advantages such as - Batching and Beam search support - More complete JSON Schema support - LLM has control over whitespace, improving quality - Better runtime performance due to only calling the LLM's generate() function once per generate() call. The integration is loosely based on the jsonformer integration in terms of project structure. ## Dependencies No compile-time dependency was added, but if `lm-format-enforcer` is not installed, a runtime error will occur if it is trying to be used. ## Tests Due to the integration modifying the internal parameters of the underlying huggingface transformer LLM, it is not possible to test without building a real LM, which requires internet access. So, similar to the jsonformer and RELLM integrations, the testing is via the notebook. ## Twitter Handle [@noamgat](https://twitter.com/noamgat) Looking forward to hearing feedback! --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-10-31 09:49:01 -07:00
Erick Friis	7f6e751a3d	template updates (#12646 )	2023-10-31 09:13:58 -07:00
Predrag Gruevski	f94e24dfd7	Install and use `ruff format` instead of black for code formatting. (#12585 ) Best to review one commit at a time, since two of the commits are 100% autogenerated changes from running `ruff format`: - Install and use `ruff format` instead of black for code formatting. - Output of `ruff format .` in the `langchain` package. - Use `ruff format` in experimental package. - Format changes in experimental package by `ruff format`. - Manual formatting fixes to make `ruff .` pass.	2023-10-31 10:53:12 -04:00
William FH	bfd719f9d8	bind_functions convenience method (#12518 ) I always take 20-30 seconds to re-discover where the `convert_to_openai_function` wrapper lives in our codebase. Chat langchain [has no clue](https://smith.langchain.com/public/3989d687-18c7-4108-958e-96e88803da86/r) what to do either. There's the older `create_openai_fn_chain` , but we haven't been recommending it in LCEL. The example we show in the [cookbook](https://python.langchain.com/docs/expression_language/how_to/binding#attaching-openai-functions) is really verbose. General function calling should be as simple as possible to do, so this seems a bit more ergonomic to me (feel free to disagree). Another option would be to directly coerce directly in the class's init (or when calling invoke), if provided. I'm not 100% set against that. That approach may be too easy but not simple. This PR feels like a decent compromise between simple and easy. ``` from enum import Enum from typing import Optional from pydantic import BaseModel, Field class Category(str, Enum): """The category of the issue.""" bug = "bug" nit = "nit" improvement = "improvement" other = "other" class IssueClassification(BaseModel): """Classify an issue.""" category: Category other_description: Optional[str] = Field( description="If classified as 'other', the suggested other category" ) from langchain.chat_models import ChatOpenAI llm = ChatOpenAI().bind_functions([IssueClassification]) llm.invoke("This PR adds a convenience wrapper to the bind argument") # AIMessage(content='', additional_kwargs={'function_call': {'name': 'IssueClassification', 'arguments': '{\n "category": "improvement"\n}'}}) ```	2023-10-31 07:15:37 -07:00
Nuno Campos	3143324984	Improve Runnable type inference for input_schemas (#12630 ) - Prefer lambda type annotations over inferred dict schema - For sequences that start with RunnableAssign infer seq input type as "input type of 2nd item in sequence - output type of runnable assign"	2023-10-31 13:22:54 +00:00
Nuno Campos	2f563cee20	Add Runnable.with_listeners() (#12549 ) - This binds start/end/error listeners to a runnable, which will be called with the Run object	2023-10-31 11:04:51 +00:00
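
The `hasattr` problem described in commit 3276aa3e17 (#12801) can be reproduced without LangChain. The following is a minimal sketch, not code from that commit: the helper `make_module` and the module names `fixed_mod` and `broken_mod` are hypothetical, and the throwaway modules only stand in for a package that defines a module-level `__getattr__`.

```python
import types


def make_module(name, exc_type):
    """Build a throwaway module whose __getattr__ raises exc_type for unknown names.

    Hypothetical helper for illustration; it is not part of LangChain.
    """
    mod = types.ModuleType(name)

    def __getattr__(attr):
        raise exc_type(f"module {name!r} has no attribute {attr!r}")

    # A module-level __getattr__ is looked up in the module's own namespace.
    mod.__getattr__ = __getattr__
    return mod


fixed_mod = make_module("fixed_mod", AttributeError)
broken_mod = make_module("broken_mod", ImportError)

# hasattr() swallows AttributeError and reports the attribute as missing.
fixed_result = hasattr(fixed_mod, "__version__")

# hasattr() does not swallow ImportError, so the exception reaches the caller.
try:
    hasattr(broken_mod, "__version__")
    broken_raised = False
except ImportError:
    broken_raised = True

print(fixed_result, broken_raised)
```

A check such as `if hasattr(mod, "__version__"):` therefore only stays safe when unknown attributes raise `AttributeError`, which is the behavior the commit message asks for.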

1782 Commits
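
Commit 621419f71e (#12934) states that Qdrant returns cosine scores from `-1.0` (worst match) to `1.0` (best match) and that the best match should normalize to `1.0`. One linear mapping consistent with that description is sketched below. The function name `normalize_cosine_score` is hypothetical, and the exact formula used in the commit is not shown in this log, so treat this as an assumption rather than the merged implementation.

```python
def normalize_cosine_score(score: float) -> float:
    """Map a cosine score in [-1.0, 1.0] onto a relevance value in [0.0, 1.0].

    Assumed linear rescaling: -1.0 -> 0.0 (worst), 1.0 -> 1.0 (best).
    """
    return (1.0 + score) / 2.0


best = normalize_cosine_score(1.0)
worst = normalize_cosine_score(-1.0)
orthogonal = normalize_cosine_score(0.0)
print(best, worst, orthogonal)
```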