langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-31 15:20:26 +00:00

Author	SHA1	Message	Date
Christophe Bornet	e6fa4547b1	community[minor]: Add alazy_load to AsyncHtmlLoader (#21536 ) Also fixes a bug that `_scrape` was called and was doing a second HTTP request synchronously. Twitter handle: cbornet_	2024-05-13 12:01:03 -04:00
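The idea behind an async lazy loader can be sketched in plain Python. Pages are fetched concurrently, and an async generator then yields one document per page, reusing the HTML it already has instead of making a second synchronous request. This is a minimal sketch: `Document`, `fetch`, and `collect` are hypothetical stand-ins (no real HTTP is performed), and it is not the `AsyncHtmlLoader` implementation.

```python
import asyncio
from dataclasses import dataclass, field
from typing import AsyncIterator, Dict, List


@dataclass
class Document:
    # Minimal stand-in for a LangChain Document (hypothetical, for illustration).
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


async def fetch(url: str) -> str:
    # Hypothetical fetcher: simulates network latency instead of doing real HTTP.
    await asyncio.sleep(0.01)
    return f"<html>content of {url}</html>"


async def alazy_load(urls: List[str]) -> AsyncIterator[Document]:
    # Fetch all pages concurrently; gather() preserves the input order.
    pages = await asyncio.gather(*(fetch(u) for u in urls))
    for url, html in zip(urls, pages):
        # Reuse the already-fetched HTML: no second (synchronous) request is made.
        yield Document(page_content=html, metadata={"source": url})


async def collect(urls: List[str]) -> List[Document]:
    # Drain the async generator into a list.
    return [doc async for doc in alazy_load(urls)]


docs = asyncio.run(collect(["https://example.com/a", "https://example.com/b"]))
print([d.metadata["source"] for d in docs])
```

Yielding from an async generator lets callers start processing documents with `async for` rather than waiting for a fully built list.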
Leonid Ganeline	4c48732f94	docs: `providers` updates 1 (#20256 ) - Providers pages: added missed integrations; fixed format - `mistralai` converted from notebook to .mdx format	2024-05-13 11:54:51 -04:00
ccurme	15cb1133e7	docs: fix path for state_of_the_union sample file (#21609 )	2024-05-13 11:46:02 -04:00
Bagatur	83a8fdcfd1	infra: fix local doc make command (#21608 )	2024-05-13 08:30:30 -07:00
Eugene Yurtsev	4dc625057e	README: Update downloads to show downloads of langchain-core (#21387 ) Update downloads to keep track of langchain-core	2024-05-13 11:26:50 -04:00
Wang Guan	b53548dcda	langchain[minor]: allow CacheBackedEmbeddings to cache queries (#20073 ) Add optional caching of queries to cache backed embeddings	2024-05-13 15:18:04 +00:00
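Query caching can be pictured as memoizing `embed_query` by a hash of the query text, so a repeated query never reaches the model twice. This is a minimal pure-Python sketch of that idea. `QueryCachingEmbeddings` and `fake_embed` are hypothetical names, not the `CacheBackedEmbeddings` API, which persists vectors in a key-value byte store rather than an in-memory dict.

```python
import hashlib
from typing import Callable, Dict, List


class QueryCachingEmbeddings:
    """Hypothetical wrapper that memoizes query embeddings by text hash."""

    def __init__(self, embed_query_fn: Callable[[str], List[float]]) -> None:
        self._embed_query_fn = embed_query_fn
        self._cache: Dict[str, List[float]] = {}
        self.calls = 0  # how many times the underlying model was actually called

    def embed_query(self, text: str) -> List[float]:
        # Key the cache by a stable hash of the query text.
        key = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._embed_query_fn(text)
        return self._cache[key]


def fake_embed(text: str) -> List[float]:
    # Stand-in model: a 2-d "embedding" derived from the text.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


emb = QueryCachingEmbeddings(fake_embed)
first = emb.embed_query("what is langchain?")
second = emb.embed_query("what is langchain?")  # served from the cache
print(emb.calls)  # 1
```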
Guangdong Liu	a156aace2b	core[patch]: Fix incorrect listeners parameters for Runnable.with_listeners() and .map() (#20661 ) - Issue: fix #20509 - @baskaryan, @eyurtsev	2024-05-13 11:16:17 -04:00
ccurme	b0f5a47f25	docs: update some retrievers how-to guides (#21607 )	2024-05-13 11:03:33 -04:00
junkeon	480c02bf55	upstage[minor]: add merge_and_split function for document loader (#21603 ) - Introduce the `merge_and_split` function in the `UpstageLayoutAnalysisLoader`. - The `merge_and_split` function takes a list of documents and a splitter as inputs. - This function merges all documents and then divides them using the splitter's own `split_documents` method. - If the provided splitter is `None` (the default), the function simply merges the documents without splitting them.	2024-05-13 10:55:19 -04:00
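The merge-then-split behaviour described above can be sketched in a few lines. This is a minimal illustration, not the Upstage loader code: `Document` and `FixedSizeSplitter` are hypothetical stand-ins, and joining page contents with a single space is an assumption (the real separator may differ).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol


@dataclass
class Document:
    # Minimal stand-in for a LangChain Document (hypothetical).
    page_content: str
    metadata: Dict[str, object] = field(default_factory=dict)


class Splitter(Protocol):
    def split_documents(self, documents: List[Document]) -> List[Document]: ...


class FixedSizeSplitter:
    """Hypothetical splitter: cuts text into chunks of at most `size` characters."""

    def __init__(self, size: int) -> None:
        self.size = size

    def split_documents(self, documents: List[Document]) -> List[Document]:
        out: List[Document] = []
        for doc in documents:
            text = doc.page_content
            for start in range(0, len(text), self.size):
                out.append(Document(text[start : start + self.size], dict(doc.metadata)))
        return out


def merge_and_split(
    documents: List[Document], splitter: Optional[Splitter] = None
) -> List[Document]:
    # Merge: join all page contents into one document (separator is an assumption).
    merged = Document(page_content=" ".join(d.page_content for d in documents))
    if splitter is None:
        # Default: return the merged document without splitting it.
        return [merged]
    # Otherwise delegate to the splitter's own split_documents method.
    return splitter.split_documents([merged])


pages = [Document("abcd"), Document("efgh")]
print([d.page_content for d in merge_and_split(pages)])
print([d.page_content for d in merge_and_split(pages, FixedSizeSplitter(4))])
```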
Leonid Ganeline	500569da48	community[patch]: `vectorstores` import update (#21169 ) Issue: we have several helper functions to import third-party libraries like lancedb.import_lancedb in [community.vectorstores](https://api.python.langchain.com/en/latest/vectorstores/langchain_community.vectorstores.lancedb.import_lancedb.html#langchain_community.vectorstores.lancedb.import_lancedb). And we have core.utils.utils.guard_import that works exactly for this purpose. The import_<package> functions work inconsistently and would be better as private functions. Change: replaced these functions with the guard_import function. Related to #21133	2024-05-13 10:45:31 -04:00
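The guard-import pattern this commit standardizes on can be sketched as: try the import, and on failure raise an `ImportError` that names the package to install. `guard_import_sketch` is a hypothetical stand-in written for illustration; the real helper is `guard_import` in `core.utils.utils`, whose exact signature and error message may differ.

```python
import importlib
from types import ModuleType
from typing import Optional


def guard_import_sketch(module_name: str, *, pip_name: Optional[str] = None) -> ModuleType:
    """Hypothetical re-implementation of the guard-import pattern.

    Imports `module_name`, or raises an ImportError telling the user
    which package to install.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        # Fall back to guessing the pip name from the top-level module name.
        package = pip_name or module_name.split(".")[0].replace("_", "-")
        raise ImportError(
            f"Could not import {module_name} python package. "
            f"Please install it with `pip install {package}`."
        ) from e


json_mod = guard_import_sketch("json")  # stdlib module, always importable

try:
    guard_import_sketch("lancedb_not_installed_xyz", pip_name="lancedb")
except ImportError as err:
    message = str(err)
print(message)
```

Centralizing this in one helper gives every integration the same install hint, which is the inconsistency the per-package `import_<package>` functions had.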
ccurme	3003363605	langchain, community: remove cap on sqlalchemy and bump duckdb (#21509 )	2024-05-13 10:16:09 -04:00
ccurme	01a3228d8e	standard tests: add test for few-shot examples (#21019 )	2024-05-13 10:06:12 -04:00
David Duong	db22fcb58b	docs: style fixes for api reference docs (#21602 ) - Make sure the left nav bar is horizontally scrollable - Make sure the navigation dropdown is vertically scrollable and height capped at 80% of viewport height --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-05-13 06:49:50 -07:00
Chuyuan Qu	af875cff57	prompty: adding Microsoft langchain_prompty package (#21346 ) Co-authored-by: Micky Liu <wayliu@microsoft.com> Co-authored-by: wayliums <wayliums@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-05-11 04:03:44 +00:00
Erick Friis	56c6b5868b	infra: run codespell on v0.1 prs (#21545 )	2024-05-10 12:51:42 -07:00
Matt Florence	d3ca2cc8c3	langchain: Fix broken `OpenAIModerationChain` and implement async (#18537 ) Description: fix `OpenAIModerationChain` and implement async Issues: - https://github.com/langchain-ai/langchain/issues/18533 - https://github.com/langchain-ai/langchain/issues/13685 Dependencies: none Twitter handle: mattflo Existing documentation is broken: https://python.langchain.com/docs/guides/safety/moderation --------- Co-authored-by: Emilia Katari <emilia@outpace.com> Co-authored-by: ccurme <chester.curme@gmail.com> Co-authored-by: Erick Friis <erickfriis@gmail.com>	2024-05-10 19:04:13 +00:00
ccurme	4170e72a42	openai: fix loads unit test (#21542 ) following changes to tests in core here: https://github.com/langchain-ai/langchain/pull/21342/files	2024-05-10 18:46:34 +00:00
ccurme	d3ff9c5d6a	infra: turn off fail-fast for standard tests (#21541 )	2024-05-10 18:28:57 +00:00
Erick Friis	e8efe8384d	docs: announcement bar dark mode 0.2 (#21540 )	2024-05-10 10:13:02 -07:00
Erick Friis	64c47224a0	docs: baseUrl for ganalytics, throw on broken links (#21455 )	2024-05-10 13:49:59 +00:00
Usama Jamil	913792f5e6	docs: myscale code typo (#21522 )	2024-05-10 13:33:22 +00:00
Sevin F. Varoglu	85cbc55f86	docs: update OctoAI LLM doc (#21528 ) This PR updates OctoAI doc to remove warnings when running the example code.	2024-05-10 09:31:16 -04:00
Daniel Glogowski	70a79f45d7	docs: update nvidia nbs (#21498 )	2024-05-10 04:38:35 -04:00
Eugene Yurtsev	39e9b644b9	docs: Add langchain over time (#21434 ) Co-authored-by: Erick Friis <erick@langchain.dev>	2024-05-10 00:34:35 +00:00
Erick Friis	3db85cbb5b	community: deps (#21508 )	2024-05-09 15:12:34 -07:00
ccurme	9c2828aaa8	docs: add local LLMs page to v0.2 docs (#21493 ) Adding this page from v0.1 docs: https://python.langchain.com/v0.1/docs/guides/development/local_llms/	2024-05-09 17:57:56 -04:00
Erick Friis	8580e350be	cli: release 0.0.22 (#21507 )	2024-05-09 21:45:20 +00:00
Anthony Chu	c735849e76	azure-dynamic-sessions: add Python REPL tool (#21264 ) Adds a Python REPL that executes code in a code interpreter session using Azure Container Apps dynamic sessions. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-05-09 21:39:04 +00:00
Erick Friis	02701c277f	langchain: core min version (#21506 )	2024-05-09 13:45:44 -07:00
ccurme	81ae184cc9	docs: add response metadata page to v0.2 docs (#21489 ) Adding this page from v0.1 docs: https://python.langchain.com/v0.1/docs/modules/model_io/chat/response_metadata/	2024-05-09 16:17:04 -04:00
Erick Friis	13b01104c9	langchain: drop sqlalchemy max, release 0.2.0rc2 (#21504 )	2024-05-09 13:12:38 -07:00
ccurme	375f447e58	community: fix builds with min dependencies (#21495 )	2024-05-09 13:01:44 -07:00
Erick Friis	2be4b1b2c9	Revert "docs: redirect base slug" (#21499 ) Reverts langchain-ai/langchain#21457	2024-05-09 12:20:16 -07:00
Erick Friis	d1fc841b1a	docs: redirect base slug (#21457 )	2024-05-09 10:52:36 -07:00
Trayan Azarov	ba7d53689c	community: Chroma Adding create_collection_if_not_exists flag to Chroma constructor (#21420 ) - Description: Adds the ability to either `get_or_create` or simply `get_collection`. This is useful when dealing with read-only Chroma instances where users are constrained to using `get_collection`. Targeted at Http/CloudClients mostly. - Issue: chroma-core/chroma#2163 - Dependencies: N/A - Twitter handle: `@t_azarov`
| Collection Exists | create_collection_if_not_exists | Outcome | test |
|---|---|---|---|
| True | False | No errors, collection state unchanged | `test_create_collection_if_not_exist_false_existing` |
| True | True | No errors, collection state unchanged | `test_create_collection_if_not_exist_true_existing` |
| False | False | Error, `get_collection()` fails | `test_create_collection_if_not_exist_false_non_existing` |
| False | True | No errors, `get_or_create_collection()` creates the collection | `test_create_collection_if_not_exist_true_non_existing` |
	2024-05-09 11:45:10 -04:00
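The four outcomes in the table reduce to a single branch on the flag. This is a minimal sketch, assuming a hypothetical in-memory `FakeChromaClient` that only mimics the method names `get_collection` and `get_or_create_collection`. It is neither the Chroma client nor the LangChain constructor, and raising `ValueError` for a missing collection is a choice made for the sketch.

```python
from typing import Dict, List, Tuple


class FakeChromaClient:
    """Hypothetical in-memory stand-in for a Chroma client (illustration only)."""

    def __init__(self, existing: List[str]) -> None:
        self.collections: Dict[str, dict] = {name: {} for name in existing}

    def get_collection(self, name: str) -> dict:
        if name not in self.collections:
            raise ValueError(f"Collection {name} does not exist.")
        return self.collections[name]

    def get_or_create_collection(self, name: str) -> dict:
        # Existing collections are returned unchanged; missing ones are created.
        return self.collections.setdefault(name, {})


def open_collection(client: FakeChromaClient, name: str, create_collection_if_not_exists: bool) -> dict:
    # Flag True: get-or-create works whether or not the collection exists.
    # Flag False: plain get, suitable for read-only instances; fails if missing.
    if create_collection_if_not_exists:
        return client.get_or_create_collection(name)
    return client.get_collection(name)


# Reproduce the four rows of the table above.
results: Dict[Tuple[bool, bool], str] = {}
for exists in (True, False):
    for flag in (False, True):
        client = FakeChromaClient(["docs"] if exists else [])
        try:
            open_collection(client, "docs", create_collection_if_not_exists=flag)
            results[(exists, flag)] = "ok"
        except ValueError:
            results[(exists, flag)] = "error"
print(results)
```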
ccurme	3bb9bec314	bedrock: add unit test for retriever (#21485 ) This was implemented in https://github.com/langchain-ai/langchain/pull/21349 but dropped before merge.	2024-05-09 11:37:03 -04:00
Renu Rozera	4035a1d234	Add source metadata to bedrock retriever response (#21349 ) - Description: The Bedrock retrieve API returns extra metadata in the response, which is currently not returned in the retriever response - Issue: The change adds the metadata from the Bedrock retrieve API response to the Bedrock retriever in a backward compatible way. Renamed metadata to sourceMetadata, as the metadata term is already used in the Document. This is in sync with what we are doing in llama-index as well. - Dependencies: No - Tests and docs: 1. Added unit tests 2. Notebook already exists and does not need any change 3. Response from end to end testing, just to ensure backward compatibility: `[Document(page_content='Exoplanets.', metadata={'location': {'s3Location': {'uri': 's3://bucket/file_name.txt'}, 'type': 'S3'}, 'score': 0.46886647, 'source_metadata': {'x-amz-bedrock-kb-source-uri': 's3://bucket/file_name.txt', 'tag': 'space', 'team': 'Nasa', 'year': 1946.0}})]` --------- Co-authored-by: Piyush Jain <piyushjain@duck.com>	2024-05-09 11:06:22 -04:00
ccurme	9fa17bfabe	docs: fix links in v0.2.0 (#21483 )	2024-05-09 11:05:17 -04:00
Erick Friis	f178c67ad0	community: release 0.2.0rc1, bump deps (#21470 )	2024-05-08 23:32:44 -07:00
William FH	b28be5d407	Pass through Run ID Explicitly (#21469 )	2024-05-08 22:20:51 -07:00
Erick Friis	83eecd54fe	experimental: 0.2 relax (#21468 )	2024-05-08 21:39:42 -07:00
roiperlman	9992beaff9	community: Add arguments to whisper parser (#20378 ) Description: Added a few additional arguments to the whisper parser, which can be consumed by the underlying API. The prompt is especially important to fine-tune transcriptions. --------- Co-authored-by: Roi Perlman <roi@fivesigmalabs.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-05-08 17:53:13 -07:00
Erick Friis	5542eacad8	docs: sidebar autogen hidden support (#21454 )	2024-05-09 00:23:52 +00:00
Yash	cb31c3611f	Ndb enterprise (#21233 ) Description: Adds NeuralDBClientVectorStore, our enterprise client, to LangChain. --------- Co-authored-by: kartikTAI <129414343+kartikTAI@users.noreply.github.com> Co-authored-by: Kartik Sarangmath <kartik@thirdai.com>	2024-05-08 16:30:58 -07:00
Erick Friis	74044e44a5	docs: useBaseUrl on svg paths (#21446 )	2024-05-08 21:55:42 +00:00
Oguz Vuruskaner	5b35f077f9	[community][fix](DeepInfraEmbeddings): Implement chunking for large batches (#21189 ) Description: This PR introduces chunking logic to the `DeepInfraEmbeddings` class to handle large batch sizes without exceeding the maximum batch size of the backend. Large batches are broken down into smaller chunks, each within the maximum batch size limit. Issue: Fixes #21189 Dependencies: No new dependencies introduced.	2024-05-08 14:45:42 -07:00
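The chunking idea can be sketched as slicing the input into runs of at most `max_batch_size` texts and concatenating the per-batch results in order. This is a minimal sketch, not the `DeepInfraEmbeddings` code: `embed_in_chunks` and `fake_backend` are hypothetical names, and the backend is simulated.

```python
from typing import Callable, List

Vector = List[float]


def embed_in_chunks(
    texts: List[str],
    embed_batch: Callable[[List[str]], List[Vector]],
    max_batch_size: int,
) -> List[Vector]:
    """Embed `texts` by calling `embed_batch` on slices of at most `max_batch_size`."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be >= 1")
    embeddings: List[Vector] = []
    for start in range(0, len(texts), max_batch_size):
        batch = texts[start : start + max_batch_size]
        embeddings.extend(embed_batch(batch))  # results stay in input order
    return embeddings


batch_sizes: List[int] = []


def fake_backend(batch: List[str]) -> List[Vector]:
    # Stand-in backend: records each request size and returns 1-d vectors.
    batch_sizes.append(len(batch))
    return [[float(len(t))] for t in batch]


vectors = embed_in_chunks([f"text {i}" for i in range(10)], fake_backend, max_batch_size=4)
print(batch_sizes)  # [4, 4, 2]
```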
Sokolov Fedor	f4ddf64faa	community: Add MarkdownifyTransformer to langchain_community.document_transformers (#21247 ) - Added a new document_transformer, MarkdownifyTransformer, which uses the `markdownify` package with customizable options to convert HTML to Markdown. It's similar to Html2TextTransformer but has more flexible options; I've also noticed that MarkdownifyTransformer sometimes performs better than the html2text one, which is why I use markdownify in my project. - Added docs and tests - Usage:
```python
from langchain_community.document_transformers import MarkdownifyTransformer

markdownify = MarkdownifyTransformer()
docs_transform = markdownify.transform_documents(docs)
```
- Example of better performance on a simple task that I've noticed:
```
<html>
<head><title>Reports on product movement</title></head>
<body>
<p data-block-key="2wst7">The reports on product movement will be useful for forming supplier orders and controlling outcomes.</p>
</body>
```
Html2TextTransformer:
```python
[Document(page_content='The reports on product movement will be useful for forming supplier orders and\ncontrolling outcomes.\n\n')]
# Here we can see 'and\ncontrolling', which has an extra '\n' in it
```
MarkdownifyTransformer:
```python
[Document(page_content='Reports on product movement\n\nThe reports on product movement will be useful for forming supplier orders and controlling outcomes.')]
```
--------- Co-authored-by: Sokolov Fedor <f.sokolov@sokolov-macbook.bbrouter> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Sokolov Fedor <f.sokolov@sokolov-macbook.local> Co-authored-by: Sokolov Fedor <f.sokolov@192.168.1.6>	2024-05-08 14:45:13 -07:00
Alex JW	d3ce6aad2e	community: Instantiate GPT4AllEmbeddings with parameters (#21238 ) Description: Until now, the Embed4All class inside GPT4AllEmbeddings was instantiated with its defaults, which left no room to customize the chosen model and its behavior. Thus: - GPT4AllEmbeddings can now be instantiated with custom parameters, such as a different model to use. --------- Co-authored-by: AlexJauchWalser <alexander.jauch-walser@knime.com>	2024-05-08 14:44:47 -07:00
Philippe PRADOS	7be68228da	community[patch]: Make sql record manager fully compatible with async (#20735 ) The `_amake_session()` method does not allow modifying the `self.session_factory` with anything other than `async_sessionmaker`. This prohibits advanced uses of `index()`. In a RAG architecture, it is necessary to import document chunks. To keep track of the links between chunks and documents, we can use the `index()` API. This API proposes to use an SQL-type record manager. In a classic use case, using `SQLRecordManager` and a vector database, it is impossible to guarantee the consistency of the import. Indeed, if a crash occurs during the import (problem with the network, ...) there is an inconsistency between the SQL database and the vector database. With the [PR](https://github.com/langchain-ai/langchain-postgres/pull/32) we are proposing for `langchain-postgres`, it is now possible to guarantee the consistency of the import of chunks into a vector database. It's possible only if the outer session is built with the connection. 
```python
def main():
    db_url = "postgresql+psycopg://postgres:password_postgres@localhost:5432/"
    engine = create_engine(db_url, echo=True)
    embeddings = FakeEmbeddings()
    pgvector: VectorStore = PGVector(
        embeddings=embeddings,
        connection=engine,
    )

    record_manager = SQLRecordManager(
        namespace="namespace",
        engine=engine,
    )
    record_manager.create_schema()

    with engine.connect() as connection:
        session_maker = scoped_session(sessionmaker(bind=connection))
        # NOTE: Update session_factories
        record_manager.session_factory = session_maker
        pgvector.session_maker = session_maker
        with connection.begin():
            loader = CSVLoader(
                "data/faq/faq.csv",
                source_column="source",
                autodetect_encoding=True,
            )
            result = index(
                source_id_key="source",
                docs_source=loader.load()[:1],
                cleanup="incremental",
                vector_store=pgvector,
                record_manager=record_manager,
            )
            print(result)
```
The same thing is possible asynchronously, but a bug in `sql_record_manager.py` in `_amake_session()` must first be fixed.
```python
async def _amake_session(self) -> AsyncGenerator[AsyncSession, None]:
    """Create a session and close it after use."""
    # FIXME: REMOVE
    # if not isinstance(self.session_factory, async_sessionmaker):
    if not isinstance(self.engine, AsyncEngine):
        raise AssertionError("This method is not supported for sync engines.")

    async with self.session_factory() as session:
        yield session
```
Then, it is possible to do the same thing asynchronously:
```python
async def main():
    db_url = "postgresql+psycopg://postgres:password_postgres@localhost:5432/"
    engine = create_async_engine(db_url, echo=True)
    embeddings = FakeEmbeddings()
    pgvector: VectorStore = PGVector(
        embeddings=embeddings,
        connection=engine,
    )
    record_manager = SQLRecordManager(
        namespace="namespace",
        engine=engine,
        async_mode=True,
    )
    await record_manager.acreate_schema()

    async with engine.connect() as connection:
        session_maker = async_scoped_session(
            async_sessionmaker(bind=connection),
            scopefunc=current_task)
        record_manager.session_factory = session_maker
        pgvector.session_maker = session_maker
        async with connection.begin():
            loader = CSVLoader(
                "data/faq/faq.csv",
                source_column="source",
                autodetect_encoding=True,
            )
            result = await aindex(
                source_id_key="source",
                docs_source=loader.load()[:1],
                cleanup="incremental",
                vector_store=pgvector,
                record_manager=record_manager,
            )
            print(result)


asyncio.run(main())
```
--------- Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Sean <sean@upstage.ai> Co-authored-by: JuHyung-Son <sonju0427@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: YISH <mokeyish@hotmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Jason_Chen <820542443@qq.com> Co-authored-by: Joan Fontanals <joan.fontanals.martinez@jina.ai> Co-authored-by: Pavlo Paliychuk <pavlo.paliychuk.ca@gmail.com> Co-authored-by: fzowl <160063452+fzowl@users.noreply.github.com> Co-authored-by: samanhappy <samanhappy@gmail.com> Co-authored-by: Lei Zhang <zhanglei@apache.org> Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com> Co-authored-by: merdan <48309329+merdan-9@users.noreply.github.com> Co-authored-by: ccurme <chester.curme@gmail.com> Co-authored-by: Andres Algaba <andresalgaba@gmail.com> Co-authored-by: davidefantiniIntel <115252273+davidefantiniIntel@users.noreply.github.com> Co-authored-by: Jingpan Xiong <71321890+klaus-xiong@users.noreply.github.com> Co-authored-by: kaka <kaka@zbyte-inc.cloud> Co-authored-by: jingsi <jingsi@leadincloud.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Rahul Triptahi <rahul.psit.ec@gmail.com> Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com> Co-authored-by: Shengsheng Huang <shannie.huang@gmail.com> Co-authored-by: Michael Schock <mjschock@users.noreply.github.com> Co-authored-by: Anish Chakraborty <anish749@users.noreply.github.com> Co-authored-by: am-kinetica <85610855+am-kinetica@users.noreply.github.com>
Co-authored-by: Dristy Srivastava <58721149+dristysrivastava@users.noreply.github.com> Co-authored-by: Matt <matthew.gotteiner@microsoft.com> Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>	2024-05-08 17:31:11 -04:00
Andreas Motl	17e42bbd18	community[patch]: pgvector: Slight refactoring to make code a bit more reusable (#16243 ) - Description: Improve [pgvector vector store adapter](https://github.com/langchain-ai/langchain/blob/v0.1.1/libs/community/langchain_community/vectorstores/pgvector.py) to make it reusable by adapters deriving from that. - Issue: NA - Dependencies: NA - References: https://github.com/crate-workbench/langchain/pull/1 - Addressed to: @eyurtsev, @cbornet Hi from the CrateDB team, first of all, thanks a stack for conceiving and maintaining LangChain. We are currently [preparing a patch](https://github.com/crate-workbench/langchain/pull/1) for adding [CrateDB](https://github.com/crate/crate) to the list of community adapters. Because CrateDB aims to be compatible with PostgreSQL to some degree, the vector store subsystem in LangChain derives functionality from the corresponding implementation for pgvector. Therefore, to make the implementation more reusable, we needed to rename the private methods `__from` and `__query_collection` to the less private counterparts `_from` and `_query_collection`, so they can be overridden by other adapters deriving from [pgvector](https://github.com/langchain-ai/langchain/blob/v0.1.1/libs/community/langchain_community/vectorstores/pgvector.py). With kind regards, Andreas.	2024-05-08 17:21:30 -04:00
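Why the rename from `__from` to `_from` matters: Python mangles method names that start with two underscores by prefixing the class name, so a subclass defining its own `__query_collection` does not override the base class's version. The class and method names below are hypothetical; the behaviour is standard Python name mangling.

```python
class BaseStore:
    def __query(self) -> str:  # double underscore: mangled to _BaseStore__query
        return "base"

    def _query(self) -> str:  # single underscore: ordinary, conventionally private
        return "base"

    def run_private(self) -> str:
        return self.__query()  # compiled as self._BaseStore__query()

    def run_protected(self) -> str:
        return self._query()  # normal attribute lookup, so subclasses can override


class DerivedStore(BaseStore):
    def __query(self) -> str:  # mangled to _DerivedStore__query: does NOT override
        return "derived"

    def _query(self) -> str:  # overrides BaseStore._query
        return "derived"


store = DerivedStore()
print(store.run_private())    # base
print(store.run_protected())  # derived
```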


9566 Commits