The existing default list of separators for the `RecursiveCharacterTextSplitter` assumes spaces are word boundaries. Some languages [don't use spaces between words](https://en.wikipedia.org/wiki/Category:Writing_systems_without_word_boundaries) (Chinese, Japanese, Thai, Burmese).
This PR extends the documentation to explain how to cater for those languages by adding extra punctuation marks and zero-width spaces (which some typesetters use) to the separators, helping the splitter avoid splitting inside words.
Ideally, **these separators could be a constant in the module**, but for now, defining them in the documentation is a start.
**Description:**
- Minor PR to speed up onboarding by not trying to add a dataset if a model is already present.
- replace batch publish API with streaming when single events are
published.
**Twitter handle:** behalder
Co-authored-by: Barun Halder <barun@fiddler.ai>
This PR aims to enhance the documentation for TiDB integration, driven
by feedback from our users. It provides detailed introductions to key
features, ensuring developers can fully leverage TiDB for AI application
development.
**Description:**
Expand `version` in all the Confluence API calls so that the date the page was last modified/created is retrieved in all cases.
**Issue:** #12812
**Twitter handle:** zzste
This PR adds code to make sure that the correct base URL is created for the Azure Cognitive Search retriever; at the moment an incorrect base URL is generated, likely because the original code was based on a deprecated API version. No dependencies need to be added. I've also added more context to the test docstrings.
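For context, a minimal sketch of what the expected base URL looks like (the service name, index name, and API version below are placeholders, not the retriever's actual defaults):
```python
# Hypothetical illustration of the URL the retriever should build.
service_name = "my-search-service"  # placeholder
index_name = "my-index"             # placeholder
api_version = "2023-11-01"          # assumed; use whatever your service supports

base_url = f"https://{service_name}.search.windows.net/indexes/{index_name}/docs/search"
url = f"{base_url}?api-version={api_version}"
```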
I should also note that ACS is now Azure AI Search. I will open a
separate PR to make these changes as that would be a breaking change and
should potentially be discussed.
Twitter: @marlene_zw
- No new tests added, however the current ACS retriever tests are now
passing when I run them.
- Code was linted.
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- **Description:** This commit introduces support for the newly
available GPU index types introduced in Milvus 2.4 within the LangChain
project's `milvus.py`. With the release of Milvus 2.4, a range of
GPU-accelerated index types have been added, offering enhanced search
capabilities and performance optimizations for vector search operations.
This update ensures LangChain users can fully utilize the new
performance benefits for vector search operations.
- Reference: https://milvus.io/docs/gpu_index.md
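A minimal sketch of what this enables (assumes a running Milvus 2.4 instance; the index parameters are illustrative, and a fake embedding stands in for a real model):
```python
from langchain_community.embeddings import FakeEmbeddings
from langchain_community.vectorstores import Milvus

# Pass one of the Milvus 2.4 GPU-accelerated index types via index_params.
vector_store = Milvus.from_texts(
    ["doc one", "doc two"],
    embedding=FakeEmbeddings(size=128),  # stand-in for a real embedding model
    collection_name="gpu_demo",
    index_params={
        "index_type": "GPU_CAGRA",  # new GPU index type in Milvus 2.4
        "metric_type": "L2",
        "params": {"intermediate_graph_degree": 64, "graph_degree": 32},
    },
)
```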
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Corrected a broken link within the semantic-chunker.ipynb notebook,
ensuring that users can access the referenced resource.
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
This patch fixes issue #18022, converting the SimSIMD internal zero-copy outputs to NumPy arrays.
I've also noticed that a `dtype=np.float32` conversion is often applied before passing data to SimSIMD. Which numeric types do LangChain users generally care about? We support `float64`, `float32`, `float16`, and `int8` for cosine distances, and `float16` seems reasonable for practically any kind of embeddings and any modern piece of hardware, so we can change that part as well 🤗
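For illustration, a minimal sketch of the conversion (assuming the `simsimd` package's `cdist`):
```python
import numpy as np
import simsimd

a = np.random.rand(4, 128).astype(np.float32)
b = np.random.rand(4, 128).astype(np.float32)

# cdist returns SimSIMD's zero-copy, buffer-protocol output...
distances = simsimd.cdist(a, b, metric="cosine")
# ...which we materialize as a regular NumPy array.
distances_np = np.asarray(distances)
```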
- **Description:** Added support for lower-case and mixed-case names.
The names for tables and columns previously had to be UPPER_CASE.
With this enhancement, lower_case and MixedCase are also supported.
- **Issue:** N/A
- **Dependencies:** no new dependencies added
- **Twitter handle:** @sapopensource
- **Description:** Since the implicit `__call__` has been deprecated in
favor of `invoke`, the local_llms article also needed to be updated.
This article was my introduction to LangChain, and as it was helpful in
getting me set up with running LLMs locally, it is nice not to have any
warnings when running the example code. With this change, those warnings
go away (see the sketch after this list).
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** clarkerican
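For illustration, the shape of the change applied throughout the article (a fake LLM stands in for a locally running model):
```python
from langchain_community.llms import FakeListLLM

llm = FakeListLLM(responses=["Hello!"])  # stand-in for a local LLM

# Deprecated implicit call, which triggers a warning:
# response = llm("Say hello")

# Preferred, warning-free form:
response = llm.invoke("Say hello")
```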
Previous PR passed the `_parser` attribute, which apparently is not meant to be used by user code and causes non-deterministic failures on CI when testing the `transform` and `atransform` methods. Reverting this change temporarily.
Description: adds support for langchain_cohere
---------
Co-authored-by: Harry M <127103098+harry-cohere@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
This mitigates a security concern for users still on older versions of libexpat, where an attacker who manages to feed a malicious payload to this XMLParser could compromise the availability of the system.
**Description:** This change passes through `batch_size` to
`add_documents()`/`aadd_documents()` on calls to `index()` and
`aindex()` such that the documents are processed in the expected batch
size.
**Issue:** #19415
**Dependencies:** N/A
**Twitter handle:** N/A
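For context, a minimal sketch of the affected call (Chroma and a fake embedding stand in for a real setup):
```python
from langchain.indexes import SQLRecordManager, index
from langchain_community.embeddings import FakeEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document

vector_store = Chroma(
    collection_name="demo", embedding_function=FakeEmbeddings(size=32)
)
record_manager = SQLRecordManager("chroma/demo", db_url="sqlite:///records.sql")
record_manager.create_schema()

docs = [Document(page_content=f"doc {i}", metadata={"source": "demo"}) for i in range(10)]

result = index(
    docs,
    record_manager,
    vector_store,
    batch_size=5,  # now forwarded to add_documents() / aadd_documents()
    cleanup="incremental",
    source_id_key="source",
)
```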
Updated `HuggingFacePipeline` docs to be in sync with the list of supported
tasks, including translation.
- [x] **PR title**: "community: Update docs for `HuggingFacePipeline`"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**:
- **Description:** Update docs for `HuggingFacePipeline`, which were earlier
missing `translation` as a valid task (see the sketch after this checklist)
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** None
- [x] **Add tests and docs**:
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
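A minimal sketch of the newly documented task (the model id is just an example):
```python
from langchain_community.llms import HuggingFacePipeline

# translation is now listed among the supported tasks
llm = HuggingFacePipeline.from_model_id(
    model_id="Helsinki-NLP/opus-mt-en-fr",  # example translation model
    task="translation",
)
print(llm.invoke("How are you?"))
```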
**Description:**
This PR adds [Dappier](https://dappier.com/) for the chat model. It
supports generate, async generate, and batch functionalities. We added
unit and integration tests as well as a notebook with more details about
our chat model.
**Dependencies:**
No extra dependencies are needed.
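A minimal usage sketch, assuming the integration exposes a `ChatDappierAI` class (the import path and constructor arguments here are assumptions, not confirmed API):
```python
from langchain_community.chat_models.dappier import ChatDappierAI  # assumed path
from langchain_core.messages import HumanMessage

chat = ChatDappierAI(dappier_api_key="...")  # assumed constructor argument
print(chat.invoke([HumanMessage(content="Hello!")]))

# Batch and async generation follow the standard chat-model interface:
# chat.batch([[HumanMessage(content="Hi")], [HumanMessage(content="Hey")]])
# await chat.ainvoke([HumanMessage(content="Hello!")])
```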
- **Description:** [CVE-2024-21503](https://www.cve.org/CVERecord?id=CVE-2024-21503) was
recently identified. The Python code formatter "black" suffers from a potential
regex-related denial-of-service attack. Updated the version from the
vulnerable 24.2.0 to the patched 24.3.0.
- **Issue:** N/A
- **Dependencies:** The 'black' package in both `langchain` (top-level)
and `templates/python-lint`.
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
DuckDB has a cosine similarity function over list and array data types, which allows it to be used as a vector store.
- **Description:** The latest version of DuckDB features a cosine
similarity function, which can be used with its support for list and
array column types. This PR surfaces this functionality to LangChain (a sketch follows the list below).
- **Dependencies:** duckdb 0.10.0
- **Twitter handle:** @igocrite
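A minimal sketch of the new vector store (a fake embedding stands in for a real model):
```python
import duckdb
from langchain_community.embeddings import FakeEmbeddings
from langchain_community.vectorstores import DuckDB

conn = duckdb.connect(":memory:")
store = DuckDB.from_texts(
    ["hello world", "goodbye world"],
    embedding=FakeEmbeddings(size=32),  # stand-in for a real embedding model
    connection=conn,
)
print(store.similarity_search("hello", k=1))
```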
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description:** Update s3_file.py to use the arguments **mode** and
**post_processors** from the base class **UnstructuredBaseLoader**, to
include more metadata about the files from the S3 bucket, such as
*'page_number'*, *'languages'*, etc. (a sketch follows below).
**Issue:** NA
**Dependencies:** None
**Twitter handle:** preak95
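A minimal sketch of the loader with the newly forwarded arguments (bucket and key are placeholders):
```python
from langchain_community.document_loaders import S3FileLoader
from unstructured.cleaners.core import clean_extra_whitespace

loader = S3FileLoader(
    "my-bucket",        # placeholder bucket name
    "docs/report.pdf",  # placeholder key
    mode="elements",    # surfaces metadata such as page_number, languages
    post_processors=[clean_extra_whitespace],
)
docs = loader.load()
```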
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Looking at tokens / page of our docs, we see a few outliers:
<img width="761" alt="image"
src="https://github.com/langchain-ai/langchain/assets/122662504/677aa2d6-0a29-45e4-882a-db2bbf46d02b">
This is due to non-rendering images in one case, and to output spamming.
Clean these up, along with other cases of excessive output spamming in the docs.
All of it gets sucked into chat-langchain for retrieval.
bilibili-api-python uses the https://github.com/Nemo2011/bilibili-api repo.
Change the link to the correct address.
**Description:** Update module imports for Fireworks documentation
**Issue:** Module imports not present or in incorrect location
**Dependencies:** None
**Description:** Update import paths and move to LCEL for the llama.cpp
examples
**Issue:** Update import paths to reflect package refactoring and move
chains to LCEL in the examples
**Dependencies:** None
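For context, the general shape of the updated examples (the model path is a placeholder):
```python
from langchain_community.llms import LlamaCpp
from langchain_core.prompts import PromptTemplate

llm = LlamaCpp(model_path="/path/to/model.gguf")  # placeholder model path
prompt = PromptTemplate.from_template("Q: {question}\nA:")
chain = prompt | llm  # LCEL composition replaces the legacy LLMChain
print(chain.invoke({"question": "What is LangChain?"}))
```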
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description:** Invoke callback prior to yielding token for BaseOpenAI
& OpenAIChat
**Issue:** [Callback for on_llm_new_token should be invoked before the
token is yielded by the model
#16913](https://github.com/langchain-ai/langchain/issues/16913)
**Dependencies:** None
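In essence, the ordering inside the streaming generator changes as in this simplified sketch (illustrative names, not the literal implementation):
```python
from typing import Any, Iterator, Optional

def _stream(chunks: Iterator[Any], run_manager: Optional[Any] = None) -> Iterator[Any]:
    """Simplified sketch of the new ordering in the streaming generator."""
    for chunk in chunks:
        if run_manager:
            # Fire the callback *before* the token leaves the generator...
            run_manager.on_llm_new_token(chunk.text, chunk=chunk)
        # ...so on_llm_new_token always observes the token first.
        yield chunk
```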
**Description:** Invoke callback prior to yielding token for Fireworks
**Issue:** [Callback for on_llm_new_token should be invoked before the
token is yielded by the model
#16913](https://github.com/langchain-ai/langchain/issues/16913)
**Dependencies:** None
**Description:** Moving FireworksEmbeddings documentation from
langchain_fireworks/docs/ to docs/integration/text_embedding/
**Issue:** FireworksEmbeddings documentation was not in the correct
location
**Dependencies:** None
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
I have a small dataset, and I tried to use docarray: `DocArrayHnswSearch`. But when I execute it, it returns:
```bash
raise ImportError(
ImportError: Could not import docarray python package. Please install it with `pip install "langchain[docarray]"`.
```
Instead of `docarray`, the package to install needs to be
```bash
docarray[hnswlib]
```
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
RecursiveUrlLoader does not currently provide an option to set
`base_url` separately from the starting `url`, though it uses a function
internally that has such an option.
For example, this makes it unable to crawl
`https://python.langchain.com/docs`, as that URL returns the 404 page, and
`https://python.langchain.com/docs/get_started/introduction` has no
child routes to parse.
`base_url` allows setting `https://python.langchain.com/docs` as the
prefix to filter by, while the starting URL can be anything inside it
that contains relevant links to continue crawling (a sketch is below).
I understand that for this case the docusaurus loader could be used,
but it's a common issue with many websites.
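A minimal sketch of the intended usage, assuming the new `base_url` option:
```python
from langchain_community.document_loaders import RecursiveUrlLoader

loader = RecursiveUrlLoader(
    url="https://python.langchain.com/docs/get_started/introduction",
    base_url="https://python.langchain.com/docs",  # prefix used to filter links
    max_depth=2,
)
docs = loader.load()
```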
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>