langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-16 06:13:16 +00:00

Author	SHA1	Message	Date
Leonid Ganeline	3aacd11846	community[minor]: added missed class to __all__ (#19888 ) Added missed `UnstructuredCHMLoader` class to the document_loader.\_\_init\_\_.py \_\_all\_\_	2024-04-04 16:16:51 -04:00
Tomaz Bratanic	df25829f33	community[minor]: Add metadata filtering support for neo4j vector (#20001 )	2024-04-04 11:37:06 -04:00
Ben Mitchell	b52b78478f	community[minor]: Implement Async OpenSearch `afrom_texts` & `afrom_embeddings` (#20009 ) - Description: Adds async variants of afrom_texts and afrom_embeddings into `OpenSearchVectorSearch`, which allows for `afrom_documents` to be called. - Issue: I implemented this because my use case involves an async scraper generating documents as and when they're ready to be ingested by Embedding/OpenSearch - Dependencies: None that I'm aware Co-authored-by: Ben Mitchell <b.mitchell@reply.com>	2024-04-04 15:36:14 +00:00
Tomaz Bratanic	09a0ecd000	langchain[minor]: Tests update metadata filtering examples of documents (#19963 ) Removing metadata properties that are dicts as some databases don't support that, and those properties aren't used in tests anyhow..	2024-04-03 12:44:14 -04:00
happy-go-lucky	c6432abdbe	community[patch]: Implement delete method and all async methods in opensearch_vector_search (#17321 ) - Description: In order to use index and aindex in libs/langchain/langchain/indexes/_api.py, I implemented delete method and all async methods in opensearch_vector_search - Dependencies: No changes	2024-04-03 09:40:49 -07:00
Cheng, Penghui	cc407e8a1b	community[minor]: weight only quantization with intel-extension-for-transformers. (#14504 ) Support weight only quantization with intel-extension-for-transformers. [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers) is an innovative toolkit to accelerate Transformer-based models on Intel platforms, in particular effective on 4th Intel Xeon Scalable processor [Sapphire Rapids](https://www.intel.com/content/www/us/en/products/docs/processors/xeon-accelerated/4th-gen-xeon-scalable-processors.html) (codenamed Sapphire Rapids). The toolkit provides the below key features: * Seamless user experience of model compressions on Transformer-based models by extending [Hugging Face transformers](https://github.com/huggingface/transformers) APIs and leveraging [Intel® Neural Compressor](https://github.com/intel/neural-compressor) * Advanced software optimizations and unique compression-aware runtime. * Optimized Transformer-based model packages. * [NeuralChat](https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/neural_chat), a customizable chatbot framework to create your own chatbot within minutes by leveraging a rich set of plugins and SOTA optimizations. * [Inference](https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/llm/runtime/graph) of Large Language Model (LLM) in pure C/C++ with weight-only quantization kernels. This PR is an integration of weight only quantization feature with intel-extension-for-transformers. Unit test is in lib/langchain/tests/integration_tests/llm/test_weight_only_quantization.py The notebook is in docs/docs/integrations/llms/weight_only_quantization.ipynb. The document is in docs/docs/integrations/providers/weight_only_quantization.mdx. --------- Signed-off-by: Cheng, Penghui <penghui.cheng@intel.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-03 16:21:34 +00:00
Eugene Yurtsev	d293431e10	core[minor]: Add aload to document loader (#19936 ) Add aload to document loader	2024-04-03 10:46:47 -04:00
Leonid Kuligin	eb0521064e	deprecating integrations moved to langchain_google_community (#19841 ) Thank you for contributing to LangChain! - [ ] PR title: "community: deprecating integrations moved to langchain_google_community" - [ ] PR message: deprecating integrations moved to langchain_google_community --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2024-04-02 17:06:07 -04:00
Erick Friis	f0d5b59962	core[patch]: remove requests (#19891 ) Removes required usage of `requests` from `langchain-core`, all of which has been deprecated. - removes Tracer V1 implementations - removes old `try_load_from_hub` github-based hub implementations Removal done in a way where imports will still succeed, and usage will fail with a `RuntimeError`.	2024-04-02 20:28:10 +00:00
Peter Vandenabeele	e830a4e731	community[patch]: Add remove_comments option (default True): do not extract html comments (#13259 ) - Description: add `remove_comments` option (default: True): do not extract html _comments_, - Issue: None, - Dependencies: None, - Tag maintainer: @nfcampos , - Twitter handle: peter_v I ran `make format`, `make lint` and `make test`. Discussion: I my use case, I prefer to not have the comments in the extracted text: * e.g. from a Google tag that is added in the html as comment * e.g. content that the authors have temporarily hidden to make it non visible to the regular reader Removing the comments makes the extracted text more alike the intended text to be seen by the reader. Choice to make: do we prefer to make the default for this `remove_comments` option to be True or False? I have changed it to True in a second commit, since that is how I would prefer to use it by default. Have the cleaned text (without technical Google tags etc.) and also closer to the actually visible and intended content. I am not sure what is best aligned with the conventions of langchain in general ... INITIAL VERSION (new version above): ~Choice to make: do we prefer to make the default for this `ignore_comments` option to be True or False? I have set it to False now to be backwards compatible. On the other hand, I would use it mostly with True. I am not sure what is best aligned with the conventions of langchain in general ...~ --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-02 00:19:12 +00:00
Jamsheed Mistri	4f70bc119d	community[minor]: add Layerup Security integration (#19787 ) Description: adds integration with [Layerup Security](https://uselayerup.com). Docs can be found [here](https://docs.uselayerup.com). Integrates directly with our Python SDK. Dependencies: [LayerupSecurity](https://pypi.org/project/LayerupSecurity/) Note: all methods for our product require a paid API key, so I only included 1 test which checks for an invalid API key response. I have tested extensively locally. Twitter handle: [@layerup_](https://twitter.com/layerup_) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-01 23:49:00 +00:00
Anıl Berk Altuner	4384fa8e49	community[minor]: Add Dria retriever (#17098 ) [Dria](https://dria.co/) is a hub of public RAG models for developers to both contribute and utilize a shared embedding lake. This PR adds a retriever that can retrieve documents from Dria.	2024-04-01 12:04:19 -07:00
Ethan Yang	48f84e253e	community[minor]: Add OpenVINO rerank model support (#19791 ) @eaidova @AlexKoff88 Could you help to review, thanks --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-01 18:27:23 +00:00
Chenhui Zhang	a1f3e9f537	community[minor]: Update ChatZhipuAI to support GLM-4 model (#16695 ) Description: Update `ChatZhipuAI` to support the latest `glm-4` model. Issue: N/A Dependencies: httpx, httpx-sse, PyJWT The previous `ChatZhipuAI` implementation requires the `zhipuai` package, and cannot call the latest GLM model. This is because - The old version `zhipuai==1.` doesn't support the latest model. - `zhipuai==2.` requires `pydantic V2`, which is incompatible with 'langchain-community'. This re-implementation invokes the GLM model by sending HTTP requests to [open.bigmodel.cn](https://open.bigmodel.cn/dev/api) via the `httpx` package, and uses the `httpx-sse` package to handle stream events. --------- Co-authored-by: zR <2448370773@qq.com>	2024-04-01 18:11:21 +00:00
Bagatur	d25b5b6f25	community[patch]: Release 0.0.31 (#19873 )	2024-04-01 10:50:22 -07:00
Bagatur	d62e84c4f5	community[patch]: Revert " Fix the bug that Chroma does not specify `e… (#19866 ) …mbedding_function` (#19277)" This reverts commit `7042934b5f`. Fixes #19848	2024-04-01 10:10:44 -07:00
Bagatur	0242bce38c	community[patch]: Release 0.0.30 (#19838 )	2024-03-31 21:26:30 -07:00
hsuyuming	5ab6b39098	community[patch]: add attribution_token within GoogleVertexAISearchRetriever (#18520 ) - Description: Add attribution_token within GoogleVertexAISearchRetriever so user can provide this information to Google support team or product team during debug session. Reference: https://cloud.google.com/generative-ai-app-builder/docs/view-analytics#user-events Attribution tokens. Attribution tokens are unique IDs generated by Vertex AI Search and returned with each search request. Make sure to include that attribution token as UserEvent.attributionToken with any user events resulting from a search. This is needed to identify if a search is served by the API. Only user events with a Google-generated attribution token are used to compute metrics. - Issue: No - Dependencies: No - Twitter handle: abehsu1992626 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-31 13:54:56 -07:00
Kenneth Choe	f98d7f7494	langchain[minor], community[minor]: add CrossEncoderReranker with HuggingFaceCrossEncoder and SagemakerEndpointCrossEncoder (#13687 ) - Description: Support reranking based on cross encoder models available from HuggingFace. - Added `CrossEncoder` schema - Implemented `HuggingFaceCrossEncoder` and `SagemakerEndpointCrossEncoder` - Implemented `CrossEncoderReranker` that performs similar functionality to `CohereRerank` - Added `cross-encoder-reranker.ipynb` to demonstrate how to use it. Please let me know if anything else needs to be done to make it visible on the table-of-contents navigation bar on the left, or on the card list on [retrievers documentation page](https://python.langchain.com/docs/integrations/retrievers). - Issue: N/A - Dependencies: None other than the existing ones. --------- Co-authored-by: Kenny Choe <kchoe@amazon.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-31 20:51:31 +00:00
Kamal Zhang	368e35c3b1	community[patch]: introduce convert_to_secret() to bananadev llm (#14283 ) - Description: Per #12165, this PR add to BananaLLM the function convert_to_secret_str() during environment variable validation. - Issue: #12165 - Tag maintainer: @eyurtsev - Twitter handle: @treewatcha75751 --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-30 00:52:25 +00:00
anshaneel	0884e5de7f	community[minor]: Add Alpha Vantage API Tool (#14332 ) ### Description This implementation adds functionality from the AlphaVantage API, renowned for its comprehensive financial data. The class encapsulates various methods, each dedicated to fetching specific types of financial information from the API. ### Implemented Functions - `search_symbols`: - Searches the AlphaVantage API for financial symbols using the provided keywords. - `_get_market_news_sentiment`: - Retrieves market news sentiment for a specified stock symbol from the AlphaVantage API. - `_get_time_series_daily`: - Fetches daily time series data for a specific symbol from the AlphaVantage API. - `_get_quote_endpoint`: - Obtains the latest price and volume information for a given symbol from the AlphaVantage API. - `_get_time_series_weekly`: - Gathers weekly time series data for a particular symbol from the AlphaVantage API. - `_get_top_gainers_losers`: - Provides details on top gainers, losers, and most actively traded tickers in the US market from the AlphaVantage API. ### Issue: - #11994 ### Dependencies: - 'requests' library for HTTP requests. (import requests) - 'pytest' library for testing. (import pytest) --------- Co-authored-by: Adam Badar <94140103+adam-badar@users.noreply.github.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-30 00:44:01 +00:00
Alex Sherstinsky	a9bc212bf2	community[minor]: fix failing Predibase integration (#19776 ) - [x] PR title: "package: description" - Where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" - [x] PR message: *Delete this entire checklist* and replace with - Description: Langchain-Predibase integration was failing, because it was not current with the Predibase SDK; in addition, Predibase integration tests were instantiating the Langchain Community `Predibase` class with one required argument (`model`) missing. This change updates the Predibase SDK usage and fixes the integration tests. - Twitter handle: `@alexsherstinsky` --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-30 00:38:13 +00:00
ethynic	e9caa22d47	community[patch]: Update minimax.py (#14384 ) MiniMaxChat class _generate method shoud return a ChatResult object not str Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 23:57:06 +00:00
M.Abdulrahman Alnaseer	ba54f1577f	community[minor]: add support for llmsherpa (#19741 ) Thank you for contributing to LangChain! - [x] PR title: "community: added support for llmsherpa library" - [x] Add tests and docs: 1. Integration test: 'docs/docs/integrations/document_loaders/test_llmsherpa.py'. 2. an example notebook: `docs/docs/integrations/document_loaders/llmsherpa.ipynb`. - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 16:04:57 -07:00
Hrvoje Milković	b7344e3347	community[minor]: Infobip tool integration (#16805 ) Description: Adding Tool that wraps Infobip API for sending sms or emails and email validation. Dependencies: None, Twitter handle: @hmilkovic Implementation: ``` libs/community/langchain_community/utilities/infobip.py ``` Integration tests: ``` libs/community/tests/integration_tests/utilities/test_infobip.py ``` Example notebook: ``` docs/docs/integrations/tools/infobip.ipynb ``` --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 19:01:27 +00:00
Luka Krapic	727a2ea9f1	community[patch]: history size support for DynamoDBChatMessageHistory (#16794 ) Description: PR adds support for limiting number of messages preserved in a session history for DynamoDBChatMessageHistory --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 18:56:21 +00:00
Dt22	6dbf1a2de0	community[patch]: fix redis input type for index_schema field (#16874 ) ### Subject: Fix Type Misdeclaration for index_schema in redis/base.py I noticed a type misdeclaration for the index_schema column in the redis/base.py file. When following the instructions outlined in [Redis Custom Metadata Indexing](https://python.langchain.com/docs/integrations/vectorstores/redis) to create our own index_schema, it leads to a Pylance type error. <br/> The error message indicates that Dict[str, list[Dict[str, str]]] is incompatible with the type Optional[Union[Dict[str, str], str, os.PathLike]]. ``` index_schema = { "tag": [{"name": "credit_score"}], "text": [{"name": "user"}, {"name": "job"}], "numeric": [{"name": "age"}], } rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users_modified", index_schema=index_schema, ) ``` Therefore, I have created this pull request to rectify the type declaration problem. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 18:55:54 +00:00
morgana	074ad5095f	community[patch]: mmr search for Rockset vectorstore integration (#16908 ) - Description: Adding support for mmr search in the Rockset vectorstore integration. - Issue: N/A - Dependencies: N/A - Twitter handle: `@_morgan_adams_` --------- Co-authored-by: Rockset API Bot <admin@rockset.io> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-29 18:45:22 +00:00
shahrin014	f51e6a35ba	community[patch]: OllamaEmbeddings - Pass headers to post request (#16880 ) ## Feature - Set additional headers in constructor - Headers will be sent in post request This feature is useful if deploying Ollama on a cloud service such as hugging face, which requires authentication tokens to be passed in the request header. ## Tests - Test if header is passed - Test if header is not passed Similar to https://github.com/langchain-ai/langchain/pull/15881 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 18:44:52 +00:00
Jan Chorowski	b8b42ccbc5	community[minor]: Pathway vectorstore(#14859 ) - Description: Integration with pathway.com data processing pipeline acting as an always updated vectorstore - Issue: not applicable - Dependencies: optional dependency on [`pathway`](https://pypi.org/project/pathway/) - Twitter handle: pathway_com The PR provides and integration with `pathway` to provide an easy to use always updated vector store: ```python import pathway as pw from langchain.embeddings.openai import OpenAIEmbeddings from langchain.text_splitter import CharacterTextSplitter from langchain.vectorstores import PathwayVectorClient, PathwayVectorServer data_sources = [] data_sources.append( pw.io.gdrive.read(object_id="17H4YpBOAKQzEJ93xmC2z170l0bP2npMy", service_user_credentials_file="credentials.json", with_metadata=True)) text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0) embeddings_model = OpenAIEmbeddings(openai_api_key=os.environ["OPENAI_API_KEY"]) vector_server = PathwayVectorServer( *data_sources, embedder=embeddings_model, splitter=text_splitter, ) vector_server.run_server(host="127.0.0.1", port="8765", threaded=True, with_cache=False) client = PathwayVectorClient( host="127.0.0.1", port="8765", ) query = "What is Pathway?" docs = client.similarity_search(query) ``` The `PathwayVectorServer` builds a data processing pipeline which continusly scans documents in a given source connector (google drive, s3, ...) and builds a vector store. The `PathwayVectorClient` implements LangChain's `VectorStore` interface and connects to the server to retrieve documents. --------- Co-authored-by: Mateusz Lewandowski <lewymati@users.noreply.github.com> Co-authored-by: mlewandowski <mlewandowski@MacBook-Pro-mlewandowski.local> Co-authored-by: Berke <berkecanrizai1@gmail.com> Co-authored-by: Adrian Kosowski <adrian@pathway.com> Co-authored-by: mlewandowski <mlewandowski@macbook-pro-mlewandowski.home> Co-authored-by: berkecanrizai <63911408+berkecanrizai@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: mlewandowski <mlewandowski@MBPmlewandowski.ht.home> Co-authored-by: Szymon Dudycz <szymond@pathway.com> Co-authored-by: Szymon Dudycz <szymon.dudycz@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-29 10:50:39 -07:00
Arturs Konfino	2319212d54	community[patch]: avoid executing `toolkit.get_context()` when not necessary (#19762 ) If `prompt` is passed into `create_sql_agent()`, then `toolkit.get_context()` shouldn't be executed against the database unless relevant prompt variables (`table_info` or `table_names`) are present .	2024-03-29 16:42:21 +00:00
高璟琦	ec7a59c96c	community[minor]: Add solar embedding (#19761 ) Solar is a large language model developed by [Upstage](https://upstage.ai/). It's a powerful and purpose-trained LLM. You can visit the embedding service provided by Solar within this pr. You may get SOLAR_API_KEY from https://console.upstage.ai/services/embedding You can refer to more details about accepted llm integration at https://python.langchain.com/docs/integrations/llms/solar.	2024-03-29 09:36:05 -07:00
Tomaz Bratanic	dec00d3050	community[patch]: Add the ability to pass maps to neo4j retrieval query (#19758 ) Makes it easier to flatten complex values to text, so you don't have to use a lot of Cypher to do it.	2024-03-29 08:33:48 -07:00
Robby	f7e8a382cc	community[minor]: add hugging face text-to-speech inference API (#18880 ) Description: I implemented a tool to use Hugging Face text-to-speech inference API. Issue: n/a Dependencies: n/a Twitter handle: No Twitter, but do have [LinkedIn](https://www.linkedin.com/in/robby-horvath/) lol. --------- Co-authored-by: Robby <h0rv@users.noreply.github.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-03-29 15:02:29 +00:00
DasDingoCodes	73eb3f8fd9	community[minor]: Implement DirectoryLoader lazy_load function (#19537 ) Thank you for contributing to LangChain! - [x] PR title: "community: Implement DirectoryLoader lazy_load function" - [x] Description: The `lazy_load` function of the `DirectoryLoader` yields each document separately. If the given `loader_cls` of the `DirectoryLoader` also implemented `lazy_load`, it will be used to yield subdocuments of the file. - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access: `libs/community/tests/unit_tests/document_loaders/test_directory_loader.py` 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory: `docs/docs/integrations/document_loaders/directory.ipynb` - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-03-29 14:46:52 +00:00
Jialei	f7c903e24a	community[minor]: add support for Moonshot llm and chat model (#17100 )	2024-03-29 08:54:23 +00:00
Ethan Yang	7164015135	community[minor]: Add Openvino embedding support (#19632 ) This PR is used to support both HF and BGE embeddings with openvino --------- Co-authored-by: Alexander Kozlov <alexander.kozlov@intel.com>	2024-03-29 01:34:51 -07:00
T Cramer	540ebf35a9	community[patch]: Add explicit error message to Bedrock error output. (#17328 ) - Description: Propagate Bedrock errors into Langchain explicitly. Use-case: unset region error is hidden behind 'Could not load credentials...' message - Issue: [17654](https://github.com/langchain-ai/langchain/issues/17654) - Dependencies: None --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-29 03:07:33 +00:00
Marcus Virginia	69bb96c80f	community[patch]: surrealdb handle for empty metadata and allow collection names with complex characters (#17374 ) - Description: Handle for empty metadata and allow collection names with complex characters - Issue: #17057 - Dependencies: `surrealdb` --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-29 01:04:27 +00:00
kYLe	124ab79c23	community[minor]: Add Anyscale embedding support (#17605 ) Description: Add embedding model support for Anyscale Endpoint Dependencies: openai --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 00:53:53 +00:00
Lance Martin	12843f292f	community[patch]: llama cpp embeddings reset default n_batch (#17594 ) When testing Nomic embeddings -- ``` from langchain_community.embeddings import LlamaCppEmbeddings embd_model_path = "/Users/rlm/Desktop/Code/llama.cpp/models/nomic-embd/nomic-embed-text-v1.Q4_K_S.gguf" embd_lc = LlamaCppEmbeddings(model_path=embd_model_path) embedding_lc = embd_lc.embed_query(query) ``` We were seeing this error for strings > a certain size -- ``` File ~/miniforge3/envs/llama2/lib/python3.9/site-packages/llama_cpp/llama.py:827, in Llama.embed(self, input, normalize, truncate, return_count) 824 s_sizes = [] 826 # add to batch --> 827 self._batch.add_sequence(tokens, len(s_sizes), False) 828 t_batch += n_tokens 829 s_sizes.append(n_tokens) File ~/miniforge3/envs/llama2/lib/python3.9/site-packages/llama_cpp/_internals.py:542, in _LlamaBatch.add_sequence(self, batch, seq_id, logits_all) 540 self.batch.token[j] = batch[i] 541 self.batch.pos[j] = i --> 542 self.batch.seq_id[j][0] = seq_id 543 self.batch.n_seq_id[j] = 1 544 self.batch.logits[j] = logits_all ValueError: NULL pointer access ``` The default `n_batch` of llama-cpp-python's Llama is `512` but we were explicitly setting it to `8`. These need to be set to equal for embedding models. * The embedding.cpp example has an assertion to make sure these are always equal. * Apparently this is not being done properly in llama-cpp-python. With `n_batch` set to 8, if more than 8 tokens are passed the batch runs out of space and it crashes. This also explains why the CPU compute buffer size was small: raw client with default `n_batch=512` ``` llama_new_context_with_model: CPU input buffer size = 3.51 MiB llama_new_context_with_model: CPU compute buffer size = 21.00 MiB ``` langchain with `n_batch=8` ``` llama_new_context_with_model: CPU input buffer size = 0.04 MiB llama_new_context_with_model: CPU compute buffer size = 0.33 MiB ``` We can work around this by passing `n_batch=512`, but this will not be obvious to some users: ``` embedding = LlamaCppEmbeddings(model_path=embd_model_path, n_batch=512) ``` From discussion w/ @cebtenzzre. Related: https://github.com/abetlen/llama-cpp-python/issues/1189 Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 00:47:22 +00:00
Zijian Han	8e976545f3	community[patch]: support OpenAI whisper base url (#17695 ) Description: The base URL for OpenAI is retrieved from the environment variable "OPENAI_BASE_URL", whereas for langchain it is obtained from "OPENAI_API_BASE". By adding `base_url = os.environ.get("OPENAI_API_BASE")`, the OpenAI proxy can execute correctly. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 00:35:27 +00:00
Paulo Nascimento	44a3484503	community[patch]: add NotebookLoader unit test (#17721 ) Thank you for contributing to LangChain! - Description: added unit tests for NotebookLoader. Linked PR: https://github.com/langchain-ai/langchain/pull/17614 - Issue: [#17614](https://github.com/langchain-ai/langchain/pull/17614) - Twitter handle: @paulodoestech - [x] Pass lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified to check that you're passing lint and testing. See contribution guidelines for more information on how to write/run tests, lint, etc: https://python.langchain.com/docs/contributing/ - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: lachiewalker <lachiewalker1@hotmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 00:27:46 +00:00
Paulo Nascimento	4c3a67122f	community[patch]: add Integration for OpenAI image gen with v1 sdk (#17771 ) Description: Created a Langchain Tool for OpenAI DALLE Image Generation. Issue: [#15901](https://github.com/langchain-ai/langchain/issues/15901) Dependencies: n/a Twitter handle: @paulodoestech - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 00:23:14 +00:00
Jiaming	3d3cc71287	community[patch]: fix bugs for bilibili Loader (#18036 ) - Description: 1. Fix the BiliBiliLoader that can receive cookie parameters, it requires 3 other parameters to run. The change is backward compatible. 2. Add test; 3. Add example in docs - Issue: [#14213] Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-28 16:39:38 -07:00
Sachin Paryani	25c9f3d1d1	community[patch]: Support Streaming in Azure Machine Learning (#18246 ) - [x] PR title: "community: Support streaming in Azure ML and few naming changes" - [x] PR message: - Description: Added support for streaming for azureml_endpoint. Also, renamed and AzureMLEndpointApiType.realtime to AzureMLEndpointApiType.dedicated. Also, added new classes CustomOpenAIChatContentFormatter and CustomOpenAIContentFormatter and updated the classes LlamaChatContentFormatter and LlamaContentFormatter to now show a deprecated warning message when instantiated. --------- Co-authored-by: Sachin Paryani <saparan@microsoft.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-28 23:38:20 +00:00
Victor Adan	afa2d85405	community[patch]: Added missing from_documents method to KNNRetriever. (#18411 ) - Description: Added missing `from_documents` method to `KNNRetriever`, providing the ability to supply metadata to LangChain `Document`s, and to give it parity to the other retrievers, which do have `from_documents`. - Issue: None - Dependencies: None - Twitter handle: None Co-authored-by: Victor Adan <vadan@netroadshow.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-28 23:18:50 +00:00
Smit Parmar	dfc4177b50	community[patch]: mypy ignore fix (#18483 ) Relates to #17048 Description : Applied fix to dynamodb and elasticsearch file. Error was : `Cannot override writeable attribute with read-only property` Suggestion: instead of adding ``` @messages.setter def messages(self, messages: List[BaseMessage]) -> None: raise NotImplementedError("Use add_messages instead") ``` we can change base class property `messages: List[BaseMessage]` to ``` @property def messages(self) -> List[BaseMessage]:... ``` then we don't need to add `@messages.setter` in all child classes.	2024-03-28 15:36:53 -07:00
Luca Dorigo	f19229c564	core[patch]: fix beta, deprecated typing (#18877 ) Description: While not technically incorrect, the TypeVar used for the `@beta` decorator prevented pyright (and thus most vscode users) from correctly seeing the types of functions/classes decorated with `@beta`. This is in part due to a small bug in pyright (https://github.com/microsoft/pyright/issues/7448 ) - however, the `Type` bound in the typevar `C = TypeVar("C", Type, Callable)` is not doing anything - classes are `Callables` by default, so by my understanding binding to `Type` does not actually provide any more safety - the modified annotation still works correctly for both functions, properties, and classes. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-28 22:33:43 +00:00
wulixuan	b7c8bc8268	community[patch]: fix yuan2 errors in LLMs (#19004 ) 1. fix yuan2 errors while invoke Yuan2. 2. update tests.	2024-03-28 14:37:44 -07:00

1 2 3 4 5 ...

775 Commits