langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-29 17:07:25 +00:00

Author	SHA1	Message	Date
Christoph Grotz	5a4ce9ef2b	VertexAI now allows tuning Codey models (#10367 ) Description: VertexAI now supports tuning Codey models; I adapted the Vertex AI LLM wrapper accordingly. See https://cloud.google.com/vertex-ai/docs/generative-ai/models/tune-code-models	2023-09-08 09:12:24 -07:00
William FH	1b0eebe1e3	Support multiple errors (#10376 ) in on_retry	2023-09-08 09:07:15 -07:00
Bagatur	d2d11ccf63	bump 285 (#10373 )	2023-09-08 08:26:31 -07:00
William FH	46e9abdc75	Add progress bar + runner fixes (#10348 ) - Add progress bar to eval runs - Use thread pool for concurrency - Update some error messages - Friendlier project name - Print out quantiles of the final stats Closes LS-902	2023-09-08 07:45:28 -07:00
C Mazzoni	01e9d7902d	Update tool.py (#10203 ) Fixed the description of the tool QuerySQLCheckerTool: the last line of the description string still used the old tool name 'sql_db_query', which sometimes caused models to call the non-existent tool. No issue number was filed. No dependencies.	2023-09-07 22:04:55 -07:00
stopdropandrew	28de8d132c	Change StructuredTool's ainvoke to await (#10300 ) Fixes #10080. StructuredTool's `ainvoke` doesn't `await`.	2023-09-07 19:54:53 -07:00
Leonid Ganeline	1b3ea1eeb4	docstrings: `chat_loaders` (#10307 ) Updated docstrings. Made them consistent across the module.	2023-09-07 19:35:34 -07:00
Bagatur	8826293c88	Add multilingual data anon chain (#10346 )	2023-09-07 15:15:08 -07:00
Greg Richardson	300559695b	Supabase vector self querying retriever (#10304 ) ## Description Adds Supabase Vector as a self-querying retriever. - Designed to be backwards compatible with existing `filter` logic on `SupabaseVectorStore`. - Adds new filter `postgrest_filter` to `SupabaseVectorStore` `similarity_search()` methods - Supports entire PostgREST [filter query language](https://postgrest.org/en/stable/references/api/tables_views.html#read) (used by self-querying retriever, but also works as an escape hatch for more query control) - `SupabaseVectorTranslator` converts Langchain filter into the above PostgREST query - Adds Jupyter Notebook for the self-querying retriever - Adds tests ## Tag maintainer @hwchase17 ## Twitter handle [@ggrdson](https://twitter.com/ggrdson)	2023-09-07 15:03:26 -07:00
Tze Min	20c742d8a2	Enhancement: add parameter boto3_session for AWS DynamoDB cross account use cases (#10326 ) - Description: to allow boto3 assume role for AWS cross account use cases to read and update the chat history, - Issue: use case I faced in my company, - Dependencies: no - Tag maintainer: @baskaryan , - Twitter handle: @tmin97 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-07 14:58:28 -07:00
maks-operlejn-ds	274c3dc3a8	Multilingual anonymization (#10327 ) ### Description Add multiple language support to Anonymizer PII detection in Microsoft Presidio relies on several components - in addition to the usual pattern matching (e.g. using regex), the analyser uses a model for Named Entity Recognition (NER) to extract entities such as: - `PERSON` - `LOCATION` - `DATE_TIME` - `NRP` - `ORGANIZATION` [[Source]](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/predefined_recognizers/spacy_recognizer.py) To handle NER in specific languages, we utilize unique models from the `spaCy` library, recognized for its extensive selection covering multiple languages and sizes. However, it's not restrictive, allowing for integration of alternative frameworks such as [Stanza](https://microsoft.github.io/presidio/analyzer/nlp_engines/spacy_stanza/) or [transformers](https://microsoft.github.io/presidio/analyzer/nlp_engines/transformers/) when necessary. ### Future works - automatic language detection - instead of passing the language as a parameter in `anonymizer.anonymize`, we could detect the language/s beforehand and then use the corresponding NER model. We have discussed this internally and @mateusz-wosinski-ds will look into a standalone language detection tool/chain for LangChain 😄 ### Twitter handle @deepsense_ai / @MaksOpp ### Tag maintainer @baskaryan @hwchase17 @hinthornw	2023-09-07 14:42:24 -07:00
Ofer Mendelevitch	a9eb7c6cfc	Adding Self-querying for Vectara (#10332 ) - Description: Adding support for self-querying to Vectara integration - Issue: per customer request - Tag maintainer: @rlancemartin @baskaryan - Twitter handle: @ofermend Also updated some documentation, added self-query testing, and a demo notebook with self-query example.	2023-09-07 10:24:50 -07:00
Bagatur	25ec655e4f	supabase embedding usage fix (#10335 ) Should be calling Embeddings.embed_query instead of embed_documents when searching	2023-09-07 10:04:49 -07:00
Bagatur	672907bbbb	bump 284 (#10330 )	2023-09-07 08:45:42 -07:00
maks-operlejn-ds	4cc4534d81	Data deanonymization (#10093 ) ### Description The feature for pseudonymizing data with ability to retrieve original text (deanonymization) has been implemented. In order to protect private data, such as when querying external APIs (OpenAI), it is worth pseudonymizing sensitive data to maintain full privacy. But then, after the model response, it would be good to have the data in the original form. I implemented the `PresidioReversibleAnonymizer`, which consists of two parts: 1. anonymization - it works the same way as `PresidioAnonymizer`, plus the object itself stores a mapping of made-up values to original ones, for example: ``` { "PERSON": { "<anonymized>": "<original>", "John Doe": "Slim Shady" }, "PHONE_NUMBER": { "111-111-1111": "555-555-5555" } ... } ``` 2. deanonymization - using the mapping described above, it matches fake data with original data and then substitutes it. Between anonymization and deanonymization user can perform different operations, for example, passing the output to LLM. ### Future works - instance anonymization - at this point, each occurrence of PII is treated as a separate entity and separately anonymized. Therefore, two occurrences of the name John Doe in the text will be changed to two different names. It is therefore worth introducing support for full instance detection, so that repeated occurrences are treated as a single object. - better matching and substitution of fake values for real ones - currently the strategy is based on matching full strings and then substituting them. Due to the indeterminism of language models, it may happen that the value in the answer is slightly changed (e.g. John Doe -> John or Main St, New York -> New York) and such a substitution is then no longer possible. Therefore, it is worth adjusting the matching for your needs. 
- Q&A with anonymization - when I'm done writing all the functionality, I thought it would be a cool resource in documentation to write a notebook about retrieval from documents using anonymization. An iterative process, adding new recognizers to fit the data, lessons learned and what to look out for ### Twitter handle @deepsense_ai / @MaksOpp --------- Co-authored-by: MaksOpp <maks.operlejn@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-06 21:33:24 -07:00
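The exact-match substitution described in the deanonymization commit above can be sketched as follows. This is a minimal illustration, not the `PresidioReversibleAnonymizer` implementation: the `deanonymize` helper and the `Mapping` alias are hypothetical names, and only the mapping shape (entity type, then fake value to original value) is taken from the commit message.

```python
from typing import Dict

# Hypothetical alias: entity type -> {fake value -> original value},
# mirroring the mapping shape shown in the commit message.
Mapping = Dict[str, Dict[str, str]]


def deanonymize(text: str, mapping: Mapping) -> str:
    """Replace every fake value in `text` with its original, by exact match."""
    for fake_to_original in mapping.values():
        for fake, original in fake_to_original.items():
            text = text.replace(fake, original)
    return text


mapping: Mapping = {
    "PERSON": {"John Doe": "Slim Shady"},
    "PHONE_NUMBER": {"111-111-1111": "555-555-5555"},
}

restored = deanonymize("Call John Doe at 111-111-1111.", mapping)

# A shortened name is not a full-string match, so it is left untouched.
# This is the limitation the "Future works" section points out.
unchanged = deanonymize("Call John.", mapping)
```

Because the match is on full strings, a model that rewrites "John Doe" as "John" defeats the substitution, which is why the commit suggests adjusting the matching strategy to your needs.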
刘方瑞	890ed775a3	Resolve: VectorSearch enabled SQLChain? (#10177 ) Squashed from #7454 with updated features. We have separated the `SQLDatabaseChain` from `VectorSQLDatabaseChain` and put everything into `experimental/`. Below is the original PR message from #7454. ------- We have been working on features to bridge the gap between SQL, vector search and LLM applications. Some inspiring works like self-query retrievers for VectorStores (for example [Weaviate](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/weaviate_self_query.html) and [others](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/self_query.html)) really turn those vector search databases into a powerful knowledge base! 🚀🚀 We are wondering whether we can merge it all into one, like SQL and vector search and LLMChains, making this SQL vector database memory the only source of your data. Here are some benefits we can think of for now, maybe you have more 👀: With ALL data you have: since you store all your pasta in the database, you don't need to worry about the foreign keys or links between names from other data sources. Flexible data structure: Even if you have changed your schema, for example added a table, the LLM will know how to JOIN those tables and use those as filters. SQL compatibility: We found that vector databases that support SQL in the marketplace have similar interfaces, which means you can change your backend with no pain: just change the name of the distance function in your DB solution and you are ready to go! 
### Issue resolved: - [Feature Proposal: VectorSearch enabled SQLChain?](https://github.com/hwchase17/langchain/issues/5122) ### Change made in this PR: - An improved schema handling that ignore `types.NullType` columns - A SQL output Parser interface in `SQLDatabaseChain` to enable Vector SQL capability and further more - A Retriever based on `SQLDatabaseChain` to retrieve data from the database for RetrievalQAChains and many others - Allow `SQLDatabaseChain` to retrieve data in python native format - Includes PR #6737 - Vector SQL Output Parser for `SQLDatabaseChain` and `SQLDatabaseChainRetriever` - Prompts that can implement text to VectorSQL - Corresponding unit-tests and notebook ### Twitter handle: - @MyScaleDB ### Tag Maintainer: Prompts / General: @hwchase17, @baskaryan DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev ### Dependencies: No dependency added	2023-09-06 17:08:12 -07:00
Bagatur	0c760f184c	Update NucliaDB vecstore deps	2023-09-06 16:29:10 -07:00
Eric BREHAULT	19b4ecdc39	Implement NucliaDB vector store (#10236 ) # Description This pull request allows using [NucliaDB](https://docs.nuclia.dev/docs/docs/nucliadb/intro) as a vector store in LangChain. It works with either a [local NucliaDB instance](https://docs.nuclia.dev/docs/docs/nucliadb/deploy/basics) or [Nuclia Cloud](https://nuclia.cloud). # Dependencies It requires an up-to-date version of the `nuclia` Python package. @rlancemartin, @eyurtsev, @hinthornw, please review it when you have a moment :) Note: our Twitter handle is `@NucliaAI`	2023-09-06 16:26:14 -07:00
cccs-eric	b64a443f72	Fix SQL search_path for Trino query engine (#10248 ) This PR replaces the generic `SET search_path TO` statement by `USE` for the Trino dialect since Trino does not support `SET search_path`. Official Trino documentation can be found [here](https://trino.io/docs/current/sql/use.html). With this fix, the `SQLdatabase` will now be able to set the current schema and execute queries using the Trino engine. It will use the catalog set as default by the connection uri.	2023-09-06 16:19:37 -07:00
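The dialect switch described in the Trino commit above can be sketched as a small helper. The function name `set_schema_statement` is hypothetical and this is not the `SQLDatabase` code itself; it only illustrates choosing `USE` for Trino and the generic `SET search_path TO` otherwise, and it assumes the schema name is already trusted (no quoting is applied).

```python
def set_schema_statement(dialect: str, schema: str) -> str:
    """Return the statement that selects `schema` for the given dialect.

    Hypothetical helper: Trino does not support `SET search_path`,
    so it gets a `USE` statement instead.
    """
    if dialect == "trino":
        return f"USE {schema}"
    return f"SET search_path TO {schema}"


trino_stmt = set_schema_statement("trino", "analytics")
postgres_stmt = set_schema_statement("postgresql", "analytics")
```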
Brian Antonelli	4df101cf77	Don't hardcode PGVector distance strategies (#10265 ) - Description: Remove hardcoded/duplicated distance strategies in the PGVector store. - Issue: NA - Dependencies: NA - Tag maintainer: for a quicker response, tag the relevant maintainer (see below), - Twitter handle: @archmonkeymojo --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-06 15:20:44 -07:00
JaéGeR	b8669b249e	Added Hugging face inference api (#10280 ) Embed documents without locally downloading the HF model --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-06 14:55:48 -07:00
Ilya	6e6f15df24	Add strip text splits flag (#10295 ) #10085 --------- Co-authored-by: codesee-maps[bot] <86324825+codesee-maps[bot]@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-06 14:06:12 -07:00
ParamdeepSinghShorthillsAI	3cc242b591	Update rwkv.py import error (#10293 ) I have updated the code to ensure consistent error handling for ImportError. Instead of relying on ValueError as before, I've followed the standard practice of raising ImportError while also including detailed error messages. This modification improves code clarity and explicitly indicates that any issues are related to module imports.	2023-09-06 13:50:21 -07:00
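The error-handling convention described in the commit above (raise `ImportError`, not `ValueError`, when an optional dependency is missing) might look like the sketch below. The helper name `import_optional` and the message wording are assumptions for illustration, not the code in `rwkv.py`.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, pip_name: str) -> ModuleType:
    """Import an optional dependency, re-raising with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Keep the exception type as ImportError so callers can tell
        # a missing package apart from a bad argument (ValueError).
        raise ImportError(
            f"Could not import {module_name}. "
            f"Please install it with `pip install {pip_name}`."
        ) from exc
```

Chaining with `from exc` preserves the original traceback, so the underlying cause is still visible alongside the friendlier message.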
Tomaz Bratanic	db73c9d5b5	Diffbot Graph Transformer / Neo4j Graph document ingestion (#9979 ) Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-06 13:32:59 -07:00
Predrag Gruevski	ccb9e3ee2d	Install dev, lint, test, typing extra deps for linting steps. (#10249 ) `mypy` cannot type-check code that relies on dependencies that aren't installed. Eventually we'll probably want to install as many optional dependencies as possible. However, the full "extended deps" setup for langchain creates a 3GB cache file and takes a while to unpack and install. We'll probably want something a bit more targeted. This is a first step toward something better.	2023-09-06 11:15:28 -04:00
Predrag Gruevski	82d5d4d0ae	Deny creating files as a result of test runs. (#10253 ) A test file was accidentally dropping a `results.json` file in the current working directory as a result of running `make test`. This is undesirable, since we don't want to risk accidentally adding stray files into the repo if we run tests locally and then do `git add .` without inspecting the file list very closely.	2023-09-06 11:15:16 -04:00
Predrag Gruevski	8d5bf1fb20	Fix langchain lint on `master`. (#10289 )	2023-09-06 16:01:13 +01:00
Nik	49341483da	Update Banana.dev docs to latest correct usage (#10183 ) - Description: this PR updates all Banana.dev-related docs to match the latest client usage. The code in the docs before this PR were out of date and would never run. - Issue: [#6404](https://github.com/langchain-ai/langchain/issues/6404) - Dependencies: - - Tag maintainer: - Twitter handle: [BananaDev_ ](https://twitter.com/BananaDev_ )	2023-09-06 07:46:17 -07:00
Bagatur	9e839d4977	bump 283 (#10287 )	2023-09-06 07:33:03 -07:00
William FH	ffca5e7eea	Allow config propagation, Add default lambda name, Improve ergonomics of config passed in (#10273 ) Makes it easier to do recursion using regular python compositional patterns
```py
def lambda_decorator(func):
    """Decorate function as a RunnableLambda"""
    return runnable.RunnableLambda(func)

@lambda_decorator
def fibonacci(a, config: runnable.RunnableConfig) -> int:
    if a <= 1:
        return a
    else:
        return fibonacci.invoke(
            a - 1, config
        ) + fibonacci.invoke(a - 2, config)

fibonacci.invoke(10)
```
https://smith.langchain.com/public/cb98edb4-3a09-4798-9c22-a930037faf88/r Also makes it more natural to do things like error handle and call other langchain objects in ways we probably don't want to support in `with_fallbacks()`
```py
@lambda_decorator
def handle_errors(a, config: runnable.RunnableConfig) -> int:
    try:
        return my_chain.invoke(a, config)
    except MyExceptionType as exc:
        return my_other_chain.invoke({"original": a, "error": exc}, config)
```
In this case, the next chain takes in the exception object. Maybe this could be something we toggle in `with_fallbacks` but I fear we'll get into uglier APIs + heavier cognitive load if we try to do too much there --------- Co-authored-by: Nuno Campos <nuno@boringbits.io>	2023-09-06 05:54:38 -07:00
Mario Scrocca	334bd8ebbe	Fix bug in SPARQL intent selection (#8521 ) - Description: Fix bug in SPARQL intent selection - Issue: After the change in #7758 the intent is always set to "UPDATE". Indeed, if the answer to the prompt contains only "SELECT" the `find("SELECT")` operation returns a higher value w.r.t. `-1` returned by `find("UPDATE")`. - Dependencies: None, - Tag maintainer: @baskaryan @aditya-29 - Twitter handle: @mario_scrock	2023-09-05 14:37:02 -07:00
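The failure mode described in the SPARQL commit above comes from comparing raw `str.find` results, where a missing keyword yields `-1`. The sketch below reproduces it with hypothetical function names; it illustrates the pitfall and one way to guard against it, and is not the actual chain code.

```python
from typing import Optional


def select_intent_buggy(answer: str) -> str:
    """Pick the keyword that appears first, comparing raw find() positions."""
    select_pos = answer.find("SELECT")
    update_pos = answer.find("UPDATE")
    # A missing "UPDATE" gives -1, which is smaller than any real position,
    # so an answer containing only "SELECT" is classified as "UPDATE".
    return "SELECT" if select_pos < update_pos else "UPDATE"


def select_intent_fixed(answer: str) -> Optional[str]:
    """Same selection, but treat -1 as "not found" instead of a position."""
    select_pos = answer.find("SELECT")
    update_pos = answer.find("UPDATE")
    if select_pos == -1 and update_pos == -1:
        return None
    if update_pos == -1:
        return "SELECT"
    if select_pos == -1:
        return "UPDATE"
    return "SELECT" if select_pos < update_pos else "UPDATE"
```

With the input `"SELECT"`, the buggy version compares `0 < -1`, which is false, and falls through to `"UPDATE"`; the fixed version handles the not-found case before comparing positions.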
Bagatur	c8d7ee62ba	bump 282 (#10233 )	2023-09-05 07:58:00 -07:00
Nuno Campos	5d8673a3c1	Fix usage of AsyncHtmlLoader with an already running event loop (#10220 )	2023-09-05 07:25:28 -07:00
vintro	ac2310a405	add NumberedListOutputParser to output_parser init (#10204 ) `from langchain.output_parsers import NumberedListOutputParser` did not work, needed to add it to the init file	2023-09-05 01:12:41 -07:00
Junlin Zhou	8b95dabfe3	update(llms/TGI): Allow None as temperature value (#10212 ) Text Generation Inference's client permits the use of a None temperature as seen [here](`033230ae66/clients/python/text_generation/client.py (L71C9-L71C20)`). While I haven't dived into TGI's server code and don't know the implications of using None as a temperature setting, I think we should grant users the option to pass None as a temperature parameter to TGI.	2023-09-05 01:07:57 -07:00
Christophe Bornet	f389c4fcab	Fix S3DirectoryLoader exception (#10193 ) #9304 introduced a critical bug. The S3DirectoryLoader fails completely because boto3 checks the naming of kw arguments and one of the args is badly named (very sorry for that) cc @baskaryan	2023-09-04 15:59:22 -07:00
Manuel Soria	dde1992fdd	Adding custom tools to SQL Agent (#10198 ) Changes in: - `create_sql_agent` function so that user can easily add custom tools as complement for the toolkit. - updating sql use case notebook to showcase 2 examples of extra tools. Motivation for these changes is having the possibility of including domain expert knowledge to the agent, which improves accuracy and reduces time/tokens. --------- Co-authored-by: Manuel Soria <manuel.soria@greyscaleai.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-04 15:28:28 -07:00
ElReyZero	5dbae94e04	OpenAIEmbeddings: Add an optional parameter to skip empty embeddings (#10196 ) ## Description ### Issue This pull request addresses a lingering issue identified in PR #7070. In that previous pull request, an attempt was made to address the problem of empty embeddings when using the `OpenAIEmbeddings` class. While PR #7070 introduced a mechanism to retry requests for embeddings, it didn't fully resolve the issue as empty embeddings still occasionally persisted. ### Problem In certain specific use cases, empty embeddings can be encountered when requesting data from the OpenAI API. In some cases, these empty embeddings can be skipped or removed without affecting the functionality of the application. However, they might not always be resolved through retries, and their presence can adversely affect the functionality of applications relying on the `OpenAIEmbeddings` class. ### Solution To provide a more robust solution for handling empty embeddings, we propose the introduction of an optional parameter, `skip_empty`, in the `OpenAIEmbeddings` class. When set to `True`, this parameter will enable the behavior of automatically skipping empty embeddings, ensuring that problematic empty embeddings do not disrupt the processing flow. The developer will be able to optionally toggle this behavior if needed without disrupting the application flow. ## Changes Made - Added an optional parameter, `skip_empty`, to the `OpenAIEmbeddings` class. - When `skip_empty` is set to `True`, empty embeddings are automatically skipped without causing errors or disruptions. ### Example Usage
```python
from langchain.embeddings import OpenAIEmbeddings

# Initialize the OpenAIEmbeddings class with skip_empty=True
embeddings = OpenAIEmbeddings(openai_api_key="your_api_key", skip_empty=True)

# Request embeddings; empty embeddings are automatically skipped.
# docs is a variable containing the already split text.
results = embeddings.embed_documents(docs)

# Process results without interruption from empty embeddings
```
	2023-09-04 14:10:36 -07:00
Louis	bb8c095127	Add 'download_dir' argument to VLLM (#9754 ) - Description: Add a 'download_dir' argument to VLLM model (to change the cache download directory when retrieving a model from HF hub) - Issue: On some remote machines, I want the cache dir to be in a volume where I have space (models are heavy nowadays). Sometimes the default HF cache dir might not be what we want. - Dependencies: None --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-09-04 10:53:48 -07:00
Bagatur	098b4aa465	bump 281 (#10189 )	2023-09-04 08:51:50 -07:00
Aashish Saini	699f58fb83	Fixed Import Error type (#10168 ) I have restructured the code to ensure uniform handling of ImportError. In place of previously used ValueError, I've adopted the standard practice of raising ImportError with explanatory messages. This modification enhances code readability and clarifies that any problems stem from module importation. --------- Co-authored-by: Aashish Saini <141953346+AashishSainiShorthillsAI@users.noreply.github.com> Co-authored-by: AryamanJaiswalShorthillsAI <142397527+AryamanJaiswalShorthillsAI@users.noreply.github.com> Co-authored-by: Adarsh Shrivastav <142413097+AdarshKumarShorthillsAI@users.noreply.github.com> Co-authored-by: Vishal <141389263+VishalYadavShorthillsAI@users.noreply.github.com> Co-authored-by: ChetnaGuptaShorthillsAI <142381084+ChetnaGuptaShorthillsAI@users.noreply.github.com> Co-authored-by: PankajKumarShorthillsAI <142473460+PankajKumarShorthillsAI@users.noreply.github.com> Co-authored-by: AbhishekYadavShorthillsAI <142393903+AbhishekYadavShorthillsAI@users.noreply.github.com> Co-authored-by: AmitSinghShorthillsAI <142410046+AmitSinghShorthillsAI@users.noreply.github.com> Co-authored-by: Aayush <142384656+AayushShorthillsAI@users.noreply.github.com> Co-authored-by: AnujMauryaShorthillsAI <142393269+AnujMauryaShorthillsAI@users.noreply.github.com>	2023-09-04 08:43:28 -07:00
刘方瑞	de9e545542	MyScale hot fix on type check (#10180 ) Previous PR #9353 has incomplete type checks and deprecation warnings. This PR fixes those type checks and adds a deprecation warning to the MyScale vectorstore	2023-09-04 08:40:58 -07:00
JunXiang	cb928ed3d5	Fix: the duplicate characters wrong results when using `pdfplumber loader` (#10165 ) (Reopen PR #7706, hope this problem can fix.) When using `pdfplumber`, some documents may be parsed incorrectly, resulting in duplicated characters. Taking the [linked](https://bruusgaard.no/wp-content/uploads/2021/05/Datasheet1000-series.pdf) document as an example: ## Before ```python from langchain.document_loaders import PDFPlumberLoader pdf_file = 'file.pdf' loader = PDFPlumberLoader(pdf_file) docs = loader.load() print(docs[0].page_content) ``` Results: ``` 11000000 SSeerriieess PPoorrttaabbllee ssiinnggllee ggaass ddeetteeccttoorrss ffoorr HHyyddrrooggeenn aanndd CCoommbbuussttiibbllee ggaasseess TThhee RRiikkeenn KKeeiikkii GGPP--11000000 iiss aa ccoommppaacctt aanndd lliigghhttwweeiigghhtt ggaass ddeetteeccttoorr wwiitthh hhiigghh sseennssiittiivviittyy ffoorr tthhee ddeetteeccttiioonn ooff hhyyddrrooccaarrbboonnss.. TThhee mmeeaassuurreemmeenntt iiss ppeerrffoorrmmeedd ffoorr tthhiiss ppuurrppoossee bbyy mmeeaannss ooff ccaattaallyyttiicc sseennssoorr.. TThhee GGPP--11000000 hhaass aa bbuuiilltt--iinn ppuummpp wwiitthh ppuummpp bboooosstteerr ffuunnccttiioonn aanndd aa ddiirreecctt sseelleeccttiioonn ffrroomm aa lliisstt ooff 2255 hhyyddrrooccaarrbboonnss ffoorr eexxaacctt aalliiggnnmmeenntt ooff tthhee ttaarrggeett ggaass -- OOnnllyy ccaalliibbrraattiioonn oonn CCHH iiss nneecceessssaarryy.. 44 FFeeaattuurreess TThhee RRiikkeenn KKeeiikkii 110000vvvvttaabbllee ssiinnggllee HHyyddrrooggeenn aanndd CCoommbbuussttiibbllee ggaass ddeetteeccttoorrss.. TThheerree aarree 33 ssttaannddaarrdd mmooddeellss:: GGPP--11000000:: 00--1100%%LLEELL // 00--110000%%LLEELL ›› LLEELL ddeetteeccttoorr NNCC--11000000:: 00--11000000ppppmm // 00--1100000000ppppmm ›› PPPPMM ddeetteeccttoorr DDiirreecctt rreeaaddiinngg ooff tthhee ccoonncceennttrraattiioonn vvaalluueess ooff ccoommbbuussttiibbllee ggaasseess ooff 2255 ggaasseess ((55 NNPP--11000000)).. 
EEaassyy ooppeerraattiioonn ffeeaattuurree ooff cchhaannggiinngg tthhee ggaass nnaammee ddiissppllaayy wwiitthh 11 sswwiittcchh bbuuttttoonn.. LLoonngg ddiissttaannccee ddrraawwiinngg ppoossssiibbllee wwiitthh tthhee ppuummpp bboooosstteerr ffuunnccttiioonn.. VVaarriioouuss ccoommbbuussttiibbllee ggaasseess ccaann bbee mmeeaassuurreedd bbyy tthhee ppppmm oorrddeerr wwiitthh NNCC--11000000.. www.bruusgaard.no postmaster@bruusgaard.no +47 67 54 93 30 Rev: 446-2 ``` We can see that there are a large number of duplicated characters in the text, which can cause issues in subsequent applications. ## After Therefore, based on the [solution](https://github.com/jsvine/pdfplumber/issues/71) provided by the `pdfplumber` source project. I added the `"dedupe_chars()"` method to address this problem. (Just pass the parameter `dedupe` to `True`) ```python from langchain.document_loaders import PDFPlumberLoader pdf_file = 'file.pdf' loader = PDFPlumberLoader(pdf_file, dedupe=True) docs = loader.load() print(docs[0].page_content) ``` Results: ``` 1000 Series Portable single gas detectors for Hydrogen and Combustible gases The Riken Keiki GP-1000 is a compact and lightweight gas detector with high sensitivity for the detection of hydrocarbons. The measurement is performed for this purpose by means of catalytic sensor. The GP-1000 has a built-in pump with pump booster function and a direct selection from a list of 25 hydrocarbons for exact alignment of the target gas - Only calibration on CH is necessary. 4 Features The Riken Keiki 100vvtable single Hydrogen and Combustible gas detectors. There are 3 standard models: GP-1000: 0-10%LEL / 0-100%LEL › LEL detector NC-1000: 0-1000ppm / 0-10000ppm › PPM detector Direct reading of the concentration values of combustible gases of 25 gases (5 NP-1000). Easy operation feature of changing the gas name display with 1 switch button. Long distance drawing possible with the pump booster function. 
Various combustible gases can be measured by the ppm order with NC-1000. www.bruusgaard.no postmaster@bruusgaard.no +47 67 54 93 30 Rev: 446-2 ``` --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-04 08:37:00 -07:00
Aashish Saini	27944cb611	Fixed Import Error (#10167 ) I have restructured the code to ensure uniform handling of ImportError. In place of previously used ValueError, I've adopted the standard practice of raising ImportError with explanatory messages. This modification enhances code readability and clarifies that any problems stem from module importation. --------- Co-authored-by: Aashish Saini <141953346+AashishSainiShorthillsAI@users.noreply.github.com> Co-authored-by: AryamanJaiswalShorthillsAI <142397527+AryamanJaiswalShorthillsAI@users.noreply.github.com> Co-authored-by: Adarsh Shrivastav <142413097+AdarshKumarShorthillsAI@users.noreply.github.com> Co-authored-by: Vishal <141389263+VishalYadavShorthillsAI@users.noreply.github.com> Co-authored-by: ChetnaGuptaShorthillsAI <142381084+ChetnaGuptaShorthillsAI@users.noreply.github.com> Co-authored-by: PankajKumarShorthillsAI <142473460+PankajKumarShorthillsAI@users.noreply.github.com> Co-authored-by: AbhishekYadavShorthillsAI <142393903+AbhishekYadavShorthillsAI@users.noreply.github.com> Co-authored-by: AmitSinghShorthillsAI <142410046+AmitSinghShorthillsAI@users.noreply.github.com> Co-authored-by: Aayush <142384656+AayushShorthillsAI@users.noreply.github.com> Co-authored-by: AnujMauryaShorthillsAI <142393269+AnujMauryaShorthillsAI@users.noreply.github.com>	2023-09-04 00:32:09 -07:00
Massimiliano Pronesti	10e0431e48	feat(llms): add model_kwargs to hf tgi (#10139 ) @baskaryan Following what we discussed in #9724 and your suggestion, I've added a `model_kwargs` parameter to hf tgi.	2023-09-04 00:24:13 -07:00
Eugene Yurtsev	e0f6ba08d6	FileSystemBlobLoader: Expand user path (#10133 ) Fix for: https://github.com/langchain-ai/langchain/issues/10019 Verified fix manually	2023-09-04 00:21:33 -07:00
Krish Dholakia	31bbe80758	add additional model support to chatlitellm (#10134 ) --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-09-04 00:16:40 -07:00
IlyaKIS1	de3322609e	Implemented Milvus translator for self-querying (#10162 ) - Implemented the MilvusTranslator for self-querying using Milvus vector store - Made unit tests to test its functionality - Documented the Milvus self-querying	2023-09-04 00:16:18 -07:00
Christophe Bornet	803d0d9656	Add the possibility to configure boto3 in the S3 loaders (#9304 ) - Description: this PR adds the possibility to configure boto3 in the S3 loaders. Any named argument you add will be used to create the Boto3 session. This is useful when the AWS credentials can't be passed as env variables or can't be read from the credentials file. - Issue: N/A - Dependencies: N/A - Tag maintainer: ? - Twitter handle: cbornet_ --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-03 21:06:49 -07:00
Xiaoyu Xee	9bcfd58580	Add dashvector self query retriever (#9684 ) ## Description Add `DashVector` retriever and self-query retriever ## How to use
```python
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.vectorstores.dashvector import DashVector

# docs, embeddings, llm, document_content_description and
# metadata_field_info are assumed to be defined by the caller.
vectorstore = DashVector.from_documents(docs, embeddings)
retriever = SelfQueryRetriever.from_llm(
    llm, vectorstore, document_content_description, metadata_field_info, verbose=True
)
```
--------- Co-authored-by: smallrain.xuxy <smallrain.xuxy@alibaba-inc.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-09-03 20:51:04 -07:00

796 Commits