Commit Graph

1894 Commits (parallel_dir_loader_back)
 

Author SHA1 Message Date
Harrison Chase b2f920e891
add tracing v2 env var (#4465)
Co-authored-by: Ankush Gola <ankush.gola@gmail.com>
1 year ago
Zander Chase 9231143f91
Fix Duplicate trust_remote_code in pipeline (#4369)
### Fix issue with duplicate specification of `trust_remote_code` in
HuggingFacePipeline

Fixes #4351
1 year ago
Davis Chase 6fbdb9ce51
Release 0.0.164 (#4454) 1 year ago
Davis Chase 04475bea7d
Mv plan and execute to experimental (#4459) 1 year ago
netseye 1ad180f6de
Add request timeout to openai embedding (#4144)
Add request_timeout field to openai embedding. Defaults to None

---------

Co-authored-by: Jeakin <Jeakin@botu.cc>
1 year ago
zvrr 274dc4bc53
add clickhouse prompt (#4456)
# Add clickhouse prompt

Add clickhouse database sql prompt
1 year ago
Paresh Mathur 05e749d9fe
make running specific unit tests easier (#4336)
I find it's easier to do TDD if I can run specific unit tests. I know
watch is there, but some people prefer running their tests manually.
1 year ago
Eugene Yurtsev 80558b5b27
Add workflow for testing with all deps (#4410)
# Add action to test with all dependencies installed

PR adds a custom action for setting up poetry that allows specifying a
cache key:
https://github.com/actions/setup-python/issues/505#issuecomment-1273013236

This makes it possible to run 2 types of unit tests: 

(1) unit tests with only core dependencies
(2) unit tests with extended dependencies (e.g., those that rely on an
optional pdf parsing library)


As part of this PR, we're moving some pdf parsing tests into the
unit-tests section and making sure that these unit tests get executed
when running with extended dependencies.
1 year ago
Matt Robinson 3637d6da6e
feat: add loader for open office odt files (#4405)
# ODF File Loader

Adds a data loader for handling Open Office ODT files. Requires
`unstructured>=0.6.3`.

### Testing

The following should work using the `fake.odt` example doc from the
[`unstructured` repo](https://github.com/Unstructured-IO/unstructured).

```python
from langchain.document_loaders import UnstructuredODTLoader

loader = UnstructuredODTLoader(file_path="fake.odt", mode="elements")
loader.load()

loader = UnstructuredODTLoader(file_path="fake.odt", mode="single")
loader.load()
```
1 year ago
Zander Chase 65f85af242
Improve math chain error msg (#4415) 1 year ago
Davis Chase f6c97e6af4
Fix Lark import error (#4421)
Any import that touches langchain.retrievers currently requires Lark.
Here's one attempt at a fix. It's not very pretty, and I'm very open to other ideas.
Alternatives I thought of are 1) making Lark a requirement, or 2) putting
everything in parser.py in the try/except. Neither sounds much better.

Related to #4316, #4275
1 year ago
Harrison Chase f0cfed636f change nb name 1 year ago
Harrison Chase 6b8d144ccc
Harrison/plan and solve (#4422) 1 year ago
StephaneBereux d383c0cb43
fixed the filtering error in chromadb (#1621)
Fixed two small bugs (as reported in issue #1619) in filtering by
metadata for `chroma` databases:
- `langchain.vectorstores.chroma.similarity_search` takes a
`filter` input parameter but does not forward it to
`langchain.vectorstores.chroma.similarity_search_with_score`.
- `langchain.vectorstores.chroma.similarity_search_by_vector`
does not take this parameter, although it would be very useful and add
no complexity; accepting it also makes the function consistent with
the other two.
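
The first bug can be illustrated with a toy store. This is a minimal sketch under stated assumptions: `ToyStore` and its scoring are hypothetical stand-ins, not the Chroma wrapper itself; only the method and parameter names mirror the ones mentioned above.

```python
from typing import Any, Dict, List, Optional, Tuple


class ToyStore:
    """Hypothetical in-memory stand-in for a vector store (not the Chroma API)."""

    def __init__(self, docs: List[Dict[str, Any]]) -> None:
        self.docs = docs

    def similarity_search_with_score(
        self, query: str, k: int = 4, filter: Optional[Dict[str, Any]] = None
    ) -> List[Tuple[Dict[str, Any], float]]:
        # Keep only documents whose metadata matches every filter entry.
        matches = [
            d
            for d in self.docs
            if all(d["metadata"].get(key) == val for key, val in (filter or {}).items())
        ]
        # Toy score: 0.0 when the query appears in the text, 1.0 otherwise.
        scored = [(d, 0.0 if query in d["text"] else 1.0) for d in matches]
        return sorted(scored, key=lambda pair: pair[1])[:k]

    def similarity_search(
        self, query: str, k: int = 4, filter: Optional[Dict[str, Any]] = None
    ) -> List[Dict[str, Any]]:
        # Forward `filter` to the scoring method instead of silently dropping it.
        return [doc for doc, _ in self.similarity_search_with_score(query, k, filter=filter)]


store = ToyStore(
    [
        {"text": "apple pie", "metadata": {"source": "a"}},
        {"text": "apple tart", "metadata": {"source": "b"}},
    ]
)
filtered = store.similarity_search("apple", filter={"source": "b"})
```

If `similarity_search` omitted `filter=filter`, `filtered` would contain both documents rather than only the one whose `source` is `"b"`.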

Co-authored-by: Davis Chase <130488702+dev2049@users.noreply.github.com>
1 year ago
jrhe 28091c2101
Use passed LLM for default chain in MultiPromptChain (#4418)
Currently, MultiPromptChain instantiates a ChatOpenAI LLM instance for
the default chain to use if none of the prompts passed match. This seems
like an error as it means that you can't use your choice of LLM, or
configure how to instantiate the default LLM (e.g. passing in an API key
that isn't in the usual env variable).
1 year ago
Davis Chase 5c8e12558d
Dev2049/pinecone try except (#4424)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bernie G <bernie.gandin2@gmail.com>
1 year ago
Rukmani 2b14036126
Update WhatsAppChatLoader to include the character ~ in the sender name (#4420)
Fixes #4153

If the sender of a message in a group chat isn't in your contact list,
they will appear with a ~ prefix in the exported chat. This PR adds
support for parsing such lines.
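
A minimal sketch of the idea, assuming an export line shaped like `1/23/23, 3:19 PM - ~ User 4: Hello`. The pattern and the `parse_line` helper are hypothetical simplifications for illustration, not the loader's actual regex; the relevant part is that `~` is allowed in the sender's character class.

```python
import re
from typing import Dict, Optional

# Hypothetical, simplified pattern: date, " - ", sender, ": ", message text.
LINE_RE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}\s?(?:AM|PM)?) - "
    r"(?P<sender>[~\w\s]+): (?P<text>.*)$"
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the date, sender and text of one exported line, or None if it does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


known = parse_line("1/23/23, 3:20 PM - Alice: Hi")
unknown = parse_line("1/23/23, 3:19 PM - ~ User 4: Hello there")
```

Without `~` in the sender class, the second line would fail to match and the message would be dropped.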
1 year ago
Zander Chase f2150285a4
Fix nested runs example ID (#4413)
#### Only reference example ID on the parent run

Previously, I was assigning the example ID to every child run. 
Adds a test.
1 year ago
Davis Chase e4ca511ec8
Delete comment (#4412) 1 year ago
mbchang 9fafe7b2b9
fix: remove unnecessary line of code (#4408)
Removes unnecessary line of code in
https://python.langchain.com/en/latest/use_cases/agent_simulations/two_agent_debate_tools.html
1 year ago
Aivin V. Solatorio 6335cb5b3a
Add support for Qdrant nested filter (#4354)
# Add support for Qdrant nested filter

This extends the filter functionality for the Qdrant vectorstore. The
current filter implementation is limited to a single-level metadata
structure; however, Qdrant supports nested metadata filtering. This
extends the functionality for users to maximize the filter functionality
when using Qdrant as the vectorstore.

Reference: https://qdrant.tech/documentation/filtering/#nested-key
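
Qdrant addresses nested payload fields with dot-separated keys. As a rough illustration of what nested filter support involves, the sketch below flattens a nested filter dict into such key paths. `flatten_filter` is a hypothetical helper, not the code from this PR, and it assumes filter values are either plain values or nested dicts.

```python
from typing import Any, Dict, Iterator, Tuple


def flatten_filter(filter: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_key, value) pairs for every leaf in a nested filter dict."""
    for key, value in filter.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested metadata, extending the dotted path.
            yield from flatten_filter(value, path)
        else:
            yield path, value


conditions = list(flatten_filter({"country": {"name": "Germany"}, "lang": "de"}))
```

Each resulting pair could then be turned into one match condition on the corresponding key.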

---------

Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>
1 year ago
Martin Holzhauer 872605a5c5
Add an option to extract more metadata from crawled websites (#4347)
This PR makes it possible to extract more metadata from websites for
later use.

My use case:
parsing ld+json or microdata from sites and storing it as structured data
in the metadata field.
1 year ago
Leonid Ganeline ce15ffae6a
added `Wikipedia` retriever (#4302)
- added `Wikipedia` retriever. It is effectively a wrapper for
`WikipediaAPIWrapper`. It wraps load() into get_relevant_documents()
- sorted `__all__` in the `retrievers/__init__`
- added integration tests for the WikipediaRetriever
- added an example (as Jupyter notebook) for the WikipediaRetriever
1 year ago
Davis Chase ea83eed9ba
Bump to version 0.0.163 (#4382) 1 year ago
Prayson Wilfred Daniel 2b4ba203f7
query correction from when to what (#4383)
# Minor Wording Documentation Change 

```python
agent_chain.run("When's my friend Eric's surname?")
# Answer with 'Zhu'
```

is changed to

```python
agent_chain.run("What's my friend Eric's surname?")
# Answer with 'Zhu'
```

I think "when" is a residual of the old query, which was "When's my friend
Eric's birthday?".
1 year ago
Eugene Yurtsev 2ceb807da2
Add PDF parser implementations (#4356)
# Add PDF parser implementations

This PR separates the data loading from the parsing for a number of
existing PDF loaders.

Parser tests have been designed to help encourage developers to create a
consistent interface for parsing PDFs.

This interface can be made more consistent in the future by adding
information into the initializer on desired behavior with respect to splitting by
page etc.

This code is expected to be backwards compatible -- with the exception
of a bug fix with pymupdf parser which was returning `bytes` in the page
content rather than strings.

Also changing the lazy parser method of document loader to return an
Iterator rather than Iterable over documents.
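
The two behavioural changes above (page content as `str` rather than `bytes`, and a lazy parser that returns an Iterator) can be illustrated with a generator. A minimal sketch, assuming pages arrive as UTF-8 bytes; `lazy_parse_pages` is a hypothetical name, not the parser API from this PR.

```python
from typing import Iterator, List


def lazy_parse_pages(pages: List[bytes]) -> Iterator[str]:
    """Yield one decoded page at a time; a generator is an Iterator, so it is consumed once."""
    for raw in pages:
        # Decode so that callers receive strings, not bytes.
        yield raw.decode("utf-8")


page_iter = lazy_parse_pages([b"first page", b"second page"])
first = next(page_iter)
rest = list(page_iter)
```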

1 year ago
Eugene Yurtsev ae0c3382dd
Add MimeType based parser (#4376)
# Add MimeType Based Parser

This PR adds a MimeType Based Parser. The parser inspects the mime-type
of the blob it is parsing and, based on that mime-type, delegates to the
matching sub-parser.
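
The delegation idea can be sketched with the standard library alone. This is an illustration under stated assumptions: `MimeTypeDispatcher` and its handler table are hypothetical names, and the mime-type is guessed from the file name via `mimetypes.guess_type` rather than taken from a blob object as in the PR.

```python
import mimetypes
from typing import Callable, Dict

Parser = Callable[[bytes], str]


class MimeTypeDispatcher:
    """Hypothetical dispatcher: pick a sub-parser by the guessed mime-type."""

    def __init__(self, handlers: Dict[str, Parser]) -> None:
        self.handlers = handlers

    def parse(self, path: str, data: bytes) -> str:
        # Guess the mime-type from the file name, then delegate to the handler.
        mime_type, _ = mimetypes.guess_type(path)
        if mime_type not in self.handlers:
            raise ValueError(f"Unsupported mime-type: {mime_type}")
        return self.handlers[mime_type](data)


dispatcher = MimeTypeDispatcher({"text/plain": lambda data: data.decode("utf-8")})
parsed = dispatcher.parse("notes.txt", b"hello")
```

Registering a further handler is then a matter of adding one entry to the table, which keeps each sub-parser independent of the dispatch logic.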

## Before submitting

Waiting on adding notebooks until more implementations are landed. 

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:


@hwchase17
@vowelparrot
1 year ago
Leonid Ganeline c485e7ab59
added GitHub star number (#4214)
added GitHub star number with a link to the `GitHub star history chart`
This is an interesting chart https://star-history.com/#hwchase17/langchain :)
1 year ago
Heath 0d568daacb
Update writer integration (#4363)
# Update Writer LLM integration

Changes the parameters and base URL to be in line with Writer's current
API.
Based on the documentation on this page:
https://dev.writer.com/reference/completions-1
1 year ago
BioErrorLog 04f765b838
Fix grammar in Text Splitters docs (#4373)
# Fix grammar in Text Splitters docs

Just a small fix of grammar in the documentation:

"That means there two different axes" -> "That means there are two
different axes"
1 year ago
Zander Chase c73cec5ac1
Add Example Notebook for LCP Client (#4207)
Add a notebook in the `experimental/` directory detailing:
- How to capture traces with the v2 endpoint
- How to create datasets
- How to run traces over the dataset
1 year ago
mbchang f1401a6dff
new example: two agent debate with tools (#4024) 1 year ago
玄猫 deffc65693
fix: vectorstore pgvector ensure compatibility #3884 (#4248)
Ensure compatibility with both SQLAlchemy v1/v2 

Fixes the issue when using SQLAlchemy v1 (reported in #3884):

`
langchain/vectorstores/pgvector.py", line 168, in
create_tables_if_not_exists
    self._conn.commit()
AttributeError: 'Connection' object has no attribute 'commit'
`

Ref Doc :
https://docs.sqlalchemy.org/en/14/changelog/migration_20.html#migration-20-autocommit
1 year ago
Davis Chase ba0057c077
Check OpenAI model kwargs (#4366)
Handle duplicate and incorrectly specified OpenAI params

Thanks @PawelFaron for the fix! Made small update

Closes #4331

---------

Co-authored-by: PawelFaron <42373772+PawelFaron@users.noreply.github.com>
Co-authored-by: Pawel Faron <ext-pawel.faron@vaisala.com>
1 year ago
Davis Chase 02ebb15c4a
Fix TextSplitter.from_tiktoken (#4361)
Thanks to @danb27 for the fix! Minor update

Fixes https://github.com/hwchase17/langchain/issues/4357

---------

Co-authored-by: Dan Bianchini <42096328+danb27@users.noreply.github.com>
1 year ago
Naveen Tatikonda 782df1db10
OpenSearch: Add Similarity Search with Score (#4089)
### Description
Add `similarity_search_with_score` method for OpenSearch to return
scores along with documents in the search results

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
1 year ago
Ankush Gola b3ecce0545
fix json saving, update docs to reference anthropic chat model (#4364)
Fixes https://github.com/hwchase17/langchain/issues/4085
1 year ago
ImmortalZ b04d84f6b3
fix: solve the infinite loop caused by 'add_memory' function when run… (#4318)
fix: solve the infinite loop caused by the 'add_memory' function when it
runs the 'pause_to_reflect' function

Run steps:
'add_memory' -> 'pause_to_reflect' -> 'add_memory': infinite loop
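
One common way to break a cycle like this is a re-entrancy flag. The sketch below is an assumption about the general technique, not the change made in this commit: `ToyMemory`, `reflection_threshold`, `aggregate_importance`, and `reflecting` are hypothetical names chosen for illustration.

```python
from typing import List


class ToyMemory:
    """Hypothetical memory object; not the actual class touched by this commit."""

    def __init__(self, reflection_threshold: float) -> None:
        self.reflection_threshold = reflection_threshold
        self.aggregate_importance = 0.0
        self.memories: List[str] = []
        self.reflecting = False  # guard flag against re-entrant reflection

    def pause_to_reflect(self) -> None:
        # Reflection stores an insight, which calls back into add_memory.
        self.add_memory("insight about recent memories", importance=1.0)

    def add_memory(self, text: str, importance: float) -> None:
        self.memories.append(text)
        self.aggregate_importance += importance
        # Only start reflecting when no reflection is already in progress.
        if self.aggregate_importance > self.reflection_threshold and not self.reflecting:
            self.reflecting = True
            try:
                self.pause_to_reflect()
                self.aggregate_importance = 0.0
            finally:
                self.reflecting = False


memory = ToyMemory(reflection_threshold=1.5)
memory.add_memory("a", importance=1.0)
memory.add_memory("b", importance=1.0)
```

Without the `reflecting` check, the nested `add_memory` call would trigger `pause_to_reflect` again and recurse without end.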
1 year ago
Eugene Yurtsev aa11f7c89b
Add progress bar to filesystemblob loader, update pytest config for unit tests (#4212)
This PR adds:

* Option to show a tqdm progress bar when using the file system blob loader
* Update pytest run configuration to be stricter
* Adding a new marker that checks that required pkgs exist
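
The package check behind such a marker can be sketched with `importlib`. This is a minimal illustration, not the pytest configuration from this PR: `missing_packages` is a hypothetical helper, and it assumes top-level package names only.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["json", "definitely_not_installed_pkg_xyz"])
```

A test hook could skip or fail a test whenever this list is non-empty for the packages the test declares.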
1 year ago
Eduard van Valkenburg f4c8502e61
fix for cosmos not loading old messages (#4094)
I noticed cosmos was not loading old messages properly, fixed now.
1 year ago
Simba Khadder d84df25466
Add example on how to use Featureform with langchain (#4337)
Added an example of how to use Featureform to
connecting_to_a_feature_store.ipynb.
1 year ago
Harrison Chase 42df78d396
bump ver 162 (#4346) 1 year ago
Zander Chase 8b284f9ad0
Pass parsed inputs through to tool _run (#4309) 1 year ago
Zander Chase 35c9e6ab40
Pass Callbacks through load_tools (#4298)
- Update the load_tools method to properly accept `callbacks` arguments.
- Add a deprecation warning when `callback_manager` is passed
- Add two unit tests to check the deprecation warning is raised and to
confirm the callback is passed through.
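
The deprecation pattern described above can be sketched as follows. `load_tools_sketch` is a hypothetical stand-in with a simplified signature and return value, not the real `load_tools`; only the `callbacks` and `callback_manager` argument names come from the description.

```python
import warnings
from typing import Any, Dict, List, Optional


def load_tools_sketch(
    tool_names: List[str], callbacks: Optional[Any] = None, **kwargs: Any
) -> Dict[str, Any]:
    """Accept `callbacks`, and warn when the deprecated `callback_manager` is passed."""
    if "callback_manager" in kwargs:
        warnings.warn(
            "callback_manager is deprecated. Please use callbacks instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        deprecated_value = kwargs.pop("callback_manager")
        # Fall back to the deprecated value only if `callbacks` was not given.
        if callbacks is None:
            callbacks = deprecated_value
    return {"tools": tool_names, "callbacks": callbacks}


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = load_tools_sketch(["search"], callback_manager="cm")
```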

Closes issue #4096
1 year ago
Zander Chase 0870a45a69
Add Pull Request Template (#4247) 1 year ago
Jinto Jose 8a338412fa
mongodb support for chat history (#4266) 1 year ago
Harrison Chase f510940bde
add check for lower bound of lark (#4287) 1 year ago
Harrison Chase c8b0b6e6c1
add youtube tools (#4320) 1 year ago
PawelFaron 1d1166ded6
Fixed huggingfacehub_api_token handling in HuggingFaceEndpoint (#4335)
Reported here:
https://github.com/hwchase17/langchain/issues/4334

---------

Co-authored-by: Pawel Faron <ext-pawel.faron@vaisala.com>
1 year ago
Arjun Aravindan 637c61cffb
Add support for passing binary_location to the SeleniumURLLoader when creating Chrome or Firefox web drivers (#4305)
This commit adds support for passing binary_location to the SeleniumURLLoader when creating Chrome or Firefox web drivers.

This allows users to specify the browser binary location, which is required when deploying to services such as Heroku.

This change also includes updated documentation and type hints to reflect the new binary_location parameter and its usage.

fixes #4304
1 year ago