Commit Graph

1921 Commits (textloader_autodetect_encodings)
 

Author SHA1 Message Date
Davis Chase ea83eed9ba
Bump to version 0.0.163 (#4382) 1 year ago
Prayson Wilfred Daniel 2b4ba203f7
query correction from when to what (#4383)
# Minor Wording Documentation Change 

```python
agent_chain.run("When's my friend Eric's surname?")
# Answer with 'Zhu'
```

is change to 

```python
agent_chain.run("What's my friend Eric's surname?")
# Answer with 'Zhu'
```

I think when is a residual of the old query that was "When’s my friends
Eric`s birthday?".
1 year ago
Eugene Yurtsev 2ceb807da2
Add PDF parser implementations (#4356)
# Add PDF parser implementations

This PR separates the data loading from the parsing for a number of
existing PDF loaders.

Parser tests have been designed to help encourage developers to create a
consistent interface for parsing PDFs.

This interface can be made more consistent in the future by adding
information into the initializer on desired behavior with respect to splitting by
page etc.

This code is expected to be backwards compatible -- with the exception
of a bug fix with pymupdf parser which was returning `bytes` in the page
content rather than strings.

Also changing the lazy parser method of document loader to return an
Iterator rather than Iterable over documents.

## Before submitting

<!-- If you're adding a new integration, include an integration test and
an example notebook showing its use! -->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

@

<!-- For a quicker response, figure out the right person to tag with @

        @hwchase17 - project lead

        Tracing / Callbacks
        - @agola11

        Async
        - @agola11

        DataLoader Abstractions
        - @eyurtsev

        LLM/Chat Wrappers
        - @hwchase17
        - @agola11

        Tools / Toolkits
        - @vowelparrot
 -->
1 year ago
Eugene Yurtsev ae0c3382dd
Add MimeType based parser (#4376)
# Add MimeType Based Parser

This PR adds a MimeType Based Parser. The parser inspects the mime-type
of the blob it is parsing and based on the mime-type can delegate to the sub
parser.

## Before submitting

Waiting on adding notebooks until more implementations are landed. 

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:


@hwchase17
@vowelparrot
1 year ago
Leonid Ganeline c485e7ab59
added GitHub star number (#4214)
added GitHub star number with a link to the `GitHub star history chart`
This is an interesting chart https://star-history.com/#hwchase17/langchain :)
1 year ago
Heath 0d568daacb
Update writer integration (#4363)
# Update Writer LLM integration

Changes the parameters and base URL to be in line with Writer's current
API.
Based on the documentation on this page:
https://dev.writer.com/reference/completions-1
1 year ago
BioErrorLog 04f765b838
Fix grammar in Text Splitters docs (#4373)
# Fix grammar in Text Splitters docs

Just a small fix of grammar in the documentation:

"That means there two different axes" -> "That means there are two
different axes"
1 year ago
Zander Chase c73cec5ac1
Add Example Notebook for LCP Client (#4207)
Add a notebook in the `experimental/` directory detailing:
- How to capture traces with the v2 endpoint
- How to create datasets
- How to run traces over the dataset
1 year ago
mbchang f1401a6dff
new example: two agent debate with tools (#4024) 1 year ago
玄猫 deffc65693
fix: vectorstore pgvector ensure compatibility #3884 (#4248)
Ensure compatibility with both SQLAlchemy v1/v2 

fix the issue when using SQLAlchemy v1 (reported at #3884)

`
langchain/vectorstores/pgvector.py", line 168, in
create_tables_if_not_exists
    self._conn.commit()
AttributeError: 'Connection' object has no attribute 'commit'
`

Ref Doc :
https://docs.sqlalchemy.org/en/14/changelog/migration_20.html#migration-20-autocommit
1 year ago
Davis Chase ba0057c077
Check OpenAI model kwargs (#4366)
Handle duplicate and incorrectly specified OpenAI params

Thanks @PawelFaron for the fix! Made small update

Closes #4331

---------

Co-authored-by: PawelFaron <42373772+PawelFaron@users.noreply.github.com>
Co-authored-by: Pawel Faron <ext-pawel.faron@vaisala.com>
1 year ago
Davis Chase 02ebb15c4a
Fix TextSplitter.from_tiktoken(#4361)
Thanks to @danb27 for the fix! Minor update

Fixes https://github.com/hwchase17/langchain/issues/4357

---------

Co-authored-by: Dan Bianchini <42096328+danb27@users.noreply.github.com>
1 year ago
Naveen Tatikonda 782df1db10
OpenSearch: Add Similarity Search with Score (#4089)
### Description
Add `similarity_search_with_score` method for OpenSearch to return
scores along with documents in the search results

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
1 year ago
Ankush Gola b3ecce0545
fix json saving, update docs to reference anthropic chat model (#4364)
Fixes # (issue)
https://github.com/hwchase17/langchain/issues/4085
1 year ago
ImmortalZ b04d84f6b3
fix: solve the infinite loop caused by 'add_memory' function when run… (#4318)
fix: solve the infinite loop caused by 'add_memory' function when run
'pause_to_reflect' function

run steps:
'add_memory' -> 'pause_to_reflect' -> 'add_memory':  infinite loop
1 year ago
Eugene Yurtsev aa11f7c89b
Add progress bar to filesystemblob loader, update pytest config for unit tests (#4212)
This PR adds:

* Option to show a tqdm progress bar when using the file system blob loader
* Update pytest run configuration to be stricter
* Adding a new marker that checks that required pkgs exist
1 year ago
Eduard van Valkenburg f4c8502e61
fix for cosmos not loading old messages (#4094)
I noticed cosmos was not loading old messages properly, fixed now.
1 year ago
Simba Khadder d84df25466
Add example on how to use Featureform with langchain (#4337)
Added an example on how to use Featureform to
connecting_to_a_feature_store.ipynb .
1 year ago
Harrison Chase 42df78d396
bump ver 162 (#4346) 1 year ago
Zander Chase 8b284f9ad0
Pass parsed inputs through to tool _run (#4309) 1 year ago
Zander Chase 35c9e6ab40
Pass Callbacks through load_tools (#4298)
- Update the load_tools method to properly accept `callbacks` arguments.
- Add a deprecation warning when `callback_manager` is passed
- Add two unit tests to check the deprecation warning is raised and to
confirm the callback is passed through.

Closes issue #4096
1 year ago
Zander Chase 0870a45a69
Add Pull Request Template (#4247) 1 year ago
Jinto Jose 8a338412fa
mongodb support for chat history (#4266) 1 year ago
Harrison Chase f510940bde
add check for lower bound of lark (#4287) 1 year ago
Harrison Chase c8b0b6e6c1
add youtube tools (#4320) 1 year ago
PawelFaron 1d1166ded6
Fixed huggingfacehub_api_token hadning in HuggingFaceEndpoint (#4335)
Reported here:
https://github.com/hwchase17/langchain/issues/4334

---------

Co-authored-by: Pawel Faron <ext-pawel.faron@vaisala.com>
1 year ago
Arjun Aravindan 637c61cffb
Add support for passing binary_location to the SeleniumURLLoader when creating Chrome or Firefox web drivers (#4305)
This commit adds support for passing binary_location to the SeleniumURLLoader when creating Chrome or Firefox web drivers.

This allows users to specify the Browser binary location which is required when deploying to services such as Heroku

This change also includes updated documentation and type hints to reflect the new binary_location parameter and its usage.

fixes #4304
1 year ago
Lior Neudorfer 65c95f9fb2
Better error when running chain without any args (#4294)
Today, when running a chain without any arguments, the raised ValueError
incorrectly specifies that user provided "both positional arguments and
keyword arguments".

This PR adds a more accurate error in that case.
1 year ago
Harrison Chase edcd171535
bring back ref (#4308) 1 year ago
Wuxian Zhang 6f386628c2
Permit unicode outputs when dumping json in GetElementsTool (#4276)
Adds ensure_ascii=False when dumping json in the GetElementsTool
Fixes issue https://github.com/hwchase17/langchain/issues/4265
1 year ago
Eugene Brodsky a1001b29eb
Incorrect docstring for PythonCodeTextSplitter (#4296)
Fixes a copy-paste error in the doctring
1 year ago
Ikko Eltociear Ashimine f70e18a5b3
Fix typo in huggingface.py (#4277)
enviroment -> environment
1 year ago
Eugene Yurtsev 0c646bb703
Minor clean up in BlobParser (#4210)
Minor clean up to use `abstractmethod` and `ABC` instead of `abc.abstractmethod` and `abc.ABC`.
1 year ago
PawelFaron 04b74d0446
Adjusted GPT4All llm to streaming API and added support for GPT4All_J (#4131)
Fix for these issues:
https://github.com/hwchase17/langchain/issues/4126

https://github.com/hwchase17/langchain/issues/3839#issuecomment-1534258559

---------

Co-authored-by: Pawel Faron <ext-pawel.faron@vaisala.com>
1 year ago
Harrison Chase 075d9631f5
bump ver to 161 (#4239) 1 year ago
Harrison Chase 64940e9d0f
docs for azure (#4238) 1 year ago
Myeongseop Kim 747b5f87c2
Add HumanInputLLM (#4160)
Related: #4028, I opened a new PR because (1) I was unable to unstage
mistakenly committed files (I'm not familiar with git enough to resolve
this issue), (2) I felt closing the original PR and opening a new PR
would be more appropriate if I changed the class name.

This PR creates HumanInputLLM(HumanLLM in #4028), a simple LLM wrapper
class that returns user input as the response. I also added a simple
Jupyter notebook regarding how and why to use this LLM wrapper. In the
notebook, I went over how to use this LLM wrapper and showed example of
testing `WikipediaQueryRun` using HumanInputLLM.
 
I believe this LLM wrapper will be useful especially for debugging,
educational or testing purpose.
1 year ago
Davis Chase 6cd51ef3d0
Simplify router chain constructor signatures (#4146) 1 year ago
玄猫 43a7a89e93
opt: document_loader notiondb to extract url (#4222) 1 year ago
Leonid Ganeline 9544b30821
added `Wikipedia` document loader (#4141)
- Added the `Wikipedia` document loader. It is based on the existing
`unilities/WikipediaAPIWrapper`
- Added a respective ut-s and example notebook
- Sorted list of classes in __init__
1 year ago
Eugene Yurtsev 423f497168
Add BlobParser abstraction (#3979)
This PR adds the BlobParser abstraction.

It follows the proposal described here:
https://github.com/hwchase17/langchain/pull/2833#issuecomment-1509097756
1 year ago
Davis Chase 5ca13cc1f0
Dev2049/pypdfium2 (#4209)
thanks @jerrytigerxu for the addition!

---------

Co-authored-by: Jere Xu <jtxu2008@gmail.com>
Co-authored-by: jerrytigerxu <jere.tiger.xu@gmailc.om>
1 year ago
Leonid Ganeline 59204a5033
docs: `document_loaders` improvements (#4200)
- made notebooks consistent: titles, service/format descriptions.
- corrected short names to full names, for example, `Word` -> `Microsoft
Word`
- added missed descriptions
- renamed notebook files to make ToC correctly sorted
1 year ago
Harrison Chase eeb7c96e0c
bump version to 160 (#4205) 1 year ago
Davis Chase f1fc4dfebc
Dev2049/obsidian patch (#4204)
thanks @shkarlsson for the fix! (just updated formatting)

---------

Co-authored-by: shkarlsson <sven.henrik.karlsson@gmail.com>
1 year ago
George 2324f19c85
Update qdrant interface (#3971)
Hello

1) Passing `embedding_function` as a callable seems to be outdated and
the common interface is to pass `Embeddings` instance

2) At the moment `Qdrant.add_texts` is designed to be used with
`embeddings.embed_query`, which is 1) slow 2) causes ambiguity due to 1.
It should be used with `embeddings.embed_documents`

This PR solves both problems and also provides some new tests
1 year ago
Harrison Chase 76ed41f48a
update docs (#4194) 1 year ago
Zander Chase 1017e5cee2
Add LCP Client (#4198)
Adding a client to fetch datasets, examples, and runs from a LCP
instance and run objects over them.
1 year ago
Zander Chase a30f42da4e
Update V2 Tracer (#4193)
- Update the RunCreate object to work with recent changes
- Add optional Example ID to the tracer
- Adjust default persist_session behavior to attempt to load the session
if it exists
- Raise more useful HTTP errors for logging
- Add unit testing
- Fix the default ID to be a UUID for v2 tracer sessions


Broken out from the big draft here:
https://github.com/hwchase17/langchain/pull/4061
1 year ago
Mike Wang c3044b1bf0
[test] Add integration_test for PandasAgent (#4056)
- confirm creation
- confirm functionality with a simple dimension check.

The test now is calling OpenAI API directly, but learning from
@vowelparrot that we’re caching the requests, so that it’s not that
expensive. I also found we’re calling OpenAI api in other integration
tests. Please lmk if there is any concern of real external API calls. I
can alternatively make a fake LLM for this test. Thanks
1 year ago