1888 Commits (parallel_dir_loader)
 

Author SHA1 Message Date
PawelFaron 04b74d0446
Adjusted GPT4All llm to streaming API and added support for GPT4All_J (#4131)
Fix for these issues:
https://github.com/hwchase17/langchain/issues/4126

https://github.com/hwchase17/langchain/issues/3839#issuecomment-1534258559

---------

Co-authored-by: Pawel Faron <ext-pawel.faron@vaisala.com>
1 year ago
Harrison Chase 075d9631f5
bump ver to 161 (#4239) 1 year ago
Harrison Chase 64940e9d0f
docs for azure (#4238) 1 year ago
Myeongseop Kim 747b5f87c2
Add HumanInputLLM (#4160)
Related: #4028. I opened a new PR because (1) I was unable to unstage
mistakenly committed files (I'm not familiar enough with git to resolve
this issue), and (2) I felt closing the original PR and opening a new PR
would be more appropriate if I changed the class name.

This PR creates HumanInputLLM (HumanLLM in #4028), a simple LLM wrapper
class that returns user input as the response. I also added a simple
Jupyter notebook on how and why to use this LLM wrapper. In the
notebook, I went over how to use this LLM wrapper and showed an example of
testing `WikipediaQueryRun` using HumanInputLLM.
 
I believe this LLM wrapper will be especially useful for debugging,
educational, or testing purposes.
1 year ago
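The idea of an LLM stand-in that returns whatever a person types can be sketched without any LangChain imports. The class name `HumanInputStub` and its `input_func` and `print_func` parameters are hypothetical, chosen only for illustration; the actual `HumanInputLLM` interface may differ.

```python
from typing import Callable, List


class HumanInputStub:
    """Hypothetical stand-in for an LLM: shows the prompt, returns typed text."""

    def __init__(
        self,
        input_func: Callable[[], str] = input,
        print_func: Callable[[str], None] = print,
    ) -> None:
        # Both callables are injectable so the stub can run without a terminal.
        self.input_func = input_func
        self.print_func = print_func

    def __call__(self, prompt: str) -> str:
        # Display the prompt, then return the human's answer as the "completion".
        self.print_func(prompt)
        return self.input_func()


# Scripted answers replace the keyboard here, which is what makes the
# approach convenient for tests.
answers = iter(["Paris"])
shown: List[str] = []
llm = HumanInputStub(input_func=lambda: next(answers), print_func=shown.append)
reply = llm("What is the capital of France?")
```

With the default arguments the stub would print the prompt and block on `input()`, which is the interactive debugging use the commit message describes.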
Davis Chase 6cd51ef3d0
Simplify router chain constructor signatures (#4146) 1 year ago
玄猫 43a7a89e93
opt: document_loader notiondb to extract url (#4222) 1 year ago
Leonid Ganeline 9544b30821
added `Wikipedia` document loader (#4141)
- Added the `Wikipedia` document loader. It is based on the existing
`utilities/WikipediaAPIWrapper`
- Added the respective unit tests and an example notebook
- Sorted the list of classes in __init__
1 year ago
Eugene Yurtsev 423f497168
Add BlobParser abstraction (#3979)
This PR adds the BlobParser abstraction.

It follows the proposal described here:
https://github.com/hwchase17/langchain/pull/2833#issuecomment-1509097756
1 year ago
Davis Chase 5ca13cc1f0
Dev2049/pypdfium2 (#4209)
thanks @jerrytigerxu for the addition!

---------

Co-authored-by: Jere Xu <jtxu2008@gmail.com>
Co-authored-by: jerrytigerxu <jere.tiger.xu@gmailc.om>
1 year ago
Leonid Ganeline 59204a5033
docs: `document_loaders` improvements (#4200)
- made notebooks consistent: titles, service/format descriptions.
- corrected short names to full names, for example, `Word` -> `Microsoft
Word`
- added missed descriptions
- renamed notebook files to make ToC correctly sorted
1 year ago
Harrison Chase eeb7c96e0c
bump version to 160 (#4205) 1 year ago
Davis Chase f1fc4dfebc
Dev2049/obsidian patch (#4204)
thanks @shkarlsson for the fix! (just updated formatting)

---------

Co-authored-by: shkarlsson <sven.henrik.karlsson@gmail.com>
1 year ago
George 2324f19c85
Update qdrant interface (#3971)
Hello

1) Passing `embedding_function` as a callable seems to be outdated; the
common interface is to pass an `Embeddings` instance

2) At the moment `Qdrant.add_texts` is designed to be used with
`embeddings.embed_query`, which is 1) slow 2) causes ambiguity due to 1.
It should be used with `embeddings.embed_documents`

This PR solves both problems and also provides some new tests
1 year ago
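The speed argument in the Qdrant commit above comes down to one batched call versus one call per text. A minimal sketch, assuming a hypothetical `CountingEmbeddings` class with toy one-dimensional vectors; only the method names `embed_query` and `embed_documents` come from the commit message:

```python
from typing import List


class CountingEmbeddings:
    """Hypothetical embeddings object that records how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def embed_query(self, text: str) -> List[float]:
        self.calls += 1
        return [float(len(text))]  # toy vector: just the text length

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        self.calls += 1  # one call covers the whole batch
        return [[float(len(t))] for t in texts]


texts = ["alpha", "beta", "gamma"]

# Per-text embedding: one call for every document.
per_text = CountingEmbeddings()
vectors_a = [per_text.embed_query(t) for t in texts]

# Batched embedding: a single call for all documents.
batched = CountingEmbeddings()
vectors_b = batched.embed_documents(texts)
```

In this toy both paths return the same vectors, but the batched path makes one call instead of three. With a remote embedding service each call would typically be a network round trip.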
Harrison Chase 76ed41f48a
update docs (#4194) 1 year ago
Zander Chase 1017e5cee2
Add LCP Client (#4198)
Adding a client to fetch datasets, examples, and runs from an LCP
instance and run objects over them.
1 year ago
Zander Chase a30f42da4e
Update V2 Tracer (#4193)
- Update the RunCreate object to work with recent changes
- Add optional Example ID to the tracer
- Adjust default persist_session behavior to attempt to load the session
if it exists
- Raise more useful HTTP errors for logging
- Add unit testing
- Fix the default ID to be a UUID for v2 tracer sessions


Broken out from the big draft here:
https://github.com/hwchase17/langchain/pull/4061
1 year ago
Mike Wang c3044b1bf0
[test] Add integration_test for PandasAgent (#4056)
- confirm creation
- confirm functionality with a simple dimension check.

The test currently calls the OpenAI API directly, but I learned from
@vowelparrot that we’re caching the requests, so it’s not that
expensive. I also found we’re calling the OpenAI API in other integration
tests. Please lmk if there is any concern about real external API calls. I
can alternatively make a fake LLM for this test. Thanks
1 year ago
Aivin V. Solatorio 6567b73e1a
JSON loader (#4067)
This implements a loader of text passages in JSON format. The `jq`
syntax is used to define a schema for accessing the relevant contents
from the JSON file. This requires dependency on the `jq` package:
https://pypi.org/project/jq/.

---------

Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>
1 year ago
PawelFaron bb6d97c18c
Fixed the example code (#4117)
Fixed the issue mentioned here:

https://github.com/hwchase17/langchain/issues/3799#issuecomment-1534785861

Co-authored-by: Pawel Faron <ext-pawel.faron@vaisala.com>
1 year ago
Anurag 19e28d8784
feat: Allow users to pass additional arguments to the WebDriver (#4121)
This commit adds support for passing additional arguments to the
`SeleniumURLLoader` when creating Chrome or Firefox web drivers.
Previously, only a few arguments such as `headless` could be passed in.
With this change, users can pass any additional arguments they need as a
list of strings using the `arguments` parameter.

The `arguments` parameter allows users to configure the driver with any
options that are available for that particular browser. For example,
users can now pass custom `user_agent` strings or `proxy` settings using
this parameter.

This change also includes updated documentation and type hints to
reflect the new `arguments` parameter and its usage.

fixes #4120
1 year ago
hp0404 2a3c5f8353
Update WhatsAppChatLoader regex to handle multiple date-time formats (#4186)
This PR updates the `message_line_regex` used by `WhatsAppChatLoader` to
support different date-time formats used in WhatsApp chat exports;
resolves #4153.

The new regex handles the following input formats:
```terminal
[05.05.23, 15:48:11] James: Hi here
[11/8/21, 9:41:32 AM] User name: Message 123
1/23/23, 3:19 AM - User 2: Bye!
1/23/23, 3:22_AM - User 1: And let me know if anything changes
```

Tests have been added to verify that the loader works correctly with all
formats.
1 year ago
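To make the four input formats concrete, here is a minimal sketch of one regular expression that accepts all of them. The pattern `MESSAGE_LINE` and its group names are written for this illustration and are assumptions; this is not the `message_line_regex` shipped in `WhatsAppChatLoader`.

```python
import re

# Illustrative pattern only; group names are assumptions of this sketch.
MESSAGE_LINE = re.compile(
    r"""
    \[?                                        # optional opening bracket
    (?P<date>\d{1,2}[./]\d{1,2}[./]\d{2,4})    # 05.05.23 or 11/8/21
    ,\s
    (?P<time>\d{1,2}:\d{2}(?::\d{2})?          # 15:48:11 or 3:19
        (?:[\s_](?:AM|PM))?)                   # optional AM/PM marker
    \]?                                        # optional closing bracket
    (?:\s-)?\s                                 # separator with or without a dash
    (?P<sender>[^:]+?)                         # sender name, up to the colon
    :\s
    (?P<text>.+)                               # message body
    """,
    re.VERBOSE,
)

samples = [
    "[05.05.23, 15:48:11] James: Hi here",
    "[11/8/21, 9:41:32 AM] User name: Message 123",
    "1/23/23, 3:19 AM - User 2: Bye!",
    "1/23/23, 3:22_AM - User 1: And let me know if anything changes",
]

# Each sample line is split into date, time, sender, and text.
parsed = [MESSAGE_LINE.match(line).groupdict() for line in samples]
```

The brackets, the seconds field, the AM/PM marker, and the dash are all optional in the pattern, which is how a single expression can cover both the bracketed and the dash-separated export styles.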
Nicolas a57259ec83
docs: Mendable Fixes and Improvements (#4184)
Overall fixes and improvements.
1 year ago
Harrison Chase 7dcc698ebf
bump version to 159 (#4183) 1 year ago
Harrison Chase 26534457f5
simplify csv args (#4182) 1 year ago
Eduard van Valkenburg 3095546851
PowerBI fix for table names with spaces (#4170)
small fix to make sure a table name with spaces is passed correctly to
the API for the schema lookup.
1 year ago
obbiondo b1e2e29222
fix: remove expand parameter from ConfluenceLoader by label (#4181)
`expand` is not an allowed parameter for the method
`confluence.get_all_pages_by_label`, since it doesn't return the body of
the text but just the metadata of documents

Co-authored-by: Andrea Biondo <a.biondo@reply.it>
1 year ago
Zander Chase 84cfa76e00
Update Cohere Reranker (#4180)
The forward ref annotations don't get updated if we only import with
type checking

---------

Co-authored-by: Abhinav Verma <abhinav_win12@yahoo.co.in>
1 year ago
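The general mechanism behind the forward-reference problem can be shown with the standard library alone. This is a sketch of the underlying Python behaviour, not of the Cohere reranker code; `rerank`, `Client`, and `can_resolve` are hypothetical names. A string annotation can only be resolved if the name exists in the namespace used for the lookup, and a name imported only under `typing.TYPE_CHECKING` does not exist at runtime.

```python
from typing import get_type_hints


def rerank(client: "Client") -> None:
    """`Client` is given as a string, i.e. as a forward reference."""


class Client:
    """Hypothetical stand-in for a class imported only under TYPE_CHECKING."""


def can_resolve(namespace: dict) -> bool:
    """Report whether the annotations of `rerank` resolve in `namespace`."""
    try:
        get_type_hints(rerank, globalns=namespace)
        return True
    except NameError:
        return False


# An empty namespace models "imported only for the type checker".
without_import = can_resolve({})
# A namespace containing the class models a real runtime import.
with_import = can_resolve({"Client": Client})
```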
Davis Chase d84bb02881
Add Chroma self query (#4149)
Add internal query language -> chroma metadata filter translator
1 year ago
Vinoo Ganesh 905a2114d7
Fix: Typo in Docs (#4179)
Fixing small typo in docs
1 year ago
Ankush Gola 8de1b4c4c2
Revert "fix: #4128 missing run_manager parameter" (#4159)
Reverts hwchase17/langchain#4130
1 year ago
Chakib Ben Ziane 878d0c8155
fix: #4128 missing run_manager parameter (#4130)
`run_manager` was not being passed downstream. Not sure if this was a
deliberate choice but it seems like it broke many agent callbacks like
`agent_action` and `agent_finish`. This fix needs a proper review.

Co-authored-by: blob42 <spike@w530>
1 year ago
Zander Chase 6032a051e9
Add Tenant ID to V2 Tracer (#4135)
Update the V2 tracer to
- use UUIDs instead of ints
- load a tenant ID and use that when saving sessions
1 year ago
Zander Chase fea639c1fc
Vwp/sqlalchemy (#4145)
Bump threshold to 1.4 from 1.3. Change import to be compatible

Resolves #4142 and #4129

---------

Co-authored-by: ndaugreal <ndaugreal@gmail.com>
Co-authored-by: Jeremy Lopez <lopez86@users.noreply.github.com>
1 year ago
Zander Chase 2f087d63af
Fix Python RePL Tool (#4137)
Filter out kwargs from inferred schema when determining if a tool is
single input.

Add a couple unit tests.

Move tool unit tests to the tools dir
1 year ago
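The single-input check described in the Python REPL tool commit above can be approximated with `inspect.signature`. This is a sketch under assumptions: the real change works on an inferred schema, whereas the hypothetical helper `is_single_input` below inspects the function signature directly.

```python
import inspect
from typing import Callable


def is_single_input(func: Callable) -> bool:
    """Return True if `func` takes exactly one argument, ignoring **kwargs."""
    params = inspect.signature(func).parameters.values()
    # Drop the catch-all **kwargs entry before counting.
    named = [p for p in params if p.kind is not inspect.Parameter.VAR_KEYWORD]
    return len(named) == 1


def run_code(code: str, **kwargs) -> str:
    """One real input plus a catch-all: still counts as single input."""
    return code


def search(query: str, limit: int) -> str:
    """Two real inputs: not single input."""
    return query


single = is_single_input(run_code)
multi = is_single_input(search)
```

Without the filter, `run_code` would be counted as having two parameters and wrongly treated as a multi-input tool.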
Zander Chase cc068f1b77
Add Issue Templates (#4021)
Add issue templates for
- bug reports
- feature suggestions
- documentation
and a link to the discord for general discussion.

Open to other suggestions here. Could also add another "Other" template
with just a raw text box if we think this is too restrictive


1 year ago
Zander Chase ac0a9d02bd
Visual Studio Code/Github Codespaces Dev Containers (#4035) (#4122)
Having dev containers makes it easier, faster, and more secure to set up the
dev environment for the repository.

The pull request consists of:

- .devcontainer folder with:
    - **devcontainer.json:** (minimal necessary vscode extensions and
      settings)
    - **docker-compose.yaml:** (could be modified to run necessary services
      as needed, e.g. vectordbs, databases)
    - **Dockerfile:** (non-root with dev tools)
- Changes to README:
    - added the Open in Github Codespaces Badge
    - added the Open in dev container Badge

Co-authored-by: Jinto Jose <129657162+jj701@users.noreply.github.com>
1 year ago
Harrison Chase d86ed15d88
bump version to 158 (#4091) 1 year ago
OlajideOgun 624554a43a
DeepLake: Pass in rest of args to self._search_helper (#4080)
As of right now, when trying to use functions like
`max_marginal_relevance_search()` or
`max_marginal_relevance_search_by_vector()`, the rest of the kwargs are
not propagated to `self._search_helper()`. For example, a user cannot
explicitly state the `distance_metric` they want to use when calling
`max_marginal_relevance_search`.
1 year ago
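The difference between dropping and forwarding keyword arguments can be sketched as follows. `SketchStore`, its two search methods, and the `"L2"` default are hypothetical; only `_search_helper` and `distance_metric` are names taken from the commit message, and the real DeepLake signatures may differ.

```python
from typing import Any, Dict


class SketchStore:
    """Hypothetical vector store illustrating keyword-argument forwarding."""

    def _search_helper(
        self, query: str, distance_metric: str = "L2", **kwargs: Any
    ) -> Dict[str, Any]:
        # Echo back the options that actually reached the helper.
        return {"query": query, "distance_metric": distance_metric, **kwargs}

    def search_dropping_kwargs(self, query: str, **kwargs: Any) -> Dict[str, Any]:
        # Extra options are accepted but never reach the helper.
        return self._search_helper(query)

    def search_forwarding_kwargs(self, query: str, **kwargs: Any) -> Dict[str, Any]:
        # Extra options are passed through unchanged.
        return self._search_helper(query, **kwargs)


store = SketchStore()
dropped = store.search_dropping_kwargs("cats", distance_metric="cos")
forwarded = store.search_forwarding_kwargs("cats", distance_metric="cos")
```

In the dropping variant the caller's `distance_metric` is silently replaced by the helper's default, which matches the symptom the commit describes.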
Eduard van Valkenburg 6d84541ff9
fix base url (#4095)
Noticed a mistake in the base url and group vs non-group urls
1 year ago
Harrison Chase a9c2450330
Harrison/toml loader (#4090)
Co-authored-by: Mika Ayenson <Mikaayenson@users.noreply.github.com>
1 year ago
Harrison Chase d4cf1eb60a
Add firestore memory (#3792) (#3941)
If you have any other suggestions or feedback, please let me know.

---------

Co-authored-by: yakigac <10434946+yakigac@users.noreply.github.com>
1 year ago
Harrison Chase fba6921b50
Harrison/one drive loader (#4081)
Co-authored-by: José Ferraz Neto <netoferraz@gmail.com>
1 year ago
golergka bd277b5327
feat: prune summary buffer (#4004)
If the library user has to decrease the `max_token_limit`, they would
probably want to prune the summary buffer even though they haven't added
any new messages.

Personally, I need it because I want to serialise the memory buffer object
and save it to a database, and when I load it, I may have re-configured my
code to have a shorter memory to save on tokens.
1 year ago
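The use case above, pruning after the limit was lowered without adding a message, can be sketched with a toy buffer. Everything here is an assumption made for illustration: `SketchSummaryBuffer` is hypothetical, tokens are counted as whitespace-separated words, and pruned messages are simply moved to a list where the real memory class would summarise them with an LLM.

```python
from typing import List


class SketchSummaryBuffer:
    """Hypothetical buffer: old messages are set aside when over the limit."""

    def __init__(self, max_token_limit: int) -> None:
        self.max_token_limit = max_token_limit
        self.messages: List[str] = []
        self.summary: List[str] = []  # stand-in for an LLM-written summary

    def _tokens(self) -> int:
        # Assumption: one whitespace-separated word counts as one token.
        return sum(len(m.split()) for m in self.messages)

    def prune(self) -> None:
        """Move the oldest messages into the summary until under the limit."""
        while self.messages and self._tokens() > self.max_token_limit:
            self.summary.append(self.messages.pop(0))

    def add(self, message: str) -> None:
        self.messages.append(message)
        self.prune()


buffer = SketchSummaryBuffer(max_token_limit=10)
buffer.add("hello there friend")  # 3 tokens
buffer.add("how are you today")   # 4 tokens, 7 in total

# Simulate reloading with a tighter budget, then prune without adding anything.
buffer.max_token_limit = 4
buffer.prune()
```

Exposing `prune()` as its own method is what lets a caller shrink the buffer right after loading it, rather than waiting for the next message to trigger the check.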
AndreLCanada bf726f9d8a
Update python_repl docs (#4012)
In the example for creating a Python REPL tool under the Agent module,
the ".run" was omitted in the example. I believe this is required when
defining a Tool.
1 year ago
Mike Wang 67db495fcf
[agent] Add Spark Agent (#4020)
- added support for spark through pyspark library.
- added jupyter notebook as example.
1 year ago
Gengliang Wang 8af25867cb
Simplify HumanMessages in the quick start guide (#4026)
In the section `Get Message Completions from a Chat Model` of the quick
start guide, the HumanMessage doesn't need to include `Translate this
sentence from English to French.` when there is a system message.

Simplifying the HumanMessages in these examples can further demonstrate the
power of LLMs.
1 year ago
Harrison Chase 087a4bd2b8
improve agent documentation (#4062) 1 year ago
rogerserper b1446bea5f
google-serper: async + full json results + support for Google Images, Places and News (#4078)
* implemented arun, results, and aresults. Reuses aiosession if
available.
* helper tools GoogleSerperRun and GoogleSerperResults
* support for Google Images, Places and News (examples given) and
filtering based on time (e.g. past hour)
* updated docs
1 year ago
mbchang cdea47491d
refactor: refactor dialogue examples (DialogueAgent, DialogueSimulator) (#4074)
refactor dialogue examples to have same DialogueAgent and
DialogueSimulator definitions
1 year ago
Jan Philipp Harries 657f5f259f
Added option to reduce verbosity of Deeplake integration (#4038)
The deeplake integration was/is very verbose (see e.g. [the
documentation
example](https://python.langchain.com/en/latest/use_cases/code/code-analysis-deeplake.html))
when loading or creating a deeplake dataset, with only limited options to
dial down verbosity.

Additionally, the warning that a "Deep Lake Dataset already exists" was
confusing, as there is as far as I can tell no other way to load a
dataset.

This small PR changes that and introduces an explicit `verbose` argument
which is also passed to the deeplake library.

There should be minimal changes to the default output (the loading line
is printed instead of warned to make it consistent with `ds.summary()`,
which also prints).
1 year ago