Commit Graph

2279 Commits (0ad76c3380055bfbd518348c3ec47548b9868d76)
 

Author SHA1 Message Date
Taras Tsugrii 0ad76c3380
Replace loop appends with list comprehension. (#5528)
# Replace loop appends with list comprehension.

It's significantly faster because it avoids repeated method lookup. It's
also more idiomatic and readable.
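
A minimal before/after sketch of the pattern (hypothetical helper, not the exact code touched here):

```python
def strip_lines_loop(lines):
    results = []
    for line in lines:
        results.append(line.strip())  # repeated lookup of results.append on every iteration
    return results

def strip_lines_comprehension(lines):
    # Single expression: no explicit append, faster and easier to read.
    return [line.strip() for line in lines]
```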
1 year ago
Timothy Ji bd9e0f3934
Add param requests_kwargs for WebBaseLoader (#5485)
# Add param `requests_kwargs` for WebBaseLoader

Fixes #5483
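
A minimal usage sketch, assuming `requests_kwargs` is exposed as a `WebBaseLoader` attribute and forwarded to the underlying HTTP request:

```python
from langchain.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://example.com")
# e.g. disable SSL verification or tune other request options
loader.requests_kwargs = {"verify": False}
docs = loader.load()
```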

## Who can review?

@eyurtsev
1 year ago
Taras Tsugrii 359fb8fa3a
Replace list comprehension with generator. (#5526)
# Replace list comprehension with generator.

Since these strings can be fairly long, it's best to not construct
unnecessary temporary list just to pass it to `join`. Generators produce
items one-by-one and even though they are slightly more expensive than
lists in terms of CPU they are much more memory-friendly and slightly
more readable.
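
Illustrative sketch of the change (hypothetical data, not the exact call site):

```python
docs = ["alpha " * 1000, "beta " * 1000, "gamma " * 1000]

# Before: a temporary list is built just to hand it to join.
combined = "\n".join([d.strip() for d in docs])

# After: a generator expression yields items one at a time to join.
combined = "\n".join(d.strip() for d in docs)
```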
1 year ago
Matt Robinson 4c8aad0d1b
docs: unstructured no longer requires installing detectron2 from source (#5524)
# Update Unstructured docs to remove the `detectron2` install
instructions

Removes `detectron2` installation instructions from the Unstructured
docs because installing `detectron2` is no longer required for
`unstructured>=0.7.0`. The `detectron2` model now runs using the ONNX
runtime.

## Who can review?

@hwchase17 
@eyurtsev
1 year ago
Rithwik Ediga Lakhamsani d765d77e9b
Add minor fixes for PySpark Document Loader Docs (#5525)
# Add minor fixes for PySpark Document Loader Docs

Renamed "PySpack" to "PySpark" and executed the notebook to show
outputs.
1 year ago
Taras Tsugrii af41cdfc8b
Replace enumerate with zip. (#5527)
# Replace enumerate with zip.

It's more idiomatic and slightly more readable.
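
Illustrative sketch (hypothetical data, not the exact call site):

```python
texts = ["a", "b", "c"]
metadatas = [{"id": 0}, {"id": 1}, {"id": 2}]

# Before: enumerate only to index into the second sequence.
pairs = [(text, metadatas[i]) for i, text in enumerate(texts)]

# After: zip pairs the two sequences directly.
pairs = [(text, meta) for text, meta in zip(texts, metadatas)]
```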
1 year ago
James O'Dwyer 226a7521ed
Add Managed Motorhead (#5507)
# Add Managed Motorhead
This change enables MotorheadMemory to use Metal's managed version of
Motorhead. We can easily enable this by passing in an `api_key` and
`client_id` in order to hit the managed url and access the memory api on
Metal.
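
A rough usage sketch, assuming the managed service is selected whenever both credentials are supplied (placeholder values):

```python
from langchain.memory import MotorheadMemory

memory = MotorheadMemory(
    session_id="chat-session-1",  # hypothetical session identifier
    api_key="METAL_API_KEY",      # placeholder Metal credential
    client_id="METAL_CLIENT_ID",  # placeholder Metal credential
)
```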

Twitter: [@softboyjimbo](https://twitter.com/softboyjimbo)

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

 @dev2049 @hwchase17

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Piyush Jain 5ffa924488
Skips creating boto client for Bedrock if passed in constructor (#5523)
# Skips creating boto client if passed in constructor
Current LLM and Embeddings class always creates a new boto client, even
if one is passed in a constructor. This blocks certain users from
passing in externally created boto clients, for example in SSO
authentication.
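
A minimal sketch of the intended usage, assuming the wrapper accepts a pre-built client via its `client` field (profile and service names are placeholders):

```python
import boto3
from langchain.llms import Bedrock

# Client created externally, e.g. from an SSO-authenticated session.
session = boto3.Session(profile_name="my-sso-profile")
bedrock_client = session.client("bedrock")  # service name may differ by SDK version

llm = Bedrock(model_id="amazon.titan-tg1-large", client=bedrock_client)
```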

## Who can review?
@hwchase17 
@jasondotparse 
@rsgrewal-aws

1 year ago
Leonid Ganeline 6b47aaab82
added DeepLearning.AI course link (#5518)
# added DeepLearning.AI course link


## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:


 not @hwchase17 - hehe
1 year ago
Víctor Navarro Aránguiz f39340ff6b
Add allow_download as attribute for GPT4All (#5512)
# Added support for downloading the GPT4All model if it does not exist

I've included the class attribute `allow_download` in the GPT4All class.
By default, `allow_download` is set to False.

## Changes Made
- Added a new attribute `allow_download` to the GPT4All class.
- Updated the `validate_environment` method to pass the `allow_download`
parameter to the GPT4All model constructor.

## Context
This change provides more control over model downloading in the GPT4All
class. Previously, if the model file was not found in the cache
directory `~/.cache/gpt4all/`, the package returned the error "Failed to
retrieve model (type=value_error)". Now, if `allow_download` is set to
True, the wrapper uses the GPT4All package to download it. With the
addition of the `allow_download` attribute, users can choose whether the
wrapper is allowed to download the model or not.
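
A short sketch, with a placeholder model filename:

```python
from langchain.llms import GPT4All

# With allow_download=True, the GPT4All package fetches the model into
# ~/.cache/gpt4all/ if it is not already present.
llm = GPT4All(model="ggml-gpt4all-j-v1.3-groovy.bin", allow_download=True)
```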

## Dependencies
There are no new dependencies introduced by this change. It only
utilizes existing functionality provided by the GPT4All package.

## Testing
Since this is a minor change to the existing behavior, the existing test
suite for the GPT4All package should cover this scenario

Co-authored-by: Vokturz <victornavarrrokp47@gmail.com>
1 year ago
Zander Chase ea09c0846f
Add Feedback Methods + Evaluation examples (#5166)
Add CRUD methods to interact with feedback endpoints + added eval
examples to the notebook
1 year ago
Davis Chase 46b7181f13
bump 187 (#5504) 1 year ago
Harrison Chase f0ea77b230
add more vars to text splitter (#5503) 1 year ago
Piyush Jain 562fdfc8f9
Bedrock llm and embeddings (#5464)
# Bedrock LLM and Embeddings
This PR adds a new LLM and an Embeddings class for the
[Bedrock](https://aws.amazon.com/bedrock) service. The PR also includes
example notebooks for using the LLM class in a conversation chain and
embeddings usage in creating an embedding for a query and document.

**Note**: AWS is doing a private release of the Bedrock service on
05/31/2023; users need to request access and be added to an allowlist in
order to start using the Bedrock models and embeddings. Please use the
[Bedrock Home Page](https://aws.amazon.com/bedrock) to request access
and to learn more about the models available in Bedrock.
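
A brief usage sketch of the two new classes, with placeholder model ids:

```python
from langchain.llms import Bedrock
from langchain.embeddings import BedrockEmbeddings

llm = Bedrock(model_id="amazon.titan-tg1-large")
embeddings = BedrockEmbeddings(model_id="amazon.titan-e1t-medium")

# Embed a query and generate a completion (requires allowlisted AWS credentials).
query_vector = embeddings.embed_query("What is Amazon Bedrock?")
answer = llm("Summarize what Amazon Bedrock offers.")
```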

1 year ago
Harrison Chase 5ce74b5958
code splitter docs (#5480)
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Harrison Chase 470b2822a3
Add matching engine vectorstore (#3350)
Co-authored-by: Tom Piaggio <tomaspiaggio@google.com>
Co-authored-by: scafati98 <jupyter@matchingengine.us-central1-a.c.scafati-joonix.internal>
Co-authored-by: scafati98 <scafatieugenio@gmail.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Kacper Łukawski 8bcaca435a
Feature: Qdrant filters supports (#5446)
# Support Qdrant filters

Qdrant has an [extensive filtering
system](https://qdrant.tech/documentation/concepts/filtering/) with rich
type support. This PR makes it possible to use the filters in Langchain
by passing an additional param to both the
`similarity_search_with_score` and `similarity_search` methods.
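
A minimal sketch, assuming an in-memory collection and Qdrant's filter models from `qdrant_client` (field names are placeholders):

```python
from langchain.embeddings import FakeEmbeddings
from langchain.vectorstores import Qdrant
from qdrant_client.http import models as rest

qdrant = Qdrant.from_texts(
    ["page one text", "page two text"],
    FakeEmbeddings(size=4),
    metadatas=[{"page": 1}, {"page": 2}],
    location=":memory:",
)

# Only return documents whose metadata.page payload equals 1.
page_filter = rest.Filter(
    must=[rest.FieldCondition(key="metadata.page", match=rest.MatchValue(value=1))]
)
docs = qdrant.similarity_search("page", filter=page_filter)
```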

## Who can review?

@dev2049 @hwchase17

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Harrison Chase f72bb966f8
Harrison/html splitter (#5468)
Co-authored-by: David Revillas <26328973+r3v1@users.noreply.github.com>
1 year ago
Ankush Gola 1671c2afb2
py tracer fixes (#5377) 1 year ago
Jose Ignacio Hervás Díaz ce8b7a2a69
SQLite-backed Entity Memory (#5129)
# SQLite-backed Entity Memory

Following the initiative of
https://github.com/hwchase17/langchain/pull/2397, I think it would be
helpful to be able to persist Entity Memory on disk by default.
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Jeff Vestal 46e181aa8b
Allow ElasticsearchEmbeddings to create a connection with ES Client object (#5321)
This PR adds a new method `from_es_connection` to the
`ElasticsearchEmbeddings` class allowing users to use Elasticsearch
clusters outside of Elastic Cloud.

Users can create an Elasticsearch Client object and pass that to the new
function.
The returned object is identical to the one returned by calling
`from_credentials`.

```
from elasticsearch import Elasticsearch

from langchain.embeddings import ElasticsearchEmbeddings

# Model ID of a text-embedding model deployed in the cluster (placeholder)
model_id = "your_model_id"

# Create Elasticsearch connection
es_connection = Elasticsearch(
    hosts=["https://es_cluster_url:port"],
    basic_auth=("user", "password"),
)

# Instantiate ElasticsearchEmbeddings using es_connection
embeddings = ElasticsearchEmbeddings.from_es_connection(
    model_id,
    es_connection,
)
```

I also added examples to the elasticsearch jupyter notebook.

Fixes https://github.com/hwchase17/langchain/issues/5239

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Mark Pors 0a44bfdca3
Allow for async use of SelfAskWithSearchChain (#5394)
# Allow for async use of SelfAskWithSearchChain


Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Víctor Navarro Aránguiz 8121e04200
added n_threads functionality for gpt4all (#5427)
# Added support for modifying the number of threads in the GPT4All model

I have added the capability to modify the number of threads used by the
GPT4All model. This allows users to adjust the model's parallel
processing capabilities based on their specific requirements.

## Changes Made
- Updated the `validate_environment` method to set the number of threads
for the GPT4All model using the `values["n_threads"]` parameter from the
`GPT4All` class constructor.

## Context
Useful in scenarios where users want to optimize the model's performance
by leveraging multi-threading capabilities.
Please note that the `n_threads` parameter was included in the `GPT4All`
class constructor but was previously unused. This change ensures that
the specified number of threads is utilized by the model .
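
A short sketch, with a placeholder model filename:

```python
from langchain.llms import GPT4All

# Cap the underlying model at 8 CPU threads.
llm = GPT4All(model="ggml-gpt4all-j-v1.3-groovy.bin", n_threads=8)
```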

## Dependencies
There are no new dependencies introduced by this change. It only
utilizes existing functionality provided by the GPT4All package.

## Testing
Since this is a minor change, testing is not required.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Blithe e31705b5ab
convert the parameter 'text' to uppercase in the function 'parse' of the class BooleanOutputParser (#5397)
When the LLM outputs 'yes' or 'no' in lowercase, BooleanOutputParser can
now parse it to True/False, fixing the ValueError in parse().
(Previously, when BooleanOutputParser was used in chain_filter.py and the
LLM returned lowercase 'yes'/'no', the parse function would throw a
ValueError.)

Fixes #5396 (https://github.com/hwchase17/langchain/issues/5396)
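
A small sketch of the intended behaviour after the fix, assuming the parser's default `true_val="YES"` / `false_val="NO"`:

```python
from langchain.output_parsers.boolean import BooleanOutputParser

parser = BooleanOutputParser()
print(parser.parse("yes"))  # True  (lowercase no longer raises ValueError)
print(parser.parse("No"))   # False
```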

---------

Co-authored-by: gaofeng27692 <gaofeng27692@hundsun.com>
1 year ago
Natalie 199cc700a3
Ability to specify credentials when using Google BigQuery as a data loader (#5466)
# Adds ability to specify credentials when using Google BigQuery as a
data loader

Fixes #5465. Adds the ability to set credentials, which must be of the
`google.auth.credentials.Credentials` type. This argument is optional
and will default to `None`.
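
A minimal sketch with placeholder project/table names, assuming a service-account file is used:

```python
from google.oauth2 import service_account
from langchain.document_loaders import BigQueryLoader

credentials = service_account.Credentials.from_service_account_file(
    "path/to/service-account.json"  # placeholder path
)
loader = BigQueryLoader(
    query="SELECT title, body FROM `my-project.my_dataset.my_table`",
    project="my-project",
    credentials=credentials,
)
docs = loader.load()
```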

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Harrison Chase eab4b4ccd7
add simple test for imports (#5461)
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Janos Tolgyesi 1111f18eb4
Add maximal relevance search to SKLearnVectorStore (#5430)
# Add maximal relevance search to SKLearnVectorStore

This PR implements the maximum relevance search in SKLearnVectorStore. 

Twitter handle: jtolgyesi (I submitted also the original implementation
of SKLearnVectorStore)

## Before submitting

Unit tests are included.

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Ayan Bandyopadhyay 8181f9e362
Update psychicapi version (#5471)
Update [psychicapi](https://pypi.org/project/psychicapi/) python package
dependency to the latest version 0.5. The newest python package version
addresses breaking changes in the Psychic http api.
1 year ago
Kacper Łukawski f93d256190
Feat: Add batching to Qdrant (#5443)
# Add batching to Qdrant

Several people requested a batching mechanism while uploading data to
Qdrant. It is important, as there are limits on the maximum size of the
request payload, and without batching implemented in Langchain, users
need to implement it on their own. This PR exposes a new optional
`batch_size` parameter, so all the documents/texts are loaded in batches
of the expected size (64, by default).
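
A brief sketch with an in-memory collection (embedding class and sizes are placeholders):

```python
from langchain.embeddings import FakeEmbeddings
from langchain.vectorstores import Qdrant

texts = [f"document {i}" for i in range(1_000)]
# Texts are embedded and uploaded in chunks of 128 instead of one large request.
qdrant = Qdrant.from_texts(
    texts, FakeEmbeddings(size=4), location=":memory:", batch_size=128
)
```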

The integration tests of Qdrant are extended to cover two cases:
1. Documents are sent in separate batches.
2. All the documents are sent in a single request.
1 year ago
Camille Van Hoffelen 80e133f16d
Added async _acall to FakeListLLM (#5439)
# Added Async _acall to FakeListLLM

FakeListLLM is handy when unit testing apps built with langchain. This
allows the use of FakeListLLM inside concurrent code with
[asyncio](https://docs.python.org/3/library/asyncio.html).

I also updated the docstring, which was out of date.
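
A small usage sketch inside async code:

```python
import asyncio

from langchain.llms import FakeListLLM

llm = FakeListLLM(responses=["first canned reply", "second canned reply"])

async def main():
    # agenerate runs the fake LLM without blocking the event loop.
    result = await llm.agenerate(["prompt one", "prompt two"])
    print([gens[0].text for gens in result.generations])

asyncio.run(main())
```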

## Who can review?

@hwchase17 - project lead
@agola11 - async
1 year ago
Leonid Ganeline 1f11f80641
docs: cleaning (#5413)
# docs cleaning

Changed docs to consistent format (probably, we need an official doc
integration template):
- ClearML - added product descriptions; changed title/headers
- Rebuff  - added product descriptions; changed title/headers
- WhyLabs  - added product descriptions; changed title/headers
- Docugami - changed title/headers/structure
- Airbyte - fixed title
- Wolfram Alpha - added descriptions, fixed title
- OpenWeatherMap - added product descriptions; changed title/headers
- Unstructured - changed description

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

@hwchase17
@dev2049
1 year ago
Matt Wells 1d861dc37a
MRKL output parser no longer breaks well formed queries (#5432)
# Handles the edge scenario in which the action input is a well formed
SQL query which ends with a quoted column

There may be a cleaner option here (or indeed other edge scenarios), but
this seems to robustly determine whether the action input is likely to be
a well-formed SQL query in which we don't want to arbitrarily trim off `"`
characters.

Fixes #5423

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Agents / Tools / Toolkits
  - @vowelparrot
1 year ago
Yoann Poupart c1807d8408
`encoding_kwargs` for InstructEmbeddings (#5450)
# What does this PR do?

Bring support of `encode_kwargs` for ` HuggingFaceInstructEmbeddings`,
change the docstring example and add a test to illustrate with
`normalize_embeddings`.

Fixes #3605
(Similar to #3914)

Use case:
```python
from langchain.embeddings import HuggingFaceInstructEmbeddings

model_name = "hkunlp/instructor-large"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': True}
hf = HuggingFaceInstructEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)
```
1 year ago
Patrick Keane e09afb4b44
Removes duplicated call from langchain/client/langchain.py (#5449)
This removes duplicate code presumably introduced by a cut-and-paste
error, spotted while reviewing the code in
```langchain/client/langchain.py```. The original code had back to back
occurrences of the following code block:

```
        response = self._get(
            path,
            params=params,
        )
        raise_for_status_with_text(response)
```
1 year ago
Jan Brinkmann 0d3a9d481f
Fixed docstring in faiss.py for load_local (#5440)
# Fix for docstring in faiss.py vectorstore (load_local)

The docstring should reflect that load_local loads something FROM the
disk.
1 year ago
Davis Chase 4379bd4cbb
bump 186 (#5459) 1 year ago
Davis Chase 2649b638dd
fix (#5457) 1 year ago
Davis Chase 64b4165c8d
bump 185 (#5442) 1 year ago
ByronHsu 9d658aaa5a
Add more code splitters (go, rst, js, java, cpp, scala, ruby, php, swift, rust) (#5171)
As the title says, I added more code splitters.
The implementation is trivial, so I didn't add separate tests for each
splitter.
Let me know if there are any concerns.
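
A short sketch of how the new presets are used, assuming they are exposed through the `Language` enum and `from_language`:

```python
from langchain.text_splitter import Language, RecursiveCharacterTextSplitter

go_code = """
package main

import "fmt"

func main() {
    fmt.Println("hello")
}
"""

go_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.GO, chunk_size=60, chunk_overlap=0
)
docs = go_splitter.create_documents([go_code])
```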

Fixes #5170 (https://github.com/hwchase17/langchain/issues/5170)

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
@eyurtsev @hwchase17

---------

Signed-off-by: byhsu <byhsu@linkedin.com>
Co-authored-by: byhsu <byhsu@linkedin.com>
1 year ago
Paul-Emile Brotons a61b7f7e7c
adding MongoDBAtlasVectorSearch (#5338)
# Add MongoDBAtlasVectorSearch for the python library

Fixes #5337
---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Harrison Chase c4b502a470
Harrison/condense q llm (#5438) 1 year ago
Lei Xu ee57054d05
Rename and fix typo in lancedb (#5425)
# Fix typo in LanceDB notebook filename
1 year ago
Zander Chase 26ff18575c
Set old LCTracer to default to port 8000 (#5381)
Issue from:
https://discord.com/channels/1038097195422978059/1069478035918688346/1112445980466483222
1 year ago
Harrison Chase 760632b292
Harrison/spark reader (#5405)
Co-authored-by: Rithwik Ediga Lakhamsani <rithwik.ediga@databricks.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
UmerHA 8259f9b7fa
DocumentLoader for GitHub (#5408)
# Creates GitHubLoader (#5257)

GitHubLoader is a DocumentLoader that loads issues and PRs from GitHub.

Fixes #5257

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
German Martin 0b3e0dd1d2
New Trello document loader (#4767)
# Added New Trello loader class and documentation

A simple loader built on top of the py-trello wrapper.
Given a board name, you can pull cards and tweak some field parameters on
the load operation.
I included documentation and examples.
Included unit test cases using patch and a fixture for the py-trello
client class.
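
A rough usage sketch, assuming the new class is exposed as `TrelloLoader` and credentials come from the environment (board name and filter are placeholders):

```python
from langchain.document_loaders import TrelloLoader

# Pull open cards from the named board; in this sketch the API key/token are
# read from TRELLO_API_KEY / TRELLO_TOKEN environment variables.
loader = TrelloLoader.from_credentials(
    board_name="Onboarding",
    card_filter="open",
)
docs = loader.load()
```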

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Harrison Chase 72f99ff953
Harrison/text splitter (#5417)
adds support for keeping separators around when using recursive text
splitter
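
A minimal sketch of the new option:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", " "],
    chunk_size=40,
    chunk_overlap=0,
    keep_separator=True,  # separators stay attached to the chunks instead of being dropped
)
chunks = splitter.split_text("First paragraph.\n\nSecond paragraph.\n\nThird paragraph.")
```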
1 year ago
小铭 cf5803e44c
Add ToolException that a tool can throw. (#5050)
# Add ToolException that a tool can throw
This is an optional exception that tool throws when execution error
occurs.
When this exception is thrown, the agent will not stop working,but will
handle the exception according to the handle_tool_error variable of the
tool,and the processing result will be returned to the agent as
observation,and printed in pink on the console.It can be used like this:
```python 
from langchain.schema import ToolException
from langchain import SerpAPIWrapper
from langchain.agents import AgentType, initialize_agent
from langchain.chat_models import ChatOpenAI
from langchain.tools import Tool

llm = ChatOpenAI(temperature=0)

class Error_tool:
    def run(self, s: str):
        # Raising ToolException signals a tool-level failure without stopping the agent.
        raise ToolException('The current search tool is not available.')

def handle_tool_error(error) -> str:
    # The returned string is fed back to the agent as the tool's observation.
    return "The following errors occurred during tool execution:" + str(error)

search_tool1 = Error_tool()
search_tool2 = SerpAPIWrapper()
tools = [
    Tool.from_function(
        func=search_tool1.run,
        name="Search_tool1",
        description="useful for when you need to answer questions about current events. You should give priority to using it.",
        handle_tool_error=handle_tool_error,
    ),
    Tool.from_function(
        func=search_tool2.run,
        name="Search_tool2",
        description="useful for when you need to answer questions about current events",
        return_direct=True,
    )
]
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True,
                         handle_tool_errors=handle_tool_error)
agent.run("Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?")
```

![image](https://github.com/hwchase17/langchain/assets/32786500/51930410-b26e-4f85-a1e1-e6a6fb450ada)

## Who can review?
- @vowelparrot

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Harrison Chase cce731c3c2
bump version 184 (#5407) 1 year ago
Harrison Chase 2da8c48be1
Harrison/datetime parser (#4693)
Co-authored-by: Jacob Valdez <jacobfv@msn.com>
Co-authored-by: Jacob Valdez <jacob.valdez@limboid.ai>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
1 year ago