Commit Graph

781 Commits

Author SHA1 Message Date
Lorenzo
00a7c31ffd
Fix: Nested Dicts Handling of Document Metadata (#9880)
## Description
When the `MultiQueryRetriever` is used to get the list of documents
relevant according to a query, inside a vector store, and at least one
of these contain metadata with nested dictionaries, a `TypeError:
unhashable type: 'dict'` exception is thrown.
This is caused by the `unique_union` function which, to guarantee the
uniqueness of the returned documents, tries, unsuccessfully, to hash the
nested dictionaries and use them as a part of key.
```python
unique_documents_dict = {
    (doc.page_content, tuple(sorted(doc.metadata.items()))): doc
    for doc in documents
}
```

## Issue
#9872 (MultiQueryRetriever (get_relevant_documents) raises TypeError:
unhashable type: 'dict' with dic metadata)

## Solution
A possible solution is to dump the metadata dict to a string and use it
as a part of hashed key.
```python
unique_documents_dict = {
    (doc.page_content, json.dumps(doc.metadata, sort_keys=True)): doc
    for doc in documents
}
```

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-09-03 15:27:46 -07:00
Davide Menini
b8baead70c
fix (Html2TextTransformer): allow configuration of html2text (#9914)
Hi, this PR enables configuring the html2text package, instead of being
bound to use the hardcoded values. While simply passing `ignore_links`
and `ignore_images` to the `transform_documents` method was possible, I
preferred passing them to the `__init__` method for 2 reasons:

1. It is more efficient in case of subsequent calls to
`transform_documents`.
2. It allows to move the "complexity" to the instantiation, keeping the
actual execution simple and general enough. IMO the transformers should
all follow this pattern, allowing something like this:
```python
# Instantiate transformers
transformers = [
    TransformerA(foo='bar'),
    TransformerB(bar='foo'),
    # others
]

# During execution, call them sequentially
documents = ...
for tr in transformers:
    documents = tr.transform_documents(documents)
```

Thanks for the reviews!

---------

Co-authored-by: taamedag <Davide.Menini@swisscom.com>
2023-09-03 15:10:25 -07:00
Frédéric Lepied
4dc47bd3ac
time_weighted_retriever: use a timestamp if needed (#9906)
If last_accessed_at metadata is a float use it as a timestamp. This
allows to support vector stores that do not store datetime objects like
ChromaDb.

Fixes: https://github.com/langchain-ai/langchain/issues/3685

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. These live is docs/extras
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17, @rlancemartin.
 -->
2023-09-03 15:05:30 -07:00
Josh White
bc8cceebf7
Extend DynamoDBChatMessageHistory to support composite keys (#9896)
- Description: Adds two optional parameters to the
DynamoDBChatMessageHistory class to enable users to pass in a name for
their PrimaryKey, or a Key object itself to enable the use of composite
keys, a common DynamoDB paradigm.
  
[AWS DynamoDB Key
docs](https://aws.amazon.com/blogs/database/choosing-the-right-dynamodb-partition-key/)
  
  - Issue: N/A
  - Dependencies: N/A
  - Twitter handle: N/A

---------

Co-authored-by: Josh White <josh@ctrlstack.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-09-03 15:05:16 -07:00
Programmers Emperor
872d829201
Update __init__.py (#9955)
Add SQLDatabaseSequentialChain Class to __init__.py so it can be
accessed and used

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
- Description: SQLDatabaseSequentialChain is not found when importing
Langchain_experimental package, when I open __init__.py
Langchain_expermental.sql, I found that SQLDatabaseSequentialChain is
imported and add to __all__ list
- Issue: SQLDatabaseSequentialChain is not found in
Langchain_experimental package
  - Dependencies: None,
  - Tag maintainer: None,
  - Twitter handle: None,

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. These live is docs/extras
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17, @rlancemartin.
 -->
2023-09-03 15:02:58 -07:00
Lucas Rodrigues Pereira
5c7afe8aae
Fix json parsing error of MULTI_PROMPT_ROUTER_TEMPLATE (#9944)
The output at times lacks the closing markdown code block. The prompt is
changed to explicitly request the closing backticks.

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. These live is docs/extras
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17, @rlancemartin.
 -->
2023-09-03 15:00:50 -07:00
Lance Martin
387813bfb2
Sort by most recent chatIDs (#9946)
When we `lazy_load` iMessage chats, return chats w/ most recent msg
first (matches what is visualized in app).
2023-09-03 15:00:20 -07:00
German Martin
cf5a50469f
TextGen is missing async methods. (#9986)
Adding _acall and _astream method that were missing. Preventing
streaming during async executions.

 @rlancemartin.
2023-09-03 14:57:40 -07:00
Blake (Yung Cher Ho)
f4bed8a04c
Takeoff baseurl support (#10091)
## Description
This PR introduces a minor change to the TitanTakeoff integration. 
Instead of specifying a port on localhost, this PR will allow users to
specify a baseURL instead. This will allow users to use the integration
if they have TitanTakeoff deployed externally (not on localhost). This
removes the hardcoded reference to localhost "http://localhost:{port}".

### Info about Titan Takeoff
Titan Takeoff is an inference server created by
[TitanML](https://www.titanml.co/) that allows you to deploy large
language models locally on your hardware in a single command. Most
generative model architectures are included, such as Falcon, Llama 2,
GPT2, T5 and many more.

Read more about Titan Takeoff here:
-
[Blog](https://medium.com/@TitanML/introducing-titan-takeoff-6c30e55a8e1e)
- [Docs](https://docs.titanml.co/docs/titan-takeoff/getting-started)

### Dependencies
No new dependencies are introduced. However, users will need to install
the titan-iris package in their local environment and start the Titan
Takeoff inferencing server in order to use the Titan Takeoff
integration.

Thanks for your help and please let me know if you have any questions.
cc: @hwchase17 @baskaryan

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-09-03 14:45:59 -07:00
Eddie Cohen
565c021730
Add ne comparator (#10006)
Description: Adds the not comparator and operator to pinecone, chroma
and deeplake.
Issue: Not a registered issue but when using a selfqueryretriever with
pinecone I got this error + stacktrace when I entered a query that asked
to not include specific data:
 
>  raised following `error:`
> Received unrecognized function ne. Valid functions are [<Operator.AND:
'and'>, <Operator.OR: 'or'>, <Operator.NOT: 'not'>, <Comparator.EQ:
'eq'>, <Comparator.GT: 'gt'>, <Comparator.GTE: 'gte'>, <Comparator.LT:
'lt'>, <Comparator.LTE: 'lte'>]

I noticed that chroma and deeplake also support not equals/not filtering
so I added it there as well



[pinecone](https://docs.pinecone.io/docs/metadata-filtering#metadata-query-language)
[chroma](https://docs.trychroma.com/usage-guide#filtering-by-metadata)

[deeplake](https://docs.activeloop.ai/enterprise-features/compute-engine/querying-datasets/query-syntax#and-or-not)
2023-09-03 14:45:11 -07:00
Leonid Ganeline
2221194450
Yahoo Finance News tool (#10014)
Added:
- the `Yahoo Finance News` tool
- Ut-s
- An example
2023-09-03 14:43:57 -07:00
Lars von Wedel
6d82503eb1
Add parser and loader for Azure document intelligence service. (#10136)
Hi,

this PR contains loader / parser for Azure Document intelligence which
is a ML-based service to ingest arbitrary PDFs / images, even if
scanned. The loader generates Documents by pages of the original
document. This is my first contribution to LangChain.

Unfortunately I could not find the correct place for test cases. Happy
to add one if you can point me to the location, but as this is a
cloud-based service, a test would require network access and credentials
- so might be of limited help.

Dependencies: The needed dependency was already part of pyproject.toml,
no change.
Twitter: feel free to mention @LarsAC on the announcement
2023-09-03 14:25:39 -07:00
Harrison Chase
4abe85be57
Harrison/string inplace (#10153)
Co-authored-by: Wrick Talukdar <wrick.talukdar@gmail.com>
Co-authored-by: Anjan Biswas <anjanavb@amazon.com>
Co-authored-by: Jha <nikjha@amazon.com>
Co-authored-by: Lucky-Lance <77819606+Lucky-Lance@users.noreply.github.com>
Co-authored-by: 陆徐东 <luxudong@MacBook-Pro.local>
2023-09-03 14:25:29 -07:00
Harrison Chase
f5af756397
fake messages list model (#10152)
create a fake chat model that you can configure with list of messages
2023-09-03 13:49:43 -07:00
Harrison Chase
9e6cc7b236
make hub push public by default (#10138) 2023-09-03 13:04:58 -07:00
Bagatur
0e4c5dd176
bump 13 (#10130) 2023-09-02 10:22:31 -07:00
Bagatur
42582adb66
bump 280 (#10117) 2023-09-01 17:43:14 -07:00
Bagatur
9e196cb470
rm sqlite3 import (#10115) 2023-09-01 17:14:06 -07:00
Arpan Pokharel
f8bca156d4
Add where filter in weaviate similarity search with score (#9978)
- Description: Add where filter in weaviate similarity search with score
  - Issue: #9853 
  - Dependencies: -
  - Tag maintainer: -
  - Twitter handle: -
2023-09-01 16:09:19 -07:00
Leonid Kuligin
30239b3025
added support for inference from Model Garden (#9367)
#8850

---------

Co-authored-by: Leonid Kuligin <kuligin@google.com>
2023-09-01 15:58:21 -07:00
Benjamin Matson
58d7d86e51
feat: add bedrock chat model (#8017)
Replace this comment with:
  - Description: Add Bedrock implementation of Anthropic Claude for Chat
  - Tag maintainer: @hwchase17, @baskaryan
  - Twitter handle: @bwmatson

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-09-01 13:16:57 -07:00
Massimiliano Pronesti
a7c9bd30d4
feat(llms): add missing params to huggingface text-generation (#9724)
This small PR aims at supporting the following missing parameters in the
`HuggingfaceTextGen` LLM:
- `return_full_text` - sometimes useful for completion tasks
- `do_sample` - quite handy to control the randomness of the model.
- `watermark`

@hwchase17 @baskaryan

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-09-01 13:16:27 -07:00
KyrianC
491089754d
EdenAI LLM update. Add models name option (#8963)
This PR follows the **Eden AI (LLM + embeddings) integration**. #8633 

We added an optional parameter to choose different AI models for
providers (like 'text-bison' for provider 'google', 'text-davinci-003'
for provider 'openai', etc.).

Usage:

```python
llm = EdenAI(
    feature="text",
    provider="google",
    params={
        "model": "text-bison",  # new
        "temperature": 0.2,
        "max_tokens": 250,
    },
)

```

You can also change the provider + model after initialization
```python
llm = EdenAI(
    feature="text",
    provider="google",
    params={
        "temperature": 0.2,
        "max_tokens": 250,
    },
)

prompt = """
hi 
"""

llm(prompt, providers='openai', model='text-davinci-003')  # change provider & model
```

The jupyter notebook as been updated with an example well.


Ping: @hwchase17, @baskaryan

---------

Co-authored-by: RedhaWassim <rwasssim@gmail.com>
Co-authored-by: sam <melaine.samy@gmail.com>
2023-09-01 12:11:33 -07:00
maks-operlejn-ds
b5a74fb973
Temporarily remove language selection (#10097)
Adapting Microsoft Presidio to other languages requires a bit more work,
so for now it will be good idea to remove the language option to choose,
so as not to cause errors and confusion.
https://microsoft.github.io/presidio/analyzer/languages/

I will handle different languages after the weekend 😄
2023-09-01 11:30:48 -07:00
Bagatur
71c418725f
index rename delete_mode -> cleanup (#10103) 2023-09-01 11:12:10 -07:00
Nuno Campos
427f696fb0
Nc/runnables seqmap tags (#9753) 2023-09-01 18:53:10 +01:00
Harrison Chase
d7bf7dc412
add repr for not serializable (#10071)
Co-authored-by: Nuno Campos <nuno@boringbits.io>
2023-09-01 09:18:32 -07:00
Bagatur
355ff09cce
bump 279 (#10098) 2023-09-01 08:49:26 -07:00
Pihplipe Oegr
3dafbd852e
Add sqlite-vss as a vector database (#10047)
This adds sqlite-vss as an option for a vector database. Contains the
code and a few tests. Tests are passing and the library sqlite-vss is
added as optional as explained in the contributing guidelines. I
adjusted the code for lint/black/ and mypy. It looks that everything is
currently passing.

Adding sqlite-vss was mentioned in this issue:
https://github.com/langchain-ai/langchain/issues/1019.
Also mentioned here in the sqlite-vss repo for the curious:
https://github.com/asg017/sqlite-vss/issues/66

Maintainer tag: @baskaryan

---------

Co-authored-by: Philippe Oger <philippe.oger@adevinta.com>
2023-09-01 08:36:34 -07:00
KyrianC
c7a5504789
Add EdenAI Tools (#9764)
This PR follows the Eden AI (LLM + embeddings) integration. #8633

We added different Tools to empower agents with new capabilities :

- text: explicit content detection

- image: explicit content detection

- image: object detection

- OCR: invoice parsing

- OCR: ID parsing

- audio: speech to text

- audio: text to speech

 
We plan to add more in the future (like translation, language detection,
+ others).


Usage:

```python
llm=EdenAI(feature="text",provider="openai", params={"temperature" : 0.2,"max_tokens" : 250})

tools = [
    EdenAiTextModerationTool(providers=["openai"],language="en"),
    EdenAiObjectDetectionTool(providers=["google","api4ai"]),
    EdenAiTextToSpeechTool(providers=["amazon"],language="en",voice="MALE"),
    EdenAiExplicitImageTool(providers=["amazon","google"]),
    EdenAiSpeechToTextTool(providers=["amazon"]),
    EdenAiParsingIDTool(providers=["amazon","klippa"],language="en"),
    EdenAiParsingInvoiceTool(providers=["amazon","google"],language="en"),
]

agent_chain = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
    return_intermediate_steps=True,
)

result = agent_chain(""" i have this text : 'i want to slap you' 
                   first : i want to know if this text contains explicit content or not .
                   second : if it does contain explicit content i want to know what is the explicit content in this text, 
                   third : i want to make the text into speech .
                   if there is URL in the observations , you will always put it in the output (final answer) .
                   """)
```

output: 
>  Entering new AgentExecutor chain...
> I need to extract the information from the ID and then convert it to
text and then to speech
> Action: edenai_identity_parsing
> Action Input:
"https://www.citizencard.com/images/citizencard-uk-id-card-2023.jpg"
> Observation: last_name : 
>   value : ANGELA
> given_names : 
>   value : GREENE
> birth_place : 
> birth_date : 
>   value : 2000-11-09
> issuance_date : 
> expire_date : 
> document_id : 
> issuing_state : 
> address : 
> age : 
> country : 
> document_type : 
>   value : DRIVER LICENSE FRONT
> gender : 
> image_id : 
> image_signature : 
> mrz : 
> nationality : 
> Thought: I now need to convert the information to text and then to
speech
> Action: edenai_text_to_speech
> Action Input: "Welcome Angela Greene!"
> Observation:
https://d14uq1pz7dzsdq.cloudfront.net/0c494819-0bbc-4433-bfa4-6e99bd9747ea_.mp3?Expires=1693316851&Signature=YcMoVQgPuIMEOuSpFuvhkFM8JoBMSoGMcZb7MVWdqw7JEf5~67q9dEI90o5todE5mYXB5zSYoib6rGrmfBl4Rn5~yqDwZ~Tmc24K75zpQZIEyt5~ZSnHuXy4IFWGmlIVuGYVGMGKxTGNeCRNUXDhT6TXGZlr4mwa79Ei1YT7KcNyc1dsTrYB96LphnsqOERx4X9J9XriSwxn70X8oUPFfQmLcitr-syDhiwd9Wdpg6J5yHAJjf657u7Z1lFTBMoXGBuw1VYmyno-3TAiPeUcVlQXPueJ-ymZXmwaITmGOfH7HipZngZBziofRAFdhMYbIjYhegu5jS7TxHwRuox32A__&Key-Pair-Id=K1F55BTI9AHGIK
> Thought: I now know the final answer
> Final Answer:
https://d14uq1pz7dzsdq.cloudfront.net/0c494819-0bbc-4433-bfa4-6e99bd9747ea_.mp3?Expires=1693316851&Signature=YcMoVQgPuIMEOuSpFuvhkFM8JoBMSoGMcZb7MVWdqw7JEf5~67q9dEI90o5todE5mYXB5zSYoib6rGrmfBl4Rn5~yqDwZ~Tmc24K75zpQZIEyt5~ZSnHuXy4IFWGmlIVuGYVGMGKxTGNeCRNUXDhT6TXGZlr4mwa79Ei1YT7KcNyc1dsTrYB96LphnsqOERx4X9J9XriSwxn70X8oUPFfQmLcitr-syDhiwd9Wdpg6J5y
> 
>  Finished chain.

Other examples are available in the jupyter notebook.


This PR is made in parallel with  EdenAI LLM update #8963 
I apologize for the messy PR. While working in implementing Tools we
realized there was a few problems we needed to fix on LLM as well.

Ping: @hwchase17, @baskaryan

---------

Co-authored-by: RedhaWassim <rwasssim@gmail.com>
2023-09-01 08:26:56 -07:00
Nuno Campos
5569385ee1 Lint 2023-09-01 15:53:54 +01:00
Nuno Campos
e17275ee57 Add root run wrapping call to RunnableEach() 2023-09-01 15:51:29 +01:00
Nuno Campos
63306899a2 PR review suggestions 2023-09-01 15:50:04 +01:00
Nuno Campos
7966af1e9c Lint 2023-09-01 15:50:04 +01:00
Nuno Campos
4c0e1e501c Re-implement retry, adding a root run, and implement return_exception for batch() and abatch() 2023-09-01 15:50:04 +01:00
Nuno Campos
0eba80912f Lint 2023-09-01 15:49:31 +01:00
Nuno Campos
af2e4ce2cd Use a non-inheritable tag 2023-09-01 15:49:31 +01:00
Nuno Campos
85088dc5df Lint 2023-09-01 15:49:31 +01:00
Nuno Campos
4eecf90f33 Lint 2023-09-01 15:49:31 +01:00
Nuno Campos
2242e2160f Lint 2023-09-01 15:49:31 +01:00
Nuno Campos
b2ac835466 Add .with_retry() to Runnables 2023-09-01 15:49:31 +01:00
Nuno Campos
81ebcc161e Lint 2023-09-01 15:46:53 +01:00
Nuno Campos
fc42726ea0 Styling 2023-09-01 15:32:43 +01:00
Nuno Campos
897f791940 Remove run_id from patch 2023-09-01 15:32:37 +01:00
William Fu-Hinthorn
4d7cd6db5f add cm 2023-09-01 15:32:37 +01:00
Nuno Campos
f9a845b382 Lint 2023-09-01 15:31:08 +01:00
Nuno Campos
06e89c1caa Lint 2023-09-01 15:31:08 +01:00
Nuno Campos
738d93215d Allow patching run_name and max_concurrency 2023-09-01 15:31:08 +01:00
Nuno Campos
9a07032055 Lint 2023-09-01 15:31:08 +01:00
Nuno Campos
5426712311 Adjust merge logic 2023-09-01 15:31:08 +01:00
Nuno Campos
f95bd0bcd9 Fix issue 2023-09-01 15:31:08 +01:00
Nuno Campos
f69155b4f7 Add run_id, run_name to RunnableConfig 2023-09-01 15:31:08 +01:00
Nuno Campos
a3c69cf41d Add .with_config() method to Runnables which allows binding any config values to a Runnable 2023-09-01 15:31:08 +01:00
jmhayes3
324c86acd5
fix typo in web_research.py (#10076)
fix spelling
2023-08-31 22:19:03 -07:00
Davide Menini
3f8f3de28e
fix (parsers/json): do not escape double quotes if already escaped (#9916)
This PR fixes an issues I found when upgrading to a more recent version
of Langchain. I was using 0.0.142 before, and this issue popped up
already when the `_custom_parser` was added to `output_parsers/json`.

Anyway, the issue is that the parser tries to escape quotes when they
are double-escaped (e.g. `\\"`), leading to OutputParserException.
This is particularly undesired in my app, because I have an Agent that
uses a single input Tool, which expects as input a JSON string with the
structure:
```python
{
    "foo": string,
    "bar": string
}
```
The LLM (GPT3.5) response is (almost) always something like
`"action_input": "{\\"foo\\": \\"bar\\", \\"bar\\": \\"foo\\"}"` and
since the upgrade this is not correctly parsed.

---------

Co-authored-by: taamedag <Davide.Menini@swisscom.com>
2023-08-31 17:11:52 -07:00
Harrison Chase
566ce06f4a
add async support for tools (#10058) 2023-08-31 16:52:05 -07:00
Jiří Moravčík
86646ec555
feat: Add ApifyWrapper class (#10067)
If you look at documentation
https://python.langchain.com/docs/integrations/tools/apify (or the
actual file
https://github.com/langchain-ai/langchain/blob/master/docs/extras/integrations/tools/apify.ipynb
), there's a class `ApifyWrapper` mentioned. It seems it got lost in
some refactoring, i.e. it does not exist in the codebase ATM.

I just propose to add it back.
It would fix issues e.g.
https://github.com/langchain-ai/langchain/issues/8307 or
https://github.com/langchain-ai/langchain/issues/8201

To add, Apify is a wanted integration, e.g. see
https://twitter.com/hwchase17/status/1695490295914545626 or
https://twitter.com/hwchase17/status/1695470765343461756

Lastly, I offer taking ownership of the Apify-related parts of the
codebase, so you can tag me if anything is needed.
2023-08-31 15:47:44 -07:00
Robert Perrotta
02e51f4217
update_forward_refs for Run (#9969)
Adds a call to Pydantic's `update_forward_refs` for the `Run` class (in
addition to the `ChainRun` and `ToolRun` classes, for which that method
is already called). Without it, the self-reference of child classes
(type `List[Run]`) is problematic. For example:

```python
from langchain.callbacks import StdOutCallbackHandler
from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from wandb.integration.langchain import WandbTracer

llm = OpenAI()
prompt = PromptTemplate.from_template("1 + {number} = ")

chain = LLMChain(llm=llm, prompt=prompt, callbacks=[StdOutCallbackHandler(), WandbTracer()])
print(chain.run(number=2))

```

results in the following output before the change

```
WARNING:root:Error in on_chain_start callback: field "child_runs" not yet prepared so type is still a ForwardRef, you might need to call Run.update_forward_refs().

> Entering new LLMChain chain...
Prompt after formatting:
1 + 2 = 
WARNING:root:Error in on_chain_end callback: No chain Run found to be traced

> Finished chain.

3
```

but afterwards the callback error messages are gone.
2023-08-31 15:25:59 -07:00
Eugene Yurtsev
74fcfed4e2
lint for pydantic imports (#9937)
Catch pydantic imports
2023-08-31 15:55:29 -04:00
Zizhong Zhang
641b71e2cd
refactor: rename to OpaquePrompts (#10013)
Renamed to OpaquePrompts

cc @baskaryan Thanks in advance!
2023-08-31 12:21:24 -07:00
Bagatur
19400ba253
bump 278 (#10052) 2023-08-31 07:35:42 -07:00
Bagatur
29270e0378
fix #3117 (#9957)
fix #3117
2023-08-31 07:29:49 -07:00
Bagatur
5b913003e0 bump 2023-08-31 07:27:56 -07:00
Bagatur
4b15328767
Add indexing support for postgresql (#9933)
Add support to postgresql for the SQL Manager Record

This code was tested locally. I'm looking at how to add testing with
postgres in a separate PR.
2023-08-31 07:27:09 -07:00
Bagatur
e60e1cdf23
fixed openai_functions api_response format args err (#9968)
root cause: args may not have a key (params) resulting in an error
2023-08-31 00:49:19 -07:00
Bagatur
3efab8d3df
implement vectorstores by tencent vectordb (#9989)
Hi there!
I'm excited to open this PR to add support for using 'Tencent Cloud
VectorDB' as a vector store.

Tencent Cloud VectorDB is a fully-managed, self-developed,
enterprise-level distributed database service designed for storing,
retrieving, and analyzing multi-dimensional vector data. The database
supports multiple index types and similarity calculation methods, with a
single index supporting vector scales up to 1 billion and capable of
handling millions of QPS with millisecond-level query latency. Tencent
Cloud VectorDB not only provides external knowledge bases for large
models to improve their accuracy, but also has wide applications in AI
fields such as recommendation systems, NLP services, computer vision,
and intelligent customer service.

The PR includes:
 Implementation of Vectorstore.

I have read your [contributing
guidelines](72b7d76d79/.github/CONTRIBUTING.md).
And I have passed the tests below

 make format
 make lint
 make coverage
 make test
2023-08-31 00:48:25 -07:00
Bagatur
d43a36c32a
Bagatur/dereference tool schema (#10007)
fix for #9375
2023-08-31 00:48:12 -07:00
Bagatur
6b5a970949
refactor(document_loaders): abstract page evaluation logic in PlaywrightURLLoader (#9995)
This PR brings structural updates to `PlaywrightURLLoader`, aiming at
making the code more readable and extensible through the abstraction of
page evaluation logic. These changes also align this implementation with
a similar structure used in LangChain.js.

The key enhancements include:

1. Introduction of 'PlaywrightEvaluator', an abstract base class for all
evaluators.
2. Creation of 'UnstructuredHtmlEvaluator', a concrete class
implementing 'PlaywrightEvaluator', which uses `unstructured` library
for processing page's HTML content.
3. Extension of 'PlaywrightURLLoader' constructor to optionally accept
an evaluator of the type 'PlaywrightEvaluator'. It defaults to
'UnstructuredHtmlEvaluator' if no evaluator is provided.
4. Refactoring of 'load' and 'aload' methods to use the 'evaluate' and
'evaluate_async' methods of the provided 'PageEvaluator' for page
content handling.

This update brings flexibility to 'PlaywrightURLLoader' as it can now
utilize different evaluators for page processing depending on the
requirement. The abstraction also improves code maintainability and
readability.

Twitter: @ywkim
2023-08-31 00:45:33 -07:00
Bagatur
b1644bc9ad cr 2023-08-31 00:43:34 -07:00
Hunsmore
13fef1e5d3
add bloomz_7b, llama-2-7b, llama-2-13b, llama-2-70b to ErnieBotChat (#10024)
- Description: Add bloomz_7b, llama-2-7b, llama-2-13b, llama-2-70b to
ErnieBotChat, which only supported ERNIE-Bot-turbo and ERNIE-Bot.
  - Issue: #10022,
  - Dependencies: no extra dependencies

---------

Co-authored-by: hetianfeng <hetianfeng@meituan.com>
2023-08-31 00:38:55 -07:00
skspark
52a3e8a261
Add integration TCs on bing search (#8068) (#10021)
## Description
Added integration TCs on bing search utility

## Issue
#8068 

## Dependencies
None
2023-08-31 00:34:06 -07:00
William FH
5341b04d68
Update error message (#9970)
in evals
2023-08-30 17:42:55 -07:00
William FH
b82ad19ed2
Check memory address (#9971)
Don't want to dup the collector but can have multiple
2023-08-30 15:30:22 -07:00
Bagatur
e805f8e263 add tests 2023-08-30 15:23:02 -07:00
Bagatur
1f5c579ef4 add 2023-08-30 13:37:50 -07:00
Bagatur
240cc289e6 wip 2023-08-30 13:37:39 -07:00
maks-operlejn-ds
a8f804a618
Add data anonymizer (#9863)
### Description

The feature for anonymizing data has been implemented. In order to
protect private data, such as when querying external APIs (OpenAI), it
is worth pseudonymizing sensitive data to maintain full privacy.

Anonynization consists of two steps:

1. **Identification:** Identify all data fields that contain personally
identifiable information (PII).
2. **Replacement**: Replace all PIIs with pseudo values or codes that do
not reveal any personal information about the individual but can be used
for reference. We're not using regular encryption, because the language
model won't be able to understand the meaning or context of the
encrypted data.

We use *Microsoft Presidio* together with *Faker* framework for
anonymization purposes because of the wide range of functionalities they
provide. The full implementation is available in `PresidioAnonymizer`.

### Future works

- **deanonymization** - add the ability to reverse anonymization. For
example, the workflow could look like this: `anonymize -> LLMChain ->
deanonymize`. By doing this, we will retain anonymity in requests to,
for example, OpenAI, and then be able restore the original data.
- **instance anonymization** - at this point, each occurrence of PII is
treated as a separate entity and separately anonymized. Therefore, two
occurrences of the name John Doe in the text will be changed to two
different names. It is therefore worth introducing support for full
instance detection, so that repeated occurrences are treated as a single
object.

### Twitter handle
@deepsense_ai / @MaksOpp

---------

Co-authored-by: MaksOpp <maks.operlejn@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-30 10:39:44 -07:00
Bagatur
b3e3a31240
bump 277 (#9997) 2023-08-30 08:29:51 -07:00
Bagatur
9828701de1
mv base cache to schema (#9953)
if you remove all other imports from langchain.init it exposes a
circular dep
2023-08-30 08:10:51 -07:00
Christophe Bornet
9870bfb9cd
Add bucket and object key to metadata in S3 loader (#9317)
- Description: this PR adds `s3_object_key` and `s3_bucket` to the doc
metadata when loading an S3 file. This is particularly useful when using
`S3DirectoryLoader` to remove the files from the dir once they have been
processed (getting the object keys from the metadata `source` field
seems brittle)
  - Dependencies: N/A
  - Tag maintainer: ?
  - Twitter handle: _cbornet

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-08-30 11:03:24 -04:00
Eugene Yurtsev
6da158388b Merge branch 'master' into ywkim/master 2023-08-30 10:46:26 -04:00
Guy Korland
24c0b01c38
Extend the FalkorDB QA demo (#9992)
- Description: Extend the FalkorDB QA demo
  - Tag maintainer: @baskaryan
2023-08-30 10:13:18 -04:00
Eugene Yurtsev
588237ef30
Make document serializable, create utility to create a docstore (#9674)
This PR makes the following changes:

1. Documents become serializable using langhchain serialization
2. Make a utility to create a docstore kw store

Will help to address issue here:
https://github.com/langchain-ai/langchain/issues/9345
2023-08-30 09:45:04 -04:00
Eugene Yurtsev
e8f29be350 x 2023-08-30 09:36:27 -04:00
Buckler89
a28e888b36
fix call _get_keys for custom_evaluator (#9763)
In the function _load_run_evaluators the function _get_keys was not
called if only custom_evaluators parameter is used


- Description: In the function _load_run_evaluators the function
_get_keys was not called if only custom_evaluators parameter is used,
  - Issue: no issue created for this yet,
  - Dependencies: None,
  - Tag maintainer: @vowelparrot,
  - Twitter handle: Buckler89

---------

Co-authored-by: ddroghini <d.droghini@mflgroup.com>
2023-08-30 06:35:23 -07:00
Eugene Yurtsev
cafce9ed23 x 2023-08-30 09:35:00 -04:00
wlleiiwang
8c4e29240c implement vectorstores by tencent vectordb 2023-08-30 16:40:58 +08:00
Bagatur
2d2b097fab
mv chat history (#9725) 2023-08-29 21:41:32 -07:00
Bagatur
d762a6b51f
rm mutable defaults (#9974) 2023-08-29 20:36:27 -07:00
Arjun Aravindan
6a51672164
Update SeleniumURLLoader to use webdriver Service in favor of deprecated executable_path parameter (#9814)
Description: This commit uses the new Service object in Selenium
webdriver as executable_path has been [deprecated and removed in
selenium version
4.11.2](9f5801c82f)
Issue: https://github.com/langchain-ai/langchain/issues/9808
Tag Maintainer: @eyurtsev
2023-08-29 19:45:18 -07:00
William FH
c844aaa7a6
Weakref to tracer (#9954)
Prevent memory/thread leakage
2023-08-29 19:27:22 -07:00
Jurik-001
a05fed9369
Fix add callbacks to spark_sql due to depreciation of callback_manager (#9831)
Description: Due to depreciation (regarding to line 109 in
[langchain/libs/langchain/langchain/chains/base.py](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/base.py)
of callback_manager i replaced several parts

Issue: None
Dependencies: 
Maintainer: @baskaryan

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-29 19:23:44 -07:00
dafu
c26deb6b38
fixed openai_functions api_response format args err
root cause: args may not have a key (params) resulting in an error
2023-08-30 09:58:24 +08:00
axiangcoding
ffa5625134
feat(llms): improve ERNIE-Bot chat model (#9833)
- Description: improve ERNIE-Bot chat model, add request timeout and
more testcases.
  - Issue: None
  - Dependencies: None
  - Tag maintainer: @baskaryan

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-29 18:20:06 -07:00
Bagatur
d966ba63e2
fixed GoogleCloudEnterpriseSearchRetriever returning an empty array (#9858)
`GoogleCloudEnterpriseSearchRetriever` returned an empty array of
documents earlier, fixed
2023-08-29 17:49:48 -07:00
Bagatur
ec362ecbe2
Fixed regex bug in RetrievalQAWithSources in previous update (#9898)
- Description: In my previous PR, I had modified the code to catch all
kinds of [SOURCES, sources, Source, Sources]. However, this change
included checking for a colon or a white space which should actually
have been only checking for a colon.
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
2023-08-29 17:32:24 -07:00
Nikhil Suresh
56a0165a4e cleaned up unit test example 2023-08-29 23:37:54 +00:00
William FH
cedfad541d
don't emit none from eval config (#9963) 2023-08-29 16:14:32 -07:00
Nikhil Suresh
b31475c622 minor updates to regex 2023-08-29 23:13:31 +00:00
Bagatur
8fb0a9594c
Add LLMonitor Callback Handler Integration - open-source observability & analytics (#9870)
Adds support for [llmonitor](https://llmonitor.com) callbacks.

It enables:
- Requests tracking / logging / analytics
- Error debugging
- Cost analytics
- User tracking

Let me know if anythings neds to be changed for merge.

Thank you!
2023-08-29 15:49:01 -07:00