Commit Graph

3086 Commits (224199083bbc644e9bfac60cfedfab7b9841eb54)
 

Author SHA1 Message Date
Sam Coward 224199083b
Fix missing chain classname in StdOutCallbackHandler.on_chain_start (#6124)
Retrieves the name of the class from new location as of commit
18af149e91


Co-authored-by: Zander Chase <130414180+vowelparrot@users.noreply.github.com>
1 year ago
lucasiscovici af3f401015
update base class of ListStepContainer to BaseStepContainer (#6232)
update base class of ListStepContainer to BaseStepContainer

Fixes #6231
1 year ago
Matt Adams 98e1bbfbbd
Add missing dependencies to apify.ipynb (#6331)
Fixes errors caused by missing dependencies when running the notebook.
1 year ago
Ma Donghao 6f62e5461c
Update the parser regex of map_rerank (#6419)
Sometimes the score responded by chatgpt would be like 'Respone
example\nScore: 90 (fully answers the question, but could provide more
detail on the specific error message)'
For the score contains not only numbers, it raise a ValueError like 


Update the RegexParser from `.*` to `\d*` would help us to ignore the
text after number.

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Bagatur b08f903755
fix chroma init bug (#7639) 1 year ago
Nir Gazit f307ca094b
fix(memory): allow internal chains to use memory (#6769)
Fixed #6768.

This is a workaround only. I think a better longer-term solution is for
chains to declare how many input variables they *actually* need (as
opposed to ones that are in the prompt, where some may be satisfied by
the memory). Then, a wrapping chain can check the input match against
the actual input variables.

@hwchase17
1 year ago
Francisco Ingham 488d2d5da9
Entity extraction improvements (#6342)
Added fix to avoid irrelevant attributes being returned plus an example
of extracting unrelated entities and an exampe of using an 'extra_info'
attribute to extract unstructured data for an entity.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Nir Gazit a8bbfb2da3
feat(agents): allow trimming of intermediate steps to last N (#6476)
Added an option to trim intermediate steps to last N steps. This is
especially useful for long-running agents. Users can explicitly specify
N or provide a function that does custom trimming/manipulation on
intermediate steps. I've mimicked the API of the `handle_parsing_errors`
parameter.
1 year ago
Zeeland 92ef77da35
fix: remove useless variable k (#6524)
remove useless variable k

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Bagatur 7f8ff2a317
add tagger nb (#7637) 1 year ago
Sidchat95 c5e50c40c9
Fix Document Similarity Check with passed Threshold (#6845)
Converting the Similarity obtained in the
similarity_search_with_score_by_vector method whilst comparing to the
passed
threshold. This is because the passed threshold is a number between 0 to
1 and is already in the relevance_score_fn format.
As of now, the function is comparing two different scoring parameters
and that wouldn't work.

Dependencies
None

Issue:
Different scores being compared in
similarity_search_with_score_by_vector method in FAISS.

Tag maintainer
@hwchase17



<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @dev2049
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @dev2049
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @vowelparrot
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Jacob Ajit a08baa97c5
Use modern OpenAI endpoints for embeddings (#6573)
- Description: 

LangChain passes
[engine](https://github.com/hwchase17/langchain/blob/master/langchain/embeddings/openai.py#L256)
and not `model` as a field when making OpenAI requests. Within the
`openai` Python library, for OpenAI requests, this [makes a
call](https://github.com/openai/openai-python/blob/main/openai/api_resources/abstract/engine_api_resource.py#L58)
to an endpoint of the form
`https://api.openai.com/v1/engines/{engine_id}/embeddings`.

These endpoints are
[deprecated](https://help.openai.com/en/articles/6283125-what-happened-to-engines)
in favor of endpoints of the format
`https://api.openai.com/v1/embeddings`, where `model` is passed as a
parameter in the request body.

While these deprecated endpoints continue to function for now, they may
not be supported indefinitely and should be avoided in favor of the
newer API format.

It appears that `engine` was passed in instead of `model` to make both
Azure OpenAI and OpenAI calls work similarly. However, the inclusion of
`engine`
[causes](https://github.com/openai/openai-python/blob/main/openai/api_resources/abstract/engine_api_resource.py#L58)
OpenAI to use the deprecated endpoint, requiring a diverging code path
for Azure OpenAI calls where `engine` is passed in additionally (Azure
OpenAI requires `engine` to specify a deployment, and can optionally
take in `model`).

In the long-term, it may be worth considering spinning off Azure OpenAI
embeddings into a separate class for ease of use and maintenance,
similar to the [implementation for chat
models](https://github.com/hwchase17/langchain/blob/master/langchain/chat_models/azure_openai.py).
1 year ago
Jacob Lee cdb93ab5ca
Adds OpenAI functions powered document metadata tagger (#7521)
Adds a new document transformer that automatically extracts metadata for
a document based on an input schema. I also moved
`document_transformers.py` to `document_transformers/__init__.py` to
group it with this new transformer - it didn't seem to cause issues in
the notebook, but let me know if I've done something wrong there.

Also had a linter issue I couldn't figure out:

```
MacBook-Pro:langchain jacoblee$ make lint
poetry run mypy .
docs/dist/conf.py: error: Duplicate module named "conf" (also at "./docs/api_reference/conf.py")
docs/dist/conf.py: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#mapping-file-paths-to-modules for more info
docs/dist/conf.py: note: Common resolutions include: a) using `--exclude` to avoid checking one of them, b) adding `__init__.py` somewhere, c) using `--explicit-package-bases` or adjusting MYPYPATH
Found 1 error in 1 file (errors prevented further checking)
make: *** [lint] Error 2
```

@rlancemartin @baskaryan

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Jason Fan 8effd90be0
Add new types of document transformers (#7379)
- Description: Add two new document transformers that translates
documents into different languages and converts documents into q&a
format to improve vector search results. Uses OpenAI function calling
via the [doctran](https://github.com/psychic-api/doctran/tree/main)
library.
  - Issue: N/A
  - Dependencies: `doctran = "^0.0.5"`
  - Tag maintainer: @rlancemartin @eyurtsev @hwchase17 
  - Twitter handle: @psychicapi or @jfan001

Notes
- Adheres to the `DocumentTransformer` abstraction set by @dev2049 in
#3182
- refactored `EmbeddingsRedundantFilter` to put it in a file under a new
`document_transformers` module
- Added basic docs for `DocumentInterrogator`, `DocumentTransformer` as
well as the existing `EmbeddingsRedundantFilter`

---------

Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Piyush Jain f11d845dee
Fixed validation error when credentials_profile_name, or region_name is not passed (#7629)
## Summary
This PR corrects the checks for credentials_profile_name, and
region_name attributes. This was causing validation exceptions when
either of these values were missing during creation of the retriever
class.

Fixes #7571 

#### Requested reviewers:
@baskaryan
1 year ago
Jamie Broomall 0e1d7a27c6
WhyLabsCallbackHandler updates (#7621)
Updates to the WhyLabsCallbackHandler and example notebook
- Update dependency to langkit 0.0.6 which defines new helper methods
for callback integrations
- Update WhyLabsCallbackHandler to use the new `get_callback_instance`
so that the callback is mostly defined in langkit
- Remove much of the implementation of the WhyLabsCallbackHandler here
in favor of the callback instance

This does not change the behavior of the whylabs callback handler
implementation but is a reorganization that moves some of the
implementation externally to our optional dependency package, and should
make future updates easier.

@agola11
1 year ago
Gaurang Pawar 53722dcfdc
Fixed a typo in pinecone_hybrid_search.ipynb (#7627)
Fixed a small typo in documentation
1 year ago
Bagatur 1d4db1327a
fix openai structured chain with pydantic (#7622)
should return pydantic class
1 year ago
Bagatur ee70d4a0cd
mv tutorials (#7614) 1 year ago
William FH 9b215e761e
Stop warning when parent run ID not present (#7611) 1 year ago
William FH 2f848294cb
Rm Warning that Tracing is Experimental (#7612) 1 year ago
Yaohui Wang d85c33a5c3
Fix the markdown rendering issue with a code block inside a markdown code block (#6625)
### Description

- Fix the markdown rendering issue with a code block inside a markdown,
using a different number of backticks for the delimiters.

Current doc site:
<https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/code_splitter#markdown>

After fix:
<img width="480" alt="image"
src="https://github.com/hwchase17/langchain/assets/3115235/d9921d59-64e6-4a34-9c62-79743667f528">


### Who can review

PTAL @dev2049 

Co-authored-by: Yaohui Wang <wangyaohui.01@bytedance.com>
1 year ago
Yaroslav Halchenko 0d92a7f357
codespell: workflow, config + some (quite a few) typos fixed (#6785)
Probably the most  boring PR to review ;)

Individual commits might be easier to digest

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
1 year ago
Sam 931e68692e
Adds a chain around sympy for symbolic math (#6834)
- Description: Adds a new chain that acts as a wrapper around Sympy to
give LLMs the ability to do some symbolic math.
- Dependencies: SymPy

---------

Co-authored-by: sreiswig <sreiswig@github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Bharat Ramanathan be29a6287d
feat: add model architecture back to wandb tracer (#6806)
# Description

This PR adds model architecture to the `WandbTracer` from the Serialized
Run kwargs. This allows visualization of the calling parameters of an
Agent, LLM and Tool in Weights & Biases.
    1. Safely serialize the run objects to WBTraceTree model_dict
    2. Refactors the run processing logic to be more organized.

- Twitter handle: @parambharat

---------

Co-authored-by: Bharat Ramanathan <ramanathan.parameshwaran@gohuddl.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Alex Iribarren adc96d60b6
Implement Function Callback tracer (#6835)
Description: I wanted to be able to redirect debug output to a function,
but it wasn't very easy. I figured it would make sense to implement a
`FunctionCallbackHandler`, and reimplement `ConsoleCallbackHandler` as a
subclass that calls the `print` function. Now I can create a simple
subclass in my project that calls `logging.info` or whatever I need.

Tag maintainer: @agola11
Twitter handle: `@andandaraalex`
1 year ago
Ducasse-Arthur 93a84f6182
Update bedrock.py - support of other endpoint url (esp. for users of … (#7592)
Added an _endpoint_url_ attribute to Bedrock(LLM) class - I have access
to Bedrock only via us-west-2 endpoint and needed to change the endpoint
url, this could be useful to other users
1 year ago
Bagatur 22525bad65
bump 231 (#7584) 1 year ago
Subsegment 6e1000dc8d
docs : Use more meaningful cnosdb examples (#7587)
This change makes the ecosystem integrations cnosdb documentation more
realistic and easy to understand.

- change examples of question and table
- modify typo and format
1 year ago
Samuel ROZE f3c9bf5e4b
fix(typo): Clarify the point of `llm_chain` (#7593)
Fixes a typo introduced in
https://github.com/hwchase17/langchain/pull/7080 by @hwchase17.

In the example (visible on [the online
documentation](https://api.python.langchain.com/en/latest/chains/langchain.chains.conversational_retrieval.base.ConversationalRetrievalChain.html#langchain-chains-conversational-retrieval-base-conversationalretrievalchain)),
the `llm_chain` variable is unused as opposed to being used for the
question generator. This change makes it clearer.
1 year ago
Alec Flett 6cdd4b5edc
only add handlers if they are new (#7504)
When using callbacks, there are times when callbacks can be added
redundantly: for instance sometimes you might need to create an llm with
specific callbacks, but then also create and agent that uses a chain
that has those callbacks already set. This means that "callbacks" might
get passed down again to the llm at predict() time, resulting in
duplicate calls to the `on_llm_start` callback.

For the sake of simplicity, I made it so that langchain never adds an
exact handler/callbacks object in `add_handler`, thus avoiding the
duplicate handler issue.

Tagging @hwchase17 for callback review

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
ausboss 50316f6477
Adding LLM wrapper for Kobold AI (#7560)
- Description: add wrapper that lets you use KoboldAI api in langchain
  - Issue: n/a
  - Dependencies: none extra, just what exists in lanchain
  - Tag maintainer: @baskaryan 
  - Twitter handle: @zanzibased
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Rohit Kumar Singh 603a0bea29
Fixes incorrect docstore creation in faiss.py (#7026)
- **Description**: Current implementation assumes that the length of
`texts` and `ids` should be same but if the passed `ids` length is not
equal to the passed length of `texts`, current code
`dict(zip(index_to_id.values(), documents))` is not failing or giving
any warning and silently creating docstores only for the passed `ids`
i.e. if `ids = ['A']` and `texts=["I love Open Source","I love
langchain"]` then only one `docstore` will be created. But either two
docstores should be created assuming same id value for all the elements
of `texts` or an error should be raised.
  
- **Issue**: My change fixes this by using dictionary comprehension
instead of `zip`. This was if lengths of `ids` and `texts` mismatches an
explicit `IndexError` will be raised.
  
@rlancemartin, @eyurtsev
1 year ago
Tommy Hyeonwoo Kim 3f7213586e
add supported properties for notiondb document loader's metadata (#7570)
fix #7569

add following properties for Notion DB document loader's metadata
- `unique_id`
- `status`
- `people`

@rlancemartin, @eyurtsev (Since this is a change related to
`DataLoaders`)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Junlin Zhou 5f17c57174
Update chat agents' output parser to extract action by regex (#7511)
Currently `ChatOutputParser` extracts actions by splitting the text on
"```", and then load the second part as a json string.

But sometimes the LLM will wrap the action in markdown code block like:

````markdown
```json
{
  "action": "foo",
  "action_input": "bar"
}
```
````

Splitting text on "```" will cause `OutputParserException` in such case.

This PR changes the behaviour to extract the `$JSON_BLOB` by regex, so
that it can handle both ` ``` ``` ` and ` ```json ``` `

@hinthornw

---------

Co-authored-by: Junlin Zhou <jlzhou@zjuici.com>
1 year ago
Bagatur ebcb144342
unit test sqlalachemy (#7582) 1 year ago
Harrison Chase 641fd74baa
Harrison/pg vector move (#7580) 1 year ago
os1ma 2667ddc686
Fix `make docs_build` and related scripts (#7276)
**Description: a description of the change**

Fixed `make docs_build` and related scripts which caused errors. There
are several changes.

First, I made the build of the documentation and the API Reference into
two separate commands. This is because it takes less time to build. The
commands for documents are `make docs_build`, `make docs_clean`, and
`make docs_linkcheck`. The commands for API Reference are `make
api_docs_build`, `api_docs_clean`, and `api_docs_linkcheck`.

It looked like `docs/.local_build.sh` could be used to build the
documentation, so I used that. Since `.local_build.sh` was also building
API Rerefence internally, I removed that process. `.local_build.sh` also
added some Bash options to stop in error or so. Futher more added `cd
"${SCRIPT_DIR}"` at the beginning so that the script will work no matter
which directory it is executed in.

`docs/api_reference/api_reference.rst` is removed, because which is
generated by `docs/api_reference/create_api_rst.py`, and added it to
.gitignore.

Finally, the description of CONTRIBUTING.md was modified.

**Issue: the issue # it fixes (if applicable)**

https://github.com/hwchase17/langchain/issues/6413

**Dependencies: any dependencies required for this change**

`nbdoc` was missing in group docs so it was added. I installed it with
the `poetry add --group docs nbdoc` command. I am concerned if any
modifications are needed to poetry.lock. I would greatly appreciate it
if you could pay close attention to this file during the review.

**Tag maintainer**
- General / Misc / if you don't know who to tag: @baskaryan

If this PR needs any additional changes, I'll be happy to make them!

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Pharbie 74c28df363
Update Pinecone Upsert method usage (#7358)
Description: Refactor the upsert method in the Pinecone class to allow
for additional keyword arguments. This change adds flexibility and
extensibility to the method, allowing for future modifications or
enhancements. The upsert method now accepts the `**kwargs` parameter,
which can be used to pass any additional arguments to the Pinecone
index. This change has been made in both the `upsert` method in the
`Pinecone` class and the `upsert` method in the
`similarity_search_with_score` class method. Falls in line with the
usage of the upsert method in
[Pinecone-Python-Client](4640c4cf27/pinecone/index.py (L73))
Issue: [This feature request in Pinecone
Repo](https://github.com/pinecone-io/pinecone-python-client/issues/184)

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - Memory: @hwchase17

---------

Co-authored-by: kwesi <22204443+yankskwesi@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Lance Martin <122662504+rlancemartin@users.noreply.github.com>
1 year ago
Kazuki Maeda 5c3fe8b0d1
Enhance Makefile with 'format_diff' Option and Improved Readability (#7394)
### Description:

This PR introduces a new option format_diff to the existing Makefile.
This option allows us to apply the formatting tools (Black and isort)
only to the changed Python and ipynb files since the last commit. This
will make our development process more efficient as we only format the
codes that we modify. Along with this change, comments were added to
make the Makefile more understandable and maintainable.

### Issue:

N/A

### Dependencies:

Add dependency to black.

### Tag maintainer:

@baskaryan

### Twitter handle:

[kzk_maeda](https://twitter.com/kzk_maeda)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Bagatur 2babe3069f
Revert pinecone v4 support (#7566)
Revert 9d13dcd
1 year ago
schop-rob e811c5e8c6
Add OpenAI organization ID to docs (#7398)
Description: I added an example of how to reference the OpenAI API
Organization ID, because I couldn't find it before. In the example, it
is mentioned how to achieve this using environment variables as well as
parameters for the OpenAI()-class
Issue: -
Dependencies: -
Twitter @schop-rob
1 year ago
Kenny 8741e55e7c
Template formats documentation (#7404)
Simple addition to the documentation, adding the correct import
statement & showcasing using Python FStrings.
1 year ago
Fielding Johnston 00c466627a
minor bug fix: properly await AsyncRunManager's method call in MulitRouteChain (#7487)
This simply awaits `AsyncRunManager`'s method call in `MulitRouteChain`.
Noticed this while playing around with Langchain's implementation of
`MultiPromptChain`. @baskaryan

cheers

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
tonomura cc0585af42
Improvement/add finish reason to generation info in chat open ai (#7478)
Description: ChatOpenAI model does not return finish_reason in
generation_info.
Issue: #2702
Dependencies: None
Tag maintainer: @baskaryan 

Thank you

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Junlin Zhou b96ac13f3d
Minor update to reference other sql tool by tool names instead of hard coded string. (#7514)
<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @baskaryan
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @hinthornw
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->

Currently there are 4 tools in SQL agent-toolkits, and 2 of them have
reference to the other 2.

This PR change the reference from hard coded string to `{tool.name}`

Co-authored-by: Junlin Zhou <jlzhou@zjuici.com>
1 year ago
OwenElliott 9cb2347453
Fix broken link from Marqo Ecosystem (#7510)
Small fix to a link from the Marqo page in the ecosystem.

The link was not updated correctly when the documentation structure
changed to html pages instead of links to notebooks.
1 year ago
Matt Robinson c4d53f98dc
docs: update unstructured docstrings (#7561)
### Summary

Updates the docstrings in the Unstructured document loaders to display
more useful information on the integrations page.
1 year ago
Ben Auffarth 2c2f0e15a6
clarify about api key (#7540)
I found it unclear, where to get the API keys for JinaChat. Mentioning
this in the docstring should be helpful.
#7490 

Twitter handle: benji1a

@delgermurun
1 year ago
Jona Sassenhagen 0ea7224535
[Minor] Remove tagger from spacy sentencizer (#7534)
@svlandeg gave me a tip for how to improve a bit on
https://github.com/hwchase17/langchain/pull/7442 for some extra speed
and memory gains. The tagger isn't needed for sentencization, so can be
disabled too.
1 year ago