Commit Graph

1583 Commits

Author SHA1 Message Date
Bagatur
a8c68d4ffa
Type LLMChain.llm as runnable (#12385) 2023-10-27 11:52:01 -07:00
Bagatur
d12b88557a
Bagatur/bump 325 (#12440) 2023-10-27 11:49:09 -07:00
Eugene Yurtsev
cadfce295f
Deprecate PythonRepl tools and Pandas/Xorbits/Spark DataFrame/Python/CSV agents (#12427)
See discussion here:
https://github.com/langchain-ai/langchain/discussions/11680

The code is available for usage from langchain_experimental. The reason
for the deprecation is that the agents are relying on a Python REPL. The
code can only be run safely with appropriate sandboxing.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-27 14:16:42 -04:00
Harrison Chase
0ca539eb85
Clean up deprecated agents and update __init__ in experimental (#12231)
Update init paths in experimental
2023-10-27 13:52:50 -04:00
Holt Skinner
134f085824
feat: Add Google Speech to Text API Document Loader (#12298)
- Add Document Loader for Google Speech to Text
  - Similar Structure to [Assembly AI Document Loader][1]

[1]:
https://python.langchain.com/docs/integrations/document_loaders/assemblyai
2023-10-27 09:34:26 -07:00
David Duong
52c194ec3a
Fix templates typos (#12428) 2023-10-27 09:32:57 -07:00
Massimiliano Pronesti
c8195769f2
fix(openai-callback): completion count logic (#12383)
The changes introduced in #12267 and #12190 broke the cost computation
of the `completion` tokens for fine-tuned models because of the early
return. This PR aims at fixing this.
@baskaryan.
2023-10-27 09:08:54 -07:00
Stefan Langenbach
b22da81af8
Mask API key for Aleph Alpha LLM (#12377)
- **Description:** Add masking of API Key for Aleph Alpha LLM when
printed.
- **Issue**: #12165
- **Dependencies:** None
- **Tag maintainer:** @eyurtsev

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-10-27 11:32:43 -04:00
William FH
4254028c52
Str Evaluator Mapper (#12401) 2023-10-26 21:38:47 -07:00
William FH
fcad1d2965
Add space (#12395) 2023-10-26 20:32:23 -07:00
William FH
922d7910ef
Wfh/json schema evaluation (#12389)
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2023-10-26 20:32:05 -07:00
Christian Kasim Loan
a35445c65f
johnsnowlabs embeddings support (#11271)
- **Description:** Introducing the
[JohnSnowLabsEmbeddings](https://www.johnsnowlabs.com/)
  - **Dependencies:** johnsnowlabs
  - **Tag maintainer:** @C-K-Loan
- **Twitter handle:** https://twitter.com/JohnSnowLabs
https://twitter.com/ChristianKasimL

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-26 20:22:50 -07:00
SteveLiao
c08b622b2d
Add HTML Title and Page Language into metadata for AsyncHtmlLoader (#11326)
**Description:** 
Revise `libs/langchain/langchain/document_loaders/async_html.py` to
store the HTML Title and Page Language in the `metadata` of
`AsyncHtmlLoader`.
2023-10-26 20:22:31 -07:00
Shorthills AI
25c98dbba9
Fixed some grammatical and Exception types issues (#12015)
Fixed some grammatical issues and Exception types.

@baskaryan , @eyurtsev

---------

Co-authored-by: Sanskar Tanwar <142409040+SanskarTanwarShorthillsAI@users.noreply.github.com>
Co-authored-by: UpneetShorthillsAI <144228282+UpneetShorthillsAI@users.noreply.github.com>
Co-authored-by: HarshGuptaShorthillsAI <144897987+HarshGuptaShorthillsAI@users.noreply.github.com>
Co-authored-by: AdityaKalraShorthillsAI <143726711+AdityaKalraShorthillsAI@users.noreply.github.com>
Co-authored-by: SakshiShorthillsAI <144228183+SakshiShorthillsAI@users.noreply.github.com>
2023-10-26 21:12:38 -04:00
William FH
923696b664
Wfh/json edit dist (#12361)
Compare predicted json to reference. First canonicalize (sort keys, rm
whitespace separators), then return normalized string edit distance.

Not a silver bullet but maybe an easy way to capture structure
differences in a less flakey way

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2023-10-26 18:10:28 -07:00
Erick Friis
4db8d82c55
CLI CI 2 (#12387)
Will run all CI because of _test change, but future PRs against CLI will
only trigger the new CLI one

Has a bunch of file changes related to formatting/linting.

No mypy yet - coming soon
2023-10-26 17:01:31 -07:00
Tyler Hutcherson
231d553824
Update broken redis tests (#12371)
Update broken redis tests -- tiny PR :) 
- **Description:** Fixes Redis tests on master (look like it was broken
by https://github.com/langchain-ai/langchain/pull/11257)
  - **Issue:** None,
  - **Dependencies:** No
  - **Tag maintainer:** @baskaryan @Spartee 
  - **Twitter handle:** N/A

Co-authored-by: Sam Partee <sam.partee@redis.com>
2023-10-26 16:13:14 -07:00
Erick Friis
03e79e62c2
cli fix (#12380) 2023-10-26 15:29:49 -07:00
Bagatur
76230d2c08
fireworks scheduled integration tests (#12373) 2023-10-26 14:24:42 -07:00
Josh Phillips
01c5cd365b
Fix SupbaseVectoreStore write operation timeout (#12318)
**Description**
This small change will make chunk_size a configurable parameter for
loading documents into a Supabase database.

**Issue**
https://github.com/langchain-ai/langchain/issues/11422

**Dependencies**
No chanages

**Twitter**
@ j1philli

**Reminder**
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.

---------

Co-authored-by: Greg Richardson <greg.nmr@gmail.com>
2023-10-26 14:19:17 -07:00
Bagatur
b10cefb160
lint fix: rm init (#12374) 2023-10-26 14:16:25 -07:00
Harrison Chase
b43996e553
Harrison/improve cli (#12368) 2023-10-26 13:53:59 -07:00
Harrison Chase
9ce38726a2
fix some stuff (#12292)
Co-authored-by: Erick Friis <erick@langchain.dev>
2023-10-26 13:30:36 -07:00
Cynthia Yang
6ce276e099
Support Fireworks batching (#8) (#12052)
Description

* Add _generate and _agenerate to support Fireworks batching.
* Add stop words test cases
* Opt out retry mechanism

Issue - Not applicable
Dependencies - None
Tag maintainer - @baskaryan
2023-10-26 16:01:08 -04:00
Tyler Hutcherson
2f0c9d8269
Fix redis vectorfield schema defaults (#12223)
- **Description:** refactors the redis vector field schema to properly
handle default values, includes a new unit test suite.
  - **Issue:** N/A
  - **Dependencies:** nothing new.
  - **Tag maintainer:** @baskaryan @Spartee 
  - **Twitter handle:** this is a tiny fix/improvement :) 

This issue was causing some clients/cuatomers issues when building a
vector index on Redis on smaller db instances (due to fault default
values in index configuration). It would raise an error like:

```redis.exceptions.ResponseError: Vector index initial capacity 20000 exceeded server limit (852 with the given parameters)```

This PR will address this moving forward.
2023-10-26 12:17:58 -07:00
Jakub Novák
9544d64ad8
E2B tool - Improve description wuth uploaded files info (#12355) 2023-10-26 11:44:24 -07:00
Bagatur
c6a733802b
bump 324 and 35 (#12352) 2023-10-26 10:10:26 -07:00
Nuno Campos
683e97766d
Fix json key output parser in partial (streaming) mode (#12332)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-26 17:45:04 +01:00
Nikhil Jha
dff24285ea
Comprehend Moderation 0.2 (#11730)
This PR replaces the previous `Intent` check with the new `Prompt
Safety` check. The logic and steps to enable chain moderation via the
Amazon Comprehend service, allowing you to detect and redact PII, Toxic,
and Prompt Safety information in the LLM prompt or answer remains
unchanged.
This implementation updates the code and configuration types with
respect to `Prompt Safety`.


### Usage sample

```python
from langchain_experimental.comprehend_moderation import (BaseModerationConfig, 
                                 ModerationPromptSafetyConfig, 
                                 ModerationPiiConfig, 
                                 ModerationToxicityConfig
)

pii_config = ModerationPiiConfig(
    labels=["SSN"],
    redact=True,
    mask_character="X"
)

toxicity_config = ModerationToxicityConfig(
    threshold=0.5
)

prompt_safety_config = ModerationPromptSafetyConfig(
    threshold=0.5
)

moderation_config = BaseModerationConfig(
    filters=[pii_config, toxicity_config, prompt_safety_config]
)

comp_moderation_with_config = AmazonComprehendModerationChain(
    moderation_config=moderation_config, #specify the configuration
    client=comprehend_client,            #optionally pass the Boto3 Client
    verbose=True
)

template = """Question: {question}

Answer:"""

prompt = PromptTemplate(template=template, input_variables=["question"])

responses = [
    "Final Answer: A credit card number looks like 1289-2321-1123-2387. A fake SSN number looks like 323-22-9980. John Doe's phone number is (999)253-9876.", 
    "Final Answer: This is a really shitty way of constructing a birdhouse. This is fucking insane to think that any birds would actually create their motherfucking nests here."
]
llm = FakeListLLM(responses=responses)

llm_chain = LLMChain(prompt=prompt, llm=llm)

chain = ( 
    prompt 
    | comp_moderation_with_config 
    | {llm_chain.input_keys[0]: lambda x: x['output'] }  
    | llm_chain 
    | { "input": lambda x: x['text'] } 
    | comp_moderation_with_config 
)

try:
    response = chain.invoke({"question": "A sample SSN number looks like this 123-456-7890. Can you give me some more samples?"})
except Exception as e:
    print(str(e))
else:
    print(response['output'])

```

### Output

```python
> Entering new AmazonComprehendModerationChain chain...
Running AmazonComprehendModerationChain...
Running pii Validation...
Running toxicity Validation...
Running prompt safety Validation...

> Finished chain.


> Entering new AmazonComprehendModerationChain chain...
Running AmazonComprehendModerationChain...
Running pii Validation...
Running toxicity Validation...
Running prompt safety Validation...

> Finished chain.
Final Answer: A credit card number looks like 1289-2321-1123-2387. A fake SSN number looks like XXXXXXXXXXXX John Doe's phone number is (999)253-9876.
```

---------

Co-authored-by: Jha <nikjha@amazon.com>
Co-authored-by: Anjan Biswas <anjanavb@amazon.com>
Co-authored-by: Anjan Biswas <84933469+anjanvb@users.noreply.github.com>
2023-10-26 09:42:18 -07:00
Blake (Yung Cher Ho)
b9410f2b6f
Takeoff pro support (#12070)
**Description:**
This PR adds support for the [Pro version of Titan Takeoff
Server](https://docs.titanml.co/docs/category/pro-features). Users of
the Pro version will have to import the TitanTakeoffPro model, which is
different from TitanTakeoff.

**Issue:**
Also minor fixes to docs for Titan Takeoff (Community version)

**Dependencies:**
No additional dependencies

 **Twitter handle:** @becoming_blake

@baskaryan @hwchase17
2023-10-26 09:39:32 -07:00
Leonid Kuligin
4e47fe1dce
fixed error message and a check for processor name (#12200)
Replace this entire comment with:
- **Description:** a small fix on error description / a check for
processor name
  - **Issue:** the issue #11407
2023-10-26 09:38:25 -07:00
Nir Kopler
9298aff783
Finetuned openai azure models cost calculation (#12267)
**Description:**
Add cost calculation for fine tuned **Azure** with relevant unit tests.
see
https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/fine-tuning?tabs=turbo&pivots=programming-language-studio
for more information.
this PR is the result of this PR:
https://github.com/langchain-ai/langchain/pull/12190

Twitter handle: @nirkopler
2023-10-26 09:38:10 -07:00
gnakw
20fe515f20
Fix the exception from langchain.utilities import ArceeWrapper (#12342)
- **Description:** Fix the exception from langchain.utilities import
ArceeWrapper
2023-10-26 09:19:43 -07:00
Qihui Xie
6720458c7d
add allowed_operators property in QdrantTranslator (#12328)
- **Description:** 
This PR adds `allowd_operators` property to `QdrantTranslator` to fix
the `TypeError: can only join an iterable` bug. This property is
required in `get_query_constructor_prompt` in
`query_constructor\base.py`:
```
allowed_operators=" | ".join(allowed_operators),
```
  - **Issue:** 
#12061

---------

Co-authored-by: XIE Qihui <qihui.xie@bopufund.com>
2023-10-26 09:18:29 -07:00
Bagatur
f5a57fc1ef
fix self query constructor (#12349) 2023-10-26 09:18:15 -07:00
Vasek Mlejnsky
cdd75b687e
e2b tool - fix initialization and improve tool description (#12345) 2023-10-26 08:47:50 -07:00
Harrison Chase
8ec7aade9f
add docs for templates (#12346) 2023-10-26 08:28:01 -07:00
Erick Friis
ebf998acb6
Templates (#12294)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Jacob Lee <jacoblee93@gmail.com>
2023-10-25 18:47:42 -07:00
Erick Friis
43257a295c
CLI Git Improvements (#12311)
- delete repo sources like pip
- git dep fixes
- error messaging
2023-10-25 18:30:02 -07:00
William FH
1d568e1add
Better wrap traceable (#12303)
If user function is wrapped as a traceable function, this will help hand
off the trace between the two.

Also update handling fields to reflect optional values
2023-10-25 16:34:23 -07:00
Eugene Yurtsev
5a71b81609
Relax type annotation for custom input/output types (#12300)
This is needed to be able to do stuff like:

```python
runnable.with_types(input_type=List[str])
```
2023-10-25 19:00:22 -04:00
William FH
988f6d9912
Rm langchain server (#12305) 2023-10-25 15:26:46 -07:00
wemysschen
3f16acc538
Add baidu cloud vector search in vectorstore and fix some unit test in vectorstores (#11605)
**Description:** 
Add baidu cloud vector search in vectorstore

---------

Co-authored-by: root <root@icoding-cwx.bcc-szzj.baidu.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-25 13:44:19 -07:00
mrbean
b7e559c7e1
use snippet search optionally (#12236)
Add an additional flag which allows for hitting our new endpoint.
2023-10-25 13:37:28 -07:00
felixocker
cce132d146
fix sparql queries for relations in schema description (#9136)
- **Description**: Fix for the SPARQL QA chain: fixed SPARQL queries for
retrieving information about relations in the graph to create a textual
description of the schema for the language model. This should resolve
#8907
- **Issue**: #8907
- **Dependencies**: None
- **Tag maintainer**: @baskaryan, @hwchase17
2023-10-25 13:36:57 -07:00
Donato Azevedo
d9f1bcf366
Strips leading/trailing whitespace before parsing xml (#12297)
**Description:** When llms output leading or trailing whitespace for xml
(when using XMLOutputParser) the parser would raise a `ValueError: Could
not parse output: ...`. However, leading or trailing whitespace are
"ignorable" in the sense of XML standard.

**Issue:** I did not find an issue related.

**Dependencies:** None

**Tag maintainer:**

**Twitter handle:** donatoaz

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

Done, updated unit test and ran `make docker_test`.
2023-10-25 13:34:58 -07:00
Erick Friis
47070b8314
CLI (#12284)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-10-25 11:06:58 -07:00
Shwu Ku
07c2649753
response parser for ArceeRetriever (#12270)
- **Description:** Response parser for arcee retriever, 
- **Issue:** follow-up pr on #11578 and
[discussion](https://github.com/arcee-ai/arcee-python/issues/15#issuecomment-1759874053),
  - **Dependencies:** NA

This pr implements a parser for the response from ArceeRetreiver to
convert to langchain `Document`. This closes the loop of generation and
retrieval for Arcee DALMs in langchain.

The reference for the response parser is
[api-docs:retrieve](https://api.arcee.ai/docs#/v2/retrieve_model)

Attaching screenshot of working implementation:
<img width="1984" alt="Screenshot 2023-10-25 at 7 42 34 PM"
src="https://github.com/langchain-ai/langchain/assets/65639964/026987b9-34b2-4e4b-b87d-69fcd0c6641a">
\*api key deleted

---
Successful tests, lints, etc.
```shell
Re-run pytest with --snapshot-update to delete unused snapshots.
==================================================================================================================== slowest 5 durations =====================================================================================================================
1.56s call     tests/unit_tests/schema/runnable/test_runnable.py::test_retrying
0.63s call     tests/unit_tests/schema/runnable/test_runnable.py::test_map_astream
0.33s call     tests/unit_tests/schema/runnable/test_runnable.py::test_map_stream_iterator_input
0.30s call     tests/unit_tests/schema/runnable/test_runnable.py::test_map_astream_iterator_input
0.20s call     tests/unit_tests/indexes/test_indexing.py::test_cleanup_with_different_batchsize
======================================================================================================= 1265 passed, 270 skipped, 32 warnings in 6.55s =======================================================================================================
[ "." = "" ] || poetry run black .
All done!  🍰 
1871 files left unchanged.
[ "." = "" ] || poetry run ruff --select I --fix .
./scripts/check_pydantic.sh .
./scripts/check_imports.sh
poetry run ruff .
[ "." = "" ] || poetry run black . --check
All done!  🍰 
1871 files would be left unchanged.
[ "." = "" ] || poetry run mypy .
Success: no issues found in 1868 source files
poetry run codespell --toml pyproject.toml
poetry run codespell --toml pyproject.toml -w
```

Co-authored-by: Shubham Kushwaha <shwu@Shubhams-MacBook-Pro.local>
2023-10-25 10:55:13 -07:00
Johanna Appel
c26ec7789f
CohereEmbeddings: Add max_retries and request_timeout (#12275)
Add max_retries and request_timeout to CohereEmbeddings, akin to how it
works in OpenAIEmbeddings.

Since the Cohere client already implements these parameters, we can
simply pass them down.

Uses parameters from these two cohere client objects:

https://github.com/cohere-ai/cohere-python/blob/main/cohere/client.py

https://github.com/cohere-ai/cohere-python/blob/main/cohere/client_async.py
2023-10-25 10:37:25 -07:00
Nuno Campos
7108084947
Remove CLI (#12283)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-25 10:33:52 -07:00