Commit Graph

1648 Commits

Author SHA1 Message Date
Predrag Gruevski
5ae51a8a85
Fix typo highlighted by ruff autoformatter. (#12691)
H/t @MichaReiser for spotting it:
https://github.com/langchain-ai/langchain/pull/12585/files#r1378253045
2023-10-31 22:16:06 -04:00
Erick Friis
44c8b159b9
properly increment version in cli (#12685)
Went from 0.0.9 -> 0.0.11 without releasing. Back to 0.0.10, then release.
2023-10-31 17:27:43 -07:00
Leonid Ganeline
ddcec005bc
fix for YahooFinanceNewsTool (#12665)
Added YahooFinanceNewsTool to `__init__.py`; it was previously missing there.
2023-10-31 14:58:09 -07:00
Predrag Gruevski
01a3c9b94e
Use an in-project virtualenv in the CLI package. (#12678)
Keeping it in sync with how our other packages are configured.
2023-10-31 14:51:24 -07:00
Jacob Lee
bd668fcea1
Adds version CLI command (#12619)
Will be automatically bumped with `poetry version patch`.

@efriis @hwchase17

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-10-31 14:50:04 -07:00
Frank
bf5805bb32
Add quip loader (#12259)
- **Description:** implement [quip](https://quip.com) loader
  - **Issue:** https://github.com/langchain-ai/langchain/issues/10352
  - **Dependencies:** No
  - Passes `make format`, `make lint`, and `make test`

---------

Co-authored-by: Hao Fan <h_fan@apple.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-31 14:11:24 -07:00
Roman Vasilyev
c9a6940d58
PGVector fix (#12592)
The latest release is broken; this fixes it.

---------

Co-authored-by: Roman Vasilyev <rvasilyev@mozilla.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-10-31 17:01:15 -04:00
Predrag Gruevski
e8b99364b3
Use ruff for both linting and formatting in langchain-cli. (#12672)
Prior to this PR, `ruff` was used only for linting and not for
formatting, despite the names of the commands. This PR makes it be used
for both linting code and autoformatting it.
2023-10-31 13:52:25 -07:00
Margaret Qian
acfc485808
Update MosaicML Embedding Input Key (#12657)
This input key was missed in the last update PR:
https://github.com/langchain-ai/langchain/pull/7391

The input/output formats are intended to be like this:

```
{"inputs": [<prompt>]} 

{"outputs": [<output_text>]}
```
2023-10-31 14:43:30 -04:00
Predrag Gruevski
c871cc5055
Remove print() statements which seemed leftover from debugging. (#12648)
Added in #12159 presumably during debugging. Right now they cause a bit of visual noise.
2023-10-31 13:45:48 -04:00
Noam Gat
14e8c74736
LM Format Enforcer Integration + Sample Notebook (#12625)
## Description

This PR adds support for
[lm-format-enforcer](https://github.com/noamgat/lm-format-enforcer) to
LangChain.

![image](https://raw.githubusercontent.com/noamgat/lm-format-enforcer/main/docs/Intro.webp)

The library is similar to jsonformer / RELLM, which are already supported in
Langchain, but it has several advantages, such as:
- Batching and Beam search support
- More complete JSON Schema support
- LLM has control over whitespace, improving quality
- Better runtime performance due to only calling the LLM's generate()
function once per generate() call.

The integration is loosely based on the jsonformer integration in terms
of project structure.
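
A hypothetical usage sketch, mirroring the existing JsonFormer wrapper; the
class name, import path, and parameters below are assumptions based on the
description above rather than confirmed API:

```python
# Assumed import path and class name, by analogy with the JsonFormer integration.
from transformers import pipeline
from langchain_experimental.llms import LMFormatEnforcer

json_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
}

# A local HuggingFace text-generation pipeline; any causal LM works here.
hf_pipeline = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")

llm = LMFormatEnforcer(json_schema=json_schema, pipeline=hf_pipeline)
llm.predict("Describe a person as a JSON object:")
```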

## Dependencies

No compile-time dependency was added, but if `lm-format-enforcer` is not
installed, a runtime error will occur when the integration is used.

## Tests

Due to the integration modifying the internal parameters of the
underlying huggingface transformer LLM, it is not possible to test
without building a real LM, which requires internet access. So, similar
to the jsonformer and RELLM integrations, the testing is via the
notebook.

## Twitter Handle

[@noamgat](https://twitter.com/noamgat)


Looking forward to hearing feedback!

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-31 09:49:01 -07:00
Erick Friis
7f6e751a3d
template updates (#12646) 2023-10-31 09:13:58 -07:00
Predrag Gruevski
f94e24dfd7
Install and use ruff format instead of black for code formatting. (#12585)
Best to review one commit at a time, since two of the commits are 100%
autogenerated changes from running `ruff format`:
- Install and use `ruff format` instead of black for code formatting.
- Output of `ruff format .` in the `langchain` package.
- Use `ruff format` in experimental package.
- Format changes in experimental package by `ruff format`.
- Manual formatting fixes to make `ruff .` pass.
2023-10-31 10:53:12 -04:00
William FH
bfd719f9d8
bind_functions convenience method (#12518)
I always take 20-30 seconds to re-discover where the
`convert_to_openai_function` wrapper lives in our codebase. Chat
langchain [has no
clue](https://smith.langchain.com/public/3989d687-18c7-4108-958e-96e88803da86/r)
what to do either. There's the older `create_openai_fn_chain` , but we
haven't been recommending it in LCEL. The example we show in the
[cookbook](https://python.langchain.com/docs/expression_language/how_to/binding#attaching-openai-functions)
is really verbose.


General function calling should be as simple as possible to do, so this
seems a bit more ergonomic to me (feel free to disagree). Another option
would be to coerce directly in the class's init (or when
calling invoke), if provided. I'm not 100% set against that. That
approach may be too easy but not simple. This PR feels like a decent
compromise between simple and easy.

```
from enum import Enum
from typing import Optional

from pydantic import BaseModel, Field


class Category(str, Enum):
    """The category of the issue."""

    bug = "bug"
    nit = "nit"
    improvement = "improvement"
    other = "other"


class IssueClassification(BaseModel):
    """Classify an issue."""

    category: Category
    other_description: Optional[str] = Field(
        description="If classified as 'other', the suggested other category"
    )
    

from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI().bind_functions([IssueClassification])
llm.invoke("This PR adds a convenience wrapper to the bind argument")

# AIMessage(content='', additional_kwargs={'function_call': {'name': 'IssueClassification', 'arguments': '{\n  "category": "improvement"\n}'}})
```
2023-10-31 07:15:37 -07:00
Nuno Campos
3143324984
Improve Runnable type inference for input_schemas (#12630)
- Prefer lambda type annotations over inferred dict schema
- For sequences that start with RunnableAssign infer seq input type as
"input type of 2nd item in sequence - output type of runnable assign"
2023-10-31 13:22:54 +00:00
Nuno Campos
2f563cee20
Add Runnable.with_listeners() (#12549)
- This binds start/end/error listeners to a runnable, which will be
called with the Run object
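
A minimal sketch of the new API as described, assuming each listener callback
receives the Run object:

```python
from langchain.schema.runnable import RunnableLambda

def log_start(run):
    print("run started:", run.id)

def log_end(run):
    print("run finished:", run.id)

# Bind start/end listeners to a runnable; they are called with the Run object.
chain = RunnableLambda(lambda x: x + 1).with_listeners(
    on_start=log_start,
    on_end=log_end,
)
chain.invoke(1)
```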
2023-10-31 11:04:51 +00:00
Bagatur
bcc62d63be
bump 327 (#12623) 2023-10-31 02:18:08 -07:00
Erick Friis
a1fae1fddd
Readme rewrite (#12615)
Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-10-31 00:06:02 -07:00
Yujie Qian
1dbb77d7db
VoyageEmbeddings (#12608)
- **Description:** Integrate VoyageEmbeddings into LangChain, with tests
and docs
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Tag maintainer:** N/A
  - **Twitter handle:** @Voyage_AI_
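
A minimal usage sketch; the constructor argument names and model name below
are assumptions, not confirmed from this PR:

```python
from langchain.embeddings import VoyageEmbeddings

# `voyage_api_key` and `model` are assumed parameter names.
embeddings = VoyageEmbeddings(voyage_api_key="...", model="voyage-01")

# Standard Embeddings interface methods.
query_vector = embeddings.embed_query("What is LangChain?")
doc_vectors = embeddings.embed_documents(["first document", "second document"])
```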

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 18:37:43 -07:00
chocolate4
92bf40a921
Add a new vector store hippo for langchain #11763 (#12412)
#11763

---------

Co-authored-by: TranswarpHippo <hippo.0.assistant@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 18:35:23 -07:00
Karthik Raja A
342d6c7ab6
Multi on client toolkit (#12392)
- Add MultiOn close function, update key value, and add async
functionality
- Solved the "key value TabId not found" issue (updated to use the latest key
value)

@hwchase17
2023-10-30 18:34:56 -07:00
Prabin Nepal
b109cb031b
SecretStr for fireworks api (#12475)
- **Description:** This pull request stops secrets from being stored in raw
format,
- **Issue:** The Fireworks API key was exposed when printing out the
langchain object
[#12165](https://github.com/langchain-ai/langchain/issues/12165)
 - **Maintainer:** @eyurtsev
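
For context, a generic illustration of the pydantic SecretStr pattern applied
here (not the actual Fireworks wrapper code):

```python
from pydantic import SecretStr

api_key = SecretStr("fw-super-secret-key")
print(api_key)                     # **********  (masked in str() / repr())
print(api_key.get_secret_value())  # raw key, only when explicitly requested
```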

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 18:17:53 -07:00
Harrison Chase
a32c236c64
bump cli to 009 (#12611) 2023-10-30 18:12:08 -07:00
Martin Schade
0c7f1d8b21
Textract linearizer (#12446)
**Description:** Textract PDF Loader generating linearized output,
meaning it will replicate the structure of the source document as closely
as possible based on the features passed into the call (e.g. LAYOUT,
FORMS, TABLES). With LAYOUT, reading order for multi-column documents and
identification of lists and figures are supported; with TABLES it will
generate the table structure as well. FORMS will indicate "key: value"
with columns.
  - **Issue:** fixes #12068
- **Dependencies:** amazon-textract-textractor is added, which provides
the linearization
  - **Tag maintainer:** @3coins 
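
A usage sketch based on the description; `textract_features` as the parameter
name is an assumption:

```python
from langchain.document_loaders import AmazonTextractPDFLoader

# The features requested here drive the linearized output described above.
loader = AmazonTextractPDFLoader(
    "example_data/multi-column.pdf",
    textract_features=["LAYOUT", "TABLES", "FORMS"],
)
docs = loader.load()  # reading order, table structure, and "key: value" pairs preserved
```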

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 18:02:10 -07:00
Erick Friis
f39246bd7e
cli should pull instead of delete+clone (#12607)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-10-30 16:44:09 -07:00
Harrison Chase
8b5e879171
add a template for the package readme (#12499)
Co-authored-by: Erick Friis <erick@langchain.dev>
2023-10-30 16:39:39 -07:00
Bagatur
9bedda50f2
Bagatur/lakefs loader2 (#12524)
Co-authored-by: Jonathan Rosenberg <96974219+Jonathan-Rosenberg@users.noreply.github.com>
2023-10-30 16:30:27 -07:00
Ackermann Yuriy
99b69fe607
Fixed missing optional tags. Added default key value for Ollama (#12599)
Added missing Optional typings. Added default values for Ollama optional
keys.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 16:30:10 -07:00
Bagatur
016813d189
factor out to_secret (#12593) 2023-10-30 15:10:25 -07:00
hsuyuming
630ae24b28
implement get_num_tokens to use google's count_tokens function (#10565)
Get the correct token count instead of using the GPT-2 model.

**Description:** 
Implement get_num_tokens within VertexLLM to use google's count_tokens
function.
(https://cloud.google.com/vertex-ai/docs/generative-ai/get-token-count).
This way we don't need to download the GPT-2 model from Hugging Face, and we
also get the correct token count when running the map-reduce chain.
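
A sketch of the behavior described above; assumes Google Cloud credentials are
already configured:

```python
from langchain.llms import VertexAI

llm = VertexAI()
# Token counting now goes through Vertex AI's count_tokens endpoint
# instead of a locally downloaded GPT-2 tokenizer.
print(llm.get_num_tokens("How many tokens is this sentence?"))
```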

**Tag maintainer:** 
@lkuligin 
**Twitter handle:** 
My twitter: @abehsu1992626

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 15:10:05 -07:00
Pham Vu Thai Minh
33e77a1007
Async support for FAISS (#11333)
Following this tutorial about using OpenAI Embeddings with FAISS:

https://python.langchain.com/docs/integrations/vectorstores/faiss

```python
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.document_loaders import TextLoader

loader = TextLoader("../../../extras/modules/state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()
```

This works fine

```python
db = FAISS.from_documents(docs, embeddings)
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
```

But the async version does not work:

```python
db = await FAISS.afrom_documents(docs, embeddings)  # NotImplementedError
query = "What did the president say about Ketanji Brown Jackson"

docs = await db.asimilarity_search(query) # this will use await asyncio.get_event_loop().run_in_executor under the hood and will not call OpenAIEmbeddings.aembed_query but call OpenAIEmbeddings.embed_query
```

So this PR adds async/await support for FAISS.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-10-30 15:08:53 -07:00
Jeff Zhuo
13b89815a3
Issue: fix the issue #11648 init minimax llm (#12554)
See https://github.com/langchain-ai/langchain/issues/11648: the Minimax
LLM failed to initialize.

The idea of this fix is from
https://github.com/langchain-ai/langchain/issues/10917#issuecomment-1765606725:

do not use underscores in the Python model class.

---------

Co-authored-by: zhuojianming@cmcm.com <zhuojianming@cmcm.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 14:30:17 -07:00
Florian Valeye
bfb27324cb
[Matching Engine] Update the Matching Engine to include the distance and filters (#12555)
Hello 👋,

This Pull Request adds more capability to the
[MatchingEngine](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.matching_engine.MatchingEngine.html)
vectorstore of GCP. It includes the
`similarity_search_by_vector_with_relevance_scores` function and also
[filters](https://cloud.google.com/vertex-ai/docs/vector-search/filtering)
to `filter` the namespaces when retrieving the results.

- **Description:** Add
[filter](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.MatchingEngineIndexEndpoint#google_cloud_aiplatform_MatchingEngineIndexEndpoint_find_neighbors)
in `similarity_search` and add
`similarity_search_by_vector_with_relevance_scores` method
  - **Dependencies:** None
  - **Tag maintainer:** Unknown

Thank you!
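
A rough sketch of the new calls; the `Namespace` import path and constructor
arguments are assumptions based on the linked Vertex AI docs:

```python
from google.cloud.aiplatform.matching_engine.matching_engine_index_endpoint import (
    Namespace,
)

# `vectorstore` is an existing MatchingEngine instance (setup omitted).
docs = vectorstore.similarity_search(
    "query text",
    k=4,
    filter=[Namespace(name="season", allow_tokens=["spring"])],
)

docs_and_scores = vectorstore.similarity_search_by_vector_with_relevance_scores(
    embedding=query_embedding,  # a query embedding matching the index dimensionality
    k=4,
)
```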

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 14:12:59 -07:00
Harrison Chase
1d51363e49
change project template (#12493) 2023-10-30 14:06:30 -07:00
Holt Skinner
e53b9ccd70
feat: Add Google Cloud Text-to-Speech Tool (#12572)
- Add Tool for [Google Cloud
Text-to-Speech](https://cloud.google.com/text-to-speech)
- Follows similar structure to [Eleven Labs
Text2Speech](https://python.langchain.com/docs/integrations/tools/eleven_labs_tts)
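
A hypothetical usage sketch, modeled on the Eleven Labs tool it follows; the
class name and return value are assumptions:

```python
from langchain.tools import GoogleCloudTextToSpeechTool

tts = GoogleCloudTextToSpeechTool()  # assumes GCP credentials are configured
audio_path = tts.run("Hello from LangChain")  # assumed to return a path to the audio file
```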

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 14:05:39 -07:00
Adilkhan Sarsen
6e702b9c36
Deep memory support in LangChain (#12268)
- Description: adding support for Activeloop's Deep Memory feature, which
boosts recall by up to 25%. Added a Jupyter notebook showcasing the feature
and also made index params explicit.
- Twitter handle: we would really appreciate it if we could announce this on
Twitter.

---------

Co-authored-by: adolkhan <adilkhan.sarsen@alumni.nu.edu.kz>
2023-10-30 12:16:14 -07:00
billytrend-cohere
b1e3843931
Add client_name="langchain" to Cohere usage (#11328)
Hey, we're looking to invest more in adding Cohere integrations to
langchain, so we would love to get a better idea of how it's used.
Hopefully this PR is acceptable. This week I'm also going to be looking
into adding our new [retrieval augmented generation
product](https://txt.cohere.com/chat-with-rag/) to langchain.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 11:20:55 -07:00
Bagatur
37aec1e050
bump 326 (#12569) 2023-10-30 10:11:17 -07:00
Eugene Yurtsev
1b1a2d5740
Image Caption accepts bytes for images (#12561)
Accept bytes for images in image caption

---------

Co-authored-by: webcoderz <19884161+webcoderz@users.noreply.github.com>
2023-10-30 12:29:54 -04:00
Nuno Campos
7897483819
Allow astream_log to be used inside atrace_as_chain_group (#12558)
2023-10-30 15:55:16 +00:00
Holt Skinner
e05bb938de
Merge pull request #12433
* feat: Add Google Cloud Translation document transformer

* Merge branch 'langchain-ai:master' into google-translate

* Add documentation for Google Translate Document Transformer

* Fix line length error

* Merge branch 'master' into google-translate

* Merge branch 'google-translate' of https://github.com/holtskinner/lan…

* Addressed code review comments

* Merge branch 'master' into google-translate

* Merge branch 'google-translate' of https://github.com/holtskinner/lan…

* Removed extra variable

* Merge branch 'google-translate' of https://github.com/holtskinner/lan…

* Merge branch 'master' into google-translate

* Merge branch 'google-translate' of https://github.com/holtskinner/lan…

* Removed extra import
2023-10-29 21:22:36 -04:00
Samad Koita
d1fdcd4fcb
Masking of API Key for GooseAI LLM (#12496)
Description: Add masking of API Key for GooseAI LLM when printed.
Issue: https://github.com/langchain-ai/langchain/issues/12165
Dependencies: None
Tag maintainer: @eyurtsev

---------

Co-authored-by: Samad Koita <>
2023-10-29 21:21:33 -04:00
Andrew Zhou
64c4a698a8
More comprehensive readthedocs document loader (#12382)
## **Description:**
When building our own readthedocs.io scraper, we noticed a couple of
interesting things:

1. Text lines with a lot of nested <span> tags would give unclean text
with a bunch of newlines. For example, for [Langchain's
documentation](https://api.python.langchain.com/en/latest/document_loaders/langchain.document_loaders.readthedocs.ReadTheDocsLoader.html#langchain.document_loaders.readthedocs.ReadTheDocsLoader),
a single line is represented in a complicated nested HTML structure, and
the naive `soup.get_text()` call currently being made will create a
newline for each nested HTML element. Therefore, the document loader
would give a messy, newline-separated blob of text. This would be true
in a lot of cases.

<img width="945" alt="Screenshot 2023-10-26 at 6 15 39 PM"
src="https://github.com/langchain-ai/langchain/assets/44193474/eca85d1f-d2bf-4487-a18a-e1e732fadf19">
<img width="1031" alt="Screenshot 2023-10-26 at 6 16 00 PM"
src="https://github.com/langchain-ai/langchain/assets/44193474/035938a0-9892-4f6a-83cd-0d7b409b00a3">

Additionally, content from iframes, code from scripts, CSS from styles,
etc. will be picked up if it's nested under the selector (which happens
more often than you'd think). For example, [this
page](https://pydeck.gl/gallery/contour_layer.html#) will scrape 1.5
million characters of content that looks like this:

<img width="1372" alt="Screenshot 2023-10-26 at 6 32 55 PM"
src="https://github.com/langchain-ai/langchain/assets/44193474/dbd89e39-9478-4a18-9e84-f0eb91954eac">

Therefore, I wrote a recursive _get_clean_text(soup) class function that
1. skips all irrelevant elements, and 2. only adds newlines when
necessary.

2. Index pages (like [this
one](https://api.python.langchain.com/en/latest/api_reference.html))
would be loaded, chunked, and eventually embedded. This is really bad
not just because the user will be embedding irrelevant information - but
because index pages are very likely to show up in retrieved content,
making retrieval less effective (in our tests). Therefore, I added a
bool parameter `exclude_index_pages` defaulted to False (which is the
current behavior — although I'd petition to default this to True) that
will skip all pages where links take up 50%+ of the page. Through manual
testing, this seems to be the best threshold.
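
A usage sketch of the new flag described above:

```python
from langchain.document_loaders import ReadTheDocsLoader

# exclude_index_pages=True skips pages where links make up 50%+ of the content.
loader = ReadTheDocsLoader("rtdocs/", exclude_index_pages=True)
docs = loader.load()
```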



## Other Information:
  - **Issue:** n/a
  - **Dependencies:** n/a
  - **Tag maintainer:** n/a
  - **Twitter handle:** @andrewthezhou

---------

Co-authored-by: Andrew Zhou <andrew@heykona.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-29 16:26:53 -07:00
Peter Vandenabeele
3468c038ba
Add unit tests for document_transformers/beautiful_soup_transformer.py (#12520)
- **Description:**
* Add unit tests for document_transformers/beautiful_soup_transformer.py
* Basic functionality is tested (extract tags, remove tags, drop lines)
    * add a FIXME comment about the order of tags that is not preserved
      (and a passing test, but with the expected tags now out-of-order)
  - **Issue:** None
  - **Dependencies:** None
  - **Tag maintainer:** @rlancemartin 
  - **Twitter handle:** `peter_v`

Please make sure your PR is passing linting and testing before
submitting.

=> OK: I ran `make format`, `make test` (passing after install of
beautifulsoup4) and `make lint`.
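
A sketch of the kind of behavior these tests cover; the keyword arguments
shown are assumptions based on the transformer's options:

```python
from langchain.document_transformers import BeautifulSoupTransformer
from langchain.schema import Document

html = "<html><body><p>Keep me</p><script>dropMe()</script></body></html>"

transformer = BeautifulSoupTransformer()
# Extract only <p> content and strip <script> tags.
docs = transformer.transform_documents(
    [Document(page_content=html)],
    tags_to_extract=["p"],
    unwanted_tags=["script"],
)
print(docs[0].page_content)  # "Keep me"
```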
2023-10-29 16:24:47 -07:00
Anirudh Gautam
b257e6a4e8
Mask API key for AI21 LLM (#12418)
- **Description:** Added masking of the API Key for AI21 LLM when
printed and improved the docstring for AI21 LLM.
- Updated the AI21 LLM to utilize SecretStr from pydantic to securely
manage API key.
- Made improvements in the docstring of AI21 LLM. It now mentions that
the API key can also be passed as a named parameter to the constructor.
    - Added unit tests.
  - **Issue:** #12165 
  - **Tag maintainer:** @eyurtsev

---------

Co-authored-by: Anirudh Gautam <anirudh@Anirudhs-Mac-mini.local>
2023-10-29 14:53:41 -07:00
silvhua
9dead1034c
_dalle_image_url returns list of urls if n>1 (#11800)
- **Description:** Updated the `_dalle_image_url` method to return a
list of URLs if self.n>1,
  - **Issue:** #10691,
  - **Dependencies:** unsure,
  - **Tag maintainer:** @eyurtsev,
  - **Twitter handle:** @silvhua
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-29 14:23:23 -07:00
Bagatur
1815ea2fdb
OpenAI runnable constructor (#12455) 2023-10-29 13:40:30 -07:00
William FH
a830b809f3
Patch forward ref bug (#12508)
Currently this gives a bug:
```
from langchain.schema.runnable import RunnableLambda

bound = RunnableLambda(lambda x: x).with_config({"callbacks": []})

# ConfigError: field "callbacks" not yet prepared so type is still a ForwardRef, you might need to call RunnableConfig.update_forward_refs().
```

Rather than deal with cyclic imports and extra load time, etc., I think
it makes sense to just have a separate Callbacks definition here that is
a relaxed typehint.
2023-10-29 00:53:01 -07:00
William FH
36204c2baf
Evaluation Callback Multi Response (#12505)
1. Allow run evaluators to return {"results": [list of evaluation
results]} in the evaluator callback.
2. Allow run evaluators to pick the target run ID to provide feedback
to.

(1) means you could do something like a function call that populates a
full rubric in one go (not sure how reliable that is in general though)
rather than splitting off into separate LLM calls - cheaper and less
code to write
(2) means you can provide feedback to runs on subsequent calls. The
immediate use case is if you wanted to add an evaluator to a chat bot
and assign feedback to previous conversation turns.


have a corresponding one in the SDK
2023-10-28 23:18:29 -07:00
Harrison Chase
9e0ae56287
various templates improvements (#12500) 2023-10-28 22:13:22 -07:00