Commit Graph

5589 Commits

Author SHA1 Message Date
Prabin Nepal
b109cb031b
SecretStr for fireworks api (#12475)
- **Description:** This pull request removes secrets present in raw
format,
- **Issue:** Fireworks api key was exposed when printing out the
langchain object
[#12165](https://github.com/langchain-ai/langchain/issues/12165)
 - **Maintainer:** @eyurtsev

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 18:17:53 -07:00
Harrison Chase
f35a65124a
improve agent templates (#12528)
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-10-30 18:15:13 -07:00
Harrison Chase
75bb28afd8
Harrison/pii chatbot (#12523)
the pii detection in the template is pretty basic, will need to be
customized per use case

the chain it "protects" can be swapped out for any chain
2023-10-30 18:13:12 -07:00
Harrison Chase
a32c236c64
bump cli to 009 (#12611) 2023-10-30 18:12:08 -07:00
Erika Cardenas
b97b9eda21
Hybrid Search Weaviate Template (#12606)
- **Description:** This template covers hybrid search in Weaviate
  - **Dependencies:** No
  - **Twitter handle:** @ecardenas300

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-10-30 18:10:48 -07:00
Martin Schade
0c7f1d8b21
Textract linearizer (#12446)
**Description:** Textract PDF Loader generating linearized output,
meaning it will replicate the structure of the source document as close
as possible based on the features passed into the call (e. g. LAYOUT,
FORMS, TABLES). With LAYOUT reading order for multi-column documents or
identification of lists and figures is supported and with TABLES it will
generate the table structure as well. FORMS will indicate "key: value"
with columms.
  - **Issue:** the issue fixes #12068 
- **Dependencies:** amazon-textract-textractor is added, which provides
the linearization
  - **Tag maintainer:** @3coins 

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 18:02:10 -07:00
Harrison Chase
a7d5e0ce8a
add guardrails profanity (#12609) 2023-10-30 17:01:23 -07:00
Erick Friis
e933212a3d
run poetry build in working dir (#12610)
Was failing because was trying to build from root:
https://github.com/langchain-ai/langchain/actions/runs/6700033981/job/18205251365
2023-10-30 16:58:34 -07:00
Erick Friis
f39246bd7e
cli should pull instead of delete+clone (#12607)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-10-30 16:44:09 -07:00
Harrison Chase
8b5e879171
add a template for the package readme (#12499)
Co-authored-by: Erick Friis <erick@langchain.dev>
2023-10-30 16:39:39 -07:00
Bagatur
9bedda50f2
Bagatur/lakefs loader2 (#12524)
Co-authored-by: Jonathan Rosenberg <96974219+Jonathan-Rosenberg@users.noreply.github.com>
2023-10-30 16:30:27 -07:00
Brian McBrayer
3243dcc83e
Fix very small typo (#12603)
- **Description:** this is the world's smallest typo change of a typo I
saw while reading the docs
2023-10-30 16:30:18 -07:00
Ackermann Yuriy
99b69fe607
Fixed missing optional tags. Added default key value for Ollama (#12599)
Added missing Optional typings. Added default values for Ollama optional
keys.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 16:30:10 -07:00
Lance Martin
f6f3ca12e7
Codebase RAG fireworks (#12597) 2023-10-30 16:21:56 -07:00
Harrison Chase
481bf6fae6
hosting note (#12589) 2023-10-30 15:31:31 -07:00
David Duong
b5c17ff188
Force List[Tuple[str,str]] to chat history widget (#12530)
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 15:19:32 -07:00
David Duong
d39b4b61b6
Batch apply poetry lock --no-update for all templates (#12531)
Ran the following bash script for all templates

```bash
#!/bin/bash

set -e
current_dir="$(pwd)"
for directory in */; do
    if [ -d "$directory" ]; then
        (cd "$directory" && poetry lock --no-update)
    fi
done

cd "$current_dir"
```

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 15:18:53 -07:00
Kenzie Mihardja
e914283cf9
add docs to min_chunk_size (#12537)
Minor addition to documentation to elaborate on min_chunk_size.

Co-authored-by: Kenzie Mihardja <kenzie@docugami.com>
2023-10-30 15:13:52 -07:00
Bagatur
016813d189
factor out to_secret (#12593) 2023-10-30 15:10:25 -07:00
hsuyuming
630ae24b28
implement get_num_tokens to use google's count_tokens function (#10565)
can get the correct token count instead of using gpt-2 model

**Description:** 
Implement get_num_tokens within VertexLLM to use google's count_tokens
function.
(https://cloud.google.com/vertex-ai/docs/generative-ai/get-token-count).
So we don't need to download gpt-2 model from huggingface, also when we
do the mapreduce chain we can get correct token count.

**Tag maintainer:** 
@lkuligin 
**Twitter handle:** 
My twitter: @abehsu1992626

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 15:10:05 -07:00
Pham Vu Thai Minh
33e77a1007
Async support for FAISS (#11333)
Following this tutoral about using OpenAI Embeddings with FAISS

https://python.langchain.com/docs/integrations/vectorstores/faiss

```python
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.document_loaders import TextLoader
from langchain.document_loaders import TextLoader

loader = TextLoader("../../../extras/modules/state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()
```

This works fine

```python
db = FAISS.from_documents(docs, embeddings)
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
```

But the async version is not

```python
db = await FAISS.afrom_documents(docs, embeddings)  # NotImplementedError
query = "What did the president say about Ketanji Brown Jackson"

docs = await db.asimilarity_search(query) # this will use await asyncio.get_event_loop().run_in_executor under the hood and will not call OpenAIEmbeddings.aembed_query but call OpenAIEmbeddings.embed_query
```

So this PR add async/await supports for FAISS

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-10-30 15:08:53 -07:00
Lance Martin
26f0ca222d
RAG template for MongoDB Atlas Vector Search (#12526) 2023-10-30 14:31:34 -07:00
Jeff Zhuo
13b89815a3
Issue: fix the issue #11648 init minimax llm (#12554)
e https://github.com/langchain-ai/langchain/issues/11648 Minimax
llm failed to initialize

The idea of this fix is
https://github.com/langchain-ai/langchain/issues/10917#issuecomment-1765606725

do not use  underscore in python model class

---------

Co-authored-by: zhuojianming@cmcm.com <zhuojianming@cmcm.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 14:30:17 -07:00
Florian Valeye
bfb27324cb
[Matching Engine] Update the Matching Engine to include the distance and filters (#12555)
Hello 👋,

This Pull Request adds more capability to the
[MatchingEngine](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.matching_engine.MatchingEngine.html)
vectorstore of GCP. It includes the
`similarity_search_by_vector_with_relevance_scores` function and also
[filters](https://cloud.google.com/vertex-ai/docs/vector-search/filtering)
to `filter` the namespaces when retrieving the results.

- **Description:** Add
[filter](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.MatchingEngineIndexEndpoint#google_cloud_aiplatform_MatchingEngineIndexEndpoint_find_neighbors)
in `similarity_search` and add
`similarity_search_by_vector_with_relevance_scores` method
  - **Dependencies:** None
  - **Tag maintainer:** Unknown

Thank you!

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 14:12:59 -07:00
Predrag Gruevski
3c5c384f1a
Test-publish to test PyPI and separate jobs to limit permissions. (#12578)
Before making a new `langchain` release, we want to test that everything
works as expected. This PR lets us publish `langchain` to test PyPI,
then install it from there and run checks to ensure everything works
normally before publishing it "for real".

It also takes the opportunity to refactor the build process, splitting
up the build, release-creation, and PyPI upload steps into separate jobs
that do not share their elevated permissions with each other.
2023-10-30 17:10:14 -04:00
Harrison Chase
1d51363e49
change project template (#12493) 2023-10-30 14:06:30 -07:00
Holt Skinner
e53b9ccd70
feat: Add Google Cloud Text-to-Speech Tool (#12572)
- Add Tool for [Google Cloud
Text-to-Speech](https://cloud.google.com/text-to-speech)
- Follows similar structure to [Eleven Labs
Text2Speech](https://python.langchain.com/docs/integrations/tools/eleven_labs_tts)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 14:05:39 -07:00
Bagatur
1f2c672d4a
add routing by embedding doc (#12580) 2023-10-30 13:03:16 -07:00
William FH
199630ff93
Replace You with DDG in xml agent (#12504)
You requires an email to get an API key which IMO is too much friction.
Duckduck go is free and easy to install.
2023-10-30 12:51:00 -07:00
Adilkhan Sarsen
6e702b9c36
Deep memory support in LangChain (#12268)
- Description: adding support to Activeloop's DeepMemory feature that
boosts recall up to 25%. Added Jupyter notebook showcasing the feature
and also made index params explicit.
- Twitter handle: will really appreciate if we could announce this on
twitter.

---------

Co-authored-by: adolkhan <adilkhan.sarsen@alumni.nu.edu.kz>
2023-10-30 12:16:14 -07:00
Lance Martin
c57945e0a8
Formatting on ntbks (#12576) 2023-10-30 11:32:31 -07:00
Lance Martin
08103e6d48
Minor template cleaning (#12573) 2023-10-30 11:27:44 -07:00
billytrend-cohere
b1e3843931
Add client_name="langchain" to Cohere usage (#11328)
Hey, we're looking to invest more in adding cohere integrations to
langchain so would love to get more of an idea for how it's used.
Hopefully this pr is acceptable. This week I'm also going to be looking
into adding our new [retrieval augmented generation
product](https://txt.cohere.com/chat-with-rag/) to langchain.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 11:20:55 -07:00
Bagatur
37aec1e050
bump 326 (#12569) 2023-10-30 10:11:17 -07:00
Eugene Yurtsev
1b1a2d5740
Image Caption accepts bytes for images (#12561)
Accept bytes for images in image caption

---------

Co-authored-by: webcoderz <19884161+webcoderz@users.noreply.github.com>
2023-10-30 12:29:54 -04:00
Nuno Campos
7897483819
Allow astream_log to be used inside atrace_as_chain_group (#12558)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-30 15:55:16 +00:00
Tomaz Bratanic
8e88ba16a8
Update neo4j template readmes (#12540) 2023-10-30 07:57:53 -07:00
Bagatur
b2138508cb
google translate nb formatting (#12534) 2023-10-29 21:27:04 -07:00
Holt Skinner
e05bb938de
Merge pull request #12433
* feat: Add Google Cloud Translation document transformer

* Merge branch 'langchain-ai:master' into google-translate

* Add documentation for Google Translate Document Transformer

* Fix line length error

* Merge branch 'master' into google-translate

* Merge branch 'google-translate' of https://github.com/holtskinner/lan…

* Addressed code review comments

* Merge branch 'master' into google-translate

* Merge branch 'google-translate' of https://github.com/holtskinner/lan…

* Removed extra variable

* Merge branch 'google-translate' of https://github.com/holtskinner/lan…

* Merge branch 'master' into google-translate

* Merge branch 'google-translate' of https://github.com/holtskinner/lan…

* Removed extra import
2023-10-29 21:22:36 -04:00
Samad Koita
d1fdcd4fcb
Masking of API Key for GooseAI LLM (#12496)
Description: Add masking of API Key for GooseAI LLM when printed.
Issue: https://github.com/langchain-ai/langchain/issues/12165
Dependencies: None
Tag maintainer: @eyurtsev

---------

Co-authored-by: Samad Koita <>
2023-10-29 21:21:33 -04:00
Andrew Zhou
64c4a698a8
More comprehensive readthedocs document loader (#12382)
## **Description:**
When building our own readthedocs.io scraper, we noticed a couple
interesting things:

1. Text lines with a lot of nested <span> tags would give unclean text
with a bunch of newlines. For example, for [Langchain's
documentation](https://api.python.langchain.com/en/latest/document_loaders/langchain.document_loaders.readthedocs.ReadTheDocsLoader.html#langchain.document_loaders.readthedocs.ReadTheDocsLoader),
a single line is represented in a complicated nested HTML structure, and
the naive `soup.get_text()` call currently being made will create a
newline for each nested HTML element. Therefore, the document loader
would give a messy, newline-separated blob of text. This would be true
in a lot of cases.

<img width="945" alt="Screenshot 2023-10-26 at 6 15 39 PM"
src="https://github.com/langchain-ai/langchain/assets/44193474/eca85d1f-d2bf-4487-a18a-e1e732fadf19">
<img width="1031" alt="Screenshot 2023-10-26 at 6 16 00 PM"
src="https://github.com/langchain-ai/langchain/assets/44193474/035938a0-9892-4f6a-83cd-0d7b409b00a3">

Additionally, content from iframes, code from scripts, css from styles,
etc. will be gotten if it's a subclass of the selector (which happens
more often than you'd think). For example, [this
page](https://pydeck.gl/gallery/contour_layer.html#) will scrape 1.5
million characters of content that looks like this:

<img width="1372" alt="Screenshot 2023-10-26 at 6 32 55 PM"
src="https://github.com/langchain-ai/langchain/assets/44193474/dbd89e39-9478-4a18-9e84-f0eb91954eac">

Therefore, I wrote a recursive _get_clean_text(soup) class function that
1. skips all irrelevant elements, and 2. only adds newlines when
necessary.

2. Index pages (like [this
one](https://api.python.langchain.com/en/latest/api_reference.html))
would be loaded, chunked, and eventually embedded. This is really bad
not just because the user will be embedding irrelevant information - but
because index pages are very likely to show up in retrieved content,
making retrieval less effective (in our tests). Therefore, I added a
bool parameter `exclude_index_pages` defaulted to False (which is the
current behavior — although I'd petition to default this to True) that
will skip all pages where links take up 50%+ of the page. Through manual
testing, this seems to be the best threshold.



## Other Information:
  - **Issue:** n/a
  - **Dependencies:** n/a
  - **Tag maintainer:** n/a
  - **Twitter handle:** @andrewthezhou

---------

Co-authored-by: Andrew Zhou <andrew@heykona.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-29 16:26:53 -07:00
Peter Vandenabeele
3468c038ba
Add unit tests for document_transformers/beautiful_soup_transformer.py (#12520)
- **Description:**
* Add unit tests for document_transformers/beautiful_soup_transformer.py
* Basic functionality is tested (extract tags, remove tags, drop lines)
    * add a FIXME comment about the order of tags that is not preserved
      (and a passing test, but with the expected tags now out-of-order)
  - **Issue:** None
  - **Dependencies:** None
  - **Tag maintainer:** @rlancemartin 
  - **Twitter handle:** `peter_v`

Please make sure your PR is passing linting and testing before
submitting.

=> OK: I ran `make format`, `make test` (passing after install of
beautifulsoup4) and `make lint`.
2023-10-29 16:24:47 -07:00
Bagatur
d31d705407
update contributing (#12532) 2023-10-29 16:22:18 -07:00
Bagatur
0b4b9e61fc
Bagatur/fix doc ci (#12529) 2023-10-29 16:15:18 -07:00
Bagatur
2424fff3f1
notebook fmt (#12498) 2023-10-29 15:50:09 -07:00
Harrison Chase
56cc5b847c
Harrison/add descriptions (#12522) 2023-10-29 15:11:37 -07:00
Anirudh Gautam
b257e6a4e8
Mask API key for AI21 LLM (#12418)
- **Description:** Added masking of the API Key for AI21 LLM when
printed and improved the docstring for AI21 LLM.
- Updated the AI21 LLM to utilize SecretStr from pydantic to securely
manage API key.
- Made improvements in the docstring of AI21 LLM. It now mentions that
the API key can also be passed as a named parameter to the constructor.
    - Added unit tests.
  - **Issue:** #12165 
  - **Tag maintainer:** @eyurtsev

---------

Co-authored-by: Anirudh Gautam <anirudh@Anirudhs-Mac-mini.local>
2023-10-29 14:53:41 -07:00
Nico Baier
35d726dc15
docs(prompt_templates): fix typo in prompt template (#12497)
- **Description:** Fixes a small typo in the [Prompt template
document](https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/)
  - **Dependencies:** none
2023-10-29 14:52:37 -07:00
silvhua
9dead1034c
_dalle_image_url returns list of urls if n>1 (#11800)
- **Description:** Updated the `_dalle_image_url` method to return a
list of URLs if self.n>1,
  - **Issue:** #10691,
  - **Dependencies:** unsure,
  - **Tag maintainer:** @eyurtsev,
  - **Twitter handle:** @silvhua
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-29 14:23:23 -07:00
Bagatur
1815ea2fdb
OpenAI runnable constructor (#12455) 2023-10-29 13:40:30 -07:00