Commit Graph

3291 Commits

Author SHA1 Message Date
Christophe Bornet
6f477e3cb6
docs: Remove chromadb from required dependency in examples with VectorstoreIndexCreator (#19578) 2024-03-26 11:12:21 -04:00
Piyush Jain
72ba738bf5
community[minor]: Improvements for NeptuneRdfGraph, Improve discovery of graph schema using database statistics (#19546)
Fixes linting for PR
[19244](https://github.com/langchain-ai/langchain/pull/19244)

---------

Co-authored-by: mhavey <mchavey@gmail.com>
2024-03-26 10:36:51 -04:00
aditya thomas
fc6b92bb9a
docs: add cohere to the list of partners (#19552)
**Description:** Add Cohere to the list of LangChain partners
**Issue:** The Cohere partner package was recently added
[#19049](https://github.com/langchain-ai/langchain/pull/19049)
**Dependencies:** None
2024-03-26 10:22:03 -04:00
Aayush Kataria
03c38005cb
community[patch]: Fixing some caching issues for AzureCosmosDBSemanticCache (#18884)
Fixing some issues for AzureCosmosDBSemanticCache
- Added the entry for "AzureCosmosDBSemanticCache" which was missing in
langchain/cache.py
- Added application name when creating the MongoClient for the
AzureCosmosDBVectorSearch, for tracking purposes.

@baskaryan, can you please review this PR, we need this to go in asap.
These are just small fixes which we found today in our testing.
2024-03-25 19:06:17 -07:00
miri-bar
55db737302
ai21[minor]: AI21 Labs Semantic Text Splitter support (#19510)
Description: Added support for AI21 Labs model - Segmentation, as a Text
Splitter
Dependencies: ai21, langchain-text-splitter
Twitter handle: https://github.com/AI21Labs

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-26 01:39:37 +00:00
Anindyadeep
b2a11ce686
community[minor]: Prem AI langchain integration (#19113)
### Prem SDK integration in LangChain

This PR adds the integration of [PremAI's](https://www.premai.io/)
prem-sdk with LangChain. Users can now access deployed models
(LLMs/embeddings) and use them within LangChain's ecosystem.

### This PR adds the following:

- [x]  Add chat support
- [X]  Adding embedding support
- [X]  writing integration tests
    - [X]  writing tests for chat 
    - [X]  writing tests for embedding
- [X]  writing unit tests
    - [X]  writing tests for chat 
    - [X]  writing tests for embedding
- [X]  Adding documentation
    - [X]  writing documentation for chat
    - [X]  writing documentation for embedding
- [X] run `make test`
- [X] run `make lint`, `make lint_diff` 
- [X]  Final checks (spell check, lint, format and overall testing)

---------

Co-authored-by: Anindyadeep Sannigrahi <anindyadeepsannigrahi@Anindyadeeps-MacBook-Pro.local>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-26 01:37:19 +00:00
Alessandro D'Armiento
37eb3a4a9e
docs: Some import nits (#19130)
- **Description:** fixes some minor issues in the documentation

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-26 01:25:44 +00:00
Anthony Shaw
6c9b0f96f3
docs: Add guidance for splitting Chinese, Japanese, and Thai (#19295)
The existing default list of separators for the `RecursiveCharacterTextSplitter`
assumes spaces are word boundaries. Some languages [don't use spaces
between
words](https://en.wikipedia.org/wiki/Category:Writing_systems_without_word_boundaries)
(Chinese, Japanese, Thai, Burmese).

This PR extends the documentation to explain how to cater for those
languages by adding extra punctuation to the separators, along with
zero-width spaces, which some typesetters use and which help the
splitter avoid splitting inside words.

Ideally, **these separators could be a constant in the module** but for
now, defining them in the documentation is a start.
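
As a rough illustration of the idea (not the splitter's actual implementation), the sketch below defines a hypothetical `CJK_SEPARATORS` list and a simplified recursive splitter in plain Python. The helper name `split_recursive` and the specific separator characters are assumptions made for this example; unlike a real splitter, it drops the separators and does not merge small pieces back together.

```python
# Hypothetical separator list: common defaults plus a zero-width space and
# full-width / ideographic punctuation. The exact characters are an assumption.
CJK_SEPARATORS = [
    "\n\n",
    "\n",
    " ",
    "\u200b",  # zero-width space
    "\uff0c",  # full-width comma
    "\u3001",  # ideographic comma
    "\uff0e",  # full-width full stop
    "\u3002",  # ideographic full stop
    "",        # last resort: split between characters
]


def split_recursive(text, separators, chunk_size):
    """Split on the first separator found, recursing into oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text else []
    for i, sep in enumerate(separators):
        if sep == "":
            # No separator matched: fall back to fixed-size character slices.
            return [text[j:j + chunk_size] for j in range(0, len(text), chunk_size)]
        if sep in text:
            chunks = []
            for piece in text.split(sep):
                chunks.extend(split_recursive(piece, separators[i + 1:], chunk_size))
            return chunks
    return [text]


text = "今日は晴れです。明日は雨です。"
# With the extended list, the text breaks at the ideographic full stop.
print(split_recursive(text, CJK_SEPARATORS, 10))
# With space-oriented separators only, it is cut at an arbitrary character.
print(split_recursive(text, ["\n\n", "\n", " ", ""], 10))
```

With only space-oriented separators the sample falls through to fixed-size slicing, which is the behavior the added punctuation is meant to avoid.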
2024-03-26 00:34:00 +00:00
Ian
d5415dbd68
docs: improve tidb integrations documents (#19321)
This PR aims to enhance the documentation for TiDB integration, driven
by feedback from our users. It provides detailed introductions to key
features, ensuring developers can fully leverage TiDB for AI application
development.
2024-03-25 17:08:23 -07:00
Dmitry Tyumentsev
08b769d539
community[patch]: YandexGPT Use recent yandexcloud sdk version (#19341)
Fixed the inability to work with [yandexcloud
SDK](https://pypi.org/project/yandexcloud/) versions higher than 0.265.0
2024-03-25 17:05:57 -07:00
Tridib Roy Arjo
d667b1ea8f
docs: Update async_chromium.ipynb (#19514)
In Jupyter, asyncio would throw an error before `.load()` unless
`nest_asyncio` is applied (Issue #8494 mentioned this)

Also includes minor typo fixes.
2024-03-26 00:02:50 +00:00
Bob Lin
5b6b1f9e1d
docs: Fix several sample code errors (#19382) 2024-03-25 16:59:52 -07:00
Hamid Ali
c281ec8887
docs: Fix broken link in semantic-chunker.ipynb (#19464)
Corrected a broken link within the semantic-chunker.ipynb notebook,
ensuring that users can access the referenced resource.

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-25 23:39:32 +00:00
Ikko Eltociear Ashimine
980658cb47
docs: Update streaming.ipynb (#19500)
Fixed typo.

occuring -> occurring
2024-03-25 16:21:45 -07:00
Leonid Kuligin
91f4c80143
docs: fixed links (#19503)
- [ ] **PR title**: "docs: fixed broken links"


- [ ] **PR message**:
    - **Description:** fixed links in the documentation
2024-03-25 16:19:28 -07:00
Mikelarg
dac2e0165a
community[minor]: Added GigaChat Embeddings support + updated previous GigaChat integration (#19516)
- **Description:** Added integration with
[GigaChat](https://developers.sber.ru/portal/products/gigachat)
embeddings. Also added support for extra fields in GigaChat LLM and
fixed docs.
2024-03-25 16:08:37 -07:00
Erica Clark
a1ff21f90f
docs: Update local llms article to use invoke instead of deprecated __call__ (#19528)
- **Description:** Since the implicit `__call__` has been deprecated in
favor of `invoke`, the local_llms article also needed to be updated.
This article was my introduction to LangChain, and as it was helpful in
getting me set up with running LLMs locally, it is nice to not have any
warnings when running the example code. With this change, the warnings
go away when running the example code.
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** clarkerican
2024-03-25 15:51:39 -07:00
billytrend-cohere
63343b4987
cohere[patch]: add cohere as a partner package (#19049)
Description: adds support for langchain_cohere

---------

Co-authored-by: Harry M <127103098+harry-cohere@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-03-25 20:23:47 +00:00
Igor Muniz Soares
743f888580
community[minor]: Dappier chat model integration (#19370)
**Description:** 

This PR adds [Dappier](https://dappier.com/) for the chat model. It
supports generate, async generate, and batch functionalities. We added
unit and integration tests as well as a notebook with more details about
our chat model.


**Dependencies:** 
    No extra dependencies are needed.
2024-03-25 07:29:05 +00:00
Hugoberry
96dc180883
community[minor]: Add DuckDB as a vectorstore (#18916)
DuckDB has a cosine similarity function that works on list and array data
types, which can be used to build a vector store.
- **Description:** The latest version of DuckDB features a cosine
similarity function, which can be used with its support for list or
array column types. This PR surfaces this functionality to langchain.
    - **Dependencies:** duckdb 0.10.0
    - **Twitter handle:** @igocrite

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-25 07:02:35 +00:00
Ethan Yang
fa6397d76a
docs: Add OpenVINO llms docs (#19489)
Add OpenVINOpipeline instructions in docs. OpenVINO users can find more
details on this page.
2024-03-24 23:57:30 -07:00
Lance Martin
db7403d667
docs: Remove non-rendering images & output spamming from doc ntbks (#19475)
Looking at tokens / page of our docs, we see a few outliers:
<img width="761" alt="image"
src="https://github.com/langchain-ai/langchain/assets/122662504/677aa2d6-0a29-45e4-882a-db2bbf46d02b">

It is due to non-rendering images in one case, and output spamming. 

Clean these, along with other cases of excessive output spamming in
docs.

All get sucked into chat-langchain for retrieval.
2024-03-24 23:47:38 -07:00
aditya thomas
b43a9d5808
docs: adding voyageai to the list of partner packages (#19376)
**Description:** Adding VoyageAI to the list of partners
**Issue:** A standalone langchain-voyageai package has been added
**Dependencies:** None
2024-03-22 17:08:15 -07:00
Zeeland
2549df00cd
docs: fix error bilibili url (#19375)
Thank you for contributing to LangChain!

bilibili-api-python uses the https://github.com/Nemo2011/bilibili-api repo.
Changed the link to the correct address.

- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
2024-03-22 17:06:17 -07:00
aditya thomas
375ab7bf59
docs: update module imports for fireworks documentation (#19377)
**Description:** Update module imports for Fireworks documentation
**Issue:** Module imports not present or in incorrect location
**Dependencies:** None
2024-03-22 17:05:27 -07:00
aditya thomas
0cc0467267
docs: update import paths and move to lcel for llama.cpp examples (#19391)
**Description:** Update import paths and move to lcel for llama.cpp
examples
**Issue:** Update import paths to reflect package refactoring and move
chains to LCEL in examples
**Dependencies:** None

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-23 00:04:12 +00:00
fengjial
3b52ee05d1
community[patch]: fix bugs in baiduvectordb as vectorstore (#19380)
fix small bugs in vectorstore/baiduvectordb
2024-03-22 17:03:59 -07:00
Cailin Wang
5402aef32e
docs: Add partition parameter to DashVector (#19385)
**Description**: Add `partition` parameter to DashVector
dashvector.ipynb
**Related PR**: https://github.com/langchain-ai/langchain/pull/19023
**Twitter handle**: @CailinWang_

---------

Co-authored-by: root <root@Bluedot-AI>
2024-03-22 17:00:29 -07:00
aditya thomas
16ef88a87d
docs: moving FireworksEmbeddings documentation to docs folder (#19398)
**Description:** Moving FireworksEmbeddings documentation to the
location docs/integration/text_embedding/ from langchain_fireworks/docs/
**Issue:** FireworksEmbeddings documentation was not in the correct
location
**Dependencies:** None

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-22 23:24:22 +00:00
Ray Bell
7d36ee38b7
docs: point to titantic dataset on web (#19455)
Updated `pd.read_csv("titantic.csv")` to
`pd.read_csv("https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv")`
i.e. it will read it from
https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv
and allow anyone to run the code.

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-22 22:22:41 +00:00
Ray Bell
f959fad56e
docs: use invoke instead of run (#19457)
Updated the deprecated run with invoke

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-22 15:08:26 -07:00
老阿張
9dfce56b31
docs: Fix typo in infino.ipynb (#18640)
Description: "conquerer should be conqueror "? 🤔
Issue: Typo
Dependencies: Nope
Twitter handle: laoazhang
2024-03-20 07:51:58 -07:00
aditya thomas
e46419c851
docs: contribute / integrations code examples update (#19319)
**Description:** Update to make the code examples consistent with the
actual use
**Issue:** Code examples were different from actual use in the LangChain
code
**Dependencies:** Changes on top of
https://github.com/langchain-ai/langchain/pull/19294

Note: If these changes are acceptable, please merge them after
https://github.com/langchain-ai/langchain/pull/19294.
2024-03-20 09:27:53 -04:00
Brace Sproul
40f846e65d
docs[minor]: Add chat model selection tabs component (#19296)
<img width="1728" alt="image"
src="https://github.com/langchain-ai/langchain/assets/46789226/45e70a92-c2ee-48c8-9964-100eed22687b">
2024-03-19 18:12:46 -07:00
Nithish Raghunandanan
7ad0a3f2a7
community: add Couchbase Vector Store (#18994)
- **Description:** Added support for Couchbase Vector Search to
LangChain.
- **Dependencies:** couchbase>=4.1.12
- **Twitter handle:** @nithishr

---------

Co-authored-by: Nithish Raghunandanan <nithishr@users.noreply.github.com>
2024-03-19 12:39:51 -07:00
Chris Papademetrious
305d74c67a
core: implement a batch_size parameter for CacheBackedEmbeddings (#18070)
**Description:**

Currently, `CacheBackedEmbeddings` computes vectors for *all* uncached
documents before updating the store. This pull request updates the
embedding computation loop to compute embeddings in batches, updating
the store after each batch.

I noticed this when I tried `CacheBackedEmbeddings` on our 30k document
set and the cache directory hadn't appeared on disk after 30 minutes.

The motivation is to minimize compute/data loss when problems occur:

* If there is a transient embedding failure (e.g. a network outage at
the embedding endpoint triggers an exception), at least the completed
vectors are written to the store instead of being discarded.
* If there is an issue with the store (e.g. no write permissions), the
condition is detected early without computing (and discarding!) all the
vectors.
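
The batching idea described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from this PR: `embed_with_cache`, `batched`, the dict-based `store`, and the `embed` callable are hypothetical names invented for the example.

```python
from typing import Callable, Dict, Iterator, List, Optional


def batched(items: List[str], size: Optional[int]) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items (one slice if size is None)."""
    if not items:
        return
    if size is None:
        yield items
        return
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_with_cache(
    texts: List[str],
    embed: Callable[[List[str]], List[List[float]]],
    store: Dict[str, List[float]],
    batch_size: Optional[int] = None,
) -> List[List[float]]:
    """Embed uncached texts batch by batch, writing each batch to the store at once."""
    # De-duplicate while preserving order, then keep only texts not yet cached.
    missing = [t for t in dict.fromkeys(texts) if t not in store]
    for batch in batched(missing, batch_size):
        vectors = embed(batch)             # may raise on a transient failure
        store.update(zip(batch, vectors))  # earlier batches are already saved
    return [store[t] for t in texts]
```

If `embed` raises partway through, every batch completed before the failure is already in `store`, so a retry only recomputes the remaining texts. A store that cannot be written to is also discovered after the first batch rather than after all vectors have been computed.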

**Issue:**
Implements enhancement #18026.

**Testing:**
I was unable to run unit tests; details in [this
post](https://github.com/langchain-ai/langchain/discussions/15019#discussioncomment-8576684).

---------

Signed-off-by: chrispy <chrispy@synopsys.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-03-19 18:55:43 +00:00
Christophe Bornet
30e4a35d7a
community: Use langchain-astradb for AstraDB caches (#18419)
- [x] Needs https://github.com/langchain-ai/langchain-datastax/pull/4
- [x] Needs a new release of langchain-astradb
2024-03-19 14:04:36 -04:00
Brace Sproul
17c62e0f3a
ci[minor]: Bump LC scripts package, add retry option (#19285)
The `retryFailed` option will retry all failed links, one at a time,
with the goal of not triggering bot protection

`microsoft.com` is now hard coded into the whitelist
2024-03-19 10:42:59 -07:00
Erick Friis
7eb376d5fc
docs: integration deprecation docs (#19283) 2024-03-19 17:11:15 +00:00
HatsuneMK00
4761c09e94
docs: update slack toolkit ipynb in integration (#19219)
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- **PR message**:
- **Description:** Update the Slack toolkit doc to use an agent that
supports multiple inputs. Using the ReAct agent will cause a ValidationError
when invoking the Slack tools. This is because the agent returns a string
like `'{"channel": "C05LDF54S21", "message": "Hello, world!"}'` but the
ReAct agent does not support multiple inputs.
- **Issue:** This is related to this
[Discussion#18083](https://github.com/langchain-ai/langchain/discussions/18083)
    - **Dependencies:** No dependencies required

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-03-19 10:39:09 -04:00
Vittorio Rigamonti
9b2f9ee952
community: VectorStore Infinispan, adding autoconfiguration (#18967)
**Description**:
This PR enables VectorStore autoconfiguration for Infinispan: if the
metadata values are only of basic types, the protobuf
config will be automatically generated for the user.
2024-03-18 21:33:45 -07:00
Anthony Shaw
bb0dd8f82f
docs: Embellish article on splitting by tokens with more examples and missing details (#18997)
**Description**

This PR adds some missing details from the "Split by tokens" page in the
documentation. Specifically:

- The `.from_tiktoken_encoder()` class methods for both the
`CharacterTextSplitter` and `RecursiveCharacterTextSplitter` default to
the old `gpt-2` encoding. I've added a comment to suggest specifying
`model_name` or `encoding`
- The docs didn't mention that the `from_tiktoken_encoder()` class
method passes additional kwargs down to the constructor of the splitter.
I only discovered this by reading the source code
- Added an example of using the `.from_tiktoken_encoder()` class method
with `RecursiveCharacterTextSplitter` which is the recommended approach
for most scenarios above `CharacterTextSplitter`
- Added a warning that `TokenTextSplitter` can split characters which
have multiple tokens (e.g. 猫 has 3 cl100k_base tokens) between multiple
chunks which creates malformed Unicode strings and should not be used in
these situations.
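
The last point can be demonstrated without a tokenizer. The sketch below is an illustration under an explicit assumption: it uses the three raw UTF-8 bytes of 猫 as a stand-in for a three-token encoding of one character, and `chunk_units` is a hypothetical helper, not part of any splitter.

```python
text = "猫"
# Assumption: treat each UTF-8 byte as one "token" of the character.
units = list(text.encode("utf-8"))


def chunk_units(units, chunk_size):
    """Group the units into fixed-size chunks, ignoring character boundaries."""
    return [bytes(units[i:i + chunk_size]) for i in range(0, len(units), chunk_size)]


chunks = chunk_units(units, 2)
# Decoding each chunk on its own cannot recover the character.
decoded = [c.decode("utf-8", errors="replace") for c in chunks]
print(len(units), decoded)

# Only the chunks joined back together decode to the original text.
print(b"".join(chunks).decode("utf-8") == text)
```

Each chunk decodes to replacement characters because neither one holds the complete byte sequence, which is the kind of malformed output the warning refers to.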

Side note: I think the default argument of `gpt2` for
`.from_tiktoken_encoder()` should be updated?

**Twitter handle** anthonypjshaw

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-18 21:28:17 -07:00
Simon Stone
58c7687174
langchain: preserve document metadata in FlashrankRerank (#19148)
**Description:** Preserves document metadata in `FlashrankRerank`
    - **Issue:** #19142
    - **Dependencies:** None
    - **Twitter handle:** n/a

---------

Co-authored-by: Simon Stone <simon.stone@dartmouth.edu>
2024-03-19 04:15:18 +00:00
Simon Stone
dc4ce82ddd
docs: fix import path for FlashrankRerank example notebook (#19146)
**Description:** Fixes the import paths for the `FlashrankRerank`
example notebook.
 **Issue:** #19139 
 **Dependencies:** None
 **Twitter handle:** n/a

---------

Co-authored-by: Simon Stone <simon.stone@dartmouth.edu>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-18 21:03:00 -07:00
Saurav Kumar
bde199d128
Updating format of pip install (#19198)
- [x] **PR title**: "Updating format of pip install in two files of
docs/cookbook"
- pip install is not reflecting properly in some of the files in
cookbook
- Example:
[docs/expression_language/cookbook/sql_db](https://python.langchain.com/docs/expression_language/cookbook/sql_db)


- [x] **PR message**: Updating format of pip install in two files of
docs/cookbook
    - **Description:** a description of the change
    - **Issue:** #19197 

- Note - let's do squash merge for the PR

2024-03-19 04:01:24 +00:00
HowardChan
ae3c7f702c
docs:Make url as a markdown link (#19212)
**Description**: same as the title

Co-authored-by: ChenZhengHao <chenzhenghao@mail.teletraan.io>
2024-03-19 03:47:52 +00:00
Estephania Calvo Carvajal
94e58dd827
docs:Fix links to LangSmith docs on Evaluation page (#19210) (#19216)
- **Description:** Same as the title
- **Issue:** #19210
2024-03-18 22:27:43 +00:00
Kenzie Mihardja
21f75991d4
deprecate community docugami loader (#19230)
- [x] **PR title**: "community: deprecate DocugamiLoader"

- [x] **PR message**: Deprecate the langchain_community DocugamiLoader in
favor of the docugami_langchain DocugamiLoader

---------

Co-authored-by: Kenzie Mihardja <kenzie28@cs.washington.edu>
2024-03-18 12:56:47 -07:00
Anubhav Madhav
9235dade90
docs: provided hyperlinks to text and fixed grammar (#19092)
1) Provided links to text in the prompt (Refer Page Link 1, Page Link 2
and Page Link 3)
2) Fixed Grammar in Considerations of Model I/O Concepts documentation
page - Update concepts.mdx (Page Link 4)

*Issues are on the following pages:*
Page Link 1:
https://python.langchain.com/docs/modules/model_io/concepts#prompttemplate
Page Link 2:
https://python.langchain.com/docs/modules/model_io/concepts#messageprompttemplate
Page Link 3:
https://python.langchain.com/docs/modules/model_io/concepts#chatprompttemplate
Page Link 4:
https://python.langchain.com/docs/modules/model_io/concepts#considerations


**Fix 1**:
Description: Fixed Grammar in Considerations of Model I/O Documentation
Page
Issue: "to work well with the model are you using" # "to work well with
the model you are using"
Dependencies: None
Twitter handle: @Anubhav_Madhav (https://twitter.com/Anubhav_Madhav)

**Fix 2**:
Description: Provided links to text in the prompt (Refer Page Link 1,
Page Link 2 and Page Link 3)
Issue: links not provided # links have been provided to the text
Dependencies: None
Twitter handle: @Anubhav_Madhav (https://twitter.com/Anubhav_Madhav)

*For Fix 1*
Refer to the first word "This" in the image attached with this PR.
PFA
<img width="839" alt="Screenshot 2024-03-15 at 3 04 17 AM"
src="https://github.com/langchain-ai/langchain/assets/42323737/94e8db16-249f-48c3-a1d1-dee8d36067fa">

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-17 01:37:42 +00:00
inpyeong
7c092f479f
docs: Update why.ipynb (#19173)
I think that cell type for pip command may be 'code'.
Please check, thank you :)

2024-03-16 22:21:51 +00:00