Commit Graph

3663 Commits (a6e6e9bb86d3667803dfe693522a3a4c991f6c7d)
 

Author SHA1 Message Date
Leonid Ganeline 0613ed5b95
docstrings for `LLMs` (#7976)
docstrings for the `llms/`:
- added missed docstrings
- update existing docstrings to consistent format (no `Wrappers`!)
@baskaryan
1 year ago
Jeff Huber 5694e7b8cf
Update chroma notebook (#7978)
Fix up the Chroma notebook
- remove `.persist()` -- this is no longer in Chroma as of `0.4.0`
- update output to match `0.4.0`
- other cleanup work
1 year ago
Harutaka Kawamura 4a5894db47
Fix incorrect field name in MLflow AI Gateway config example (#7983) 1 year ago
Kacper Łukawski 19e8472521
Add async Qdrant to async_agent.ipynb (#7993)
I added Qdrant to the async API docs. This is the only vector store that
supports full async API.

@baskaryan @rlancemartin, @eyurtsev
1 year ago
Nuno Campos 8edb1db9dc
Fix key errors in weaviate hybrid retriever init (#7988) 1 year ago
Harrison Chase df84e1bb64
pass callbacks along baby ai (#7908) 1 year ago
William FH a4c5914c9a
Bump LS Version (#7970) 1 year ago
Bagatur 5d021c0962
nb fix (#7962) 1 year ago
Julien Salinas 3adab5e5be
Integrate NLP Cloud embeddings endpoint (#7931)
Add embeddings for [NLPCloud](https://docs.nlpcloud.com/#embeddings).

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Lance Martin <lance@langchain.dev>
1 year ago
Bagatur 854a2be0ca
Add debugging guide (#7956) 1 year ago
Brendan Collins 9aef79c2e3
Add Geopandas.GeoDataFrame Document Loader (#3817)
Work in Progress.
WIP
Not ready...

Adds Document Loader support for
[Geopandas.GeoDataFrames](https://geopandas.org/)

Example:
- [x] stub out `GeoDataFrameLoader` class
- [x] stub out integration tests
- [ ] Experiment with different geometry text representations
- [ ] Verify CRS is successfully added in metadata
- [ ] Test effectiveness of searches on geometries
- [ ] Test with different geometry types (point, line, polygon with
multi-variants).
- [ ] Add documentation

---------

Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Lance Martin <122662504+rlancemartin@users.noreply.github.com>
1 year ago
Lance Martin dfc533aa74
Add llama-v2 to local document QA (#7952) 1 year ago
Bagatur d9b5bcd691
bump (#7948) 1 year ago
Bagatur f97535b33e
fix (#7947) 1 year ago
Adilkhan Sarsen 7bb843477f
Removed kwargs from add_texts (#7595)
Removing **kwargs argument from add_texts method in DeepLake vectorstore
as it confuses users and doesn't fail when user is typing incorrect
parameters.

Also added small test to ensure the change is applies correctly.

Guys could pls take a look: @rlancemartin, @eyurtsev, this is a small
PR.

Thx so much!
1 year ago
Bagatur 4d8b48bdb3
bump 236 (#7938) 1 year ago
Harutaka Kawamura f6839a8682
Add integration for MLflow AI Gateway (#7113)
<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @baskaryan
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @hinthornw
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->


- Adds integration for MLflow AI Gateway (this will be shipped in MLflow
2.5 this week).


Manual testing:

```sh
# Move to mlflow repo
cd /path/to/mlflow

# install langchain
pip install git+https://github.com/harupy/langchain.git@gateway-integration

# launch gateway service
mlflow gateway start --config-path examples/gateway/openai/config.yaml

# Then, run the examples in this PR
```
1 year ago
David Preti 6792a3557d
Update openai.py compatibility with azure 2023-07-01-preview (#7937)
Fixed missing "content" field in azure. 
Added a check for "content" in _dict (missing for azure
api=2023-07-01-preview)
@baskaryan

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
王斌(Bin Wang) b65102bdb2
fix: pgvector search_type of similarity_score_threshold not working (#7771)
- Description: VectorStoreRetriever->similarity_score_threshold with
search_type of "similarity_score_threshold" not working with the
following two minor issues,
- Issue: 1. In line 237 of `vectorstores/base.py`, "score_threshold" is
passed to `_similarity_search_with_relevance_scores` as in the kwargs,
while score_threshold is not a valid argument of this method. As a fix,
before calling `_similarity_search_with_relevance_scores`,
score_threshold is popped from kwargs. 2. In line 596 to 607 of
`vectorstores/pgvector.py`, it's checking the distance_strategy against
the string in Enum. However, self.distance_strategy will get the
property of distance_strategy from line 316, where the callable function
is passed. To solve this issue, self.distance_strategy is changed to
self._distance_strategy to avoid calling the property method.,
  - Dependencies: No,
  - Tag maintainer: @rlancemartin, @eyurtsev,
  - Twitter handle: No

---------

Co-authored-by: Bin Wang <bin@arcanum.ai>
1 year ago
William FH 9d7e57f5c0
Docs Nit (#7918) 1 year ago
Wilson Leao Neto 8bb33f2296
Exposes Kendra result item DocumentAttributes in the document metadata (#7781)
- Description: exposes the ResultItem DocumentAttributes as document
metadata with key 'document_attributes' and refactors
AmazonKendraRetriever by providing a ResultItem base class in order to
avoid duplicate code;
- Tag maintainer: @3coins @hupe1980 @dev2049 @baskaryan
- Twitter handle: wilsonleao

### Why?
Some use cases depend on specific document attributes returned by the
retriever in order to improve the quality of the overall completion and
adjust what will be displayed to the user. For the sake of consistency,
we need to expose the DocumentAttributes as document metadata so we are
sure that we are using the values returned by the kendra request issued
by langchain.

I would appreciate your review @3coins @hupe1980 @dev2049. Thank you in
advance!

### References
- [Amazon Kendra
DocumentAttribute](https://docs.aws.amazon.com/kendra/latest/APIReference/API_DocumentAttribute.html)
- [Amazon Kendra
DocumentAttributeValue](https://docs.aws.amazon.com/kendra/latest/APIReference/API_DocumentAttributeValue.html)

---------

Co-authored-by: Piyush Jain <piyushjain@duck.com>
1 year ago
Wilson Leao Neto efa67ed0ef
fix #7782: check title and excerpt separately for page_content (#7783)
- Description: check title and excerpt separately for page_content so
that if title is empty but excerpt is present, the page_content will
only contain the excerpt
  - Issue: #7782 
  - Tag maintainer: @3coins @baskaryan 
  - Twitter handle: wilsonleao
1 year ago
Leonid Ganeline d92926cbc2
docstrings `chains` (#7892)
Added/updated docstrings.
1 year ago
Leonid Ganeline 4a810756f8
docstrings `chains` (#7892)
Added/updated docstrings.

@baskaryan
1 year ago
Jarek Kazmierczak f2ef3ff54a
Google Cloud Enterprise Search retriever (#7857)
Added a retriever that encapsulated Google Cloud Enterprise Search.


---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Alonso Silva Allende 1152f4d48b
Allow chat models that do not return token usage (#7907)
- Description: It allows to use chat models that do not return token
usage
- Issue: [#7900](https://github.com/hwchase17/langchain/issues/7900)
- Dependencies: None
- Tag maintainer: @agola11 @hwchase17 
- Twitter handle: @alonsosilva

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
1 year ago
Zizhong Zhang bdf0c2267f
docs(custom_chain) fix typo (#7898)
Fix typo in the document of custom_chain
1 year ago
Jeff Huber 2139d0197e
upgrade chroma to 0.4.0 (#7749)
** This should land Monday the 17th ** 

Chroma is upgrading from `0.3.29` to `0.4.0`. `0.4.0` is easier to
build, more durable, faster, smaller, and more extensible. This comes
with a few changes:

1. A simplified and improved client setup. Instead of having to remember
weird settings, users can just do `EphemeralClient`, `PersistentClient`
or `HttpClient` (the underlying direct `Client` implementation is also
still accessible)

2. We migrated data stores away from `duckdb` and `clickhouse`. This
changes the api for the `PersistentClient` that used to reference
`chroma_db_impl="duckdb+parquet"`. Now we simply set
`is_persistent=true`. `is_persistent` is set for you to `true` if you
use `PersistentClient`.

3. Because we migrated away from `duckdb` and `clickhouse` - this also
means that users need to migrate their data into the new layout and
schema. Chroma is committed to providing extension notification and
tooling around any schema and data migrations (for example - this PR!).

After upgrading to `0.4.0` - if users try to access their data that was
stored in the previous regime, the system will throw an `Exception` and
instruct them how to use the migration assistant to migrate their data.
The migration assitant is a pip installable CLI: `pip install
chroma_migrate`. And is runnable by calling `chroma_migrate`

-- TODO ADD here is a short video demonstrating how it works. 

Please reference the readme at
[chroma-core/chroma-migrate](https://github.com/chroma-core/chroma-migrate)
to see a full write-up of our philosophy on migrations as well as more
details about this particular migration.

Please direct any users facing issues upgrading to our Discord channel
called
[#get-help](https://discord.com/channels/1073293645303795742/1129200523111841883).
We have also created a [email
listserv](https://airtable.com/shrHaErIs1j9F97BE) to notify developers
directly in the future about breaking changes.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Gergely Papp 10246375a5
Gpapp/chromadb (#7891)
- Description: version check to make sure chromadb >=0.4.0 does not
throw an error, and uses the default sqlite persistence engine when the
directory is set,
  - Issue: the issue #7887 

For attention of
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Lance Martin 41c841ec85
Add Llama-v2 to Llama.cpp notebook (#7913) 1 year ago
Bagatur b9639f6067
fix docs (#7911) 1 year ago
Jeff Huber dc8b790214
Improve vector store onboarding exp (#6698)
This PR
- fixes the `similarity_search_by_vector` example, makes the code run
and adds the example to mirror `similarity_search`
- reverts back to chroma from faiss to remove sharp edges / create a
happy path for new developers. (1) real metadata filtering, (2) expected
functionality like `update`, `delete`, etc to serve beyond the most
trivial use cases

@hwchase17

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Bagatur 25a2bdfb70
add pr template instructions (#7904) 1 year ago
Hanit 0d23c0c82a
Allowing additional params for OpenAIEmbeddings. (#7752)
(#7654)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Lance Martin 862268175e
Add llama-v2 to docs (#7893) 1 year ago
TRY-ER 21d1c988a9
Try er/redis index retrieval retry00 (#7773)
Replace this comment with:
- Description: Modified the code to return the document id from the
redis document search as metadata.
  - Issue: the issue # it fixes retrieval of id as metadata as string 
  - Tag maintainer: @rlancemartin, @eyurtsev

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
shibuiwilliam 177baef3a1
Add test for svm retriever (#7768)
# What
- This is to add unit test for svm retriever.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Filip Michalsky 69b9db2b5e
Notebook update: sales agent with tools (#7753)
- Description: This is an update to a previously published notebook. 
Sales Agent now has access to tools, and this notebook shows how to use
a Product Knowledge base
  to reduce hallucinations and act as a better sales person!
  - Issue: N/A
  - Dependencies: `chromadb openai tiktoken`
  - Tag maintainer:  @baskaryan @hinthornw
  - Twitter handle: @FilipMichalsky
1 year ago
shibuiwilliam f29a5d4bcc
add test for knn retriever (#7769)
# What
- This is to add test for knn retriever.
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Orgil 75d3f1e5e6
remove unused import in voice assistant doc (#7757)
Description: Removed unused import in voice_assistant doc. 
Tag maintainer: @baskaryan
1 year ago
maciej-skorupka c6d1d6d7fc
feat: moving azure OpenAI API version to the latest 2023-05-15 (#7764)
Moving to the latest non-preview Azure OpenAI API version=2023-05-15.
The previous 2023-03-15-preview doesn't have support, SLA etc. For
instance, OpenAI SDK has moved to this version
https://github.com/openai/openai-python/releases/tag/v0.27.7

@baskaryan
1 year ago
satorioh 259a409998
docs(zilliz): connection_args add token description for serverless cl… (#7810)
Description:

Currently, Zilliz only support dedicated clusters using a pair of
username and password for connection. Regarding serverless clusters,
they can connect to them by using API keys( [ see official note
detail](https://docs.zilliz.com/docs/manage-cluster-credentials)), so I
add API key(token) description in Zilliz docs to make it more obvious
and convenient for this group of users to better utilize Zilliz. No
changes done to code.

---------

Co-authored-by: Robin.Wang <3Jg$94sbQ@q1>
Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
shibuiwilliam 235264a246
Add/test faiss (#7809)
# What
- Add missing test cases to faiss vectore stores
1 year ago
maciej-skorupka 5de7815310
docs: added comment from azure llm to azure chat about GPT-4 (#7884)
Azure GPT-4 models can't be accessed via LLM model. It's easy to miss
that and a lot of discussions about that are on the Internet. Therefore
I added a comment in Azure LLM docs that mentions that and points to
Azure Chat OpenAI docs.
@baskaryan

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Leonid Ganeline 4a05b7f772
docstrings `prompts` (#7844)
Added missed docstrings in `prompts`
@baskaryan
1 year ago
Bill Zhang dda11d2a05
WeaviateHybridSearchRetriever option to enable scores. (#7861)
Description: This PR adds the option to retrieve scores and explanations
in the WeaviateHybridSearchRetriever. This feature improves the
usability of the retriever by allowing users to understand the scoring
logic behind the search results and further refine their search queries.

Issue: This PR is a solution to the issue #7855 
Dependencies: This PR does not introduce any new dependencies.

Tag maintainer: @rlancemartin, @eyurtsev

I have included a unit test for the added feature, ensuring that it
retrieves scores and explanations correctly. I have also included an
example notebook demonstrating its use.
1 year ago
Leonid Ganeline 527210972e
docstrings `output_parsers` (#7859)
Added/updated the docstrings from `output_parsers`
 @baskaryan
1 year ago
Jonathan Pedoeem c460c29a64
Adding Docs for `PromptLayerCallbackHandler` (#7860)
Here I am adding documentation for the `PromptLayerCallbackHandler`.
When we created the initial PR for the callback handler the docs were
causing issues, so we merged without the docs.
1 year ago
ljeagle 3902b85657
Add metadata and page_content filters of documents in AwaDB (#7862)
1. Add the metadata filter of documents.
2. Add the text page_content filter of documents
3. fix the bug of similarity_search_with_score

Improvement and fix bug of AwaDB
Fix the conflict https://github.com/hwchase17/langchain/pull/7840
@rlancemartin @eyurtsev  Thanks!

---------

Co-authored-by: vincent <awadb.vincent@gmail.com>
1 year ago
German Martin f1eaa9b626
Lost in the middle: We have been ordering documents the WRONG way. (for long context) (#7520)
Motivation, it seems that when dealing with a long context and "big"
number of relevant documents we must avoid using out of the box score
ordering from vector stores.
See: https://arxiv.org/pdf/2306.01150.pdf

So, I added an additional parameter that allows you to reorder the
retrieved documents so we can work around this performance degradation.
The relevance respect the original search score but accommodates the
lest relevant document in the middle of the context.
Extract from the paper (one image speaks 1000 tokens):

![image](https://github.com/hwchase17/langchain/assets/1821407/fafe4843-6e18-4fa6-9416-50cc1d32e811)
This seems to be common to all diff arquitectures. SO I think we need a
good generic way to implement this reordering and run some test in our
already running retrievers.
It could be that my approach is not the best one from the architecture
point of view, happy to have a discussion about that.
For me this was the best place to introduce the change and start
retesting diff implementations.

@rlancemartin, @eyurtsev

---------

Co-authored-by: Lance Martin <lance@langchain.dev>
1 year ago