Commit Graph

3227 Commits

Author SHA1 Message Date
shibuiwilliam
177baef3a1
Add test for svm retriever (#7768)
# What
- Adds a unit test for the SVM retriever.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-18 09:57:24 -07:00
Filip Michalsky
69b9db2b5e
Notebook update: sales agent with tools (#7753)
- Description: This is an update to a previously published notebook.
  The Sales Agent now has access to tools, and this notebook shows how
  to use a product knowledge base to reduce hallucinations and act as a
  better salesperson!
  - Issue: N/A
  - Dependencies: `chromadb openai tiktoken`
  - Tag maintainer:  @baskaryan @hinthornw
  - Twitter handle: @FilipMichalsky
2023-07-18 09:53:12 -07:00
shibuiwilliam
f29a5d4bcc
add test for knn retriever (#7769)
# What
- Adds a test for the kNN retriever.
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-18 09:52:11 -07:00
Orgil
75d3f1e5e6
remove unused import in voice assistant doc (#7757)
Description: Removed unused import in voice_assistant doc. 
Tag maintainer: @baskaryan
2023-07-18 09:51:28 -07:00
maciej-skorupka
c6d1d6d7fc
feat: moving azure OpenAI API version to the latest 2023-05-15 (#7764)
Move to the latest non-preview Azure OpenAI API version, 2023-05-15.
The previous version, 2023-03-15-preview, comes without support or an
SLA. The OpenAI SDK, for instance, has already moved to this version:
https://github.com/openai/openai-python/releases/tag/v0.27.7
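
To illustrate where the version string ends up, here is a minimal sketch
that builds an Azure OpenAI REST URL with the `api-version` query
parameter. The helper name, the resource name `my-resource`, and the
deployment name `my-deployment` are hypothetical placeholders and are not
part of this change.

```python
from urllib.parse import urlencode

# Latest non-preview API version named in this commit.
AZURE_OPENAI_API_VERSION = "2023-05-15"


def azure_chat_completions_url(
    resource: str,
    deployment: str,
    api_version: str = AZURE_OPENAI_API_VERSION,
) -> str:
    """Build the chat completions URL for an Azure OpenAI deployment.

    `resource` and `deployment` are whatever names were chosen in Azure;
    the values used below are made up for illustration.
    """
    base = f"https://{resource}.openai.azure.com/openai/deployments/{deployment}"
    # The API version travels as a query parameter on every request.
    return f"{base}/chat/completions?{urlencode({'api-version': api_version})}"


print(azure_chat_completions_url("my-resource", "my-deployment"))
```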

@baskaryan
2023-07-18 09:50:15 -07:00
satorioh
259a409998
docs(zilliz): connection_args add token description for serverless cl… (#7810)
Description:

Currently, Zilliz only supports username and password connections for
dedicated clusters. Serverless clusters are instead connected to with
API keys ([see the official
note](https://docs.zilliz.com/docs/manage-cluster-credentials)), so I
added an API key (token) description to the Zilliz docs to make this
more obvious and convenient for that group of users. No code changes.
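
A minimal sketch of the two credential styles described above. The
helper name and the dictionary keys (`uri`, `user`, `password`, `token`,
`secure`) are assumptions for illustration; check the Zilliz docs for the
exact `connection_args` your cluster expects.

```python
from typing import Any, Dict, Optional


def zilliz_connection_args(
    uri: str,
    *,
    user: Optional[str] = None,
    password: Optional[str] = None,
    token: Optional[str] = None,
) -> Dict[str, Any]:
    """Return connection arguments for either credential style.

    Hypothetical helper: the key names are assumed, not taken from this PR.
    """
    if token is not None:
        # Serverless cluster: authenticate with an API key (token).
        return {"uri": uri, "token": token, "secure": True}
    if user is None or password is None:
        raise ValueError("Provide either a token or both user and password")
    # Dedicated cluster: authenticate with a username and password pair.
    return {"uri": uri, "user": user, "password": password, "secure": True}
```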

---------

Co-authored-by: Robin.Wang <3Jg$94sbQ@q1>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-18 09:31:39 -07:00
shibuiwilliam
235264a246
Add/test faiss (#7809)
# What
- Add missing test cases to FAISS vector stores
2023-07-18 08:30:35 -07:00
maciej-skorupka
5de7815310
docs: added comment from azure llm to azure chat about GPT-4 (#7884)
Azure GPT-4 models can't be accessed via the LLM model class. That is
easy to miss, and there are many discussions about it on the Internet.
I therefore added a comment to the Azure LLM docs that mentions this and
points to the Azure Chat OpenAI docs.
@baskaryan

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-18 08:05:41 -07:00
Leonid Ganeline
4a05b7f772
docstrings prompts (#7844)
Added missing docstrings in `prompts`
@baskaryan
2023-07-18 07:58:22 -07:00
Bill Zhang
dda11d2a05
WeaviateHybridSearchRetriever option to enable scores. (#7861)
Description: This PR adds the option to retrieve scores and explanations
in the WeaviateHybridSearchRetriever. This feature improves the
usability of the retriever by allowing users to understand the scoring
logic behind the search results and further refine their search queries.

Issue: This PR is a solution to the issue #7855 
Dependencies: This PR does not introduce any new dependencies.

Tag maintainer: @rlancemartin, @eyurtsev

I have included a unit test for the added feature, ensuring that it
retrieves scores and explanations correctly. I have also included an
example notebook demonstrating its use.
2023-07-18 07:57:17 -07:00
Leonid Ganeline
527210972e
docstrings output_parsers (#7859)
Added/updated the docstrings from `output_parsers`
 @baskaryan
2023-07-18 07:51:44 -07:00
Jonathan Pedoeem
c460c29a64
Adding Docs for PromptLayerCallbackHandler (#7860)
Here I am adding documentation for the `PromptLayerCallbackHandler`.
When we created the initial PR for the callback handler the docs were
causing issues, so we merged without the docs.
2023-07-18 07:51:16 -07:00
ljeagle
3902b85657
Add metadata and page_content filters of documents in AwaDB (#7862)
1. Add a metadata filter for documents.
2. Add a text `page_content` filter for documents.
3. Fix a bug in `similarity_search_with_score`.

Improvements and a bug fix for AwaDB.
Fixes the conflict in https://github.com/hwchase17/langchain/pull/7840
@rlancemartin @eyurtsev  Thanks!

---------

Co-authored-by: vincent <awadb.vincent@gmail.com>
2023-07-18 07:50:17 -07:00
German Martin
f1eaa9b626
Lost in the middle: We have been ordering documents the WRONG way. (for long context) (#7520)
Motivation: when dealing with a long context and a "big" number of
relevant documents, it seems we must avoid the out-of-the-box score
ordering from vector stores.
See: https://arxiv.org/pdf/2306.01150.pdf

So I added an additional parameter that lets you reorder the retrieved
documents to work around this performance degradation.
The reordering respects the original search score but places the least
relevant documents in the middle of the context.
Extract from the paper (one image speaks 1000 tokens):

![image](https://github.com/hwchase17/langchain/assets/1821407/fafe4843-6e18-4fa6-9416-50cc1d32e811)
This seems to be common to all the different architectures, so I think
we need a good generic way to implement this reordering and to run some
tests on our existing retrievers.
My approach may not be the best one from an architecture point of view;
I'm happy to have a discussion about that.
For me this was the best place to introduce the change and start
retesting the different implementations.
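
The reordering idea described above can be sketched as follows. This is
a minimal illustration, not necessarily the code merged in this PR: the
function name `reorder_for_long_context` is hypothetical, and the input
is assumed to be sorted with the most relevant item first.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def reorder_for_long_context(ranked: Sequence[T]) -> List[T]:
    """Place the most relevant items at both ends, the least in the middle.

    `ranked` is assumed to be ordered from most to least relevant.
    """
    result: List[T] = []
    # Walk from least to most relevant, alternating between the front and
    # the back, so the weakest items are placed first and end up innermost.
    for i, item in enumerate(reversed(ranked)):
        if i % 2 == 1:
            result.append(item)
        else:
            result.insert(0, item)
    return result


# Ranks 1..5, where 1 is the most relevant document.
print(reorder_for_long_context([1, 2, 3, 4, 5]))  # → [1, 3, 5, 4, 2]
```

With five ranked items the least relevant one (rank 5) lands exactly in
the middle, while ranks 1 and 2 sit at the start and end of the context.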

@rlancemartin, @eyurtsev

---------

Co-authored-by: Lance Martin <lance@langchain.dev>
2023-07-18 07:45:15 -07:00
Bagatur
6a32f93669
add ls link (#7847) 2023-07-18 07:39:26 -07:00
Leonid Ganeline
17956ff08e
docstrings agents (#7866)
Added/Updated docstrings for `agents`
@baskaryan
2023-07-18 02:23:24 -07:00
William FH
c6f2d27789
Docs Nits (#7874)
Add links to reference docs
2023-07-18 01:50:14 -07:00
William FH
3179ee3a56
Evals docs (#7460)
Still don't have good "how to's", and the guides / examples section
could be further pruned and improved, but this PR adds a couple examples
for each of the common evaluator interfaces.

- [x] Example docs for each implemented evaluator
- [x] "how to make a custom evaluator" notebook for each low-level API
(comparison, string, agent)
- [x] Move docs to modules area
- [x] Link to reference docs for more information
- [X] Still need to finish the evaluation index page
- ~[ ] Don't have good data generation section~
- ~[ ] Don't have good how to section for other common scenarios / FAQs
like regression testing, testing over similar inputs to measure
sensitivity, etc.~
2023-07-18 01:00:01 -07:00
William FH
d87564951e
LS0010 (#7871)
Bump langsmith version. Has some additional UX improvements
2023-07-18 00:28:37 -07:00
William FH
e294ba475a
Some mitigations for RCE in PAL chain (#7870)
Some docstring / small nits to #6003

---------

Co-authored-by: BoazWasserman <49598618+boazwasserman@users.noreply.github.com>
Co-authored-by: HippoTerrific <49598618+HippoTerrific@users.noreply.github.com>
Co-authored-by: Or Raz <orraz1994@gmail.com>
2023-07-17 22:58:47 -07:00
Nicolas
46330da2e7
docs: Mendable: Fixes pretty sources not working (#7863)
This new version fixes the "Verified Sources" display that got broken.
Instead of displaying the full URL, it shows the title of the page the
source is from.
2023-07-17 18:23:46 -07:00
Leonid Ganeline
f5ae8f1980
docstrings tools (#7848)
Added docstrings in `tools`.

 @baskaryan
2023-07-17 17:50:19 -07:00
Leonid Ganeline
74b701f42b
docstrings retrievers (#7858)
Added/updated docstrings `retrievers`

@baskaryan
2023-07-17 17:47:17 -07:00
Jasper
5b4d53e8ef
Add text_content kwarg to BrowserlessLoader (#7856)
Added keyword argument to toggle between getting the text content of a
site versus its HTML when using the `BrowserlessLoader`
2023-07-17 17:02:19 -07:00
William FH
2aa3cf4e5f
update notebook (#7852) 2023-07-17 14:46:42 -07:00
Matt Robinson
3c489be773
feat: optional post-processing for Unstructured loaders (#7850)
### Summary

Adds a post-processing method for Unstructured loaders that allows users
to optionally modify or clean extracted elements.

### Testing

```python
from langchain.document_loaders import UnstructuredFileLoader
from unstructured.cleaners.core import clean_extra_whitespace

loader = UnstructuredFileLoader(
    "./example_data/layout-parser-paper.pdf",
    mode="elements",
    post_processors=[clean_extra_whitespace],
)

docs = loader.load()
docs[:5]
```


### Reviewers
  - @rlancemartin
  - @eyurtsev
  - @hwchase17
2023-07-17 12:13:05 -07:00
Bagatur
2a315dbee9
fix nb (#7843) 2023-07-17 09:39:11 -07:00
Bagatur
3f1302a4ab
bump 235 (#7836) 2023-07-17 09:37:20 -07:00
Mike Lambert
9cdea4e0e1
Update to Anthropic's claude-v2 (#7793) 2023-07-17 08:55:49 -07:00
Bagatur
98c48f303a
fix (#7838) 2023-07-17 07:53:11 -07:00
Bagatur
111bd7ddbe
specify comparators (#7805) 2023-07-17 07:30:48 -07:00
Dayuan Jiang
ee40d37098
add bm25 module (#7779)
- Description: Add a BM25 retriever that does not need Elasticsearch
- Dependencies: rank_bm25 (if it is not installed, it will be installed
using pip, just as TFIDFRetriever does)
  - Tag maintainer: @rlancemartin, @eyurtsev
  - Twitter handle: DayuanJian21687
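
For readers unfamiliar with BM25, here is a plain-Python sketch of Okapi
BM25 scoring over whitespace-tokenized documents. It is an illustration
only, not the `rank_bm25` implementation or the retriever added in this
PR; the function name, the smoothed IDF variant, and the defaults
`k1=1.5` and `b=0.75` are assumptions, and the corpus must be non-empty.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query: List[str],
    corpus: List[List[str]],
    k1: float = 1.5,
    b: float = 0.75,
) -> List[float]:
    """Score every tokenized document in `corpus` against a tokenized query."""
    n_docs = len(corpus)
    avg_len = sum(len(doc) for doc in corpus) / n_docs
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    scores = []
    for doc in corpus:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            if term not in term_freq:
                continue
            n_t = doc_freq[term]
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1.0)
            tf = term_freq[term]
            # Length normalization: longer documents need more matches.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


corpus = [doc.split() for doc in ["foo bar", "foo baz qux", "hello world"]]
scores = bm25_scores("foo bar".split(), corpus)
best = max(range(len(corpus)), key=scores.__getitem__)
print(corpus[best])
```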

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-17 07:30:17 -07:00
Liu Ming
fa0a9e502a
Add LLM for ChatGLM(2)-6B API (#7774)
Description:
Add LLM for ChatGLM-6B & ChatGLM2-6B API

Related Issue: 
Will the langchain support ChatGLM? #4766
Add support for selfhost models like ChatGLM or transformer models #1780

Dependencies: 
No extra library installation is required.
It wraps API calls to a ChatGLM(2)-6B server (started with api.py), so
a running API endpoint is required.

Tag maintainer:  @mlot 

Any comments on this PR would be appreciated.
---------

Co-authored-by: mlot <limpo2000@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-17 07:27:17 -07:00
sseide
25e3d3f283
Support Redis Sentinel database connections (#5196)
# Support Redis Sentinel database connections

This PR adds support for connecting not only to standalone Redis servers
but also to high-availability replication sets
(https://redis.io/docs/management/sentinel/).
A Redis replica set has one master that accepts writes and 2+ replicas
with read-only access to the data. The additional Redis Sentinel
instances monitor all servers and reconfigure the read-write master on
the fly if it becomes unavailable.

Therefore, all connections must be made through the Sentinels, which are
queried for the current master to get a read-write connection. This PR
adds basic support for a Redis connection URL that specifies a Sentinel
as the Redis connection.

The Redis documentation and the Jupyter notebook with Redis examples are
updated to mention how to connect to a Redis replica set with Sentinels.

Remark: I did not find existing test cases for Redis server connections
to extend. I therefore tested the new utility class locally with
different kinds of setups to make sure the different connection URLs
work as expected, but this PR includes no test cases.
2023-07-17 07:18:51 -07:00
Yifei Song
2e47412073
Add Xorbits agent (#7647)
- [Xorbits](https://doc.xorbits.io/en/latest/) is an open-source
computing framework that makes it easy to scale data science and machine
learning workloads in parallel. Xorbits can leverage multi cores or GPUs
to accelerate computation on a single machine, or scale out up to
thousands of machines to support processing terabytes of data.

- This PR added support for the Xorbits agent, which allows langchain to
interact with Xorbits Pandas dataframe and Xorbits Numpy array.
- Dependencies: This change requires the Xorbits library to be installed
in order to be used.
`pip install xorbits`
- Request for review: @hinthornw
- Twitter handle: https://twitter.com/Xorbitsio
2023-07-17 07:09:51 -07:00
Ankush Gola
ff3aada0b2
minor langsmith notebook fixes (#7814)
2023-07-16 21:27:03 -07:00
William FH
ca79044948
Export Tracer from callbacks (#7812)
Improve discoverability
2023-07-16 20:58:13 -07:00
William FH
beb38f4f4d
Share client in evaluation callback (#7807)
Guarantee the evaluator traces go to same endpoint
2023-07-16 17:47:38 -07:00
William FH
1db13e8a85
Fix chat example output mapper (#7808)
Was only serializing when no key was provided
2023-07-16 17:47:05 -07:00
William FH
c58d35765d
Add examples to docstrings (#7796)
and:
- remove dataset name from autogenerated project name
- print out project name to view
2023-07-16 12:05:56 -07:00
William FH
ed97af423c
Accept LLM via constructor (#7794) 2023-07-16 08:46:36 -07:00
Ankush Gola
c4ece52dac
update LangSmith notebook (#7767)
2023-07-15 21:05:09 -07:00
Kenny
0d058d4046
Add try except block to OpenAIWhisperParser (#7505) 2023-07-15 15:42:00 -07:00
William FH
4cb9f1eda8
Update langsmith version (#7759) 2023-07-15 12:01:41 -07:00
Lance Martin
1d06eee3b5
Fix ntbk link in docs (#7755)
Minor fix to the notebook link in the
[docs](https://python.langchain.com/docs/use_cases/question_answering/local_retrieval_qa).
2023-07-15 09:11:18 -07:00
William FH
2e3d77c34e
Fix eval loader when overriding arguments (#7734)
- Update the negative criterion descriptions to prevent bad predictions
- Add support for normalizing the string distance
- Fix potential json deserializing into float issues in the example
mapper
2023-07-15 08:30:32 -07:00
Bagatur
c871c04270
bump 234 (#7754) 2023-07-15 10:49:51 -04:00
Gordon Clark
96f3dff050
MediaWiki docloader improvements + unit tests (#5879)
Starting over from #5654 because I utterly borked the poetry.lock file.

Adds new parameters to the MWDumpLoader class:

* skip_redirects (bool) Tells the loader to skip articles that redirect
to other articles. False by default.
* stop_on_error (bool) Tells the parser to stop on any page that causes
a parse error; when False, such pages are skipped. True by default.
* namespaces (List[int]) Tells the parser which namespaces to parse.
Contains namespaces from -2 to 15 by default.

Default values are chosen to preserve backwards compatibility.

Sample dump XML and full unit test coverage (with extended tests that
pass!) also included!

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-15 10:49:36 -04:00
Xavier
4c8106311f
Add pip install langsmith for Quick Install part of README (#7694)
**Issue**
When I install langchain with conda, a dependency error is thrown:
"ModuleNotFoundError: No module named 'langsmith'"

**Updated**
Run `pip install langsmith` when installing langchain with conda

Co-authored-by: xaver.xu <xavier.xu@batechworks.com>
2023-07-15 10:27:32 -04:00
Mohammad Mohtashim
b8b8a138df
Simple Import fix in Tools Exception Docs (#7740)
Issue: #7720
 @hinthornw
2023-07-15 10:25:34 -04:00