Commit Graph

119 Commits

Author SHA1 Message Date
sergerdn
6dc86ad48f
feat: add pytest-vcr for recording HTTP interactions in integration tests (#2445)
Using `pytest-vcr` in integration tests has several benefits. Firstly,
it removes the need to mock external services, as VCR records and
replays HTTP interactions on the fly. Secondly, it simplifies the
integration test setup by eliminating the need to set up and tear down
external services in some cases. Finally, it allows for more reliable
and deterministic integration tests by ensuring that HTTP interactions
are always replayed with the same response.
Overall, `pytest-vcr` is a valuable tool for simplifying integration
test setup and improving their reliability

This commit adds the `pytest-vcr` package as a dependency for
integration tests in the `pyproject.toml` file. It also introduces two
new fixtures in `tests/integration_tests/conftest.py` files for managing
cassette directories and VCR configurations.

In addition, the
`tests/integration_tests/vectorstores/test_elasticsearch.py` file has
been updated to use the `@pytest.mark.vcr` decorator for recording and
replaying HTTP interactions.

Finally, this commit removes the `documents` fixture from the
`test_elasticsearch.py` file and replaces it with a new fixture defined
in `tests/integration_tests/vectorstores/conftest.py` that yields a list
of documents to use in any other tests.

This also includes my second attempt to fix issue :
https://github.com/hwchase17/langchain/issues/2386

Maybe related https://github.com/hwchase17/langchain/issues/2484
2023-04-07 07:28:57 -07:00
Alex Rad
bd780a8223
Add support for rwkv (#2422)
This adds support for running RWKV with pytorch. 

https://github.com/hwchase17/langchain/issues/2398

This does not yet support  rwkv.cpp
2023-04-06 14:41:06 -07:00
Davit Buniatyan
b4914888a7
Deep Lake upgrade to include attribute search, distance metrics, returning scores and MMR (#2455)
### Features include

- Metadata based embedding search
- Choice of distance metric function (`L2` for Euclidean, `L1` for
Nuclear, `max` L-infinity distance, `cos` for cosine similarity, 'dot'
for dot product. Defaults to `L2`
- Returning scores
- Max Marginal Relevance Search
- Deleting samples from the dataset

### Notes
- Added numerous tests, let me know if you would like to shorten them or
make smarter

---------

Co-authored-by: Davit Buniatyan <d@activeloop.ai>
2023-04-06 12:47:33 -07:00
leo-gan
fd69cc7e42
Removed duplicate BaseModel dependencies (#2471)
Removed duplicate BaseModel dependencies in class inheritances.
Also, sorted imports by `isort`.
2023-04-06 12:45:16 -07:00
sergerdn
b410dc76aa
fix: elasticsearch (#2402)
- Create a new docker-compose file to start an Elasticsearch instance
for integration tests.
- Add new tests to `test_elasticsearch.py` to verify Elasticsearch
functionality.
- Include an optional group `test_integration` in the `pyproject.toml`
file. This group should contain dependencies for integration tests and
can be installed using the command `poetry install --with
test_integration`. Any new dependencies should be added by running
`poetry add some_new_deps --group "test_integration" `

Note:
New tests running in live mode, which involve end-to-end testing of the
OpenAI API. In the future, adding `pytest-vcr` to record and replay all
API requests would be a nice feature for testing process.More info:
https://pytest-vcr.readthedocs.io/en/latest/

Fixes https://github.com/hwchase17/langchain/issues/2386
2023-04-05 06:51:32 -07:00
Harrison Chase
0a9f04bad9
Harrison/gpt4all (#2366)
Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-04-04 06:49:17 -07:00
Harrison Chase
e90d007db3
Harrison/msg files (#2375)
Co-authored-by: Sahil Masand <masand.sahil@gmail.com>
Co-authored-by: Sahil Masand <masands@cbh.com.au>
2023-04-04 06:48:34 -07:00
Kacper Łukawski
585f60a5aa
Qdrant update to 1.1.1 & docs polishing (#2388)
This PR updates Qdrant to 1.1.1 and introduces local mode, so there is
no need to spin up the Qdrant server. By that occasion, the Qdrant
example notebooks also got updated, covering more cases and answering
some commonly asked questions. All the Qdrant's integration tests were
switched to local mode, so no Docker container is required to launch
them.
2023-04-04 06:48:21 -07:00
Harrison Chase
d85f57ef9c
Harrison/llama (#2314)
Co-authored-by: RJ Adriaansen <adriaansen@eshcc.eur.nl>
2023-04-02 14:57:45 -07:00
akmhmgc
94b2f536f3
Modify output for wikipedia api wrapper (#2287)
## Description
Thanks for the quick maintenance for great repository!!
I modified wikipedia api wrapper

## Details
- Add output for missing search results
- Add tests
2023-04-02 14:00:27 -07:00
Sam Cordner-Matthews
1ddd6dbf0b
Add ability to pass kwargs to loader classes in DirectoryLoader, add ability to modify encoding and BeautifulSoup behaviour in BSHTMLLoader (#2275)
Solves #2247. Noted that the only test I added checks for the
BeautifulSoup behaviour change. Happy to add a test for
`DirectoryLoader` if deemed necessary.
2023-04-01 12:48:27 -07:00
Harrison Chase
b35260ed47
Harrison/memory base (#2122)
@3coins + @zoltan-fedor.... heres the pr + some minor changes i made.
thoguhts? can try to get it into tmrws release

---------

Co-authored-by: Zoltan Fedor <zoltan.0.fedor@gmail.com>
Co-authored-by: Piyush Jain <piyushjain@duck.com>
2023-03-29 10:10:09 -07:00
Ankush Gola
ccee1aedd2
add async support for anthropic (#2114)
should not be merged in before
https://github.com/anthropics/anthropic-sdk-python/pull/11 gets released
2023-03-28 22:49:14 -04:00
Harrison Chase
3e879b47c1
Harrison/gitbook (#2044)
Co-authored-by: Irene López <45119610+ireneisdoomed@users.noreply.github.com>
2023-03-28 15:28:33 -07:00
Honkware
aff33d52c5
Add OpenWeatherMap API Tool (#2083)
Added tool for OpenWeatherMap API
2023-03-28 12:02:14 -07:00
Charlie Holtz
f16c1fb6df
Add replicate take 2 (#2077)
This PR adds a replicate integration to langchain. 

It's an updated version of
https://github.com/hwchase17/langchain/pull/1993, but with updates to
match latest replicate-python code.
https://github.com/replicate/replicate-python.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Zeke Sikelianos <zeke@sikelianos.com>
2023-03-28 11:56:57 -07:00
Harrison Chase
f281033362
rm pandas dependency (#2102) 2023-03-28 08:38:19 -07:00
Harrison Chase
410bf37fb8
Harrison/big query (#2100)
Co-authored-by: lu-cashmoney <lucas.corley@gmail.com>
2023-03-28 08:17:22 -07:00
Harrison Chase
eff5eed719
Harrison/jina (#2043)
Co-authored-by: numb3r3 <wangfelix87@gmail.com>
Co-authored-by: felix-wang <35718120+numb3r3@users.noreply.github.com>
2023-03-28 08:16:17 -07:00
Harrison Chase
f74a1bebf5
Harrison/duckdb (#2064)
Co-authored-by: Trent Hauck <trent@trenthauck.com>
2023-03-27 19:51:34 -07:00
Ankush Gola
b7ebb8fe30
enable streaming in anthropic llm wrapper (#2065) 2023-03-27 20:25:00 -04:00
Harrison Chase
a0cd6672aa
Harrison/site map (#2061)
Co-authored-by: Tim Asp <707699+timothyasp@users.noreply.github.com>
2023-03-27 16:28:08 -07:00
Mario Kostelac
e7d6de6b1c
(ChatOpenAI) Add model_name to LLMResult.llm_output (#1960)
This makes sure OpenAI and ChatOpenAI have the same llm_output, and
allow tracking usage per model. Same work for OpenAI was done in
https://github.com/hwchase17/langchain/pull/1713.
2023-03-24 08:51:16 -07:00
Harrison Chase
6e1b5b8f7e
Harrison/figma doc loader (#1908)
Co-authored-by: Ismail Pelaseyed <homanp@gmail.com>
2023-03-22 19:57:46 -07:00
Eli
12f868b292
Propagate "filter" arg in Chroma similarity_search (#1869)
Technically a duplicate fix to #1619 but with unit tests and a small
documentation update
- Propagate `filter` arg in Chroma `similarity_search` to delegated call
to `similarity_search_with_score`
- Add `filter` arg to `similarity_search_by_vector`
- Clarify doc strings on FakeEmbeddings
2023-03-22 19:40:10 -07:00
Maurício Maia
f155d9d3ec
Add metadata filter to PGVector search (#1872)
Add ability to filter pgvector documents by metadata.
2023-03-22 15:21:40 -07:00
Maurício Maia
2212520a6c
Add PGVector collection metadata (#1887)
The `CollectionStore` for `PGVector` has a `cmetadata` field but it's
never used. This PR add the ability to save metadata information to the
collection.
2023-03-22 11:27:07 -07:00
LeoGrin
3701b2901e
use namespace argument in Pinecone constructor (#1757)
Fix #1756

Use the `namespace` argument of `Pinecone.from_exisiting_index` to set
the default value of `namespace` for other methods. Leads to more
expected behavior and easier integration in chains.

For the test, I've added a line to delete and rebuild the
`langchain-demo` index at the beginning of the test. I'm not 100% sure
if it's a good idea but it makes the test reproducible.
2023-03-18 19:55:38 -07:00
Mario Kostelac
aff44d0a98
(OpenAI) Add model_name to LLMResult.llm_output (#1713)
Given that different models have very different latencies and pricings,
it's benefitial to pass the information about the model that generated
the response. Such information allows implementing custom callback
managers and track usage and price per model.

Addresses https://github.com/hwchase17/langchain/issues/1557.
2023-03-16 21:55:55 -07:00
Daniel Chalef
b157e0c1c3
Add HTML document_loader that includes page title metadata (#1720)
This `BSHTMLLoader` document_loader loads an HTML document, extracts
text and adds the page title to the returned Document's metadata. The
loader uses the already installed bs4 package to extract both text
content and the page title.

Included in this PR is an example HTML file and an integration test that
tests against this file.

---------

Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
2023-03-16 21:47:17 -07:00
Kacper Łukawski
4a327dd1d6
Implement basic metadata filtering in Qdrant (#1689)
This PR implements a basic metadata filtering mechanism similar to the
ones in Chroma and Pinecone. It still cannot express complex conditions,
as there are no operators, but some users requested to have that feature
available.
2023-03-15 07:31:39 -07:00
Harrison Chase
0b29e68c17
Harrison/pgvector (#1679)
Co-authored-by: Aman Kumar <krsingh.aman@gmail.com>
2023-03-14 21:13:58 -07:00
Xin Qiu
4e13cef05a
feat: add redisearch vectorstore (#1307)
# Description

Add `RediSearch` vectorstore for LangChain

RediSearch: [RediSearch quick
start](https://redis.io/docs/stack/search/quick_start/)

# How to use

```
from langchain.vectorstores.redisearch import RediSearch

rds = RediSearch.from_documents(docs, embeddings,redisearch_url="redis://localhost:6379")
```
2023-03-14 18:06:03 -07:00
Tim Asp
b3234bf3b0
cleanup: unify 3 different pdf loaders, rename PagedPDFSplitter (#1615)
`OnlinePDFLoader` and `PagedPDFSplitter` lived separate from the rest of
the pdf loaders.

Because they're all similar, I propose moving all to `pdy.py` and the
same docs/examples page.

Additionally, `PagedPDFSplitter` naming doesn't match the pattern the
rest of the loaders follow, so I renamed to `PyPDFLoader` and had it
inherit from `BasePDFLoader` so it can now load from remote file
sources.
2023-03-13 23:06:50 -07:00
Harrison Chase
cb04ba0136
Add support for intermediate steps to SQLDatabaseSequentialChain (#1583) (#1601)
for https://github.com/hwchase17/langchain/issues/1582

I simply added the `return_intermediate_steps` and changed the
`output_keys` function.

I added 2 simple tests, 1 for SQLDatabaseSequentialChain without the
intermediate steps and 1 with

Co-authored-by: brad-nemetski <115185478+brad-nemetski@users.noreply.github.com>
2023-03-11 15:44:41 -08:00
Harrison Chase
3ee32a01ea
Harrison/prompt layer (#1547)
Co-authored-by: Jonathan Pedoeem <jonathanped@gmail.com>
Co-authored-by: AbuBakar <abubakarsohail123@gmail.com>
2023-03-08 21:24:27 -08:00
Harrison Chase
c844d1fd46
Harrison/chunk size (#1549)
Co-authored-by: Florian Leuerer <31259070+floleuerer@users.noreply.github.com>
2023-03-08 21:24:18 -08:00
Harrison Chase
357d808484
Harrison/remote paths pdf (#1544)
Co-authored-by: Tim Asp <707699+timothyasp@users.noreply.github.com>
2023-03-08 20:53:37 -08:00
Ankush Gola
27104d4921
fix ChatOpenAI.agenerate (#1504) 2023-03-07 15:22:05 -08:00
Harrison Chase
7bec461782
Harrison/memory refactor (#1478)
moves memory to own module, factors out common stuff
2023-03-07 07:59:37 -08:00
Harrison Chase
0e21463f07
(rfc) chat models (#1424)
Co-authored-by: Ankush Gola <ankush.gola@gmail.com>
2023-03-06 08:34:24 -08:00
Harrison Chase
a1b9dfc099
Harrison/similarity search chroma (#1434)
Co-authored-by: shibuiwilliam <shibuiyusuke@gmail.com>
2023-03-04 08:10:15 -08:00
Tim Asp
23231d65a9
Add PyMuPDF PDF loader (#1426)
Different PDF libraries have different strengths and weaknesses. PyMuPDF
does a good job at extracting the most amount of content from the doc,
regardless of the source quality, extremely fast (especially compared to
Unstructured).

https://pymupdf.readthedocs.io/en/latest/index.html
2023-03-03 20:59:28 -08:00
Nuno Campos
499e76b199
Allow the regular openai class to be used for ChatGPT models (#1393)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-03-02 09:04:18 -08:00
Kacper Łukawski
9ac442624c
Add Qdrant named arguments (#1386)
This PR:
- Increases `qdrant-client` version to 1.0.4
- Introduces custom content and metadata keys (as requested in #1087)
- Moves all the `QdrantClient` parameters into the method parameters to
simplify code completion
2023-03-02 07:05:14 -08:00
Ankush Gola
fe30be6fba
add async and streaming support to OpenAIChat (#1378)
title says it all
2023-03-01 21:55:43 -08:00
Tim Asp
72ef69d1ba
Add new iFixit document loader (#1333)
iFixit is a wikipedia-like site that has a huge amount of open content
on how to fix things, questions/answers for common troubleshooting and
"things" related content that is more technical in nature. All content
is licensed under CC-BY-SA-NC 3.0

Adding docs from iFixit as context for user questions like "I dropped my
phone in water, what do I do?" or "My macbook pro is making a whining
noise, what's wrong with it?" can yield significantly better responses
than context free response from LLMs.
2023-02-27 20:40:20 -08:00
Harrison Chase
166cda2cc6
Harrison/deeplake (#1316)
Co-authored-by: Davit Buniatyan <d@activeloop.ai>
2023-02-26 22:35:04 -08:00
Harrison Chase
aaad6cc954
Harrison/atlas db (#1315)
Co-authored-by: Brandon Duderstadt <brandonduderstadt@gmail.com>
2023-02-26 22:11:38 -08:00
Enrico Shippole
9becdeaadf
Add Writer, Banana, Modal, StochasticAI (#1270)
Add LLM wrappers and examples for Banana, Writer, Modal, Stochastic AI

Added rigid json format for Banana and Modal
2023-02-24 06:58:58 -08:00