langchain/tests/unit_tests
Janos Tolgyesi 5f4552391f
Add SKLearnVectorStore (#5305)
# Add SKLearnVectorStore

This PR adds SKLearnVectorStore, a simple vector store based on the
NearestNeighbors implementations in the scikit-learn package. It
provides a drop-in vector store implementation with minimal
dependencies (scikit-learn is typically already installed in a data
scientist's or ML engineer's environment). The vector store can be
persisted to and loaded from JSON, BSON, and Parquet formats.
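The core idea is brute-force nearest-neighbor search over stored embeddings, plus (de)serialization of the stored texts and vectors. The sketch below illustrates both. It is not the SKLearnVectorStore code, and `TinyVectorStore` and its methods are hypothetical names. To stay dependency-free, it swaps scikit-learn's `NearestNeighbors` for a plain-Python brute-force scan using cosine distance (1 minus cosine similarity, matching scikit-learn's `"cosine"` metric). It only shows JSON persistence.

```python
import json
import math
from typing import List, Tuple


class TinyVectorStore:
    """Hypothetical, dependency-free sketch of a brute-force nearest-neighbor store."""

    def __init__(self) -> None:
        self.texts: List[str] = []
        self.vectors: List[List[float]] = []

    def add(self, text: str, vector: List[float]) -> None:
        # Store each text alongside its (precomputed) embedding vector.
        self.texts.append(text)
        self.vectors.append(list(vector))

    @staticmethod
    def _cosine_distance(a: List[float], b: List[float]) -> float:
        # Cosine distance = 1 - cosine similarity.
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        if norm == 0.0:
            return 1.0  # treat zero vectors as maximally dissimilar
        return 1.0 - dot / norm

    def search(self, query: List[float], k: int = 1) -> List[Tuple[str, float]]:
        # Brute force: score every stored vector, then keep the k closest.
        scored = [
            (text, self._cosine_distance(query, vector))
            for text, vector in zip(self.texts, self.vectors)
        ]
        scored.sort(key=lambda pair: pair[1])
        return scored[:k]

    def to_json(self) -> str:
        # Persist texts and vectors together so the store can be rebuilt later.
        return json.dumps({"texts": self.texts, "vectors": self.vectors})

    @classmethod
    def from_json(cls, payload: str) -> "TinyVectorStore":
        data = json.loads(payload)
        store = cls()
        for text, vector in zip(data["texts"], data["vectors"]):
            store.add(text, vector)
        return store
```

For example, after adding `"cat"` with `[1.0, 0.0]`, `"dog"` with `[0.9, 0.1]`, and `"car"` with `[0.0, 1.0]`, the query `search([1.0, 0.05], k=2)` returns `"cat"` and then `"dog"`. A store reloaded with `from_json(store.to_json())` gives the same results. A brute-force scan is O(n) per query, which is acceptable for the small, local collections this kind of store targets.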

SKLearnVectorStore has a soft (dynamic) dependency on the scikit-learn,
numpy and pandas packages. Persisting to BSON requires the bson package,
and persisting to Parquet requires the pyarrow package.
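A "soft (dynamic) dependency" usually means the package is imported only when the code that needs it runs, rather than at module import time. The sketch below shows that pattern. The helper names `import_optional` and `require_format` and the `FORMAT_DEPENDENCIES` mapping are hypothetical, not part of LangChain. The format-to-package mapping follows the description above.

```python
import importlib
from types import ModuleType
from typing import Optional

# Extra package each persistence format needs; JSON uses the standard library.
FORMAT_DEPENDENCIES = {"json": None, "bson": "bson", "parquet": "pyarrow"}


def import_optional(module_name: str) -> ModuleType:
    """Import a module at call time, raising an actionable error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Could not import {module_name}. "
            f"Please install it with `pip install {module_name}`."
        ) from exc


def require_format(fmt: str) -> Optional[ModuleType]:
    """Load only the dependency needed for the requested persistence format."""
    if fmt not in FORMAT_DEPENDENCIES:
        raise ValueError(f"Unsupported format: {fmt!r}")
    package = FORMAT_DEPENDENCIES[fmt]
    return None if package is None else import_optional(package)
```

With this design, users who only persist to JSON never need bson or pyarrow installed. Users who request a missing format get an error that names the package to install.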

## Before submitting

Integration tests are provided under
`tests/integration_tests/vectorstores/test_sklearn.py`.

A sample usage notebook is provided under
`docs/modules/indexes/vectorstores/examples/sklear.ipynb`.

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-28 08:17:42 -07:00
agents Improving Resilience of MRKL Agent (#5014) 2023-05-22 11:08:08 -07:00
callbacks fixing total cost finetuned model giving zero (#5144) 2023-05-24 10:04:08 -07:00
chains Callbacks Refactor [base] (#3256) 2023-04-30 11:14:09 -07:00
chat_models Add ChatModel, LLM, and Embeddings for Google's PaLM APIs (#3575) 2023-05-01 15:23:16 -07:00
client Separate Runner Functions from Client (#5079) 2023-05-22 05:28:47 +00:00
data Prompt from file proof of concept using plain text (#127) 2022-11-13 13:15:30 -08:00
docstore Add DocstoreFn - lookup doc via arbitrary function (#3760) 2023-04-28 19:50:32 -07:00
document_loaders Bibtex integration for document loader and retriever (#5137) 2023-05-25 00:21:31 -07:00
evaluation Adding an in-context QA evaluation chain + chain of thought reasoning chain for improved accuracy (#2444) 2023-04-06 22:32:41 -07:00
examples feat #4479: TextLoader auto detect encoding and improved exceptions (#4927) 2023-05-18 09:55:14 -04:00
llms Add Invocation Params (#4509) 2023-05-11 15:34:06 -07:00
memory Zep sdk version (#5267) 2023-05-25 13:42:10 -07:00
output_parsers add enum output parser (#5165) 2023-05-27 20:58:23 -07:00
prompts fix prompt saving (#4987) 2023-05-20 08:21:52 -07:00
retrievers tfidf retriever (#5114) 2023-05-24 10:02:09 -07:00
tools Add AzureCognitiveServicesToolkit to call Azure Cognitive Services API (#5012) 2023-05-23 06:45:48 -07:00
utilities Fix graphql tool (#4984) 2023-05-19 15:27:50 -07:00
vectorstores Add SKLearnVectorStore (#5305) 2023-05-28 08:17:42 -07:00
__init__.py initial commit 2022-10-24 14:51:15 -07:00
conftest.py Add pytest --only-extended and --only-core options (#4494) 2023-05-12 11:35:22 -04:00
test_bash.py Add Mastodon toots loader (#5036) 2023-05-22 16:43:07 -07:00
test_depedencies.py Catch changes to test group (#4802) 2023-05-16 14:48:56 -04:00
test_document_transformers.py Contextual compression retriever (#2915) 2023-04-20 17:01:14 -07:00
test_formatting.py initial commit 2022-10-24 14:51:15 -07:00
test_math_utils.py add get_top_k_cosine_similarity method to get max top k score and index (#5059) 2023-05-22 11:55:48 -07:00
test_pytest_config.py Block sockets for unit-tests (#4803) 2023-05-16 14:41:24 -04:00
test_python.py option for csv agent to not include df in prompt (#4610) 2023-05-12 21:55:22 -07:00
test_schema.py [simple][test] Added test case for schema.py (#3692) 2023-04-28 20:42:24 -07:00
test_sql_database_schema.py Suppress duckdb warning in unit tests explicitly (#3653) 2023-04-27 14:29:41 -04:00
test_sql_database.py sql: do not hard code the LIMIT clause in the table_info section (#1563) 2023-03-13 23:08:27 -07:00
test_text_splitter.py Improve effeciency of TextSplitter.split_documents, iterate once (#5111) 2023-05-22 23:00:24 -04:00