64 Commits (e0a13e93550e06489846f68997348ed3949f9d0a)

Author SHA1 Message Date
Harrison Chase ad3c5dd186
Harrison/databerry (#2688)
Co-authored-by: Georges Petrov <georgesm.petrov@gmail.com>
1 year ago
Filip Haltmayer b286d0e63f
Adding milvus/zilliz into docs (#2686)
Adding Milvus and Zilliz to integrations.md and creating an ecosystems
doc for Zilliz.

Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com>
1 year ago
Dmitri Melikyan 1931d4495e
Update Graphsignal ecosystem page (#2662)
Added/updated information due to new automatic data recording feature.
1 year ago
William FH 10ff1fda8e
Add Streaming for GPT4All (#2642)
- Adds support for callback handlers in GPT4All models
- Updates notebook and docs
1 year ago
Harrison Chase 7aba18ea77
Harrison/docs cleanup (#2633) 1 year ago
Davit Buniatyan aaac7071a3
Deep Lake retriever example analyzing Twitter the-algorithm source code (#2602)
Improvements to Deep Lake Vector Store
- much faster view loading of embeddings after filters with
`fetch_chunks=True`
- 2x faster ingestion
- use np.float32 for embeddings to save 2x storage, LZ4 compression for
text and metadata storage (saves up to 4x storage for text data)
- user defined functions as filters
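
The 2x storage figure is consistent with moving from 8-byte to 4-byte floats. The sketch below only checks that arithmetic; it assumes the earlier format was float64 (the commit message does not say so) and uses a hypothetical embedding dimension.

```python
import struct

# Bytes per embedding component in each floating-point format.
FLOAT64_BYTES = struct.calcsize("d")  # double precision
FLOAT32_BYTES = struct.calcsize("f")  # single precision


def embedding_bytes(dimension: int, bytes_per_value: int) -> int:
    """Raw storage for one embedding vector, ignoring compression and headers."""
    return dimension * bytes_per_value


dim = 1536  # hypothetical embedding dimension, chosen only for illustration
ratio = embedding_bytes(dim, FLOAT64_BYTES) / embedding_bytes(dim, FLOAT32_BYTES)
print(f"float64 takes {ratio:.0f}x the space of float32")
```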

Docs
- Added retriever full example for analyzing twitter the-algorithm
source code with GPT4
- Added a use case for code analysis (please let us know your thoughts
on how we can improve it)

---------

Co-authored-by: Davit Buniatyan <d@activeloop.ai>
1 year ago
Alex Rad bd780a8223
Add support for rwkv (#2422)
This adds support for running RWKV with PyTorch.

https://github.com/hwchase17/langchain/issues/2398

This does not yet support rwkv.cpp
1 year ago
qued 5b34931948
docs: update unstructured detectron install instructions (#2498)
Updated recommended `detectron2` version to install for use with
`unstructured`.

Should now match version in [Unstructured
README](https://github.com/Unstructured-IO/unstructured/blob/main/README.md#eight_pointed_black_star-quick-start).
1 year ago
Davit Buniatyan b4914888a7
Deep Lake upgrade to include attribute search, distance metrics, returning scores and MMR (#2455)
### Features include

- Metadata based embedding search
- Choice of distance metric function (`L2` for Euclidean, `L1` for
Nuclear, `max` for L-infinity distance, `cos` for cosine similarity, `dot`
for dot product). Defaults to `L2`.
- Returning scores
- Max Marginal Relevance Search
- Deleting samples from the dataset
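
To make the metric names above concrete, here is a minimal pure-Python sketch of what each option might compute. It is not Deep Lake's implementation: the function names and the `METRICS` table are hypothetical, and `L1` is read here as the sum of absolute differences, which is an assumption about the intended meaning.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l1(a: Vector, b: Vector) -> float:
    """Sum of absolute differences (assumed reading of `L1`)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def linf(a: Vector, b: Vector) -> float:
    """L-infinity distance: the largest per-component difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def dot(a: Vector, b: Vector) -> float:
    """Dot product (a similarity: larger means closer)."""
    return sum(x * y for x, y in zip(a, b))


def cos(a: Vector, b: Vector) -> float:
    """Cosine similarity (larger means closer)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical lookup table keyed by the option names from the list above.
METRICS: Dict[str, Callable[[Vector, Vector], float]] = {
    "L2": l2,
    "L1": l1,
    "max": linf,
    "cos": cos,
    "dot": dot,
}


def score(a: Vector, b: Vector, distance_metric: str = "L2") -> float:
    """Score two vectors with the named metric, defaulting to `L2`."""
    return METRICS[distance_metric](a, b)
```

Note that `L2`, `L1`, and `max` are distances (smaller is closer), while `cos` and `dot` are similarities, so a real search has to sort in the opposite direction for the latter two.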

### Notes
- Added numerous tests; let me know if you would like to shorten them or
make them smarter

---------

Co-authored-by: Davit Buniatyan <d@activeloop.ai>
1 year ago
felix-wang b6a101d121
fix: add jina jupyter notebook (#2477)
As the title says, this adds the missing link to the example notebook.
1 year ago
Harrison Chase 8a4709582f cr 1 year ago
Harrison Chase c7b083ab56
bump version to 131 (#2391) 1 year ago
Harrison Chase 0a9f04bad9
Harrison/gpt4all (#2366)
Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
1 year ago
Shrined 10dab053b4
Add Enum for agent types (#2321)
This pull request adds an enum class for the various types of agents
used in the project, located in the `agent_types.py` file. Currently,
the project is using hardcoded strings for the initialization of these
agents, which can lead to errors and make the code harder to maintain.
With the introduction of the new enums, the code will be more readable
and less error-prone.

The new enum members include:

- ZERO_SHOT_REACT_DESCRIPTION
- REACT_DOCSTORE
- SELF_ASK_WITH_SEARCH
- CONVERSATIONAL_REACT_DESCRIPTION
- CHAT_ZERO_SHOT_REACT_DESCRIPTION
- CHAT_CONVERSATIONAL_REACT_DESCRIPTION

In this PR, I have also replaced the hardcoded strings with the
appropriate enum members throughout the codebase, ensuring a smooth
transition to the new approach.
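
A minimal sketch of the idea, for illustration only: the class name `AgentType`, the helper `resolve_agent`, and the string values are assumptions, not confirmed by this commit message. Mixing in `str` lets existing callers that still pass plain strings keep working.

```python
from enum import Enum


class AgentType(str, Enum):
    """Agent types as enum members instead of hardcoded strings.

    The string values are assumed for illustration.
    """

    ZERO_SHOT_REACT_DESCRIPTION = "zero-shot-react-description"
    REACT_DOCSTORE = "react-docstore"
    SELF_ASK_WITH_SEARCH = "self-ask-with-search"
    CONVERSATIONAL_REACT_DESCRIPTION = "conversational-react-description"
    CHAT_ZERO_SHOT_REACT_DESCRIPTION = "chat-zero-shot-react-description"
    CHAT_CONVERSATIONAL_REACT_DESCRIPTION = "chat-conversational-react-description"


def resolve_agent(agent) -> AgentType:
    """Accept an enum member or its string value; raise ValueError on typos."""
    return AgentType(agent)
```

With this shape, a misspelled agent name fails immediately with a `ValueError` rather than surfacing later as a failed dictionary lookup.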
1 year ago
Yunlei Liu 9cceb4a02a
Llama.cpp doc update: fix ipynb path (#2364) 1 year ago
Harrison Chase d85f57ef9c
Harrison/llama (#2314)
Co-authored-by: RJ Adriaansen <adriaansen@eshcc.eur.nl>
1 year ago
Harrison Chase 2eeaccf01c
Harrison/apify (#2215)
Co-authored-by: Jiří Moravčík <jiri.moravcik@gmail.com>
1 year ago
Matt Robinson 3dfe1cf60e
feat: document loader for epublications (#2202)
### Summary

Adds a new document loader for processing e-publications. Works with
`unstructured>=0.5.4`. You need to have
[`pandoc`](https://pandoc.org/installing.html) installed for this loader
to work.

### Testing

```python
from langchain.document_loaders import UnstructuredEPubLoader

loader = UnstructuredEPubLoader("winter-sports.epub", mode="elements")
data = loader.load()
data[0]
```
1 year ago
Harrison Chase 33a001933a
Harrison/clear ml (#2179)
Co-authored-by: Victor Sonck <victor.sonck@gmail.com>
1 year ago
Harrison Chase fe804d2a01
Harrison/aim integration (#2178)
Co-authored-by: Hovhannes Tamoyan <hovhannes.tamoyan@gmail.com>
Co-authored-by: Gor Arakelyan <arakelyangor10@gmail.com>
1 year ago
blob42 031e32f331
searx: implement async + helper tool providing json results (#2129)
- implemented `arun` and `aresults`. Reuses aiosession if available.
- helper tools `SearxSearchRun` and `SearxSearchResults`
- update doc

Co-authored-by: blob42 <spike@w530>
1 year ago
Charlie Holtz f16c1fb6df
Add replicate take 2 (#2077)
This PR adds a replicate integration to langchain. 

It's an updated version of
https://github.com/hwchase17/langchain/pull/1993, revised to
match the latest replicate-python code:
https://github.com/replicate/replicate-python.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Zeke Sikelianos <zeke@sikelianos.com>
1 year ago
Harrison Chase eff5eed719
Harrison/jina (#2043)
Co-authored-by: numb3r3 <wangfelix87@gmail.com>
Co-authored-by: felix-wang <35718120+numb3r3@users.noreply.github.com>
1 year ago
Ace Eldeib 4be2f9d75a
fix: numerous broken documentation links (#2070)
It seems linkchecker isn't catching them because it runs on the generated HTML.
At that point the links are already missing:
the generation process seems to strip invalid references when they can't
be rewritten from Markdown to HTML.

I used https://github.com/tcort/markdown-link-check to check the doc
source directly.

There are a few false positives on localhost for development.
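
The idea of checking the doc source rather than the generated HTML can be sketched as follows. This is a simplified stand-in, not how markdown-link-check works: it only looks at relative links in `.md` files and tests whether the target file exists. The function name and the regular expression are hypothetical.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Matches inline Markdown links such as [text](target) and captures the target.
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")

# Link targets that are not files on disk and are skipped by this sketch.
SKIPPED_PREFIXES = ("http://", "https://", "mailto:", "#")


def find_broken_relative_links(doc_root: Path) -> List[Tuple[Path, str]]:
    """Return (source file, link target) pairs whose target file is missing."""
    broken = []
    for md_file in sorted(doc_root.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        for target in LINK_PATTERN.findall(text):
            if target.startswith(SKIPPED_PREFIXES):
                continue
            # Drop any "#anchor" suffix before checking the file itself.
            path_part = target.split("#", 1)[0]
            if not (md_file.parent / path_part).exists():
                broken.append((md_file, target))
    return broken
```

Because it reads the Markdown directly, a check like this still sees references that the HTML generation step would have stripped.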
1 year ago
Harrison Chase 705431aecc
big docs refactor (#1978)
Co-authored-by: Ankush Gola <ankush.gola@gmail.com>
2 years ago
Enwei Jiao 4f364db9a9
Add milvus for ecosystem (#1951) 2 years ago
Harrison Chase a581bce379
remove key (#1863) 2 years ago
Simon Zhou 3674074eb0
Add Qdrant to ecosystem page (#1830)
Add [Qdrant](https://qdrant.tech/) to [LangChain
ecosystem](https://langchain.readthedocs.io/en/latest/ecosystem.html)
page.
2 years ago
Harrison Chase 76c7b1f677
Harrison/wandb (#1764)
Co-authored-by: Anish Shah <93145909+ash0ts@users.noreply.github.com>
2 years ago
libra 8a95fdaee1
Fix all the bug in init Tool in docs (#1725)
Fixes all the examples in the docs that initialize `Tool`.

Tested by rendering with Jupyter.
2 years ago
Jonathan Pedoeem 606605925d
Adding ability to `return_pl_id` to all PromptLayer Models in LangChain (#1699)
PromptLayer now has support for [several different tracking
features.](https://magniv.notion.site/Track-4deee1b1f7a34c1680d085f82567dab9)
In order to use any of these features you need to have a request id
associated with the request.

In this PR we add a boolean argument called `return_pl_id` which will
add `pl_request_id` to the `generation_info` dictionary associated with
a generation.
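
The behaviour described above might look roughly like the sketch below. The `Generation` dataclass and `attach_request_id` helper are hypothetical stand-ins, not the PromptLayer or LangChain classes; only the names `return_pl_id`, `pl_request_id`, and `generation_info` come from the description.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Generation:
    """Hypothetical stand-in for a single model generation."""

    text: str
    generation_info: Optional[Dict[str, Any]] = None


def attach_request_id(
    generation: Generation, pl_request_id: str, return_pl_id: bool = False
) -> Generation:
    """Add `pl_request_id` to `generation_info` only when `return_pl_id` is set."""
    if return_pl_id:
        if generation.generation_info is None:
            generation.generation_info = {}
        generation.generation_info["pl_request_id"] = pl_request_id
    return generation
```

Keeping the flag off by default leaves `generation_info` unchanged for callers that do not use the tracking features.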

We also updated the relevant documentation.
2 years ago
Harrison Chase 0b29e68c17
Harrison/pgvector (#1679)
Co-authored-by: Aman Kumar <krsingh.aman@gmail.com>
2 years ago
Matt Robinson 63aa28e2a6
feat: allow the unstructured kwargs to be passed in to Unstructured document loaders (#1667)
### Summary

Allows users to pass in `**unstructured_kwargs` to Unstructured document
loaders. Implemented with the `strategy` kwargs in mind, but will pass
in other kwargs like `include_page_breaks` as well. The two currently
supported strategies are `"hi_res"`, which is more accurate but takes
longer, and `"fast"`, which processes faster but with lower accuracy.
The `"hi_res"` strategy is the default. For PDFs, if `detectron2` is not
available and the user selects `"hi_res"`, the loader will fall back to
using the `"fast"` strategy.
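
The fallback rule can be sketched as a small decision function. This is an illustration of the rule as described, not the code in `unstructured` or the loader; `resolve_strategy` and its parameters are hypothetical names.

```python
import importlib.util
from typing import Optional


def resolve_strategy(
    requested: str = "hi_res",
    is_pdf: bool = True,
    detectron2_available: Optional[bool] = None,
) -> str:
    """Return the strategy that would actually be used.

    `hi_res` is the default; for PDFs it is downgraded to `fast` when
    `detectron2` cannot be imported.
    """
    if detectron2_available is None:
        # Probe for the package without importing it.
        detectron2_available = importlib.util.find_spec("detectron2") is not None
    if requested == "hi_res" and is_pdf and not detectron2_available:
        return "fast"
    return requested
```

Passing `detectron2_available` explicitly makes the rule easy to test without installing or uninstalling the package.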


### Testing

#### Make sure the `strategy` kwarg works

Run the following in IPython to verify that the `"fast"` strategy is
indeed faster.

```python
from langchain.document_loaders import UnstructuredFileLoader

loader = UnstructuredFileLoader("layout-parser-paper-fast.pdf", strategy="fast", mode="elements")
%timeit loader.load()

loader = UnstructuredFileLoader("layout-parser-paper-fast.pdf", mode="elements")
%timeit loader.load()
```

On my system I get:

```python
In [3]: from langchain.document_loaders import UnstructuredFileLoader

In [4]: loader = UnstructuredFileLoader("layout-parser-paper-fast.pdf", strategy="fast", mode="elements")

In [5]: %timeit loader.load()
247 ms ± 369 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [6]: loader = UnstructuredFileLoader("layout-parser-paper-fast.pdf", mode="elements")

In [7]: %timeit loader.load()
2.45 s ± 31 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

#### Make sure older versions of `unstructured` still work

Run `pip install unstructured==0.5.3` and then verify the following runs
without error:

```python
from langchain.document_loaders import UnstructuredFileLoader

loader = UnstructuredFileLoader("layout-parser-paper-fast.pdf", mode="elements")
loader.load()
```
2 years ago
Andriy Mulyar c9189d354a
AtlasDB vector store documentation updates. (#1572)
- Fixed errors in the AtlasDB vector store documentation
- Removed extraneous output logs in example notebook.
2 years ago
Harrison Chase 3ee32a01ea
Harrison/prompt layer (#1547)
Co-authored-by: Jonathan Pedoeem <jonathanped@gmail.com>
Co-authored-by: AbuBakar <abubakarsohail123@gmail.com>
2 years ago
gidler 494c9d341a
[DOCS] Assorted wording, punctuation, and consistency revisions (#1443)
Contributing some small fixes I noticed while reading through the
documentation.

Thank you for creating and maintaining this project!
2 years ago
Tom Dyson e3354404ad
Fix link to Pinecone notebook (#1492) 2 years ago
blob42 3d54b05863
searx: add install instructions, update doc and notebooks (#1420)
- Added instructions on setting up self-hosted searx
- Added notebook example with agent
- Used `localhost:8888` as the example URL to stay consistent, since public
instances are not really usable.

Co-authored-by: blob42 <spike@w530>
2 years ago
Harrison Chase 166cda2cc6
Harrison/deeplake (#1316)
Co-authored-by: Davit Buniatyan <d@activeloop.ai>
2 years ago
Harrison Chase aaad6cc954
Harrison/atlas db (#1315)
Co-authored-by: Brandon Duderstadt <brandonduderstadt@gmail.com>
2 years ago
Harrison Chase 81abcae91a
Harrison/banana fix (#1311)
Co-authored-by: Erik Dunteman <44653944+erik-dunteman@users.noreply.github.com>
2 years ago
Enrico Shippole 9becdeaadf
Add Writer, Banana, Modal, StochasticAI (#1270)
Add LLM wrappers and examples for Banana, Writer, Modal, Stochastic AI

Added rigid JSON format for Banana and Modal
2 years ago
Matt Robinson 10e73a3723
docs: remove nltk download steps (#1253)
### Summary

Updates the docs to remove the `nltk` download steps from
`unstructured`. As of `unstructured` `0.4.14`, this is handled
automatically in the relevant modules within `unstructured`.
2 years ago
Justin Torre 5bc6dc076e
added caching and properties docs (#1255) 2 years ago
Iskren Ivov Chernev 8e3cd3e0dd
Add DeepInfra LLM support (#1232)
DeepInfra is an Inference-as-a-Service provider. Add a simple wrapper
using HTTPS requests.
2 years ago
Dmitri Melikyan b7765a95a0
docs: add Graphsignal ecosystem page (#1228)
Adds a Graphsignal ecosystem page
2 years ago
Ikko Eltociear Ashimine 334b553260
Update petals.md (#1225)
Huggingface -> Hugging Face
2 years ago
Matt Robinson 3d5f56a8a1
docs: add quotes to `unstructured[local-inference]` install instructions (#1208)
### Summary

Corrects the install instruction for local inference to `pip install
"unstructured[local-inference]"`
2 years ago
Harrison Chase d90a287d8f
Harrison/updating docs (#1196) 2 years ago
Naveen Tatikonda 0118706fd6
Add Support for OpenSearch Vector database (#1191)
### Description
This PR adds a wrapper that supports the OpenSearch vector
database. Using the opensearch-py client, it ingests the embeddings of the
given text into an OpenSearch cluster through the Bulk API. A
`similarity_search` can then be run on the index using three popular search methods
of the OpenSearch k-NN plugin:

- `Approximate k-NN Search` uses approximate nearest neighbor (ANN)
algorithms from the [nmslib](https://github.com/nmslib/nmslib),
[faiss](https://github.com/facebookresearch/faiss), and
[Lucene](https://lucene.apache.org/) libraries to power k-NN search.
- `Script Scoring` extends OpenSearch's script scoring functionality to
execute a brute force, exact k-NN search.
- `Painless Scripting` adds the distance functions as painless
extensions that can be used in more complex combinations. Like Script
Scoring, it also supports brute force, exact k-NN search.
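
As a rough illustration of how the three methods differ, the sketch below builds a request body for each one as a plain Python dict. The shapes follow my reading of the OpenSearch k-NN plugin documentation and should be checked against it; the function names and the `vector_field` field name are hypothetical, and this is not the wrapper's actual code.

```python
from typing import Any, Dict, List


def approximate_knn_query(
    vector: List[float], k: int = 4, field: str = "vector_field"
) -> Dict[str, Any]:
    """Approximate k-NN: an ANN lookup through the `knn` query type."""
    return {"size": k, "query": {"knn": {field: {"vector": vector, "k": k}}}}


def script_scoring_query(
    vector: List[float], k: int = 4, field: str = "vector_field", space_type: str = "l2"
) -> Dict[str, Any]:
    """Script scoring: exact, brute-force k-NN using the plugin's `knn_score` script."""
    return {
        "size": k,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": "knn_score",
                    "lang": "knn",
                    "params": {"field": field, "query_value": vector, "space_type": space_type},
                },
            }
        },
    }


def painless_scripting_query(
    vector: List[float], k: int = 4, field: str = "vector_field"
) -> Dict[str, Any]:
    """Painless scripting: exact k-NN with a distance function inside a painless script."""
    source = "1.0 + cosineSimilarity(params.query_value, doc[params.field])"
    return {
        "size": k,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {"source": source, "params": {"field": field, "query_value": vector}},
            }
        },
    }
```

The first body relies on an index built for ANN search, while the other two score every document matched by the inner `match_all` query, which is why they are exact but slower.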

### Issues Resolved 
https://github.com/hwchase17/langchain/issues/1054

---------

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
2 years ago