Commit Graph

3935 Commits (2b663089b5f6f16890c134d14981db7a0eb446ba)
 

Author SHA1 Message Date
Naveen Tatikonda 0118706fd6
Add Support for OpenSearch Vector database (#1191)
### Description
This PR adds a wrapper which adds support for the OpenSearch vector
database. Using opensearch-py client we are ingesting the embeddings of
given text into opensearch cluster using Bulk API. We can perform the
`similarity_search` on the index using the 3 popular searching methods
of OpenSearch k-NN plugin:

- `Approximate k-NN Search` use approximate nearest neighbor (ANN)
algorithms from the [nmslib](https://github.com/nmslib/nmslib),
[faiss](https://github.com/facebookresearch/faiss), and
[Lucene](https://lucene.apache.org/) libraries to power k-NN search.
- `Script Scoring` extends OpenSearch’s script scoring functionality to
execute a brute force, exact k-NN search.
- `Painless Scripting` adds the distance functions as painless
extensions that can be used in more complex combinations. Also, supports
brute force, exact k-NN search like Script Scoring.

### Issues Resolved 
https://github.com/hwchase17/langchain/issues/1054

---------

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
2 years ago
Andrew White c5015d77e2
Allow k to be higher than doc size in max_marginal_relevance_search (#1187)
Fixes issue #1186. For some reason, #1117 didn't seem to fix it.
2 years ago
Zach Schillaci 159c560c95
Refactor some loops into list comprehensions (#1185) 2 years ago
Harrison Chase 926c121b98
Harrison/text splitter docs (#1188) 2 years ago
Harrison Chase 91446a5e9b
clean up text splitting docs (#1184) 2 years ago
Harrison Chase a5a14405ad
bump version to 0091 (#1181) 2 years ago
Harrison Chase 5a954efdd7
update gallery with slack bot (#1177) 2 years ago
Harrison Chase 4766b20223
clean up loaders (#1178) 2 years ago
blob42 9962bda70b
searx_search: docs updates (#1175)
- fix notebook formatting, remove empty cells and add scrolling for long
text

---------

Co-authored-by: blob42 <spike@w530>
2 years ago
Harrison Chase 4f3fbd7267
improve docs for indexes (#1146) 2 years ago
Harrison Chase 28781a6213
Harrison/markdown splitter (#1169)
Co-authored-by: Michael Chen <flamingdescent@gmail.com>
Co-authored-by: Michael Chen <michaelchen@stripe.com>
2 years ago
Harrison Chase 37dd34bea5
fix path (#1168) 2 years ago
Nan Wang e8f224fd3a
docs: add missing links to toc (#1163)
add missing links to toc

---------

Signed-off-by: Nan Wang <nan.wang@jina.ai>
2 years ago
Nick afe884fb96
AI21 documentation incorrectly titled Cohere (#1167) 2 years ago
Ji ed37fbaeff
for ChatVectorDBChain, add top_k_docs_for_context to allow control how many chunks of context will be retrieved (#1155)
given that we allow user define chunk size, think it would be useful for
user to define how many chunks of context will be retrieved.
2 years ago
Harrison Chase 955c89fccb
pass in prompts to vectordbqa (#1158) 2 years ago
Harrison Chase 65cc81c479
directory loader improvements (#1162) 2 years ago
Harrison Chase 05a05bcb04
bump version to 0.0.90 (#1157) 2 years ago
Harrison Chase 9d6d8f85da
Harrison/self hosted runhouse (#1154)
Co-authored-by: Donny Greenberg <dongreenberg2@gmail.com>
Co-authored-by: John Dagdelen <jdagdelen@users.noreply.github.com>
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>
Co-authored-by: Andrew White <white.d.andrew@gmail.com>
Co-authored-by: Peng Qu <82029664+pengqu123@users.noreply.github.com>
Co-authored-by: Matt Robinson <mthw.wm.robinson@gmail.com>
Co-authored-by: jeff <tangj1122@gmail.com>
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MacBook-Pro.local>
Co-authored-by: zanderchase <zander@unfold.ag>
Co-authored-by: Charles Frye <cfrye59@gmail.com>
Co-authored-by: zanderchase <zanderchase@gmail.com>
Co-authored-by: Shahriar Tajbakhsh <sh.tajbakhsh@gmail.com>
Co-authored-by: Stefan Keselj <skeselj@princeton.edu>
Co-authored-by: Francisco Ingham <fpingham@gmail.com>
Co-authored-by: Dhruv Anand <105786647+dhruv-anand-aintech@users.noreply.github.com>
Co-authored-by: cragwolfe <cragcw@gmail.com>
Co-authored-by: Anton Troynikov <atroyn@users.noreply.github.com>
Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
Co-authored-by: Oliver Klingefjord <oliver@klingefjord.com>
Co-authored-by: blob42 <contact@blob42.xyz>
Co-authored-by: blob42 <spike@w530>
Co-authored-by: Enrico Shippole <henryshippole@gmail.com>
Co-authored-by: Ibis Prevedello <ibiscp@gmail.com>
Co-authored-by: jped <jonathanped@gmail.com>
Co-authored-by: Justin Torre <justintorre75@gmail.com>
Co-authored-by: Ivan Vendrov <ivan@anthropic.com>
Co-authored-by: Sasmitha Manathunga <70096033+mmz-001@users.noreply.github.com>
Co-authored-by: Ankush Gola <9536492+agola11@users.noreply.github.com>
Co-authored-by: Matt Robinson <mrobinson@unstructuredai.io>
Co-authored-by: Jeff Huber <jeffchuber@gmail.com>
Co-authored-by: Akshay <64036106+akshayvkt@users.noreply.github.com>
Co-authored-by: Andrew Huang <jhuang16888@gmail.com>
Co-authored-by: rogerserper <124558887+rogerserper@users.noreply.github.com>
Co-authored-by: seanaedmiston <seane999@gmail.com>
Co-authored-by: Hasegawa Yuya <52068175+Hase-U@users.noreply.github.com>
Co-authored-by: Ivan Vendrov <ivendrov@gmail.com>
Co-authored-by: Chen Wu (吴尘) <henrychenwu@cmu.edu>
Co-authored-by: Dennis Antela Martinez <dennis.antela@gmail.com>
Co-authored-by: Maxime Vidal <max.vidal@hotmail.fr>
Co-authored-by: Rishabh Raizada <110235735+rishabh-ti@users.noreply.github.com>
2 years ago
CG80499 af8f5c1a49
Added constitutional chain. (#1147)
- Added self-critique constitutional chain based on this
[paper](https://www.anthropic.com/constitutional.pdf).
2 years ago
Harrison Chase a83ba44efa
Harrison/ver0089 (#1144) 2 years ago
Ankush Gola 7b5e160d28
Make Tools own model, add ToolKit Concept (#1095)
Follow-up of @hinthornw's PR:

- Migrate the Tool abstraction to a separate file (`BaseTool`).
- `Tool` implementation of `BaseTool` takes in function and coroutine to
more easily maintain backwards compatibility
- Add a Toolkit abstraction that can own the generation of tools around
a shared concept or state

---------

Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Francisco Ingham <fpingham@gmail.com>
Co-authored-by: Dhruv Anand <105786647+dhruv-anand-aintech@users.noreply.github.com>
Co-authored-by: cragwolfe <cragcw@gmail.com>
Co-authored-by: Anton Troynikov <atroyn@users.noreply.github.com>
Co-authored-by: Oliver Klingefjord <oliver@klingefjord.com>
Co-authored-by: William Fu-Hinthorn <whinthorn@Williams-MBP-3.attlocal.net>
Co-authored-by: Bruno Bornsztein <bruno.bornsztein@gmail.com>
2 years ago
Harrison Chase 45b5640fe5
fix sql (#1141) 2 years ago
Sam Hogan 85c1449a96
Fix typo in HyDE docs (#1142) 2 years ago
kekayan 9111f4ca8a
fix chatvectordbchain to use pinecone namespace (#1139)
In the similarity search, the pinecone namespace is not used, which
makes the bot return _I don't know_ where the embeddings are stored in
the pinecone namespace. Now we can query by passing the namespace
optionally.
```result = qa({"question": query, "chat_history": chat_history, "namespace":"01gshyhjcfgkq1q5wxjtm17gjh"})```
2 years ago
Harrison Chase fb3c73d194
add srt loader (#1140) 2 years ago
Francisco Ingham 3f29742adc
Sql alchemy commands used in table info (#1135)
This approach has several advantages:

* it improves the readability of the code
* removes incompatibilities between SQL dialects
* fixes a bug with `datetime` values in rows and `ast.literal_eval`

Huge thanks and credits to @jzluo for finding the weaknesses in the
current approach and for the thoughtful discussion on the best way to
implement this.

---------

Co-authored-by: Francisco Ingham <>
Co-authored-by: Jon Luo <20971593+jzluo@users.noreply.github.com>
2 years ago
Harrison Chase 483821ea3b
fix docs (#1133) 2 years ago
Harrison Chase ee3590cb61
instruct embeddings docs (#1131) 2 years ago
Noah Gundotra 8c5fbab72d
[Integration Tests] Cast fake embeddings to ALL float values (#1102)
Pydantic validation breaks tests for example (`test_qdrant.py`) because
fake embeddings contain an integer.

This PR casts the embeddings array to all floats.

Now the `qdrant` test passes, `poetry run pytest
tests/integration_tests/vectorstores/test_qdrant.py`
2 years ago
Harrison Chase d5f3dfa1e1
Harrison/hn loader (#1130)
Co-authored-by: William X <william.y.xuan@gmail.com>
2 years ago
Tom Bocklisch 47c3221fda
Max marginal relecance search fails if there are not enough docs (#1117)
Implementation fails if there are not enough documents. Added the same
check as used for similarity search.

Current implementation raises
```  
File ".venv/lib/python3.9/site-packages/langchain/vectorstores/faiss.py", line 160, in max_marginal_relevance_search
    _id = self.index_to_docstore_id[i]
KeyError: -1
```
2 years ago
Harrison Chase 511d41114f
return source documents for chat vector db chain (#1128) 2 years ago
Jon Luo c39ef70aa4
fix for database compatibility when getting table DDL (#1129)
#1081 introduced a method to get DDL (table definitions) in a manner
specific to sqlite3, thus breaking compatibility with other non-sqlite3
databases. This uses the sqlite3 command if the detected dialect is
sqlite, and otherwise uses the standard SQL `SHOW CREATE TABLE`. This
should fix #1103.
2 years ago
yakigac 1ed708391e
Fix a bug that shows "KeyError 'items'" (#1118)
Fix KeyError 'items' when no result found.

## Problem

When no result found for a query, google search crashed with `KeyError
'items'`.

## Solution

I added a check for an empty response before accessing the 'items' key.
It will handle the case correctly.

## Other

my twitter: yakigac
(I don't mind even if you don't mention me for this PR. But just because
last time my real name was shout out :) )
2 years ago
Matt Robinson 2bee8d4941
feat: add support for `.ppt` files in `UnstructuredPowerPointLoader` (#1124)
###  Summary

Adds support for older `.ppt` file in the PowerPoint loader. 

### Testing

The following should work on `unstructured==0.4.11` using the example
docs from the `unstructured` repo.

```python
from langchain.document_loaders import UnstructuredPowerPointLoader

filename = "../unstructured/example-docs/fake-power-point.pptx"
loader = UnstructuredPowerPointLoader(filename)
loader.load()

filename = "../unstructured/example-docs/fake-power-point.ppt"
loader = UnstructuredPowerPointLoader(filename)
loader.load()
```

Now downgrade `unstructured` to version `0.4.10`. The following should
work:

```python
from langchain.document_loaders import UnstructuredPowerPointLoader

filename = "../unstructured/example-docs/fake-power-point.pptx"
loader = UnstructuredPowerPointLoader(filename)
loader.load()
```

and the following should give you a `ValueError` and invite you to
upgrade `unstructured`.


```python
from langchain.document_loaders import UnstructuredPowerPointLoader

filename = "../unstructured/example-docs/fake-power-point.ppt"
loader = UnstructuredPowerPointLoader(filename)
loader.load()
```
2 years ago
Matt Robinson b956070f08
docs: add an unstructured section to the ecosystem page (#1125)
### Summary

Adds an Unstructured section to the ecosystem page.
2 years ago
Hasegawa Yuya 383c67c1b2
Fix Issue #1100 (#1101)
https://github.com/hwchase17/langchain/issues/1100
When faiss data and doc.index are created in past versions, error occurs
that say there was no attribute. So I put hasattr in the check as a
simple solution.

However, increasing the number of such checks is not good for
conservatism, so I think there is a better solution.


Also, the code for the batch process was left out, so I put it back in.
2 years ago
Harrison Chase 3f50feb280
fix telegram imports (#1110) 2 years ago
trigaten 6fafcd0a70
Strange behavior with LLM import requirements (#1104)
This import works fine:
```python
from langchain import Anthropic
```
This import does not:
```python
from langchain import AI21
```

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'AI21' from 'langchain' (/opt/anaconda3/envs/fed_nlp/lib/python3.9/site-packages/langchain/__init__.py)
```

I think there is a slight documentation inconsistency here:
https://langchain.readthedocs.io/en/latest/reference/modules/llms.html

This PR starts to solve that. Should all the import examples be
`from langchain.llms import X` instead of `from langchain import X`?
2 years ago
Kacper Łukawski ab1a3cccac
Hotfix: Qdrant content retrieval (revert: #1088) (#1093)
The #1088 introduced a bug in Qdrant integration. That PR reverts those
changes and provides class attributes to ensure consistent payload keys.
In addition to that, an exception will be thrown if any of texts is None
(that could have been an issue reported in #1087)
2 years ago
Harrison Chase 6322b6f657
bump version 0.0.88 (#1090) 2 years ago
Francisco Ingham 3462130e2d
Modify number of types of chains (#1089)
Changed number of types of chains to make it consistent with the rest of
the docs
2 years ago
Rishabh Raizada 5d11e5da40
Update qdrant.py (#1088)
Fixes #1087
2 years ago
Harrison Chase 7745505482
chat qa with sources (#1084) 2 years ago
Harrison Chase badeeb37b0
fix stuff count (#1083) 2 years ago
Harrison Chase 971458c5de
docs for batch size (#1082) 2 years ago
Harrison Chase 5e10e19bfe
Harrison/align table (#1081)
Co-authored-by: Francisco Ingham <fpingham@gmail.com>
2 years ago
Harrison Chase c60954d0f8
Harrison/telegram loader (#1080)
Co-authored-by: Maxime Vidal <max.vidal@hotmail.fr>
2 years ago
Dennis Antela Martinez a1c296bc3c
docs: increase width (#1049)
This addresses #948.

I set the documentation max width to 2560px, but can be adjusted - see
screenshot below.

<img width="1741" alt="Screenshot 2023-02-14 at 13 05 57"
src="https://user-images.githubusercontent.com/23406704/218749076-ea51e90a-a220-4558-b4fe-5a95b39ebf15.png">
2 years ago