Commit Graph

324 Commits

Author SHA1 Message Date
Harrison Chase
1cd8996074
Harrison/summarizer chain (#1356)
Co-authored-by: Tim Asp <707699+timothyasp@users.noreply.github.com>
2023-03-01 20:59:07 -08:00
Harrison Chase
4b5e850361
chatgpt wrapper (#1367) 2023-03-01 11:47:01 -08:00
Harrison Chase
4d4b43cf5a
fix doc names (#1354) 2023-03-01 09:40:31 -08:00
Harrison Chase
fe7dbecfe6
pandas and csv agents (#1353) 2023-02-28 22:19:11 -08:00
Harrison Chase
02ec72df87
improve docs (#1351) 2023-02-28 21:37:18 -08:00
Jon Luo
92ab27e4b8
sql doc formatting (#1350)
My bad, missed a few tabs between the two PRs
2023-02-28 19:54:46 -08:00
Ankush Gola
82baecc892
Add a SQL agent for interacting with SQL Databases and JSON Agent for interacting with large JSON blobs (#1150)
This PR adds 

* `ZeroShotAgent.as_sql_agent`, which returns an agent for interacting
with a sql database. This builds off of `SQLDatabaseChain`. The main
advantages are 1) answering general questions about the db, 2) access to
a tool for double checking queries, and 3) recovering from errors
* `ZeroShotAgent.as_json_agent` which returns an agent for interacting
with json blobs.
* Several examples in notebooks

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-02-28 19:44:39 -08:00
Jon Luo
35f1e8f569
separate columns by tabs instead of single space in sql sample rows (#1348)
Use tabs to separate columns instead of a single space - confusing when
there are spaces in a cell
2023-02-28 18:59:53 -08:00
James Brotchie
3574418a40
Fix link in summarization.md (#1344)
"Utilities for working with Documents" was linking to a non-useful page.
Re-linked to the utils page that includes info about working with docs.
2023-02-28 18:58:12 -08:00
Jon Luo
5bf8772f26
add option to use user-defined SQL table info (#1347)
Currently, table information is gathered through SQLAlchemy as complete
table DDL and a user-selected number of sample rows from each table.
This PR adds the option to use user-defined table information instead of
automatically collecting it. This will use the provided table
information and fall back to the automatic gathering for tables that the
user didn't provide information for.

Off the top of my head, there are a few cases where this can be quite
useful:
- The first n rows of a table are uninformative, or very similar to one
another. In this case, hand-crafting example rows for a table such that
they provide the good, diverse information can be very helpful. Another
approach we can think about later is getting a random sample of n rows
instead of the first n rows, but there are some performance
considerations that need to be taken there. Even so, hand-crafting the
sample rows is useful and can guarantee the model sees informative data.
- The user doesn't want every column to be available to the model. This
is not an elegant way to fulfill this specific need since the user would
have to provide the table definition instead of a simple list of columns
to include or ignore, but it does work for this purpose.
- For the developers, this makes it a lot easier to compare/benchmark
the performance of different prompting structures for providing table
information in the prompt.

These are cases I've run into myself (particularly cases 1 and 3) and
I've found these changes useful. Personally, I keep custom table info
for a few tables in a yaml file for versioning and easy loading.

Definitely open to other opinions/approaches though!
2023-02-28 18:58:04 -08:00
Harrison Chase
786852e9e6
partial variables (#1308) 2023-02-28 08:40:35 -08:00
Tim Asp
72ef69d1ba
Add new iFixit document loader (#1333)
iFixit is a wikipedia-like site that has a huge amount of open content
on how to fix things, questions/answers for common troubleshooting and
"things" related content that is more technical in nature. All content
is licensed under CC-BY-SA-NC 3.0

Adding docs from iFixit as context for user questions like "I dropped my
phone in water, what do I do?" or "My macbook pro is making a whining
noise, what's wrong with it?" can yield significantly better responses
than context free response from LLMs.
2023-02-27 20:40:20 -08:00
Matt Robinson
1aa41b5741
feat: document loader for image files (#1330)
### Summary

Adds a document loader for image files such as `.jpg` and `.png` files.

### Testing

Run the following using the example document from the [`unstructured`
repo](https://github.com/Unstructured-IO/unstructured/tree/main/example-docs).

```python
from langchain.document_loaders.image import UnstructuredImageLoader

loader = UnstructuredImageLoader("layout-parser-paper-fast.jpg")
loader.load()
```
2023-02-27 14:43:32 -08:00
Eugene Yurtsev
c14cff60d0
Documentation: Minor typo fixes (#1327)
Fixing a few minor typos in the documentation (and likely introducing
other
ones in the process).
2023-02-27 14:40:43 -08:00
Harrison Chase
f61858163d
bump version to 0.0.95 (#1324) 2023-02-27 07:45:54 -08:00
Harrison Chase
0824d65a5c
Harrison/indexing pipeline (#1317) 2023-02-27 00:31:36 -08:00
Akshay
a0bf856c70
Update agent_vectorstore.ipynb (#1318)
nitpicking but just thought i'd add this typo which I found when going
through the How-to 😄 (unless it was intentional) also, it's amazing that
you added ReAct to LangChain!
2023-02-26 23:22:35 -08:00
Harrison Chase
166cda2cc6
Harrison/deeplake (#1316)
Co-authored-by: Davit Buniatyan <d@activeloop.ai>
2023-02-26 22:35:04 -08:00
Harrison Chase
aaad6cc954
Harrison/atlas db (#1315)
Co-authored-by: Brandon Duderstadt <brandonduderstadt@gmail.com>
2023-02-26 22:11:38 -08:00
Marc Puig
3989c793fd
Making it possible to use "certainty" as a parameter for the weaviate similarity_search (#1218)
Checking if weaviate similarity_search kwargs contains "certainty" and
use it accordingly. The minimal level of certainty must be a float, and
it is computed by normalized distance.
2023-02-26 17:55:28 -08:00
Harrison Chase
81abcae91a
Harrison/banana fix (#1311)
Co-authored-by: Erik Dunteman <44653944+erik-dunteman@users.noreply.github.com>
2023-02-26 17:53:57 -08:00
Casey A. Fitzpatrick
648b3b3909
Fix use case sentence for bash util doc (#1295)
Thanks for all your hard work!

I noticed a small typo in the bash util doc so here's a quick update.
Additionally, my formatter caught some spacing in the `.md` as well.
Happy to revert that if it's an issue.

The main change is just
```
- A common use case this is for letting it interact with your local file system. 

+ A common use case for this is letting the LLM interact with your local file system.
```

## Testing

`make docs_build` succeeds locally and the changes show as expected ✌️ 
<img width="704" alt="image"
src="https://user-images.githubusercontent.com/17773666/221376160-e99e59a6-b318-49d1-a1d7-89f5c17cdab4.png">
2023-02-26 17:41:03 -08:00
Ingo Kleiber
fd9975dad7
add CoNLL-U document loader (#1297)
I've added a simple
[CoNLL-U](https://universaldependencies.org/format.html) document
loader. CoNLL-U is a common format for NLP tasks and is used, for
example, in the Universal Dependencies treebank corpora. The loader
reads a single file in standard CoNLL-U format and returns a document.
2023-02-26 17:27:00 -08:00
Harrison Chase
d29f74114e
copy paste loader (#1302) 2023-02-26 17:26:37 -08:00
Harrison Chase
ce441edd9c
improve docs (#1309) 2023-02-26 11:25:16 -08:00
Harrison Chase
6f30d68581
add example of using agent with vectorstores (#1285) 2023-02-25 13:27:24 -08:00
Matt Robinson
2f15c11b87
feat: document loader for MS Word documents (#1282)
### Summary

Adds a document loader for MS Word Documents. Works with both `.docx`
and `.doc` files as longer as the user has installed
`unstructured>=0.4.11`.

### Testing

The follow workflow test the loader for both `.doc` and `.docx` files
using example docs from the `unstructured` repo.

#### `.docx`

```python
from langchain.document_loaders import UnstructuredWordDocumentLoader

filename = "../unstructured/example-docs/fake.docx"
loader = UnstructuredWordDocumentLoader(filename)
loader.load()
```

#### `.doc`

```python
from langchain.document_loaders import UnstructuredWordDocumentLoader

filename = "../unstructured/example-docs/fake.doc"
loader = UnstructuredWordDocumentLoader(filename)
loader.load()
```
2023-02-24 08:26:19 -08:00
Harrison Chase
96db6ed073
cleanup (#1274) 2023-02-24 07:38:24 -08:00
Harrison Chase
42167a1e24
Harrison/fb loader (#1277)
Co-authored-by: Vairo Di Pasquale <vairo.dp@gmail.com>
2023-02-24 07:22:48 -08:00
Klein Tahiraj
8a0751dadd
adding .ipynb loader and documentation Fixes #1248 (#1252)
`NotebookLoader.load()` loads the `.ipynb` notebook file into a
`Document` object.

**Parameters**:

* `include_outputs` (bool): whether to include cell outputs in the
resulting document (default is False).
* `max_output_length` (int): the maximum number of characters to include
from each cell output (default is 10).
* `remove_newline` (bool): whether to remove newline characters from the
cell sources and outputs (default is False).
* `traceback` (bool): whether to include full traceback (default is
False).
2023-02-24 07:10:35 -08:00
Enrico Shippole
9becdeaadf
Add Writer, Banana, Modal, StochasticAI (#1270)
Add LLM wrappers and examples for Banana, Writer, Modal, Stochastic AI

Added rigid json format for Banana and Modal
2023-02-24 06:58:58 -08:00
Matt Robinson
10e73a3723
docs: remove nltk download steps (#1253)
### Summary

Updates the docs to remove the `nltk` download steps from
`unstructured`. As of `unstructured` `0.4.14`, this is handled
automatically in the relevant modules within `unstructured`.
2023-02-23 12:34:44 -08:00
Justin Torre
5bc6dc076e
added caching and properties docs (#1255) 2023-02-23 11:03:04 -08:00
Iskren Ivov Chernev
8e3cd3e0dd
Add DeepInfra LLM support (#1232)
DeepInfra is an Inference-as-a-Service provider. Add a simple wrapper
using HTTPS requests.
2023-02-23 07:37:15 -08:00
Dmitri Melikyan
b7765a95a0
docs: add Graphsignal ecosystem page (#1228)
Adds a Graphsignal ecosystem page
2023-02-23 07:33:00 -08:00
Harrison Chase
6085fe18d4
add ifttt tool (#1244) 2023-02-22 22:29:43 -08:00
Harrison Chase
71709ad5d5
Update key_concepts.md (#1209) (#1237)
Link for easier navigation (it's not immediately clear where to find
more info on SimpleSequentialChain (3 clicks away)

---------

Co-authored-by: Larry Fisherman <l4rryfisherman@protonmail.com>
2023-02-22 13:30:53 -08:00
Dennis Antela Martinez
53c67e04d4
add aleph alpha llm (#1207)
Integrate Aleph Alpha's client into Langchain to provide access to the
luminous models - more info on latest benchmarks here:
https://www.aleph-alpha.com/luminous-performance-benchmarks
2023-02-22 10:37:36 -08:00
Ikko Eltociear Ashimine
334b553260
Update petals.md (#1225)
Huggingface -> Hugging Face
2023-02-22 10:34:16 -08:00
Sason
cc7d2e5621
Correct typo in "Question Answering" How-To Guide (#1221) 2023-02-21 17:02:58 -08:00
Matt Robinson
3d5f56a8a1
docs: add quotes to unstructured[local-inference] install instructions (#1208)
### Summary

Corrects the install instruction for local inference to `pip install
"unstructured[local-inference]"`
2023-02-21 08:06:43 -08:00
Harrison Chase
047231840d
add docs for chroma persistance (#1202) 2023-02-20 23:04:17 -08:00
Harrison Chase
5bdb8dd6fe
Harrison/unstructured io (#1200) 2023-02-20 22:54:49 -08:00
Harrison Chase
d90a287d8f
Harrison/updating docs (#1196) 2023-02-20 22:54:26 -08:00
Dennis Antela Martinez
23243ae69c
add gitbook document loader (#1180)
Added a GitBook document loader. It lets you both, (1) fetch text from
any single GitBook page, or (2) fetch all relative paths and return
their respective content in Documents.

I've modified the `scrape` method in the `WebBaseLoader` to accept
custom web paths if given, but happy to remove it and move that logic
into the `GitbookLoader` itself.
2023-02-20 20:05:04 -08:00
Naveen Tatikonda
0118706fd6
Add Support for OpenSearch Vector database (#1191)
### Description
This PR adds a wrapper which adds support for the OpenSearch vector
database. Using opensearch-py client we are ingesting the embeddings of
given text into opensearch cluster using Bulk API. We can perform the
`similarity_search` on the index using the 3 popular searching methods
of OpenSearch k-NN plugin:

- `Approximate k-NN Search` use approximate nearest neighbor (ANN)
algorithms from the [nmslib](https://github.com/nmslib/nmslib),
[faiss](https://github.com/facebookresearch/faiss), and
[Lucene](https://lucene.apache.org/) libraries to power k-NN search.
- `Script Scoring` extends OpenSearch’s script scoring functionality to
execute a brute force, exact k-NN search.
- `Painless Scripting` adds the distance functions as painless
extensions that can be used in more complex combinations. Also, supports
brute force, exact k-NN search like Script Scoring.

### Issues Resolved 
https://github.com/hwchase17/langchain/issues/1054

---------

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
2023-02-20 18:39:34 -08:00
Harrison Chase
926c121b98
Harrison/text splitter docs (#1188) 2023-02-20 15:14:03 -08:00
Harrison Chase
91446a5e9b
clean up text splitting docs (#1184) 2023-02-20 11:24:31 -08:00
Harrison Chase
5a954efdd7
update gallery with slack bot (#1177) 2023-02-20 08:21:00 -08:00
9962bda70b
searx_search: docs updates (#1175)
- fix notebook formatting, remove empty cells and add scrolling for long
text

---------

Co-authored-by: blob42 <spike@w530>
2023-02-20 06:46:44 -08:00