Commit Graph

612 Commits

Author SHA1 Message Date
Sam Hogan
85c1449a96
Fix typo in HyDE docs (#1142) 2023-02-18 11:48:46 -08:00
kekayan
9111f4ca8a
fix chatvectordbchain to use pinecone namespace (#1139)
In the similarity search, the pinecone namespace is not used, which
makes the bot return _I don't know_ where the embeddings are stored in
the pinecone namespace. Now we can query by passing the namespace
optionally.
```result = qa({"question": query, "chat_history": chat_history, "namespace":"01gshyhjcfgkq1q5wxjtm17gjh"})```
2023-02-18 10:58:48 -08:00
Harrison Chase
fb3c73d194
add srt loader (#1140) 2023-02-18 10:58:39 -08:00
Francisco Ingham
3f29742adc
Sql alchemy commands used in table info (#1135)
This approach has several advantages:

* it improves the readability of the code
* removes incompatibilities between SQL dialects
* fixes a bug with `datetime` values in rows and `ast.literal_eval`

Huge thanks and credits to @jzluo for finding the weaknesses in the
current approach and for the thoughtful discussion on the best way to
implement this.

---------

Co-authored-by: Francisco Ingham <>
Co-authored-by: Jon Luo <20971593+jzluo@users.noreply.github.com>
2023-02-18 10:58:29 -08:00
Harrison Chase
483821ea3b
fix docs (#1133) 2023-02-18 08:13:54 -08:00
Harrison Chase
ee3590cb61
instruct embeddings docs (#1131) 2023-02-17 16:14:49 -08:00
Noah Gundotra
8c5fbab72d
[Integration Tests] Cast fake embeddings to ALL float values (#1102)
Pydantic validation breaks tests for example (`test_qdrant.py`) because
fake embeddings contain an integer.

This PR casts the embeddings array to all floats.

Now the `qdrant` test passes, `poetry run pytest
tests/integration_tests/vectorstores/test_qdrant.py`
2023-02-17 15:18:09 -08:00
Harrison Chase
d5f3dfa1e1
Harrison/hn loader (#1130)
Co-authored-by: William X <william.y.xuan@gmail.com>
2023-02-17 15:15:02 -08:00
Tom Bocklisch
47c3221fda
Max marginal relecance search fails if there are not enough docs (#1117)
Implementation fails if there are not enough documents. Added the same
check as used for similarity search.

Current implementation raises
```  
File ".venv/lib/python3.9/site-packages/langchain/vectorstores/faiss.py", line 160, in max_marginal_relevance_search
    _id = self.index_to_docstore_id[i]
KeyError: -1
```
2023-02-17 15:12:31 -08:00
Harrison Chase
511d41114f
return source documents for chat vector db chain (#1128) 2023-02-17 13:40:52 -08:00
Jon Luo
c39ef70aa4
fix for database compatibility when getting table DDL (#1129)
#1081 introduced a method to get DDL (table definitions) in a manner
specific to sqlite3, thus breaking compatibility with other non-sqlite3
databases. This uses the sqlite3 command if the detected dialect is
sqlite, and otherwise uses the standard SQL `SHOW CREATE TABLE`. This
should fix #1103.
2023-02-17 13:39:44 -08:00
yakigac
1ed708391e
Fix a bug that shows "KeyError 'items'" (#1118)
Fix KeyError 'items' when no result found.

## Problem

When no result found for a query, google search crashed with `KeyError
'items'`.

## Solution

I added a check for an empty response before accessing the 'items' key.
It will handle the case correctly.

## Other

my twitter: yakigac
(I don't mind even if you don't mention me for this PR. But just because
last time my real name was shout out :) )
2023-02-17 13:04:02 -08:00
Matt Robinson
2bee8d4941
feat: add support for .ppt files in UnstructuredPowerPointLoader (#1124)
###  Summary

Adds support for older `.ppt` file in the PowerPoint loader. 

### Testing

The following should work on `unstructured==0.4.11` using the example
docs from the `unstructured` repo.

```python
from langchain.document_loaders import UnstructuredPowerPointLoader

filename = "../unstructured/example-docs/fake-power-point.pptx"
loader = UnstructuredPowerPointLoader(filename)
loader.load()

filename = "../unstructured/example-docs/fake-power-point.ppt"
loader = UnstructuredPowerPointLoader(filename)
loader.load()
```

Now downgrade `unstructured` to version `0.4.10`. The following should
work:

```python
from langchain.document_loaders import UnstructuredPowerPointLoader

filename = "../unstructured/example-docs/fake-power-point.pptx"
loader = UnstructuredPowerPointLoader(filename)
loader.load()
```

and the following should give you a `ValueError` and invite you to
upgrade `unstructured`.


```python
from langchain.document_loaders import UnstructuredPowerPointLoader

filename = "../unstructured/example-docs/fake-power-point.ppt"
loader = UnstructuredPowerPointLoader(filename)
loader.load()
```
2023-02-17 13:03:25 -08:00
Matt Robinson
b956070f08
docs: add an unstructured section to the ecosystem page (#1125)
### Summary

Adds an Unstructured section to the ecosystem page.
2023-02-17 13:02:23 -08:00
Hasegawa Yuya
383c67c1b2
Fix Issue #1100 (#1101)
https://github.com/hwchase17/langchain/issues/1100
When faiss data and doc.index are created in past versions, error occurs
that say there was no attribute. So I put hasattr in the check as a
simple solution.

However, increasing the number of such checks is not good for
conservatism, so I think there is a better solution.


Also, the code for the batch process was left out, so I put it back in.
2023-02-17 00:53:16 -08:00
Harrison Chase
3f50feb280
fix telegram imports (#1110) 2023-02-17 00:53:01 -08:00
trigaten
6fafcd0a70
Strange behavior with LLM import requirements (#1104)
This import works fine:
```python
from langchain import Anthropic
```
This import does not:
```python
from langchain import AI21
```

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'AI21' from 'langchain' (/opt/anaconda3/envs/fed_nlp/lib/python3.9/site-packages/langchain/__init__.py)
```

I think there is a slight documentation inconsistency here:
https://langchain.readthedocs.io/en/latest/reference/modules/llms.html

This PR starts to solve that. Should all the import examples be
`from langchain.llms import X` instead of `from langchain import X`?
2023-02-16 23:13:34 -08:00
Kacper Łukawski
ab1a3cccac
Hotfix: Qdrant content retrieval (revert: #1088) (#1093)
The #1088 introduced a bug in Qdrant integration. That PR reverts those
changes and provides class attributes to ensure consistent payload keys.
In addition to that, an exception will be thrown if any of texts is None
(that could have been an issue reported in #1087)
2023-02-16 12:46:06 -08:00
Harrison Chase
6322b6f657
bump version 0.0.88 (#1090) 2023-02-16 07:32:32 -08:00
Francisco Ingham
3462130e2d
Modify number of types of chains (#1089)
Changed number of types of chains to make it consistent with the rest of
the docs
2023-02-16 07:06:30 -08:00
Rishabh Raizada
5d11e5da40
Update qdrant.py (#1088)
Fixes #1087
2023-02-16 07:06:02 -08:00
Harrison Chase
7745505482
chat qa with sources (#1084) 2023-02-16 00:29:47 -08:00
Harrison Chase
badeeb37b0
fix stuff count (#1083) 2023-02-15 23:57:13 -08:00
Harrison Chase
971458c5de
docs for batch size (#1082) 2023-02-15 23:53:56 -08:00
Harrison Chase
5e10e19bfe
Harrison/align table (#1081)
Co-authored-by: Francisco Ingham <fpingham@gmail.com>
2023-02-15 23:53:37 -08:00
Harrison Chase
c60954d0f8
Harrison/telegram loader (#1080)
Co-authored-by: Maxime Vidal <max.vidal@hotmail.fr>
2023-02-15 23:24:32 -08:00
Dennis Antela Martinez
a1c296bc3c
docs: increase width (#1049)
This addresses #948.

I set the documentation max width to 2560px, but can be adjusted - see
screenshot below.

<img width="1741" alt="Screenshot 2023-02-14 at 13 05 57"
src="https://user-images.githubusercontent.com/23406704/218749076-ea51e90a-a220-4558-b4fe-5a95b39ebf15.png">
2023-02-15 23:07:01 -08:00
Harrison Chase
c96ac3e591
Harrison/semantic subset (#1079)
Co-authored-by: Chen Wu (吴尘) <henrychenwu@cmu.edu>
2023-02-15 23:06:48 -08:00
Harrison Chase
19c2797bed
add anthropic example (#1041)
Co-authored-by: Ivan Vendrov <ivendrov@gmail.com>
Co-authored-by: Sasmitha Manathunga <70096033+mmz-001@users.noreply.github.com>
2023-02-15 23:04:28 -08:00
3ecdea8be4
SearxNG meta search api helper (#854)
This is a work in progress PR to track my progres.

## TODO:

- [x]  Get results using the specifed searx host
- [x]  Prioritize returning an  `answer`  or results otherwise
    - [ ] expose the field `infobox` when available
    - [ ] expose `score` of result to help agent's decision
- [ ] expose the `suggestions` field to agents so they could try new
queries if no results are found with the orignial query ?

- [ ] Dynamic tool description for agents ?
- Searx offers many engines and a search syntax that agents can take
advantage of. It would be nice to generate a dynamic Tool description so
that it can be used many times as a tool but for different purposes.

- [x]  Limit number of results
- [ ]   Implement paging
- [x]  Miror the usage of the Google Search tool
- [x] easy selection of search engines
- [x]  Documentation
    - [ ] update HowTo guide notebook on Search Tools
- [ ] Handle async 
- [ ]  Tests

###  Add examples / documentation on possible uses with
 - [ ]  getting factual answers with `!wiki` option and `infoboxes`
 - [ ]  getting `suggestions`
 - [ ]  getting `corrections`

---------

Co-authored-by: blob42 <spike@w530>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-02-15 23:03:57 -08:00
Hasegawa Yuya
e08961ab25
Fixed openai embeddings to be safe by batching them based on token size calculation. (#991)
I modified the logic of the batch calculation for embedding according to
this cookbook

https://github.com/openai/openai-cookbook/blob/main/examples/Embedding_long_inputs.ipynb
2023-02-15 23:02:32 -08:00
seanaedmiston
f0a258555b
Support similarity search by vector (in FAISS) (#961)
Alternate implementation to PR #960 Again - only FAISS is implemented.
If accepted can add this to other vectorstores or leave as
NotImplemented? Suggestions welcome...
2023-02-15 22:50:00 -08:00
Jonathan Pedoeem
05ad399abe
Update PromptLayerOpenAI LLM to include support for ASYNC API (#1066)
This PR updates `PromptLayerOpenAI` to now support requests using the
[Async
API](https://langchain.readthedocs.io/en/latest/modules/llms/async_llm.html)
It also updates the documentation on Async API to let users know that
PromptLayerOpenAI also supports this.

`PromptLayerOpenAI` now redefines `_agenerate` a similar was to how it
redefines `_generate`
2023-02-15 22:48:09 -08:00
Harrison Chase
98186ef180
Harrison/evernote nb (#1078)
Co-authored-by: Akshay <64036106+akshayvkt@users.noreply.github.com>
2023-02-15 22:47:30 -08:00
rogerserper
e46cd3b7db
Google Search API integration with serper.dev (wrapper, tests, docs, … (#909)
Adds Google Search integration with [Serper](https://serper.dev) a
low-cost alternative to SerpAPI (10x cheaper + generous free tier).
Includes documentation, tests and examples. Hopefully I am not missing
anything.

Developers can sign up for a free account at
[serper.dev](https://serper.dev) and obtain an api key.

## Usage

```python
from langchain.utilities import GoogleSerperAPIWrapper
from langchain.llms.openai import OpenAI
from langchain.agents import initialize_agent, Tool

import os
os.environ["SERPER_API_KEY"] = ""
os.environ['OPENAI_API_KEY'] = ""

llm = OpenAI(temperature=0)
search = GoogleSerperAPIWrapper()
tools = [
    Tool(
        name="Intermediate Answer",
        func=search.run
    )
]

self_ask_with_search = initialize_agent(tools, llm, agent="self-ask-with-search", verbose=True)
self_ask_with_search.run("What is the hometown of the reigning men's U.S. Open champion?")
```

### Output
```
Entering new AgentExecutor chain...
 Yes.
Follow up: Who is the reigning men's U.S. Open champion?
Intermediate answer: Current champions Carlos Alcaraz, 2022 men's singles champion.
Follow up: Where is Carlos Alcaraz from?
Intermediate answer: El Palmar, Spain
So the final answer is: El Palmar, Spain

> Finished chain.

'El Palmar, Spain'
```
2023-02-15 22:47:17 -08:00
Harrison Chase
52753066ef
Harrison/handle stop tokens ai21 (#1077)
Co-authored-by: Andrew Huang <jhuang16888@gmail.com>
2023-02-15 22:44:55 -08:00
Akshay
d8ed286200
Update and rename everynote.py to evernote.py (#1060)
Updating this base file as well as the .ipynb file of the example on the
website:

https://github.com/hwchase17/langchain/compare/master...akshayvkt:langchain:patch-1

https://langchain.readthedocs.io/en/latest/modules/document_loaders/examples/everynote.html
2023-02-15 22:41:42 -08:00
Jeff Huber
34cba2da32
Fix typo in integration with Chroma (#1070)
We introduced a breaking change but missed this call. This PR fixes
`langchain` to work with upstream `chroma`.
2023-02-15 22:37:58 -08:00
Jonathan Pedoeem
05df480376
Update PromptLayerOpenAI LLM usage instructions in documentation (#1053)
This PR updates the usage instructions for PromptLayerOpenAI in
Langchain's documentation. The updated instructions provide more detail
and conform better to the style of other LLM integration documentation
pages.

No code changes were made in this PR, only improvements to the
documentation. This update will make it easier for users to understand
how to use `PromptLayerOpenAI`
2023-02-15 22:37:48 -08:00
Matt Robinson
3ea1e5af1e
feat: added element metadata to unstructured loader (#1068)
### Summary

Adds tracked metadata from `unstructured` elements to the document
metadata when `UnstructuredFileLoader` is used in `"elements"` mode.
Tracked metadata is available in `unstructured>=0.4.9`, but the code is
written for backward compatibility with older `unstructured` versions.

### Testing

Before running, make sure to upgrade to `unstructured==0.4.9`. In the
code snippet below, you should see `page_number`, `filename`, and
`category` in the metadata for each document. `doc[0]` should have
`page_number: 1` and `doc[-1]` should have `page_number: 2`. The example
document is `layout-parser-paper-fast.pdf` from the [`unstructured`
sample
docs](https://github.com/Unstructured-IO/unstructured/tree/main/example-docs).

```python
from langchain.document_loaders import UnstructuredFileLoader
loader = UnstructuredFileLoader(file_path=f"layout-parser-paper-fast.pdf", mode="elements")
docs = loader.load()
```
2023-02-15 22:36:18 -08:00
Harrison Chase
bac676c8e7
bump version (#1057) 2023-02-15 07:09:10 -08:00
Ankush Gola
d8ac274fc2
add to async chain notebook (#1056) 2023-02-14 18:20:38 -08:00
Ankush Gola
caa8e4742e
Enable streaming for OpenAI LLM (#986)
* Support a callback `on_llm_new_token` that users can implement when
`OpenAI.streaming` is set to `True`
2023-02-14 15:06:14 -08:00
Harrison Chase
f05f025e41
bump version to 0086 (#1050) 2023-02-14 07:14:40 -08:00
Sasmitha Manathunga
c67c5383fd
docs: fix typo in notebook (#1046) 2023-02-14 07:06:08 -08:00
Harrison Chase
88bebb4caa
Harrison/llm integrations (#1039)
Co-authored-by: jped <jonathanped@gmail.com>
Co-authored-by: Justin Torre <justintorre75@gmail.com>
Co-authored-by: Ivan Vendrov <ivan@anthropic.com>
2023-02-13 22:06:25 -08:00
Harrison Chase
ec727bf166
Align table info (#999) (#1034)
Currently the chain is getting the column names and types on the one
side and the example rows on the other. It is easier for the llm to read
the table information if the column name and examples are shown together
so that it can easily understand to which columns do the examples refer
to. For an instantiation of this, please refer to the changes in the
`sqlite.ipynb` notebook.

Also changed `eval` for `ast.literal_eval` when interpreting the results
from the sample row query since it is a better practice.

---------

Co-authored-by: Francisco Ingham <>

---------

Co-authored-by: Francisco Ingham <fpingham@gmail.com>
2023-02-13 21:48:41 -08:00
Harrison Chase
8c45f06d58
Harrison/standarize prompt loading (#1036)
Co-authored-by: Ibis Prevedello <ibiscp@gmail.com>
2023-02-13 21:48:09 -08:00
Enrico Shippole
f30dcc6359
Add GooseAI, CerebriumAI, Petals, ForefrontAI (#981)
Add GooseAI, CerebriumAI, Petals, ForefrontAI
2023-02-13 21:20:19 -08:00
Anton Troynikov
d43d430d86
Chroma persistence (#1028)
This PR adds persistence to the Chroma vector store.

Users can supply a `persist_directory` with any of the `Chroma` creation
methods. If supplied, the store will be automatically persisted at that
directory.

If a user creates a new `Chroma` instance with the same persistence
directory, it will get loaded up automatically. If they use `from_texts`
or `from_documents` in this way, the documents will be loaded into the
existing store.

There is the chance of some funky behavior if the user passes a
different embedding function from the one used to create the collection
- we will make this easier in future updates. For now, we log a warning.
2023-02-13 21:09:06 -08:00