Commit Graph

950 Commits

Author SHA1 Message Date
Harrison Chase
880a6a3db5
Harrison/redis id key (#2057)
Co-authored-by: Fabrizio Ruocco <ruoccofabrizio@gmail.com>
2023-03-27 15:03:51 -07:00
cragwolfe
71e8eaff2b
UnstructuredURLLoader: allow url failures, keep processing (#1954)
By default, UnstructuredURLLoader now continues processing remaining
`urls` if encountering an error for a particular url.

If failure of the entire loader is desired as was previously the case,
use `continue_on_failure=False`.

E.g., this fails splendidly, courtesy of the 2nd url:

```
from langchain.document_loaders import UnstructuredURLLoader
urls = [
    "https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-8-2023",
    "https://doesnotexistithinkprobablynotverynotlikely.io",
    "https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-9-2023",
]
loader = UnstructuredURLLoader(urls=urls, continue_on_failure=False)
data = loader.load()
```

Issue: https://github.com/hwchase17/langchain/issues/1939
2023-03-27 14:34:14 -07:00
Daniel Chalef
6598beacdb
PydanticOutputParser unit test (#2047)
Unit test for PydanticOutputParser

---------

Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
2023-03-27 14:32:56 -07:00
William FH
e4f15e4eac
Add support for YAML Spec Plugins (#2054)
It's common to use `yaml` for an OpenAPI spec used in the GPT plugins. 

For example: https://www.joinmilo.com/openapi.yaml or
https://api.slack.com/specs/openapi/ai-plugin.yaml (from [Wong2's
ChatGPT Plugins List](https://github.com/wong2/chatgpt-plugins))
2023-03-27 14:27:48 -07:00
weiyang
e50c1ea7fb
Fix the parameter error of 'Qdrant.maximal_marginal_relevance' (#1921)
Hi, first and foremost, I would like to express my gratitude for your
outstanding work; it's truly remarkable!


https://github.com/hwchase17/langchain/blob/master/langchain/vectorstores/qdrant.py#L134
It appears that there might be a minor issue with the `limit` parameter
being passed incorrectly in the `Qdrant.maximal_marginal_relevance`
function. This seems to be a typographical error.

Signed-off-by: weiyang <weiyang.ones@gmail.com>
2023-03-27 08:29:07 -07:00
goka
62e08f80de
feat #1915 support for google custom search site restricted api (#1920)
#1915 

https://developers.google.com/custom-search/v1/site_restricted_api

It is possible to search unrestricted to specific sites.
2023-03-27 08:28:55 -07:00
david qiu
c50fafb35d
fix Poetry 1.4.0+ installation (#1935)
Temporary fix for #1801 until upstream issues with `pydata-sphinx-theme`
wheel are resolved.
2023-03-27 08:27:54 -07:00
Jason Holtkamp
3d3e523520
Update getting_started with better example (#1910)
I noticed that the "getting started" guide section on agents included an
example test where the agent was getting the question wrong 😅

I guess Olivia Wilde's dating life is too tough to keep track of for
this simple agent example. Let's change it to something a little easier,
so users who are running their agent for the first time are less likely
to be confused by a result that doesn't match that which is on the docs.
2023-03-27 08:19:13 -07:00
Eduard van Valkenburg
c1a9d83b34
Added Azure Blob Storage File and Container Loader (#1890)
Added support for document loaders for Azure Blob Storage using a
connection string. Fixes #1805

---------

Co-authored-by: Mick Vleeshouwer <mick@imick.nl>
2023-03-27 08:17:14 -07:00
Harrison Chase
42d725223e
Harrison/num token calculation (#2041)
Co-authored-by: Aratako <127325395+Aratako@users.noreply.github.com>
2023-03-27 08:16:32 -07:00
Harrison Chase
0bbcc7815b
Harrison/open search kwargs (#2040)
Signed-off-by: Marcel Coetzee <marcelcoetzee@tutanota.com>
Co-authored-by: Marcel <34739235+Pipboyguy@users.noreply.github.com>
2023-03-27 07:56:09 -07:00
Harrison Chase
b26fa1935d
fix headers (#2039) 2023-03-27 07:55:57 -07:00
Harrison Chase
bc2ed93b77
fix doc tags (#2019) 2023-03-26 21:43:51 -07:00
Ankush Gola
c71f2a7b26
small nit on index page (#2018) 2023-03-27 00:15:24 -04:00
Harrison Chase
51681f653f
fix docs (#2017) 2023-03-26 20:50:36 -07:00
Harrison Chase
705431aecc
big docs refactor (#1978)
Co-authored-by: Ankush Gola <ankush.gola@gmail.com>
2023-03-26 19:49:46 -07:00
Harrison Chase
b83e826510
plugin tool (#1974) 2023-03-24 12:30:08 -07:00
Mario Kostelac
e7d6de6b1c
(ChatOpenAI) Add model_name to LLMResult.llm_output (#1960)
This makes sure OpenAI and ChatOpenAI have the same llm_output, and
allow tracking usage per model. Same work for OpenAI was done in
https://github.com/hwchase17/langchain/pull/1713.
2023-03-24 08:51:16 -07:00
Harrison Chase
6e0d3880df
bump version to 122 (#1970) 2023-03-24 08:24:44 -07:00
Harrison Chase
6ec5780547
add docs for openai retriever ingest (#1969) 2023-03-24 08:24:33 -07:00
Harrison Chase
47d37db2d2
WIP: Harrison/base retriever (#1765) 2023-03-24 07:46:49 -07:00
Enwei Jiao
4f364db9a9
Add milvus for ecosystem (#1951) 2023-03-23 22:01:28 -07:00
Tim Asp
030ce9f506
fix import error of bs4 (#1952)
Ran into a broken build if bs4 wasn't installed in the project.

Minor tweak to follow the other doc loaders optional package-loading
conventions.

Also updated html docs to include reference to this new html loader.

side note: Should there be 2 different html-to-text document loaders?
This new one only handles local files, while the existing unstructured
html loader handles HTML from local and remote. So it seems like the
improvement was adding the title to the metadata, which is useful but
could also be added to `html.py`
2023-03-23 21:56:13 -07:00
Harrison Chase
8990122d5d
retrievers interface (#1948) 2023-03-23 19:00:38 -07:00
Harrison Chase
52d6bf04d0
tracing improvements to docs (#1947) 2023-03-23 19:00:18 -07:00
Harrison Chase
910da8518f
hotfix (#1928) 2023-03-23 07:11:15 -07:00
Naoki Ainoya
2f27ef92fe
Fix typo in VectorStoreIndexWrapper method (#1922)
Fixed a typo in the argument of the query method within the
VectorStoreIndexWrapper class. Specifically, the argument `retriver` has
been changed to `retriever`. With this correction, the correct argument
name is used, and potential bugs are avoided.
2023-03-23 07:08:04 -07:00
Harrison Chase
75149d6d38
bump version 120 (#1918) 2023-03-22 23:21:56 -07:00
Harrison Chase
fab7994b74
Harrison/retrieval code (#1916) 2023-03-22 23:15:04 -07:00
Harrison Chase
eb80d6e0e4
Harrison/from methods (#1912)
Co-authored-by: shibuiwilliam <shibuiyusuke@gmail.com>
2023-03-22 21:10:09 -07:00
Harrison Chase
b5667bed9e
human input default (#1911) 2023-03-22 20:30:45 -07:00
Eric Zhu
b3be83c750
Add human as a tool (#1879)
Human can help AI.  #1871
2023-03-22 20:14:52 -07:00
Harrison Chase
50626a10ee
Hx23840 feat/add redisearch vectorstore (#1909)
Co-authored-by: Peter <peter.shi@alephf.com>
Co-authored-by: Peter Shi <42536066+hx23840@users.noreply.github.com>
2023-03-22 19:57:56 -07:00
Harrison Chase
6e1b5b8f7e
Harrison/figma doc loader (#1908)
Co-authored-by: Ismail Pelaseyed <homanp@gmail.com>
2023-03-22 19:57:46 -07:00
Harrison Chase
eec9b1b306
Harrison/opensearch vectorstore (#1907)
Co-authored-by: Mehmet Öner Yalçın <oneryalcin@gmail.com>
2023-03-22 19:57:38 -07:00
Xin Qiu
ea142f6a32
feat: add drop index in redis and fix prefix generate logic (#1857)
# Description

Add `drop_index` for redis

RediSearch: [RediSearch quick
start](https://redis.io/docs/stack/search/quick_start/)

# How to use

```
from langchain.vectorstores.redis import Redis

Redis.drop_index(index_name="doc",delete_documents=False)
```
2023-03-22 19:44:42 -07:00
Eli
12f868b292
Propagate "filter" arg in Chroma similarity_search (#1869)
Technically a duplicate fix to #1619 but with unit tests and a small
documentation update
- Propagate `filter` arg in Chroma `similarity_search` to delegated call
to `similarity_search_with_score`
- Add `filter` arg to `similarity_search_by_vector`
- Clarify doc strings on FakeEmbeddings
2023-03-22 19:40:10 -07:00
Memento Mori
31f9ecfc19
Fix tiktoken version (#1882)
Fix https://github.com/hwchase17/langchain/issues/1881
This issue occurs when using `'gpt-3.5-turbo'` with
`VectorDBQAWithSourcesChain`
2023-03-22 19:39:57 -07:00
Eric Zhu
273e9bf296
Simplify AzureChatOpenAI implementation. (#1902)
Change AzureChatOpenAI class implementation as Azure just added support
for chat completion API. See:
https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/chatgpt?pivots=programming-language-chat-completions.
This should make the code much simpler.
2023-03-22 19:36:51 -07:00
Maurício Maia
f155d9d3ec
Add metadata filter to PGVector search (#1872)
Add ability to filter pgvector documents by metadata.
2023-03-22 15:21:40 -07:00
Klein Tahiraj
d3d4503ce2
Remove redundant .docx loader (closes #1716) + update how_to_guides.rst (#1891)
In https://github.com/hwchase17/langchain/issues/1716 , it was
identified that there were two .py files performing similar tasks. As a
resolution, one of the files has been removed, as its purpose had
already been fulfilled by the other file. Additionally, the init has
been updated accordingly.

Furthermore, the how_to_guides.rst file has been updated to include
links to documentation that was previously missing. This was deemed
necessary as the existing list on
https://langchain.readthedocs.io/en/latest/modules/document_loaders/how_to_guides.html
was incomplete, causing confusion for users who rely on the full list of
documentation on the left sidebar of the website.
2023-03-22 15:19:42 -07:00
Harrison Chase
1f93c5cf69
extraction docs (#1898) 2023-03-22 15:00:44 -07:00
Sean Zheng
15b5a08f4b
Update how_to_guides.rst (#1893)
Adding OpenSearch examples
2023-03-22 14:30:43 -07:00
Kushal Chordiya
ff4a25b841
Fix minor bug in opensearch vector store add_texts function (#1878)
In the langchain.vectorstores.opensearch_vector_search.py, in the
add_texts function, around line 247, we have the following code

```python
embeddings = [
     self.embedding_function.embed_documents(list(text))[0] for text in texts
]
```

the goal of the `list(text)` part I believe is to pass a list to the
embed_documents list instead of a a str. However, `list(text)` is a
subtle bug

`list(text)` would convert the string text into an array, where each
element of the array is a character of the string

<img width="937" alt="Screenshot 2023-03-22 at 1 27 18 PM"
src="https://user-images.githubusercontent.com/88190553/226836470-384665a1-2f13-46bc-acfc-9a37417cd918.png">

The correct way should be to change the code to 

```python
embeddings = [
      self.embedding_function.embed_documents([text])[0] for text in texts
]
```
Which wraps the string inside a list.
2023-03-22 11:27:32 -07:00
Maurício Maia
2212520a6c
Add PGVector collection metadata (#1887)
The `CollectionStore` for `PGVector` has a `cmetadata` field but it's
never used. This PR add the ability to save metadata information to the
collection.
2023-03-22 11:27:07 -07:00
Harrison Chase
d08f940336
principles list (#1888) 2023-03-22 10:48:38 -07:00
Harrison Chase
2280a2cb2f
bump version to 119 (#1886) 2023-03-22 08:36:09 -07:00
Harrison Chase
ce5d97bcb3
Harrison/guarded output parser (#1804)
Co-authored-by: jerwelborn <jeremy.welborn@gmail.com>
2023-03-21 22:07:23 -07:00
DeadBranch
8fa1764c60
docs: update gpt index references to LlamaIndex (#1856)
The GPT Index project is transitioning to the new project name,
LlamaIndex.

I've updated a few files referencing the old project name and repository
URL to the current ones.

From the [LlamaIndex repo](https://github.com/jerryjliu/llama_index):
> NOTE: We are rebranding GPT Index as LlamaIndex! We will carry out
this transition gradually.
>
> 2/25/2023: By default, our docs/notebooks/instructions now reference
"LlamaIndex" instead of "GPT Index".
>
> 2/19/2023: By default, our docs/notebooks/instructions now use the
llama-index package. However the gpt-index package still exists as a
duplicate!
>
> 2/16/2023: We have a duplicate llama-index pip package. Simply replace
all imports of gpt_index with llama_index if you choose to pip install
llama-index.

I'm not associated with LlamaIndex in any way. I just noticed the
discrepancy when studying the lanchain documentation.
2023-03-21 22:01:05 -07:00
Harrison Chase
f299bd1416
clean up sagemaker nb (#1875) 2023-03-21 22:00:08 -07:00