Commit Graph

461 Commits (5ba1c7b6018837a2cbf61ae8e7f45c8f1e3e8f72)

Author SHA1 Message Date
Harrison Chase 5ba1c7b601 ruff ruff (#1203) 1 year ago
Harrison Chase 429af93cab fix imports (#1288) 1 year ago
Matt Robinson 3830d900bf feat: document loader for MS Word documents (#1282)
### Summary

Adds a document loader for MS Word Documents. Works with both `.docx`
and `.doc` files as longer as the user has installed
`unstructured>=0.4.11`.

### Testing

The follow workflow test the loader for both `.doc` and `.docx` files
using example docs from the `unstructured` repo.

#### `.docx`

```python
from langchain.document_loaders import UnstructuredWordDocumentLoader

filename = "../unstructured/example-docs/fake.docx"
loader = UnstructuredWordDocumentLoader(filename)
loader.load()
```

#### `.doc`

```python
from langchain.document_loaders import UnstructuredWordDocumentLoader

filename = "../unstructured/example-docs/fake.doc"
loader = UnstructuredWordDocumentLoader(filename)
loader.load()
```
1 year ago
Harrison Chase cc10f1fe9c cleanup (#1274) 1 year ago
Harrison Chase cf8cb58b3a Harrison/cohere params (#1278)
Co-authored-by: Stefano Faraggi <40745694+stepp1@users.noreply.github.com>
1 year ago
Harrison Chase 42ddf44f86 Harrison/logprobs (#1279)
Co-authored-by: Prateek Shah <97124740+prateekspanning@users.noreply.github.com>
1 year ago
Harrison Chase b592b4d3f6 Harrison/fb loader (#1277)
Co-authored-by: Vairo Di Pasquale <vairo.dp@gmail.com>
1 year ago
Harrison Chase 91584382e3 Harrison/errors (#1276)
Co-authored-by: Kevin Huo <5000881+kwhuo68@users.noreply.github.com>
1 year ago
Klein Tahiraj 24f02aa39a adding .ipynb loader and documentation Fixes #1248 (#1252)
`NotebookLoader.load()` loads the `.ipynb` notebook file into a
`Document` object.

**Parameters**:

* `include_outputs` (bool): whether to include cell outputs in the
resulting document (default is False).
* `max_output_length` (int): the maximum number of characters to include
from each cell output (default is 10).
* `remove_newline` (bool): whether to remove newline characters from the
cell sources and outputs (default is False).
* `traceback` (bool): whether to include full traceback (default is
False).
1 year ago
Harrison Chase 6cd37e308f Harrison/source docs (#1275)
Co-authored-by: Tushar Dhadiwal <tushardhadiwal@users.noreply.github.com>
1 year ago
Enrico Shippole 4dbea7e9e4 Add Writer, Banana, Modal, StochasticAI (#1270)
Add LLM wrappers and examples for Banana, Writer, Modal, Stochastic AI

Added rigid json format for Banana and Modal
1 year ago
blob42 04bdd22b9f searx: add `query_suffix` parameter (#1259)
- allows to build tools and dynamically inject extra searxh suffix in
  the query. example:
  `search.run("python library", query_suffix="site:github.com")`
 resulting query: `python library site:github.com`

Co-authored-by: blob42 <spike@w530>
1 year ago
Harrison Chase df37dd1be4 fix bug with length function (#1257) 1 year ago
Iskren Ivov Chernev 2ab4def6d3 Add DeepInfra LLM support (#1232)
DeepInfra is an Inference-as-a-Service provider. Add a simple wrapper
using HTTPS requests.
1 year ago
Satoru Sakamoto b248037053 fix to specific language transcript (#1231)
Currently youtube loader only seems to support English audio. 
Changed to load videos in the specified language.
1 year ago
Harrison Chase fcfb409dd3 add ifttt tool (#1244) 1 year ago
Jon Luo aec2bb84a8 Don't instruct LLM to use the LIMIT clause, which is incompatible with SQL Server (#1242)
The current prompt specifically instructs the LLM to use the `LIMIT`
clause. This will cause issues with MS SQL Server, which uses `SELECT
TOP` instead of `LIMIT`. The generated SQL will use `LIMIT`; the
instruction to "always limit... using the LIMIT clause" seems to
override the "create a syntactically correct mssql query to run"
portion. Reported here:
https://github.com/hwchase17/langchain/issues/1103#issuecomment-1441144224

I don't have access to a SQL Server instance to test, but removing that
part of the prompt in OpenAI Playground results in the correct `SELECT
TOP` syntax, whereas keeping it in results in the `LIMIT` clause, even
when instructing it to generate syntactically correct mssql. It's also
still correctly using `LIMIT` in my MariaDB database. I think in this
case we can assume that the model will select the appropriate method
based on the dialect specified.

In general, it would be nice to be able to test a suite of SQL dialects
for things like dialect-specific syntax and other issues we've run into
in the past, but I'm not quite sure how to best approach that yet.
1 year ago
Dennis Antela Martinez 985f36eb3f add aleph alpha llm (#1207)
Integrate Aleph Alpha's client into Langchain to provide access to the
luminous models - more info on latest benchmarks here:
https://www.aleph-alpha.com/luminous-performance-benchmarks
1 year ago
Klein Tahiraj ebb9e4087c Fixing typo in loading.py (#1235)
Just fixing a typo I found in loading.py
1 year ago
Jon Luo 8cad8c34cb fix sqlite internal tables breaking table_info (#1224)
With the current method used to get the SQL table info, sqlite internal
schema tables are being included and are not being handled correctly by
sqlalchemy because the columns have no types. This is easy to see with
the Chinook database:
```python
db = SQLDatabase.from_uri("sqlite:///Chinook.db")
print(db.table_info)
```
```python
...
sqlalchemy.exc.CompileError: (in table 'sqlite_sequence', column 'name'): Can't generate DDL for NullType(); did you forget to specify a type on this Column?
```

SQLAlchemy 2.0 [ignores these by
default](63d90b0f44/lib/sqlalchemy/dialects/sqlite/base.py (L856-L880)):

63d90b0f44/lib/sqlalchemy/dialects/sqlite/base.py (L2096-L2123)
1 year ago
djacobs7 043ce02906 Fix typo in constitutional_ai base.py (#1216)
Found a typo in the documentation code for the constitutional_ai module
1 year ago
blob42 ffeb00c82b searx: remove duplicate param (#1219)
Co-authored-by: blob42 <spike@w530>
1 year ago
Harrison Chase f3c92172f2 Harrison/unstructured io (#1200) 1 year ago
Harrison Chase df84f69b6c Harrison/updating docs (#1196) 1 year ago
Harrison Chase 5f3550437f rfc: callback changes (#1165)
conceptually, no reason a tool should know what an "agent action" is

unless any objections, can change in all callback handlers
1 year ago
Harrison Chase 0008c431fc catch networkx error (#1201) 1 year ago
Harrison Chase 529df2a2dd move serpapi wrapper (#1199)
Co-authored-by: Tim Asp <707699+timothyasp@users.noreply.github.com>
1 year ago
Konstantin Hebenstreit bfe6038c3b HuggingFaceEndpoint: Correct Example for ImportError (#1176)
When I try to import the Class HuggingFaceEndpoint I get an Import
Error: cannot import name 'HuggingFaceEndpoint' from 'langchain'.
(langchain version 0.0.88)
These two imports work fine: from langchain import HuggingFacePipeline
and from langchain import HuggingFaceHub.

So I corrected the import statement in the example. There is probably a
better solution to this, but this fixes the Error for me.
1 year ago
Harrison Chase 8cc3fd424f Harrison/add documents (#1197)
Co-authored-by: OmriNach <32659330+OmriNach@users.noreply.github.com>
1 year ago
Francisco Ingham f5c83eaef4 added ability to override default verbose and memory when load chain … (#1153)
It is useful to be able to specify `verbose` or `memory` while still
keeping the chain's overall structure.

---------

Co-authored-by: Francisco Ingham <>
1 year ago
Anton Troynikov 28ffe63136 Default Chroma collection name (#1198)
For persistence, it's convenient to have a default collection name which
gets used everywhere.
1 year ago
Dennis Antela Martinez 1053c94f17 add gitbook document loader (#1180)
Added a GitBook document loader. It lets you both, (1) fetch text from
any single GitBook page, or (2) fetch all relative paths and return
their respective content in Documents.

I've modified the `scrape` method in the `WebBaseLoader` to accept
custom web paths if given, but happy to remove it and move that logic
into the `GitbookLoader` itself.
1 year ago
William FH fb3c992749 Add a StdIn "Interaction" Tool (#1193)
Lets a chain prompt the user for more input as a part of its execution.
1 year ago
Naveen Tatikonda 8e2152c1d6 Add Support for OpenSearch Vector database (#1191)
### Description
This PR adds a wrapper which adds support for the OpenSearch vector
database. Using opensearch-py client we are ingesting the embeddings of
given text into opensearch cluster using Bulk API. We can perform the
`similarity_search` on the index using the 3 popular searching methods
of OpenSearch k-NN plugin:

- `Approximate k-NN Search` use approximate nearest neighbor (ANN)
algorithms from the [nmslib](https://github.com/nmslib/nmslib),
[faiss](https://github.com/facebookresearch/faiss), and
[Lucene](https://lucene.apache.org/) libraries to power k-NN search.
- `Script Scoring` extends OpenSearch’s script scoring functionality to
execute a brute force, exact k-NN search.
- `Painless Scripting` adds the distance functions as painless
extensions that can be used in more complex combinations. Also, supports
brute force, exact k-NN search like Script Scoring.

### Issues Resolved 
https://github.com/hwchase17/langchain/issues/1054

---------

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
1 year ago
Andrew White f0f9f276bd Allow k to be higher than doc size in max_marginal_relevance_search (#1187)
Fixes issue #1186. For some reason, #1117 didn't seem to fix it.
1 year ago
Zach Schillaci 06d1af114a Refactor some loops into list comprehensions (#1185) 1 year ago
Harrison Chase c42f19d681 clean up loaders (#1178) 1 year ago
blob42 9962bda70b
searx_search: docs updates (#1175)
- fix notebook formatting, remove empty cells and add scrolling for long
text

---------

Co-authored-by: blob42 <spike@w530>
1 year ago
Harrison Chase 28781a6213
Harrison/markdown splitter (#1169)
Co-authored-by: Michael Chen <flamingdescent@gmail.com>
Co-authored-by: Michael Chen <michaelchen@stripe.com>
1 year ago
Harrison Chase 37dd34bea5
fix path (#1168) 1 year ago
Ji ed37fbaeff
for ChatVectorDBChain, add top_k_docs_for_context to allow control how many chunks of context will be retrieved (#1155)
given that we allow user define chunk size, think it would be useful for
user to define how many chunks of context will be retrieved.
1 year ago
Harrison Chase 955c89fccb
pass in prompts to vectordbqa (#1158) 1 year ago
Harrison Chase 65cc81c479
directory loader improvements (#1162) 1 year ago
Harrison Chase 9d6d8f85da
Harrison/self hosted runhouse (#1154)
Co-authored-by: Donny Greenberg <dongreenberg2@gmail.com>
Co-authored-by: John Dagdelen <jdagdelen@users.noreply.github.com>
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>
Co-authored-by: Andrew White <white.d.andrew@gmail.com>
Co-authored-by: Peng Qu <82029664+pengqu123@users.noreply.github.com>
Co-authored-by: Matt Robinson <mthw.wm.robinson@gmail.com>
Co-authored-by: jeff <tangj1122@gmail.com>
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MacBook-Pro.local>
Co-authored-by: zanderchase <zander@unfold.ag>
Co-authored-by: Charles Frye <cfrye59@gmail.com>
Co-authored-by: zanderchase <zanderchase@gmail.com>
Co-authored-by: Shahriar Tajbakhsh <sh.tajbakhsh@gmail.com>
Co-authored-by: Stefan Keselj <skeselj@princeton.edu>
Co-authored-by: Francisco Ingham <fpingham@gmail.com>
Co-authored-by: Dhruv Anand <105786647+dhruv-anand-aintech@users.noreply.github.com>
Co-authored-by: cragwolfe <cragcw@gmail.com>
Co-authored-by: Anton Troynikov <atroyn@users.noreply.github.com>
Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
Co-authored-by: Oliver Klingefjord <oliver@klingefjord.com>
Co-authored-by: blob42 <contact@blob42.xyz>
Co-authored-by: blob42 <spike@w530>
Co-authored-by: Enrico Shippole <henryshippole@gmail.com>
Co-authored-by: Ibis Prevedello <ibiscp@gmail.com>
Co-authored-by: jped <jonathanped@gmail.com>
Co-authored-by: Justin Torre <justintorre75@gmail.com>
Co-authored-by: Ivan Vendrov <ivan@anthropic.com>
Co-authored-by: Sasmitha Manathunga <70096033+mmz-001@users.noreply.github.com>
Co-authored-by: Ankush Gola <9536492+agola11@users.noreply.github.com>
Co-authored-by: Matt Robinson <mrobinson@unstructuredai.io>
Co-authored-by: Jeff Huber <jeffchuber@gmail.com>
Co-authored-by: Akshay <64036106+akshayvkt@users.noreply.github.com>
Co-authored-by: Andrew Huang <jhuang16888@gmail.com>
Co-authored-by: rogerserper <124558887+rogerserper@users.noreply.github.com>
Co-authored-by: seanaedmiston <seane999@gmail.com>
Co-authored-by: Hasegawa Yuya <52068175+Hase-U@users.noreply.github.com>
Co-authored-by: Ivan Vendrov <ivendrov@gmail.com>
Co-authored-by: Chen Wu (吴尘) <henrychenwu@cmu.edu>
Co-authored-by: Dennis Antela Martinez <dennis.antela@gmail.com>
Co-authored-by: Maxime Vidal <max.vidal@hotmail.fr>
Co-authored-by: Rishabh Raizada <110235735+rishabh-ti@users.noreply.github.com>
1 year ago
CG80499 af8f5c1a49
Added constitutional chain. (#1147)
- Added self-critique constitutional chain based on this
[paper](https://www.anthropic.com/constitutional.pdf).
1 year ago
Harrison Chase a83ba44efa
Harrison/ver0089 (#1144) 1 year ago
Ankush Gola 7b5e160d28
Make Tools own model, add ToolKit Concept (#1095)
Follow-up of @hinthornw's PR:

- Migrate the Tool abstraction to a separate file (`BaseTool`).
- `Tool` implementation of `BaseTool` takes in function and coroutine to
more easily maintain backwards compatibility
- Add a Toolkit abstraction that can own the generation of tools around
a shared concept or state

---------

Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Francisco Ingham <fpingham@gmail.com>
Co-authored-by: Dhruv Anand <105786647+dhruv-anand-aintech@users.noreply.github.com>
Co-authored-by: cragwolfe <cragcw@gmail.com>
Co-authored-by: Anton Troynikov <atroyn@users.noreply.github.com>
Co-authored-by: Oliver Klingefjord <oliver@klingefjord.com>
Co-authored-by: William Fu-Hinthorn <whinthorn@Williams-MBP-3.attlocal.net>
Co-authored-by: Bruno Bornsztein <bruno.bornsztein@gmail.com>
1 year ago
Harrison Chase 45b5640fe5
fix sql (#1141) 1 year ago
kekayan 9111f4ca8a
fix chatvectordbchain to use pinecone namespace (#1139)
In the similarity search, the pinecone namespace is not used, which
makes the bot return _I don't know_ where the embeddings are stored in
the pinecone namespace. Now we can query by passing the namespace
optionally.
```result = qa({"question": query, "chat_history": chat_history, "namespace":"01gshyhjcfgkq1q5wxjtm17gjh"})```
1 year ago
Harrison Chase fb3c73d194
add srt loader (#1140) 1 year ago