Commit Graph

680 Commits (5ba1c7b6018837a2cbf61ae8e7f45c8f1e3e8f72)
 

Author SHA1 Message Date
Harrison Chase 5ba1c7b601 ruff ruff (#1203) 1 year ago
Harrison Chase 429af93cab fix imports (#1288) 1 year ago
Harrison Chase b11a399f74 bump version to 0094 (#1280) 1 year ago
Matt Robinson 3830d900bf feat: document loader for MS Word documents (#1282)
### Summary

Adds a document loader for MS Word documents. Works with both `.docx`
and `.doc` files as long as the user has installed
`unstructured>=0.4.11`.

### Testing

The following workflow tests the loader for both `.doc` and `.docx` files
using example docs from the `unstructured` repo.

#### `.docx`

```python
from langchain.document_loaders import UnstructuredWordDocumentLoader

filename = "../unstructured/example-docs/fake.docx"
loader = UnstructuredWordDocumentLoader(filename)
loader.load()
```

#### `.doc`

```python
from langchain.document_loaders import UnstructuredWordDocumentLoader

filename = "../unstructured/example-docs/fake.doc"
loader = UnstructuredWordDocumentLoader(filename)
loader.load()
```
1 year ago
Harrison Chase cc10f1fe9c cleanup (#1274) 1 year ago
Harrison Chase cf8cb58b3a Harrison/cohere params (#1278)
Co-authored-by: Stefano Faraggi <40745694+stepp1@users.noreply.github.com>
1 year ago
Harrison Chase 42ddf44f86 Harrison/logprobs (#1279)
Co-authored-by: Prateek Shah <97124740+prateekspanning@users.noreply.github.com>
1 year ago
Harrison Chase b592b4d3f6 Harrison/fb loader (#1277)
Co-authored-by: Vairo Di Pasquale <vairo.dp@gmail.com>
1 year ago
Harrison Chase 91584382e3 Harrison/errors (#1276)
Co-authored-by: Kevin Huo <5000881+kwhuo68@users.noreply.github.com>
1 year ago
Klein Tahiraj 24f02aa39a adding .ipynb loader and documentation Fixes #1248 (#1252)
`NotebookLoader.load()` loads the `.ipynb` notebook file into a
`Document` object.

**Parameters**:

* `include_outputs` (bool): whether to include cell outputs in the
resulting document (default is False).
* `max_output_length` (int): the maximum number of characters to include
from each cell output (default is 10).
* `remove_newline` (bool): whether to remove newline characters from the
cell sources and outputs (default is False).
* `traceback` (bool): whether to include full traceback (default is
False).
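
As a rough illustration of what these parameters control, the sketch below flattens notebook JSON into one string using only the standard library. It is not the `NotebookLoader` implementation: `notebook_to_text` and the sample notebook are hypothetical, and the `traceback` option is left out.

```python
import json


def notebook_to_text(raw_json, include_outputs=False, max_output_length=10,
                     remove_newline=False):
    """Flatten the cells of an .ipynb JSON string into a single text blob."""
    notebook = json.loads(raw_json)
    parts = []
    for cell in notebook.get("cells", []):
        # In the notebook format, "source" is a list of lines or one string.
        source = cell.get("source", "")
        text = "".join(source) if isinstance(source, list) else source
        if include_outputs:
            for output in cell.get("outputs", []):
                # Stream outputs keep their text under the "text" key.
                out = output.get("text", "")
                out = "".join(out) if isinstance(out, list) else out
                # Keep at most max_output_length characters per output.
                text += "\n" + out[:max_output_length]
        if remove_newline:
            text = text.replace("\n", " ")
        parts.append(text)
    return "\n\n".join(parts)


# Hypothetical one-cell notebook used only for this example.
sample = json.dumps({
    "cells": [
        {"cell_type": "code",
         "source": ["print('hello world, hello again')"],
         "outputs": [{"output_type": "stream",
                      "text": ["hello world, hello again\n"]}]},
    ]
})

print(notebook_to_text(sample))
print(notebook_to_text(sample, include_outputs=True))
```

With the default `max_output_length` of 10, only the first ten characters of each output survive, which is why the second call ends in a truncated line.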
1 year ago
Harrison Chase 6cd37e308f Harrison/source docs (#1275)
Co-authored-by: Tushar Dhadiwal <tushardhadiwal@users.noreply.github.com>
1 year ago
Enrico Shippole 4dbea7e9e4 Add Writer, Banana, Modal, StochasticAI (#1270)
Add LLM wrappers and examples for Banana, Writer, Modal, Stochastic AI

Added rigid json format for Banana and Modal
1 year ago
blob42 04bdd22b9f searx: add `query_suffix` parameter (#1259)
- allows building tools that dynamically inject an extra searx suffix
  into the query. Example:
  `search.run("python library", query_suffix="site:github.com")`
  Resulting query: `python library site:github.com`
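
The suffix handling can be pictured with a minimal sketch. `build_query` is a hypothetical helper written for this example, not the wrapper's actual code; it only shows the query and suffix being joined with a space.

```python
def build_query(query, query_suffix=""):
    """Append an optional suffix to the query, skipping empty pieces."""
    parts = [query.strip(), query_suffix.strip()]
    return " ".join(part for part in parts if part)


print(build_query("python library", query_suffix="site:github.com"))
print(build_query("python library"))
```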

Co-authored-by: blob42 <spike@w530>
1 year ago
Harrison Chase df37dd1be4 fix bug with length function (#1257) 1 year ago
Matt Robinson a1f6421655 docs: remove nltk download steps (#1253)
### Summary

Updates the docs to remove the `nltk` download steps from
`unstructured`. As of `unstructured` `0.4.14`, this is handled
automatically in the relevant modules within `unstructured`.
1 year ago
Justin Torre a43cb5371f added caching and properties docs (#1255) 1 year ago
Harrison Chase 38f015bca8 bump version to 0093 (#1251) 1 year ago
Iskren Ivov Chernev 2ab4def6d3 Add DeepInfra LLM support (#1232)
DeepInfra is an Inference-as-a-Service provider. Add a simple wrapper
using HTTPS requests.
1 year ago
Dmitri Melikyan 23f792e760 docs: add Graphsignal ecosystem page (#1228)
Adds a Graphsignal ecosystem page
1 year ago
Satoru Sakamoto b248037053 fix to specific language transcript (#1231)
Currently the YouTube loader only seems to support English audio.
Changed it to load videos in the specified language.
1 year ago
Harrison Chase fcfb409dd3 add ifttt tool (#1244) 1 year ago
Jon Luo aec2bb84a8 Don't instruct LLM to use the LIMIT clause, which is incompatible with SQL Server (#1242)
The current prompt specifically instructs the LLM to use the `LIMIT`
clause. This will cause issues with MS SQL Server, which uses `SELECT
TOP` instead of `LIMIT`. The generated SQL will use `LIMIT`; the
instruction to "always limit... using the LIMIT clause" seems to
override the "create a syntactically correct mssql query to run"
portion. Reported here:
https://github.com/hwchase17/langchain/issues/1103#issuecomment-1441144224

I don't have access to a SQL Server instance to test, but removing that
part of the prompt in OpenAI Playground results in the correct `SELECT
TOP` syntax, whereas keeping it in results in the `LIMIT` clause, even
when instructing it to generate syntactically correct mssql. It's also
still correctly using `LIMIT` in my MariaDB database. I think in this
case we can assume that the model will select the appropriate method
based on the dialect specified.
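
To make the syntax difference concrete, here is a small sketch of dialect-aware row limiting. `cap_rows` is a hypothetical helper, not part of the prompt or the chain; it assumes only two cases (SQL Server versus `LIMIT`-style dialects such as MariaDB) and a trusted table name.

```python
def cap_rows(table, n, dialect):
    """Return a SELECT statement capped at n rows for the given dialect."""
    n = int(n)  # guard against non-numeric limits
    if dialect == "mssql":
        # SQL Server has no LIMIT clause; it uses SELECT TOP instead.
        return f"SELECT TOP {n} * FROM {table}"
    # MySQL/MariaDB, SQLite, and PostgreSQL accept a trailing LIMIT clause.
    return f"SELECT * FROM {table} LIMIT {n}"


for dialect in ("mssql", "mysql"):
    print(dialect, "->", cap_rows("Track", 5, dialect))
```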

In general, it would be nice to be able to test a suite of SQL dialects
for things like dialect-specific syntax and other issues we've run into
in the past, but I'm not quite sure how to best approach that yet.
1 year ago
Harrison Chase f070d29934 Update key_concepts.md (#1209) (#1237)
Link for easier navigation (it's not immediately clear where to find
more info on SimpleSequentialChain; it is 3 clicks away)

---------

Co-authored-by: Larry Fisherman <l4rryfisherman@protonmail.com>
1 year ago
Dennis Antela Martinez 985f36eb3f add aleph alpha llm (#1207)
Integrate Aleph Alpha's client into Langchain to provide access to the
luminous models - more info on latest benchmarks here:
https://www.aleph-alpha.com/luminous-performance-benchmarks
1 year ago
Klein Tahiraj ebb9e4087c Fixing typo in loading.py (#1235)
Just fixing a typo I found in loading.py
1 year ago
Ikko Eltociear Ashimine fcf43461a7 Update petals.md (#1225)
Huggingface -> Hugging Face
1 year ago
Jon Luo 8cad8c34cb fix sqlite internal tables breaking table_info (#1224)
With the current method used to get the SQL table info, sqlite internal
schema tables are being included and are not being handled correctly by
sqlalchemy because the columns have no types. This is easy to see with
the Chinook database:
```python
db = SQLDatabase.from_uri("sqlite:///Chinook.db")
print(db.table_info)
```
```text
...
sqlalchemy.exc.CompileError: (in table 'sqlite_sequence', column 'name'): Can't generate DDL for NullType(); did you forget to specify a type on this Column?
```

SQLAlchemy 2.0 [ignores these by
default](63d90b0f44/lib/sqlalchemy/dialects/sqlite/base.py (L856-L880)):

63d90b0f44/lib/sqlalchemy/dialects/sqlite/base.py (L2096-L2123)
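
The presence of the internal table, and one way to skip it, can be reproduced with the standard-library `sqlite3` module alone. This is a sketch of the general idea (filtering out names that start with `sqlite_`), not the change made in this commit; `user_tables` and the `album` table are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# AUTOINCREMENT makes SQLite create the internal sqlite_sequence table.
conn.execute(
    "CREATE TABLE album (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT)"
)


def all_tables(connection):
    """List every table recorded in the schema, internal ones included."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def user_tables(connection):
    """List tables, skipping SQLite's internal sqlite_* tables."""
    return [n for n in all_tables(connection) if not n.startswith("sqlite_")]


print(all_tables(conn))
print(user_tables(conn))
```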
1 year ago
djacobs7 043ce02906 Fix typo in constitutional_ai base.py (#1216)
Found a typo in the documentation code for the constitutional_ai module
1 year ago
Sason eaea19f959 Correct typo in "Question Answering" How-To Guide (#1221) 1 year ago
blob42 ffeb00c82b searx: remove duplicate param (#1219)
Co-authored-by: blob42 <spike@w530>
1 year ago
Harrison Chase f59e542742 bump version 0092 (#1204) 1 year ago
Matt Robinson 52e59e9fde docs: add quotes to `unstructured[local-inference]` install instructions (#1208)
### Summary

Corrects the install instruction for local inference to `pip install
"unstructured[local-inference]"`
1 year ago
Harrison Chase e1d3551f92 add docs for chroma persistence (#1202) 1 year ago
Harrison Chase f3c92172f2 Harrison/unstructured io (#1200) 1 year ago
Harrison Chase df84f69b6c Harrison/updating docs (#1196) 1 year ago
Harrison Chase 5f3550437f rfc: callback changes (#1165)
Conceptually, there is no reason a tool should know what an "agent action" is.

Unless there are objections, this can be changed in all callback handlers.
1 year ago
Harrison Chase 0008c431fc catch networkx error (#1201) 1 year ago
Harrison Chase 529df2a2dd move serpapi wrapper (#1199)
Co-authored-by: Tim Asp <707699+timothyasp@users.noreply.github.com>
1 year ago
Konstantin Hebenstreit bfe6038c3b HuggingFaceEndpoint: Correct Example for ImportError (#1176)
When I try to import the class `HuggingFaceEndpoint`, I get an
ImportError: cannot import name 'HuggingFaceEndpoint' from 'langchain'
(langchain version 0.0.88).
These two imports work fine: `from langchain import HuggingFacePipeline`
and `from langchain import HuggingFaceHub`.

So I corrected the import statement in the example. There is probably a
better solution to this, but this fixes the error for me.
1 year ago
Harrison Chase 8cc3fd424f Harrison/add documents (#1197)
Co-authored-by: OmriNach <32659330+OmriNach@users.noreply.github.com>
1 year ago
Francisco Ingham f5c83eaef4 added ability to override default verbose and memory when load chain … (#1153)
It is useful to be able to specify `verbose` or `memory` while still
keeping the chain's overall structure.
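
The idea of overriding a few settings on top of a saved configuration can be sketched as follows. `apply_overrides` and the config dictionary are hypothetical and stand in for whatever the loading code does internally; only the two keys named above are accepted.

```python
def apply_overrides(saved_config, **overrides):
    """Return a copy of saved_config with `verbose` and/or `memory` replaced."""
    allowed = {"verbose", "memory"}
    unknown = set(overrides) - allowed
    if unknown:
        raise ValueError(f"Unsupported overrides: {sorted(unknown)}")
    config = dict(saved_config)  # leave the saved config untouched
    config.update(overrides)
    return config


saved = {"_type": "llm_chain", "verbose": False, "memory": None}
print(apply_overrides(saved, verbose=True))
```

Copying the dictionary before updating keeps the saved structure intact, so the same config can be loaded again with different overrides.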

---------

Co-authored-by: Francisco Ingham <>
1 year ago
Anton Troynikov 28ffe63136 Default Chroma collection name (#1198)
For persistence, it's convenient to have a default collection name which
gets used everywhere.
1 year ago
Dennis Antela Martinez 1053c94f17 add gitbook document loader (#1180)
Added a GitBook document loader. It lets you either (1) fetch text from
any single GitBook page or (2) fetch all relative paths and return
their respective content as Documents.
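
Turning relative paths into fetchable page URLs can be done with the standard library. The sketch below is illustrative only: `resolve_paths`, the base URL, and the paths are made up for the example and do not come from the loader.

```python
from urllib.parse import urljoin


def resolve_paths(base_url, relative_paths):
    """Resolve each relative path against the site's base URL."""
    return [urljoin(base_url, path) for path in relative_paths]


# Hypothetical site and paths.
print(resolve_paths("https://docs.example.com/", ["/intro", "/guide/setup"]))
```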

I've modified the `scrape` method in the `WebBaseLoader` to accept
custom web paths if given, but happy to remove it and move that logic
into the `GitbookLoader` itself.
1 year ago
William FH fb3c992749 Add a StdIn "Interaction" Tool (#1193)
Lets a chain prompt the user for more input as a part of its execution.
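
A minimal sketch of the idea, assuming the tool boils down to showing a prompt and reading a line of input. `ask_user` is a hypothetical function, and the `reader` parameter exists only so the example can run without a terminal.

```python
def ask_user(prompt, reader=input):
    """Show a prompt and return the reply with surrounding whitespace removed."""
    return reader(prompt + "\n").strip()


# Simulate a user typing a reply instead of blocking on real stdin.
reply = ask_user("Which city do you mean?", reader=lambda _: " Springfield, IL \n")
print(reply)
```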
1 year ago
Naveen Tatikonda 8e2152c1d6 Add Support for OpenSearch Vector database (#1191)
### Description
This PR adds a wrapper that supports the OpenSearch vector
database. Using the opensearch-py client, we ingest the embeddings of the
given text into an OpenSearch cluster via the Bulk API. We can then perform
`similarity_search` on the index using the three popular search methods
of the OpenSearch k-NN plugin:

- `Approximate k-NN Search` uses approximate nearest neighbor (ANN)
algorithms from the [nmslib](https://github.com/nmslib/nmslib),
[faiss](https://github.com/facebookresearch/faiss), and
[Lucene](https://lucene.apache.org/) libraries to power k-NN search.
- `Script Scoring` extends OpenSearch’s script scoring functionality to
execute a brute force, exact k-NN search.
- `Painless Scripting` adds the distance functions as painless
extensions that can be used in more complex combinations. It also supports
brute force, exact k-NN search, like Script Scoring.
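
For readers unfamiliar with the term, "brute force, exact k-NN" means scoring every stored vector against the query and keeping the best `k`. The plain-Python sketch below shows that idea with cosine similarity; it does not use OpenSearch or its API, and `exact_knn` and the sample vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def exact_knn(query, vectors, k):
    """Score every vector against the query and return the top-k indices."""
    scored = [(cosine_similarity(query, vec), idx)
              for idx, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(exact_knn([1.0, 0.0], vectors, k=2))  # prints [0, 2]
```

Approximate methods trade some of this exactness for speed by avoiding the full scan.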

### Issues Resolved 
https://github.com/hwchase17/langchain/issues/1054

---------

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
1 year ago
Andrew White f0f9f276bd Allow k to be higher than doc size in max_marginal_relevance_search (#1187)
Fixes issue #1186. For some reason, #1117 didn't seem to fix it.
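
The behaviour described in the title can be illustrated with a small, self-contained maximal marginal relevance sketch in which `k` is clamped to the number of documents. This is not the code from the PR; `mmr`, `lambda_mult`, and the sample embeddings are assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query, embeddings, k=4, lambda_mult=0.5):
    """Pick up to k indices, balancing relevance to the query against redundancy."""
    k = min(k, len(embeddings))  # never ask for more items than exist
    selected = []
    candidates = list(range(len(embeddings)))
    while len(selected) < k:
        best_idx, best_score = None, -math.inf
        for idx in candidates:
            relevance = cosine(query, embeddings[idx])
            # Similarity to the closest already-selected item (0 if none yet).
            redundancy = max(
                (cosine(embeddings[idx], embeddings[s]) for s in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = idx, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
picked = mmr([1.0, 0.0], embeddings, k=10)
print(len(picked))  # prints 3, even though k=10 was requested
```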
1 year ago
Zach Schillaci 06d1af114a Refactor some loops into list comprehensions (#1185) 1 year ago
Harrison Chase d9a2a51f13 Harrison/text splitter docs (#1188) 1 year ago
Harrison Chase 9fb6bcd672 clean up text splitting docs (#1184) 1 year ago
Harrison Chase 910340cdea bump version to 0091 (#1181) 1 year ago