### Summary
Adds a document loader for MS Word Documents. Works with both `.docx`
and `.doc` files as longer as the user has installed
`unstructured>=0.4.11`.
### Testing
The follow workflow test the loader for both `.doc` and `.docx` files
using example docs from the `unstructured` repo.
#### `.docx`
```python
from langchain.document_loaders import UnstructuredWordDocumentLoader
filename = "../unstructured/example-docs/fake.docx"
loader = UnstructuredWordDocumentLoader(filename)
loader.load()
```
#### `.doc`
```python
from langchain.document_loaders import UnstructuredWordDocumentLoader
filename = "../unstructured/example-docs/fake.doc"
loader = UnstructuredWordDocumentLoader(filename)
loader.load()
```
`NotebookLoader.load()` loads the `.ipynb` notebook file into a
`Document` object.
**Parameters**:
* `include_outputs` (bool): whether to include cell outputs in the
resulting document (default is False).
* `max_output_length` (int): the maximum number of characters to include
from each cell output (default is 10).
* `remove_newline` (bool): whether to remove newline characters from the
cell sources and outputs (default is False).
* `traceback` (bool): whether to include full traceback (default is
False).
Link for easier navigation (it's not immediately clear where to find
more info on SimpleSequentialChain (3 clicks away)
---------
Co-authored-by: Larry Fisherman <l4rryfisherman@protonmail.com>
Added a GitBook document loader. It lets you both, (1) fetch text from
any single GitBook page, or (2) fetch all relative paths and return
their respective content in Documents.
I've modified the `scrape` method in the `WebBaseLoader` to accept
custom web paths if given, but happy to remove it and move that logic
into the `GitbookLoader` itself.
### Description
This PR adds a wrapper which adds support for the OpenSearch vector
database. Using opensearch-py client we are ingesting the embeddings of
given text into opensearch cluster using Bulk API. We can perform the
`similarity_search` on the index using the 3 popular searching methods
of OpenSearch k-NN plugin:
- `Approximate k-NN Search` use approximate nearest neighbor (ANN)
algorithms from the [nmslib](https://github.com/nmslib/nmslib),
[faiss](https://github.com/facebookresearch/faiss), and
[Lucene](https://lucene.apache.org/) libraries to power k-NN search.
- `Script Scoring` extends OpenSearch’s script scoring functionality to
execute a brute force, exact k-NN search.
- `Painless Scripting` adds the distance functions as painless
extensions that can be used in more complex combinations. Also, supports
brute force, exact k-NN search like Script Scoring.
### Issues Resolved
https://github.com/hwchase17/langchain/issues/1054
---------
Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
Follow-up of @hinthornw's PR:
- Migrate the Tool abstraction to a separate file (`BaseTool`).
- `Tool` implementation of `BaseTool` takes in function and coroutine to
more easily maintain backwards compatibility
- Add a Toolkit abstraction that can own the generation of tools around
a shared concept or state
---------
Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Francisco Ingham <fpingham@gmail.com>
Co-authored-by: Dhruv Anand <105786647+dhruv-anand-aintech@users.noreply.github.com>
Co-authored-by: cragwolfe <cragcw@gmail.com>
Co-authored-by: Anton Troynikov <atroyn@users.noreply.github.com>
Co-authored-by: Oliver Klingefjord <oliver@klingefjord.com>
Co-authored-by: William Fu-Hinthorn <whinthorn@Williams-MBP-3.attlocal.net>
Co-authored-by: Bruno Bornsztein <bruno.bornsztein@gmail.com>
This is a work in progress PR to track my progres.
## TODO:
- [x] Get results using the specifed searx host
- [x] Prioritize returning an `answer` or results otherwise
- [ ] expose the field `infobox` when available
- [ ] expose `score` of result to help agent's decision
- [ ] expose the `suggestions` field to agents so they could try new
queries if no results are found with the orignial query ?
- [ ] Dynamic tool description for agents ?
- Searx offers many engines and a search syntax that agents can take
advantage of. It would be nice to generate a dynamic Tool description so
that it can be used many times as a tool but for different purposes.
- [x] Limit number of results
- [ ] Implement paging
- [x] Miror the usage of the Google Search tool
- [x] easy selection of search engines
- [x] Documentation
- [ ] update HowTo guide notebook on Search Tools
- [ ] Handle async
- [ ] Tests
### Add examples / documentation on possible uses with
- [ ] getting factual answers with `!wiki` option and `infoboxes`
- [ ] getting `suggestions`
- [ ] getting `corrections`
---------
Co-authored-by: blob42 <spike@w530>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Alternate implementation to PR #960 Again - only FAISS is implemented.
If accepted can add this to other vectorstores or leave as
NotImplemented? Suggestions welcome...
This PR updates `PromptLayerOpenAI` to now support requests using the
[Async
API](https://langchain.readthedocs.io/en/latest/modules/llms/async_llm.html)
It also updates the documentation on Async API to let users know that
PromptLayerOpenAI also supports this.
`PromptLayerOpenAI` now redefines `_agenerate` a similar was to how it
redefines `_generate`
Adds Google Search integration with [Serper](https://serper.dev) a
low-cost alternative to SerpAPI (10x cheaper + generous free tier).
Includes documentation, tests and examples. Hopefully I am not missing
anything.
Developers can sign up for a free account at
[serper.dev](https://serper.dev) and obtain an api key.
## Usage
```python
from langchain.utilities import GoogleSerperAPIWrapper
from langchain.llms.openai import OpenAI
from langchain.agents import initialize_agent, Tool
import os
os.environ["SERPER_API_KEY"] = ""
os.environ['OPENAI_API_KEY'] = ""
llm = OpenAI(temperature=0)
search = GoogleSerperAPIWrapper()
tools = [
Tool(
name="Intermediate Answer",
func=search.run
)
]
self_ask_with_search = initialize_agent(tools, llm, agent="self-ask-with-search", verbose=True)
self_ask_with_search.run("What is the hometown of the reigning men's U.S. Open champion?")
```
### Output
```
Entering new AgentExecutor chain...
Yes.
Follow up: Who is the reigning men's U.S. Open champion?
Intermediate answer: Current champions Carlos Alcaraz, 2022 men's singles champion.
Follow up: Where is Carlos Alcaraz from?
Intermediate answer: El Palmar, Spain
So the final answer is: El Palmar, Spain
> Finished chain.
'El Palmar, Spain'
```
This PR updates the usage instructions for PromptLayerOpenAI in
Langchain's documentation. The updated instructions provide more detail
and conform better to the style of other LLM integration documentation
pages.
No code changes were made in this PR, only improvements to the
documentation. This update will make it easier for users to understand
how to use `PromptLayerOpenAI`