iFixit is a wikipedia-like site that has a huge amount of open content
on how to fix things, questions/answers for common troubleshooting and
"things" related content that is more technical in nature. All content
is licensed under CC-BY-SA-NC 3.0
Adding docs from iFixit as context for user questions like "I dropped my
phone in water, what do I do?" or "My macbook pro is making a whining
noise, what's wrong with it?" can yield significantly better responses
than context free response from LLMs.
### Summary
Adds a document loader for image files such as `.jpg` and `.png` files.
### Testing
Run the following using the example document from the [`unstructured`
repo](https://github.com/Unstructured-IO/unstructured/tree/main/example-docs).
```python
from langchain.document_loaders.image import UnstructuredImageLoader
loader = UnstructuredImageLoader("layout-parser-paper-fast.jpg")
loader.load()
```
- run and exec_run need a separate default command. Run usually executes
a script while exec_run simulates an interactive session. The image
templates and run funcs have been upgraded to handle both
types of commands.
- test: make docker tests run when docker is installed and docker lib
avaialble.
- test that runsc runtime is used by default when gVisor is installed.
(manually removing gVisor skips the test)
- quickly run or exec_run commands with sane defaults
- wip image templates with parameters for common docker images
- shell escaping logic
- capture stdout+stderr for exec commands
- added minimal testing
nitpicking but just thought i'd add this typo which I found when going
through the How-to 😄 (unless it was intentional) also, it's amazing that
you added ReAct to LangChain!
Checking if weaviate similarity_search kwargs contains "certainty" and
use it accordingly. The minimal level of certainty must be a float, and
it is computed by normalized distance.
While using a `SQLiteCache`, if there are duplicate `(prompt, llm, idx)`
tuples passed to
[`update_cache()`](c5dd491a21/langchain/llms/base.py (L39)),
then an `IntegrityError` is thrown. This can happen when there are
duplicated prompts within the same batch.
This PR changes the SQLAlchemy `session.add()` to a `session.merge()` in
`cache.py`, [following the solution from this SO
thread](https://stackoverflow.com/questions/10322514/dealing-with-duplicate-primary-keys-on-insert-in-sqlalchemy-declarative-style).
I believe this fixes#983, but not entirely sure since that also
involves async
Here's a minimal example of the error:
```python
from pathlib import Path
import langchain
from langchain.cache import SQLiteCache
llm = langchain.OpenAI(model_name="text-ada-001", openai_api_key=Path("/.openai_api_key").read_text().strip())
langchain.llm_cache = SQLiteCache("test_cache.db")
llm.generate(['a'] * 5)
```
```
> IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: full_llm_cache.prompt, full_llm_cache.llm, full_llm_cache.idx
[SQL: INSERT INTO full_llm_cache (prompt, llm, idx, response) VALUES (?, ?, ?, ?)]
[parameters: ('a', "[('_type', 'openai'), ('best_of', 1), ('frequency_penalty', 0), ('logit_bias', {}), ('max_tokens', 256), ('model_name', 'text-ada-001'), ('n', 1), ('presence_penalty', 0), ('request_timeout', None), ('stop', None), ('temperature', 0.7), ('top_p', 1)]", 0, '\n\nA is for air.\n\nA is for atmosphere.')]
(Background on this error at: https://sqlalche.me/e/14/gkpj)
```
After the change, we now have the following
```python
class Output:
def __init__(self, text):
self.text = text
# make dummy data
cache = SQLiteCache("test_cache_2.db")
cache.update(prompt="prompt_0", llm_string="llm_0", return_val=[Output("text_0")])
cache.engine.execute("SELECT * FROM full_llm_cache").fetchall()
# output
> [('prompt_0', 'llm_0', 0, 'text_0')]
```
```python
# update data, before change this would have thrown an `IntegrityError`
cache.update(prompt="prompt_0", llm_string="llm_0", return_val=[Output("text_0_new")])
cache.engine.execute("SELECT * FROM full_llm_cache").fetchall()
# output
> [('prompt_0', 'llm_0', 0, 'text_0_new')]
```
Thanks for all your hard work!
I noticed a small typo in the bash util doc so here's a quick update.
Additionally, my formatter caught some spacing in the `.md` as well.
Happy to revert that if it's an issue.
The main change is just
```
- A common use case this is for letting it interact with your local file system.
+ A common use case for this is letting the LLM interact with your local file system.
```
## Testing
`make docs_build` succeeds locally and the changes show as expected ✌️
<img width="704" alt="image"
src="https://user-images.githubusercontent.com/17773666/221376160-e99e59a6-b318-49d1-a1d7-89f5c17cdab4.png">
I've added a simple
[CoNLL-U](https://universaldependencies.org/format.html) document
loader. CoNLL-U is a common format for NLP tasks and is used, for
example, in the Universal Dependencies treebank corpora. The loader
reads a single file in standard CoNLL-U format and returns a document.
### Summary
Adds a document loader for MS Word Documents. Works with both `.docx`
and `.doc` files as longer as the user has installed
`unstructured>=0.4.11`.
### Testing
The follow workflow test the loader for both `.doc` and `.docx` files
using example docs from the `unstructured` repo.
#### `.docx`
```python
from langchain.document_loaders import UnstructuredWordDocumentLoader
filename = "../unstructured/example-docs/fake.docx"
loader = UnstructuredWordDocumentLoader(filename)
loader.load()
```
#### `.doc`
```python
from langchain.document_loaders import UnstructuredWordDocumentLoader
filename = "../unstructured/example-docs/fake.doc"
loader = UnstructuredWordDocumentLoader(filename)
loader.load()
```
`NotebookLoader.load()` loads the `.ipynb` notebook file into a
`Document` object.
**Parameters**:
* `include_outputs` (bool): whether to include cell outputs in the
resulting document (default is False).
* `max_output_length` (int): the maximum number of characters to include
from each cell output (default is 10).
* `remove_newline` (bool): whether to remove newline characters from the
cell sources and outputs (default is False).
* `traceback` (bool): whether to include full traceback (default is
False).
### Summary
Updates the docs to remove the `nltk` download steps from
`unstructured`. As of `unstructured` `0.4.14`, this is handled
automatically in the relevant modules within `unstructured`.