mirror of
https://github.com/nomic-ai/gpt4all
synced 2024-11-04 12:00:10 +00:00
Improved localdocs documentation (#762)
* Improved localdocs documentation * Improved localdocs documentation * Improved localdocs documentation * Improved localdocs documentation
parent 02290fd881
commit 6ed9c1a8d8
@@ -9,30 +9,53 @@ It is optimized to run 7-13B parameter LLMs on the CPU's of any computer running
|
|||||||
GPT4All Chat Plugins allow you to expand the capabilities of Local LLMs. All plugins are compatible with the
|
GPT4All Chat Plugins allow you to expand the capabilities of Local LLMs. All plugins are compatible with the
|
||||||
chat client's server mode.
|
chat client's server mode.
|
||||||
|
|
||||||
### LocalDocs Plugin (Chat With Your Data)
|
### LocalDocs Plugin (Chat With Your Data, PrivateGPT)
|
||||||
LocalDocs is a GPT4All plugin that allows you to chat with your local files and data.
|
LocalDocs is a GPT4All plugin that allows you to chat with your local files and data.
|
||||||
It allows you to utilize powerful local LLMs to chat with private data without any data leaving your computer or server.
|
It allows you to utilize powerful local LLMs to chat with private data without any data leaving your computer or server.
|
||||||
When using LocalDocs, your LLM will cite the sources that most likely contributed to a given output. Note that even an LLM equipped with LocalDocs can hallucinate.
|
When using LocalDocs, your LLM will cite the sources that most likely contributed to a given output. Note that even an LLM equipped with LocalDocs can hallucinate.
|
||||||
|
|
||||||
#### Enabling LocalDocs
|
#### Enabling LocalDocs
|
||||||
1. Install the latest version of GPT4All Chat from https://gpt4all.io.
|
1. Install the latest version of GPT4All Chat from the [GPT4All website](https://gpt4all.io).
|
||||||
2. Go to `Settings > the LocalDocs tab`.
|
2. Go to `Settings > LocalDocs tab`.
|
||||||
3. Configure a collection (folder) on your computer that contains the files your LLM should have access to. You can alter the contents of the folder/directory at any time. As you
|
3. Configure a collection (folder) on your computer that contains the files your LLM should have access to. You can alter the contents of the folder/directory at any time. As you
|
||||||
add more files to your collection, your LLM will dynamically be able to access them.
|
add more files to your collection, your LLM will dynamically be able to access them.
|
||||||
4. Spin up a chat session with any LLM (including external ones like ChatGPT, but be warned: your data will then leave your machine!)
|
4. Spin up a chat session with any LLM (including external ones like ChatGPT, but be warned: your data will then leave your machine!)
|
||||||
5. At the top right, click the database icon and select which collection you want your LLM to know about.
|
5. At the top right, click the database icon and select which collection you want your LLM to know about during your chat session.
|
||||||
6. Start chatting!
|
|
||||||
|
|
||||||
### How it works
|
|
||||||
|
|
||||||
|
#### How LocalDocs Works
|
||||||
LocalDocs works by maintaining an index of all data in the directory your collection is linked to. This index
|
LocalDocs works by maintaining an index of all data in the directory your collection is linked to. This index
|
||||||
consists of small chunks of each document that the LLM can receive as additional input when you ask it a question.
|
consists of small chunks of each document that the LLM can receive as additional input when you ask it a question.
|
||||||
This helps it respond to your queries with knowledge about the contents of your data.
|
The general technique this plugin uses is called [Retrieval Augmented Generation](https://arxiv.org/abs/2005.11401).
|
||||||
The number of chunks and the size of each chunk can be configured in the LocalDocs plugin settings tab.
|
|
||||||
For indexing speed purposes, LocalDocs uses pre-deep-learning n-gram and tfidf based retrieval when deciding
|
|
||||||
what documents your LLM should have as context in response to a question. You'll find it's of comparable quality
|
|
||||||
with embedding based retrieval approaches but orders of magnitude faster to ingest data. Don't worry, embedding based semantic
|
|
||||||
search for retrieval is on the roadmap for those with more powerful computers - pick up the feature on GitHub!
|
|
||||||
|
|
||||||
|
These document chunks help your LLM respond to queries with knowledge about the contents of your data.
|
||||||
|
The number of chunks and the size of each chunk can be configured in the LocalDocs plugin settings tab.
|
||||||
|
To keep indexing fast, LocalDocs uses pre-deep-learning n-gram and TF-IDF based retrieval when deciding
|
||||||
|
which document chunks your LLM should use as context. You'll find it's of comparable quality
|
||||||
|
to embedding-based retrieval approaches but orders of magnitude faster at ingesting data.
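
The retrieval step described above can be sketched roughly as follows. This is a minimal illustration of n-gram TF-IDF ranking over document chunks, not the plugin's actual implementation: the function names (`chunk_text`, `ngrams`, `top_chunks`), the word-based chunking, and the smoothed IDF formula are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40):
    """Split text into chunks of at most chunk_size words (hypothetical chunking rule)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def ngrams(text, n_max=2):
    """Return lowercased word n-grams for n = 1..n_max."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    grams = []
    for n in range(1, n_max + 1):
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


def top_chunks(query, chunks, k=3):
    """Rank chunks by TF-IDF weight of the n-grams they share with the query."""
    counts = [Counter(ngrams(c)) for c in chunks]
    # Document frequency: in how many chunks does each n-gram occur?
    doc_freq = Counter(g for c in counts for g in c)
    n = len(chunks)
    query_grams = set(ngrams(query))
    scored = []
    for chunk, tf in zip(chunks, counts):
        total = sum(tf.values()) or 1
        # Assumed smoothed IDF; the plugin's real weighting is not documented here.
        score = sum(
            (tf[g] / total) * (math.log((1 + n) / (1 + doc_freq[g])) + 1)
            for g in query_grams
            if g in tf
        )
        scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Only chunks that share at least one n-gram with the query are returned.
    return [chunk for score, chunk in scored[:k] if score > 0]
```

In this sketch, `k` and `chunk_size` play the role of the "number of chunks" and "size of each chunk" settings: the chunks returned by `top_chunks` would be prepended to the prompt as additional context.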
|
||||||
|
|
||||||
|
LocalDocs supports the following file types:
|
||||||
|
```json
["txt", "doc", "docx", "pdf", "rtf", "odt", "html", "htm",
"xls", "xlsx", "csv", "ods", "ppt", "pptx", "odp", "xml", "json", "log", "md", "tex", "asc", "wks",
"wpd", "wps", "wri", "xhtml", "xht", "xslt", "yaml", "yml", "dtd", "sgml", "tsv", "strings", "resx",
"plist", "properties", "ini", "config", "bat", "sh", "ps1", "cmd", "awk", "sed", "vbs", "ics", "mht",
"mhtml", "epub", "djvu", "azw", "azw3", "mobi", "fb2", "prc", "lit", "lrf", "tcr", "pdb", "oxps",
"xps", "pages", "numbers", "key", "keynote", "abw", "zabw", "123", "wk1", "wk3", "wk4", "wk5", "wq1",
"wq2", "xlw", "xlr", "dif", "slk", "sylk", "wb1", "wb2", "wb3", "qpw", "wdb", "wks", "wku", "wr1",
"wrk", "xlk", "xlt", "xltm", "xltx", "xlsm", "xla", "xlam", "xll", "xld", "xlv", "xlw", "xlc", "xlm",
"xlt", "xln"]
```
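
To preview which files in a folder have one of these extensions, a small script along these lines may help. It is only a sketch: `indexable_files` is a hypothetical helper, `SUPPORTED` holds a short subset of the list above, and both the case-insensitive matching and the recursion into subfolders are assumptions rather than documented LocalDocs behaviour.

```python
from pathlib import Path

# Subset of the extension list above, kept short for illustration.
SUPPORTED = {"txt", "pdf", "md", "docx", "html", "csv"}


def indexable_files(folder, extensions=SUPPORTED):
    """Return files under folder whose extension is in extensions, sorted by path.

    Assumption: matching is case-insensitive and subfolders are searched.
    """
    return sorted(
        p
        for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower().lstrip(".") in extensions
    )
```

Calling `indexable_files("path/to/collection")` returns a list of `Path` objects that can be printed or counted before the folder is added as a collection.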
|
||||||
|
|
||||||
|
#### LocalDocs Limitations
|
||||||
|
LocalDocs allows your LLM to have context about the contents of your documentation collection. LocalDocs currently cannot:
|
||||||
|
- Answer metadata queries about your documents (e.g. `What documents do you know about?`)
|
||||||
|
- Summarize *all* of your documents. It can, however, write a summary informed by the contents of your documents.
|
||||||
|
|
||||||
|
#### LocalDocs Roadmap
|
||||||
|
- Embedding based semantic search for retrieval.
|
||||||
|
- Custom model fine-tuned with retrieval in the loop.
|
||||||
|
|
||||||
## Server Mode
|
## Server Mode
|
||||||
|
|
||||||
|