Readme typos (#409)

I was honored by the twitter mention, so used PyCharm to try and... help
docs even a little bit.
Mostly typos and spelling corrections.

PyCharm really complains about "really good" being used all the time and
recommends alternative wordings haha
harrison/map-rerank
altryne 1 year ago committed by GitHub
parent 2ad285aab2
commit f990395211

@@ -2,12 +2,12 @@
## Overview
-Language models are trained on large amounts of unstructured data, which makes them really good at general purpose text generation. However, there are many instances where you may want the language model to generate text based not on generic data but rather on specific data. Some common examples of this include:
+Language models are trained on large amounts of unstructured data, which makes them fantastic at general purpose text generation. However, there are many instances where you may want the language model to generate text based not on generic data but rather on specific data. Some common examples of this include:
-- Summarization of a specific piece of text (a website, a private document, etc)
-- Question answering over a specific piece of text (a website, a private document, etc)
-- Question answering over multiple pieces of text (multiple websites, multiple private documents, etc)
-- Using the results of some external call to an API (results from a SQL query, etc)
+- Summarization of a specific piece of text (a website, a private document, etc.)
+- Question answering over a specific piece of text (a website, a private document, etc.)
+- Question answering over multiple pieces of text (multiple websites, multiple private documents, etc.)
+- Using the results of some external call to an API (results from a SQL query, etc.)
All of these examples are instances when you do not want the LLM to generate text based solely on the data it was trained over, but rather you want it to incorporate other external data in some way. At a high level, this process can be broken down into two steps:
@@ -25,7 +25,7 @@ This paper introduces RAG models where the parametric memory is a pre-trained se
**[REALM](https://arxiv.org/abs/2002.08909):** Retrieval-Augmented Language Model Pre-Training.
To capture knowledge in a more modular and interpretable way, this paper augments language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia, used during pre-training, fine-tuning and inference.
-**[HayStack](https://haystack.deepset.ai/):** This is not a paper, but rather an open source library aimed at semantic search, question answering, summarization, and document ranking for a wide range of NLP applications. The underpinnings of this library are focused on the same `fetching` and `augmenting` concepts discussed here, and incorporate some of the methods in the above papers.
+**[HayStack](https://haystack.deepset.ai/):** This is not a paper, but rather an open source library aimed at semantic search, question answering, summarization, and document ranking for a wide range of NLP applications. The underpinnings of this library are focused on the same `fetching` and `augmenting` concepts discussed here, and incorporate some methods in the above papers.
These papers/open-source projects are centered around retrieval of documents, which is important for question-answering tasks over a large corpus of documents (which is how they are evaluated). However, we use the terminology of `Data Augmented Generation` to highlight that retrieval from some document store is only one possible way of fetching relevant data to include. Other methods to fetch relevant data could involve hitting an API, querying a database, or just working with user provided data (eg a specific document that they want to summarize).
@@ -50,7 +50,7 @@ to synthesize those results.
There are two big issues to deal with in fetching:
1. Fetching small enough pieces of information
-2. Not fetching too many pieces of information (eg fetching only the most relevant pieces)
+2. Not fetching too many pieces of information (e.g. fetching only the most relevant pieces)
### Text Splitting
One big issue with all of these methods is how to make sure you are working with pieces of text that are not too large.
@@ -117,7 +117,7 @@ asking the LLM to refine the output based on the new document.
## Use Cases
LangChain supports the above three methods of augmenting LLMs with external data.
-These methods can be used to underpin several common use cases and they are discussed below.
+These methods can be used to underpin several common use cases, and they are discussed below.
For all three of these use cases, all three methods are supported.
It is important to note that a large part of these implementations is the prompts
that are used. We provide default prompts for all three use cases, but these can be configured.

@@ -41,5 +41,5 @@ more information on the company and the person, and then write them a sales mess
### [Question-Answering on a Web Browser](https://twitter.com/chillzaza_/status/1592961099384905730?s=20&t=EhU8jl0KyCPJ7vE9Rnz-cQ)
By Zahid Khawaja, this demo utilizes question answering to answer questions about a given website.
-A followup added this for [Youtube videos](https://twitter.com/chillzaza_/status/1593739682013220865?s=20&t=EhU8jl0KyCPJ7vE9Rnz-cQ),
+A followup added this for [YouTube videos](https://twitter.com/chillzaza_/status/1593739682013220865?s=20&t=EhU8jl0KyCPJ7vE9Rnz-cQ),
and then another followup added it for [Wikipedia](https://twitter.com/chillzaza_/status/1594847151238037505?s=20&t=EhU8jl0KyCPJ7vE9Rnz-cQ).

@@ -11,12 +11,12 @@ More complex iterations dynamically construct the template string from few shot
For a more detailed explanation of how LangChain approaches prompts and prompt templates, see [here](/examples/prompts/prompt_management).
## LLMs
-Wrappers around Large Language Models (in particular, the `generate` ability of large language models) are some of the core functionality of LangChain.
+Wrappers around Large Language Models (in particular, the `generate` ability of large language models) are at the core of LangChain functionality.
These wrappers are classes that are callable: they take in an input string, and return the generated output string.
## Embeddings
These classes are very similar to the LLM classes in that they are wrappers around models,
-but rather than return a string they return an embedding (list of floats). This are particularly useful when
+but rather than return a string they return an embedding (list of floats). These are particularly useful when
implementing semantic search functionality. They expose separate methods for embedding queries versus embedding documents.
## Vectorstores
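
The embedding-wrapper interface described in this hunk can be sketched with a toy class. This is a hypothetical illustration of the shape only (separate query vs. document methods, each returning a list of floats), not LangChain's actual classes:

```python
# Toy embedding wrapper illustrating the interface described above:
# separate methods for embedding queries versus embedding documents,
# each returning a list of floats. (Hypothetical sketch, not real code.)
class ToyEmbeddings:
    def embed_query(self, text: str) -> list[float]:
        # Trivial "embedding": character length and word count,
        # just to show the return shape.
        return [float(len(text)), float(len(text.split()))]

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self.embed_query(t) for t in texts]

emb = ToyEmbeddings()
vec = emb.embed_query("hello world")  # -> [11.0, 2.0]
```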

@@ -42,7 +42,7 @@ Resources:
### Prompt Chaining
-Combining multiple LLM calls together, with the output of one step being the input to the next.
+Combining multiple LLM calls together, with the output of one-step being the input to the next.
Resources:
- [PromptChainer Paper](https://arxiv.org/pdf/2203.06566.pdf)
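
The prompt-chaining idea above, where each step's output becomes the next step's input, can be sketched with plain functions standing in for LLM calls (hypothetical step names, illustrative only):

```python
# Minimal prompt-chaining sketch: each step's output feeds the next step.
# The two step functions below are stand-ins for real LLM calls.
def step_summarize(text: str) -> str:
    # Pretend "summary": take the first sentence.
    return text.split(".")[0]

def step_uppercase(summary: str) -> str:
    return summary.upper()

def run_chain(text: str, steps) -> str:
    out = text
    for step in steps:
        out = step(out)  # output of one step is the input to the next
    return out

result = run_chain("First point. Second point.", [step_summarize, step_uppercase])
# -> "FIRST POINT"
```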

@@ -1,7 +1,7 @@
# Tools
Tools are functions that agents can use to interact with the world.
-These tools can be generic utilities (eg search), other chains, or even other agents.
+These tools can be generic utilities (e.g. search), other chains, or even other agents.
Currently, tools can be loaded with the following snippet:
@@ -11,7 +11,7 @@ tool_names = [...]
tools = load_tools(tool_names)
```
-Some tools (eg chains, agents) may require a base LLM to use to initialize them.
+Some tools (e.g. chains, agents) may require a base LLM to use to initialize them.
In that case, you can pass in an LLM as well:
```python
@@ -57,13 +57,13 @@ Below is a list of all supported tools and relevant information:
**pal-math**
- Tool Name: PAL-MATH
-- Tool Description: A language model that is really good at solving complex word math problems. Input should be a fully worded hard word math problem.
+- Tool Description: A language model that is excellent at solving complex word math problems. Input should be a fully worded hard word math problem.
- Notes: Based on [this paper](https://arxiv.org/pdf/2211.10435.pdf).
- Requires LLM: Yes
**pal-colored-objects**
- Tool Name: PAL-COLOR-OBJ
-- Tool Description: A language model that is really good at reasoning about position and the color attributes of objects. Input should be a fully worded hard reasoning problem. Make sure to include all information about the objects AND the final question you want to answer.
+- Tool Description: A language model that is wonderful at reasoning about position and the color attributes of objects. Input should be a fully worded hard reasoning problem. Make sure to include all information about the objects AND the final question you want to answer.
- Notes: Based on [this paper](https://arxiv.org/pdf/2211.10435.pdf).
- Requires LLM: Yes
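
The `load_tools` pattern documented in these hunks, including tools like pal-math that require a base LLM to initialize, can be sketched with a toy registry. All names here are hypothetical stand-ins, not LangChain's real implementation:

```python
# Toy registry sketch of the load_tools pattern described above.
# (Hypothetical; real tools wrap actual chains, agents, and APIs.)
def make_search(llm=None):
    # A generic utility tool: no base LLM needed.
    return lambda q: f"search results for {q!r}"

def make_pal_math(llm):
    # A tool that requires a base LLM to initialize.
    return lambda problem: llm(problem)

TOOL_REGISTRY = {"search": make_search, "pal-math": make_pal_math}

def load_tools(tool_names, llm=None):
    # Each factory receives the LLM; tools that don't need it ignore it.
    return [TOOL_REGISTRY[name](llm) for name in tool_names]

fake_llm = lambda prompt: f"answer to: {prompt}"
tools = load_tools(["search", "pal-math"], llm=fake_llm)
```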
@@ -91,4 +91,4 @@ Below is a list of all supported tools and relevant information:
- Tool Description: Useful for when you want to get information from The Movie Database. The input should be a question in natural language that this API can answer.
- Notes: A natural language connection to the TMDB API (`https://api.themoviedb.org/3`), specifically the `/search/movie` endpoint.
- Requires LLM: Yes
-- Extra Parameters: `tmdb_bearer_token` (your Bearer Token to access this endpoint - note that this is different than the API key)
+- Extra Parameters: `tmdb_bearer_token` (your Bearer Token to access this endpoint - note that this is different from the API key)

@@ -143,7 +143,7 @@ Start here if you haven't used LangChain before.
examples/memory.rst
examples/model_laboratory.ipynb
-More elaborate examples and walk-throughs of particular
+More elaborate examples and walkthroughs of particular
integrations and use cases. This is the place to look if you have questions
about how to integrate certain pieces, or if you want to find examples of
common tasks or cool demos.

@@ -255,7 +255,7 @@ class Crawler:
meta_data = []
-        # inefficient to grab the same set of keys for kinds of objects but its fine for now
+        # inefficient to grab the same set of keys for kinds of objects, but it's fine for now
element_attributes = find_attributes(
attributes[index], ["type", "placeholder", "aria-label", "title", "alt"]
)
