**Description:**
I've added a new use-case to the Web scraping docs. I also fixed some
typos in the existing text.
---------
Co-authored-by: davidjohnbarton <41335923+davidjohnbarton@users.noreply.github.com>
The `/docs/integrations/tools/sqlite` page is not about tool
integrations, so I've moved it to `/docs/use_cases/sql/sqlite`.
`vercel.json` has been modified.
As a result, there are now two pages under the `/docs/use_cases/sql/` folder, so
the `sql` root page has moved down together with the `sqlite` page.
- Description: Fix the broken Colab link and correct a comment so that it
matches the code, which uses Warren Buffet for the wiki query
- Issue: None open
- Dependencies: none
- Tag maintainer: n/a
- Twitter handle: Not a PR change but: kcocco
Changes in:
- `create_sql_agent` function: users can now easily add custom tools
that complement the toolkit.
- **sql use case** notebook: updated to showcase 2 examples of extra
tools.
The motivation for these changes is to make it possible to give the agent
domain expert knowledge, which improves accuracy and
reduces time/tokens.
---------
Co-authored-by: Manuel Soria <manuel.soria@greyscaleai.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
### Description
The feature for anonymizing data has been implemented. To
protect private data, for example when querying external APIs (OpenAI), it
is worth pseudonymizing sensitive data to preserve privacy.
Anonymization consists of two steps:
1. **Identification:** Identify all data fields that contain personally
identifiable information (PII).
2. **Replacement:** Replace all PII with pseudo values or codes that do
not reveal any personal information about the individual but can be used
for reference. We're not using regular encryption, because the language
model won't be able to understand the meaning or context of the
encrypted data.
We use *Microsoft Presidio* together with the *Faker* framework for
anonymization because of the wide range of functionality they
provide. The full implementation is available in `PresidioAnonymizer`.
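For reviewers who want a feel for the two steps without installing anything, here is a minimal sketch. It is not the `PresidioAnonymizer` implementation: the regexes are a hypothetical stand-in for Presidio's recognizers, the numbered placeholders stand in for Faker-generated values, and the names `PII_PATTERNS`, `identify`, `replace` and `anonymize` are made up for illustration.

```python
import itertools
import re

# Hypothetical stand-in for a real PII recognizer: one regex per PII type.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def identify(text):
    """Step 1: return sorted (start, end, pii_type) spans for every match."""
    spans = []
    for pii_type, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), pii_type))
    return sorted(spans)


def replace(text, spans):
    """Step 2: swap each span for a readable placeholder code."""
    counter = itertools.count(1)
    parts = []
    cursor = 0
    for start, end, pii_type in spans:
        if start < cursor:
            # Skip a span that overlaps one we already replaced.
            continue
        parts.append(text[cursor:start])
        parts.append(f"<{pii_type}_{next(counter)}>")
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts)


def anonymize(text):
    """Run identification, then replacement."""
    return replace(text, identify(text))
```

The placeholders stay readable (`<EMAIL_1>` rather than ciphertext), which is the point made above about not using regular encryption: the model can still tell what kind of value was there.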
### Future work
- **deanonymization** - add the ability to reverse anonymization. For
example, the workflow could look like this: `anonymize -> LLMChain ->
deanonymize`. By doing this, we will retain anonymity in requests to,
for example, OpenAI, and then be able to restore the original data.
- **instance anonymization** - at this point, each occurrence of PII is
treated as a separate entity and separately anonymized. Therefore, two
occurrences of the name John Doe in the text will be changed to two
different names. It is therefore worth introducing support for full
instance detection, so that repeated occurrences are treated as a single
object.
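To make the two ideas above concrete, here is a rough sketch of how they could fit together. Nothing here exists in the PR: `ReversibleAnonymizer`, `NAME_PATTERN` and the `PERSON_n` codes are hypothetical, and the name regex is a deliberately naive stand-in for a real recognizer (it would also match any two adjacent capitalized words).

```python
import re

# Hypothetical, naive detector: two adjacent capitalized words count as a name.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")
# Matches the pseudonym codes produced below.
CODE_PATTERN = re.compile(r"PERSON_\d+")


class ReversibleAnonymizer:
    """Gives each distinct name one pseudonym and remembers the mapping."""

    def __init__(self):
        self.forward = {}   # original value -> pseudonym
        self.backward = {}  # pseudonym -> original value

    def _pseudonym_for(self, original):
        # Repeated occurrences reuse the same pseudonym (instance anonymization).
        if original not in self.forward:
            pseudonym = f"PERSON_{len(self.forward) + 1}"
            self.forward[original] = pseudonym
            self.backward[pseudonym] = original
        return self.forward[original]

    def anonymize(self, text):
        return NAME_PATTERN.sub(lambda m: self._pseudonym_for(m.group(0)), text)

    def deanonymize(self, text):
        # Unknown codes are left untouched instead of raising.
        return CODE_PATTERN.sub(
            lambda m: self.backward.get(m.group(0), m.group(0)), text
        )
```

In an `anonymize -> LLMChain -> deanonymize` workflow, the same object would be used on both sides of the model call, so that the codes in the model's reply can be mapped back to the original values.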
### Twitter handle
@deepsense_ai / @MaksOpp
---------
Co-authored-by: MaksOpp <maks.operlejn@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Graph chains differ from the RetrievalQA chains in that they use two
LLMChains instead of one. As a result, you sometimes
want to use one LLM to generate the database query and a different one to generate
the final answer.
This feature makes it more convenient to use different LLMs in the
same chain.
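As an illustration of the idea only (this is not the chain's code), the two-step flow can be sketched with plain callables. `graph_qa`, `query_llm`, `answer_llm` and `run_query` are hypothetical names, and the prompts are placeholders.

```python
from typing import Callable, List

LLM = Callable[[str], str]  # hypothetical stand-in: prompt in, completion out


def graph_qa(
    question: str,
    query_llm: LLM,
    answer_llm: LLM,
    run_query: Callable[[str], List[dict]],
) -> str:
    """Answer a question in two steps, each step with its own model."""
    # Step 1: the first model translates the question into a database query.
    query = query_llm(f"Write a graph database query that answers: {question}")
    # Run the generated query against the graph (injected so it can be stubbed).
    rows = run_query(query)
    # Step 2: the second model phrases the final answer from the raw rows.
    return answer_llm(f"Question: {question}\nQuery results: {rows}\nAnswer:")
```

Passing the two models separately is what allows, for example, a stronger model for query generation and a cheaper one for wording the answer.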
I have also renamed the Graph DB QA Chain to Neo4j DB QA Chain, in the
documentation only, because it is used only with Neo4j. The name was
ambiguous because it was the first graph QA chain added, and I wasn't sure how
you would want to present it.
- Description: added graph_memgraph_qa.ipynb which shows how to use LLMs
to provide a natural language interface to a Memgraph database using
[MemgraphGraph](https://github.com/langchain-ai/langchain/pull/8591)
class.
- Dependencies: given that the notebook utilizes the MemgraphGraph
class, it relies on both this class and several Python packages that are
installed in the notebook using pip (langchain, openai, neo4j,
gqlalchemy). The notebook is dependent on having a functional Memgraph
instance running, as it requires this instance to establish a
connection.
The current Colab URL returns a 404, since there is no `chatbots`
directory under `use_cases`.
- Description: Fix a minor variable naming inconsistency in a code
snippet in the docs
- Issue: N/A
- Dependencies: none
- Tag maintainer: N/A
- Twitter handle: N/A
# Added SmartGPT workflow by providing SmartLLM wrapper around LLMs
Edit:
As @hwchase17 suggested, this should be a chain, not an LLM. I have
adapted the PR.
It is used like this:
```
from langchain.prompts import PromptTemplate
from langchain.chains import SmartLLMChain
from langchain.chat_models import ChatOpenAI

# The question contains no template variables, so the chain takes no inputs.
hard_question = "I have a 12 liter jug and a 6 liter jug. I want to measure 6 liters. How do I do it?"
prompt = PromptTemplate.from_template(hard_question)
llm = ChatOpenAI(model_name="gpt-4")

chain = SmartLLMChain(llm=llm, prompt=prompt, verbose=True)
chain.run({})
```
Original text:
Added SmartLLM wrapper around LLMs to allow for SmartGPT workflow (as in
https://youtu.be/wVzuvf9D9BU). SmartLLM can be used wherever LLM can be
used. E.g:
```
smart_llm = SmartLLM(llm=OpenAI())
smart_llm("What would be a good company name for a company that makes colorful socks?")
```
or
```
smart_llm = SmartLLM(llm=OpenAI())
prompt = PromptTemplate(
    input_variables=["product"],
    template="What is a good name for a company that makes {product}?",
)
chain = LLMChain(llm=smart_llm, prompt=prompt)
chain.run("colorful socks")
```
SmartGPT consists of 3 steps:
1. Ideate - generate n possible solutions ("ideas") to user prompt
2. Critique - find flaws in every idea & select best one
3. Resolve - improve upon best idea & return it
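The three steps can be sketched with a plain callable in place of the LLM. This is only an outline of the control flow, not the code in this PR: `smart_answer`, its prompts and the `n_ideas` default are assumptions made for illustration.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical stand-in: prompt in, completion out


def smart_answer(llm: LLM, question: str, n_ideas: int = 3) -> str:
    """Ideate, critique, then resolve, using n_ideas + 2 model calls."""
    # 1. Ideate: ask for n independent candidate answers.
    ideas = [
        llm(f"Question: {question}\nAnswer (attempt {i + 1}):")
        for i in range(n_ideas)
    ]
    numbered = "\n".join(f"Idea {i + 1}: {idea}" for i, idea in enumerate(ideas))
    # 2. Critique: find flaws in every idea and pick the best one.
    critique = llm(
        f"Question: {question}\n{numbered}\n"
        "List the flaws of each idea and name the best one."
    )
    # 3. Resolve: improve on the best idea and return the result.
    return llm(
        f"Question: {question}\n{numbered}\nCritique: {critique}\n"
        "Improve the best idea and give the final answer."
    )
```

Each question therefore costs `n_ideas + 2` model calls, which is the trade-off to keep in mind when choosing `n`.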
Fixes #4463
## Who can review?
Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
- @hwchase17
- @agola11
Twitter: [@UmerHAdil](https://twitter.com/@UmerHAdil) | Discord:
RicChilligerDude#7589
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>