# Improve pinecone hybrid search retriever adding metadata support
I simply remove the hardwiring of metadata to the existing
implementation allowing one to pass `metadatas` attribute to the
constructors and in `get_relevant_documents`. I also add one missing pip
install to the accompanying notebook (I am not adding dependencies, they
were pre-existing).
First contribution, just hoping to help, feel free to critique :)
my twitter username is `@andreliebschner`
While looking at hybrid search I noticed #3043 and #1743. I think the
former can be closed as following the example right now (even prior to
my improvements) works just fine, the latter I think can be also closed
safely, maybe pointing out the relevant classes and example. Should I
reply to those issues mentioning someone?
@dev2049, @hwchase17
---------
Co-authored-by: Andreas Liebschner <a.liebschner@shopfully.com>
This is a highly optimized update to the pull request
https://github.com/hwchase17/langchain/pull/3269
Summary:
1) Added ability to MRKL agent to self solve the ValueError(f"Could not
parse LLM output: `{llm_output}`") error, whenever llm (especially
gpt-3.5-turbo) does not follow the format of MRKL Agent, while returning
"Action:" & "Action Input:".
2) The way I am solving this error is by responding back to the llm with
the messages "Invalid Format: Missing 'Action:' after 'Thought:'" &
"Invalid Format: Missing 'Action Input:' after 'Action:'" whenever
Action: and Action Input: are not present in the llm output
respectively.
For a detailed explanation, look at the previous pull request.
New Updates:
1) Since @hwchase17 requested in the previous PR that the
self-correction (error) message be communicated via the
OutputParserException, I have extended the OutputParserException class
to store the observation and the previous llm_output so that they can be
passed to the next Agent prompt. This does not break or modify any
existing OutputParserException functionality (i.e.
OutputParserException can still be used as before, without passing an
observation or previous llm_output).
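The self-correction flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `ParserError` and `parse_action` are hypothetical names standing in for the extended `OutputParserException` and the MRKL output parser, and the regular expressions are assumptions.

```python
import re
from typing import Optional, Tuple


class ParserError(ValueError):
    """Hypothetical exception that also carries feedback for the next prompt."""

    def __init__(
        self,
        message: str,
        observation: Optional[str] = None,
        llm_output: Optional[str] = None,
    ) -> None:
        super().__init__(message)
        # Both fields are optional, so plain ParserError("...") still works.
        self.observation = observation
        self.llm_output = llm_output


def parse_action(llm_output: str) -> Tuple[str, str]:
    """Return (action, action_input) or raise ParserError with a hint."""
    if not re.search(r"Action\s*:", llm_output):
        raise ParserError(
            f"Could not parse LLM output: `{llm_output}`",
            observation="Invalid Format: Missing 'Action:' after 'Thought:'",
            llm_output=llm_output,
        )
    if not re.search(r"Action Input\s*:", llm_output):
        raise ParserError(
            f"Could not parse LLM output: `{llm_output}`",
            observation="Invalid Format: Missing 'Action Input:' after 'Action:'",
            llm_output=llm_output,
        )
    # The action name ends at the line break; the input may span lines.
    action = re.search(r"Action\s*:[ \t]*(.*)", llm_output).group(1).strip()
    action_input = re.search(
        r"Action Input\s*:[ \t]*(.*)", llm_output, re.DOTALL
    ).group(1).strip()
    return action, action_input
```

A caller can catch the exception and append `observation` to the next prompt instead of aborting the run.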
---------
Co-authored-by: Deepak S V <svdeepak99@users.noreply.github.com>
tldr: The docarray [integration
PR](https://github.com/hwchase17/langchain/pull/4483) introduced a
pinned dependency to protobuf. This is a docarray dependency, not a
langchain dependency. Since this is handled by the docarray
dependencies, it is unnecessary here.
Further, a pinned dependency quickly leads to incompatibilities with
application code that consumes the library, all the more so with a
heavily used library like protobuf.
Detail: as we see in the [docarray
integration](https://github.com/hwchase17/langchain/pull/4483/files#diff-50c86b7ed8ac2cf95bd48334961bf0530cdc77b5a56f852c5c61b89d735fd711R81-R83),
the transitive dependencies of docarray were also listed as langchain
dependencies. This is unnecessary as the docarray project has an
appropriate
[extras](a01a05542d/pyproject.toml (L70)).
The docarray project also does not require this _pinned_ version of
protobuf, rather [a minimum
version](a01a05542d/pyproject.toml (L41)).
So this pinned version was likely in error.
To fix this, this PR reverts the explicit hnswlib and protobuf
dependencies and adds the hnswlib extras install for docarray (which
installs hnswlib and protobuf, as originally intended). Because version
`0.32.0`
of the docarray hnswlib extras added protobuf, we bump the docarray
dependency from `^0.31.0` to `^0.32.0`.
# revert docarray explicit transitive dependencies and use extras instead
## Who can review?
@dev2049 -- reviewed the original PR
@eyurtsev -- bumped the pinned protobuf dependency a few days ago
---------
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
Update to pull request https://github.com/hwchase17/langchain/pull/3215
Summary:
1) Improved the sanitization of the query (using regex) by removing a
leading `python` command (gpt-3.5-turbo sometimes treats the Python
console as a terminal and runs the `python` command first, which causes
an error). One-line Python snippets also sometimes arrive wrapped in
single backticks, which are now stripped.
2) Added 7 new test cases.
For more details, view the previous pull request.
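A rough sketch of this kind of sanitization, assuming the goal is to strip surrounding backticks, whitespace and a leading `python` keyword. The function name and the exact patterns are illustrative, not necessarily what the PR ships:

```python
import re


def sanitize_input(query: str) -> str:
    """Strip backticks, whitespace and a leading 'python' from a query."""
    # Leading backticks/whitespace, then an optional case-insensitive "python".
    query = re.sub(r"^(\s|`)*(?i:python)?\s*", "", query)
    # Trailing backticks/whitespace.
    query = re.sub(r"(\s|`)*$", "", query)
    return query
```

Note that a pattern this simple also trims the start of an identifier that merely begins with `python`, so a stricter version may be needed in practice.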
---------
Co-authored-by: Deepak S V <svdeepak99@users.noreply.github.com>
Let users inspect the token ids in addition to getting the number of tokens
---------
Co-authored-by: Zach Schillaci <40636930+zachschillaci27@users.noreply.github.com>
Extract the methods specific to running an LLM or Chain on a dataset to
separate utility functions.
This simplifies the client a bit and lets us separate concerns of LCP
details from running examples (e.g., for evals)
# docs: `deployments` page moved into `ecosystem/`
The `Deployments` page moved into the `Ecosystem/` group
Small fixes:
- `index` page: fixed order of items in the `Modules` list, in the `Use
Cases` list
- item `References/Installation` was lost in the `index` page (not on
the Navbar!). Restored it.
- added `|` marker in several places.
NOTE: I also thought about moving the `Additional Resources/Gallery`
page into the `Ecosystem` group but decided to leave it unchanged.
Please, advise on this.
## Who can review?
Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
@dev2049
Without the addition of 'in its original language', the condensing
response, more often than not, outputs the rephrased question in
English, even when the conversation is in another language. This
question in English then transfers to the question in the retrieval
prompt and the chatbot is stuck in English.
I'm sometimes surprised that this does not happen more often, but
apparently the GPT models are smart enough to understand that when the
template contains
Question: ....
Answer:
then the answer should be in the language of the question.
### Submit Multiple Files to the Unstructured API
Enables batching multiple files into a single Unstructured API request.
Support for requests with multiple files was added to both
`UnstructuredAPIFileLoader` and `UnstructuredAPIFileIOLoader`. Note that
if you submit multiple files in "single" mode, the result will be
concatenated into a single document. We recommend using this feature in
"elements" mode.
### Testing
The following should load both documents, using two of the example docs
from the integration tests folder.
```python
from langchain.document_loaders import UnstructuredAPIFileLoader
file_paths = ["examples/layout-parser-paper.pdf", "examples/whatsapp_chat.txt"]
loader = UnstructuredAPIFileLoader(
    file_paths=file_paths,
    api_key="FAKE_API_KEY",
    strategy="fast",
    mode="elements",
)
docs = loader.load()
```
# Corrected Misspelling in agents.rst Documentation
In the
[documentation](https://python.langchain.com/en/latest/modules/agents.html)
it says "in fact, it is often best to have an Action Agent be in
**change** of the execution for the Plan and Execute agent."
**Suggested Change:** I propose correcting "change" to "charge".
Fix for issue: #5039
# Add documentation for Databricks integration
This is a follow-up of https://github.com/hwchase17/langchain/pull/4702
It documents the details of how to integrate Databricks using langchain.
It also provides examples in a notebook.
## Who can review?
@dev2049 @hwchase17 since you are aware of the context. We will promote
the integration after this doc is ready. Thanks in advance!
# Fixes an annoying typo in docs
Fixes Annoying typo in docs - "Therefor" -> "Therefore". It's so
annoying to read that I just had to make this PR.
# Streaming only final output of agent (#2483)
As requested in issue #2483, this Callback allows streaming only the
final output of an agent (i.e. not the intermediate steps).
Fixes #2483
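One way such a callback could work is to watch the token stream for an answer prefix and forward only what follows it. The sketch below is a self-contained illustration: `FinalAnswerOnlyStreamer` is a hypothetical class, and it assumes the model emits the prefix as the three tokens `"Final"`, `" Answer"`, `":"`.

```python
from typing import List, Sequence


class FinalAnswerOnlyStreamer:
    """Hypothetical handler that forwards tokens only after the answer prefix."""

    def __init__(self, answer_prefix: Sequence[str] = ("Final", " Answer", ":")) -> None:
        self.answer_prefix = list(answer_prefix)
        self.last_tokens: List[str] = []  # sliding window over recent tokens
        self.answer_reached = False
        self.emitted: List[str] = []  # stands in for writing to stdout

    def on_llm_new_token(self, token: str) -> None:
        if self.answer_reached:
            # Everything after the prefix is part of the final answer.
            self.emitted.append(token)
            return
        # Keep only as many recent tokens as the prefix is long.
        self.last_tokens.append(token)
        if len(self.last_tokens) > len(self.answer_prefix):
            self.last_tokens.pop(0)
        if self.last_tokens == self.answer_prefix:
            self.answer_reached = True
```

Intermediate "Thought" and "Action" tokens are dropped because they arrive before the prefix has been seen.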
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
# Ensuring that users pass a single prompt when calling an LLM
- This PR adds a check to the `__call__` method of the `BaseLLM` class
to ensure that it is called with a single prompt
- Raises a `ValueError` if users try to call an LLM with a list of
prompts and instructs them to use the `generate` method instead
## Why this could be useful
I stumbled across this by accident. I accidentally called the OpenAI LLM
with a list of prompts instead of a single string and still got a
result:
```
>>> from langchain.llms import OpenAI
>>> llm = OpenAI()
>>> llm(["Tell a joke"]*2)
"\n\nQ: Why don't scientists trust atoms?\nA: Because they make up everything!"
```
It might be better to catch such a scenario preventing unnecessary costs
and irritation for the user.
## Proposed behaviour
```
>>> from langchain.llms import OpenAI
>>> llm = OpenAI()
>>> llm(["Tell a joke"]*2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/marcus/Projects/langchain/langchain/llms/base.py", line 291, in __call__
    raise ValueError(
ValueError: Argument `prompt` is expected to be a single string, not a list. If you want to run the LLM on multiple prompts, use `generate` instead.
```
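The check behind the proposed behaviour might look roughly like this. `EchoLLM` is a hypothetical stand-in used only to make the sketch runnable; it is not the actual `BaseLLM` implementation.

```python
from typing import List


class EchoLLM:
    """Hypothetical stand-in for an LLM wrapper with generate/__call__."""

    def generate(self, prompts: List[str]) -> List[str]:
        # A real model would be called here; echoing keeps the sketch offline.
        return [f"echo: {prompt}" for prompt in prompts]

    def __call__(self, prompt: str) -> str:
        if not isinstance(prompt, str):
            raise ValueError(
                "Argument `prompt` is expected to be a single string, not a list. "
                "If you want to run the LLM on multiple prompts, use `generate` instead."
            )
        return self.generate([prompt])[0]
```

Failing fast here avoids silently paying for several completions while only one is returned.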
# Add self query translator for weaviate vectorstore
Adds support for the EQ comparator and the AND/OR operators.
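To illustrate what such a translation involves, here is a small sketch that turns a filter tree into a Weaviate-style `where` dictionary. The `Comparison` and `Operation` dataclasses are simplified, hypothetical stand-ins for the query-constructor types, and the sketch assumes string values (hence `valueText`); the translator in this PR may differ in detail.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass
class Comparison:
    comparator: str  # only "eq" is handled in this sketch
    attribute: str
    value: str


@dataclass
class Operation:
    operator: str  # "and" or "or"
    arguments: List[Union["Comparison", "Operation"]]


def to_weaviate_where(node: Union[Comparison, Operation]) -> Dict[str, Any]:
    """Recursively translate a filter tree into a where-filter dict."""
    if isinstance(node, Comparison):
        if node.comparator != "eq":
            raise ValueError(f"Unsupported comparator: {node.comparator}")
        return {"path": [node.attribute], "operator": "Equal", "valueText": node.value}
    if node.operator not in ("and", "or"):
        raise ValueError(f"Unsupported operator: {node.operator}")
    return {
        "operator": node.operator.capitalize(),  # "And" / "Or"
        "operands": [to_weaviate_where(arg) for arg in node.arguments],
    }
```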
Co-authored-by: Dominic Chan <dchan@cppib.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
- Higher accuracy on the responses
- New redesigned UI
- Pretty Sources: display the sources by title / sub-section instead of
long URL.
- Fixed Reset Button bugs and some other UI issues
- Other tweaks
# Improve Evernote Document Loader
When exporting from Evernote you may export more than one note.
Currently the Evernote loader concatenates the content of all notes in
the export into a single document and only attaches the name of the
export file as metadata on the document.
This change ensures that each note is loaded as an independent document
and that all available metadata on the note (e.g. author, title,
created, updated) is added as metadata on each document.
It also uses an existing optional dependency of `html2text` instead of
`pypandoc` to remove the need to download the pandoc application via
`download_pandoc()` to be able to use the `pypandoc` python bindings.
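The one-document-per-note behaviour can be sketched with the standard library alone. The example below is an illustration under simplifying assumptions: the sample export is a reduced, made-up approximation of an `.enex` file, documents are plain dictionaries, and the note content is used as-is instead of being converted with `html2text`.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List

# Simplified, hypothetical export with two notes.
SAMPLE_EXPORT = """<en-export>
  <note><title>First</title><content>Hello</content>
    <note-attributes><author>Ann</author></note-attributes></note>
  <note><title>Second</title><content>World</content>
    <note-attributes><author>Bob</author></note-attributes></note>
</en-export>"""


def load_notes(xml_text: str, source: str) -> List[Dict[str, Any]]:
    """Return one document per <note>, each with its own metadata."""
    documents = []
    for note in ET.fromstring(xml_text).iter("note"):
        metadata = {"source": source}
        for field in ("title", "created", "updated"):
            value = note.findtext(field)
            if value is not None:
                metadata[field] = value
        author = note.findtext("note-attributes/author")
        if author is not None:
            metadata["author"] = author
        documents.append(
            {"page_content": note.findtext("content", default=""), "metadata": metadata}
        )
    return documents
```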
Fixes #4493
Co-authored-by: Mike McGarry <mike.mcgarry@finbourne.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
# Change the logger message level
The library is logging at `error` level a situation that is not an
error.
We noticed this error in our logs, but from our point of view it's an
expected behavior and the log level should be `warning`.
# Adds "IN" metadata filter for pgvector to allow checking for set presence
PGVector currently supports metadata filters of the form:
```
{"filter": {"key": "value"}}
```
which will return documents where the "key" metadata field is equal to
"value".
This PR adds support for metadata filters of the form:
```
{"filter": {"key": { "IN" : ["list", "of", "values"]}}}
```
Other vector stores support this via an "$in" syntax. I chose to use
"IN" to match postgres' syntax, though happy to switch.
Tested locally with PGVector and ChatVectorDBChain.
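The intended semantics of the two filter forms can be expressed in plain Python. This is only a sketch of the matching rule, with a hypothetical `matches_filter` helper; the PR itself applies the filter in the database query rather than in Python.

```python
from typing import Any, Dict


def matches_filter(metadata: Dict[str, Any], filter_spec: Dict[str, Any]) -> bool:
    """Return True if the metadata satisfies every key in the filter."""
    for key, expected in filter_spec.items():
        actual = metadata.get(key)
        if isinstance(expected, dict) and "IN" in expected:
            # Set presence: the stored value must be one of the listed values.
            if actual not in expected["IN"]:
                return False
        elif actual != expected:
            # Plain form: the stored value must equal the given value.
            return False
    return True
```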
@dev2049
---------
Co-authored-by: jade@spanninglabs.com <jade@spanninglabs.com>
# Bug fixes in Redis - Vectorstore (Added the version of redis to the error message and removed the cls argument from a classmethod)
Co-authored-by: Tyler Hutcherson <tyler.hutcherson@redis.com>
# Remove autoreload in examples
Remove the `autoreload` in examples since it is not necessary for most
users:
```
%load_ext autoreload
%autoreload 2
```
# Powerbi API wrapper bug fix + integration tests
- Bug fix by removing `TYPE_CHECKING` in utilities/powerbi.py
- Added integration test for power bi api in
utilities/test_powerbi_api.py
- Added integration test for power bi agent in
agent/test_powerbi_agent.py
- Edited .env.examples to help set up power bi related environment
variables
- Updated demo notebook with working code in
docs../examples/powerbi.ipynb - AzureOpenAI -> ChatOpenAI
Notes:
Chat models (gpt3.5, gpt4) are much more capable than davinci at writing
DAX queries, so that is important to getting the agent to work properly.
Interestingly, gpt3.5-turbo needed the examples=DEFAULT_FEWSHOT_EXAMPLES
to write consistent DAX queries, so gpt4 seems necessary as the smart
llm.
Fixes #4325
## Before submitting
Azure-core and Azure-identity are necessary dependencies.
Check the integration tests with the following:
`pytest tests/integration_tests/utilities/test_powerbi_api.py`
`pytest tests/integration_tests/agent/test_powerbi_agent.py`
You will need a power bi account with a dataset id + table name in order
to test. See .env.examples for details.
## Who can review?
@hwchase17
@vowelparrot
---------
Co-authored-by: aditya-pethe <adityapethe1@gmail.com>
# Added a YouTube Tutorial
Added a LangChain tutorial playlist aimed at onboarding newcomers to
LangChain and its use cases.
I've shared the video in the #tutorials channel and it seemed to be well
received. I think this could be useful to the greater community.
## Who can review?
@dev2049
This PR adds support for Databricks runtime and Databricks SQL by using
[Databricks SQL Connector for
Python](https://docs.databricks.com/dev-tools/python-sql-connector.html).
Accessing Databricks, a cloud data platform, requires a URL of the
following form:
`databricks://token:{api_token}@{hostname}?http_path={http_path}&catalog={catalog}&schema={schema}`.
**The URL is complicated and it may take users a while to figure it
out.** Since the `api_token`/`hostname`/`http_path` fields are
known in the Databricks notebook, I am proposing a new method
`from_databricks` to simplify the connection to Databricks.
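For reference, assembling the URL by hand amounts to something like the following. `build_databricks_uri` is a hypothetical helper written for illustration; it is not the `from_databricks` method and does no escaping of its inputs.

```python
def build_databricks_uri(
    api_token: str, hostname: str, http_path: str, catalog: str, schema: str
) -> str:
    """Fill the connection URL template quoted in the description."""
    return (
        f"databricks://token:{api_token}@{hostname}"
        f"?http_path={http_path}&catalog={catalog}&schema={schema}"
    )
```

With `from_databricks`, the first three arguments would come from the notebook context, leaving only `catalog` and `schema` for the user to supply.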
## In Databricks Notebook
After changes, Databricks users only need to specify the `catalog` and
`schema` field when using langchain.
<img width="881" alt="image"
src="https://github.com/hwchase17/langchain/assets/1097932/984b4c57-4c2d-489d-b060-5f4918ef2f37">
## In Jupyter Notebook
The method can be used on the local setup as well:
<img width="678" alt="image"
src="https://github.com/hwchase17/langchain/assets/1097932/142e8805-a6ef-4919-b28e-9796ca31ef19">
# Add Spark SQL support
* Add Spark SQL support. It can connect to Spark via building a
local/remote SparkSession.
* Include a notebook example
I tried some complicated queries (window function, table joins), and the
tool works well.
Compared to the [Spark Dataframe
agent](https://python.langchain.com/en/latest/modules/agents/toolkits/examples/spark.html),
this tool is able to generate queries across multiple tables.
---------
Co-authored-by: Gengliang Wang <gengliang@apache.org>
Co-authored-by: Mike W <62768671+skcoirz@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: UmerHA <40663591+UmerHA@users.noreply.github.com>
Co-authored-by: 张城铭 <z@hyperf.io>
Co-authored-by: assert <zhangchengming@kkguan.com>
Co-authored-by: blob42 <spike@w530>
Co-authored-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Co-authored-by: Richard He <he.yucheng@outlook.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
Co-authored-by: Leonid Ganeline <leo.gan.57@gmail.com>
Co-authored-by: Alexey Nominas <60900649+Chae4ek@users.noreply.github.com>
Co-authored-by: elBarkey <elbarkey@gmail.com>
Co-authored-by: Davis Chase <130488702+dev2049@users.noreply.github.com>
Co-authored-by: Jeffrey D <1289344+verygoodsoftwarenotvirus@users.noreply.github.com>
Co-authored-by: so2liu <yangliu35@outlook.com>
Co-authored-by: Viswanadh Rayavarapu <44315599+vishwa-rn@users.noreply.github.com>
Co-authored-by: Chakib Ben Ziane <contact@blob42.xyz>
Co-authored-by: Daniel Chalef <131175+danielchalef@users.noreply.github.com>
Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
Co-authored-by: Jari Bakken <jari.bakken@gmail.com>
Co-authored-by: escafati <scafatieugenio@gmail.com>
# Fixes syntax for setting Snowflake database search_path
An error occurs when using a Snowflake database and providing a schema
argument.
I have updated the syntax to run a Snowflake-specific query when the
database dialect is 'snowflake'.
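The dialect switch could be sketched as below. The helper name is hypothetical, and the Snowflake statement shown is an assumption about what the dialect-specific query looks like (the description does not quote it); the fallback is the PostgreSQL-style `SET search_path`.

```python
def search_path_statement(dialect: str, schema: str) -> str:
    """Pick the statement used to select a schema for the given dialect."""
    if dialect == "snowflake":
        # Assumed Snowflake form: search_path set as a session parameter.
        return f"ALTER SESSION SET search_path='{schema}'"
    # PostgreSQL-style syntax used for other dialects.
    return f"SET search_path TO {schema}"
```

Because the schema name is interpolated directly into the statement, it should come from trusted configuration only.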