At the moment, pinecone vectorStore does not support filters and
namespaces when using similarity_score_threshold search type.
In this PR, I've implemented that. It passes all the kwargs except
"score_threshold" as that is not a supported argument for method
"similarity_search_with_score".
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
## Changes
- [X] Fill the `llm_output` param when there is an output parsing error
in a Pydantic schema so that we can get the original text that failed to
parse when handling the exception
## Background
With this change, we could do something like this:
```
output_parser = PydanticOutputParser(pydantic_object=pydantic_obj)
chain = ConversationChain(..., output_parser=output_parser)
try:
response: PydanticSchema = chain.predict(input=input)
except OutputParserException as exc:
logger.error(
'OutputParserException while parsing chatbot response: %s', exc.llm_output,
)
```
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
hi @rlancemartin ,
We had a new deployment and the `pg_extension` creation command was
updated from `CREATE EXTENSION pg_embedding` to `CREATE EXTENSION
embedding`.
https://github.com/neondatabase/neon/pull/4646
The extension not made public yet. No users will be affected by this.
Will be public next week.
Please let me know if you have any questions.
Thank you in advance 🙏
Continuing with Tolkien inspired series of langchain tools. I bring to
you:
**The Fellowship of the Vectors**, AKA EmbeddingsClusteringFilter.
This document filter uses embeddings to group vectors together into
clusters, then allows you to pick an arbitrary number of documents
vector based on proximity to the cluster centers. That's a
representative sample of the cluster.
The original idea is from [Greg Kamradt](https://github.com/gkamradt)
from this video (Level4):
https://www.youtube.com/watch?v=qaPMdcCqtWk&t=365s
I added few tricks to make it a bit more versatile, so you can
parametrize what to do with duplicate documents in case of cluster
overlap: replace the duplicates with the next closest document or remove
it. This allow you to use it as an special kind of redundant filter too.
Additionally you can choose 2 diff orders: grouped by cluster or
respecting the original retriever scores.
In my use case I was using the docs grouped by cluster to run refine
chains per cluster to generate summarization over a large corpus of
documents.
Let me know if you want to change anything!
@rlancemartin, @eyurtsev, @hwchase17,
---------
Co-authored-by: rlm <pexpresss31@gmail.com>
Change details:
- Description: When calling db.persist(), a check prevents from it
proceeding as the constructor only sets member `_persist_directory` from
parameters. But the ChromaDB client settings also has this parameter,
and if the client_settings parameter is used without passing the
persist_directory (which is optional), the `persist` method raises
`ValueError` for not setting `_persist_directory`. This change fixes it
by setting the member `_persist_directory` variable from client_settings
if it is set, else uses the constructor parameter.
- Issue: I didn't find any github issue of this, but I discovered it
after calling the persist method
- Dependencies: None
- Tag maintainer: vectorstore related change - @rlancemartin, @eyurtsev
- Twitter handle: Don't have one :(
*Additional discussion*: We may need to discuss the way I implemented
the fallback using `or`.
---------
Co-authored-by: rlm <pexpresss31@gmail.com>
This PR improves the example notebook for the Marqo vectorstore
implementation by adding a new RetrievalQAWithSourcesChain example. The
`embedding` parameter in `from_documents` has its type updated to
`Union[Embeddings, None]` and a default parameter of None because this
is ignored in Marqo.
This PR also upgrades the Marqo version to 0.11.0 to remove the device
parameter after a breaking change to the API.
Related to #7068 @tomhamer @hwchase17
---------
Co-authored-by: Tom Hamer <tom@marqo.ai>
Description: Pack of small fixes and refactorings that don't affect
functionality, just making code prettier & fixing some misspelling
(hand-filtered improvements proposed by SeniorAi.online, prototype of
code improving tool based on gpt4), agents and callbacks folders was
covered.
Dependencies: Nothing changed
Twitter: https://twitter.com/nayjest
Co-authored-by: Bagatur <baskaryan@gmail.com>
This PR improves upon the Clarifai LangChain integration with improved docs, errors, args and the addition of embedding model support in LancChain for Clarifai's embedding models and an overview of the various ways you can integrate with Clarifai added to the docs.
---------
Co-authored-by: Matthew Zeiler <zeiler@clarifai.com>
Description: Added number_of_head_rows as a parameter to pandas agent.
number_of_head_rows allows the user to select the number of rows to pass
with the prompt when include_df_in_prompt is True. This gives the
ability to control the token length and can be helpful in dealing with
large dataframe.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- Description: Solving, anthropic packaging version issue by clearing
the mixup from package.version that is being confused with version from
- importlib.metadata.version.
- Issue: it fixes the issue #7283
- Maintainer: @hwchase17
The following change has been explained in the comment -
https://github.com/hwchase17/langchain/issues/7283#issuecomment-1624328978
- Description: pydantic's `ModelField.type_` only exposes the native
data type but not complex type hints like `List`. Thus, generating a
Tool with `from_function` through function signature produces incorrect
argument schemas (e.g., `str` instead of `List[str]`)
- Issue: N/A
- Dependencies: N/A
- Tag maintainer: @hinthornw
- Twitter handle: `mapped`
All the unittest (with an additional one in this PR) passed, though I
didn't try integration tests...
- Description: Sometimes there are csv attachments with the media type
"application/vnd.ms-excel". These files failed to be loaded via the xlrd
library. It throws a corrupted file error. I fixed it by separately
processing excel files using pandas. Excel files will be processed just
like before.
- Dependencies: pandas, os, io
---------
Co-authored-by: Chathura <chathurar@yaalalabs.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
In some cases, the OpenAI response is missing the `finish_reason`
attribute. It seems to happen when using Ada or Babbage and
`stream=true`, but I can't always reproduce it. This change just
gracefully handles the missing key.
Introduction of newest function calling feature doesn't work properly
with PromptLayerChatOpenAI model since on the `_generate` method,
functions argument are not even getting passed to the `ChatOpenAI` base
class which results in empty `ai_message.additional_kwargs`
Fixes #6365
updated `tutorials.mdx`:
- added a link to new `Deeplearning AI` course on LangChain
- added links to other tutorial videos
- fixed format
@baskaryan, @hwchase17
#### Description
refactor BedrockEmbeddings class to clean code as below:
1. inline content type and accept
2. rewrite input_body as a dictionary literal
3. no need to declare embeddings variable, so remove it
- Description: Adding to Chroma integration the option to run a
similarity search by a vector with relevance scores. Fixing two minor
typos.
- Issue: The "lambda_mult" typo is related to #4861
- Maintainer: @rlancemartin, @eyurtsev
Based on user feedback, we have improved the Alibaba Cloud OpenSearch
vector store documentation.
Co-authored-by: zhaoshengbo <shengbo.zsb@alibaba-inc.com>
<!-- Thank you for contributing to LangChain!
Replace this comment with:
- Description: a description of the change,
- Issue: the issue # it fixes (if applicable),
- Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use.
Maintainer responsibilities:
- General / Misc / if you don't know who to tag: @baskaryan
- DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
- Models / Prompts: @hwchase17, @baskaryan
- Memory: @hwchase17
- Agents / Tools / Toolkits: @hinthornw
- Tracing / Callbacks: @agola11
- Async: @agola11
If no one reviews your PR within a few days, feel free to @-mention the
same people again.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->
Several updates for the PowerBI tools:
- Handle 0 records returned by requesting redo with different filtering
- Handle too large results by optionally tokenizing the result and
comparing against a max (change in signature, non-breaking)
- Implemented LLMChain with Chat for chat models for the tools.
- Updates to the main prompt including tables
- Update to Tool prompt with TOPN function
- Split the tool prompt to allow the LLMChain with ChatPromptTemplate
Smaller fixes for stability.
For visibility: @hinthornw
Replace this comment with:
- Description: added documentation for a template repo that helps
dockerizing and deploying a LangChain using a Cloud Build CI/CD pipeline
to Google Cloud build serverless
- Issue: None,
- Dependencies: None,
- Tag maintainer: @baskaryan,
- Twitter handle: EdenEmarco177
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use.