Commit Graph

3467 Commits

Author SHA1 Message Date
Saurabh Misra
db9d5b213a
Optimize the cosine_similarity_top_k function performance (#8151)
Optimizing important numerical code and making it run faster.

Performance went up by 1.48x (148%). Runtime went down from 138715us to
56020us

Optimization explanation:

The `cosine_similarity_top_k` function is where we made the most
significant optimizations.
Instead of sorting the entire score_array which needs considering all
elements, `np.argpartition` is utilized to find the top_k largest scores
indices, this operation has a time complexity of O(n), higher
performance than sorting. Remember, `np.argpartition` doesn't guarantee
the order of the values. So we need to use argsort() to get the indices
that would sort our top-k values after partitioning, which is much more
efficient because it only sorts the top-K elements, not the entire
array. Then to get the row and column indices of sorted top_k scores in
the original score array, we use `np.unravel_index`. This operation is
more efficient and cleaner than a list comprehension.

The code has been tested for correctness by running the following
snippet on both the original function and the optimized function and
averaged over 5 times.
```
def test_cosine_similarity_top_k_large_matrices():
    X = np.random.rand(1000, 1000)
    Y = np.random.rand(1000, 1000)
    top_k = 100
    score_threshold = 0.5
    gc.disable()
    counter = time.perf_counter_ns()
    return_value = cosine_similarity_top_k(X, Y, top_k, score_threshold)
    duration = time.perf_counter_ns() - counter
    gc.enable()
```

@hwaking @hwchase17 @jerwelborn 

Unit tests pass, I also generated more regression tests which all
passed.
2023-07-26 18:03:49 -07:00
Fabrizio Ruocco
ddc353a768
Azure Cognitive Search: Custom index and scoring profile support (#6843)
Description: Adding support for custom index and scoring profile support
in Azure Cognitive Search
@hwchase17

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-26 17:58:01 -07:00
Leonid Ganeline
ed24de8467
removed namespace title (#8208)
This change compacts the left-side Navbar (ToC) of the [API
Reference](https://api.python.langchain.com/en/latest/api_reference.html).
Now almost each namespace item is split into two lines. For example
`langchain.chat_models: Chat Models`
We remove the `Chat Models` and leave one the `langchain.chat_models`. 
This effectively compacts the navbar and increases the main page's
usability. On my screen, it reduces # of lines in Toc from 28 t to 18,
which is huge.

Removing the namespace "title" (like `Chat Models`) does not remove any
information because the title is composed directly from the namespace.
API Reference users are developers. Usability for them is very
important. We see less text => we find faster.
2023-07-26 16:45:23 -07:00
Kacper Łukawski
c5988c1d4b
Implement async support for Cohere (#8237)
This PR introduces async API support for Cohere, both LLM and
embeddings. It requires updating `cohere` package to `^4`.

Tagging @hwchase17, @baskaryan, @agola11

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-26 15:51:18 -07:00
Daniel Alexander Brenot
bf1357f584
Added async support to PlanAndExecute Chain (#8239)
- Description: Adds async support to the PlanAndExecute Chain

Maintainer responsibilities:
  - Async: @agola11

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-26 15:16:07 -07:00
Bastin Florian
a3ac9b23eb
feat(confluence): add markdown format option (#8246)
# Description:
**Add the possibility to keep text as Markdown in the ConfluenceLoader**
Add a bool variable that allows to keep the Markdown format of the
Confluence pages.
It is useful because it allows to use MarkdownHeaderTextSplitter as a
DataSplitter.
If this variable in set to True in the load() method, the pages are
extracted using the markdownify library.

  # Issue: 
[4407](https://github.com/langchain-ai/langchain/issues/4407)
  # Dependencies: 
Add the markdownify library
  # Tag maintainer:
 @rlancemartin, @eyurtsev
  # Twitter handle:
 FloBastinHeyI - https://twitter.com/FloBastinHeyI

---------

Co-authored-by: Florian Bastin <florian.bastin@octo.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-26 15:00:27 -07:00
Leonid Ganeline
ee6ff96e28
docstrings cleanup (#8311)
- added missed docstrings
 - changed docstrings into consistent format
  
@baskaryan
2023-07-26 14:13:10 -07:00
Bagatur
ceab0a7c1f
update api ref style (#8318) 2023-07-26 14:12:44 -07:00
Rohit Gupta
e5dba8978a
Avoid re-computation of embedding in weaviate similarity search (#8284)
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-26 13:31:55 -07:00
William FH
01a9b06400
Add api cross ref linking (#8275)
Example of how it would show up in our python docs:


![image](https://github.com/langchain-ai/langchain/assets/13333726/0f0a88cc-ba4a-4778-bc47-118c66807f15)


Examples added to the reference docs:

https://api.python.langchain.com/en/wfh-api_crosslink/vectorstores/langchain.vectorstores.chroma.Chroma.html#langchain.vectorstores.chroma.Chroma


![image](https://github.com/langchain-ai/langchain/assets/13333726/dcd150de-cb56-4d42-b49a-a76a002a5a52)
2023-07-26 12:38:58 -07:00
Nuno Campos
a612800ef0
Runnable single protocol (#7800)
Objects implementing Runnable: BasePromptTemplate, LLM, ChatModel,
Chain, Retriever, OutputParser

- [x] Implement Runnable in base Retriever
- [x] Raise TypeError in operator methods for unsupported things 
- [x] Implement dict which calls values in parallel and outputs dict
with results
- [x] Merge in `+` for prompts
- [x] Confirm precedence order for operators, ideal would be `+` `|`,
https://docs.python.org/3/reference/expressions.html#operator-precedence
- [x] Add support for openai functions, ie. Chat Models must return
messages
- [x] Implement BaseMessageChunk return type for BaseChatModel, a
subclass of BaseMessage which implements __add__ to return
BaseMessageChunk, concatenating all str args
- [x] Update implementation of stream/astream for llm and chat models to
use new `_stream`, `_astream` optional methods, with default
implementation in base class `raise NotImplementedError` use
https://stackoverflow.com/a/59762827 to see if it is implemented in base
class
- [x] Delete the IteratorCallbackHandler (leave the async one because
people using)
- [x] Make BaseLLMOutputParser implement Runnable, accepting either str
or BaseMessage
---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-07-26 12:16:46 -07:00
Bharat
04a4d3e312
Fixes #8310 Fix maximum recursion depth exceeded error (#8313)
ElasticsearchVectorStore.as_retriever() method is returning 
`RecursionError: maximum recursion depth exceeded` 
because of incorrect field reference in
 `embeddings()` method

  - Description: Fix RecursionError because of a typo
  - Issue: the issue #8310 
  - Dependencies: None,
  - Tag maintainer: @eyurtsev
  - Twitter handle: bpatel
2023-07-26 12:15:37 -07:00
Caitlin2694
b9db3dd09b
Fix "missing key op" RDFGraph OWL serialization (#8276)
Replace this comment with:
- Description: Fix "missing key op" error in RDFGraph OWL Serialization
  - Issue: #8263
  - Dependencies: None
  - Tag maintainer: @baskaryan
2023-07-26 12:14:56 -07:00
Eugene Yurtsev
862e9aed66
ChatPromptTemplate: Update doc-strings, update from_role_strings behavior (#8308)
* Update doc-strings in ChatPromptTemplate
* Update from_role_strings classmethod to use well known roles
2023-07-26 15:02:36 -04:00
Bagatur
2c2fd9ff13
bump 244 (#8314) 2023-07-26 11:58:26 -07:00
Lance Martin
77c0582243
Clean queries prior to search (#8309)
With some search tools, we see no results returned if the query is a
numeric list.

E.g., if we pass:
```
'1. "LangChain vs LangSmith: How do they differ?"'
```

We see:
```
No good Google Search Result was found
```

Local testing w/ Streamlit:

![image](https://github.com/langchain-ai/langchain/assets/122662504/0a7e3dca-59e8-415e-8df6-bd9e4ea962ee)
2023-07-26 11:48:28 -07:00
shibuiwilliam
6b88fbd9bb
add test for embedding distance evaluation (#8285)
Add tests for embedding distance evaluation

  - Description: Add tests for embedding distance evaluation
  - Issue: None
  - Dependencies: None
  - Tag maintainer: @baskaryan
  - Twitter handle: @MlopsJ
2023-07-26 11:45:50 -07:00
Riche Akparuorji
f3d2fdd54c
Fix for code snippet in documentation (#8290)
- Description: I fixed an issue in the code snippet related to the
variable name and the evaluation of its length. The original code used
the variable "docs," but the correct variable name is "docs_svm" after
using the SVMRetriever.
- maintainer: @baskaryan
- Twitter handle: @iamreechi_

Co-authored-by: iamreechi <richieakparuorji>
2023-07-26 11:31:08 -07:00
Bagatur
f27176930a
fix geopandas link (#8305) 2023-07-26 11:30:17 -07:00
Timon Palm
70604e590f
DuckDuckGoSearch News Tool (#8292)
Description: 
I wanted to use the DuckDuckGoSearch tool in an agent to let him get the
latest news for a topic. DuckDuckGoSearch has already an implemented
function for retrieving news articles. But there wasn't a tool to use
it. I simply adapted the SearchResult class with an extra argument
"backend". You can set it to "news" to only get news articles.

Furthermore, I added an example to the DuckDuckGo Notebook on how to
further customize the results by using the DuckDuckGoSearchAPIWrapper.

Dependencies: no new dependencies
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-26 11:30:01 -07:00
Aarav Borthakur
8ce661d5a1
Docs: Fix Rockset links (#8214)
Fix broken Rockset links.

Right now links at
https://python.langchain.com/docs/integrations/providers/rockset are
broken.
2023-07-26 10:38:37 -07:00
Byron Saltysiak
61347bd322
giving path to the copy command for *.toml files (#8294)
Description: in the .devcontainer, docker-compose build is currently
failing due to the src paths in the COPY command. This change adds the
full path to the pyproject.toml and poetry.toml to allow the build to
run.
Issue: 

You can see the issue if you try to build the dev docker image with:
```
cd .devcontainer
docker-compose build
```

Dependencies: none
Twitter handle: byronsalty
2023-07-26 10:37:03 -07:00
happyxhw
6384c1ec8f
fix: ElasticVectorSearch.from_documents failed #8293 (#8296)
- Description: fix ElasticVectorSearch.from_documents with
elasticsearch_url param,
- Issue: ElasticVectorSearch.from_documents failed #8293 # it fixes (if
applicable),


---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-26 10:33:52 -07:00
Jon Bennion
ad38eb2d50
correction to reference to code (#8301)
- Description: fixes typo referencing code

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-26 10:33:18 -07:00
jacobswe
83a53e2126
Bug Fix: AzureChatOpenAI streaming with function calls (#8300)
- Description: During streaming, the first chunk may only contain the
name of an OpenAI function and not any arguments. In this case, the
current code presumes there is a streaming response and tries to append
to it, but gets a KeyError. This fixes that case by checking if the
arguments key exists, and if not, creates a new entry instead of
appending.
  - Issue: Related to #6462

Sample Code:
```python
llm = AzureChatOpenAI(
    deployment_name=deployment_name,
    model_name=model_name,
    streaming=True
)

tools = [PythonREPLTool()]
callbacks = [StreamingStdOutCallbackHandler()]

agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    callbacks=callbacks
)

agent('Run some python code to test your interpreter')
```

Previous Result:
```
File ...langchain/chat_models/openai.py:344, in ChatOpenAI._generate(self, messages, stop, run_manager, **kwargs)
    342         function_call = _function_call
    343     else:
--> 344         function_call["arguments"] += _function_call["arguments"]
    345 if run_manager:
    346     run_manager.on_llm_new_token(token)

KeyError: 'arguments'
```

New Result:
```python
{'input': 'Run some python code to test your interpreter',
 'output': "The Python code `print('Hello, World!')` has been executed successfully, and the output `Hello, World!` has been printed."}
```

Co-authored-by: jswe <jswe@polencapital.com>
2023-07-26 10:11:50 -07:00
German Martin
457a4730b2
Fix the mangling issue on several VectorStores child classes. (#8274)
- Description: Fix mangling issue affecting a couple of VectorStore
classes including Redis.
  - Issue: https://github.com/langchain-ai/langchain/issues/8185
  - @rlancemartin 
  
This is a simple issue but I lack of some context in the original
implementation.
My changes perhaps are not the definitive fix but to start a quick
discussion.

@hinthornw Tagging you since one of your changes introduced this
[here.](c38965fcba)
2023-07-26 09:48:55 -07:00
Alec Flett
4da43f77e5
Add ability to load (deserialize) objects from other namespaces (#7726)
I have some Prompt subclasses in my project that I'd like to be able to
deserialize in callbacks. Right now `loads()`/`load()` will bail when it
encounters my object, but I know I can trust the objects because they're
in my own projects.

<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @baskaryan
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @hinthornw
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->
2023-07-26 16:59:28 +01:00
Bagatur
5c6dcb1960
bump 243 (#8289) 2023-07-26 05:41:56 -07:00
William FH
adf019724f
unpack later (#8278)
Fix https://github.com/langchain-ai/langchain/issues/8272
2023-07-26 01:53:22 -07:00
Naveen Tatikonda
9cbefcc56c
[ OpenSearch ] : Add AOSS Support to OpenSearch (#8256)
### Description

This PR includes the following changes:

- Adds AOSS (Amazon OpenSearch Service Serverless) support to
OpenSearch. Please refer to the documentation on how to use it.
- While creating an index, AOSS only supports Approximate Search with
`nmslib` and `faiss` engines. During Search, only Approximate Search and
Script Scoring (on doc values) are supported.
- This PR also adds support to `efficient_filter` which can be used with
`faiss` and `lucene` engines.
- The `lucene_filter` is deprecated. Instead please use the
`efficient_filter` for the lucene engine.


Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
2023-07-25 23:59:36 -07:00
Lance Martin
7a00f17033
Web research retriever (#8102)
Given a user question, this will -
* Use LLM to generate a set of queries.
* Query for each.
* The URLs from search results are stored in self.urls.
* A check is performed for any new URLs that haven't been processed yet
(not in self.url_database).
* Only these new URLs are loaded, transformed, and added to the
vectorstore.
* The vectorstore is queried for relevant documents based on the
questions generated by the LLM.
* Only unique documents are returned as the final result.

This code will avoid reprocessing of URLs across multiple runs of
similar queries, which should improve the performance of the retriever.
It also keeps track of all URLs that have been processed, which could be
useful for debugging or understanding the retriever's behavior.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-07-25 19:58:00 -07:00
Rithwik Ediga Lakhamsani
d1d691caa4
Added Databricks support to MLflow Callback (#7906)
Added a quick check to make integration easier with Databricks; another
option would be to make a new class, but this seemed more
straightfoward.

cc: @liangz1 Can this be done in a more straightfoward way?
2023-07-25 18:23:54 -07:00
William FH
479cc086ba
Rm Github Import (#8257)
It's not a required dep but would break peoples builds

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-25 18:20:58 -07:00
Byron Saltysiak
68a906bb31
added lxml to the pip install example since it is required (#8260)
- Description: The trello dataloader example didn't work without an
additional dependency installed - lxml
  - Issue: na
2023-07-25 18:16:07 -07:00
Emory Petermann
7734a2b5ab
update golden-query notebook and fix typo in golden docs (#8253)
updating the documentation to be consistent for Golden query tool and
have a better introduction to the tool
2023-07-25 18:15:48 -07:00
Erick Friis
c14571ab37
New enterprise support form (#8254) 2023-07-25 15:43:27 -07:00
William FH
dd87275dde
Add LLMChain example of memory with chat models (#8250) 2023-07-25 15:20:32 -07:00
William FH
1f40d3e094
Update Broken Links (#8247) 2023-07-25 12:26:39 -07:00
Eugene Yurtsev
ec069381fb
Remove operator overloading for BaseMessage (#8245)
This PR removes operator overloading for base message.

Removing the `+` operating from base message will help make sure that:

1) There's no need to re-define `+` for message chunks
2) That there's no unexpected behavior in terms of types changing
(adding two messages yields a ChatPromptTemplate which is not a message)
2023-07-25 20:12:19 +01:00
William FH
30c2d3cd06
Update references (#8243) 2023-07-25 11:49:25 -07:00
jacobswe
0af48b06d0
Bug Fix #6462 (#8241)
- Description: Small change to fix broken Azure streaming. More complete
migration probably still necessary once the new API behavior is
finalized.
- Issue: Implements fix by @rock-you in #6462 
- Dependencies: N/A

There don't seem to be any tests specifically for this, and I was having
some trouble adding some. This is just a small temporary fix to allow
for the new API changes that OpenAI are releasing without breaking any
other code.

---------

Co-authored-by: Jacob Swe <jswe@polencapital.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-25 11:30:22 -07:00
Bagatur
c1ea8da9bc
bump 242 (#8238) 2023-07-25 08:01:37 -07:00
shibuiwilliam
af788b7cf0
Add/faiss test score threshold (#8224)
# What
- This is to add test for faiss vector store with score threshold

<!-- Thank you for contributing to LangChain!

Replace this comment with:
- Description: This is to add test for faiss vector store with score
threshold
  - Issue: None
  - Dependencies: None
  - Tag maintainer: @rlancemartin, @eyurtsev
  - Twitter handle: @MlopsJ

Please make sure you're PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @baskaryan
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @hinthornw
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->
2023-07-25 09:56:29 -04:00
shibuiwilliam
bed8eb978e
use logger instead of logging (#8225)
# What
- Use `logger` instead of using logging directly.

<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: Use `logger` instead of using logging directly.
  - Issue: None
  - Dependencies: None
  - Tag maintainer: @baskaryan
  - Twitter handle: @MlopsJ

Please make sure you're PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @baskaryan
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @hinthornw
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->
2023-07-25 09:55:30 -04:00
Leonid Ganeline
afc55a4fee
Refactored requests (#8203)
Refactored `requests.py`. The same as
https://github.com/langchain-ai/langchain/pull/7961 #8098 #8099
requests.py is in the root code folder. This creates the
`langchain.requests: Requests` group on the API Reference navigation
ToC, on the same level as Chains and Agents which is incorrect.

Refactoring:

- copied requests.py content into utils/requests.py
- I added the backwards compatibility ref in the original requests.py. 
- updated imports to requests objects

@hwchase17, @baskaryan
2023-07-24 21:23:59 -07:00
William FH
0a16b3d84b
Update Integrations links (#8206) 2023-07-24 21:20:32 -07:00
Alex Stachowiak
a7efa95775
Update base chain type hints (#7680)
Addresses #7578. `run()` can return dictionaries, Pydantic objects or
strings, so the type hints should reflect that. See the chain from
`create_structured_output_chain` for an example of a non-string return
type from `run()`.

I've updated the BaseLLMChain return type hint from `str` to `Any`.
Although, the differences between `run()` and `__call__()` seem less
clear now.

CC: @baskaryan

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-24 21:16:41 -07:00
Ani peter benjamin
e58b1d7073
feat: temp fixed Could not parse LLM output on agents folder (#7746)
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-24 19:20:37 -07:00
Dayuan Jiang
125ae6d9de
add Hybrid retriever that not require any external service (#8108)
- Until now, hybrid search was limited to modules requiring external
services, such as Weaviate/Pinecone Hybrid Search. However, I have
developed a hybrid retriever that can merge a list of retrievers using
the [Reciprocal Rank
Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf)
algorithm. This new approach, similar to Weaviate hybrid search, does
not require the initialization of any external service.
  - Dependencies: No  - Twitter handle: dayuanjian21687

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-24 19:16:10 -07:00
Dario Ruben
04e45f9cde
Fixed grammar in LLM models documentation (#8210)
Description: I fixed a typo in the documentation related to LLMs
(https://python.langchain.com/docs/modules/model_io/models/llms/)
2023-07-24 19:14:32 -07:00