community[minor]: add DashScope Rerank (#22403)

**Description:** this PR adds DashScope Rerank capability to Langchain,
you can find DashScope Rerank API from
[here](https://help.aliyun.com/document_detail/2780058.html?spm=a2c4g.2780059.0.0.6d995024FlrJ12)
&
[here](https://help.aliyun.com/document_detail/2780059.html?spm=a2c4g.2780058.0.0.63f75024cr11N9).
[DashScope](https://dashscope.aliyun.com/) is the generative AI service
from Alibaba Cloud (Aliyun). You can create DashScope API key from
[here](https://bailian.console.aliyun.com/?apiKey=1#/api-key).

**Dependencies:** DashScopeRerank depends on `dashscope` python package.

**Twitter handle:** my twitter/x account is https://x.com/LastMonopoly
and I'd like a mention, thanks you!


**Tests and docs**
  1. integration test: `test_dashscope_rerank.py`
  2. example notebook: `dashscope_rerank.ipynb`

**Lint and test**: I have run `make format`, `make lint` and `make test`
from the root of the package I've modified.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
erick/docs-update-chatbedrock-with-tool-calling-docs-do-not-use
X-HAN 4 weeks ago committed by GitHub
parent 29064848f9
commit 62f13f95e4
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

@ -0,0 +1,387 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# DashScope Reranker\n",
"\n",
"This notebook shows how to use DashScope Reranker for document compression and retrieval. [DashScope](https://dashscope.aliyun.com/) is the generative AI service from Alibaba Cloud (Aliyun).\n",
"\n",
"DashScope's [Text ReRank Model](https://help.aliyun.com/document_detail/2780058.html?spm=a2c4g.2780059.0.0.6d995024FlrJ12) supports reranking documents with a maximum of 4000 tokens. Moreover, it supports Chinese, English, Japanese, Korean, Thai, Spanish, French, Portuguese, Indonesian, Arabic, and over 50 other languages. For more details, please visit [here](https://help.aliyun.com/document_detail/2780059.html?spm=a2c4g.2780058.0.0.3a9e5b1dWeOQjI)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet dashscope"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet faiss\n",
"\n",
"# OR (depending on Python version)\n",
"\n",
"%pip install --upgrade --quiet faiss-cpu"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# To create api key: https://bailian.console.aliyun.com/?apiKey=1#/api-key\n",
"\n",
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"DASHSCOPE_API_KEY\"] = getpass.getpass(\"DashScope API Key:\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# Helper function for printing docs\n",
"def pretty_print_docs(docs):\n",
" print(\n",
" f\"\\n{'-' * 100}\\n\".join(\n",
" [f\"Document {i+1}:\\n\\n\" + d.page_content for i, d in enumerate(docs)]\n",
" )\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set up the base vector store retriever\n",
"Let's start by initializing a simple vector store retriever and storing the 2023 State of the Union speech (in chunks). We can set up the retriever to retrieve a high number (20) of docs."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Document 1:\n",
"\n",
"I understand. \n",
"\n",
"I remember when my Dad had to leave our home in Scranton, Pennsylvania to find work. I grew up in a family where if the price of food went up, you felt it. \n",
"\n",
"Thats why one of the first things I did as President was fight to pass the American Rescue Plan. \n",
"\n",
"Because people were hurting. We needed to act, and we did. \n",
"\n",
"Few pieces of legislation have done more in a critical moment in our history to lift us out of crisis.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 2:\n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 3:\n",
"\n",
"To all Americans, I will be honest with you, as Ive always promised. A Russian dictator, invading a foreign country, has costs around the world. \n",
"\n",
"And Im taking robust action to make sure the pain of our sanctions is targeted at Russias economy. And I will use every tool at our disposal to protect American businesses and consumers. \n",
"\n",
"Tonight, I can announce that the United States has worked with 30 other countries to release 60 Million barrels of oil from reserves around the world.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 4:\n",
"\n",
"We cannot let this happen. \n",
"\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 5:\n",
"\n",
"Tonight I say to the Russian oligarchs and corrupt leaders who have bilked billions of dollars off this violent regime no more. \n",
"\n",
"The U.S. Department of Justice is assembling a dedicated task force to go after the crimes of Russian oligarchs. \n",
"\n",
"We are joining with our European allies to find and seize your yachts your luxury apartments your private jets. We are coming for your ill-begotten gains.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 6:\n",
"\n",
"Every Administration says theyll do it, but we are actually doing it. \n",
"\n",
"We will buy American to make sure everything from the deck of an aircraft carrier to the steel on highway guardrails are made in America. \n",
"\n",
"But to compete for the best jobs of the future, we also need to level the playing field with China and other competitors.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 7:\n",
"\n",
"When we invest in our workers, when we build the economy from the bottom up and the middle out together, we can do something we havent done in a long time: build a better America. \n",
"\n",
"For more than two years, COVID-19 has impacted every decision in our lives and the life of the nation. \n",
"\n",
"And I know youre tired, frustrated, and exhausted. \n",
"\n",
"But I also know this.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 8:\n",
"\n",
"A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n",
"\n",
"And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 9:\n",
"\n",
"My plan will not only lower costs to give families a fair shot, it will lower the deficit. \n",
"\n",
"The previous Administration not only ballooned the deficit with tax cuts for the very wealthy and corporations, it undermined the watchdogs whose job was to keep pandemic relief funds from being wasted. \n",
"\n",
"But in my administration, the watchdogs have been welcomed back. \n",
"\n",
"Were going after the criminals who stole billions in relief money meant for small businesses and millions of Americans.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 10:\n",
"\n",
"He will never extinguish their love of freedom. He will never weaken the resolve of the free world. \n",
"\n",
"We meet tonight in an America that has lived through two of the hardest years this nation has ever faced. \n",
"\n",
"The pandemic has been punishing. \n",
"\n",
"And so many families are living paycheck to paycheck, struggling to keep up with the rising cost of food, gas, housing, and so much more. \n",
"\n",
"I understand.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 11:\n",
"\n",
"And tonight, Im announcing that the Justice Department will name a chief prosecutor for pandemic fraud. \n",
"\n",
"By the end of this year, the deficit will be down to less than half what it was before I took office. \n",
"\n",
"The only president ever to cut the deficit by more than one trillion dollars in a single year. \n",
"\n",
"Lowering your costs also means demanding more competition. \n",
"\n",
"Im a capitalist, but capitalism without competition isnt capitalism. \n",
"\n",
"Its exploitation—and it drives up prices.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 12:\n",
"\n",
"Let each of us here tonight in this Chamber send an unmistakable signal to Ukraine and to the world. \n",
"\n",
"Please rise if you are able and show that, Yes, we the United States of America stand with the Ukrainian people. \n",
"\n",
"Throughout our history weve learned this lesson when dictators do not pay a price for their aggression they cause more chaos. \n",
"\n",
"They keep moving. \n",
"\n",
"And the costs and the threats to America and the world keep rising.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 13:\n",
"\n",
"Cancer is the #2 cause of death in Americasecond only to heart disease. \n",
"\n",
"Last month, I announced our plan to supercharge \n",
"the Cancer Moonshot that President Obama asked me to lead six years ago. \n",
"\n",
"Our goal is to cut the cancer death rate by at least 50% over the next 25 years, turn more cancers from death sentences into treatable diseases. \n",
"\n",
"More support for patients and families. \n",
"\n",
"To get there, I call on Congress to fund ARPA-H, the Advanced Research Projects Agency for Health.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 14:\n",
"\n",
"It fueled our efforts to vaccinate the nation and combat COVID-19. It delivered immediate economic relief for tens of millions of Americans. \n",
"\n",
"Helped put food on their table, keep a roof over their heads, and cut the cost of health insurance. \n",
"\n",
"And as my Dad used to say, it gave people a little breathing room.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 15:\n",
"\n",
"America will lead that effort, releasing 30 Million barrels from our own Strategic Petroleum Reserve. And we stand ready to do more if necessary, unified with our allies. \n",
"\n",
"These steps will help blunt gas prices here at home. And I know the news about whats happening can seem alarming. \n",
"\n",
"But I want you to know that we are going to be okay. \n",
"\n",
"When the history of this era is written Putins war on Ukraine will have left Russia weaker and the rest of the world stronger.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 16:\n",
"\n",
"So thats my plan. It will grow the economy and lower costs for families. \n",
"\n",
"So what are we waiting for? Lets get this done. And while youre at it, confirm my nominees to the Federal Reserve, which plays a critical role in fighting inflation. \n",
"\n",
"My plan will not only lower costs to give families a fair shot, it will lower the deficit.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 17:\n",
"\n",
"And we will, as one people. \n",
"\n",
"One America. \n",
"\n",
"The United States of America. \n",
"\n",
"May God bless you all. May God protect our troops.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 18:\n",
"\n",
"As Ive told Xi Jinping, it is never a good bet to bet against the American people. \n",
"\n",
"Well create good jobs for millions of Americans, modernizing roads, airports, ports, and waterways all across America. \n",
"\n",
"And well do it all to withstand the devastating effects of the climate crisis and promote environmental justice.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 19:\n",
"\n",
"And I know youre tired, frustrated, and exhausted. \n",
"\n",
"But I also know this. \n",
"\n",
"Because of the progress weve made, because of your resilience and the tools we have, tonight I can say \n",
"we are moving forward safely, back to more normal routines. \n",
"\n",
"Weve reached a new moment in the fight against COVID-19, with severe cases down to a level not seen since last July. \n",
"\n",
"Just a few days ago, the Centers for Disease Control and Prevention—the CDC—issued new mask guidelines.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 20:\n",
"\n",
"Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. \n",
"\n",
"Last year COVID-19 kept us apart. This year we are finally together again. \n",
"\n",
"Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n",
"\n",
"With a duty to one another to the American people to the Constitution. \n",
"\n",
"And with an unwavering resolve that freedom will always triumph over tyranny.\n"
]
}
],
"source": [
"from langchain_community.document_loaders import TextLoader\n",
"from langchain_community.embeddings.dashscope import DashScopeEmbeddings\n",
"from langchain_community.vectorstores.faiss import FAISS\n",
"from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
"\n",
"documents = TextLoader(\"../../how_to/state_of_the_union.txt\").load()\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)\n",
"texts = text_splitter.split_documents(documents)\n",
"retriever = FAISS.from_documents(texts, DashScopeEmbeddings()).as_retriever( # type: ignore\n",
" search_kwargs={\"k\": 20}\n",
")\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = retriever.invoke(query)\n",
"pretty_print_docs(docs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reranking with DashScopeRerank\n",
"Now let's wrap our base retriever with a `ContextualCompressionRetriever`. We'll use the `DashScopeRerank` to rerank the returned results."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Document 1:\n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 2:\n",
"\n",
"Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. \n",
"\n",
"Last year COVID-19 kept us apart. This year we are finally together again. \n",
"\n",
"Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n",
"\n",
"With a duty to one another to the American people to the Constitution. \n",
"\n",
"And with an unwavering resolve that freedom will always triumph over tyranny.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 3:\n",
"\n",
"Tonight I say to the Russian oligarchs and corrupt leaders who have bilked billions of dollars off this violent regime no more. \n",
"\n",
"The U.S. Department of Justice is assembling a dedicated task force to go after the crimes of Russian oligarchs. \n",
"\n",
"We are joining with our European allies to find and seize your yachts your luxury apartments your private jets. We are coming for your ill-begotten gains.\n"
]
}
],
"source": [
"from langchain.retrievers import ContextualCompressionRetriever\n",
"from langchain_community.document_compressors.dashscope_rerank import DashScopeRerank\n",
"\n",
"compressor = DashScopeRerank()\n",
"compression_retriever = ContextualCompressionRetriever(\n",
" base_compressor=compressor, base_retriever=retriever\n",
")\n",
"\n",
"compressed_docs = compression_retriever.invoke(\n",
" \"What did the president say about Ketanji Jackson Brown\"\n",
")\n",
"pretty_print_docs(compressed_docs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -2,6 +2,9 @@ import importlib
from typing import TYPE_CHECKING, Any
if TYPE_CHECKING:
from langchain_community.document_compressors.dashscope_rerank import (
DashScopeRerank,
)
from langchain_community.document_compressors.flashrank_rerank import (
FlashrankRerank,
)
@ -25,6 +28,7 @@ _module_lookup = {
"JinaRerank": "langchain_community.document_compressors.jina_rerank",
"RankLLMRerank": "langchain_community.document_compressors.rankllm_rerank",
"FlashrankRerank": "langchain_community.document_compressors.flashrank_rerank",
"DashScopeRerank": "langchain_community.document_compressors.dashscope_rerank",
}
@ -41,4 +45,5 @@ __all__ = [
"FlashrankRerank",
"JinaRerank",
"RankLLMRerank",
"DashScopeRerank",
]

@ -0,0 +1,119 @@
from __future__ import annotations
from copy import deepcopy
from typing import Any, Dict, List, Optional, Sequence, Union
from langchain_core.callbacks.base import Callbacks
from langchain_core.documents import BaseDocumentCompressor, Document
from langchain_core.pydantic_v1 import Extra, Field, root_validator
from langchain_core.utils import get_from_dict_or_env
class DashScopeRerank(BaseDocumentCompressor):
"""Document compressor that uses `DashScope Rerank API`."""
client: Any = None
"""DashScope client to use for compressing documents."""
model: Optional[str] = None
"""Model to use for reranking."""
top_n: Optional[int] = 3
"""Number of documents to return."""
dashscope_api_key: Optional[str] = Field(None, alias="api_key")
"""DashScope API key. Must be specified directly or via environment variable
DASHSCOPE_API_KEY."""
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
arbitrary_types_allowed = True
allow_population_by_field_name = True
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that api key and python package exists in environment."""
if not values.get("client"):
try:
import dashscope
except ImportError:
raise ImportError(
"Could not import dashscope python package. "
"Please install it with `pip install dashscope`."
)
values["client"] = dashscope.TextReRank
values["dashscope_api_key"] = get_from_dict_or_env(
values, "dashscope_api_key", "DASHSCOPE_API_KEY"
)
values["model"] = dashscope.TextReRank.Models.gte_rerank
return values
def rerank(
self,
documents: Sequence[Union[str, Document, dict]],
query: str,
*,
top_n: Optional[int] = -1,
) -> List[Dict[str, Any]]:
"""Returns an ordered list of documents ordered by their relevance to the provided query.
Args:
query: The query to use for reranking.
documents: A sequence of documents to rerank.
top_n : The number of results to return. If None returns all results.
Defaults to self.top_n.
""" # noqa: E501
if len(documents) == 0: # to avoid empty api call
return []
docs = [
doc.page_content if isinstance(doc, Document) else doc for doc in documents
]
top_n = top_n if (top_n is None or top_n > 0) else self.top_n
results = self.client.call(
model=self.model,
query=query,
documents=docs,
top_n=top_n,
return_documents=False,
api_key=self.dashscope_api_key,
)
result_dicts = []
for res in results.output.results:
result_dicts.append(
{"index": res.index, "relevance_score": res.relevance_score}
)
return result_dicts
def compress_documents(
self,
documents: Sequence[Document],
query: str,
callbacks: Optional[Callbacks] = None,
) -> Sequence[Document]:
"""
Compress documents using DashScope's rerank API.
Args:
documents: A sequence of documents to compress.
query: The query to use for compressing the documents.
callbacks: Callbacks to run during the compression process.
Returns:
A sequence of compressed documents.
"""
compressed = []
for res in self.rerank(documents, query):
doc = documents[res["index"]]
doc_copy = Document(doc.page_content, metadata=deepcopy(doc.metadata))
doc_copy.metadata["relevance_score"] = res["relevance_score"]
compressed.append(doc_copy)
return compressed

@ -0,0 +1,24 @@
from langchain_core.documents import Document
from langchain_community.document_compressors.dashscope_rerank import (
DashScopeRerank,
)
def test_rerank() -> None:
reranker = DashScopeRerank(api_key=None)
docs = [
Document(page_content="量子计算是计算科学的一个前沿领域"),
Document(page_content="预训练语言模型的发展给文本排序模型带来了新的进展"),
Document(
page_content="文本排序模型广泛用于搜索引擎和推荐系统中,它们根据文本相关性对候选文本进行排序"
),
Document(page_content="random text for nothing"),
]
compressed = reranker.compress_documents(
query="什么是文本排序模型",
documents=docs,
)
assert len(compressed) == 3, "default top_n is 3"
assert compressed[0].page_content == docs[2].page_content, "rerank works"

@ -6,6 +6,7 @@ EXPECTED_ALL = [
"JinaRerank",
"RankLLMRerank",
"FlashrankRerank",
"DashScopeRerank",
]

Loading…
Cancel
Save