voyageai[patch]: VoyageAI rerank (#19521)

Adding VoyageAI reranking

---------

Co-authored-by: fodizoltan <zoltan@conway.expert>
Co-authored-by: Yujie Qian <thomasq0809@gmail.com>
pull/19627/head
fzowl 3 months ago committed by GitHub
parent 4d85485e71
commit aea2be5bf3

@ -0,0 +1,465 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "fc0db1bc",
"metadata": {},
"source": [
"# VoyageAI Reranker\n",
"\n",
">[Voyage AI](https://www.voyageai.com/) provides cutting-edge embedding/vectorization models.\n",
"\n",
"This notebook shows how to use [Voyage AI's rerank endpoint](https://api.voyageai.com/v1/rerank) in a retriever. This builds on top of ideas in the [ContextualCompressionRetriever](/docs/modules/data_connection/retrievers/contextual_compression/)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4f5973bb-7897-4340-a8ce-c3365ee73b2f",
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet voyageai\n",
"%pip install --upgrade --quiet langchain-voyageai"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b37bd138-4f3c-4d2c-bc4b-be705ce27a09",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%pip install --upgrade --quiet faiss\n",
"\n",
"# OR (depending on Python version)\n",
"\n",
"%pip install --upgrade --quiet faiss-cpu"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "c47b0b26-6d51-4beb-aedb-ad09740a9a2b",
"metadata": {},
"outputs": [],
"source": [
"# To obtain your key, create an account on https://www.voyageai.com\n",
"\n",
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"VOYAGE_API_KEY\"] = getpass.getpass(\"Voyage AI API Key:\")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "6fa3d916",
"metadata": {
"jp-MarkdownHeadingCollapsed": true,
"tags": []
},
"outputs": [],
"source": [
"# Helper function for printing docs\n",
"\n",
"\n",
"def pretty_print_docs(docs):\n",
"    print(\n",
"        f\"\\n{'-' * 100}\\n\".join(\n",
"            [f\"Document {i+1}:\\n\\n\" + d.page_content for i, d in enumerate(docs)]\n",
"        )\n",
"    )"
]
},
{
"cell_type": "markdown",
"id": "6fa3d916",
"metadata": {
"jp-MarkdownHeadingCollapsed": true,
"tags": []
},
"source": [
"## Set up the base vector store retriever\n",
"Let's start by initializing a simple vector store retriever and storing the 2022 State of the Union speech (in chunks). We configure the retriever to return a large number (20) of docs, which gives the reranker a wide candidate pool to choose from."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "b7648612",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Document 1:\n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.\n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 2:\n",
"\n",
"As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential.\n",
"\n",
"While it often appears that we never agree, that isnt true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 3:\n",
"\n",
"We cannot let this happen.\n",
"\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections.\n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 4:\n",
"\n",
"He will never extinguish their love of freedom. He will never weaken the resolve of the free world.\n",
"\n",
"We meet tonight in an America that has lived through two of the hardest years this nation has ever faced.\n",
"\n",
"The pandemic has been punishing.\n",
"\n",
"And so many families are living paycheck to paycheck, struggling to keep up with the rising cost of food, gas, housing, and so much more.\n",
"\n",
"I understand.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 5:\n",
"\n",
"As Ive told Xi Jinping, it is never a good bet to bet against the American people.\n",
"\n",
"Well create good jobs for millions of Americans, modernizing roads, airports, ports, and waterways all across America.\n",
"\n",
"And well do it all to withstand the devastating effects of the climate crisis and promote environmental justice.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 6:\n",
"\n",
"I understand.\n",
"\n",
"I remember when my Dad had to leave our home in Scranton, Pennsylvania to find work. I grew up in a family where if the price of food went up, you felt it.\n",
"\n",
"Thats why one of the first things I did as President was fight to pass the American Rescue Plan.\n",
"\n",
"Because people were hurting. We needed to act, and we did.\n",
"\n",
"Few pieces of legislation have done more in a critical moment in our history to lift us out of crisis.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 7:\n",
"\n",
"I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves.\n",
"\n",
"Ive worked on these issues a long time.\n",
"\n",
"I know what works: Investing in crime prevention and community police officers wholl walk the beat, wholl know the neighborhood, and who can restore trust and safety.\n",
"\n",
"So lets not abandon our streets. Or choose between safety and equal justice.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 8:\n",
"\n",
"My administration is providing assistance with job training and housing, and now helping lower-income veterans get VA care debt-free.\n",
"\n",
"Our troops in Iraq and Afghanistan faced many dangers.\n",
"\n",
"One was stationed at bases and breathing in toxic smoke from “burn pits” that incinerated wastes of war—medical and hazard material, jet fuel, and more.\n",
"\n",
"When they came home, many of the worlds fittest and best trained warriors were never the same.\n",
"\n",
"Headaches. Numbness. Dizziness.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 9:\n",
"\n",
"And tonight, Im announcing that the Justice Department will name a chief prosecutor for pandemic fraud.\n",
"\n",
"By the end of this year, the deficit will be down to less than half what it was before I took office.\n",
"\n",
"The only president ever to cut the deficit by more than one trillion dollars in a single year.\n",
"\n",
"Lowering your costs also means demanding more competition.\n",
"\n",
"Im a capitalist, but capitalism without competition isnt capitalism.\n",
"\n",
"Its exploitation—and it drives up prices.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 10:\n",
"\n",
"Headaches. Numbness. Dizziness.\n",
"\n",
"A cancer that would put them in a flag-draped coffin.\n",
"\n",
"I know.\n",
"\n",
"One of those soldiers was my son Major Beau Biden.\n",
"\n",
"We dont know for sure if a burn pit was the cause of his brain cancer, or the diseases of so many of our troops.\n",
"\n",
"But Im committed to finding out everything we can.\n",
"\n",
"Committed to military families like Danielle Robinson from Ohio.\n",
"\n",
"The widow of Sergeant First Class Heath Robinson.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 11:\n",
"\n",
"I recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera.\n",
"\n",
"They were responding to a 9-1-1 call when a man shot and killed them with a stolen gun.\n",
"\n",
"Officer Mora was 27 years old.\n",
"\n",
"Officer Rivera was 22.\n",
"\n",
"Both Dominican Americans whod grown up on the same streets they later chose to patrol as police officers.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 12:\n",
"\n",
"This was a bipartisan effort, and I want to thank the members of both parties who worked to make it happen.\n",
"\n",
"Were done talking about infrastructure weeks.\n",
"\n",
"Were going to have an infrastructure decade.\n",
"\n",
"It is going to transform America and put us on a path to win the economic competition of the 21st Century that we face with the rest of the world—particularly with China.\n",
"\n",
"As Ive told Xi Jinping, it is never a good bet to bet against the American people.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 13:\n",
"\n",
"So lets not abandon our streets. Or choose between safety and equal justice.\n",
"\n",
"Lets come together to protect our communities, restore trust, and hold law enforcement accountable.\n",
"\n",
"Thats why the Justice Department required body cameras, banned chokeholds, and restricted no-knock warrants for its officers.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 14:\n",
"\n",
"Lets pass the Paycheck Fairness Act and paid leave.\n",
"\n",
"Raise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty.\n",
"\n",
"Lets increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls Americas best-kept secret: community colleges.\n",
"\n",
"And lets pass the PRO Act when a majority of workers want to form a union—they shouldnt be stopped.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 15:\n",
"\n",
"He met the Ukrainian people.\n",
"\n",
"From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.\n",
"\n",
"Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland.\n",
"\n",
"In this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 16:\n",
"\n",
"To all Americans, I will be honest with you, as Ive always promised. A Russian dictator, invading a foreign country, has costs around the world.\n",
"\n",
"And Im taking robust action to make sure the pain of our sanctions is targeted at Russias economy. And I will use every tool at our disposal to protect American businesses and consumers.\n",
"\n",
"Tonight, I can announce that the United States has worked with 30 other countries to release 60 Million barrels of oil from reserves around the world.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 17:\n",
"\n",
"A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\n",
"\n",
"And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 18:\n",
"\n",
"But that trickle-down theory led to weaker economic growth, lower wages, bigger deficits, and the widest gap between those at the top and everyone else in nearly a century.\n",
"\n",
"Vice President Harris and I ran for office with a new economic vision for America.\n",
"\n",
"Invest in America. Educate Americans. Grow the workforce. Build the economy from the bottom up\n",
"and the middle out, not from the top down.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 19:\n",
"\n",
"Every Administration says theyll do it, but we are actually doing it.\n",
"\n",
"We will buy American to make sure everything from the deck of an aircraft carrier to the steel on highway guardrails are made in America.\n",
"\n",
"But to compete for the best jobs of the future, we also need to level the playing field with China and other competitors.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 20:\n",
"\n",
"The only nation that can be defined by a single word: possibilities.\n",
"\n",
"So on this night, in our 245th year as a nation, I have come to report on the State of the Union.\n",
"\n",
"And my report is this: the State of the Union is strong—because you, the American people, are strong.\n",
"\n",
"We are stronger today than we were a year ago.\n",
"\n",
"And we will be stronger a year from now than we are today.\n",
"\n",
"Now is our moment to meet and overcome the challenges of our time.\n",
"\n",
"And we will, as one people.\n",
"\n",
"One America.\n"
]
}
],
"source": [
"from langchain_community.document_loaders import TextLoader\n",
"from langchain_community.vectorstores import FAISS\n",
"from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
"from langchain_voyageai import VoyageAIEmbeddings\n",
"\n",
"documents = TextLoader(\"../../modules/state_of_the_union.txt\").load()\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)\n",
"texts = text_splitter.split_documents(documents)\n",
"retriever = FAISS.from_documents(\n",
"    texts, VoyageAIEmbeddings(model=\"voyage-2\")\n",
").as_retriever(search_kwargs={\"k\": 20})\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = retriever.get_relevant_documents(query)\n",
"pretty_print_docs(docs)"
]
},
{
"cell_type": "markdown",
"id": "b7648612",
"metadata": {},
"source": [
"## Doing reranking with VoyageAIRerank\n",
"Now let's wrap our base retriever with a `ContextualCompressionRetriever`. We'll add a `VoyageAIRerank` compressor, which uses the Voyage AI rerank endpoint to rerank the returned results and keep only the top `top_k` of them."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "b83dfedb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Document 1:\n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.\n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 2:\n",
"\n",
"So let’s not abandon our streets. Or choose between safety and equal justice.\n",
"\n",
"Let’s come together to protect our communities, restore trust, and hold law enforcement accountable.\n",
"\n",
"That’s why the Justice Department required body cameras, banned chokeholds, and restricted no-knock warrants for its officers.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 3:\n",
"\n",
"I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves.\n",
"\n",
"I’ve worked on these issues a long time.\n",
"\n",
"I know what works: Investing in crime prevention and community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and safety.\n",
"\n",
"So let’s not abandon our streets. Or choose between safety and equal justice.\n"
]
}
],
"source": [
"from langchain.retrievers import ContextualCompressionRetriever\n",
"from langchain_openai import OpenAI\n",
"from langchain_voyageai import VoyageAIRerank\n",
"\n",
"llm = OpenAI(temperature=0)\n",
"compressor = VoyageAIRerank(\n",
"    model=\"rerank-lite-1\", voyage_api_key=os.environ[\"VOYAGE_API_KEY\"], top_k=3\n",
")\n",
"compression_retriever = ContextualCompressionRetriever(\n",
"    base_compressor=compressor, base_retriever=retriever\n",
")\n",
"\n",
"compressed_docs = compression_retriever.get_relevant_documents(\n",
"    \"What did the president say about Ketanji Brown Jackson\"\n",
")\n",
"pretty_print_docs(compressed_docs)"
]
},
{
"cell_type": "markdown",
"id": "b83dfedb",
"metadata": {},
"source": [
"You can of course use this retriever within a QA pipeline."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "367dafe0",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import RetrievalQA"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "ae697ca4",
"metadata": {},
"outputs": [],
"source": [
"chain = RetrievalQA.from_chain_type(\n",
"    llm=OpenAI(temperature=0), retriever=compression_retriever\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "46ee62fc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'query': 'What did the president say about Ketanji Brown Jackson',\n",
" 'result': \" The president nominated Ketanji Brown Jackson to serve on the United States Supreme Court. \"}"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain({\"query\": query})"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -22,3 +22,12 @@ See a [usage example](/docs/integrations/text_embedding/voyageai)
```python
from langchain_voyageai import VoyageAIEmbeddings
```
## Reranking
See a [usage example](/docs/integrations/document_transformers/voyageai-reranker)
```python
from langchain_voyageai import VoyageAIRerank
```
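
As a rough illustration of what the reranker returns, the sketch below mimics how `compress_documents` maps rerank results back onto copies of the input documents and attaches a `relevance_score` to their metadata. It uses only the standard library so it runs offline: `Doc`, `RerankResult`, `fake_rerank`, and `compress` are hypothetical stand-ins (not part of `langchain_voyageai`), and the word-overlap scorer is a toy replacement for the real Voyage AI rerank call.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import Any, Dict, List, Sequence


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class RerankResult:
    """Hypothetical stand-in for one item of a rerank response."""

    index: int  # position of the document in the input list
    relevance_score: float


def fake_rerank(query: str, texts: Sequence[str], top_k: int) -> List[RerankResult]:
    """Toy scorer (fraction of query words found in the text).

    Assumption: the real scores come from the Voyage AI rerank endpoint.
    """
    query_words = set(query.lower().split())
    scored = [
        RerankResult(i, len(query_words & set(t.lower().split())) / len(query_words))
        for i, t in enumerate(texts)
    ]
    scored.sort(key=lambda r: r.relevance_score, reverse=True)
    return scored[:top_k]


def compress(docs: Sequence[Doc], query: str, top_k: int) -> List[Doc]:
    """Return copies of the top_k documents, best first, with their scores."""
    if not docs:
        return []
    compressed = []
    for res in fake_rerank(query, [d.page_content for d in docs], top_k):
        src = docs[res.index]
        # Copy the metadata so the caller's documents are left untouched.
        doc_copy = Doc(src.page_content, deepcopy(src.metadata))
        doc_copy.metadata["relevance_score"] = res.relevance_score
        compressed.append(doc_copy)
    return compressed


docs = [
    Doc("Rivers provide water and habitat", {"source": "a"}),
    Doc("The conference call is scheduled for Thursday", {"source": "b"}),
]
top = compress(docs, "when is the conference call", top_k=1)
print(top[0].page_content, top[0].metadata)
```

With the real `VoyageAIRerank`, the returned documents carry the same `relevance_score` metadata key, so downstream code can filter or log by score without calling the API again.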

@ -0,0 +1,153 @@
from __future__ import annotations

import os
from copy import deepcopy
from typing import Dict, Optional, Sequence, Union

import voyageai  # type: ignore
from langchain_core.callbacks.manager import Callbacks
from langchain_core.documents import Document
from langchain_core.documents.compressor import BaseDocumentCompressor
from langchain_core.pydantic_v1 import SecretStr, root_validator
from langchain_core.utils import convert_to_secret_str
from voyageai.object import RerankingObject  # type: ignore


class VoyageAIRerank(BaseDocumentCompressor):
    """Document compressor that uses `VoyageAI Rerank API`."""

    client: voyageai.Client = None
    aclient: voyageai.AsyncClient = None
    """VoyageAI clients to use for compressing documents."""
    voyage_api_key: Optional[SecretStr] = None
    """VoyageAI API key. Must be specified directly or via environment variable
    VOYAGE_API_KEY."""
    model: str
    """Model to use for reranking."""
    top_k: Optional[int] = None
    """Number of documents to return."""
    truncation: bool = True

    class Config:
        arbitrary_types_allowed = True

    @root_validator(pre=True)
    def validate_environment(cls, values: Dict) -> Dict:
        """Validate that api key exists in environment."""
        voyage_api_key = values.get("voyage_api_key") or os.getenv(
            "VOYAGE_API_KEY", None
        )
        if voyage_api_key:
            api_key_secretstr = convert_to_secret_str(voyage_api_key)
            values["voyage_api_key"] = api_key_secretstr

            api_key_str = api_key_secretstr.get_secret_value()
        else:
            api_key_str = None

        values["client"] = voyageai.Client(api_key=api_key_str)
        values["aclient"] = voyageai.AsyncClient(api_key=api_key_str)

        return values

    def _rerank(
        self,
        documents: Sequence[Union[str, Document]],
        query: str,
    ) -> RerankingObject:
        """Returns an ordered list of documents ordered by their relevance
        to the provided query.

        Args:
            query: The query to use for reranking.
            documents: A sequence of documents to rerank.
        """
        docs = [
            doc.page_content if isinstance(doc, Document) else doc for doc in documents
        ]
        return self.client.rerank(
            query=query,
            documents=docs,
            model=self.model,
            top_k=self.top_k,
            truncation=self.truncation,
        )

    async def _arerank(
        self,
        documents: Sequence[Union[str, Document]],
        query: str,
    ) -> RerankingObject:
        """Returns an ordered list of documents ordered by their relevance
        to the provided query.

        Args:
            query: The query to use for reranking.
            documents: A sequence of documents to rerank.
        """
        docs = [
            doc.page_content if isinstance(doc, Document) else doc for doc in documents
        ]
        return await self.aclient.rerank(
            query=query,
            documents=docs,
            model=self.model,
            top_k=self.top_k,
            truncation=self.truncation,
        )

    def compress_documents(
        self,
        documents: Sequence[Document],
        query: str,
        callbacks: Optional[Callbacks] = None,
    ) -> Sequence[Document]:
        """
        Compress documents using VoyageAI's rerank API.

        Args:
            documents: A sequence of documents to compress.
            query: The query to use for compressing the documents.
            callbacks: Callbacks to run during the compression process.

        Returns:
            A sequence of compressed documents in relevance_score order.
        """
        if len(documents) == 0:
            return []

        compressed = []
        for res in self._rerank(documents, query).results:
            doc = documents[res.index]
            doc_copy = Document(doc.page_content, metadata=deepcopy(doc.metadata))
            doc_copy.metadata["relevance_score"] = res.relevance_score
            compressed.append(doc_copy)
        return compressed

    async def acompress_documents(
        self,
        documents: Sequence[Document],
        query: str,
        callbacks: Optional[Callbacks] = None,
    ) -> Sequence[Document]:
        """
        Compress documents using VoyageAI's rerank API.

        Args:
            documents: A sequence of documents to compress.
            query: The query to use for compressing the documents.
            callbacks: Callbacks to run during the compression process.

        Returns:
            A sequence of compressed documents in relevance_score order.
        """
        if len(documents) == 0:
            return []

        compressed = []
        for res in (await self._arerank(documents, query)).results:
            doc = documents[res.index]
            doc_copy = Document(doc.page_content, metadata=deepcopy(doc.metadata))
            doc_copy.metadata["relevance_score"] = res.relevance_score
            compressed.append(doc_copy)
        return compressed

@ -0,0 +1,68 @@
"""Test the voyageai reranker."""
import os

from langchain_core.documents import Document

from langchain_voyageai.rerank import VoyageAIRerank


def test_voyageai_reranker_init() -> None:
    """Test the voyageai reranker initializes correctly."""
    VoyageAIRerank(voyage_api_key="foo", model="foo")


def test_sync() -> None:
    rerank = VoyageAIRerank(
        voyage_api_key=os.environ["VOYAGE_API_KEY"],
        model="rerank-lite-1",
    )
    doc_list = [
        "The Mediterranean diet emphasizes fish, olive oil, and vegetables"
        ", believed to reduce chronic diseases.",
        "Photosynthesis in plants converts light energy into glucose and "
        "produces essential oxygen.",
        "20th-century innovations, from radios to smartphones, centered "
        "on electronic advancements.",
        "Rivers provide water, irrigation, and habitat for aquatic species, "
        "vital for ecosystems.",
        "Apple’s conference call to discuss fourth fiscal quarter results and "
        "business updates is scheduled for Thursday, November 2, 2023 at 2:00 "
        "p.m. PT / 5:00 p.m. ET.",
        "Shakespeare's works, like 'Hamlet' and 'A Midsummer Night's Dream,' "
        "endure in literature.",
    ]
    documents = [Document(page_content=x) for x in doc_list]

    result = rerank.compress_documents(
        query="When is the Apple's conference call scheduled?", documents=documents
    )
    assert len(doc_list) == len(result)


async def test_async() -> None:
    rerank = VoyageAIRerank(
        voyage_api_key=os.environ["VOYAGE_API_KEY"],
        model="rerank-lite-1",
    )
    doc_list = [
        "The Mediterranean diet emphasizes fish, olive oil, and vegetables"
        ", believed to reduce chronic diseases.",
        "Photosynthesis in plants converts light energy into glucose and "
        "produces essential oxygen.",
        "20th-century innovations, from radios to smartphones, centered "
        "on electronic advancements.",
        "Rivers provide water, irrigation, and habitat for aquatic species, "
        "vital for ecosystems.",
        "Apple’s conference call to discuss fourth fiscal quarter results and "
        "business updates is scheduled for Thursday, November 2, 2023 at 2:00 "
        "p.m. PT / 5:00 p.m. ET.",
        "Shakespeare's works, like 'Hamlet' and 'A Midsummer Night's Dream,' "
        "endure in literature.",
    ]
    documents = [Document(page_content=x) for x in doc_list]

    result = await rerank.acompress_documents(
        query="When is the Apple's conference call scheduled?", documents=documents
    )
    assert len(doc_list) == len(result)

@ -0,0 +1,83 @@
from collections import namedtuple
from typing import Any

import pytest  # type: ignore
from langchain_core.documents import Document
from voyageai.api_resources import VoyageResponse  # type: ignore
from voyageai.object import RerankingObject  # type: ignore

from langchain_voyageai.rerank import VoyageAIRerank

doc_list = [
    "The Mediterranean diet emphasizes fish, olive oil, and vegetables"
    ", believed to reduce chronic diseases.",
    "Photosynthesis in plants converts light energy into glucose and "
    "produces essential oxygen.",
    "20th-century innovations, from radios to smartphones, centered "
    "on electronic advancements.",
    "Rivers provide water, irrigation, and habitat for aquatic species, "
    "vital for ecosystems.",
    "Apple’s conference call to discuss fourth fiscal quarter results and "
    "business updates is scheduled for Thursday, November 2, 2023 at 2:00 "
    "p.m. PT / 5:00 p.m. ET.",
    "Shakespeare's works, like 'Hamlet' and 'A Midsummer Night's Dream,' "
    "endure in literature.",
]
documents = [Document(page_content=x) for x in doc_list]


@pytest.mark.requires("voyageai")
def test_init() -> None:
    VoyageAIRerank(
        voyage_api_key="foo",
        model="rerank-lite-1",
    )


def get_mock_rerank_result() -> RerankingObject:
    VoyageResultItem = namedtuple("VoyageResultItem", ["index", "relevance_score"])
    Usage = namedtuple("Usage", ["total_tokens"])
    voyage_response = VoyageResponse()
    voyage_response.data = [
        VoyageResultItem(index=1, relevance_score=0.9),
        VoyageResultItem(index=0, relevance_score=0.8),
    ]
    voyage_response.usage = Usage(total_tokens=255)
    return RerankingObject(response=voyage_response, documents=doc_list)


@pytest.mark.requires("voyageai")
def test_rerank_unit_test(mocker: Any) -> None:
    mocker.patch("voyageai.Client.rerank").return_value = get_mock_rerank_result()
    expected_result = [
        Document(
            page_content="Photosynthesis in plants converts light energy into "
            "glucose and produces essential oxygen.",
            metadata={"relevance_score": 0.9},
        ),
        Document(
            page_content="The Mediterranean diet emphasizes fish, olive oil, and "
            "vegetables, believed to reduce chronic diseases.",
            metadata={"relevance_score": 0.8},
        ),
    ]
    rerank = VoyageAIRerank(
        voyage_api_key="foo",
        model="rerank-lite-1",
    )
    result = rerank.compress_documents(
        documents=documents, query="When is the Apple's conference call scheduled?"
    )
    assert expected_result == result


def test_rerank_empty_input() -> None:
    rerank = VoyageAIRerank(
        voyage_api_key="foo",
        model="rerank-lite-1",
    )
    result = rerank.compress_documents(
        documents=[], query="When is the Apple's conference call scheduled?"
    )
    assert len(result) == 0