Add Cohere retrieval augmented generation to retrievers (#11483)

Co-authored-by: Bagatur <baskaryan@gmail.com>
billytrend-cohere authored 10 months ago; committed by GitHub
parent 0a24ac7388
commit f4742dce50

@@ -0,0 +1,223 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "bf733a38-db84-4363-89e2-de6735c37230",
"metadata": {},
"source": [
"# Cohere RAG retriever\n",
"\n",
"This notebook covers how to get started with Cohere RAG retriever. This allows you to leverage the ability to search documents over various connectors or by supplying your own."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "d4a7c55d-b235-4ca4-a579-c90cc9570da9",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.chat_models import ChatCohere\n",
"from langchain.retrievers import CohereRagRetriever\n",
"from langchain.schema.document import Document"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "70cf04e8-423a-4ff6-8b09-f11fb711c817",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"rag = CohereRagRetriever(llm=ChatCohere())"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "7f1e1bf3-542c-4fcb-8643-de6897fa6fcc",
"metadata": {},
"outputs": [],
"source": [
"def _pretty_print(docs):\n",
" for doc in docs:\n",
" print(doc.metadata)\n",
" print(\"\\n\\n\" + doc.page_content)\n",
" print(\"\\n\\n\" + \"-\" * 30 + \"\\n\\n\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "8199ef8f-eb8b-4253-9ea0-6c24a013ca4c",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'id': 'web-search_4:0', 'snippet': 'AI startup Cohere, now valued at over $2.1B, raises $270M\\n\\nKyle Wiggers 4 months\\n\\nIn a sign that theres plenty of cash to go around for generative AI startups, Cohere, which is developing an AI model ecosystem for the enterprise, today announced that it raised $270 million as part of its Series C round.\\n\\nReuters reported earlier in the year that Cohere was in talks to raise “hundreds of millions” of dollars at a valuation of upward of just over $6 billion. If theres credence to that reporting, Cohere appears to have missed the valuation mark substantially; a source familiar with the matter tells TechCrunch that this tranche values the company at between $2.1 billion and $2.2 billion.', 'title': 'AI startup Cohere, now valued at over $2.1B, raises $270M | TechCrunch', 'url': 'https://techcrunch.com/2023/06/08/ai-startup-cohere-now-valued-at-over-2-1b-raises-270m/'}\n",
"\n",
"\n",
"AI startup Cohere, now valued at over $2.1B, raises $270M\n",
"\n",
"Kyle Wiggers 4 months\n",
"\n",
"In a sign that theres plenty of cash to go around for generative AI startups, Cohere, which is developing an AI model ecosystem for the enterprise, today announced that it raised $270 million as part of its Series C round.\n",
"\n",
"Reuters reported earlier in the year that Cohere was in talks to raise “hundreds of millions” of dollars at a valuation of upward of just over $6 billion. If theres credence to that reporting, Cohere appears to have missed the valuation mark substantially; a source familiar with the matter tells TechCrunch that this tranche values the company at between $2.1 billion and $2.2 billion.\n",
"\n",
"\n",
"------------------------------\n",
"\n",
"\n",
"{'id': 'web-search_9:0', 'snippet': 'Cohere is a Canadian multinational technology company focused on artificial intelligence for the enterprise, specializing in large language models. Cohere was founded in 2019 by Aidan Gomez, Ivan Zhang, and Nick Frosst, and is headquartered in Toronto and San Francisco, with offices in Palo Alto and London.\\n\\nIn 2017, a team of researchers at Google Brain, which included Aidan Gomez, published a paper called \"Attention is All You Need,\" which introduced the transformer machine learning architecture, setting state-of-the-art performance on a variety of natural language processing tasks. In 2019, Gomez and Nick Frosst, another researcher at Google Brain, founded Cohere along with Ivan Zhang, with whom Gomez had done research at FOR.ai. All of the co-founders attended University of Toronto.', 'title': 'Cohere - Wikipedia', 'url': 'https://en.wikipedia.org/wiki/Cohere'}\n",
"\n",
"\n",
"Cohere is a Canadian multinational technology company focused on artificial intelligence for the enterprise, specializing in large language models. Cohere was founded in 2019 by Aidan Gomez, Ivan Zhang, and Nick Frosst, and is headquartered in Toronto and San Francisco, with offices in Palo Alto and London.\n",
"\n",
"In 2017, a team of researchers at Google Brain, which included Aidan Gomez, published a paper called \"Attention is All You Need,\" which introduced the transformer machine learning architecture, setting state-of-the-art performance on a variety of natural language processing tasks. In 2019, Gomez and Nick Frosst, another researcher at Google Brain, founded Cohere along with Ivan Zhang, with whom Gomez had done research at FOR.ai. All of the co-founders attended University of Toronto.\n",
"\n",
"\n",
"------------------------------\n",
"\n",
"\n",
"{'id': 'web-search_8:2', 'snippet': ' Cofounded by Aidan Gomez, a Google Brain alum and coauthor of the seminal transformer research paper, Cohere describes itself as being “on a mission to transform enterprises and their products with AI to unlock a more intuitive way to generate, search, and summarize information than ever before.” One key element of Coheres approach is its focus on data protection, deploying its models inside enterprises secure data environment.\\n\\n“We are both independent and cloud-agnostic, meaning we are not beholden to any one tech company and empower enterprises to implement customized AI solutions on the cloud of their choosing, or even on-premises,” says Martin Kon, COO and president of Cohere.', 'title': 'McKinsey and Cohere collaborate to transform clients with enterprise generative AI', 'url': 'https://www.mckinsey.com/about-us/new-at-mckinsey-blog/mckinsey-and-cohere-collaborate-to-transform-clients-with-enterprise-generative-ai'}\n",
"\n",
"\n",
" Cofounded by Aidan Gomez, a Google Brain alum and coauthor of the seminal transformer research paper, Cohere describes itself as being “on a mission to transform enterprises and their products with AI to unlock a more intuitive way to generate, search, and summarize information than ever before.” One key element of Coheres approach is its focus on data protection, deploying its models inside enterprises secure data environment.\n",
"\n",
"“We are both independent and cloud-agnostic, meaning we are not beholden to any one tech company and empower enterprises to implement customized AI solutions on the cloud of their choosing, or even on-premises,” says Martin Kon, COO and president of Cohere.\n",
"\n",
"\n",
"------------------------------\n",
"\n",
"\n"
]
}
],
"source": [
"_pretty_print(rag.get_relevant_documents(\"What is cohere ai?\"))"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "4b888336",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'id': 'web-search_9:0', 'snippet': 'Cohere is a Canadian multinational technology company focused on artificial intelligence for the enterprise, specializing in large language models. Cohere was founded in 2019 by Aidan Gomez, Ivan Zhang, and Nick Frosst, and is headquartered in Toronto and San Francisco, with offices in Palo Alto and London.\\n\\nIn 2017, a team of researchers at Google Brain, which included Aidan Gomez, published a paper called \"Attention is All You Need,\" which introduced the transformer machine learning architecture, setting state-of-the-art performance on a variety of natural language processing tasks. In 2019, Gomez and Nick Frosst, another researcher at Google Brain, founded Cohere along with Ivan Zhang, with whom Gomez had done research at FOR.ai. All of the co-founders attended University of Toronto.', 'title': 'Cohere - Wikipedia', 'url': 'https://en.wikipedia.org/wiki/Cohere'}\n",
"\n",
"\n",
"Cohere is a Canadian multinational technology company focused on artificial intelligence for the enterprise, specializing in large language models. Cohere was founded in 2019 by Aidan Gomez, Ivan Zhang, and Nick Frosst, and is headquartered in Toronto and San Francisco, with offices in Palo Alto and London.\n",
"\n",
"In 2017, a team of researchers at Google Brain, which included Aidan Gomez, published a paper called \"Attention is All You Need,\" which introduced the transformer machine learning architecture, setting state-of-the-art performance on a variety of natural language processing tasks. In 2019, Gomez and Nick Frosst, another researcher at Google Brain, founded Cohere along with Ivan Zhang, with whom Gomez had done research at FOR.ai. All of the co-founders attended University of Toronto.\n",
"\n",
"\n",
"------------------------------\n",
"\n",
"\n",
"{'id': 'web-search_8:2', 'snippet': ' Cofounded by Aidan Gomez, a Google Brain alum and coauthor of the seminal transformer research paper, Cohere describes itself as being “on a mission to transform enterprises and their products with AI to unlock a more intuitive way to generate, search, and summarize information than ever before.” One key element of Coheres approach is its focus on data protection, deploying its models inside enterprises secure data environment.\\n\\n“We are both independent and cloud-agnostic, meaning we are not beholden to any one tech company and empower enterprises to implement customized AI solutions on the cloud of their choosing, or even on-premises,” says Martin Kon, COO and president of Cohere.', 'title': 'McKinsey and Cohere collaborate to transform clients with enterprise generative AI', 'url': 'https://www.mckinsey.com/about-us/new-at-mckinsey-blog/mckinsey-and-cohere-collaborate-to-transform-clients-with-enterprise-generative-ai'}\n",
"\n",
"\n",
" Cofounded by Aidan Gomez, a Google Brain alum and coauthor of the seminal transformer research paper, Cohere describes itself as being “on a mission to transform enterprises and their products with AI to unlock a more intuitive way to generate, search, and summarize information than ever before.” One key element of Coheres approach is its focus on data protection, deploying its models inside enterprises secure data environment.\n",
"\n",
"“We are both independent and cloud-agnostic, meaning we are not beholden to any one tech company and empower enterprises to implement customized AI solutions on the cloud of their choosing, or even on-premises,” says Martin Kon, COO and president of Cohere.\n",
"\n",
"\n",
"------------------------------\n",
"\n",
"\n",
"{'id': 'web-search_4:0', 'snippet': 'AI startup Cohere, now valued at over $2.1B, raises $270M\\n\\nKyle Wiggers 4 months\\n\\nIn a sign that theres plenty of cash to go around for generative AI startups, Cohere, which is developing an AI model ecosystem for the enterprise, today announced that it raised $270 million as part of its Series C round.\\n\\nReuters reported earlier in the year that Cohere was in talks to raise “hundreds of millions” of dollars at a valuation of upward of just over $6 billion. If theres credence to that reporting, Cohere appears to have missed the valuation mark substantially; a source familiar with the matter tells TechCrunch that this tranche values the company at between $2.1 billion and $2.2 billion.', 'title': 'AI startup Cohere, now valued at over $2.1B, raises $270M | TechCrunch', 'url': 'https://techcrunch.com/2023/06/08/ai-startup-cohere-now-valued-at-over-2-1b-raises-270m/'}\n",
"\n",
"\n",
"AI startup Cohere, now valued at over $2.1B, raises $270M\n",
"\n",
"Kyle Wiggers 4 months\n",
"\n",
"In a sign that theres plenty of cash to go around for generative AI startups, Cohere, which is developing an AI model ecosystem for the enterprise, today announced that it raised $270 million as part of its Series C round.\n",
"\n",
"Reuters reported earlier in the year that Cohere was in talks to raise “hundreds of millions” of dollars at a valuation of upward of just over $6 billion. If theres credence to that reporting, Cohere appears to have missed the valuation mark substantially; a source familiar with the matter tells TechCrunch that this tranche values the company at between $2.1 billion and $2.2 billion.\n",
"\n",
"\n",
"------------------------------\n",
"\n",
"\n"
]
}
],
"source": [
"_pretty_print(await rag.aget_relevant_documents(\"What is cohere ai?\")) # async version"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "3742ba0f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'id': 'doc-0', 'snippet': 'Langchain supports cohere RAG!'}\n",
"\n",
"\n",
"Langchain supports cohere RAG!\n",
"\n",
"\n",
"------------------------------\n",
"\n",
"\n"
]
}
],
"source": [
"docs = rag.get_relevant_documents(\n",
" \"Does langchain support cohere RAG?\",\n",
" source_documents=[Document(page_content=\"Langchain supports cohere RAG!\"), Document(page_content=\"The sky is blue!\")]\n",
" )\n",
"_pretty_print(docs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f96b4b09-7e0b-412f-bd7b-2b2d19d8ac6a",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "poetry-venv",
"language": "python",
"name": "poetry-venv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
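A note on configuration: the notebook above assumes a Cohere API key is already available to ChatCohere. A minimal setup sketch, assuming the standard COHERE_API_KEY environment variable and the cohere_api_key constructor argument (the key value is a placeholder):

import os

from langchain.chat_models import ChatCohere
from langchain.retrievers import CohereRagRetriever

# Option 1: export the key in the environment before constructing the model.
os.environ["COHERE_API_KEY"] = "<your-api-key>"
rag = CohereRagRetriever(llm=ChatCohere())

# Option 2: pass the key explicitly to the chat model.
rag = CohereRagRetriever(llm=ChatCohere(cohere_api_key="<your-api-key>"))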

@@ -32,6 +32,44 @@ def get_role(message: BaseMessage) -> str:
raise ValueError(f"Got unknown type {message}")
def get_cohere_chat_request(
messages: List[BaseMessage],
*,
connectors: Optional[List[Dict[str, str]]] = None,
**kwargs: Any,
) -> Dict[str, Any]:
documents = (
None
if "source_documents" not in kwargs
else [
{
"snippet": doc.page_content,
"id": doc.metadata.get("id") or f"doc-{str(i)}",
}
for i, doc in enumerate(kwargs["source_documents"])
]
)
kwargs.pop("source_documents", None)
maybe_connectors = connectors if documents is None else None
# by enabling automatic prompt truncation, the probability of request failure is
# reduced with minimal impact on response quality
prompt_truncation = (
"AUTO" if documents is not None or connectors is not None else None
)
return {
"message": messages[0].content,
"chat_history": [
{"role": get_role(x), "message": x.content} for x in messages[1:]
],
"documents": documents,
"connectors": maybe_connectors,
"prompt_truncation": prompt_truncation,
**kwargs,
}
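# A rough sketch (illustrative values, not part of the diff) of the request this
# helper builds. For a single HumanMessage and the default web-search connector:
#
#     get_cohere_chat_request(
#         [HumanMessage(content="What is cohere ai?")],
#         connectors=[{"id": "web-search"}],
#     )
#     # -> {
#     #        "message": "What is cohere ai?",
#     #        "chat_history": [],
#     #        "documents": None,
#     #        "connectors": [{"id": "web-search"}],
#     #        "prompt_truncation": "AUTO",
#     #    }
#
# Passing source_documents=[Document(...)] instead fills "documents" with
# {"snippet": ..., "id": ...} entries and sets "connectors" to None.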
class ChatCohere(BaseChatModel, BaseCohere):
"""`Cohere` chat large language models.
@@ -73,18 +111,6 @@ class ChatCohere(BaseChatModel, BaseCohere):
"""Get the identifying parameters."""
return {**{"model": self.model}, **self._default_params}
def get_cohere_chat_request(
self, messages: List[BaseMessage], **kwargs: Any
) -> Dict[str, Any]:
return {
"message": messages[0].content,
"chat_history": [
{"role": get_role(x), "message": x.content} for x in messages[1:]
],
**self._default_params,
**kwargs,
}
def _stream(
self,
messages: List[BaseMessage],
@@ -92,7 +118,7 @@ class ChatCohere(BaseChatModel, BaseCohere):
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> Iterator[ChatGenerationChunk]:
request = self.get_cohere_chat_request(messages, **kwargs)
request = get_cohere_chat_request(messages, **self._default_params, **kwargs)
stream = self.client.chat(**request, stream=True)
for data in stream:
@@ -109,7 +135,7 @@ class ChatCohere(BaseChatModel, BaseCohere):
run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> AsyncIterator[ChatGenerationChunk]:
request = self.get_cohere_chat_request(messages, **kwargs)
request = get_cohere_chat_request(messages, **self._default_params, **kwargs)
stream = await self.async_client.chat(**request, stream=True)
async for data in stream:
@@ -132,11 +158,18 @@ class ChatCohere(BaseChatModel, BaseCohere):
)
return _generate_from_stream(stream_iter)
request = self.get_cohere_chat_request(messages, **kwargs)
request = get_cohere_chat_request(messages, **self._default_params, **kwargs)
response = self.client.chat(**request)
message = AIMessage(content=response.text)
return ChatResult(generations=[ChatGeneration(message=message)])
generation_info = None
if hasattr(response, "documents"):
generation_info = {"documents": response.documents}
return ChatResult(
generations=[
ChatGeneration(message=message, generation_info=generation_info)
]
)
async def _agenerate(
self,
@@ -151,11 +184,18 @@ class ChatCohere(BaseChatModel, BaseCohere):
)
return await _agenerate_from_stream(stream_iter)
request = self.get_cohere_chat_request(messages, **kwargs)
request = get_cohere_chat_request(messages, **self._default_params, **kwargs)
response = self.client.chat(**request, stream=False)
message = AIMessage(content=response.text)
return ChatResult(generations=[ChatGeneration(message=message)])
generation_info = None
if hasattr(response, "documents"):
generation_info = {"documents": response.documents}
return ChatResult(
generations=[
ChatGeneration(message=message, generation_info=generation_info)
]
)
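# Sketch of how the attached documents can be read back after this change
# (assumes credentials are configured and a connectors-enabled call; the query
# text is illustrative):
#
#     result = ChatCohere().generate(
#         [[HumanMessage(content="What is cohere ai?")]],
#         connectors=[{"id": "web-search"}],
#     )
#     docs = result.generations[0][0].generation_info["documents"]
#
# CohereRagRetriever, added later in this diff, relies on exactly this
# generation_info["documents"] payload.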
def get_num_tokens(self, text: str) -> int:
"""Calculate number of tokens."""

@@ -24,6 +24,7 @@ from langchain.retrievers.azure_cognitive_search import AzureCognitiveSearchRetriever
from langchain.retrievers.bm25 import BM25Retriever
from langchain.retrievers.chaindesk import ChaindeskRetriever
from langchain.retrievers.chatgpt_plugin_retriever import ChatGPTPluginRetriever
from langchain.retrievers.cohere_rag_retriever import CohereRagRetriever
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain.retrievers.docarray import DocArrayRetriever
from langchain.retrievers.elastic_search_bm25 import ElasticSearchBM25Retriever
@@ -77,6 +78,7 @@ __all__ = [
"ChatGPTPluginRetriever",
"ContextualCompressionRetriever",
"ChaindeskRetriever",
"CohereRagRetriever",
"ElasticSearchBM25Retriever",
"GoogleDocumentAIWarehouseRetriever",
"GoogleCloudEnterpriseSearchRetriever",

@@ -0,0 +1,76 @@
from __future__ import annotations
from typing import TYPE_CHECKING, Any, Dict, List
from langchain.callbacks.manager import (
AsyncCallbackManagerForRetrieverRun,
CallbackManagerForRetrieverRun,
)
from langchain.chat_models.base import BaseChatModel
from langchain.pydantic_v1 import Field
from langchain.schema import BaseRetriever, Document, HumanMessage
if TYPE_CHECKING:
from langchain.schema.messages import BaseMessage
def _get_docs(response: Any) -> List[Document]:
return [
Document(page_content=doc["snippet"], metadata=doc)
for doc in response.generation_info["documents"]
]
class CohereRagRetriever(BaseRetriever):
"""`ChatGPT plugin` retriever."""
top_k: int = 3
"""Number of documents to return."""
connectors: List[Dict] = Field(default_factory=lambda: [{"id": "web-search"}])
"""
When specified, the model's reply will be enriched with information found by
querying each of the connectors (RAG). These will be returned as langchain
documents.
Currently only accepts {"id": "web-search"}.
"""
llm: BaseChatModel
"""Cohere ChatModel to use."""
class Config:
"""Configuration for this pydantic object."""
arbitrary_types_allowed = True
"""Allow arbitrary types."""
def _get_relevant_documents(
self, query: str, *, run_manager: CallbackManagerForRetrieverRun, **kwargs: Any
) -> List[Document]:
messages: List[List[BaseMessage]] = [[HumanMessage(content=query)]]
res = self.llm.generate(
messages,
connectors=self.connectors,
callbacks=run_manager.get_child(),
**kwargs,
).generations[0][0]
return _get_docs(res)[: self.top_k]
async def _aget_relevant_documents(
self,
query: str,
*,
run_manager: AsyncCallbackManagerForRetrieverRun,
**kwargs: Any,
) -> List[Document]:
messages: List[List[BaseMessage]] = [[HumanMessage(content=query)]]
res = (
await self.llm.agenerate(
messages,
connectors=self.connectors,
callbacks=run_manager.get_child(),
**kwargs,
)
).generations[0][0]
return _get_docs(res)[: self.top_k]
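A short usage sketch of the new retriever with a non-default top_k; the field names follow the class above, the query is illustrative, and the url metadata key matches the web-search output shown in the notebook:

from langchain.chat_models import ChatCohere
from langchain.retrievers import CohereRagRetriever

# Return up to 5 grounded documents instead of the default 3.
rag = CohereRagRetriever(llm=ChatCohere(), top_k=5)
docs = rag.get_relevant_documents("How was Cohere founded?")
for doc in docs:
    # Web-search documents carry title/url/snippet in their metadata.
    print(doc.metadata.get("url"), "-", doc.page_content[:80])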