mirror of
https://github.com/hwchase17/langchain
synced 2024-10-31 15:20:26 +00:00
89ef440c14
- **Description**: Adding retrievers for [kay.ai](https://kay.ai) and SEC filings powered by Kay and Cybersyn. Kay provides context as a service: it's an API built for RAG. - **Issue**: N/A - **Dependencies**: Just added a dep to the [kay](https://pypi.org/project/kay/) package - **Tag maintainer**: @baskaryan @hwchase17 Discussed in slack - **Twtter handle:** [@vishalrohra_](https://twitter.com/vishalrohra_) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>
208 lines
13 KiB
Plaintext
208 lines
13 KiB
Plaintext
{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "263f914c-9d67-4316-8b3d-03c3b99ba9d8",
|
||
"metadata": {},
|
||
"source": [
|
||
"Kay.ai\n",
|
||
"=\n",
|
||
"\n",
|
||
"> Data API built for RAG 🕵️ We are curating the world's largest datasets as high-quality embeddings so your AI agents can retrieve context on the fly. Latest models, fast retrieval, and zero infra.\n",
|
||
"\n",
|
||
"This notebook shows you how to retrieve datasets supported by [Kay](https://kay.ai/). You can currently search SEC Filings and Press Releases of US companies. Visit [kay.ai](https://kay.ai) for the latest data drops. For any questions, join our [discord](https://discord.gg/hAnE4e5T6M) or [tweet at us](https://twitter.com/vishalrohra_)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "fc507b8e-ea51-417c-93da-42bf998a1195",
|
||
"metadata": {},
|
||
"source": [
|
||
"Installation\n",
|
||
"=\n",
|
||
"\n",
|
||
"First you will need to install the [`kay` package](https://pypi.org/project/kay/). You will also need an API key: you can get one for free at [https://kay.ai](https://kay.ai/). Once you have an API key, you must set it as an environment variable `KAY_API_KEY`.\n",
|
||
"\n",
|
||
"`KayAiRetriever` has a static `.create()` factory method that takes the following arguments:\n",
|
||
"\n",
|
||
"* `dataset_id: string` required -- A Kay dataset id. This is a collection of data about a particular entity such as companies, people, or places. For example, try `\"company\"` \n",
|
||
"* `data_type: List[string]` optional -- This is a category within a dataset based on its origin or format, such as ‘SEC Filings’, ‘Press Releases’, or ‘Reports’ within the “company” dataset. For example, try [\"10-K\", \"10-Q\", \"PressRelease\"] under the “company” dataset. If left empty, Kay will retrieve the most relevant context across all types.\n",
|
||
"* `num_contexts: int` optional, defaults to 6 -- The number of document chunks to retrieve on each call to `get_relevant_documents()`"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "c923bea0-585a-4f62-8662-efc167e8d793",
|
||
"metadata": {},
|
||
"source": [
|
||
"Examples\n",
|
||
"=\n",
|
||
"\n",
|
||
"Basic Retriever Usage\n",
|
||
"-"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 2,
|
||
"id": "f7b8c99c-0341-4f3c-912f-a11e98f7de71",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdin",
|
||
"output_type": "stream",
|
||
"text": [
|
||
" ········\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"# Setup API key\n",
|
||
"from getpass import getpass\n",
|
||
"KAY_API_KEY = getpass()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 20,
|
||
"id": "b4d4d386-2a6b-4942-863e-9202f5a9f1d6",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"from langchain.retrievers import KayAiRetriever\n",
|
||
"import os\n",
|
||
"from kay.rag.retrievers import KayRetriever\n",
|
||
"os.environ[\"KAY_API_KEY\"] = KAY_API_KEY\n",
|
||
"retriever = KayAiRetriever.create(dataset_id=\"company\", data_types=[\"10-K\", \"10-Q\", \"PressRelease\"], num_contexts=3)\n",
|
||
"docs = retriever.get_relevant_documents(\"What were the biggest strategy changes and partnerships made by Roku in 2023??\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 21,
|
||
"id": "04ee2d6b-c2ab-4e15-8a8b-afaf6ef8c0f6",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"[Document(page_content='Company Name: ROKU INC\\nCompany Industry: CABLE & OTHER PAY TELEVISION SERVICES\\nArticle Title: Roku and FreeWheel Announce Strategic Partnership to Bring Roku’s Leading Ad Tech to FreeWheel Customers\\nText: Additionally, eMarketer Link: https://cts.businesswire.com/ct/CT?id=smartlink&url=https%3A%2F%2Fwww.insiderintelligence.com%2Finsights%2Favod-more-than-50-percent-of-us-digital-video-viewers%2F&esheet=53451144&newsitemid=20230712907788&lan=en-US&anchor=eMarketer&index=4&md5=b64dea72bcf6b6379474462602781d83 projects 57% of U.S. digital video users will stream an advertising-based video on demand (AVOD) service this year.\\nHaving solutions aimed at driving greater interoperability and automation will help accelerate this growth.\\nKey highlights of this collaboration include:\\nStreamlined Integration: Roku has now integrated its demand application programming interface (dAPI) with FreeWheel s TV platform. Roku s demand API gives publishers direct, automatic and real-time access to more advertiser demand. This enhanced integration allows for streamlined ad operation workflows and better inventory quality control, both of which will improve publisher yield and revenue.\\nSeamless Data Targeting: Publishers can now use Roku platform signals to enable advertisers to target audiences and measure campaign performance without relying on cookies. Additionally, FreeWheel and Roku will rely on data clean room technology to enable the activation of additional data sets providing better measurement and monetization to publishers and agencies.', metadata={'_additional': {'id': '962b79e0-f9d1-43ae-9f7a-8a9b42bc7a9a'}, 'chunk_type': 'text', 'chunk_years_mentioned': [], 'company_name': 'ROKU INC', 'company_sic_code_description': 'CABLE & OTHER PAY TELEVISION SERVICES', 'data_source': 'PressRelease', 'data_source_link': 'https://www.nasdaq.com/press-release/roku-and-freewheel-announce-strategic-partnership-to-bring-rokus-leading-ad-tech-to', 'data_source_publish_date': '2023-07-12T00:00:00Z', 'data_source_uid': 'a46f309c-705d-3946-96db-87aa4e73261f', 'title': 'ROKU INC | Roku and FreeWheel Announce Strategic Partnership to Bring Roku’s Leading Ad Tech to FreeWheel Customers'}),\n",
|
||
" Document(page_content='Company Name: ROKU INC \\n Company Industry: CABLE & OTHER PAY TELEVISION SERVICES \\n Form Title: 10-K 2022-FY \\n Form Section: Risk Factors \\n Text: nd the Note Regarding Forward Looking Statements.This section of this Annual Report generally discusses fiscal years 2022 and 2021 and year to year comparisons between those years.Discussions of fiscal year 2020 and year to year comparisons between fiscal years 2021 and 2020 that are not included in this Annual Report can be found in Management\\'s Discussion and Analysis of Financial Condition and Results of Operations in Part II, Item 7 of our Annual Report for the fiscal year ended December 31, 2021 filed with the SEC on February 18, 2022.Overview Effective as of the fourth quarter of fiscal 2022, we reorganized our reportable segments to better align with management\\'s reporting of information reviewed by the Chief Operating Decision Maker (\"CODM\") for each segment.We renamed our \"player\" segment to \"devices\" which now includes our licensing arrangements with service operators and licensed Roku TV partners in addition to sales of our streaming players, audio products, smart home products and Roku branded TVs that will be designed, made, and sold by us in 2023.Our historical segment information is recast to conform to our new presentation in our financial statements and accompanying notes included in Item 8 of this Annual Report.Our two reportable segments are the platform segment and the devices segment.', metadata={'_additional': {'id': 'a76c5fed-5d63-45a7-b63a-2c30e05140fc'}, 'chunk_type': 'text', 'chunk_years_mentioned': [2020, 2021, 2022, 2023], 'company_name': 'ROKU INC', 'company_sic_code_description': 'CABLE & OTHER PAY TELEVISION SERVICES', 'data_source': '10-K', 'data_source_link': 'https://www.sec.gov/Archives/edgar/data/1428439/000142843923000007', 'data_source_publish_date': '2022-01-01T00:00:00Z', 'data_source_uid': '0001428439-23-000007', 'title': 'ROKU INC | 10-K 2022-FY '}),\n",
|
||
" Document(page_content='Company Name: ROKU INC \\n Company Industry: CABLE & OTHER PAY TELEVISION SERVICES \\n Form Title: 10-Q 2023-Q1 \\n Form Section: Risk Factors \\n Text: Our current and potential partners include TV brands, cable and satellite companies, and telecommunication providers.Under these license arrangements, we generally have limited or no control over the amount and timing of resources these entities dedicate to the relationship.In the past, our licensed Roku TV partners have failed to meet their forecasts and anticipated market launch dates for distributing Roku TV models, and they may fail to meet their forecasts or such launches in the future.If our licensed Roku TV partners or service operator partners fail to meet their forecasts or such launches for distributing licensed streaming devices or choose to deploy competing streaming solutions within their product lines, our business may be harmed.We depend on a small number of content publishers for a majority of our streaming hours, and if we fail to maintain these relationships, our business could be harmed.*Historically, a small number of content publishers have accounted for a significant portion of the hours streamed on our platform.In the three months ended March 31, 2023, the top three streaming services represented over 50% of all hours streamed in the period.If, for any reason, we cease distributing channels that have historically streamed a large percentage of the aggregate streaming hours on our platform, our streaming hours, our active accounts, or Roku streaming device sales may be adversely affected, and our business may be harmed.', metadata={'_additional': {'id': '2a92b2bb-02a0-4e15-8b64-d7e04078a205'}, 'chunk_type': 'text', 'chunk_years_mentioned': [2023], 'company_name': 'ROKU INC', 'company_sic_code_description': 'CABLE & OTHER PAY TELEVISION SERVICES', 'data_source': '10-Q', 'data_source_link': 'https://www.sec.gov/Archives/edgar/data/1428439/000142843923000017', 'data_source_publish_date': '2023-01-01T00:00:00Z', 'data_source_uid': '0001428439-23-000017', 'title': 'ROKU INC | 10-Q 2023-Q1 '})]"
|
||
]
|
||
},
|
||
"execution_count": 21,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"docs"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "21f6e9e5-478c-4b2c-9d61-f7a84f4d2f8f",
|
||
"metadata": {},
|
||
"source": [
|
||
"Usage in a chain\n",
|
||
"-"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 5,
|
||
"id": "d1cba716-ab8d-4518-9196-43f17eb189dc",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdin",
|
||
"output_type": "stream",
|
||
"text": [
|
||
" ········\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"OPENAI_API_KEY = getpass()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 6,
|
||
"id": "79441f1f-fa06-452c-bcd6-160ad0debc6a",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"os.environ[\"OPENAI_API_KEY\"] = OPENAI_API_KEY"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 16,
|
||
"id": "0c504bcd-f6e0-4028-a797-b31fb4b6d027",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"from langchain.chat_models import ChatOpenAI\n",
|
||
"from langchain.chains import ConversationalRetrievalChain\n",
|
||
"\n",
|
||
"model = ChatOpenAI(model_name=\"gpt-3.5-turbo\")\n",
|
||
"qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 22,
|
||
"id": "977f158b-38d3-4b5f-9379-7cdd09436327",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"-> **Question**: What were the biggest strategy changes and partnerships made by Roku in 2023? \n",
|
||
"\n",
|
||
"**Answer**: In 2023, Roku made a strategic partnership with FreeWheel to bring Roku's leading ad tech to FreeWheel customers. This partnership aimed to drive greater interoperability and automation in the advertising-based video on demand (AVOD) space. Key highlights of this collaboration include streamlined integration of Roku's demand application programming interface (dAPI) with FreeWheel's TV platform, allowing for better inventory quality control and improved publisher yield and revenue. Additionally, publishers can now use Roku platform signals to enable advertisers to target audiences and measure campaign performance without relying on cookies. This partnership also involves the use of data clean room technology to enable the activation of additional data sets for better measurement and monetization for publishers and agencies. These partnerships and strategies aim to support Roku's growth in the AVOD market. \n",
|
||
"\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"questions = [\n",
|
||
" \"What were the biggest strategy changes and partnerships made by Roku in 2023?\"\n",
|
||
" # \"Where is Wex making the most money in 2023?\",\n",
|
||
"]\n",
|
||
"chat_history = []\n",
|
||
"\n",
|
||
"for question in questions:\n",
|
||
" result = qa({\"question\": question, \"chat_history\": chat_history})\n",
|
||
" chat_history.append((question, result[\"answer\"]))\n",
|
||
" print(f\"-> **Question**: {question} \\n\")\n",
|
||
" print(f\"**Answer**: {result['answer']} \\n\")"
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"kernelspec": {
|
||
"display_name": "Python 3 (ipykernel)",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.9.18"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 5
|
||
}
|