mirror of
https://github.com/hwchase17/langchain
synced 2024-11-08 07:10:35 +00:00
a0d847f636
Some links were broken from the previous merge. This PR fixes them. Tested locally. Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
234 lines
6.0 KiB
Plaintext
{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Ray Serve\n",
    "\n",
    "[Ray Serve](https://docs.ray.io/en/latest/serve/index.html) is a scalable model serving library for building online inference APIs. Serve is particularly well suited for system composition, enabling you to build a complex inference service consisting of multiple chains and business logic all in Python code. "
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Goal of this notebook\n",
    "This notebook shows a simple example of how to deploy an OpenAI chain into production. You can extend it to deploy your own self-hosted models, where you can define the amount of hardware resources (GPUs and CPUs) needed to run your model efficiently in production. Read more about the available options, including autoscaling, in the Ray Serve [documentation](https://docs.ray.io/en/latest/serve/getting_started.html).\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set up Ray Serve\n",
    "Install Ray with `pip install \"ray[serve]\"`. "
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## General Skeleton"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The general skeleton for deploying a service is the following:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 0: Import Ray Serve and the Request class from Starlette\n",
    "from ray import serve\n",
    "from starlette.requests import Request\n",
    "\n",
    "# 1: Define a Ray Serve deployment.\n",
    "@serve.deployment\n",
    "class LLMServe:\n",
    "\n",
    "    def __init__(self) -> None:\n",
    "        # All the initialization code goes here\n",
    "        pass\n",
    "\n",
    "    async def __call__(self, request: Request) -> str:\n",
    "        # You can parse the request here\n",
    "        # and return a response\n",
    "        return \"Hello World\"\n",
    "\n",
    "# 2: Bind the model to deployment\n",
    "deployment = LLMServe.bind()\n",
    "\n",
    "# 3: Run the deployment\n",
    "serve.run(deployment)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Shutdown the deployment\n",
    "serve.api.shutdown()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example of deploying an OpenAI chain with custom prompts"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Get an OpenAI API key from [here](https://platform.openai.com/account/api-keys). When you run the following code, you will be asked to provide your API key."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.llms import OpenAI\n",
    "from langchain import PromptTemplate, LLMChain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from getpass import getpass\n",
    "OPENAI_API_KEY = getpass()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@serve.deployment\n",
    "class DeployLLM:\n",
    "\n",
    "    def __init__(self):\n",
    "        # We initialize the LLM, template and the chain here\n",
    "        llm = OpenAI(openai_api_key=OPENAI_API_KEY)\n",
    "        template = \"Question: {question}\\n\\nAnswer: Let's think step by step.\"\n",
    "        prompt = PromptTemplate(template=template, input_variables=[\"question\"])\n",
    "        self.chain = LLMChain(llm=llm, prompt=prompt)\n",
    "\n",
    "    def _run_chain(self, text: str):\n",
    "        return self.chain(text)\n",
    "\n",
    "    async def __call__(self, request: Request):\n",
    "        # 1. Parse the request\n",
    "        text = request.query_params[\"text\"]\n",
    "        # 2. Run the chain\n",
    "        resp = self._run_chain(text)\n",
    "        # 3. Return the response\n",
    "        return resp[\"text\"]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can bind the deployment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Bind the model to deployment\n",
    "deployment = DeployLLM.bind()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can set the host and port number when we run the deployment. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Example port number\n",
    "PORT_NUMBER = 8282\n",
    "# Run the deployment\n",
    "serve.api.run(deployment, port=PORT_NUMBER)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that the service is deployed at `localhost:8282`, we can send a POST request to get the results back."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "\n",
    "text = \"What NFL team won the Super Bowl in the year Justin Bieber was born?\"\n",
    "# Pass the text via `params` so requests percent-encodes the query string\n",
    "response = requests.post(f'http://localhost:{PORT_NUMBER}/', params={'text': text})\n",
    "print(response.content.decode())"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "ray",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
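The notebook's final request places the question in the `text` query parameter, which `DeployLLM.__call__` reads through `request.query_params["text"]`. Because the question contains spaces and a question mark, it is safer to percent-encode it before it goes into the URL. The following is a minimal sketch using only the Python standard library; the helper name `build_query_url` is hypothetical (it is not part of Ray Serve or LangChain), and the host and port are the example values from the notebook.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(host: str, port: int, text: str) -> str:
    """Hypothetical helper: return a URL whose `text` parameter is percent-encoded."""
    query = urlencode({"text": text})
    return f"http://{host}:{port}/?{query}"


question = "What NFL team won the Super Bowl in the year Justin Bieber was born?"
url = build_query_url("localhost", 8282, question)

# Round-trip check: decoding the query string recovers the original text.
decoded = parse_qs(urlsplit(url).query)["text"][0]
print(url)
print(decoded == question)
```

Passing `params={"text": question}` to `requests.post` should produce a comparably encoded query string, so either approach is expected to work with the deployment defined in the notebook.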