{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Ray Serve\n",
    "\n",
    "[Ray Serve](https://docs.ray.io/en/latest/serve/index.html) is a scalable model serving library for building online inference APIs. Serve is particularly well suited for system composition, enabling you to build a complex inference service consisting of multiple chains and business logic, all in Python code."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Goal of this notebook\n",
    "This notebook shows a simple example of how to deploy an OpenAI chain into production. You can extend it to deploy your own self-hosted models, where you can easily define the amount of hardware resources (GPUs and CPUs) needed to run your model efficiently in production. Read more about the available options, including autoscaling, in the Ray Serve [documentation](https://docs.ray.io/en/latest/serve/getting_started.html).\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup Ray Serve\n",
    "Install Ray Serve with `pip install \"ray[serve]\"` (the quotes keep your shell from globbing the brackets). If you are working in a notebook, you can run the same command from a cell, as sketched below."
   ]
  },
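  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Uncomment to install Ray Serve from within the notebook\n",
    "# !pip install \"ray[serve]\""
   ]
  },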
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## General Skeleton"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The general skeleton for deploying a service is the following:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 0: Import Ray Serve and Starlette's Request class\n",
    "from ray import serve\n",
    "from starlette.requests import Request\n",
    "\n",
    "# 1: Define a Ray Serve deployment\n",
    "@serve.deployment\n",
    "class LLMServe:\n",
    "    def __init__(self) -> None:\n",
    "        # All the initialization code goes here\n",
    "        pass\n",
    "\n",
    "    async def __call__(self, request: Request) -> str:\n",
    "        # You can parse the request here\n",
    "        # and return a response\n",
    "        return \"Hello World\"\n",
    "\n",
    "# 2: Bind the model to the deployment\n",
    "deployment = LLMServe.bind()\n",
    "\n",
    "# 3: Run the deployment\n",
    "serve.run(deployment)"
   ]
  },
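  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check, you can query the skeleton over HTTP. This is a minimal sketch that assumes Serve is listening on its default HTTP port (8000); adjust the URL if your setup differs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "\n",
    "# Assumes the default Serve HTTP port (8000); the skeleton replies \"Hello World\"\n",
    "response = requests.get(\"http://localhost:8000/\")\n",
    "print(response.content.decode())"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When you are done experimenting with the skeleton, shut it down:"
   ]
  },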
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Shut down the deployment\n",
    "serve.shutdown()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example of deploying an OpenAI chain with custom prompts"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Get an OpenAI API key from [here](https://platform.openai.com/account/api-keys). When you run the following code, you will be prompted to enter your API key."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.llms import OpenAI\n",
    "from langchain import PromptTemplate, LLMChain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from getpass import getpass\n",
    "\n",
    "OPENAI_API_KEY = getpass()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@serve.deployment\n",
    "class DeployLLM:\n",
    "    def __init__(self):\n",
    "        # We initialize the LLM, template and the chain here\n",
    "        llm = OpenAI(openai_api_key=OPENAI_API_KEY)\n",
    "        template = \"Question: {question}\\n\\nAnswer: Let's think step by step.\"\n",
    "        prompt = PromptTemplate(template=template, input_variables=[\"question\"])\n",
    "        self.chain = LLMChain(llm=llm, prompt=prompt)\n",
    "\n",
    "    def _run_chain(self, text: str):\n",
    "        return self.chain(text)\n",
    "\n",
    "    async def __call__(self, request: Request):\n",
    "        # 1. Parse the request\n",
    "        text = request.query_params[\"text\"]\n",
    "        # 2. Run the chain\n",
    "        resp = self._run_chain(text)\n",
    "        # 3. Return the response\n",
    "        return resp[\"text\"]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can bind the deployment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Bind the model to the deployment\n",
    "deployment = DeployLLM.bind()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can specify the port (and, optionally, the host) when we run the deployment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Example port number\n",
    "PORT_NUMBER = 8282\n",
    "# Run the deployment\n",
    "serve.run(deployment, port=PORT_NUMBER)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that the service is deployed at `localhost:8282`, we can send a POST request to get the results back."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "\n",
    "text = \"What NFL team won the Super Bowl in the year Justin Bieber was born?\"\n",
    "# Pass the question via params so requests URL-encodes it for us\n",
    "response = requests.post(f\"http://localhost:{PORT_NUMBER}/\", params={\"text\": text})\n",
    "print(response.content.decode())"
   ]
  },
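  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, shut down the deployment once you are finished with it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Tear down the deployment and release its resources\n",
    "serve.shutdown()"
   ]
  }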
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "ray",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}