langchain/docs/docs/integrations/llms/azure_ml.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Azure ML\n",
    "\n",
    "[Azure ML](https://azure.microsoft.com/en-us/products/machine-learning/) is a platform used to build, train, and deploy machine learning models. Users can explore the types of models to deploy in the Model Catalog, which provides foundational and general purpose models from different providers.\n",
    "\n",
    "This notebook goes over how to use an LLM hosted on an `Azure ML Online Endpoint`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.llms.azureml_endpoint import AzureMLOnlineEndpoint"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set up\n",
    "\n",
    "You must [deploy a model on Azure ML](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-foundation-models?view=azureml-api-2#deploying-foundation-models-to-endpoints-for-inferencing) or [to Azure AI studio](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-open) and obtain the following parameters:\n",
    "\n",
    "* `endpoint_url`: The REST endpoint url provided by the endpoint.\n",
    "* `endpoint_api_type`: Use `endpoint_type='realtime'` when deploying models to **Realtime endpoints** (hosted managed infrastructure). Use `endpoint_type='serverless'` when deploying models using the **Pay-as-you-go** offering (model as a service).\n",
    "* `endpoint_api_key`: The API key provided by the endpoint.\n",
    "* `deployment_name`: (Optional) The deployment name of the model using the endpoint."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Content Formatter\n",
    "\n",
    "The `content_formatter` parameter is a handler class for transforming the request and response of an AzureML endpoint to match with required schema. Since there are a wide range of models in the model catalog, each of which may process data differently from one another, a `ContentFormatterBase` class is provided to allow users to transform data to their liking. The following content formatters are provided:\n",
    "\n",
    "* `GPT2ContentFormatter`: Formats request and response data for GPT2\n",
    "* `DollyContentFormatter`: Formats request and response data for the Dolly-v2\n",
    "* `HFContentFormatter`: Formats request and response data for text-generation Hugging Face models\n",
    "* `LLamaContentFormatter`: Formats request and response data for LLaMa2\n",
    "\n",
    "*Note: `OSSContentFormatter` is being deprecated and replaced with `GPT2ContentFormatter`. The logic is the same but `GPT2ContentFormatter` is a more suitable name. You can still continue to use `OSSContentFormatter` as the changes are backwards compatible.*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Examples"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example: LlaMa 2 completions with real-time endpoints"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.llms.azureml_endpoint import (\n",
    "    AzureMLEndpointApiType,\n",
    "    LlamaContentFormatter,\n",
    ")\n",
    "from langchain_core.messages import HumanMessage\n",
    "\n",
    "llm = AzureMLOnlineEndpoint(\n",
    "    endpoint_url=\"https://<your-endpoint>.<your_region>.inference.ml.azure.com/score\",\n",
    "    endpoint_api_type=AzureMLEndpointApiType.realtime,\n",
    "    endpoint_api_key=\"my-api-key\",\n",
    "    content_formatter=LlamaContentFormatter(),\n",
    "    model_kwargs={\"temperature\": 0.8, \"max_new_tokens\": 400},\n",
    ")\n",
    "response = llm.invoke(\"Write me a song about sparkling water:\")\n",
    "response"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Model parameters can also be indicated during invocation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "response = llm.invoke(\"Write me a song about sparkling water:\", temperature=0.5)\n",
    "response"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example: Chat completions with pay-as-you-go deployments (model as a service)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.llms.azureml_endpoint import (\n",
    "    AzureMLEndpointApiType,\n",
    "    LlamaContentFormatter,\n",
    ")\n",
    "from langchain_core.messages import HumanMessage\n",
    "\n",
    "llm = AzureMLOnlineEndpoint(\n",
    "    endpoint_url=\"https://<your-endpoint>.<your_region>.inference.ml.azure.com/v1/completions\",\n",
    "    endpoint_api_type=AzureMLEndpointApiType.serverless,\n",
    "    endpoint_api_key=\"my-api-key\",\n",
    "    content_formatter=LlamaContentFormatter(),\n",
    "    model_kwargs={\"temperature\": 0.8, \"max_new_tokens\": 400},\n",
    ")\n",
    "response = llm.invoke(\"Write me a song about sparkling water:\")\n",
    "response"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example: Custom content formatter\n",
    "\n",
    "Below is an example using a summarization model from Hugging Face."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import os\n",
    "from typing import Dict\n",
    "\n",
    "from langchain_community.llms.azureml_endpoint import (\n",
    "    AzureMLOnlineEndpoint,\n",
    "    ContentFormatterBase,\n",
    ")\n",
    "\n",
    "\n",
    "class CustomFormatter(ContentFormatterBase):\n",
    "    content_type = \"application/json\"\n",
    "    accepts = \"application/json\"\n",
    "\n",
    "    def format_request_payload(self, prompt: str, model_kwargs: Dict) -> bytes:\n",
    "        input_str = json.dumps(\n",
    "            {\n",
    "                \"inputs\": [prompt],\n",
    "                \"parameters\": model_kwargs,\n",
    "                \"options\": {\"use_cache\": False, \"wait_for_model\": True},\n",
    "            }\n",
    "        )\n",
    "        return str.encode(input_str)\n",
    "\n",
    "    def format_response_payload(self, output: bytes) -> str:\n",
    "        response_json = json.loads(output)\n",
    "        return response_json[0][\"summary_text\"]\n",
    "\n",
    "\n",
    "content_formatter = CustomFormatter()\n",
    "\n",
    "llm = AzureMLOnlineEndpoint(\n",
    "    endpoint_api_type=\"realtime\",\n",
    "    endpoint_api_key=os.getenv(\"BART_ENDPOINT_API_KEY\"),\n",
    "    endpoint_url=os.getenv(\"BART_ENDPOINT_URL\"),\n",
    "    model_kwargs={\"temperature\": 0.8, \"max_new_tokens\": 400},\n",
    "    content_formatter=content_formatter,\n",
    ")\n",
    "large_text = \"\"\"On January 7, 2020, Blockberry Creative announced that HaSeul would not participate in the promotion for Loona's \n",
    "next album because of mental health concerns. She was said to be diagnosed with \"intermittent anxiety symptoms\" and would be \n",
    "taking time to focus on her health.[39] On February 5, 2020, Loona released their second EP titled [#] (read as hash), along \n",
    "with the title track \"So What\".[40] Although HaSeul did not appear in the title track, her vocals are featured on three other \n",
    "songs on the album, including \"365\". Once peaked at number 1 on the daily Gaon Retail Album Chart,[41] the EP then debuted at \n",
    "number 2 on the weekly Gaon Album Chart. On March 12, 2020, Loona won their first music show trophy with \"So What\" on Mnet's \n",
    "M Countdown.[42]\n",
    "\n",
    "On October 19, 2020, Loona released their third EP titled [12:00] (read as midnight),[43] accompanied by its first single \n",
    "\"Why Not?\". HaSeul was again not involved in the album, out of her own decision to focus on the recovery of her health.[44] \n",
    "The EP then became their first album to enter the Billboard 200, debuting at number 112.[45] On November 18, Loona released \n",
    "the music video for \"Star\", another song on [12:00].[46] Peaking at number 40, \"Star\" is Loona's first entry on the Billboard \n",
    "Mainstream Top 40, making them the second K-pop girl group to enter the chart.[47]\n",
    "\n",
    "On June 1, 2021, Loona announced that they would be having a comeback on June 28, with their fourth EP, [&] (read as and).\n",
    "[48] The following day, on June 2, a teaser was posted to Loona's official social media accounts showing twelve sets of eyes, \n",
    "confirming the return of member HaSeul who had been on hiatus since early 2020.[49] On June 12, group members YeoJin, Kim Lip, \n",
    "Choerry, and Go Won released the song \"Yum-Yum\" as a collaboration with Cocomong.[50] On September 8, they released another \n",
    "collaboration song named \"Yummy-Yummy\".[51] On June 27, 2021, Loona announced at the end of their special clip that they are \n",
    "making their Japanese debut on September 15 under Universal Music Japan sublabel EMI Records.[52] On August 27, it was announced \n",
    "that Loona will release the double A-side single, \"Hula Hoop / Star Seed\" on September 15, with a physical CD release on October \n",
    "20.[53] In December, Chuu filed an injunction to suspend her exclusive contract with Blockberry Creative.[54][55]\n",
    "\"\"\"\n",
    "summarized_text = llm.invoke(large_text)\n",
    "print(summarized_text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example: Dolly with LLMChain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.chains import LLMChain\n",
    "from langchain.prompts import PromptTemplate\n",
    "from langchain_community.llms.azureml_endpoint import DollyContentFormatter\n",
    "\n",
    "formatter_template = \"Write a {word_count} word essay about {topic}.\"\n",
    "\n",
    "prompt = PromptTemplate(\n",
    "    input_variables=[\"word_count\", \"topic\"], template=formatter_template\n",
    ")\n",
    "\n",
    "content_formatter = DollyContentFormatter()\n",
    "\n",
    "llm = AzureMLOnlineEndpoint(\n",
    "    endpoint_api_key=os.getenv(\"DOLLY_ENDPOINT_API_KEY\"),\n",
    "    endpoint_url=os.getenv(\"DOLLY_ENDPOINT_URL\"),\n",
    "    model_kwargs={\"temperature\": 0.8, \"max_tokens\": 300},\n",
    "    content_formatter=content_formatter,\n",
    ")\n",
    "\n",
    "chain = LLMChain(llm=llm, prompt=prompt)\n",
    "print(chain.invoke({\"word_count\": 100, \"topic\": \"how to make friends\"}))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Serializing an LLM\n",
    "You can also save and load LLM configurations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.llms.loading import load_llm\n",
    "\n",
    "save_llm = AzureMLOnlineEndpoint(\n",
    "    deployment_name=\"databricks-dolly-v2-12b-4\",\n",
    "    model_kwargs={\n",
    "        \"temperature\": 0.2,\n",
    "        \"max_tokens\": 150,\n",
    "        \"top_p\": 0.8,\n",
    "        \"frequency_penalty\": 0.32,\n",
    "        \"presence_penalty\": 72e-3,\n",
    "    },\n",
    ")\n",
    "save_llm.save(\"azureml.json\")\n",
    "loaded_llm = load_llm(\"azureml.json\")\n",
    "\n",
    "print(loaded_llm)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "langchain",
   "language": "python",
   "name": "langchain"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}