{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Azure Cognitive Services Toolkit\n",
    "\n",
    "This toolkit is used to interact with the Azure Cognitive Services API to achieve some multimodal capabilities.\n",
    "\n",
    "Currently there are four tools bundled in this toolkit:\n",
    "- AzureCogsImageAnalysisTool: used to extract captions, objects, tags, and text from images. (Note: this tool is not yet available on macOS, due to its dependency on the `azure-ai-vision` package, which is currently only supported on Windows and Linux.)\n",
    "- AzureCogsFormRecognizerTool: used to extract text, tables, and key-value pairs from documents.\n",
    "- AzureCogsSpeech2TextTool: used to transcribe speech to text.\n",
    "- AzureCogsText2SpeechTool: used to synthesize text to speech."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, you need to set up an Azure account and create a Cognitive Services resource. You can follow the instructions [here](https://docs.microsoft.com/en-us/azure/cognitive-services/cognitive-services-apis-create-account?tabs=multiservice%2Cwindows) to create a resource.\n",
    "\n",
    "Then, you need to get the endpoint, key, and region of your resource, and set them as environment variables. You can find them on the \"Keys and Endpoint\" page of your resource."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !pip install --upgrade azure-ai-formrecognizer > /dev/null\n",
    "# !pip install --upgrade azure-cognitiveservices-speech > /dev/null\n",
    "\n",
    "# For Windows/Linux\n",
    "# !pip install --upgrade azure-ai-vision > /dev/null"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "# Fill in your OpenAI API key and your Azure Cognitive Services key, endpoint, and region\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"sk-\"\n",
    "os.environ[\"AZURE_COGS_KEY\"] = \"\"\n",
    "os.environ[\"AZURE_COGS_ENDPOINT\"] = \"\"\n",
    "os.environ[\"AZURE_COGS_REGION\"] = \"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create the Toolkit"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.agents.agent_toolkits import AzureCognitiveServicesToolkit\n",
    "\n",
    "toolkit = AzureCognitiveServicesToolkit()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Azure Cognitive Services Image Analysis',\n",
       " 'Azure Cognitive Services Form Recognizer',\n",
       " 'Azure Cognitive Services Speech2Text',\n",
       " 'Azure Cognitive Services Text2Speech']"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "[tool.name for tool in toolkit.get_tools()]"
   ]
  },
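  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each tool can also be called directly, without an agent. Below is a minimal sketch: it looks a tool up by name and runs it on an image URL (the URL is the sample image used later in this notebook; any publicly reachable image should work)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Look up the image analysis tool by its name and run it on an image URL\n",
    "image_tool = next(\n",
    "    tool\n",
    "    for tool in toolkit.get_tools()\n",
    "    if tool.name == \"Azure Cognitive Services Image Analysis\"\n",
    ")\n",
    "print(image_tool.run(\"https://images.openai.com/blob/9ad5a2ab-041f-475f-ad6a-b51899c50182/ingredients.png\"))"
   ]
  },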
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use within an Agent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain import OpenAI\n",
    "from langchain.agents import initialize_agent, AgentType"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = OpenAI(temperature=0)\n",
    "agent = initialize_agent(\n",
    "    tools=toolkit.get_tools(),\n",
    "    llm=llm,\n",
    "    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,\n",
    "    verbose=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
      "\u001b[32;1m\u001b[1;3m\n",
      "Action:\n",
      "```\n",
      "{\n",
      " \"action\": \"Azure Cognitive Services Image Analysis\",\n",
      " \"action_input\": \"https://images.openai.com/blob/9ad5a2ab-041f-475f-ad6a-b51899c50182/ingredients.png\"\n",
      "}\n",
      "```\n",
      "\n",
      "\u001b[0m\n",
      "Observation: \u001b[36;1m\u001b[1;3mCaption: a group of eggs and flour in bowls\n",
      "Objects: Egg, Egg, Food\n",
      "Tags: dairy, ingredient, indoor, thickening agent, food, mixing bowl, powder, flour, egg, bowl\u001b[0m\n",
      "Thought:\u001b[32;1m\u001b[1;3m I can use the objects and tags to suggest recipes\n",
      "Action:\n",
      "```\n",
      "{\n",
      " \"action\": \"Final Answer\",\n",
      " \"action_input\": \"You can make pancakes, omelettes, or quiches with these ingredients!\"\n",
      "}\n",
      "```\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'You can make pancakes, omelettes, or quiches with these ingredients!'"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "agent.run(\"What can I make with these ingredients?\"\n",
    "          \"https://images.openai.com/blob/9ad5a2ab-041f-475f-ad6a-b51899c50182/ingredients.png\")"
   ]
  },
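  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The other tools follow the same pattern. For example, a hypothetical sketch of asking the agent about a document via the Form Recognizer tool; the URL below is only a placeholder, so substitute a publicly reachable document of your own:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical example: the document URL below is a placeholder\n",
    "agent.run(\"What is written in this document?\"\n",
    "          \"https://example.com/sample-form.pdf\")"
   ]
  },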
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
      "\u001b[32;1m\u001b[1;3mAction:\n",
      "```\n",
      "{\n",
      " \"action\": \"Azure Cognitive Services Text2Speech\",\n",
      " \"action_input\": \"Why did the chicken cross the playground? To get to the other slide!\"\n",
      "}\n",
      "```\n",
      "\n",
      "\u001b[0m\n",
      "Observation: \u001b[31;1m\u001b[1;3m/tmp/tmpa3uu_j6b.wav\u001b[0m\n",
      "Thought:\u001b[32;1m\u001b[1;3m I have the audio file of the joke\n",
      "Action:\n",
      "```\n",
      "{\n",
      " \"action\": \"Final Answer\",\n",
      " \"action_input\": \"/tmp/tmpa3uu_j6b.wav\"\n",
      "}\n",
      "```\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'/tmp/tmpa3uu_j6b.wav'"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "audio_file = agent.run(\"Tell me a joke and read it out for me.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython import display\n",
    "\n",
    "# Play the synthesized audio file returned by the agent\n",
    "audio = display.Audio(audio_file)\n",
    "display.display(audio)"
   ]
  },
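  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To close the loop, the Speech2Text tool can transcribe the synthesized audio back to text. A minimal sketch, assuming the tool accepts a local audio file path as input:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Look up the speech-to-text tool by its name and transcribe the wav file\n",
    "# produced in the previous step (assumes it accepts a local file path)\n",
    "speech_tool = next(\n",
    "    tool\n",
    "    for tool in toolkit.get_tools()\n",
    "    if tool.name == \"Azure Cognitive Services Speech2Text\"\n",
    ")\n",
    "print(speech_tool.run(audio_file))"
   ]
  },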
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.11.3"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 2
|
|
}
|