{
"cells": [
{
"cell_type": "markdown",
"id": "c613812f",
"metadata": {},
"source": [
"# Gradio\n",
"\n",
"There are many 1000s of `Gradio` apps on `Hugging Face Spaces`. This library puts them at the tips of your LLM's fingers 🦾\n",
"\n",
"Specifically, `gradio-tools` is a Python library for converting `Gradio` apps into tools that can be leveraged by a large language model (LLM)-based agent to complete its task. For example, an LLM could use a `Gradio` tool to transcribe a voice recording it finds online and then summarize it for you. Or it could use a different `Gradio` tool to apply OCR to a document on your Google Drive and then answer questions about it.\n",
"\n",
"It's very easy to create you own tool if you want to use a space that's not one of the pre-built tools. Please see this section of the gradio-tools documentation for information on how to do that. All contributions are welcome!"
]
},
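{
"cell_type": "markdown",
"id": "0f1d2c3b",
"metadata": {},
"source": [
"For illustration, the sketch below shows what a custom tool could look like. It assumes the `GradioTool` base class from `gradio_tools` (subclass it and implement `create_job` and `postprocess`); the space name `abidlabs/whisper` and the `/predict` endpoint are placeholders, so check the target space's API page for the actual endpoint."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1a2b3c4d",
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch of a custom tool (assumptions: the `GradioTool` subclass API\n",
"# with `create_job`/`postprocess`; the space and endpoint names are placeholders).\n",
"from gradio_tools import GradioTool\n",
"\n",
"\n",
"class WhisperTranscriptionTool(GradioTool):\n",
"    def __init__(self):\n",
"        super().__init__(\n",
"            name=\"WhisperTranscription\",\n",
"            description=(\n",
"                \"Transcribes an audio file. Input: a local path or URL to an \"\n",
"                \"audio file. Output: the transcribed text.\"\n",
"            ),\n",
"            src=\"abidlabs/whisper\",  # placeholder Hugging Face space\n",
"        )\n",
"\n",
"    def create_job(self, query: str):\n",
"        # `self.client` is the underlying gradio Client for the space.\n",
"        return self.client.submit(query, api_name=\"/predict\")\n",
"\n",
"    def postprocess(self, output) -> str:\n",
"        # Whatever string is returned here is what the LLM agent sees.\n",
"        return str(output)"
]
},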
{
"cell_type": "code",
"execution_count": null,
"id": "231b46c2",
"metadata": {},
"outputs": [],
"source": [
"# !pip install gradio_tools"
]
},
{
"cell_type": "markdown",
"id": "17608431",
"metadata": {},
"source": [
"## Using a tool"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "423f9dad",
"metadata": {},
"outputs": [],
"source": [
"from gradio_tools.tools import StableDiffusionTool"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "30b8f077",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Loaded as API: https://gradio-client-demos-stable-diffusion.hf.space ✔\n",
"\n",
"Job Status: Status.STARTING eta: None\n"
]
},
{
"data": {
"text/plain": [
"'/Users/harrisonchase/workplace/langchain/docs/modules/agents/tools/integrations/b61c1dd9-47e2-46f1-a47c-20d27640993d/tmp4ap48vnm.jpg'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"local_file_path = StableDiffusionTool().langchain.run(\n",
" \"Please create a photo of a dog riding a skateboard\"\n",
")\n",
"local_file_path"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "b7bdfd26",
"metadata": {},
"outputs": [],
"source": [
"from PIL import Image"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "98e09784",
"metadata": {},
"outputs": [],
"source": [
"im = Image.open(local_file_path)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "98e1e602",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"display(im)"
]
},
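{
"cell_type": "markdown",
"id": "5e6f7a8b",
"metadata": {},
"source": [
"Each `gradio_tools` tool exposes a LangChain-compatible tool through its `.langchain` attribute. As a quick sanity check (assuming the standard LangChain tool interface), you can inspect the name and description the agent will see:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9c0d1e2f",
"metadata": {},
"outputs": [],
"source": [
"# Inspect the LangChain wrapper (assumes the standard `name`/`description` attributes).\n",
"sd_tool = StableDiffusionTool().langchain\n",
"print(sd_tool.name)\n",
"print(sd_tool.description)"
]
},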
{
"cell_type": "markdown",
"id": "3aeeeeb5",
"metadata": {},
"source": [
"## Using within an agent"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "4a9d45b7",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Loaded as API: https://gradio-client-demos-stable-diffusion.hf.space ✔\n",
"Loaded as API: https://taesiri-blip-2.hf.space ✔\n",
"Loaded as API: https://microsoft-promptist.hf.space ✔\n",
"Loaded as API: https://damo-vilab-modelscope-text-to-video-synthesis.hf.space ✔\n",
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m\n",
"Thought: Do I need to use a tool? Yes\n",
"Action: StableDiffusionPromptGenerator\n",
"Action Input: A dog riding a skateboard\u001b[0m\n",
"Job Status: Status.STARTING eta: None\n",
"\n",
"Observation: \u001b[38;5;200m\u001b[1;3mA dog riding a skateboard, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m Do I need to use a tool? Yes\n",
"Action: StableDiffusion\n",
"Action Input: A dog riding a skateboard, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha\u001b[0m\n",
"Job Status: Status.STARTING eta: None\n",
"\n",
"Job Status: Status.PROCESSING eta: None\n",
"\n",
"Observation: \u001b[36;1m\u001b[1;3m/Users/harrisonchase/workplace/langchain/docs/modules/agents/tools/integrations/2e280ce4-4974-4420-8680-450825c31601/tmpfmiz2g1c.jpg\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m Do I need to use a tool? Yes\n",
"Action: ImageCaptioner\n",
"Action Input: /Users/harrisonchase/workplace/langchain/docs/modules/agents/tools/integrations/2e280ce4-4974-4420-8680-450825c31601/tmpfmiz2g1c.jpg\u001b[0m\n",
"Job Status: Status.STARTING eta: None\n",
"\n",
"Observation: \u001b[33;1m\u001b[1;3ma painting of a dog sitting on a skateboard\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m Do I need to use a tool? Yes\n",
"Action: TextToVideo\n",
"Action Input: a painting of a dog sitting on a skateboard\u001b[0m\n",
"Job Status: Status.STARTING eta: None\n",
"Due to heavy traffic on this app, the prediction will take approximately 73 seconds.For faster predictions without waiting in queue, you may duplicate the space using: Client.duplicate(damo-vilab/modelscope-text-to-video-synthesis)\n",
"\n",
"Job Status: Status.IN_QUEUE eta: 73.89824726581574\n",
"Due to heavy traffic on this app, the prediction will take approximately 42 seconds.For faster predictions without waiting in queue, you may duplicate the space using: Client.duplicate(damo-vilab/modelscope-text-to-video-synthesis)\n",
"\n",
"Job Status: Status.IN_QUEUE eta: 42.49370198879602\n",
"\n",
"Job Status: Status.IN_QUEUE eta: 21.314297944849187\n",
"\n",
"Observation: \u001b[31;1m\u001b[1;3m/var/folders/bm/ylzhm36n075cslb9fvvbgq640000gn/T/tmp5snj_nmzf20_cb3m.mp4\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m Do I need to use a tool? No\n",
"AI: Here is a video of a painting of a dog sitting on a skateboard.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
}
],
"source": [
"from langchain.agents import initialize_agent\n",
"from langchain.llms import OpenAI\n",
"from gradio_tools.tools import (\n",
" StableDiffusionTool,\n",
" ImageCaptioningTool,\n",
" StableDiffusionPromptGeneratorTool,\n",
" TextToVideoTool,\n",
")\n",
"\n",
"from langchain.memory import ConversationBufferMemory\n",
"\n",
"llm = OpenAI(temperature=0)\n",
"memory = ConversationBufferMemory(memory_key=\"chat_history\")\n",
"tools = [\n",
" StableDiffusionTool().langchain,\n",
" ImageCaptioningTool().langchain,\n",
" StableDiffusionPromptGeneratorTool().langchain,\n",
" TextToVideoTool().langchain,\n",
"]\n",
"\n",
"\n",
"agent = initialize_agent(\n",
" tools, llm, memory=memory, agent=\"conversational-react-description\", verbose=True\n",
")\n",
"output = agent.run(\n",
" input=(\n",
" \"Please create a photo of a dog riding a skateboard \"\n",
" \"but improve my prompt prior to using an image generator.\"\n",
" \"Please caption the generated image and create a video for it using the improved prompt.\"\n",
" )\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}