{ "cells": [ { "cell_type": "markdown", "id": "c613812f", "metadata": {}, "source": [ "# Gradio\n", "\n", "There are many 1000s of `Gradio` apps on `Hugging Face Spaces`. This library puts them at the tips of your LLM's fingers 🦾\n", "\n", "Specifically, `gradio-tools` is a Python library for converting `Gradio` apps into tools that can be leveraged by a large language model (LLM)-based agent to complete its task. For example, an LLM could use a `Gradio` tool to transcribe a voice recording it finds online and then summarize it for you. Or it could use a different `Gradio` tool to apply OCR to a document on your Google Drive and then answer questions about it.\n", "\n", "It's very easy to create you own tool if you want to use a space that's not one of the pre-built tools. Please see this section of the gradio-tools documentation for information on how to do that. All contributions are welcome!" ] }, { "cell_type": "code", "execution_count": null, "id": "231b46c2", "metadata": {}, "outputs": [], "source": [ "# !pip install gradio_tools" ] }, { "cell_type": "markdown", "id": "17608431", "metadata": {}, "source": [ "## Using a tool" ] }, { "cell_type": "code", "execution_count": 3, "id": "423f9dad", "metadata": {}, "outputs": [], "source": [ "from gradio_tools.tools import StableDiffusionTool" ] }, { "cell_type": "code", "execution_count": 4, "id": "30b8f077", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loaded as API: https://gradio-client-demos-stable-diffusion.hf.space ✔\n", "\n", "Job Status: Status.STARTING eta: None\n" ] }, { "data": { "text/plain": [ "'/Users/harrisonchase/workplace/langchain/docs/modules/agents/tools/integrations/b61c1dd9-47e2-46f1-a47c-20d27640993d/tmp4ap48vnm.jpg'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "local_file_path = StableDiffusionTool().langchain.run(\n", " \"Please create a photo of a dog riding a skateboard\"\n", ")\n", "local_file_path" ] }, { "cell_type": "code", "execution_count": 6, "id": "b7bdfd26", "metadata": {}, "outputs": [], "source": [ "from PIL import Image" ] }, { "cell_type": "code", "execution_count": 9, "id": "98e09784", "metadata": {}, "outputs": [], "source": [ "im = Image.open(local_file_path)" ] }, { "cell_type": "code", "execution_count": null, "id": "98e1e602", "metadata": { "scrolled": true }, "outputs": [], "source": [ "display(im)" ] }, { "cell_type": "markdown", "id": "3aeeeeb5", "metadata": {}, "source": [ "## Using within an agent" ] }, { "cell_type": "code", "execution_count": 14, "id": "4a9d45b7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loaded as API: https://gradio-client-demos-stable-diffusion.hf.space ✔\n", "Loaded as API: https://taesiri-blip-2.hf.space ✔\n", "Loaded as API: https://microsoft-promptist.hf.space ✔\n", "Loaded as API: https://damo-vilab-modelscope-text-to-video-synthesis.hf.space ✔\n", "\n", "\n", "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", "\u001b[32;1m\u001b[1;3m\n", "Thought: Do I need to use a tool? Yes\n", "Action: StableDiffusionPromptGenerator\n", "Action Input: A dog riding a skateboard\u001b[0m\n", "Job Status: Status.STARTING eta: None\n", "\n", "Observation: \u001b[38;5;200m\u001b[1;3mA dog riding a skateboard, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha\u001b[0m\n", "Thought:\u001b[32;1m\u001b[1;3m Do I need to use a tool? 
Yes\n", "Action: StableDiffusion\n", "Action Input: A dog riding a skateboard, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha\u001b[0m\n", "Job Status: Status.STARTING eta: None\n", "\n", "Job Status: Status.PROCESSING eta: None\n", "\n", "Observation: \u001b[36;1m\u001b[1;3m/Users/harrisonchase/workplace/langchain/docs/modules/agents/tools/integrations/2e280ce4-4974-4420-8680-450825c31601/tmpfmiz2g1c.jpg\u001b[0m\n", "Thought:\u001b[32;1m\u001b[1;3m Do I need to use a tool? Yes\n", "Action: ImageCaptioner\n", "Action Input: /Users/harrisonchase/workplace/langchain/docs/modules/agents/tools/integrations/2e280ce4-4974-4420-8680-450825c31601/tmpfmiz2g1c.jpg\u001b[0m\n", "Job Status: Status.STARTING eta: None\n", "\n", "Observation: \u001b[33;1m\u001b[1;3ma painting of a dog sitting on a skateboard\u001b[0m\n", "Thought:\u001b[32;1m\u001b[1;3m Do I need to use a tool? Yes\n", "Action: TextToVideo\n", "Action Input: a painting of a dog sitting on a skateboard\u001b[0m\n", "Job Status: Status.STARTING eta: None\n", "Due to heavy traffic on this app, the prediction will take approximately 73 seconds.For faster predictions without waiting in queue, you may duplicate the space using: Client.duplicate(damo-vilab/modelscope-text-to-video-synthesis)\n", "\n", "Job Status: Status.IN_QUEUE eta: 73.89824726581574\n", "Due to heavy traffic on this app, the prediction will take approximately 42 seconds.For faster predictions without waiting in queue, you may duplicate the space using: Client.duplicate(damo-vilab/modelscope-text-to-video-synthesis)\n", "\n", "Job Status: Status.IN_QUEUE eta: 42.49370198879602\n", "\n", "Job Status: Status.IN_QUEUE eta: 21.314297944849187\n", "\n", "Observation: \u001b[31;1m\u001b[1;3m/var/folders/bm/ylzhm36n075cslb9fvvbgq640000gn/T/tmp5snj_nmzf20_cb3m.mp4\u001b[0m\n", "Thought:\u001b[32;1m\u001b[1;3m Do I need to use a tool? 
No\n", "AI: Here is a video of a painting of a dog sitting on a skateboard.\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n" ] } ], "source": [ "from langchain.agents import initialize_agent\n", "from langchain.llms import OpenAI\n", "from gradio_tools.tools import (\n", " StableDiffusionTool,\n", " ImageCaptioningTool,\n", " StableDiffusionPromptGeneratorTool,\n", " TextToVideoTool,\n", ")\n", "\n", "from langchain.memory import ConversationBufferMemory\n", "\n", "llm = OpenAI(temperature=0)\n", "memory = ConversationBufferMemory(memory_key=\"chat_history\")\n", "tools = [\n", " StableDiffusionTool().langchain,\n", " ImageCaptioningTool().langchain,\n", " StableDiffusionPromptGeneratorTool().langchain,\n", " TextToVideoTool().langchain,\n", "]\n", "\n", "\n", "agent = initialize_agent(\n", " tools, llm, memory=memory, agent=\"conversational-react-description\", verbose=True\n", ")\n", "output = agent.run(\n", " input=(\n", " \"Please create a photo of a dog riding a skateboard \"\n", " \"but improve my prompt prior to using an image generator.\"\n", " \"Please caption the generated image and create a video for it using the improved prompt.\"\n", " )\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "67642c82", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.3" } }, "nbformat": 4, "nbformat_minor": 5 }