From 6721b991ab761d730dbf475a65f4f7ce9ad8a4a8 Mon Sep 17 00:00:00 2001
From: Aditya <31382824+Adi8885@users.noreply.github.com>
Date: Thu, 27 Jun 2024 20:12:32 +0530
Subject: [PATCH] docs: realigned sections for langchain-google-vertexai
 (#23584)

- **Description:** Re-aligned sections in the documentation of Vertex AI LLMs
- **Issue:** NA
- **Dependencies:** NA
- **Twitter handle:** NA

---------

Co-authored-by: adityarane@google.com
Co-authored-by: ccurme
---
 .../llms/google_vertex_ai_palm.ipynb | 161 ++++++++++++++++--
 1 file changed, 144 insertions(+), 17 deletions(-)

diff --git a/docs/docs/integrations/llms/google_vertex_ai_palm.ipynb b/docs/docs/integrations/llms/google_vertex_ai_palm.ipynb
index d55b451479..83911014e4 100644
--- a/docs/docs/integrations/llms/google_vertex_ai_palm.ipynb
+++ b/docs/docs/integrations/llms/google_vertex_ai_palm.ipynb
@@ -24,7 +24,8 @@
 "**Note:** This is separate from the `Google Generative AI` integration; it exposes the [Vertex AI Generative API](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview) on `Google Cloud`.\n",
 "\n",
 "VertexAI exposes all foundational models available in Google Cloud:\n",
- "- Gemini (`gemini-pro` and `gemini-pro-vision`)\n",
+ "- Gemini for Text (`gemini-1.0-pro`)\n",
+ "- Gemini with Multimodality (`gemini-1.5-pro-001` and `gemini-pro-vision`)\n",
 "- PaLM 2 for Text (`text-bison`)\n",
 "- Codey for Code Generation (`code-bison`)\n",
 "\n",
@@ -451,6 +452,7 @@
 "\n",
 "llm = ChatVertexAI(model=\"gemini-pro-vision\")\n",
 "\n",
+ "# Prepare input for model consumption\n",
 "image_message = {\n",
 " \"type\": \"image_url\",\n",
 " \"image_url\": {\"url\": \"image_example.jpg\"},\n",
@@ -460,7 +462,6 @@
 " \"text\": \"What is shown in this image?\",\n",
 "}\n",
 "\n",
- "# Prepare input for model consumption\n",
 "message = HumanMessage(content=[text_message, image_message])\n",
 "\n",
 "# invoke a model response\n",
@@ -616,12 +617,12 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "### ADVANCED : You can use Pdfs with Gemini Models"
+ "### Using PDFs with Gemini Models"
 ]
 },
 {
 "cell_type": "code",
- "execution_count": null,
+ "execution_count": 12,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -629,12 +630,12 @@
 "from langchain_core.messages import HumanMessage\n",
 "from langchain_google_vertexai import ChatVertexAI\n",
 "\n",
 "# Use Gemini 1.5 Pro\n",
- "llm = ChatVertexAI(model=\"gemini-1.5-pro-preview-0514\")"
+ "llm = ChatVertexAI(model=\"gemini-1.5-pro-001\")"
 ]
 },
 {
 "cell_type": "code",
- "execution_count": 69,
+ "execution_count": 13,
 "metadata": {},
 "outputs": [],
 "source": [
 "# Prepare input for model consumption\n",
 "pdf_message = {\n",
 " \"type\": \"image_url\",\n",
 " \"image_url\": {\"url\": \"gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf\"},\n",
 "}\n",
 "\n",
 "text_message = {\n",
 " \"type\": \"text\",\n",
@@ -649,22 +650,148 @@
 " \"text\": \"Summarize the provided document.\",\n",
 "}\n",
 "\n",
- "# Prepare input for model consumption\n",
 "message = HumanMessage(content=[text_message, pdf_message])"
 ]
 },
 {
 "cell_type": "code",
- "execution_count": 70,
+ "execution_count": 14,
 "metadata": {},
 "outputs": [
 {
 "data": {
 "text/plain": [
- "AIMessage(content='The document introduces Gemini 1.5 Pro, a multimodal AI model developed by Google. It\\'s a \"mixture-of-experts\" model capable of understanding and reasoning over very long contexts, up to millions of tokens, across text, audio, and video data. 
\\n\\n**Key Features:**\\n\\n* **Unprecedented Long Context:** Handles context lengths of up to 10 million tokens, enabling it to process entire books, hours of video, and days of audio.\\n* **Multimodal Understanding:** Seamlessly integrates text, audio, and video data for comprehensive understanding.\\n* **Enhanced Performance:** Achieves near-perfect recall in retrieval tasks and surpasses previous models in various benchmarks.\\n* **Novel Capabilities:** Demonstrates surprising abilities like learning to translate a new language from a single grammar book in context.\\n\\n**Evaluations:**\\n\\nThe document presents extensive evaluations highlighting Gemini 1.5 Pro\\'s capabilities. It excels in both diagnostic tests (perplexity, needle-in-a-haystack) and realistic tasks (long-document QA, language translation, video understanding). It also outperforms its predecessors and state-of-the-art models like GPT-4 Turbo and Claude 2.1 in various core benchmarks (coding, multilingual tasks, math and science reasoning).\\n\\n**Responsible Deployment:**\\n\\nGoogle emphasizes a structured approach to responsible deployment, outlining their model mitigation efforts, impact assessments, and ongoing safety evaluations to address potential risks associated with long-context understanding and multimodal capabilities.\\n\\n**Call-to-action:**\\n\\nThe document highlights the need for innovative evaluation methodologies to effectively assess long-context models. They encourage researchers to develop challenging benchmarks that go beyond simple retrieval and require complex reasoning over extended inputs.\\n\\n**Overall:**\\n\\nGemini 1.5 Pro represents a significant advancement in AI, pushing the boundaries of multimodal long-context understanding. Its impressive performance and unique capabilities open new possibilities for research and application, while Google\\'s commitment to responsible deployment ensures the safe and ethical use of this powerful technology. \\n', response_metadata={'is_blocked': False, 'safety_ratings': [{'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability_label': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability_label': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability_label': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability_label': 'NEGLIGIBLE', 'blocked': False}], 'usage_metadata': {'prompt_token_count': 19872, 'candidates_token_count': 415, 'total_token_count': 20287}}, id='run-99072700-55be-49d4-acca-205a52256bcd-0')" + "AIMessage(content='The document introduces Gemini 1.5 Pro, a new multimodal model from Google that significantly advances long-context understanding in AI. 
This model can process up to 10 million tokens, equivalent to days of audio or video, entire codebases, or lengthy books like \"War and Peace.\" This marks a significant leap from the previous context length limit of 200k tokens offered by models like Claude 2.1.\\n\\nGemini 1.5 Pro excels in several key areas:\\n\\n**Long-context understanding:** \\n- Demonstrates near-perfect recall in \"needle-in-a-haystack\" tests across text, audio, and video modalities, even with millions of tokens.\\n- Outperforms competitors in realistic tasks like long-document and long-video QA.\\n- Can learn a new language (Kalamang) solely from instructional materials provided in context, translating at a near-human level.\\n\\n**Core capabilities:**\\n- Maintains high performance on a wide range of benchmarks for tasks like coding, math, reasoning, multilinguality, and instruction following.\\n- Matches or surpasses the state-of-the-art model, Gemini 1.0 Ultra, despite using less training compute.\\n\\nThe document also highlights challenges in evaluating these advanced models and calls for new benchmarks that can effectively assess their long-context understanding capabilities. It emphasizes the need for evaluation methodologies that go beyond simple retrieval tasks and require complex reasoning over multiple pieces of information scattered across vast contexts. \\n\\nFinally, the document outlines Google\\'s approach to responsible deployment of Gemini 1.5 Pro, including impact assessments, mitigation efforts to address potential risks, and ongoing safety evaluations. It acknowledges the potential for both societal benefits and risks associated with these advanced capabilities and stresses the importance of continuous monitoring and evaluation as the model is deployed more broadly.\\n', response_metadata={'is_blocked': False, 'safety_ratings': [{'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability_label': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability_label': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability_label': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability_label': 'NEGLIGIBLE', 'blocked': False}], 'usage_metadata': {'prompt_token_count': 19872, 'candidates_token_count': 376, 'total_token_count': 20248}}, id='run-697179a8-43f6-4c4d-8443-7fe5c0dcd3e9-0')" ] }, - "execution_count": 70, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# invoke a model response\n", + "llm.invoke([message])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Using Video with Gemini Models" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_core.messages import HumanMessage\n", + "from langchain_google_vertexai import ChatVertexAI\n", + "\n", + "# Use Gemini 1.5 Pro\n", + "llm = ChatVertexAI(model=\"gemini-1.5-pro-001\")" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "# Prepare input for model consumption\n", + "media_message = {\n", + " \"type\": \"image_url\",\n", + " \"image_url\": {\n", + " \"url\": \"gs://cloud-samples-data/generative-ai/video/pixel8.mp4\",\n", + " },\n", + "}\n", + "\n", + "text_message = {\n", + " \"type\": \"text\",\n", + " \"text\": \"\"\"Provide a description of the video.\"\"\",\n", + "}\n", + "\n", + "message = HumanMessage(content=[media_message, 
text_message])" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "AIMessage(content='The video showcases a young woman\\'s journey through the vibrant streets of Tokyo at night. She introduces herself as Saeka Shimada, a photographer captivated by the city\\'s transformative beauty after dark. \\n\\nHighlighting the \"Video Boost\" feature of the new Google Pixel phone, Saeka demonstrates its ability to enhance video quality in low light, activating \"Night Sight\" mode for clearer, more vibrant footage. \\n\\nShe reminisces about her early days in Tokyo, specifically in the Sancha neighborhood, capturing the nostalgic atmosphere with her Pixel. Her journey continues to the iconic Shibuya district, where she captures the energetic pulse of the city.\\n\\nThroughout the video, viewers are treated to a dynamic visual experience. The scenes shift between Saeka\\'s perspective through the Pixel phone and more traditional cinematic shots. This editing technique, coupled with the use of neon lights, reflections, and bustling crowds, creates a captivating portrayal of Tokyo\\'s nightlife. \\n', response_metadata={'is_blocked': False, 'safety_ratings': [{'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability_label': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability_label': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability_label': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability_label': 'NEGLIGIBLE', 'blocked': False}], 'usage_metadata': {'prompt_token_count': 1039, 'candidates_token_count': 193, 'total_token_count': 1232}}, id='run-6b1fbc7d-ea07-4c74-bf62-379a34e3d0cb-0')" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# invoke a model response\n", + "llm.invoke([message])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Using Audio with Gemini 1.5 Pro" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_core.messages import HumanMessage\n", + "from langchain_google_vertexai import ChatVertexAI\n", + "\n", + "# Use Gemini 1.5 Pro\n", + "llm = ChatVertexAI(model=\"gemini-1.5-pro-001\")" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [], + "source": [ + "# Prepare input for model consumption\n", + "media_message = {\n", + " \"type\": \"image_url\",\n", + " \"image_url\": {\n", + " \"url\": \"gs://cloud-samples-data/generative-ai/audio/pixel.mp3\",\n", + " },\n", + "}\n", + "\n", + "text_message = {\n", + " \"type\": \"text\",\n", + " \"text\": \"\"\"Can you transcribe this interview, in the format of timecode, speaker, caption.\n", + " Use speaker A, speaker B, etc. to identify speakers.\"\"\",\n", + "}\n", + "\n", + "message = HumanMessage(content=[media_message, text_message])" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "AIMessage(content=\"```\\n[00:00:00]\\nSpeaker A: your devices are getting better over time. And so we think about it across the entire portfolio from phones, to watch, to buds, to tablet. We get really excited about how we can tell a joint narrative across everything.\\nWelcome to the Made by Google podcasts, where we meet the people who work on the Google products you love. 
Here's your host, Rasheed Finch.\\nSpeaker B: Today we're talking to Aisha Sherif and DeCarlos Love. They're both product managers for various Pixel devices and work on something that all the Pixel owners love. The Pixel Feature Drops. This is the Made By Google Podcast.\\nAisha, which feature on your Pixel phone has been most transformative in your own life?\\nSpeaker A: So many features. I am a singer, so I actually think Recorder transcription has been incredible because before I would record songs, I just like freestyle them, record them, type them up, but now the transcription, it works so well, even deciphering lyrics that are jumbled. I think that's huge.\\nSpeaker B: Amazing. DeCarlos same question to you, but for a Pixel watch, of course. Long time listeners will know you work on Pixel watch. What has been the most transformative feature in your own life on Pixel watch?\\nSpeaker C: I work on the fitness experiences. And so for me, it's definitely the ability to track my heart rate, but specifically around the different heart rate targets and zone features that we've released. For me, it's been super helpful. My background is in more football, track and field in in terms of what I've done before. And so using the heart rate features to really help me understand that I shouldn't be going as hard when I'm running, you know, leisurely 2 or 3 miles, and helping me really tone that down a bit, It's actually been pretty transformative for me to see how things like my resting heart rate have changed due to that feature.\\nSpeaker B: Amazing. And Aisha, I know we spend a lot of time and energy on feature drops within the Pixel team. Why are they so important to us?\\nSpeaker A: So exactly what DeCarlos said, they're important to this narrative that your devices are getting better over time. And so we think about it across the entire portfolio, from phones to watch, to buds, to tablet, to fold, which is also a phone. But we've even thrown in like Chrome OS to our drops sometimes. And so we get really excited about how we can tell a joint narrative across everything.\\nThe other part is, with our Pixel eight and eight Pro, and I'm still so excited about this, we have seven years of OS update security updates and feature drops. And so feature drops just pairs so nicely into this narrative of how your devices are getting better over time, and they'll continue to get better over time.\\nSpeaker B: Yeah. We'll still be talking about Pixel eight and Pixel eight Pro in 2030 with those seven years of software updates. And I promise, we'll have an episode on that shortly.\\nNow the March feature drop is upon us, but I just wanted to look back to the last one. First one from January. Aisha, could you tell us some of the highlights from the January one that just launched?\\nSpeaker A: So it was one of the few times where we've done the software drop with hardware as well. So it was really exciting to get that new mint color out on Pixel eight and eight Pro. We also had the body temperature sensor launch in the US. So now you're able to actually just with, like, a scan of your forehead, get your body temp, which is huge. And then a ton of AI enhancements. Circle to search came to Pixel eight and eight Pro. So you can search from anywhere. One of my favorites, Photo Emoji. So now you can use photos that you have in your album and react to messages with them. Most random, I was hosting a donut ice cream party and literally had a picture of a donut ice cream sandwich that I used to react to messages. 
I love those little random, random reactions that you can put out there.\\nSpeaker B: Amazing. And and that was just two months ago. Now we're upon the March feature drop already. There's one for Pixel phones, then one for Pixel watches as well. Let's start now with the watch. DeCarlos, what's new in March?\\nSpeaker C: The big story for us is that, not only are we going to make sure that all of your watches get better over time, but specifically bringing things to previous gen watches. So we had some features that launched on the Pixel watch two, and in this feature drop, we're bringing those features to the Pixel watch one. Some of the things specifically we're looking at our pace features. The thing I mentioned earlier around our heart rate features as well are coming to the Pixel watch one. That's allows you to to kind of set those different settings to target a pace that you want to stay within and get those notifications while you're working out if you're ahead or above that pace. And similar with the heart rate zones as well. We're also bringing activity recognition to Pixel watch one. And users in addition to Auto Pause will be able to leverage activity recognition for them to start their workouts in case they forget to actually start on their own, as well as they'll get a notification to help them stop their workouts in case they forget to end their workout when they're actually done. Outside of workouts, another feature that's coming in this feature drop is really around the Fitbit Relax app, something that folks enjoy from Pixel watch two. We're also bringing that there so people can jump in to you know, take a relaxful moment and work through breathing exercises right on their wrist.\\nSpeaker B: Let's get to the March feature drop on the phone side now. Aisha, what's new for for Pixel phone users?\\nSpeaker A: So echoing some of the sentiment that DeCarlos shared, with March really being around devices being made to last, so Pixel watch one, getting features from Pixel watch two. We're seeing that on the phone side as well. So circle to search will be expanding to Pixel seven and seven Pro. We're also seeing 10 bit HDR move outside of just the camera. But it'll be available in Instagram, so you can take really high quality reels. We also have partial screen sharing. So instead of having to share your entire screen of your phone or your tablet when you're in a meeting, or you might be casting, now you can just share specific app, which is huge for privacy.\\nSpeaker B: Those are some amazing updates in the March feature drop. Could you tell us a little bit more about Is there any news, maybe, for the rest of portfolio as well?\\nSpeaker A: Yeah. So screen sharing is coming to tablet. We're also seeing Docs markup come to tablet. So you can actually just directly What is sounds like? Mark up docs. But draw on them, take notes in them, and you can do that on your phone as well. And then another one that's amazing, Bluetooth connection is getting even better. So if you've previously connected, maybe, buds to a phone, now you just bought a tablet, it'll show that those were associated with your account and you can much more easily connect those devices as well.\\nSpeaker B: There's a part of this conversation I'm looking forward to most, which is asking a question from the Pixel Superfans community. They're getting the opportunity each episode to ask a question. And today's question comes from Casey Carpenter. And they're asking what drives your choice of new software in releases. 
Which is a good one. So you mentioned now and DeCarlos, we'll start with you. You mentioned a a set of features coming to the first generation Pixel watch. Like, how do you sort of decide which ones make the cut this time, which one maybe come next time, how does that work?\\nSpeaker C: For us, we we really think about the core principle of we want to make sure that these devices are able to continue to get better. And we know that there has been improvements from Pixel watch two. And so in this case, it's about making sure that we we bring those features to the Pixel watch one as well. Obviously, we like to think about, can it actually happen? Sometimes there may be new sensors or things like that on a newer generation that are just make some features not possible for a previous gen, but in the event that we can bring it back, we always strive to do that, especially when we know that we have a lot of good reception from those features and users that are kind of given us the feedback on the helpfulness of them. What are the things that the users really value and really lean into that as helping shape how we think about what comes next?\\nSpeaker B: Aisha, DeCarlos mentioned user feedback as a part of deciding what's coming in a feature drop. How important is that in making all of the decisions?\\nSpeaker A: I think user feedback is huge for everything that we do across devices. So in our drops, we're always thinking about what improvements we can bring to people, based on user feedback, based on what we're hearing. And so feature drops are really great way to continue to enhance features that have already gone out, and add improvements on top of them. It's also a way for us to introduce things that are completely new. Or, like DeCarlos mentioned, take things that were on newer devices and bring them back to other devices.\\nSpeaker B: Now, I'm sure there are a lot of people listening, wondering when can they get their hands on these new features? When is the March feature drop actually landing on their devices? Any thoughts there?\\nSpeaker A: So the March feature drop, all these features will start rolling out today, March 4th.\\nSpeaker B: Now we've had many, many, many feature drops over the years. I'm wondering, are there any particular features that stand out to you that we launched in a feature drop? Maybe, Aisha, I can start with you.\\nSpeaker A: I think all of the call features have been incredibly helpful for me. So couple of my favorites, call screen. We had an enhancement in December, where you get contextual tips now. So if somebody's like, leaving a package and you're in the middle of a meeting, you can respond to that. Also, Direct My Call is available for non toll free numbers. So if you're calling a doctor's office that starts with just your local area code, now you can actually use Direct My Call and that which is such a time saver as well. And clear calling. Love that feature. Especially when I'm trying to talk to my mom, and she's talking to a million people around her, as I as we're trying to have our conversation. So all incredibly, incredibly helpful features.\\nSpeaker B: That's amazing. Such staples of the Pixel family right now. And they all came through a feature drop. DeCarlos of course, Pixel watch has had several feature drops as well. Any favorite in there for you?\\nSpeaker C: Yeah. I have a couple outside of the things that are launching right now. I think one was when we released the SPO2 feature in a feature drop. 
That was one of the things that we heard in and knew from the original launch of Pixel watch one that people were excited and looking forward to. So it measures your oxygen saturation. You can wear your watch when you sleep. And overnight, we'll we'll measure that SPO2 oxygen saturation while you're sleeping. So that was an exciting one. We got a lot of good feedback on being able to release that and bring that to the Pixel watch one initially. So that was special. Oh, actually, one of the things that's happening in this latest feature drop with the Relax app, I just really love the attention in the design around the breathing animations. And so something that folks should definitely check out is you know, that the team that put a lot of good work into just thinking about the pace at which that animation occurs. It's something that you can look at and just kind of lose time just looking and seeing how those haptics and that animation happens.\\nSpeaker B: Amazing. It's always the little things that make it extra special, right? That's perfect. Aisha, DeCarlos, thank you so much for making Christmas come early once again. And we're all looking forward to the feature drop in March.\\nSpeaker A: Thank you.\\nSpeaker C: Thank you.\\nSpeaker D: Thank you for listening to the Made By Google Podcast. Don't miss out on new episodes. Subscribe now wherever you get your podcasts to be the first to listen.\\n\\n```\", response_metadata={'is_blocked': False, 'safety_ratings': [{'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability_label': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability_label': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability_label': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability_label': 'NEGLIGIBLE', 'blocked': False}], 'usage_metadata': {'prompt_token_count': 1033, 'candidates_token_count': 2755, 'total_token_count': 3788}}, id='run-6697a990-bb8b-4fbf-bdc8-598d9872d833-0')" + ] + }, + "execution_count": 25, "metadata": {}, "output_type": "execute_result" } @@ -981,13 +1108,6 @@ ")" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Anthropic on Vertex AI" - ] - }, { "cell_type": "code", "execution_count": 21, @@ -1055,6 +1175,13 @@ "chat_llm.invoke([message1])" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Anthropic on Vertex AI" + ] + }, { "cell_type": "markdown", "metadata": {},
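
The four sections this patch adds (image, PDF, video, and audio) all build the same multimodal `HumanMessage`: a text part plus an `image_url` part whose URL points at a local file or a GCS object of any supported media type. Below is a minimal sketch of that shared pattern for readers adapting the notebook; the `describe_gcs_media` helper is illustrative only (not part of the patch) and assumes `langchain-google-vertexai` is installed, Google Cloud credentials are configured, and `gemini-1.5-pro-001` is available in your project.

```python
# Illustrative helper (not from the notebook): one function covering the
# image/PDF/video/audio examples above. Assumes langchain-google-vertexai
# is installed and Google Cloud credentials are configured.
from langchain_core.messages import HumanMessage
from langchain_google_vertexai import ChatVertexAI

llm = ChatVertexAI(model="gemini-1.5-pro-001")


def describe_gcs_media(media_uri: str, prompt: str) -> str:
    """Send one media file plus a text prompt to Gemini and return the reply text."""
    # Every media type in this notebook is passed as an "image_url" content part.
    media_message = {"type": "image_url", "image_url": {"url": media_uri}}
    text_message = {"type": "text", "text": prompt}
    message = HumanMessage(content=[text_message, media_message])
    # invoke a model response, exactly as the cells above do
    return llm.invoke([message]).content


# Mirrors the PDF cell above:
print(
    describe_gcs_media(
        "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf",
        "Summarize the provided document.",
    )
)
```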