langchain/docs/extras/integrations/text_embedding/baidu_qianfan_endpoint.ipynb

164 lines
5.7 KiB
Plaintext
Raw Normal View History

{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Baidu Qianfan\n",
"\n",
"Baidu AI Cloud Qianfan Platform is a one-stop large model development and service operation platform for enterprise developers. Qianfan not only provides including the model of Wenxin Yiyan (ERNIE-Bot) and the third-party open source models, but also provides various AI development tools and the whole set of development environment, which facilitates customers to use and develop large model applications easily.\n",
"\n",
"Basically, those model are split into the following type:\n",
"\n",
"- Embedding\n",
"- Chat\n",
"- Completion\n",
"\n",
"In this notebook, we will introduce how to use langchain with [Qianfan](https://cloud.baidu.com/doc/WENXINWORKSHOP/index.html) mainly in `Embedding` corresponding\n",
" to the package `langchain/embeddings` in langchain:\n",
"\n",
"\n",
"\n",
"## API Initialization\n",
"\n",
"To use the LLM services based on Baidu Qianfan, you have to initialize these parameters:\n",
"\n",
"You could either choose to init the AK,SK in enviroment variables or init params:\n",
"\n",
"```base\n",
"export QIANFAN_AK=XXX\n",
"export QIANFAN_SK=XXX\n",
"```\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO] [09-15 20:01:35] logging.py:55 [t:140292313159488]: trying to refresh access_token\n",
"[INFO] [09-15 20:01:35] logging.py:55 [t:140292313159488]: sucessfully refresh access_token\n",
"[INFO] [09-15 20:01:35] logging.py:55 [t:140292313159488]: requesting llm api endpoint: /embeddings/embedding-v1\n",
"[INFO] [09-15 20:01:35] logging.py:55 [t:140292313159488]: async requesting llm api endpoint: /embeddings/embedding-v1\n",
"[INFO] [09-15 20:01:35] logging.py:55 [t:140292313159488]: async requesting llm api endpoint: /embeddings/embedding-v1\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[-0.03313107788562775, 0.052325375378131866, 0.04951248690485954, 0.0077608139254152775, -0.05907672271132469, -0.010798933915793896, 0.03741293027997017, 0.013969100080430508]\n",
" [0.0427522286772728, -0.030367236584424973, -0.14847028255462646, 0.055074431002140045, -0.04177454113960266, -0.059512972831726074, -0.043774791061878204, 0.0028191760648041964]\n",
" [0.03803155943751335, -0.013231384567916393, 0.0032379645854234695, 0.015074018388986588, -0.006529552862048149, -0.13813287019729614, 0.03297128155827522, 0.044519297778606415]\n"
]
}
],
"source": [
"\"\"\"For basic init and call\"\"\"\n",
"from langchain.embeddings import QianfanEmbeddingsEndpoint \n",
"\n",
"import os\n",
"os.environ[\"QIANFAN_AK\"] = \"your_ak\"\n",
"os.environ[\"QIANFAN_SK\"] = \"your_sk\"\n",
"\n",
"embed = QianfanEmbeddingsEndpoint(\n",
" # qianfan_ak='xxx', \n",
" # qianfan_sk='xxx'\n",
")\n",
"res = embed.embed_documents([\"hi\", \"world\"])\n",
"\n",
"async def aioEmbed():\n",
" res = await embed.aembed_query(\"qianfan\")\n",
" print(res[:8])\n",
"await aioEmbed()\n",
"\n",
"import asyncio\n",
"async def aioEmbedDocs():\n",
" res = await embed.aembed_documents([\"hi\", \"world\"])\n",
" for r in res:\n",
" print(\"\", r[:8])\n",
"await aioEmbedDocs()\n",
"\n",
"\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use different models in Qianfan\n",
"\n",
"In the case you want to deploy your own model based on Ernie Bot or third-party open sources model, you could follow these steps:\n",
"\n",
"- 1. Optional, if the model are included in the default models, skip itDeploy your model in Qianfan Console, get your own customized deploy endpoint.\n",
"- 2. Set up the field called `endpoint` in the initlization:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO] [09-15 20:01:40] logging.py:55 [t:140292313159488]: requesting llm api endpoint: /embeddings/bge_large_zh\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[-0.0001582596160005778, -0.025089964270591736, -0.03997539356350899, 0.013156415894627571, 0.000135212714667432, 0.012428865768015385, 0.016216561198234558, -0.04126659780740738]\n",
"[0.0019113451708108187, -0.008625439368188381, -0.0531032420694828, -0.0018436014652252197, -0.01818147301673889, 0.010310115292668343, -0.008867680095136166, -0.021067561581730843]\n"
]
}
],
"source": [
"embed = QianfanEmbeddingsEndpoint(\n",
" model=\"bge_large_zh\",\n",
" endpoint=\"bge_large_zh\"\n",
" )\n",
"\n",
"res = embed.embed_documents([\"hi\", \"world\"])\n",
"for r in res :\n",
" print(r[:8])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "base",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "6fa70026b407ae751a5c9e6bd7f7d482379da8ad616f98512780b705c84ee157"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}