{"cells":[{"attachments":{},"cell_type":"markdown","metadata":{"nteract":{"transient":{"deleting":false}}},"source":[]},{"attachments":{},"cell_type":"markdown","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["# Kusto as a vector database for AI embeddings"]},{"attachments":{},"cell_type":"markdown","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["This notebook provides step-by-step instructions on using Azure Data Explorer (Kusto) as a vector database with OpenAI embeddings."]},{"attachments":{},"cell_type":"markdown","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["This notebook presents an end-to-end process of:\n","\n","1. Using precomputed embeddings created by the OpenAI API.\n","2. Storing the embeddings in Kusto.\n","3. Converting a raw text query to an embedding with the OpenAI API.\n","4. Using Kusto to perform a cosine similarity search over the stored embeddings.\n"]},{"attachments":{},"cell_type":"markdown","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["### Prerequisites"]},{"attachments":{},"cell_type":"markdown","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["For the purposes of this exercise we need to prepare a couple of things:\n","\n","1. Azure Data Explorer (Kusto) server instance: https://azure.microsoft.com/en-us/products/data-explorer\n","2. 
Azure OpenAI credentials or OpenAI API key."]},{"cell_type":"code","execution_count":2,"metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[{"data":{"application/vnd.livy.statement-meta+json":{"execution_finish_time":"2023-05-10T09:24:58.8253972Z","execution_start_time":"2023-05-10T09:24:58.8250545Z","livy_statement_state":"available","parent_msg_id":"affb2f05-b242-4152-99b2-f30e3f854c21","queued_time":"2023-05-10T09:24:43.3953963Z","session_id":"7e5070d2-4560-4fb8-a3a8-6a594acd58ab","session_start_time":null,"spark_jobs":{"jobs":[],"limit":20,"numbers":{"FAILED":0,"RUNNING":0,"SUCCEEDED":0,"UNKNOWN":0},"rule":"ALL_DESC"},"spark_pool":null,"state":"finished","statement_id":-1},"text/plain":["StatementMeta(, 7e5070d2-4560-4fb8-a3a8-6a594acd58ab, -1, Finished, Available)"]},"metadata":{},"output_type":"display_data"},{"data":{},"execution_count":2,"metadata":{},"output_type":"execute_result"},{"name":"stdout","output_type":"stream","text":["Collecting wget\n"," Downloading wget-3.2.zip (10 kB)\n"," Preparing metadata (setup.py) ... \u001b[?25ldone\n","\u001b[?25hBuilding wheels for collected packages: wget\n"," Building wheel for wget (setup.py) ... 
\u001b[?25l-\b \bdone\n","\u001b[?25h Created wheel for wget: filename=wget-3.2-py3-none-any.whl size=9657 sha256=10fd8aa1d20fd49c36389dc888acc721d0578c5a0635fc9fc5dc642c0f49522e\n"," Stored in directory: /home/trusted-service-user/.cache/pip/wheels/8b/f1/7f/5c94f0a7a505ca1c81cd1d9208ae2064675d97582078e6c769\n","Successfully built wget\n","Installing collected packages: wget\n","Successfully installed wget-3.2\n","\n","\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.1.2\u001b[0m\n","\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49m/nfs4/pyenv-27214bb4-edfd-4fdd-b888-8a99075a1416/bin/python -m pip install --upgrade pip\u001b[0m\n","Note: you may need to restart the kernel to use updated packages.\n"]},{"data":{},"execution_count":2,"metadata":{},"output_type":"execute_result"},{"name":"stdout","output_type":"stream","text":["Warning: PySpark kernel has been restarted to use updated packages.\n","\n"]}],"source":["%pip install wget"]},{"cell_type":"code","execution_count":3,"metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[{"data":{"application/vnd.livy.statement-meta+json":{"execution_finish_time":"2023-05-10T09:25:13.0187836Z","execution_start_time":"2023-05-10T09:25:13.018