{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Cube Semantic Layer" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This notebook demonstrates how to retrieve Cube's data model metadata in a format suitable for passing to LLMs as embeddings, giving them richer context about the data." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### About Cube" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "[Cube](https://cube.dev/) is the Semantic Layer for building data apps. It helps data engineers and application developers access data from modern data stores, organize it into consistent definitions, and deliver it to every application." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Cube’s data model provides structure and definitions that serve as context for an LLM to understand the data and generate correct queries. The LLM doesn’t need to navigate complex joins and metric calculations because Cube abstracts those away and provides a simple interface that operates on business-level terminology instead of SQL table and column names. This simplification makes the LLM less error-prone and helps it avoid hallucinations." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Example" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "`CubeSemanticLoader` requires two arguments:\n", "\n", "| Input Parameter | Description |\n", "| --- | --- |\n", "| `cube_api_url` | The URL of your Cube deployment's REST API. Please refer to the [Cube documentation](https://cube.dev/docs/http-api/rest#configuration-base-path) for more information on configuring the base path. |\n", "| `cube_api_token` | The authentication token generated based on your Cube's API secret. 
Please refer to the [Cube documentation](https://cube.dev/docs/security#generating-json-web-tokens-jwt) for instructions on generating JSON Web Tokens (JWT). |\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import jwt\n", "from langchain.document_loaders import CubeSemanticLoader\n", "\n", "api_url = \"https://api-example.gcp-us-central1.cubecloudapp.dev/cubejs-api/v1/meta\"\n", "cubejs_api_secret = \"api-secret-here\"\n", "security_context = {}\n", "# Read more about security context here: https://cube.dev/docs/security\n", "api_token = jwt.encode(security_context, cubejs_api_secret, algorithm=\"HS256\")\n", "\n", "loader = CubeSemanticLoader(api_url, api_token)\n", "\n", "documents = loader.load()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Returns:\n", "\n", "A list of documents with the following attributes:\n", "\n", "- `page_content`\n", "- `metadata`\n", " - `table_name`\n", " - `column_name`\n", " - `column_data_type`\n", " - `column_title`\n", " - `column_description`" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "> page_content='table name: orders_view, column name: orders_view.total_amount, column data type: number, column title: Orders View Total Amount, column description: None' metadata={'table_name': 'orders_view', 'column_name': 'orders_view.total_amount', 'column_data_type': 'number', 'column_title': 'Orders View Total Amount', 'column_description': 'None'}" ] } ], "metadata": { "language_info": { "name": "python" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }