{ "cells": [ { "cell_type": "markdown", "source": [ "# Embaas\n", "[embaas](https://embaas.io) is a fully managed NLP API service that offers features like embedding generation, document text extraction, document to embeddings and more. You can choose a [variety of pre-trained models](https://embaas.io/docs/models/embeddings).\n", "\n", "### Prerequisites\n", "Create a free embaas account at [https://embaas.io/register](https://embaas.io/register) and generate an [API key](https://embaas.io/dashboard/api-keys)\n", "\n", "### Document Text Extraction API\n", "The document text extraction API allows you to extract the text from a given document. The API supports a variety of document formats, including PDF, mp3, mp4 and more. For a full list of supported formats, check out the API docs (link below)." ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "# Set API key\n", "embaas_api_key = \"YOUR_API_KEY\"\n", "# or set environment variable\n", "os.environ[\"EMBAAS_API_KEY\"] = \"YOUR_API_KEY\"" ], "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": [ "#### Using a blob (bytes)" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "from langchain.document_loaders.embaas import EmbaasBlobLoader\n", "from langchain.document_loaders.blob_loaders import Blob" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "blob_loader = EmbaasBlobLoader()\n", "blob = Blob.from_path(\"example.pdf\")\n", "documents = blob_loader.load(blob)" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "# You can also directly create embeddings with your preferred embeddings model\n", "blob_loader = EmbaasBlobLoader(params={\"model\": \"e5-large-v2\", \"should_embed\": True})\n", "blob = Blob.from_path(\"example.pdf\")\n", "documents = blob_loader.load(blob)\n", "\n", "print(documents[0][\"metadata\"][\"embedding\"])" ], "metadata": { "collapsed": false, "ExecuteTime": { "start_time": "2023-06-12T22:19:48.366886Z", "end_time": "2023-06-12T22:19:48.380467Z" } } }, { "cell_type": "markdown", "source": [ "#### Using a file" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "from langchain.document_loaders.embaas import EmbaasLoader" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "file_loader = EmbaasLoader(file_path=\"example.pdf\")\n", "documents = file_loader.load()" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 15, "outputs": [], "source": [ "# Disable automatic text splitting\n", "file_loader = EmbaasLoader(file_path=\"example.mp3\", params={\"should_chunk\": False})\n", "documents = file_loader.load()" ], "metadata": { "collapsed": false, "ExecuteTime": { "start_time": "2023-06-12T22:24:31.880857Z", "end_time": "2023-06-12T22:24:31.894665Z" } } }, { "cell_type": "markdown", "source": [ "For more detailed information about the embaas document text extraction API, please refer to [the official embaas API documentation](https://embaas.io/api-reference)." ], "metadata": { "collapsed": false } } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }