langchain/docs/modules/indexes/document_loaders/examples/audio.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "14bfa52e",
   "metadata": {},
   "source": [
    "# OpenAIWhisperParser\n",
    "\n",
    "This notebook goes over how to load data from an audio file, such as an mp3.\n",
    "\n",
    "We use the `OpenAIWhisperParser`, which will use the [OpenAI Whisper API](https://platform.openai.com/docs/guides/speech-to-text) to transcribe audio to text.\n",
    "\n",
    "Note: You will need to have an `OPENAI_API_KEY` supplied."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "e2257932",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders.generic import GenericLoader\n",
    "from langchain.document_loaders.parsers import OpenAIWhisperParser"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "e21b9c4d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Directory contains audio for the first 20 minutes of one Andrej Karpathy video \n",
    "# \"The spelled-out intro to neural networks and backpropagation: building micrograd\"\n",
    "# https://www.youtube.com/watch?v=VMj-3S1tku0\n",
    "audio_file_path = \"example_data/\"\n",
    "loader = GenericLoader.from_filesystem(audio_file_path, glob=\"*.mp3\", parser=OpenAIWhisperParser())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "f50fbf64",
   "metadata": {},
   "outputs": [],
   "source": [
    "docs = loader.load()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "ca414073",
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Document(page_content=\"Hello, my name is Andrej and I've been training deep neural networks for a bit more than a decade. And in this lecture I'd like to show you what neural network training looks like under the hood. So in particular we are going to start with a blank Jupyter notebook and by the end of this lecture we will define and train a neural net and you'll get to see everything that goes on under the hood and exactly sort of how that works on an intuitive level. Now specifically what I would like to do is I would like to take you through building of micrograd. Now micrograd is this library that I released on GitHub about two years ago but at the time I only uploaded the source code and you'd have to go in by yourself and really figure out how it works. So in this lecture I will take you through it step by step and kind of comment on all the pieces of it. So what is micrograd and why is it interesting? Thank you. Micrograd is basically an autograd engine. Autograd is short for automatic gradient and really what it does is it implements back propagation. Now back propagation is this algorithm that allows you to efficiently evaluate the gradient of some kind of a loss function with respect to the weights of a neural network and what that allows us to do then is we can iteratively tune the weights of that neural network to minimize the loss function and therefore improve the accuracy of the network. So back propagation would be at the mathematical core of any modern deep neural network library like say PyTorch or JAX. So the functionality of micrograd is I think best illustrated by an example. So if we just scroll down here you'll see that micrograd basically allows you to build out mathematical expressions and here what we are doing is we have an expression that we're building out where you have two inputs a and b and you'll see that a and b are negative four and two but we are wrapping those values into this value object that we are going to build out as part of micrograd. So this value object will wrap the numbers themselves and then we are going to build out a mathematical expression here where a and b are transformed into c d and eventually e f and g and I'm showing some of the functionality of micrograd and the operations that it supports. So you can add two value objects, you can multiply them, you can raise them to a constant power, you can offset by one, negate, squash at zero, square, divide by constant, divide by it, etc. And so we're building out an expression graph with these two inputs a and b and we're creating an output value of g and micrograd will in the background build out this entire mathematical expression. So it will for example know that c is also a value, c was a result of an addition operation and the child nodes of c are a and b because the and it will maintain pointers to a and b value objects. So we'll basically know exactly how all of this is laid out and then not only can we do what we call the forward pass where we actually look at the value of g of course, that's pretty straightforward, we will access that using the dot data attribute and so the output of the forward pass, the value of g, is 24.7 it turns out. But the big deal is that we can also take this g value object and we can call dot backward and this will basically initialize backpropagation at the node g. And what backpropagation is going to do is it's going to start at g and it's going to go backwards through that expression graph and it's going to recursively apply the chain rule from calculus. And what that allows us to do then is we're going to evaluate basically the derivative of g with respect to all the internal nodes like e, d, and c but also with respect to the inputs a and b. And then we can actually query this derivative of g with respect to a, for example that's a.grad, in this case it happens to be 138, and the derivative of g with respect to b which also happens to be here 645. And this derivative we'll see soon is very important information because it's telling us how a and b are affecting g through
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "docs"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
Create OpenAIWhisperParser for generating Documents from audio files (#5580) # OpenAIWhisperParser This PR creates a new parser, `OpenAIWhisperParser`, that uses the [OpenAI Whisper model](https://platform.openai.com/docs/guides/speech-to-text/quickstart) to perform transcription of audio files to text (`Documents`). Please see the notebook for usage. 2023-06-05 22:51:13 +00:00			`{`
			`"cells": [`
			`{`
			`"cell_type": "markdown",`
			`"id": "14bfa52e",`
			`"metadata": {},`
			`"source": [`
			`"# OpenAIWhisperParser\n",`
			`"\n",`
			`"This notebook goes over how to load data from an audio file, such as an mp3.\n",`
			`"\n",`
			"We use the `OpenAIWhisperParser`, which will use the [OpenAI Whisper API](https://platform.openai.com/docs/guides/speech-to-text) to transcribe audio to text.\n",
			`"\n",`
			"Note: You will need to have an `OPENAI_API_KEY` supplied."
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 1,`
			`"id": "e2257932",`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"from langchain.document_loaders.generic import GenericLoader\n",`
			`"from langchain.document_loaders.parsers import OpenAIWhisperParser"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 2,`
			`"id": "e21b9c4d",`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"# Directory contains audio for the first 20 minutes of one Andrej Karpathy video \n",`
			`"# \"The spelled-out intro to neural networks and backpropagation: building micrograd\"\n",`
			`"# https://www.youtube.com/watch?v=VMj-3S1tku0\n",`
			`"audio_file_path = \"example_data/\"\n",`
			`"loader = GenericLoader.from_filesystem(audio_file_path, glob=\"*.mp3\", parser=OpenAIWhisperParser())"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 3,`
			`"id": "f50fbf64",`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"docs = loader.load()"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 4,`
			`"id": "ca414073",`
			`"metadata": {`
			`"scrolled": false`
			`},`
			`"outputs": [`
			`{`
			`"data": {`
			`"text/plain": [`
			"[Document(page_content=\"Hello, my name is Andrej and I've been training deep neural networks for a bit more than a decade. And in this lecture I'd like to show you what neural network training looks like under the hood. So in particular we are going to start with a blank Jupyter notebook and by the end of this lecture we will define and train a neural net and you'll get to see everything that goes on under the hood and exactly sort of how that works on an intuitive level. Now specifically what I would like to do is I would like to take you through building of micrograd. Now micrograd is this library that I released on GitHub about two years ago but at the time I only uploaded the source code and you'd have to go in by yourself and really figure out how it works. So in this lecture I will take you through it step by step and kind of comment on all the pieces of it. So what is micrograd and why is it interesting? Thank you. Micrograd is basically an autograd engine. Autograd is short for automatic gradient and really what it does is it implements back propagation. Now back propagation is this algorithm that allows you to efficiently evaluate the gradient of some kind of a loss function with respect to the weights of a neural network and what that allows us to do then is we can iteratively tune the weights of that neural network to minimize the loss function and therefore improve the accuracy of the network. So back propagation would be at the mathematical core of any modern deep neural network library like say PyTorch or JAX. So the functionality of micrograd is I think best illustrated by an example. So if we just scroll down here you'll see that micrograd basically allows you to build out mathematical expressions and here what we are doing is we have an expression that we're building out where you have two inputs a and b and you'll see that a and b are negative four and two but we are wrapping those values into this value object that we are going to build out as part of micrograd. So this value object will wrap the numbers themselves and then we are going to build out a mathematical expression here where a and b are transformed into c d and eventually e f and g and I'm showing some of the functionality of micrograd and the operations that it supports. So you can add two value objects, you can multiply them, you can raise them to a constant power, you can offset by one, negate, squash at zero, square, divide by constant, divide by it, etc. And so we're building out an expression graph with these two inputs a and b and we're creating an output value of g and micrograd will in the background build out this entire mathematical expression. So it will for example know that c is also a value, c was a result of an addition operation and the child nodes of c are a and b because the and it will maintain pointers to a and b value objects. So we'll basically know exactly how all of this is laid out and then not only can we do what we call the forward pass where we actually look at the value of g of course, that's pretty straightforward, we will access that using the dot data attribute and so the output of the forward pass, the value of g, is 24.7 it turns out. But the big deal is that we can also take this g value object and we can call dot backward and this will basically initialize backpropagation at the node g. And what backpropagation is going to do is it's going to start at g and it's going to go backwards through that expression graph and it's going to recursively apply the chain rule from calculus. And what that allows us to do then is we're going to evaluate basically the derivative of g with respect to all the internal nodes like e, d, and c but also with respect to the inputs a and b. And then we can actually query this derivative of g with respect to a, for example that's a.grad, in this case it happens to be 138, and the derivative of g with respect to b which also happens to be here 645. And this derivative we'll see soon is very important information because it's telling us how a and b are affecting g through
			`]`
			`},`
			`"execution_count": 4,`
			`"metadata": {},`
			`"output_type": "execute_result"`
			`}`
			`],`
			`"source": [`
			`"docs"`
			`]`
			`}`
			`],`
			`"metadata": {`
			`"kernelspec": {`
			`"display_name": "Python 3 (ipykernel)",`
			`"language": "python",`
			`"name": "python3"`
			`},`
			`"language_info": {`
			`"codemirror_mode": {`
			`"name": "ipython",`
			`"version": 3`
			`},`
			`"file_extension": ".py",`
			`"mimetype": "text/x-python",`
			`"name": "python",`
			`"nbconvert_exporter": "python",`
			`"pygments_lexer": "ipython3",`
			`"version": "3.9.16"`
			`}`
			`},`
			`"nbformat": 4,`
			`"nbformat_minor": 5`
			`}`