langchain/docs/extras/integrations/document_loaders
Patrick Loeber 5990651070
Add new document_loader: AssemblyAIAudioTranscriptLoader (#9667)
This PR adds a new document loader, `AssemblyAIAudioTranscriptLoader`,
that transcribes audio files with the [AssemblyAI
API](https://www.assemblyai.com) and loads the transcribed text into
documents.

- Add new document_loader with class `AssemblyAIAudioTranscriptLoader`
- Add optional dependency `assemblyai`
- Add unit tests (using a Mock client)
- Add docs notebook
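The unit tests use a mock client so that no network call or API key is needed. The sketch below illustrates that testing pattern in plain Python with `unittest.mock`. It is not the PR's actual test code: `Doc` and `TranscriptLoaderSketch` are hypothetical stand-ins for a LangChain `Document` and the real loader, and the client is assumed to expose `transcribe(path)` returning an object with a `.text` attribute.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List
from unittest.mock import Mock


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class TranscriptLoaderSketch:
    """Hypothetical loader that delegates transcription to an injected client."""

    def __init__(self, file_path: str, client: Any) -> None:
        self.file_path = file_path
        self.client = client

    def load(self) -> List[Doc]:
        # Assumption: the client exposes transcribe(path) and returns an
        # object with a .text attribute holding the transcript.
        transcript = self.client.transcribe(self.file_path)
        return [Doc(page_content=transcript.text, metadata={"source": self.file_path})]


def test_load_uses_mock_client() -> None:
    # The Mock replaces the real API client, so the test runs offline.
    client = Mock()
    client.transcribe.return_value = Mock(text="Hello from the mock")

    docs = TranscriptLoaderSketch("./testfile.mp3", client).load()

    client.transcribe.assert_called_once_with("./testfile.mp3")
    assert len(docs) == 1
    assert docs[0].page_content == "Hello from the mock"
    assert docs[0].metadata["source"] == "./testfile.mp3"


if __name__ == "__main__":
    test_load_uses_mock_client()
```

Injecting the client through the constructor is what makes the mock swap trivial; a loader that builds its client internally would instead need `unittest.mock.patch` at the import site.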

This is the equivalent of the JS integration already available in
LangChain.js. See the [LangChain JS docs AssemblyAI
page](https://js.langchain.com/docs/modules/data_connection/document_loaders/integrations/web_loaders/assemblyai_audio_transcription).

At its simplest, you can use the loader to get a transcript back from an
audio file like this:

```python
from langchain.document_loaders.assemblyai import AssemblyAIAudioTranscriptLoader

loader = AssemblyAIAudioTranscriptLoader(file_path="./testfile.mp3")
docs = loader.load()
```

To use the loader, install the `assemblyai` Python package and set the
environment variable `ASSEMBLYAI_API_KEY` to your API key.
Alternatively, the API key can be passed as an argument.
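The key can therefore come from two places. The sketch below shows one reasonable precedence rule, assuming an explicitly passed key should win over the environment variable; `resolve_assemblyai_api_key` is a hypothetical helper for illustration, not part of LangChain or the `assemblyai` package.

```python
import os
from typing import Optional


def resolve_assemblyai_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: explicit argument first, then ASSEMBLYAI_API_KEY."""
    # An explicitly passed key takes precedence (an assumed rule).
    if api_key:
        return api_key
    # Otherwise fall back to the environment variable named in the docs.
    env_key = os.environ.get("ASSEMBLYAI_API_KEY")
    if env_key:
        return env_key
    # Fail early with a clear message instead of at the first API call.
    raise ValueError(
        "No AssemblyAI API key found: pass one explicitly "
        "or set the ASSEMBLYAI_API_KEY environment variable"
    )
```

The resolved value could then be handed to the loader, assuming its constructor keyword is named `api_key`; check the loader's signature before relying on that name.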

Twitter handles to shout out, if you'd be so kind 🙇
[@AssemblyAI](https://twitter.com/AssemblyAI) and
[@patloeber](https://twitter.com/patloeber)

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-08-23 22:51:19 -07:00
example_data RSS Feed / OPML loader (#8694) 2023-08-03 14:58:06 -07:00
acreom.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
airbyte_cdk.ipynb Airbyte based loaders (#8586) 2023-08-08 14:49:25 -07:00
airbyte_gong.ipynb Airbyte based loaders (#8586) 2023-08-08 14:49:25 -07:00
airbyte_hubspot.ipynb Airbyte based loaders (#8586) 2023-08-08 14:49:25 -07:00
airbyte_json.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
airbyte_salesforce.ipynb Airbyte based loaders (#8586) 2023-08-08 14:49:25 -07:00
airbyte_shopify.ipynb Airbyte based loaders (#8586) 2023-08-08 14:49:25 -07:00
airbyte_stripe.ipynb Airbyte based loaders (#8586) 2023-08-08 14:49:25 -07:00
airbyte_typeform.ipynb Airbyte based loaders (#8586) 2023-08-08 14:49:25 -07:00
airbyte_zendesk_support.ipynb Airbyte based loaders (#8586) 2023-08-08 14:49:25 -07:00
airtable.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
alibaba_cloud_maxcompute.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
apify_dataset.ipynb Update Integrations links (#8206) 2023-07-24 21:20:32 -07:00
arcgis.ipynb ArcGISLoader update (#9240) 2023-08-14 23:44:29 -07:00
arxiv.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
assemblyai.ipynb Add new document_loader: AssemblyAIAudioTranscriptLoader (#9667) 2023-08-23 22:51:19 -07:00
async_chromium.ipynb Added new use case docs for Web Scraping, Chromium loader, BS4 transformer (#8732) 2023-08-11 11:46:59 -07:00
async_html.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
aws_s3_directory.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
aws_s3_file.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
azlyrics.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
azure_blob_storage_container.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
azure_blob_storage_file.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
bibtex.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
bilibili.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
blackboard.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
blockchain.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
brave_search.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
browserless.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
chatgpt_loader.ipynb Add api cross ref linking (#8275) 2023-07-26 12:38:58 -07:00
college_confidential.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
concurrent.ipynb Add ConcurrentLoader (#7512) 2023-07-31 17:56:31 -07:00
confluence.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
conll-u.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
copypaste.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
csv.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
cube_semantic.ipynb Extend Cube Semantic Loader functionality (#8186) 2023-07-24 12:11:58 -07:00
datadog_logs.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
diffbot.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
discord.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
docugami.ipynb comment update for poetry install 2023-08-19 13:50:16 -07:00
dropbox.ipynb mv dropbox (#8438) 2023-07-28 16:07:56 -07:00
duckdb.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
email.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
embaas.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
epub.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
Etherscan.ipynb Fix typo in Etherscan.ipynb (#8340) 2023-07-27 01:57:19 -07:00
evernote.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
excel.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
facebook_chat.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
fauna.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
figma.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
geopandas.ipynb fix geopandas link (#8305) 2023-07-26 11:30:17 -07:00
git.ipynb docs:misc fixes (#9671) 2023-08-23 22:36:54 -07:00
gitbook.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
github.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
google_bigquery.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
google_cloud_storage_directory.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
google_cloud_storage_file.ipynb Allow to specify a custom loader for GcsFileLoader (#8868) 2023-08-07 22:57:31 -04:00
google_drive.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
grobid.ipynb Improved grobid documentation (#9025) 2023-08-10 10:47:22 -04:00
gutenberg.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
hacker_news.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
huawei_obs_directory.ipynb Add Support for Loading Documents from Huawei OBS (#8573) 2023-08-01 09:30:30 -07:00
huawei_obs_file.ipynb Add Support for Loading Documents from Huawei OBS (#8573) 2023-08-01 09:30:30 -07:00
hugging_face_dataset.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
ifixit.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
image_captions.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
image.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
imsdb.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
index.mdx mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
iugu.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
joplin.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
jupyter_notebook.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
larksuite.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
mastodon.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
mediawikidump.ipynb Github add "Create PR" tool + Docs update (#8235) 2023-07-27 19:19:44 -07:00
merge_doc_loader.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
mhtml.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
microsoft_onedrive.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
microsoft_powerpoint.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
microsoft_sharepoint.ipynb Add SharePoint Loader (#4284) 2023-08-21 07:49:07 -07:00
microsoft_word.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
modern_treasury.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
news.ipynb Newspaper (#8647) 2023-08-02 17:56:08 -07:00
notion.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
notiondb.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
nuclia.ipynb Bagatur/revert revert nuclia (#8833) 2023-08-06 11:24:36 -07:00
obsidian.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
odt.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
open_city_data.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
org_mode.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
pandas_dataframe.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
pdf-amazonTextractPDFLoader.ipynb AmazonTextractPDFLoader documentation updates (#9415) 2023-08-20 16:40:15 -07:00
polars_dataframe.ipynb Add to support polars (#9610) 2023-08-22 07:36:24 -07:00
psychic.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
pubmed.ipynb PubMed document loader (#8893) 2023-08-08 14:26:03 -04:00
pyspark_dataframe.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
readthedocs_documentation.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
recursive_url_loader.ipynb Async Recursive URL loader (#8502) 2023-08-06 16:22:31 -07:00
reddit.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
roam.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
rockset.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
rss.ipynb update rss doc (#8761) 2023-08-04 08:25:20 -07:00
rst.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
sitemap.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
slack.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
snowflake.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
source_code.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
spreedly.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
stripe.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
subtitle.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
telegram.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
tencent_cos_directory.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
tencent_cos_file.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
tensorflow_datasets.ipynb tensoflow_datasets document loader (#8721) 2023-08-08 15:19:28 -04:00
tomarkdown.ipynb mv popular and additional chains to use cases (#8242) 2023-07-27 12:55:13 -07:00
toml.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
trello.ipynb added lxml to the pip install example since it is required (#8260) 2023-07-25 18:16:07 -07:00
tsv.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
twitter.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
unstructured_file.ipynb fix: apply unstructured preprocess functions (#9473) 2023-08-18 18:54:28 -07:00
url.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
weather.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
web_base.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
whatsapp_chat.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
wikipedia.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
xml.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
xorbits.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
youtube_audio.ipynb Add local support for audio models (PR #7329) (#7591) 2023-08-02 01:24:53 -07:00
youtube_transcript.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00