langchain/docs/modules/indexes/document_loaders/examples
Julius Lipp 5b6bbf4ab2
Add embaas document extraction api endpoints (#6048)
# Introduces embaas document extraction api endpoints

In this PR, we add support for the embaas document extraction endpoints,
in addition to Text Embedding Models (LLM support is coming in separate PRs). We currently
offer the top performers on the MTEB leaderboard, will continue to add top
embedding models, and will soon add support for customers to deploy their own
models. Additional documentation and information can be found
[here](https://embaas.io).

While developing this integration, I closely followed the patterns
established by other langchain integrations. Nonetheless, if there are
any aspects that require adjustments or if there's a better way to
present a new integration, let me know! :)

Additionally, I fixed some docs in the embeddings integration.

Related PR: #5976 

#### Who can review?
  DataLoaders
  - @eyurtsev
2023-06-12 19:13:52 -07:00
example_data feat: Add UnstructuredXMLLoader for .xml files (#5955) 2023-06-10 16:24:42 -07:00
airbyte_json.ipynb Vwp/docs improved document loaders (#4006) 2023-05-02 15:24:53 -07:00
airtable.ipynb Create Airtable loader (#5958) 2023-06-10 15:43:18 -07:00
alibaba_cloud_maxcompute.ipynb add maxcompute (#5533) 2023-06-01 00:54:42 -07:00
apify_dataset.ipynb Vwp/docs improved document loaders (#4006) 2023-05-02 15:24:53 -07:00
arxiv.ipynb docs: ecosystem/integrations update 1 (#5219) 2023-05-29 07:25:17 -07:00
audio.ipynb Create OpenAIWhisperParser for generating Documents from audio files (#5580) 2023-06-05 15:51:13 -07:00
aws_s3_directory.ipynb docs: document_loaders improvements (#4200) 2023-05-05 17:44:54 -07:00
aws_s3_file.ipynb docs: document_loaders improvements (#4200) 2023-05-05 17:44:54 -07:00
azlyrics.ipynb Vwp/docs improved document loaders (#4006) 2023-05-02 15:24:53 -07:00
azure_blob_storage_container.ipynb Vwp/docs improved document loaders (#4006) 2023-05-02 15:24:53 -07:00
azure_blob_storage_file.ipynb Vwp/docs improved document loaders (#4006) 2023-05-02 15:24:53 -07:00
bibtex.ipynb Bibtex integration for document loader and retriever (#5137) 2023-05-25 00:21:31 -07:00
bilibili.ipynb Fix bilibili (#4860) 2023-05-18 09:56:51 -04:00
blackboard.ipynb docs: document_loaders improvements (#4200) 2023-05-05 17:44:54 -07:00
blockchain.ipynb Fix typos (#5323) 2023-05-26 18:55:21 -07:00
chatgpt_loader.ipynb docs: document_loaders improvements (#4200) 2023-05-05 17:44:54 -07:00
college_confidential.ipynb Vwp/docs improved document loaders (#4006) 2023-05-02 15:24:53 -07:00
confluence.ipynb Implements support for Personal Access Token Authentication in the ConfluenceLoader (#5385) 2023-06-03 14:57:49 -07:00
conll-u.ipynb docs: document_loaders improvements (#4200) 2023-05-05 17:44:54 -07:00
copypaste.ipynb Vwp/docs improved document loaders (#4006) 2023-05-02 15:24:53 -07:00
csv.ipynb feat: Add UnstructuredCSVLoader for CSV files (#5844) 2023-06-07 19:18:01 -07:00
diffbot.ipynb docs: ecosystem/integrations update 2 (#5282) 2023-05-29 07:19:43 -07:00
discord.ipynb docs ecosystem/integrations update 3 (#5470) 2023-05-31 17:54:05 -07:00
docugami.ipynb Documentation fixes (linting and broken links) (#5563) 2023-06-01 13:06:17 -07:00
duckdb.ipynb Vwp/docs improved document loaders (#4006) 2023-05-02 15:24:53 -07:00
email.ipynb Vwp/docs improved document loaders (#4006) 2023-05-02 15:24:53 -07:00
embaas.ipynb Add embaas document extraction api endpoints (#6048) 2023-06-12 19:13:52 -07:00
epub.ipynb docs: document_loaders improvements (#4200) 2023-05-05 17:44:54 -07:00
evernote.ipynb feature/4493 Improve Evernote Document Loader (#4577) 2023-05-19 14:28:17 -07:00
excel.ipynb feat: add UnstructuredExcelLoader for .xlsx and .xls files (#5617) 2023-06-03 12:44:12 -07:00
facebook_chat.ipynb docs ecosystem/integrations update 3 (#5470) 2023-05-31 17:54:05 -07:00
fauna.ipynb Harrison/fauna loader (#5864) 2023-06-07 21:32:23 -07:00
figma.ipynb Vwp/docs improved document loaders (#4006) 2023-05-02 15:24:53 -07:00
file_directory.ipynb feat #4479: TextLoader auto detect encoding and improved exceptions (#4927) 2023-05-18 09:55:14 -04:00
git.ipynb Vwp/docs improved document loaders (#4006) 2023-05-02 15:24:53 -07:00
gitbook.ipynb Vwp/docs improved document loaders (#4006) 2023-05-02 15:24:53 -07:00
github.ipynb DocumentLoader for GitHub (#5408) 2023-05-29 20:11:21 -07:00
google_bigquery.ipynb docs: document_loaders improvements (#4200) 2023-05-05 17:44:54 -07:00
google_cloud_storage_directory.ipynb docs: document_loaders improvements (#4200) 2023-05-05 17:44:54 -07:00
google_cloud_storage_file.ipynb docs: document_loaders improvements (#4200) 2023-05-05 17:44:54 -07:00
google_drive.ipynb Load specific file types from Google Drive (issue #4878) (#4926) 2023-05-18 09:27:53 -04:00
gutenberg.ipynb Vwp/docs improved document loaders (#4006) 2023-05-02 15:24:53 -07:00
hacker_news.ipynb docs: document_loaders improvements (#4200) 2023-05-05 17:44:54 -07:00
html.ipynb 2markdown loader (#4796) 2023-05-16 23:42:53 -07:00
hugging_face_dataset.ipynb docs: document_loaders improvements (#4200) 2023-05-05 17:44:54 -07:00
ifixit.ipynb docs: document_loaders improvements (#4200) 2023-05-05 17:44:54 -07:00
image_captions.ipynb docs: document_loaders improvements (#4200) 2023-05-05 17:44:54 -07:00
image.ipynb docs: document_loaders improvements (#4200) 2023-05-05 17:44:54 -07:00
imsdb.ipynb docs: document_loaders improvements (#4200) 2023-05-05 17:44:54 -07:00
iugu.ipynb Add Iugu document loader (#5162) 2023-05-24 11:47:01 -07:00
joplin.ipynb Add Joplin document loader (#5153) 2023-05-24 12:31:55 -07:00
json.ipynb docs: added missed document_loaders examples (#5150) 2023-05-23 21:56:41 -07:00
jupyter_notebook.ipynb docs: document_loaders improvements (#4200) 2023-05-05 17:44:54 -07:00
markdown.ipynb Vwp/docs improved document loaders (#4006) 2023-05-02 15:24:53 -07:00
mastodon.ipynb Add Mastodon toots loader (#5036) 2023-05-22 16:43:07 -07:00
mediawikidump.ipynb docs: document_loaders improvements (#4200) 2023-05-05 17:44:54 -07:00
microsoft_onedrive.ipynb docs: document_loaders improvements (#4200) 2023-05-05 17:44:54 -07:00
microsoft_powerpoint.ipynb docs: document_loaders improvements (#4200) 2023-05-05 17:44:54 -07:00
microsoft_word.ipynb Update microsoft loader example with docx2txt dependency (#5832) 2023-06-07 19:21:48 -07:00
modern_treasury.ipynb docs: document_loaders improvements (#4200) 2023-05-05 17:44:54 -07:00
notion.ipynb Vwp/docs improved document loaders (#4006) 2023-05-02 15:24:53 -07:00
notiondb.ipynb Harrison/param notion db (#4689) 2023-05-14 18:26:25 -07:00
obsidian.ipynb Vwp/docs improved document loaders (#4006) 2023-05-02 15:24:53 -07:00
odt.ipynb docs: added missed document_loaders examples (#5150) 2023-05-23 21:56:41 -07:00
pandas_dataframe.ipynb docs: document_loaders improvements (#4200) 2023-05-05 17:44:54 -07:00
pdf.ipynb Feature: pdfplumber PDF loader with BaseBlobParser (#4552) 2023-05-15 09:47:02 -04:00
psychic.ipynb Harrison/psychic (#5063) 2023-05-21 09:13:20 -07:00
pyspark_dataframe.ipynb Add minor fixes for PySpark Document Loader Docs (#5525) 2023-05-31 15:02:57 -07:00
readthedocs_documentation.ipynb Vwp/docs improved document loaders (#4006) 2023-05-02 15:24:53 -07:00
reddit.ipynb docs ecosystem/integrations update 3 (#5470) 2023-05-31 17:54:05 -07:00
roam.ipynb Vwp/docs improved document loaders (#4006) 2023-05-02 15:24:53 -07:00
sitemap.ipynb Add how to use a custom scraping function with the sitemap loader. (#5847) 2023-06-07 19:16:51 -07:00
slack.ipynb Fix a typo in the documentation for the Slack document loader (#5745) 2023-06-05 13:30:24 -07:00
snowflake.ipynb fixes to docs (#5919) 2023-06-09 09:15:53 -07:00
spreedly.ipynb Vwp/docs improved document loaders (#4006) 2023-05-02 15:24:53 -07:00
stripe.ipynb docs: document_loaders improvements (#4200) 2023-05-05 17:44:54 -07:00
subtitle.ipynb docs: document_loaders improvements (#4200) 2023-05-05 17:44:54 -07:00
telegram.ipynb docs ecosystem/integrations update 4 (#5590) 2023-06-03 15:29:03 -07:00
tomarkdown.ipynb docs: added missed document_loaders examples (#5150) 2023-05-23 21:56:41 -07:00
toml.ipynb docs: document_loaders improvements (#4200) 2023-05-05 17:44:54 -07:00
trello.ipynb docs ecosystem/integrations update 4 (#5590) 2023-06-03 15:29:03 -07:00
twitter.ipynb docs: document_loaders improvements (#4200) 2023-05-05 17:44:54 -07:00
unstructured_file.ipynb docs: unstructured no longer requires installing detectron2 from source (#5524) 2023-05-31 15:03:21 -07:00
url.ipynb Harrison/playwright (#2871) 2023-04-13 22:15:03 -07:00
weather.ipynb Adding Weather Loader (#5056) 2023-05-23 15:57:33 -07:00
web_base.ipynb docs: document_loaders improvements (#4200) 2023-05-05 17:44:54 -07:00
whatsapp_chat.ipynb docs ecosystem/integrations update 4 (#5590) 2023-06-03 15:29:03 -07:00
wikipedia.ipynb added Wikipedia document loader (#4141) 2023-05-06 09:32:45 -07:00
xml.ipynb feat: Add UnstructuredXMLLoader for .xml files (#5955) 2023-06-10 16:24:42 -07:00
youtube_audio.ipynb YoutubeAudioLoader and updates to OpenAIWhisperParser (#5772) 2023-06-06 15:15:08 -07:00
youtube_transcript.ipynb Harrison/youtube multi language (#5758) 2023-06-05 16:38:07 -07:00