You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
langchain/docs/extras/integrations/document_loaders/mastodon.ipynb

127 lines
3.8 KiB
Plaintext

This file contains ambiguous Unicode characters!

This file contains ambiguous Unicode characters that may be confused with others in your current locale. If your use case is intentional and legitimate, you can safely ignore this warning. Use the Escape button to highlight these characters.

{
"cells": [
{
"cell_type": "markdown",
"id": "66a7777e",
"metadata": {},
"source": [
"# Mastodon\n",
"\n",
">[Mastodon](https://joinmastodon.org/) is a federated social media and social networking service.\n",
"\n",
"This loader fetches the text from the \"toots\" of a list of `Mastodon` accounts, using the `Mastodon.py` Python package.\n",
"\n",
"Public accounts can the queried by default without any authentication. If non-public accounts or instances are queried, you have to register an application for your account which gets you an access token, and set that token and your account's API base URL.\n",
"\n",
"Then you need to pass in the Mastodon account names you want to extract, in the `@account@instance` format."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9ec8a3b3",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import MastodonTootsLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "43128d8d",
"metadata": {},
"outputs": [],
"source": [
"#!pip install Mastodon.py"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "35d6809a",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"loader = MastodonTootsLoader(\n",
" mastodon_accounts=[\"@Gargron@mastodon.social\"],\n",
" number_toots=50, # Default value is 100\n",
")\n",
"\n",
"# Or set up access information to use a Mastodon app.\n",
"# Note that the access token can either be passed into\n",
"# constructor or you can set the envirovnment \"MASTODON_ACCESS_TOKEN\".\n",
"# loader = MastodonTootsLoader(\n",
"# access_token=\"<ACCESS TOKEN OF MASTODON APP>\",\n",
"# api_base_url=\"<API BASE URL OF MASTODON APP INSTANCE>\",\n",
"# mastodon_accounts=[\"@Gargron@mastodon.social\"],\n",
"# number_toots=50, # Default value is 100\n",
"# )"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "05fe33b9",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<p>It is tough to leave this behind and go back to reality. And some people live here! Im sure there are downsides but it sounds pretty good to me right now.</p>\n",
"================================================================================\n",
"<p>I wish we could stay here a little longer, but it is time to go home 🥲</p>\n",
"================================================================================\n",
"<p>Last day of the honeymoon. And its <a href=\"https://mastodon.social/tags/caturday\" class=\"mention hashtag\" rel=\"tag\">#<span>caturday</span></a>! This cute tabby came to the restaurant to beg for food and got some chicken.</p>\n",
"================================================================================\n"
]
}
],
"source": [
"documents = loader.load()\n",
"for doc in documents[:3]:\n",
" print(doc.page_content)\n",
" print(\"=\" * 80)"
]
},
{
"cell_type": "markdown",
"id": "322bb6a1",
"metadata": {},
"source": [
"The toot texts (the documents' `page_content`) is by default HTML as returned by the Mastodon API."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}