langchain/docs/modules/indexes/document_loaders/examples/telegram.ipynb
Raduan Al-Shedivat 00c6ec8a2d
fix(document_loaders/telegram): fix pandas calls + add tests (#4806)
# Fix Telegram API loader + add tests.
I was testing this integration and it was broken with the following error:
```python
message_threads = loader._get_message_threads(df)
KeyError: False
```
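
The traceback above does not show which expression failed. The snippet below is a minimal sketch of one common way pandas produces `KeyError: False`. It assumes, without confirmation from this description, that the cause was filtering a boolean column with `is False` instead of an element-wise mask. The `df` contents and the `message_id`/`is_reply` column names are hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy data; the column names are assumptions for illustration only.
df = pd.DataFrame({"message_id": [1, 2, 3], "is_reply": [False, True, False]})

# Buggy pattern: `is` compares the Series object itself to False, which evaluates
# to the plain bool False, so this becomes df[False]: a lookup of a column
# literally named False, which does not exist.
try:
    df[df["is_reply"] is False]
    buggy_error = None
except KeyError as err:
    buggy_error = err

# Working pattern: element-wise boolean masks select rows, not columns.
parent_messages = df[~df["is_reply"]]
reply_messages = df[df["is_reply"]]

print(repr(buggy_error))
print(parent_messages["message_id"].tolist())
print(reply_messages["message_id"].tolist())
```

With `is`, the comparison happens once on the Series object. Pandas therefore receives a single bool and treats it as a column label. The `~mask` and `mask` forms compare row by row instead.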
Also, this particular loader didn't have any tests or a related dependency group in
Poetry, so I added those as well.

@hwchase17 / @eyurtsev, please take a look at this fix PR.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-16 14:35:25 -07:00

{
"cells": [
{
"cell_type": "markdown",
"id": "33205b12",
"metadata": {},
"source": [
"# Telegram\n",
"\n",
">[Telegram Messenger](https://web.telegram.org/a/) is a globally accessible freemium, cross-platform, encrypted, cloud-based and centralized instant messaging service. The application also provides optional end-to-end encrypted chats and video calling, VoIP, file sharing and several other features.\n",
"\n",
"This notebook covers how to load data from `Telegram` into a format that can be ingested into LangChain."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "90b69c94",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TelegramChatFileLoader, TelegramChatApiLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "13deb0f5",
"metadata": {},
"outputs": [],
"source": [
"loader = TelegramChatFileLoader(\"example_data/telegram.json\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "9ccc1e2f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content=\"Henry on 2020-01-01T00:00:02: It's 2020...\\n\\nHenry on 2020-01-01T00:00:04: Fireworks!\\n\\nGrace 🧤 ðŸ\\x8d on 2020-01-01T00:00:05: You're a minute late!\\n\\n\", metadata={'source': 'example_data/telegram.json'})]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loader.load()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "3e64cac2",
"metadata": {},
"source": [
"`TelegramChatApiLoader` loads data directly from any specified Telegram chat. To export the data, you will need to authenticate your Telegram account.\n",
"\n",
"You can get the `api_hash` and `api_id` values from [my.telegram.org](https://my.telegram.org/auth?to=apps).\n",
"\n",
"The `chat_entity` argument should preferably be the [entity](https://docs.telethon.dev/en/stable/concepts/entities.html?highlight=Entity#what-is-an-entity) of a channel."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f05f75f3",
"metadata": {},
"outputs": [],
"source": [
"loader = TelegramChatApiLoader(\n",
"    chat_entity=\"<CHAT_URL>\",  # recommended to use an Entity here\n",
"    api_hash=\"<API_HASH>\",\n",
"    api_id=\"<API_ID>\",\n",
"    username=\"\",  # needed only for caching the session\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "40039f7b",
"metadata": {},
"outputs": [],
"source": [
"loader.load()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}