{
"cells": [
{
"cell_type": "markdown",
"id": "735455a6-f82e-4252-b545-27385ef883f4",
"metadata": {},
"source": [
"# Telegram\n",
"\n",
"This notebook shows how to use the Telegram chat loader. This class helps map exported Telegram conversations to LangChain chat messages.\n",
"\n",
"The process has three steps:\n",
"1. Export the chat .txt file by copying chats from the Telegram app and pasting them in a file on your local computer\n",
"2. Create the `TelegramChatLoader` with the file path pointed to the json file or directory of JSON files\n",
"3. Call `loader.load()` (or `loader.lazy_load()`) to perform the conversion. Optionally use `merge_chat_runs` to combine message from the same sender in sequence, and/or `map_ai_messages` to convert messages from the specified sender to the \"AIMessage\" class.\n",
"\n",
"## 1. Create message dump\n",
"\n",
"Currently (2023/08/23) this loader best supports json files in the format generated by exporting your chat history from the [Telegram Desktop App](https://desktop.telegram.org/).\n",
"\n",
"**Important:** There are 'lite' versions of telegram such as \"Telegram for MacOS\" that lack the export functionality. Please make sure you use the correct app to export the file.\n",
"\n",
"To make the export:\n",
"1. Download and open telegram desktop\n",
"2. Select a conversation\n",
"3. Navigate to the conversation settings (currently the three dots in the top right corner)\n",
"4. Click \"Export Chat History\"\n",
"5. Unselect photos and other media. Select \"Machine-readable JSON\" format to export.\n",
"\n",
"An example is below: "
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "285f2044-0f58-4b92-addb-9f8569076734",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting telegram_conversation.json\n"
]
}
],
"source": [
"%%writefile telegram_conversation.json\n",
"{\n",
" \"name\": \"Jiminy\",\n",
" \"type\": \"personal_chat\",\n",
" \"id\": 5965280513,\n",
" \"messages\": [\n",
" {\n",
" \"id\": 1,\n",
" \"type\": \"message\",\n",
" \"date\": \"2023-08-23T13:11:23\",\n",
" \"date_unixtime\": \"1692821483\",\n",
" \"from\": \"Jiminy Cricket\",\n",
" \"from_id\": \"user123450513\",\n",
" \"text\": \"You better trust your conscience\",\n",
" \"text_entities\": [\n",
" {\n",
" \"type\": \"plain\",\n",
" \"text\": \"You better trust your conscience\"\n",
" }\n",
" ]\n",
" },\n",
" {\n",
" \"id\": 2,\n",
" \"type\": \"message\",\n",
" \"date\": \"2023-08-23T13:13:20\",\n",
" \"date_unixtime\": \"1692821600\",\n",
" \"from\": \"Batman & Robin\",\n",
" \"from_id\": \"user6565661032\",\n",
" \"text\": \"What did you just say?\",\n",
" \"text_entities\": [\n",
" {\n",
" \"type\": \"plain\",\n",
" \"text\": \"What did you just say?\"\n",
" }\n",
" ]\n",
" }\n",
" ]\n",
"}"
]
},
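{
"cell_type": "markdown",
"id": "3f1c2a9e-5b7d-4e6a-9c41-0d2e8b7a6f10",
"metadata": {},
"source": [
"Before handing the file to the loader, it can help to confirm that the export has the expected shape. The cell below is a minimal sketch using only the standard library. It assumes the `telegram_conversation.json` file written above and reads only the `name`, `messages`, `from`, and `text` keys shown in that example; real exports may contain other message types that lack some of these keys."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8a4d6c1b-2e3f-4b5a-9d7c-1f0e2a3b4c5d",
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"# Read the export written by the previous cell.\n",
"with open(\"telegram_conversation.json\", encoding=\"utf-8\") as f:\n",
"    export = json.load(f)\n",
"\n",
"# Print the chat name and message count, then one line per message.\n",
"print(export[\"name\"], \"-\", len(export[\"messages\"]), \"messages\")\n",
"for message in export[\"messages\"]:\n",
"    # Assumes every entry has 'from' and 'text', as in the example above.\n",
"    print(f\"{message['from']}: {message['text']}\")"
]
},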
{
"cell_type": "markdown",
"id": "7cc109f4-4c92-4cd3-8143-c322776c3f03",
"metadata": {},
"source": [
"## 2. Create the Chat Loader\n",
"\n",
"All that's required is the file path. You can optionally specify the user name that maps to an ai message as well an configure whether to merge message runs."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "111f7767-573c-42d4-86f0-bd766bbaa071",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.chat_loaders.telegram import TelegramChatLoader"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a4226efa-2640-4990-a20c-6861d1887329",
"metadata": {},
"outputs": [],
"source": [
"loader = TelegramChatLoader(\n",
" path=\"./telegram_conversation.json\",\n",
")"
]
},
{
"cell_type": "markdown",
"id": "71699fb7-7815-4c89-8d96-30e8fada6923",
"metadata": {},
"source": [
"## 3. Load messages\n",
"\n",
"The `load()` (or `lazy_load`) methods return a list of \"ChatSessions\" that currently just contain a list of messages per loaded conversation."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "81121efb-c875-4a77-ad1e-fe26b3d7e812",
"metadata": {},
"outputs": [],
"source": [
"from typing import List\n",
"\n",
"from langchain_community.chat_loaders.utils import (\n",
" map_ai_messages,\n",
" merge_chat_runs,\n",
")\n",
"from langchain_core.chat_sessions import ChatSession\n",
"\n",
"raw_messages = loader.lazy_load()\n",
"# Merge consecutive messages from the same sender into a single message\n",
"merged_messages = merge_chat_runs(raw_messages)\n",
"# Convert messages from \"Jiminy Cricket\" to AI messages\n",
"messages: List[ChatSession] = list(\n",
" map_ai_messages(merged_messages, sender=\"Jiminy Cricket\")\n",
")"
]
},
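{
"cell_type": "markdown",
"id": "c7e2b4d8-6a1f-4c3e-8b9d-5e4f3a2b1c0d",
"metadata": {},
"source": [
"To make the two helper calls above more concrete, here is a rough standard-library sketch of the same idea on plain `(sender, text)` tuples. It is not LangChain's implementation: `merge_runs`, `tag_sender`, the sample data, and the blank-line separator are all assumptions made for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d1f0a9b8-7c6e-4d5b-a4c3-2b1a0f9e8d7c",
"metadata": {},
"outputs": [],
"source": [
"from itertools import groupby\n",
"\n",
"# Hypothetical sample data: (sender, text) pairs, with one repeated sender.\n",
"raw = [\n",
"    (\"Jiminy Cricket\", \"You better trust your conscience\"),\n",
"    (\"Batman & Robin\", \"What did you just say?\"),\n",
"    (\"Batman & Robin\", \"Say it again.\"),\n",
"]\n",
"\n",
"\n",
"def merge_runs(pairs):\n",
"    \"\"\"Join consecutive messages from the same sender into one message.\"\"\"\n",
"    merged = []\n",
"    for sender, run in groupby(pairs, key=lambda pair: pair[0]):\n",
"        # The blank-line separator is an assumption for this sketch.\n",
"        merged.append((sender, \"\\n\\n\".join(text for _, text in run)))\n",
"    return merged\n",
"\n",
"\n",
"def tag_sender(pairs, sender):\n",
"    \"\"\"Label messages from `sender` as 'ai' and all others as 'human'.\"\"\"\n",
"    return [(\"ai\" if name == sender else \"human\", text) for name, text in pairs]\n",
"\n",
"\n",
"for role, text in tag_sender(merge_runs(raw), sender=\"Jiminy Cricket\"):\n",
"    print(role, repr(text))"
]
},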
{
"cell_type": "markdown",
"id": "b9089c05-7375-41ca-a2f9-672a845314e4",
"metadata": {},
"source": [
"### Next Steps\n",
"\n",
"You can then use these messages how you see fit, such as fine-tuning a model, few-shot example selection, or directly make predictions for the next message "
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "637a6f5d-6944-4722-9361-a76ef5e9dd2a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"I said, \"You better trust your conscience.\""
]
}
],
"source": [
"from langchain_openai import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI()\n",
"\n",
"for chunk in llm.stream(messages[0][\"messages\"]):\n",
" print(chunk.content, end=\"\", flush=True)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}