mirror of
https://github.com/hwchase17/langchain
synced 2024-10-31 15:20:26 +00:00
207 lines
6.2 KiB
Plaintext
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "735455a6-f82e-4252-b545-27385ef883f4",
   "metadata": {},
   "source": [
    "# Telegram\n",
    "\n",
    "This notebook shows how to use the Telegram chat loader. This class helps map exported Telegram conversations to LangChain chat messages.\n",
    "\n",
    "The process has three steps:\n",
    "1. Export the chat history as a JSON file from the Telegram Desktop app and save it on your local computer\n",
    "2. Create the `TelegramChatLoader` with the file path pointing to the JSON file or a directory of JSON files\n",
    "3. Call `loader.load()` (or `loader.lazy_load()`) to perform the conversion. Optionally use `merge_chat_runs` to combine messages from the same sender in sequence, and/or `map_ai_messages` to convert messages from the specified sender to the \"AIMessage\" class.\n",
    "\n",
    "## 1. Create message dump\n",
    "\n",
    "Currently (2023/08/23) this loader best supports JSON files in the format generated by exporting your chat history from the [Telegram Desktop App](https://desktop.telegram.org/).\n",
    "\n",
    "**Important:** There are 'lite' versions of Telegram such as \"Telegram for MacOS\" that lack the export functionality. Please make sure you use the correct app to export the file.\n",
    "\n",
    "To make the export:\n",
    "1. Download and open Telegram Desktop\n",
    "2. Select a conversation\n",
    "3. Navigate to the conversation settings (currently the three dots in the top right corner)\n",
    "4. Click \"Export Chat History\"\n",
    "5. Unselect photos and other media. Select \"Machine-readable JSON\" format to export.\n",
    "\n",
    "An example is below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "285f2044-0f58-4b92-addb-9f8569076734",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting telegram_conversation.json\n"
     ]
    }
   ],
   "source": [
    "%%writefile telegram_conversation.json\n",
    "{\n",
    " \"name\": \"Jiminy\",\n",
    " \"type\": \"personal_chat\",\n",
    " \"id\": 5965280513,\n",
    " \"messages\": [\n",
    "  {\n",
    "   \"id\": 1,\n",
    "   \"type\": \"message\",\n",
    "   \"date\": \"2023-08-23T13:11:23\",\n",
    "   \"date_unixtime\": \"1692821483\",\n",
    "   \"from\": \"Jiminy Cricket\",\n",
    "   \"from_id\": \"user123450513\",\n",
    "   \"text\": \"You better trust your conscience\",\n",
    "   \"text_entities\": [\n",
    "    {\n",
    "     \"type\": \"plain\",\n",
    "     \"text\": \"You better trust your conscience\"\n",
    "    }\n",
    "   ]\n",
    "  },\n",
    "  {\n",
    "   \"id\": 2,\n",
    "   \"type\": \"message\",\n",
    "   \"date\": \"2023-08-23T13:13:20\",\n",
    "   \"date_unixtime\": \"1692821600\",\n",
    "   \"from\": \"Batman & Robin\",\n",
    "   \"from_id\": \"user6565661032\",\n",
    "   \"text\": \"What did you just say?\",\n",
    "   \"text_entities\": [\n",
    "    {\n",
    "     \"type\": \"plain\",\n",
    "     \"text\": \"What did you just say?\"\n",
    "    }\n",
    "   ]\n",
    "  }\n",
    " ]\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7cc109f4-4c92-4cd3-8143-c322776c3f03",
   "metadata": {},
   "source": [
    "## 2. Create the Chat Loader\n",
    "\n",
    "All that's required is the file path. You can optionally specify the user name that maps to an AI message, as well as configure whether to merge message runs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "111f7767-573c-42d4-86f0-bd766bbaa071",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.chat_loaders.telegram import TelegramChatLoader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "a4226efa-2640-4990-a20c-6861d1887329",
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = TelegramChatLoader(\n",
    "    path=\"./telegram_conversation.json\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "71699fb7-7815-4c89-8d96-30e8fada6923",
   "metadata": {},
   "source": [
    "## 3. Load messages\n",
    "\n",
    "The `load()` (or `lazy_load()`) methods return a list of \"ChatSessions\" that currently just contain a list of messages per loaded conversation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "81121efb-c875-4a77-ad1e-fe26b3d7e812",
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import List\n",
    "from langchain.chat_loaders.base import ChatSession\n",
    "from langchain.chat_loaders.utils import (\n",
    "    map_ai_messages,\n",
    "    merge_chat_runs,\n",
    ")\n",
    "\n",
    "raw_messages = loader.lazy_load()\n",
    "# Merge consecutive messages from the same sender into a single message\n",
    "merged_messages = merge_chat_runs(raw_messages)\n",
    "# Convert messages from \"Jiminy Cricket\" to AI messages\n",
    "messages: List[ChatSession] = list(map_ai_messages(merged_messages, sender=\"Jiminy Cricket\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b9089c05-7375-41ca-a2f9-672a845314e4",
   "metadata": {},
   "source": [
    "### Next Steps\n",
    "\n",
    "You can then use these messages as you see fit, such as fine-tuning a model, selecting few-shot examples, or directly making predictions for the next message."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "637a6f5d-6944-4722-9361-a76ef5e9dd2a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "I said, \"You better trust your conscience.\""
     ]
    }
   ],
   "source": [
    "from langchain.chat_models import ChatOpenAI\n",
    "\n",
    "llm = ChatOpenAI()\n",
    "\n",
    "for chunk in llm.stream(messages[0]['messages']):\n",
    "    print(chunk.content, end=\"\", flush=True)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}