{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "735455a6-f82e-4252-b545-27385ef883f4",
   "metadata": {},
   "source": [
"# WhatsApp\n",
|
||
"\n",
|
||
"This notebook shows how to use the WhatsApp chat loader. This class helps map exported Telegram conversations to LangChain chat messages.\n",
|
||
"\n",
|
||
"The process has three steps:\n",
|
||
"1. Export the chat conversations to computer\n",
|
||
"2. Create the `WhatsAppChatLoader` with the file path pointed to the json file or directory of JSON files\n",
|
||
"3. Call `loader.load()` (or `loader.lazy_load()`) to perform the conversion.\n",
|
||
"\n",
|
||
"## 1. Creat message dump\n",
|
||
"\n",
|
||
"To make the export of your WhatsApp conversation(s), complete the following steps:\n",
|
||
"\n",
|
||
"1. Open the target conversation\n",
|
||
"2. Click the three dots in the top right corner and select \"More\".\n",
|
||
"3. Then select \"Export chat\" and choose \"Without media\".\n",
|
||
"\n",
|
||
"An example of the data format for each converation is below: "
|
||
]
|
||
},
|
||
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "285f2044-0f58-4b92-addb-9f8569076734",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Writing whatsapp_chat.txt\n"
     ]
    }
   ],
   "source": [
    "%%writefile whatsapp_chat.txt\n",
    "[8/15/23, 9:12:33 AM] Dr. Feather: Messages and calls are end-to-end encrypted. No one outside of this chat, not even WhatsApp, can read or listen to them.\n",
    "[8/15/23, 9:12:43 AM] Dr. Feather: I spotted a rare Hyacinth Macaw yesterday in the Amazon Rainforest. Such a magnificent creature!\n",
    "[8/15/23, 9:12:48 AM] Dr. Feather: image omitted\n",
    "[8/15/23, 9:13:15 AM] Jungle Jane: That's stunning! Were you able to observe its behavior?\n",
    "[8/15/23, 9:13:23 AM] Dr. Feather: image omitted\n",
    "[8/15/23, 9:14:02 AM] Dr. Feather: Yes, it seemed quite social with other macaws. They're known for their playful nature.\n",
    "[8/15/23, 9:14:15 AM] Jungle Jane: How's the research going on parrot communication?\n",
    "[8/15/23, 9:14:30 AM] Dr. Feather: image omitted\n",
    "[8/15/23, 9:14:50 AM] Dr. Feather: It's progressing well. We're learning so much about how they use sound and color to communicate.\n",
    "[8/15/23, 9:15:10 AM] Jungle Jane: That's fascinating! Can't wait to read your paper on it.\n",
    "[8/15/23, 9:15:20 AM] Dr. Feather: Thank you! I'll send you a draft soon.\n",
    "[8/15/23, 9:25:16 PM] Jungle Jane: Looking forward to it! Keep up the great work."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7cc109f4-4c92-4cd3-8143-c322776c3f03",
   "metadata": {},
   "source": [
    "## 2. Create the Chat Loader\n",
    "\n",
    "The `WhatsAppChatLoader` accepts the resulting zip file, unzipped directory, or the path to any of the chat `.txt` files therein.\n",
    "\n",
    "Provide that path when constructing the loader. In step 3 you will designate which sender's messages take on the role of the \"AI\" (for example, when fine-tuning)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "111f7767-573c-42d4-86f0-bd766bbaa071",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.chat_loaders.whatsapp import WhatsAppChatLoader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "a4226efa-2640-4990-a20c-6861d1887329",
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = WhatsAppChatLoader(\n",
    "    path=\"./whatsapp_chat.txt\",\n",
    ")"
   ]
  },
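  {
   "cell_type": "markdown",
   "id": "loader-alt-paths-sketch",
   "metadata": {},
   "source": [
    "The chat above was written to `whatsapp_chat.txt`, so the loader points at that single file. As a sketch only: per the note in step 2, the loader can also be pointed at the exported zip file or an unzipped export directory. The file names in the next cell are hypothetical placeholders, not files created by this notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "loader-alt-paths-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch (not executed here): hypothetical alternative inputs for the loader.\n",
    "# zip_loader = WhatsAppChatLoader(path=\"./WhatsApp Chat.zip\")      # exported zip file\n",
    "# dir_loader = WhatsAppChatLoader(path=\"./whatsapp_chat_export/\")  # unzipped directory"
   ]
  },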
  {
   "cell_type": "markdown",
   "id": "71699fb7-7815-4c89-8d96-30e8fada6923",
   "metadata": {},
   "source": [
    "## 3. Load messages\n",
    "\n",
    "The `load()` (or `lazy_load()`) methods return a list of `ChatSession` records; each one currently stores the list of messages for a single loaded conversation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "81121efb-c875-4a77-ad1e-fe26b3d7e812",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'messages': [AIMessage(content='I spotted a rare Hyacinth Macaw yesterday in the Amazon Rainforest. Such a magnificent creature!', additional_kwargs={'sender': 'Dr. Feather', 'events': [{'message_time': '8/15/23, 9:12:43 AM'}]}, example=False),\n",
       " HumanMessage(content=\"That's stunning! Were you able to observe its behavior?\", additional_kwargs={'sender': 'Jungle Jane', 'events': [{'message_time': '8/15/23, 9:13:15 AM'}]}, example=False),\n",
       " AIMessage(content=\"Yes, it seemed quite social with other macaws. They're known for their playful nature.\", additional_kwargs={'sender': 'Dr. Feather', 'events': [{'message_time': '8/15/23, 9:14:02 AM'}]}, example=False),\n",
       " HumanMessage(content=\"How's the research going on parrot communication?\", additional_kwargs={'sender': 'Jungle Jane', 'events': [{'message_time': '8/15/23, 9:14:15 AM'}]}, example=False),\n",
       " AIMessage(content=\"It's progressing well. We're learning so much about how they use sound and color to communicate.\", additional_kwargs={'sender': 'Dr. Feather', 'events': [{'message_time': '8/15/23, 9:14:50 AM'}]}, example=False),\n",
       " HumanMessage(content=\"That's fascinating! Can't wait to read your paper on it.\", additional_kwargs={'sender': 'Jungle Jane', 'events': [{'message_time': '8/15/23, 9:15:10 AM'}]}, example=False),\n",
       " AIMessage(content=\"Thank you! I'll send you a draft soon.\", additional_kwargs={'sender': 'Dr. Feather', 'events': [{'message_time': '8/15/23, 9:15:20 AM'}]}, example=False),\n",
       " HumanMessage(content='Looking forward to it! Keep up the great work.', additional_kwargs={'sender': 'Jungle Jane', 'events': [{'message_time': '8/15/23, 9:25:16 PM'}]}, example=False)]}]"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from typing import List\n",
    "from langchain.chat_loaders.base import ChatSession\n",
    "from langchain.chat_loaders.utils import (\n",
    "    map_ai_messages,\n",
    "    merge_chat_runs,\n",
    ")\n",
    "\n",
    "raw_messages = loader.lazy_load()\n",
    "# Merge consecutive messages from the same sender into a single message\n",
    "merged_messages = merge_chat_runs(raw_messages)\n",
    "# Convert messages from \"Dr. Feather\" to AI messages\n",
    "messages: List[ChatSession] = list(map_ai_messages(merged_messages, sender=\"Dr. Feather\"))"
   ]
  },
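  {
   "cell_type": "markdown",
   "id": "chat-session-inspect-note",
   "metadata": {},
   "source": [
    "Each loaded `ChatSession` is a dictionary-like record with a `\"messages\"` key, as shown in the output above. The next cell is a small sketch (using only the `messages` variable created above) of how you might inspect what was loaded."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "chat-session-inspect-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Peek at the first loaded ChatSession: a dict with a \"messages\" key.\n",
    "session = messages[0]\n",
    "print(f\"Loaded {len(session['messages'])} messages in the first session\")\n",
    "for m in session[\"messages\"][:2]:\n",
    "    # Each entry is an AIMessage or HumanMessage; the original sender is kept in additional_kwargs.\n",
    "    print(type(m).__name__, m.additional_kwargs.get(\"sender\"), \"-\", m.content[:40])"
   ]
  },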
  {
   "cell_type": "markdown",
   "id": "b9089c05-7375-41ca-a2f9-672a845314e4",
   "metadata": {},
   "source": [
    "### Next Steps\n",
    "\n",
    "You can then use these messages however you see fit, for example to fine-tune a model, select few-shot examples, or directly make predictions for the next message, as in the cells below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "637a6f5d-6944-4722-9361-a76ef5e9dd2a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Thank you for the encouragement! I'll do my best to continue studying and sharing fascinating insights about parrot communication."
     ]
    }
   ],
   "source": [
    "from langchain.chat_models import ChatOpenAI\n",
    "\n",
    "llm = ChatOpenAI()\n",
    "\n",
    "for chunk in llm.stream(messages[0]['messages']):\n",
    "    print(chunk.content, end=\"\", flush=True)"
   ]
  },
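  {
   "cell_type": "markdown",
   "id": "finetuning-conversion-note",
   "metadata": {},
   "source": [
    "As a hedged sketch of the fine-tuning path mentioned under \"Next Steps\": LangChain's `langchain.adapters.openai` module provides `convert_messages_for_finetuning`, which converts chat sessions into the list-of-message-dicts format expected by OpenAI's fine-tuning API. Check that this helper exists in your installed LangChain version before relying on it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "finetuning-conversion-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: convert the loaded sessions into OpenAI fine-tuning format.\n",
    "# Assumes convert_messages_for_finetuning is available in your LangChain version.\n",
    "from langchain.adapters.openai import convert_messages_for_finetuning\n",
    "\n",
    "training_data = convert_messages_for_finetuning(messages)\n",
    "print(f\"Prepared {len(training_data)} dialogue(s) for fine-tuning\")"
   ]
  },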
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "16156643-cfbd-444f-b4ae-198eb44f0267",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}