mirror of https://github.com/hwchase17/langchain, synced 2024-10-29 17:07:25 +00:00
0498dad562
### Summary

Updates `UnstructuredEmailLoader` so that it can process attachments in addition to the e-mail content. The loader will process attachments if the `process_attachments` kwarg is passed when the loader is instantiated.

### Testing

```python
file_path = "fake-email-attachment.eml"
loader = UnstructuredEmailLoader(
    file_path, mode="elements", process_attachments=True
)
docs = loader.load()
docs[-1]
```

### Reviewers

- @rlancemartin
- @eyurtsev
- @hwchase17
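
For intuition about what "processing attachments" involves at the MIME level, here is a minimal sketch that uses only Python's standard `email` package. It is not the loader's implementation. `RAW_EMAIL` is a hypothetical, shortened message modeled on the fixture's structure (the `XYZ` boundary is made up), and `extract_attachments` is an illustrative helper name.

```python
from email import message_from_string, policy

# Hypothetical miniature message modeled on the fixture (boundary "XYZ" is made up).
RAW_EMAIL = (
    'MIME-Version: 1.0\n'
    'Subject: Fake email with attachment\n'
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    '\n'
    '--XYZ\n'
    'Content-Type: text/plain; charset="UTF-8"\n'
    '\n'
    'Hello!\n'
    '--XYZ\n'
    'Content-Type: text/plain; charset="US-ASCII"; name="fake-attachment.txt"\n'
    'Content-Disposition: attachment; filename="fake-attachment.txt"\n'
    'Content-Transfer-Encoding: base64\n'
    '\n'
    'SGV5IHRoaXMgaXMgYSBmYWtlIGF0dGFjaG1lbnQh\n'
    '--XYZ--\n'
)


def extract_attachments(raw):
    """Return (filename, decoded text) for each part marked as an attachment."""
    msg = message_from_string(raw, policy=policy.default)
    found = []
    for part in msg.walk():
        # Only parts with "Content-Disposition: attachment" are collected.
        if part.get_content_disposition() != "attachment":
            continue
        # decode=True undoes the Content-Transfer-Encoding (base64 here).
        payload = part.get_payload(decode=True)
        charset = part.get_content_charset() or "utf-8"
        found.append((part.get_filename(), payload.decode(charset)))
    return found


attachments = extract_attachments(RAW_EMAIL)
```

The sketch assumes every attachment is text. A binary attachment would need its `payload` bytes handed to a format-specific parser instead of being decoded to `str`.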
50 lines
1.6 KiB
Plaintext
MIME-Version: 1.0
Date: Fri, 23 Dec 2022 12:08:48 -0600
Message-ID: <CAPgNNXSzLVJ-d1OCX_TjFgJU7ugtQrjFybPtAMmmYZzphxNFYg@mail.gmail.com>
Subject: Fake email with attachment
From: Mallori Harrell <mallori@unstructured.io>
To: Mallori Harrell <mallori@unstructured.io>
Content-Type: multipart/mixed; boundary="0000000000005d654405f082adb7"

--0000000000005d654405f082adb7
Content-Type: multipart/alternative; boundary="0000000000005d654205f082adb5"

--0000000000005d654205f082adb5
Content-Type: text/plain; charset="UTF-8"

Hello!

Here's the attachments!

It includes:

- Lots of whitespace
- Little to no content
- and is a quick read

Best,

Mallori

--0000000000005d654205f082adb5
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">Hello!=C2=A0<div><br></div><div>Here's the attachments=
!</div><div><br></div><div>It includes:</div><div><ul><li style=3D"margin-l=
eft:15px">Lots of whitespace</li><li style=3D"margin-left:15px">Little=C2=
=A0to no content</li><li style=3D"margin-left:15px">and is a quick read</li=
></ul><div>Best,</div></div><div><br></div><div>Mallori</div><div dir=3D"lt=
r" class=3D"gmail_signature" data-smartmail=3D"gmail_signature"><div dir=3D=
"ltr"><div><div><br></div></div></div></div></div>

--0000000000005d654205f082adb5--
--0000000000005d654405f082adb7
Content-Type: text/plain; charset="US-ASCII"; name="fake-attachment.txt"
Content-Disposition: attachment; filename="fake-attachment.txt"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_lc0tto5j0
Content-ID: <f_lc0tto5j0>

SGV5IHRoaXMgaXMgYSBmYWtlIGF0dGFjaG1lbnQh
--0000000000005d654405f082adb7--
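
The HTML part of the fixture above is quoted-printable encoded: `=3D` stands for `=`, `=C2=A0` is a UTF-8 non-breaking space, and a trailing `=` marks a soft line break. As a small sketch of how that encoding is undone, the snippet below decodes the first two lines of the HTML part with Python's standard `quopri` module. The variable names are illustrative only.

```python
import quopri

# First two lines of the fixture's HTML part, copied verbatim.
encoded = (
    b'<div dir=3D"ltr">Hello!=C2=A0<div><br></div><div>Here\'s the attachments=\n'
    b'!</div>'
)

# quopri yields raw bytes; the part's declared charset (UTF-8) turns them into text.
decoded = quopri.decodestring(encoded).decode("utf-8")
```

After decoding, the soft line break is gone and the two physical lines rejoin into a single run of HTML.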