mirror of
https://github.com/hwchase17/langchain
synced 2024-11-10 01:10:59 +00:00
193 lines
6.4 KiB
Plaintext
193 lines
6.4 KiB
Plaintext
{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "bda1f3f5",
|
||
"metadata": {},
|
||
"source": [
|
||
"# BibTeX\n",
|
||
"\n",
|
||
"> BibTeX is a file format and reference management system commonly used in conjunction with LaTeX typesetting. It serves as a way to organize and store bibliographic information for academic and research documents.\n",
|
||
"\n",
|
||
"BibTeX files have a .bib extension and consist of plain text entries representing references to various publications, such as books, articles, conference papers, theses, and more. Each BibTeX entry follows a specific structure and contains fields for different bibliographic details like author names, publication title, journal or book title, year of publication, page numbers, and more.\n",
|
||
"\n",
|
||
"Bibtex files can also store the path to documents, such as `.pdf` files that can be retrieved."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "1b7a1eef-7bf7-4e7d-8bfc-c4e27c9488cb",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Installation\n",
|
||
"First, you need to install `bibtexparser` and `PyMuPDF`."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 26,
|
||
"id": "b674aaea-ed3a-4541-8414-260a8f67f623",
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"#!pip install bibtexparser pymupdf"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "95f05e1c-195e-4e2b-ae8e-8d6637f15be6",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Examples"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "e29b954c-1407-4797-ae21-6ba8937156be",
|
||
"metadata": {},
|
||
"source": [
|
||
"`BibtexLoader` has these arguments:\n",
|
||
"- `file_path`: the path the the `.bib` bibtex file\n",
|
||
"- optional `max_docs`: default=None, i.e. not limit. Use it to limit number of retrieved documents.\n",
|
||
"- optional `max_content_chars`: default=4000. Use it to limit the number of characters in a single document.\n",
|
||
"- optional `load_extra_meta`: default=False. By default only the most important fields from the bibtex entries: `Published` (publication year), `Title`, `Authors`, `Summary`, `Journal`, `Keywords`, and `URL`. If True, it will also try to load return `entry_id`, `note`, `doi`, and `links` fields. \n",
|
||
"- optional `file_pattern`: default=`r'[^:]+\\.pdf'`. Regex pattern to find files in the `file` entry. Default pattern supports `Zotero` flavour bibtex style and bare file path."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 27,
|
||
"id": "9bfd5e46",
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"from langchain.document_loaders import BibtexLoader"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 28,
|
||
"id": "01971b53",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Create a dummy bibtex file and download a pdf.\n",
|
||
"import urllib.request\n",
|
||
"\n",
|
||
"urllib.request.urlretrieve(\n",
|
||
" \"https://www.fourmilab.ch/etexts/einstein/specrel/specrel.pdf\", \"einstein1905.pdf\"\n",
|
||
")\n",
|
||
"\n",
|
||
"bibtex_text = \"\"\"\n",
|
||
" @article{einstein1915,\n",
|
||
" title={Die Feldgleichungen der Gravitation},\n",
|
||
" abstract={Die Grundgleichungen der Gravitation, die ich hier entwickeln werde, wurden von mir in einer Abhandlung: ,,Die formale Grundlage der allgemeinen Relativit{\\\"a}tstheorie`` in den Sitzungsberichten der Preu{\\ss}ischen Akademie der Wissenschaften 1915 ver{\\\"o}ffentlicht.},\n",
|
||
" author={Einstein, Albert},\n",
|
||
" journal={Sitzungsberichte der K{\\\"o}niglich Preu{\\ss}ischen Akademie der Wissenschaften},\n",
|
||
" volume={1915},\n",
|
||
" number={1},\n",
|
||
" pages={844--847},\n",
|
||
" year={1915},\n",
|
||
" doi={10.1002/andp.19163540702},\n",
|
||
" link={https://onlinelibrary.wiley.com/doi/abs/10.1002/andp.19163540702},\n",
|
||
" file={einstein1905.pdf}\n",
|
||
" }\n",
|
||
" \"\"\"\n",
|
||
"# save bibtex_text to biblio.bib file\n",
|
||
"with open(\"./biblio.bib\", \"w\") as file:\n",
|
||
" file.write(bibtex_text)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 29,
|
||
"id": "2631f46b",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"docs = BibtexLoader(\"./biblio.bib\").load()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 30,
|
||
"id": "33ef1fb2",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"{'id': 'einstein1915',\n",
|
||
" 'published_year': '1915',\n",
|
||
" 'title': 'Die Feldgleichungen der Gravitation',\n",
|
||
" 'publication': 'Sitzungsberichte der K{\"o}niglich Preu{\\\\ss}ischen Akademie der Wissenschaften',\n",
|
||
" 'authors': 'Einstein, Albert',\n",
|
||
" 'abstract': 'Die Grundgleichungen der Gravitation, die ich hier entwickeln werde, wurden von mir in einer Abhandlung: ,,Die formale Grundlage der allgemeinen Relativit{\"a}tstheorie`` in den Sitzungsberichten der Preu{\\\\ss}ischen Akademie der Wissenschaften 1915 ver{\"o}ffentlicht.',\n",
|
||
" 'url': 'https://doi.org/10.1002/andp.19163540702'}"
|
||
]
|
||
},
|
||
"execution_count": 30,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"docs[0].metadata"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 31,
|
||
"id": "46969806-45a9-4c4d-a61b-cfb9658fc9de",
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"ON THE ELECTRODYNAMICS OF MOVING\n",
|
||
"BODIES\n",
|
||
"By A. EINSTEIN\n",
|
||
"June 30, 1905\n",
|
||
"It is known that Maxwell’s electrodynamics—as usually understood at the\n",
|
||
"present time—when applied to moving bodies, leads to asymmetries which do\n",
|
||
"not appear to be inherent in the phenomena. Take, for example, the recipro-\n",
|
||
"cal electrodynamic action of a magnet and a conductor. The observable phe-\n",
|
||
"nomenon here depends only on the r\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"print(docs[0].page_content[:400]) # all pages of the pdf content"
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"kernelspec": {
|
||
"display_name": "Python 3 (ipykernel)",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.11.3"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 5
|
||
}
|