2023-05-29 14:25:17 +00:00
|
|
|
# Arxiv
|
|
|
|
|
|
|
|
>[arXiv](https://arxiv.org/) is an open-access archive for 2 million scholarly articles in the fields of physics,
|
|
|
|
> mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and
|
|
|
|
> systems science, and economics.
|
|
|
|
|
|
|
|
|
|
|
|
## Installation and Setup
|
|
|
|
|
|
|
|
First, you need to install `arxiv` python package.
|
|
|
|
|
|
|
|
```bash
|
|
|
|
pip install arxiv
|
|
|
|
```
|
|
|
|
|
|
|
|
Second, you need to install `PyMuPDF` python package which transforms PDF files downloaded from the `arxiv.org` site into the text format.
|
|
|
|
|
|
|
|
```bash
|
|
|
|
pip install pymupdf
|
|
|
|
```
|
|
|
|
|
|
|
|
## Document Loader
|
|
|
|
|
|
|
|
See a [usage example](../modules/indexes/document_loaders/examples/arxiv.ipynb).
|
|
|
|
|
|
|
|
```python
|
|
|
|
from langchain.document_loaders import ArxivLoader
|
|
|
|
```
|
2023-06-03 22:29:03 +00:00
|
|
|
|
|
|
|
## Retriever
|
|
|
|
|
|
|
|
See a [usage example](../modules/indexes/retrievers/examples/arxiv.ipynb).
|
|
|
|
|
|
|
|
```python
|
|
|
|
from langchain.retrievers import ArxivRetriever
|
|
|
|
```
|