langchain/docs/extras/ecosystem/integrations/arxiv.mdx
Davis Chase 87e502c6bc
Doc refactor (#6300)
Co-authored-by: jacoblee93 <jacoblee93@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-16 11:52:56 -07:00


# arXiv
>[arXiv](https://arxiv.org/) is an open-access archive of about 2 million scholarly articles in the fields of physics,
> mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and
> systems science, and economics.
## Installation and Setup
First, install the `arxiv` Python package.
```bash
pip install arxiv
```
Second, install the `PyMuPDF` Python package, which converts the PDF files downloaded from `arxiv.org` into plain text.
```bash
pip install pymupdf
```
## Document Loader
See a [usage example](/docs/modules/data_connection/document_loaders/integrations/arxiv.html).
```python
from langchain.document_loaders import ArxivLoader
```
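As a minimal sketch of how the loader might be used, the snippet below wraps `ArxivLoader` in a small function and summarizes the returned documents. The helper names `load_arxiv_documents` and `describe_documents` are hypothetical, and the `"Title"` metadata key is an assumption about what the loader returns; the linked usage example is the authoritative reference. Running the loader needs network access and the two packages installed above.

```python
def load_arxiv_documents(query, max_docs=2):
    """Fetch up to `max_docs` arXiv papers matching `query` (hypothetical helper)."""
    # Imported lazily so this module can be imported without langchain installed.
    from langchain.document_loaders import ArxivLoader

    loader = ArxivLoader(query=query, load_max_docs=max_docs)
    return loader.load()  # performs network requests and PDF-to-text conversion


def describe_documents(docs):
    """Return (title, text length) pairs for a list of loaded documents."""
    # "Title" is assumed to be among the metadata keys; fall back if it is absent.
    return [
        (doc.metadata.get("Title", "<untitled>"), len(doc.page_content))
        for doc in docs
    ]
```

A call such as `describe_documents(load_arxiv_documents("1605.08386"))` would then list each paper's title alongside the size of its extracted text.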
## Retriever
See a [usage example](/docs/modules/data_connection/retrievers/integrations/arxiv.html).
```python
from langchain.retrievers import ArxivRetriever
```
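The retriever can be sketched in the same way. The snippet below assumes `ArxivRetriever` accepts `load_max_docs` and exposes `get_relevant_documents`, as in the linked usage example; the helpers `retrieve_arxiv_documents` and `preview_documents` are hypothetical names introduced for illustration. Retrieval needs network access.

```python
def retrieve_arxiv_documents(query, max_docs=2):
    """Retrieve up to `max_docs` arXiv documents relevant to `query` (hypothetical helper)."""
    # Imported lazily so this module can be imported without langchain installed.
    from langchain.retrievers import ArxivRetriever

    retriever = ArxivRetriever(load_max_docs=max_docs)
    return retriever.get_relevant_documents(query)  # performs network requests


def preview_documents(docs, width=80):
    """Return a short single-line preview of each document's text."""
    # Collapse runs of whitespace (including newlines) before truncating.
    return [" ".join(doc.page_content.split())[:width] for doc in docs]
```

For instance, `preview_documents(retrieve_arxiv_documents("1605.08386"))` would give one short text snippet per retrieved paper.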