langchain/docs/extras/integrations/document_loaders/git.ipynb
seamusp 25f2c82ae8
docs:misc fixes (#9671)
Improve internal consistency in LangChain documentation
- Change occurrences of eg and eg. to e.g.
- Fix headers containing unnecessary capital letters.
- Change instances of "few shot" to "few-shot".
- Add periods to end of sentences where missing.
- Minor spelling and grammar fixes.
2023-08-23 22:36:54 -07:00

213 lines
4.1 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Git\n",
"\n",
">[Git](https://en.wikipedia.org/wiki/Git) is a distributed version control system that tracks changes in any set of computer files, usually used for coordinating work among programmers collaboratively developing source code during software development.\n",
"\n",
"This notebook shows how to load text files from `Git` repository."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load existing repository from disk"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!pip install GitPython"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from git import Repo\n",
"\n",
"repo = Repo.clone_from(\n",
" \"https://github.com/hwchase17/langchain\", to_path=\"./example_data/test_repo1\"\n",
")\n",
"branch = repo.head.reference"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.document_loaders import GitLoader"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"loader = GitLoader(repo_path=\"./example_data/test_repo1/\", branch=branch)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"len(data)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"page_content='.venv\\n.github\\n.git\\n.mypy_cache\\n.pytest_cache\\nDockerfile' metadata={'file_path': '.dockerignore', 'file_name': '.dockerignore', 'file_type': ''}\n"
]
}
],
"source": [
"print(data[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Clone repository from url"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import GitLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"loader = GitLoader(\n",
" clone_url=\"https://github.com/hwchase17/langchain\",\n",
" repo_path=\"./example_data/test_repo2/\",\n",
" branch=\"master\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1074"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Filtering files to load"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import GitLoader\n",
"\n",
"# e.g. loading only python files\n",
"loader = GitLoader(\n",
" repo_path=\"./example_data/test_repo1/\",\n",
" file_filter=lambda file_path: file_path.endswith(\".py\"),\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}