forked from Archives/langchain
8259f9b7fa
# Creates GitHubLoader (#5257) GitHubLoader is a DocumentLoader that loads issues and PRs from GitHub. Fixes #5257 --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
262 lines
7.1 KiB
Plaintext
262 lines
7.1 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# GitHub\n",
|
|
"\n",
|
|
"This notebooks shows how you can load issues and pull requests (PRs) for a given repository on [GitHub](https://github.com/). We will use the LangChain Python repository as an example."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Setup access token"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"To access the GitHub API, you need a personal access token - you can set up yours here: https://github.com/settings/tokens?type=beta. You can either set this token as the environment variable ``GITHUB_PERSONAL_ACCESS_TOKEN`` and it will be automatically pulled in, or you can pass it in directly at initializaiton as the ``access_token`` named parameter."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# If you haven't set your access token as an environment variable, pass it in here.\n",
|
|
"from getpass import getpass\n",
|
|
"\n",
|
|
"ACCESS_TOKEN = getpass()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Load Issues and PRs"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 10,
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain.document_loaders import GitHubIssuesLoader"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 11,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"loader = GitHubIssuesLoader(\n",
|
|
" repo=\"hwchase17/langchain\",\n",
|
|
" access_token=ACCESS_TOKEN, # delete/comment out this argument if you've set the access token as an env var.\n",
|
|
" creator=\"UmerHA\",\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Let's load all issues and PRs created by \"UmerHA\".\n",
|
|
"\n",
|
|
"Here's a list of all filters you can use:\n",
|
|
"- include_prs\n",
|
|
"- milestone\n",
|
|
"- state\n",
|
|
"- assignee\n",
|
|
"- creator\n",
|
|
"- mentioned\n",
|
|
"- labels\n",
|
|
"- sort\n",
|
|
"- direction\n",
|
|
"- since\n",
|
|
"\n",
|
|
"For more info, see https://docs.github.com/en/rest/issues/issues?apiVersion=2022-11-28#list-repository-issues."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 12,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"docs = loader.load()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 13,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"# Creates GitHubLoader (#5257)\r\n",
|
|
"\r\n",
|
|
"GitHubLoader is a DocumentLoader that loads issues and PRs from GitHub.\r\n",
|
|
"\r\n",
|
|
"Fixes #5257\r\n",
|
|
"\r\n",
|
|
"Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested:\r\n",
|
|
"DataLoaders\r\n",
|
|
"- @eyurtsev\r\n",
|
|
"\n",
|
|
"{'url': 'https://github.com/hwchase17/langchain/pull/5408', 'title': 'DocumentLoader for GitHub', 'creator': 'UmerHA', 'created_at': '2023-05-29T14:50:53Z', 'comments': 0, 'state': 'open', 'labels': ['enhancement', 'lgtm', 'doc loader'], 'assignee': None, 'milestone': None, 'locked': False, 'number': 5408, 'is_pull_request': True}\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"print(docs[0].page_content)\n",
|
|
"print(docs[0].metadata)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Only load issues"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"By default, the GitHub API returns considers pull requests to also be issues. To only get 'pure' issues (i.e., no pull requests), use `include_prs=False`"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 14,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"loader = GitHubIssuesLoader(\n",
|
|
" repo=\"hwchase17/langchain\",\n",
|
|
" access_token=ACCESS_TOKEN, # delete/comment out this argument if you've set the access token as an env var.\n",
|
|
" creator=\"UmerHA\",\n",
|
|
" include_prs=False,\n",
|
|
")\n",
|
|
"docs = loader.load()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 15,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"### System Info\n",
|
|
"\n",
|
|
"LangChain version = 0.0.167\r\n",
|
|
"Python version = 3.11.0\r\n",
|
|
"System = Windows 11 (using Jupyter)\n",
|
|
"\n",
|
|
"### Who can help?\n",
|
|
"\n",
|
|
"- @hwchase17\r\n",
|
|
"- @agola11\r\n",
|
|
"- @UmerHA (I have a fix ready, will submit a PR)\n",
|
|
"\n",
|
|
"### Information\n",
|
|
"\n",
|
|
"- [ ] The official example notebooks/scripts\n",
|
|
"- [X] My own modified scripts\n",
|
|
"\n",
|
|
"### Related Components\n",
|
|
"\n",
|
|
"- [X] LLMs/Chat Models\n",
|
|
"- [ ] Embedding Models\n",
|
|
"- [X] Prompts / Prompt Templates / Prompt Selectors\n",
|
|
"- [ ] Output Parsers\n",
|
|
"- [ ] Document Loaders\n",
|
|
"- [ ] Vector Stores / Retrievers\n",
|
|
"- [ ] Memory\n",
|
|
"- [ ] Agents / Agent Executors\n",
|
|
"- [ ] Tools / Toolkits\n",
|
|
"- [ ] Chains\n",
|
|
"- [ ] Callbacks/Tracing\n",
|
|
"- [ ] Async\n",
|
|
"\n",
|
|
"### Reproduction\n",
|
|
"\n",
|
|
"```\r\n",
|
|
"import os\r\n",
|
|
"os.environ[\"OPENAI_API_KEY\"] = \"...\"\r\n",
|
|
"\r\n",
|
|
"from langchain.chains import LLMChain\r\n",
|
|
"from langchain.chat_models import ChatOpenAI\r\n",
|
|
"from langchain.prompts import PromptTemplate\r\n",
|
|
"from langchain.prompts.chat import ChatPromptTemplate\r\n",
|
|
"from langchain.schema import messages_from_dict\r\n",
|
|
"\r\n",
|
|
"role_strings = [\r\n",
|
|
" (\"system\", \"you are a bird expert\"), \r\n",
|
|
" (\"human\", \"which bird has a point beak?\")\r\n",
|
|
"]\r\n",
|
|
"prompt = ChatPromptTemplate.from_role_strings(role_strings)\r\n",
|
|
"chain = LLMChain(llm=ChatOpenAI(), prompt=prompt)\r\n",
|
|
"chain.run({})\r\n",
|
|
"```\n",
|
|
"\n",
|
|
"### Expected behavior\n",
|
|
"\n",
|
|
"Chain should run\n",
|
|
"{'url': 'https://github.com/hwchase17/langchain/issues/5027', 'title': \"ChatOpenAI models don't work with prompts created via ChatPromptTemplate.from_role_strings\", 'creator': 'UmerHA', 'created_at': '2023-05-20T10:39:18Z', 'comments': 1, 'state': 'open', 'labels': [], 'assignee': None, 'milestone': None, 'locked': False, 'number': 5027, 'is_pull_request': False}\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"print(docs[0].page_content)\n",
|
|
"print(docs[0].metadata)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": []
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.11.3"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 4
|
|
}
|