Implements support for Personal Access Token Authentication in the ConfluenceLoader (#5385)

# Implements support for Personal Access Token Authentication in the
ConfluenceLoader

Fixes #5191

Implements a new optional parameter for the ConfluenceLoader: `token`.
This allows the use of personal access token (PAT) authentication when
using the on-prem (Server/Data Center) version of Confluence.
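
As a rough illustration of how a caller might supply the new `token` parameter without hard-coding the secret, here is a minimal sketch. The helper name `pat_loader_kwargs` and the environment variable `CONFLUENCE_PAT` are hypothetical and not part of this PR; only the `url` and `token` keyword names come from the change itself.

```python
import os
from typing import Dict, Optional


def pat_loader_kwargs(url: str, env_var: str = "CONFLUENCE_PAT") -> Dict[str, str]:
    """Build keyword arguments for token-only authentication.

    Hypothetical helper: the function name and the default environment
    variable name are assumptions made for this example.
    """
    token: Optional[str] = os.environ.get(env_var)
    if not token:
        raise ValueError(f"Environment variable {env_var} is not set")
    # Deliberately no `username`, `api_key` or `oauth2`: the loader rejects
    # those when they are combined with `token`.
    return {"url": url, "token": token}
```

The resulting dictionary could then be unpacked into the constructor, for example `ConfluenceLoader(**pat_loader_kwargs("https://confluence.example.com"))`, where the URL is a placeholder for an on-prem instance.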

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
@eyurtsev @Jflick58 

Twitter Handle: felipe_yyc

---------

Co-authored-by: Felipe <feferreira@ea.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
searx_updates
Felipe Ferreira 12 months ago committed by GitHub
parent b81f98b8a6
commit ae2cf1f598
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -1,15 +1,18 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"source": [
"# Confluence\n",
"\n",
">[Confluence](https://www.atlassian.com/software/confluence) is a wiki collaboration platform that saves and organizes all of the project-related material. `Confluence` is a knowledge base that primarily handles content management activities. \n",
"\n",
"A loader for `Confluence` pages currently supports both `username/api_key` and `Oauth2 login`.\n",
"See [instructions](https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/).\n",
"A loader for `Confluence` pages.\n",
"\n",
"\n",
"This currently supports `username/api_key` and `Oauth2 login`. On-prem installations additionally support `token` authentication. \n",
"\n",
"\n",
"Specify a list of `page_id`-s and/or a `space_key` to load the corresponding pages into Document objects. If both are specified, the union of both sets will be returned.\n",
@ -20,9 +23,17 @@
"Hint: `space_key` and `page_id` can both be found in the URL of a page in Confluence - https://yoursite.atlassian.com/wiki/spaces/<space_key>/pages/<page_id>\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Before using ConfluenceLoader make sure you have the latest version of the atlassian-python-api package installed:"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"metadata": {
"tags": []
},
@ -31,6 +42,29 @@
"#!pip install atlassian-python-api"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Examples"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Username and Password or Username and API Token (Atlassian Cloud only)\n",
"\n",
"This example authenticates using either a username and password or, if you're connecting to an Atlassian Cloud hosted version of Confluence, a username and an API Token.\n",
"You can generate an API token at: https://id.atlassian.com/manage-profile/security/api-tokens.\n",
"\n",
"The `limit` parameter specifies how many documents will be retrieved in a single call, not how many documents will be retrieved in total.\n",
"By default the code will return up to 1000 documents in batches of 50 documents. To control the total number of documents use the `max_pages` parameter. \n",
"Please note that the maximum value for the `limit` parameter in the atlassian-python-api package is currently 100. "
]
},
{
"cell_type": "code",
"execution_count": null,
@ -46,6 +80,34 @@
")\n",
"documents = loader.load(space_key=\"SPACE\", include_attachments=True, limit=50)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Personal Access Token (Server/On-Prem only)\n",
"\n",
"This method is valid for the Data Center/Server on-prem edition only.\n",
"For more information on how to generate a Personal Access Token (PAT) check the official Confluence documentation at: https://confluence.atlassian.com/enterprise/using-personal-access-tokens-1026032365.html.\n",
"When using a PAT, provide only the token value; you cannot also provide a username. \n",
"Please note that ConfluenceLoader will run under the permissions of the user who generated the PAT and will only be able to load documents to which that user has access. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import ConfluenceLoader\n",
"\n",
"loader = ConfluenceLoader(\n",
" url=\"https://yoursite.atlassian.com/wiki\",\n",
" token=\"12345\"\n",
")\n",
"documents = loader.load(space_key=\"SPACE\", include_attachments=True, limit=50, max_pages=50)"
]
}
],
"metadata": {
@ -64,7 +126,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.9.13"
},
"vscode": {
"interpreter": {

@ -19,7 +19,8 @@ logger = logging.getLogger(__name__)
class ConfluenceLoader(BaseLoader):
"""
Load Confluence pages. Port of https://llamahub.ai/l/confluence
This currently supports both username/api_key and Oauth2 login.
This currently supports username/api_key, Oauth2 login or personal access token
authentication.
Specify a list page_ids and/or space_key to load in the corresponding pages into
Document objects, if both are specified the union of both sets will be returned.
@ -53,6 +54,8 @@ class ConfluenceLoader(BaseLoader):
:type username: str, optional
:param oauth2: _description_, defaults to {}
:type oauth2: dict, optional
:param token: Personal access token (Server/Data Center only), defaults to None
:type token: str, optional
:param cloud: _description_, defaults to True
:type cloud: bool, optional
:param number_of_retries: How many times to retry, defaults to 3
@ -73,6 +76,7 @@ class ConfluenceLoader(BaseLoader):
api_key: Optional[str] = None,
username: Optional[str] = None,
oauth2: Optional[dict] = None,
token: Optional[str] = None,
cloud: Optional[bool] = True,
number_of_retries: Optional[int] = 3,
min_retry_seconds: Optional[int] = 2,
@ -80,7 +84,9 @@ class ConfluenceLoader(BaseLoader):
confluence_kwargs: Optional[dict] = None,
):
confluence_kwargs = confluence_kwargs or {}
errors = ConfluenceLoader.validate_init_args(url, api_key, username, oauth2)
errors = ConfluenceLoader.validate_init_args(
url, api_key, username, oauth2, token
)
if errors:
raise ValueError(f"Error(s) while validating input: {errors}")
@ -101,6 +107,10 @@ class ConfluenceLoader(BaseLoader):
self.confluence = Confluence(
url=url, oauth2=oauth2, cloud=cloud, **confluence_kwargs
)
elif token:
self.confluence = Confluence(
url=url, token=token, cloud=cloud, **confluence_kwargs
)
else:
self.confluence = Confluence(
url=url,
@ -116,6 +126,7 @@ class ConfluenceLoader(BaseLoader):
api_key: Optional[str] = None,
username: Optional[str] = None,
oauth2: Optional[dict] = None,
token: Optional[str] = None,
) -> Union[List, None]:
"""Validates proper combinations of init arguments"""
@ -147,6 +158,12 @@ class ConfluenceLoader(BaseLoader):
"`['access_token', 'access_token_secret', 'consumer_key', 'key_cert']`"
)
if token and (api_key or username or oauth2):
errors.append(
"Cannot provide a value for `token` and a value for `api_key`, "
"`username` or `oauth2`"
)
if errors:
return errors
return None
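
The new check in `validate_init_args` can be read in isolation. The following is a minimal standalone sketch, not the loader's actual method: the function name `check_token_exclusive` is hypothetical, and it reproduces only the `token` rule added in this diff, leaving out the existing URL, `api_key`/`username` and `oauth2` checks.

```python
from typing import List, Optional


def check_token_exclusive(
    api_key: Optional[str] = None,
    username: Optional[str] = None,
    oauth2: Optional[dict] = None,
    token: Optional[str] = None,
) -> Optional[List[str]]:
    """Return a list of error messages, or None when the combination is valid."""
    errors: List[str] = []
    # Same condition as in the diff: a token may not be combined with any
    # other credential.
    if token and (api_key or username or oauth2):
        errors.append(
            "Cannot provide a value for `token` and a value for `api_key`, "
            "`username` or `oauth2`"
        )
    # Mirror the diff's convention of returning None rather than an empty list.
    return errors or None
```

Under this sketch, `check_token_exclusive(token="t")` passes, while `check_token_exclusive(token="t", username="u")` returns a one-element list, which is the case in which the constructor would raise a `ValueError`.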
