Change name from `mediawiki-scraper` to `mediawiki-dump-generator` (#181)

Fixes
https://github.com/mediawiki-client-tools/mediawiki-scraper/issues/65.

Addresses @yzqzss'
[comment](https://github.com/orgs/mediawiki-client-tools/discussions/61#discussioncomment-6831973):

> * `scraper` is an evil name. (for webmasters)

Uses similar naming to
[`mediawiki-dump`](https://github.com/macbre/mediawiki-dump), from one
of the past contributors to `wikitools`. (I'm not 100% sure, but this
might be a more modern replacement for `wikitools`... either way,
potentially someone to be friendly with!)

I already created [a placeholder on
PyPI](https://pypi.org/project/mediawiki-dump-generator/), and it seems
like we're like 99% of the way there to being able to publish there.

I can change the name of this repository to match the new name right
when I merge this.

Signed-off-by: Elsie Hupp <github@elsiehupp.com>
pull/475/head
Elsie Hupp 9 months ago committed by GitHub
parent b48a13852a
commit 490af55dab
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@@ -7,7 +7,7 @@ assignees: ''
---
<!-- Thank you for helping to improve MediaWiki Scraper! -->
<!-- Thank you for helping to improve MediaWiki Dump Generator! -->
<!-- So that we can better address your issue,
please fill out as much of the following as possible. -->

@@ -1,6 +1,6 @@
blank_issues_enabled: false
contact_links:
- name: Get help using MediaWiki Scraper
- name: Get help using MediaWiki Dump Generator
url: https://github.com/orgs/mediawiki-client-tools/discussions/categories/q-a
about: If you need help (other than reporting a bug), you can reach out on our Discussions Q&A.
- name: Anything else

@@ -6,7 +6,7 @@ This document is an ongoing process for establishing and refining a set of best
## Reporting Issues
If you find anything amiss, you can report it using [GitHub Issues](https://github.com/mediawiki-client-tools/mediawiki-scraper/issues). The template is there to help you communicate clearly. It's okay if you change it to meet your needs, though, as it is merely a suggested baseline.
If you find anything amiss, you can report it using [GitHub Issues](https://github.com/mediawiki-client-tools/mediawiki-dump-generator/issues). The template is there to help you communicate clearly. It's okay if you change it to meet your needs, though, as it is merely a suggested baseline.
For anything that doesn't fit, you can open a less formal conversation in [GitHub Discussions](https://github.com/orgs/mediawiki-client-tools/discussions) and feel free to tag any of the members of our GitHub organization.
@@ -32,13 +32,13 @@ In addition to the tools listed in the basic installation instructions in the ma
### 1. Fork the repository if you don't have write access
You can do so [here](https://github.com/mediawiki-client-tools/mediawiki-scraper/fork).
You can do so [here](https://github.com/mediawiki-client-tools/mediawiki-dump-generator/fork).
### 2. Clone the repository (or your fork) if you'd like to work on it locally (such as in VS Code)
This is particularly important if you are contributing executible code, so that you can use "code intelligence" and test your work. You can clone the repository using the big green **Code** button on the homepage of the repository (or your fork).
Alternately, you can [create a codespace](https://github.com/mediawiki-client-tools/mediawiki-scraper/codespaces) (also from the big green **Code** button), though we have yet to set up a consistent development container.
Alternately, you can [create a codespace](https://github.com/mediawiki-client-tools/mediawiki-dump-generator/codespaces) (also from the big green **Code** button), though we have yet to set up a consistent development container.
### 3. Create a new branch for the changes you'd like to make

@@ -1,14 +1,14 @@
# `MediaWiki Scraper`
# `MediaWiki Dump Generator`
**MediaWiki Scraper can archive wikis from the largest to the tiniest.**
**MediaWiki Dump Generator can archive wikis from the largest to the tiniest.**
`MediaWiki Scraper` is an ongoing project to port the legacy [`wikiteam`](https://github.com/WikiTeam/wikiteam) toolset to Python 3 and PyPI to make it more accessible for today's archivers.
`MediaWiki Dump Generator` is an ongoing project to port the legacy [`wikiteam`](https://github.com/WikiTeam/wikiteam) toolset to Python 3 and PyPI to make it more accessible for today's archivers.
Most of the focus has been on the core `dumpgenerator` tool, but Python 3 versions of the other `wikiteam` tools may be added over time.
## MediaWiki Scraper Toolset
## MediaWiki Dump Generator Toolset
MediaWiki Scraper is a set of tools for archiving wikis. The main general-purpose module of MediaWiki Scraper is dumpgenerator, which can download XML dumps of MediaWiki sites that can then be parsed or redeployed elsewhere.
MediaWiki Dump Generator is a set of tools for archiving wikis. The main general-purpose module of MediaWiki Dump Generator is dumpgenerator, which can download XML dumps of MediaWiki sites that can then be parsed or redeployed elsewhere.
### Viewing MediaWiki XML Dumps
@@ -17,18 +17,18 @@ MediaWiki Scraper is a set of tools for archiving wikis. The main general-purpos
## Python Environment
`MediaWiki Scraper` requires [Python 3.8](https://www.python.org/downloads/release/python-380/) or later (less than 4.0), but you may be able to get it run with earlier versions of Python 3. On recent versions of Linux and macOS Python 3.8 should come preinstalled, but on Windows you will need to install it from [python.org](https://www.python.org/downloads/release/python-380/).
`MediaWiki Dump Generator` requires [Python 3.8](https://www.python.org/downloads/release/python-380/) or later (less than 4.0), but you may be able to get it run with earlier versions of Python 3. On recent versions of Linux and macOS Python 3.8 should come preinstalled, but on Windows you will need to install it from [python.org](https://www.python.org/downloads/release/python-380/).
`MediaWiki Scraper` has been tested on Linux, macOS, Windows and Android. If you are connecting to Linux or macOS via `ssh`, you can continue using the `bash` or `zsh` command prompt in the same terminal, but if you are starting in a desktop environment and don't already have a preferred Terminal environment you can try one of the following.
`MediaWiki Dump Generator` has been tested on Linux, macOS, Windows and Android. If you are connecting to Linux or macOS via `ssh`, you can continue using the `bash` or `zsh` command prompt in the same terminal, but if you are starting in a desktop environment and don't already have a preferred Terminal environment you can try one of the following.
> **NOTE:** You may need to update and pre-install dependencies in order for `MediaWiki Scraper` to work properly. Shell commands for these dependencies appear below each item in the list. (Also note that while installing and running `MediaWiki Scraper` itself should not require administrative priviliges, installing dependencies usually will.)
> **NOTE:** You may need to update and pre-install dependencies in order for `MediaWiki Dump Generator` to work properly. Shell commands for these dependencies appear below each item in the list. (Also note that while installing and running `MediaWiki Dump Generator` itself should not require administrative priviliges, installing dependencies usually will.)
* On desktop Linux you can use the default terminal application such as [Konsole](https://konsole.kde.org/) or [GNOME Terminal](https://help.gnome.org/users/gnome-terminal/stable/).
<details>
<summary>Linux Dependencies</summary>
While most Linux distributions will have Python 3 preinstalled, if you are cloning `MediaWiki Scraper` rather than downloading it directly you may need to install `git`.
While most Linux distributions will have Python 3 preinstalled, if you are cloning `MediaWiki Dump Generator` rather than downloading it directly you may need to install `git`.
On Debian, Ubuntu, and the like:
@@ -45,7 +45,7 @@ MediaWiki Scraper is a set of tools for archiving wikis. The main general-purpos
<details>
<summary>macOS Dependencies</summary>
While macOS will have Python 3 preinstalled, if you are cloning `MediaWiki Scraper` rather than downloading it directly and you are using an older versions of macOS, you may need to install `git`.
While macOS will have Python 3 preinstalled, if you are cloning `MediaWiki Dump Generator` rather than downloading it directly and you are using an older versions of macOS, you may need to install `git`.
If `git` is not preinstalled, however, macOS will prompt you to install it the first time you run the command. Therefore, to check whether you have `git` installed or to install `git`, simply run `git` (with no arguments) in Terminal:
@@ -68,9 +68,9 @@ MediaWiki Scraper is a set of tools for archiving wikis. The main general-purpos
> When installing [Python 3.8](https://www.python.org/downloads/release/python-380/) (from python.org), be sure to check "Add Python to PATH" so that installed Python scripts are accessible from any location. If for some reason installed Python scripts, e.g. `pip`, are not available from any location, you can add Python to the `PATH` environment variable using the instructions [here](https://datatofish.com/add-python-to-windows-path/).
>
> And while doing so should not be necessary if you follow the instructions further down and install `MediaWiki Scraper` using `pip`, if you'd prefer that Windows store installed Python scripts somewhere other than the default Python folder under `%appdata%`, you can also add your preferred alternative path such as `C:\Program Files\Python3\Scripts\` or a subfolder of `My Documents`. (You will need to restart any terminal sessions in order for this to take effect.)
> And while doing so should not be necessary if you follow the instructions further down and install `MediaWiki Dump Generator` using `pip`, if you'd prefer that Windows store installed Python scripts somewhere other than the default Python folder under `%appdata%`, you can also add your preferred alternative path such as `C:\Program Files\Python3\Scripts\` or a subfolder of `My Documents`. (You will need to restart any terminal sessions in order for this to take effect.)
Whenever you'd like to run a Bash session, you can open a Bash terminal prompt from any folder in Windows Explorer by right-clicking and choosing the option from the context menu. (For some purposes you may wish to run Bash as an administrator.) This way you can open a Bash prompt and clone the `MediaWiki Scraper` repository in one location, and subsequently or later open another Bash prompt and run `MediaWiki Scraper` to dump a wiki wherever else you'd like without having to browse to the directory manually using Bash.
Whenever you'd like to run a Bash session, you can open a Bash terminal prompt from any folder in Windows Explorer by right-clicking and choosing the option from the context menu. (For some purposes you may wish to run Bash as an administrator.) This way you can open a Bash prompt and clone the `MediaWiki Dump Generator` repository in one location, and subsequently or later open another Bash prompt and run `MediaWiki Dump Generator` to dump a wiki wherever else you'd like without having to browse to the directory manually using Bash.
</details>
@@ -102,22 +102,22 @@ MediaWiki Scraper is a set of tools for archiving wikis. The main general-purpos
The Python 3 port of the `dumpgenerator` module of `wikiteam3` is largely functional and can be installed from a downloaded or cloned copy of this repository.
> If you run into a problem with the version that mostly works, you can [open an Issue](https://github.com/mediawiki-client-tools/mediawiki-scraper/issues/new/choose). Be sure to include the following:
> If you run into a problem with the version that mostly works, you can [open an Issue](https://github.com/mediawiki-client-tools/mediawiki-dump-generator/issues/new/choose). Be sure to include the following:
>
> 1. The operating system you're using
> 2. What command you ran that didn't work
> 3. What output was printed to your terminal
### 1. Downloading and installing `MediaWiki Scraper`
### 1. Downloading and installing `MediaWiki Dump Generator`
In whatever folder you use for cloned repositories:
```bash
git clone https://github.com/mediawiki-client-tools/mediawiki-scraper
git clone https://github.com/mediawiki-client-tools/mediawiki-dump-generator
```
```bash
cd mediawiki-scraper
cd mediawiki-dump-generator
```
```bash
@@ -158,12 +158,12 @@ pip uninstall wikiteam3
```
```bash
rm -fr [cloned_MediaWiki Scraper_folder]
rm -fr [cloned_mediawiki_scraper_folder]
```
### 4. Updating MediaWiki Scraper
### 4. Updating MediaWiki Dump Generator
> **Note:** Re-run the following steps each time to reinstall each time the MediaWiki Scraper branch is updated.
> **Note:** Re-run the following steps each time to reinstall each time the MediaWiki Dump Generator branch is updated.
```bash
git pull
@@ -194,9 +194,9 @@ pip install --force-reinstall (Get-ChildItem .\dist\*.whl).FullName
</details>
### 5. Manually build and install `MediaWiki Scraper`
### 5. Manually build and install `MediaWiki Dump Generator`
If you'd like to manually build and install `MediaWiki Scraper` from a cloned or downloaded copy of this repository, run the following commands from the downloaded base directory:
If you'd like to manually build and install `MediaWiki Dump Generator` from a cloned or downloaded copy of this repository, run the following commands from the downloaded base directory:
```bash
curl -sSL https://install.python-poetry.org | python3 -
@@ -243,7 +243,7 @@ git checkout --track origin/python3
## Using `dumpgenerator` (once installed)
After installing `MediaWiki Scraper` using `pip` you should be able to use the `dumpgenerator` command from any local directory.
After installing `MediaWiki Dump Generator` using `pip` you should be able to use the `dumpgenerator` command from any local directory.
For basic usage, you can run `dumpgenerator` in the directory where you'd like the download to be.
@@ -385,5 +385,5 @@ You can contact Elsie Hupp directly via email at [mediawiki-client-tools@elsiehu
**WikiTeam** is the [Archive Team](http://www.archiveteam.org) [[GitHub](https://github.com/ArchiveTeam)] subcommittee on wikis.
It was founded and originally developed by [Emilio J. Rodríguez-Posada](https://github.com/emijrp), a Wikipedia veteran editor and amateur archivist. Thanks to people who have helped, especially to: [Federico Leva](https://github.com/nemobis), [Alex Buie](https://github.com/ab2525), [Scott Boyd](http://www.sdboyd56.com), [Hydriz](https://github.com/Hydriz), Platonides, Ian McEwen, [Mike Dupont](https://github.com/h4ck3rm1k3), [balr0g](https://github.com/balr0g) and [PiRSquared17](https://github.com/PiRSquared17).
**MediaWiki Scraper**
**MediaWiki Dump Generator**
The Python 3 initiative is currently being led by [Elsie Hupp](https://github.com/elsiehupp), with contributions from [Victor Gambier](https://github.com/vgambier), [Thomas Karcher](https://github.com/t-karcher), [Janet Cobb](https://github.com/randomnetcat), [yzqzss](https://github.com/yzqzss), [NyaMisty](https://github.com/NyaMisty) and [Rob Kam](https://github.com/robkam)

@@ -53,7 +53,7 @@ def bye():
print("---> Congratulations! Your dump is complete <---")
print("")
print("If you encountered a bug, you can report it on GitHub Issues:")
print(" https://github.com/mediawiki-client-tools/mediawiki-scraper/issues")
print(" https://github.com/mediawiki-client-tools/mediawiki-dump-generator/issues")
print("")
print("If you need any other help, you can reach out on GitHub Discussions:")
print(" https://github.com/orgs/mediawiki-client-tools/discussions")

@@ -425,7 +425,7 @@ class Image:
)
if "%u" in filename:
raise NotImplementedError(
f"Filename {filename} contains unicode. Please file an issue with MediaWiki Scraper."
f"Filename {filename} contains unicode. Please file an issue with MediaWiki Dump Generator."
)
uploader = re.sub("_", " ", image.get("user", "Unknown"))
size = image.get("size", "False")

@@ -266,15 +266,15 @@ def upload(wikis, logfile, config={}, uploadeddumps=[]):
# retrieve some info from the wiki
wikititle = "Wiki - %s" % (sitename) # Wiki - ECGpedia
wikidesc = (
'<a href="%s">%s</a> dumped with <a href="https://github.com/mediawiki-client-tools/mediawiki-scraper/" rel="nofollow">MediaWiki-Scraper</a> (aka WikiTeam3) tools.'
'<a href="%s">%s</a> dumped with <a href="https://github.com/mediawiki-client-tools/mediawiki-dump-generator/" rel="nofollow">MediaWiki Dump Generator</a> (aka WikiTeam3) tools.'
% (baseurl, sitename)
) # "<a href=\"http://en.ecgpedia.org/\" rel=\"nofollow\">ECGpedia,</a>: a free electrocardiography (ECG) tutorial and textbook to which anyone can contribute, designed for medical professionals such as cardiac care nurses and physicians. Dumped with <a href=\"https://github.com/WikiTeam/wikiteam\" rel=\"nofollow\">WikiTeam</a> tools."
wikikeys = [
"wiki",
"wikiteam",
"wikiteam3",
"mediawiki-scraper",
"mediawikiScraper",
"mediawiki-dump-generator",
"MediaWikiDumpGenerator",
"MediaWiki",
sitename,
wikiname,
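
A rename that touches this many files is easy to leave incomplete; for instance, the `rm -fr` placeholder in the README hunk above still appears to read `cloned_mediawiki_scraper_folder` after the change. The following is only a minimal sketch for spotting such leftovers in a piece of text. It is not part of this commit, and the `OLD_NAME` pattern and `find_leftovers` helper are hypothetical names chosen for illustration.

```python
import re

# Hypothetical pattern (an assumption, not taken from the repository):
# matches old-name spellings such as "mediawiki-scraper",
# "MediaWiki Scraper", "mediawiki_scraper", and "mediawikiScraper".
OLD_NAME = re.compile(r"mediawiki[-_ ]?scraper", re.IGNORECASE)


def find_leftovers(text):
    """Return (line_number, line) pairs that still mention the old name."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if OLD_NAME.search(line)
    ]


# Sample lines modeled on the hunks above.
sample = "\n".join(
    [
        "git clone https://github.com/mediawiki-client-tools/mediawiki-dump-generator",
        "rm -fr [cloned_mediawiki_scraper_folder]",
        '"mediawikiScraper",',
    ]
)

leftovers = find_leftovers(sample)
for number, line in leftovers:
    print(f"{number}: {line}")
```

Running the same helper over each file's contents after the rename would give a quick list of lines to review by hand; some matches may be intentional (for example, historical references), so the output is a prompt for review rather than a list of errors.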
