Update README.md (#120)

Rob Kam 1 year ago committed by GitHub
parent 70f52c9731
commit 71aae07de6

@@ -115,13 +115,18 @@ There are two versions of these instructions:
In whatever folder you use for cloned repositories:
```bash
-git clone https://github.com/elsiehupp/wikiteam3.git
+git clone https://github.com/mediawiki-client-tools/mediawiki-scraper
```
```bash
cd mediawiki-scraper
```
```bash
poetry update && poetry install && poetry build
```
If you're switching branches:
```bash
git checkout --track origin/python3
```
@@ -143,7 +148,7 @@ pip uninstall wikiteam3
```
```bash
-rm -r [cloned_wikiteam3_folder]
+rm -fr [cloned_wikiteam3_folder]
```
### 4. Updating MediaWiki Scraper
@@ -171,11 +176,7 @@ curl -sSL https://install.python-poetry.org | python3 -
```
```bash
-poetry install
-```
-```bash
-poetry build
+poetry update && poetry install && poetry build
```
```bash
@@ -204,15 +205,12 @@ dumpgenerator --help
Several examples follow.
-> **Note:** the `\` and line breaks in the examples below are for legibility in this documentation. `dumpgenerator` can also be run with the arguments in a single line and separated by a single space each.
+> **Note:** the `\` and line breaks in the examples below are for legibility in this documentation. You can also run `dumpgenerator` with the arguments on a single line, separated by single spaces.
### Downloading a wiki with complete XML history and images
```bash
-dumpgenerator \
-http://wiki.domain.org \
---xml \
---images
+dumpgenerator http://wiki.domain.org --xml --images
```
### Manually specifying `api.php` and/or `index.php`
@@ -220,18 +218,12 @@ dumpgenerator \
If the script can't find the `api.php` and/or `index.php` paths by itself, you can provide them:
```bash
-dumpgenerator \
---api http://wiki.domain.org/w/api.php \
---xml \
---images
+dumpgenerator --api http://wiki.domain.org/w/api.php --xml --images
```
```bash
-dumpgenerator \
---api http://wiki.domain.org/w/api.php \
---index http://wiki.domain.org/w/index.php \
---xml \
---images
+dumpgenerator --api http://wiki.domain.org/w/api.php --index http://wiki.domain.org/w/index.php \
+--xml --images
```
If you only want the XML histories, just use `--xml`. For only the images, just `--images`. For only the current version of every page, `--xml --curonly`.
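For instance, a dump of only the current revision of every page could be requested as follows (a sketch using the same placeholder URL as the examples above; it requires `dumpgenerator` to be installed):

```bash
# Current revisions only: no edit histories, no images
# (http://wiki.domain.org is a placeholder, not a real wiki).
dumpgenerator http://wiki.domain.org --xml --curonly
```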
@@ -240,11 +232,7 @@ If you only want the XML histories, just use `--xml`. For only the images, just
```bash
dumpgenerator \
---api http://wiki.domain.org/w/api.php \
---xml \
---images \
---resume \
---path=/path/to/incomplete-dump
+--api http://wiki.domain.org/w/api.php --xml --images --resume --path /path/to/incomplete-dump
```
In the above example, `--path` is only necessary if the download path is not the default.
@@ -285,22 +273,23 @@ For the positional parameter `listfile`, `uploader` expects a path to a file tha
`uploader` will search a configurable directory for files with the names generated by `launcher` and upload any that it finds to an Internet Archive item. The item will be created if it does not already exist.
-Named arguments:
-* `-pd` / `--prune_directories`: After uploading, remove the raw directory generated by `launcher`
-* `-pw` / `--prune_wikidump`: After uploading, remove the `wikidump.7z` file generated by `launcher`
-* `-c` / `--collection`: Assign the Internet Archive items to the specified collection
-* `-a` / `--admin`: Used only if you are an admin of the WikiTeam collection on the Internet Archive
-* `-wd` / `--wikidump_dir`: The directory to search for dumps. Defaults to `.`.
-* `-u` / `--update`: Update the metadata on an existing Internet Archive item
-* `-kf` / `--keysfile`: Path to a file containing Internet Archive API keys. Should contain two lines: the access key, then the secret key. Defaults to `./keys.txt`.
-* `-lf` / `--logfile`: Where to store a log of uploaded files (to reduce duplicate work). Defaults to `uploader-X.txt`, where `X` is the final part of the `listfile` path.
+Named arguments (short and long versions):
+* `-pd`, `--prune_directories`: After uploading, remove the raw directory generated by `launcher`
+* `-pw`, `--prune_wikidump`: After uploading, remove the `wikidump.7z` file generated by `launcher`
+* `-c`, `--collection`: Assign the Internet Archive items to the specified collection
+* `-a`, `--admin`: Used only if you are an admin of the WikiTeam collection on the Internet Archive
+* `-wd`, `--wikidump_dir`: The directory to search for dumps. Defaults to `.`.
+* `-u`, `--update`: Update the metadata on an existing Internet Archive item
+* `-kf`, `--keysfile`: Path to a file containing Internet Archive API keys. Should contain two lines: the access key, then the secret key. Defaults to `./keys.txt`.
+* `-lf`, `--logfile`: Where to store a log of uploaded files (to reduce duplicate work). Defaults to `uploader-X.txt`, where `X` is the final part of the `listfile` path.
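As a sketch of the `--keysfile` format described above, the file holds exactly two lines, the access key followed by the secret key (these values are placeholders, not real credentials):

```
YOUR_IA_ACCESS_KEY
YOUR_IA_SECRET_KEY
```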
## Checking dump integrity
If you want to check the XML dump integrity, type this into your command line to count title, page and revision XML tags:
```bash
-grep -E '<title(.*?)>' *.xml -c;grep -E '<page(.*?)>' *.xml -c;grep "</page>" *.xml -c;grep -E '<revision(.*?)>' *.xml -c;grep "</revision>" *.xml -c
+grep -E '<title(.*?)>' *.xml -c;grep -E '<page(.*?)>' *.xml -c;grep \
+"</page>" *.xml -c;grep -E '<revision(.*?)>' *.xml -c;grep "</revision>" *.xml -c
```
You should see something similar to this (not these exact numbers); the first three numbers should match each other, and the last two should match each other:
@@ -315,10 +304,10 @@ You should see something similar to this (not the actual numbers) - the first th
If the first three numbers differ from each other, or the last two differ from each other, your XML dump is corrupt (it contains one or more unfinished `</page>` or `</revision>` tags). This is not common in small wikis, but large or very large wikis may fail here due to XML pages being truncated while exporting and merging. The solution is to delete the XML dump and re-download it, which is tedious and can fail again.
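To make the counting concrete, here is a self-contained sanity check on a tiny synthetic dump (the file name `sample-history.xml` and its contents are made up for illustration; the grep patterns are the same ones used above):

```shell
# Build a minimal fake dump: one page with two revisions.
cat > sample-history.xml <<'EOF'
<mediawiki>
  <page>
    <title>Main Page</title>
    <revision><id>1</id></revision>
    <revision><id>2</id></revision>
  </page>
</mediawiki>
EOF

# Count opening and closing tags. For an intact dump the <title>,
# <page>, and </page> counts agree, as do <revision> and </revision>.
titles=$(grep -E -c '<title(.*?)>' sample-history.xml)
pages_open=$(grep -E -c '<page(.*?)>' sample-history.xml)
pages_close=$(grep -c '</page>' sample-history.xml)
rev_open=$(grep -E -c '<revision(.*?)>' sample-history.xml)
rev_close=$(grep -c '</revision>' sample-history.xml)
echo "$titles $pages_open $pages_close $rev_open $rev_close"
```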
-## WikiTeam Team
+## Contributors
**WikiTeam** is the [Archive Team](http://www.archiveteam.org) [[GitHub](https://github.com/ArchiveTeam)] subcommittee on wikis.
It was founded and originally developed by [Emilio J. Rodríguez-Posada](https://github.com/emijrp), a Wikipedia veteran editor and amateur archivist. Thanks to people who have helped, especially to: [Federico Leva](https://github.com/nemobis), [Alex Buie](https://github.com/ab2525), [Scott Boyd](http://www.sdboyd56.com), [Hydriz](https://github.com/Hydriz), Platonides, Ian McEwen, [Mike Dupont](https://github.com/h4ck3rm1k3), [balr0g](https://github.com/balr0g) and [PiRSquared17](https://github.com/PiRSquared17).
-The Python 3 initiative is currently being led by [Elsie Hupp](https://github.com/elsiehupp), with contributions from [Victor Gambier](https://github.com/vgambier), [Thomas Karcher](https://github.com/t-karcher), and [Janet Cobb](https://github.com/randomnetcat).
+**MediaWiki Scraper**
+The Python 3 initiative is currently being led by [Elsie Hupp](https://github.com/elsiehupp), with contributions from [Victor Gambier](https://github.com/vgambier), [Thomas Karcher](https://github.com/t-karcher), [Janet Cobb](https://github.com/randomnetcat), [yzqzss](https://github.com/yzqzss), [NyaMisty](https://github.com/NyaMisty) and [Rob Kam](https://github.com/robkam).
