Description
-----------
checkalive.pl is a Perl script that goes through a list of URLs and determines whether they
are online and whether they are MediaWiki wikis. It should work with URLs such as
"/index.php/Main_Page", "index.php", and "api.php", and even with pages such as
"/wiki/Pagina_principale". If the URL is not an "api.php" URL, the script looks for api.php,
checks it, and outputs it if it proves to be a valid api.php. If api.php is not found, the
script outputs the URL with "index.php" instead, if that is available.
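To illustrate the lookup described above, here is a minimal shell sketch that derives
api.php locations worth probing from a given URL. The function name candidate_api_urls,
the example.org URLs, and the search order (the directory that holds index.php, then "/w/",
then the site root) are assumptions made for illustration only; checkalive.pl's own logic
may well differ.

```shell
#!/bin/sh
# Hypothetical helper: print api.php URLs worth probing for a given wiki URL.
# This is a guess at a reasonable search order, not checkalive.pl's actual code.
candidate_api_urls() {
    url=$1

    # A URL that already points at api.php is its own only candidate.
    case $url in
        */api.php) printf '%s\n' "$url"; return 0 ;;
    esac

    # Scheme plus host, e.g. "http://example.org".
    root=$(printf '%s\n' "$url" | sed -E 's#^(https?://[^/]+).*#\1#')

    # Everything before "/index.php" or "/wiki/", if either is present.
    base=$(printf '%s\n' "$url" | sed -E 's#/(index\.php|wiki/).*$##')
    base=${base%/}

    # Print the candidates in order, dropping duplicates.
    {
        printf '%s\n' "$base/api.php"
        printf '%s\n' "$root/w/api.php"
        printf '%s\n' "$root/api.php"
    } | awk '!seen[$0]++'
}

candidate_api_urls 'http://example.org/wiki/Pagina_principale'
```

Each candidate would still have to be fetched and checked before it could be reported as
a valid api.php; this sketch only builds the list of places to look.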
As of 01/23/2014, I have started using version numbers.

Required programs and modules
-----------------------------
checkalive.pl was developed on Linux and requires Perl 5.x to be installed on your
system. You will also need to have the following Perl modules installed:
LWP::Simple
LWP::UserAgent
Crypt::SSLeay
Mojo::URL
The first two are contained in LWP - The World-Wide Web library for Perl
(aka libwww-perl-6.x), available at CPAN (http://www.cpan.org) or through your Linux
distro's package manager.
Crypt::SSLeay (OpenSSL support for LWP) is also available at CPAN. This module
is needed to properly handle any URLs beginning with "https".
Mojo::URL is available at CPAN as well. It's needed to extract the domain name from a URL.
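Before running the script, it can help to confirm that Perl can load each of the modules
listed above. The following is a small shell sketch for that; the function name
check_modules and the "ok"/"MISSING" labels are made up for this example and are not part
of checkalive.pl.

```shell
#!/bin/sh
# Report whether each Perl module named in this README can be loaded.
# "perl -MModule -e 1" exits non-zero when the module (or perl itself) is missing.
check_modules() {
    for mod in LWP::Simple LWP::UserAgent Crypt::SSLeay Mojo::URL; do
        if perl -M"$mod" -e 1 2>/dev/null; then
            echo "ok       $mod"
        else
            echo "MISSING  $mod"
        fi
    done
}

check_modules
```

Any module reported as MISSING can be installed from CPAN or through your distro's
package manager, as described above.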

Configuration
-------------
There are several variables you can change, or you can just use them as-is:
-- "$slp" is the number of seconds to sleep between requests (currently set to 2 seconds).
-- "$urllist" is the name of the file that contains the list of URLs to check
(currently set to 'URL-list.txt'). If you don't want to change this variable, make
sure your list is named 'URL-list.txt'.
-- "$alivelist" is the file that will contain the list of URLs that are both online AND
powered by MediaWiki.
-- "$deadlist" is the file that will contain the list of URLs that don't meet the above
criteria. URLs that are online and NOT powered by MediaWiki are also in this file,
and will be noted as such.
Change any other variable at your own risk.

Starting the script
-------------------
To use the default configuration noted above, open a command prompt and
type: "perl checkalive.pl" (without the quotes). You must be in the same directory (or
folder) as the script and the URL list that you want to check.

Issues
------
The script does NOT have a "resume" feature at this time. If you are running through a
list of thousands of URLs and the script crashes, or you kill it, your lists of alive and
dead URLs will NOT BE SAVED TO DISK. Until I can implement a resume feature, I suggest
breaking up your list into smaller lists of a few hundred URLs each.
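
One way to break up a long list is the standard "split" utility. The sketch below is only
a suggestion: the function name split_url_list, the default chunk size of 300 lines, and
the "URL-list.part-" prefix are assumptions chosen for this example.

```shell
#!/bin/sh
# Split a URL list into chunks of at most LINES lines each.
# Usage: split_url_list [FILE] [LINES] [PREFIX]
# Chunks are named PREFIXaa, PREFIXab, ... and are listed on stdout.
split_url_list() {
    list=${1:-URL-list.txt}
    lines=${2:-300}
    prefix=${3:-URL-list.part-}

    # Refuse to continue if the input list cannot be read.
    [ -r "$list" ] || { echo "cannot read $list" >&2; return 1; }

    split -l "$lines" "$list" "$prefix" || return 1
    ls "$prefix"*
}

# Example (not run here): split_url_list URL-list.txt 300 URL-list.part-
```

Each chunk then has to be checked in its own run, either by copying it to 'URL-list.txt'
or by pointing "$urllist" at it. This README does not say whether the alive and dead lists
are overwritten on each run, so it is safest to move them aside between runs.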