Description
-----------
checkalive.pl is a Perl script that goes through a list of URLs and determines whether they
are online and whether they are MediaWiki wikis. It should work with URLs ending in
"/index.php/Main_Page", "index.php", or "api.php", and even with pages such as
"/wiki/Pagina_principale". If the URL is not an "api.php" URL, the script looks for api.php,
checks it, and outputs it if it is a valid api.php. If api.php is not found, the script
outputs the URL with "index.php" if that is available.

As of 01/23/2014, I have started using version numbers.
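
The api.php / index.php lookup described above could be sketched roughly as follows. This is
not the code from checkalive.pl: the helper name candidate_urls, the "/w" fallback directory,
and the example.org address are all assumptions made for illustration. The sketch only builds
the list of URLs a checker might try; it does not make any requests.

```perl
use strict;
use warnings;

# Hypothetical helper (not taken from checkalive.pl): given any wiki URL,
# return the api.php and index.php locations that a checker could try.
sub candidate_urls {
    my ($url) = @_;

    # A URL that already points at api.php needs no guessing.
    return ($url) if $url =~ m{/api\.php$};

    # Split into "scheme://host" and the path that follows it.
    my ($base, $path) = $url =~ m{^(https?://[^/]+)(/.*)?$}
        or return;
    $path = '' unless defined $path;

    # Directory that holds index.php, when index.php appears in the path.
    my $dir = $path =~ m{^(.*?)/index\.php} ? $1 : '';

    # Assumed search order: that directory first, then the common "/w" path.
    my @dirs = ($dir);
    push @dirs, '/w' unless $dir eq '/w';

    my @candidates;
    for my $script ('api.php', 'index.php') {
        push @candidates, map { "$base$_/$script" } @dirs;
    }
    return @candidates;
}

print "$_\n" for candidate_urls('http://example.org/wiki/Pagina_principale');
```

All api.php candidates are listed before the index.php ones, which mirrors the order of
preference described above.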

Required programs and modules
-----------------------------
checkalive.pl was developed on Linux and requires Perl 5.x. You will also need to have the
following Perl modules installed:
    LWP::Simple
    LWP::UserAgent
    Crypt::SSLeay
    Mojo::URL

The first two are part of LWP, the World-Wide Web library for Perl (also known as
libwww-perl-6.x), available from CPAN (http://www.cpan.org) or through your Linux
distro's package manager.
Crypt::SSLeay (OpenSSL support for LWP) is also available from CPAN. This module
is needed to properly handle any URLs beginning with "https".
Mojo::URL is available from CPAN as well. It is needed to extract the domain name from a URL.
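
Before running the script, you may want to confirm that the modules listed above can be
loaded. The following is a minimal sketch, not part of checkalive.pl; the helper name
missing_modules is made up for this example.

```perl
use strict;
use warnings;

# Return the names of the modules in @_ that cannot be loaded.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        # Turn "LWP::UserAgent" into "LWP/UserAgent.pm" so require can load it.
        (my $file = "$module.pm") =~ s{::}{/}g;
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

my @missing = missing_modules(qw(LWP::Simple LWP::UserAgent Crypt::SSLeay Mojo::URL));
if (@missing) {
    print "Missing modules: @missing\n";
}
else {
    print "All required modules are installed.\n";
}
```

Any module it reports as missing can then be installed from CPAN or through your package
manager.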

Configuration
-------------
There are several variables you can change, or you can just use them as-is:
-- "$slp" is the number of seconds to sleep between requests (currently set to 2 seconds).
-- "$urllist" is the name of the file that contains the list of URLs to check
   (currently set to 'URL-list.txt'). If you don't want to change this variable, make
   sure your list is named 'URL-list.txt'.
-- "$alivelist" is the file that will contain the list of URLs that are both online AND
   powered by MediaWiki.
-- "$deadlist" is the file that will contain the list of URLs that don't meet the above
   criteria. URLs that are online but NOT powered by MediaWiki are also written to this
   file, and are noted as such.
If you change any other variable, you do so at your own risk.
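
As an illustration of how these four variables fit together, here is a small sketch. It is
not an excerpt from checkalive.pl: the values of $slp and $urllist are the defaults stated
above, but the file names given to $alivelist and $deadlist are placeholders, the
read_url_list helper is hypothetical, and the one-URL-per-line input format is an assumption.

```perl
use strict;
use warnings;

my $slp       = 2;               # seconds to sleep between requests (documented default)
my $urllist   = 'URL-list.txt';  # input file (documented default)
my $alivelist = 'alive.txt';     # placeholder name; the real default is not stated here
my $deadlist  = 'dead.txt';      # placeholder name; the real default is not stated here

# Assumed input format: one URL per line; blank lines are skipped.
sub read_url_list {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!";
    my @urls;
    while (my $line = <$in>) {
        $line =~ s/^\s+|\s+$//g;    # trim whitespace, including the newline
        push @urls, $line if length $line;
    }
    close $in;
    return @urls;
}

print "Results would go to $alivelist and $deadlist\n";
if (-e $urllist) {
    for my $url (read_url_list($urllist)) {
        print "Would check: $url\n";   # a checker would issue its HTTP request here
        sleep $slp;                    # pause between requests
    }
}
```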

Starting the script
-------------------
To use the default configuration noted above, simply type "perl checkalive.pl" (without the
quotes) at a command prompt. You must be in the same directory (or folder) as the script
and the URL list that you want to check.

Issues
------
The script does NOT have a "resume" feature at this time. If you are running through a
list of thousands of URLs and the script crashes, or you kill it, your lists of alive and
dead URLs will NOT BE SAVED TO DISK. I suggest breaking up your list into smaller lists
of a few hundred URLs each until I can implement a resume feature.
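
One way to break up a large list is sketched below. It is a standalone helper, not part of
checkalive.pl; the function name split_url_list, the ".partN" file naming, and the chunk
size of 300 are assumptions chosen for the example.

```perl
use strict;
use warnings;

# Split $path into files named "$path.part1", "$path.part2", ... holding at
# most $size lines each. Returns the names of the files written.
sub split_url_list {
    my ($path, $size) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!";

    my (@parts, $out);
    my $count = 0;
    while (my $line = <$in>) {
        # Start a new part file every $size lines.
        if ($count % $size == 0) {
            close $out if $out;
            push @parts, sprintf('%s.part%d', $path, scalar(@parts) + 1);
            open $out, '>', $parts[-1] or die "Cannot write $parts[-1]: $!";
        }
        print {$out} $line;
        $count++;
    }
    close $out if $out;
    close $in;
    return @parts;
}

my $urllist = 'URL-list.txt';
if (-e $urllist) {
    print "Wrote $_\n" for split_url_list($urllist, 300);
}
```

Each part can then be checked in its own run, either by renaming it to 'URL-list.txt' or by
pointing "$urllist" at it.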