NAME

i-spy -- scrape FTP and web sites for content changes


SYNOPSIS

i-spy [-m e-mail] [-d directory]

i-spy [-d directory] [-w directory]

(See the OPTIONS section for alternate option syntax with long option names.)


DESCRIPTION

i-spy grabs and compares the contents of FTP directories and web pages. It then compiles a report and sends it via e-mail, saves it as a web page, or both. E-mail reports may be plain text or HTML.

i-spy logs its activity as it runs. You may specify the log directory; otherwise i-spy will try to find one automatically. For web page reports, i-spy will attempt to store the log where the report can reference it and the web server can serve it.

A site definition is a directory which contains at least the data.txt file. This file must have the target URL as its first line. i-spy currently handles FTP and HTTP URLs. An optional second line may name an alternate data source such as a precompiled site index or listing (see the CPAN example). For FTP sites only, the second line may instead hold a directive for verbose listings (see the NTP example). Verbose listings may be useful for sites where the operator updates directories and links without changing their names.
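
As a minimal sketch of the layout described above, the following creates a site definition with only the required first line of data.txt. The directory names and the URL are hypothetical placeholders; the syntax of the optional second line is not shown because it is not specified here.

```shell
# Hypothetical locations and URL, chosen only for illustration.
sites_dir="${TMPDIR:-/tmp}/i-spy-sites"
site_dir="$sites_dir/example-ftp"

# One directory per site definition.
mkdir -p "$site_dir"

# The first line of data.txt is the target URL (FTP or HTTP).
printf '%s\n' 'ftp://ftp.example.org/pub/' > "$site_dir/data.txt"

# Show the resulting definition.
head -n 1 "$site_dir/data.txt"
```

Passing the parent directory (here, the value of sites_dir) to the -d option should then let i-spy find the new site definition.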

A site definition may also contain a pre-filter (see the Apple example) and a post-filter (see the NTP example). i-spy invokes pre-filters before comparing snapshots, and post-filters after comparing snapshots.
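
The following is only a sketch of what a pre-filter might look like. It assumes, without confirmation from this document, that the snapshot arrives on standard input and the filtered result leaves on standard output; check the bundled Apple and NTP examples for the interface i-spy really uses. The function name pre_filter and the sample listing are hypothetical.

```shell
#!/bin/sh
# ASSUMPTION: snapshot text comes in on stdin and goes out on stdout.
# i-spy's actual filter interface may differ.
pre_filter() {
    # Drop blank lines, then sort and remove duplicate entries.
    grep -v '^[[:space:]]*$' | LC_ALL=C sort -u
}

# Demonstration on a small made-up listing.
printf 'b.tar.gz\n\na.tar.gz\nb.tar.gz\n' | pre_filter
```

A filter of this kind keeps cosmetic noise, such as blank lines or reordered entries, from showing up as changes in the report.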

i-spy generates all other files you may find within site directories.

OPTIONS

-d directory, --dir directory, --directory directory
A directory containing site definitions. If omitted, i-spy will try the present working directory.

-l directory, --log directory, --logDirectory directory
A directory for the log. If omitted or improper, i-spy will try to figure out an appropriate place.

-w directory, --web directory
A directory for generated HTML reports; this directory should ideally be accessible by a local or a remote web browser.

-m e-mail, --mail e-mail
A destination e-mail address for generated reports.

-h, --html
A flag indicating whether to send an HTML report (if set) or a plain text report (if not set). Reports saved in a specified web directory will be formatted as HTML regardless of this setting.

-n text, --name text
A text string that overrides the name learned from the OS. i-spy uses this name for log names and the like.


EXAMPLES

For periodic automated runs, try something like this:

i-spy --web /var/www/i-spy --directory /var/sites

You may set the above to run once a day via cron. Each run generates a report which you may later check from a client machine with a web browser. All web reports provide a link to a report from a previous run. Also, each report references its corresponding log file.

i-spy will expect to find the site directories within the /var/sites directory specified above via the --directory directive.

i-spy will save the report in the directory specified by the --web /var/www/i-spy directive and the log in the corresponding /var/www/i-spy/logs directory. Both should be accessible via a web server or a locally running web browser.
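
To schedule the --web example above, a crontab entry along these lines may serve. The 06:00 schedule is an arbitrary choice, and the sketch assumes i-spy is on the PATH that cron provides; otherwise substitute the full path to the script.

```shell
# Hypothetical schedule: every day at 06:00.
schedule='0 6 * * *'

# ASSUMPTION: i-spy is on cron's PATH; use its full path if it is not.
job='i-spy --web /var/www/i-spy --directory /var/sites'

cron_line="$schedule $job"

# Print the entry for review; add it with `crontab -e` once it looks right.
printf '%s\n' "$cron_line"
```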

For interactive reports delivered via e-mail, consider the following:

i-spy --mail user@domain.tld

i-spy will check the sites directory in your present working directory for site directories. It will generate a plain text report and send it to the specified user@domain.tld e-mail address.

i-spy will also leave a log in the logs subdirectory of the present working directory.

For HTML-formatted reports over e-mail, try:

i-spy --mail user@domain.tld --html


FILES

data.txt
A text file within individual site directories containing the primary URL on the first line and, optionally, either an alternate URL or an FTP listing type directive on the second line.

added.txt
A text file within individual site directories containing all items added since the preceding run.

removed.txt
A text file within individual site directories containing all items removed since the preceding run.

retained.txt
A text file within individual site directories containing all items which have remained unchanged since the preceding run.

snapshot.txt
A text file within individual site directories containing the current snapshot of the remote resource.

old-snapshot.txt
A text file within individual site directories containing the snapshot of the remote resource saved during a preceding run.

pre-filter
An executable to process the scraped snapshot before comparison to the old-snapshot.

post-filter
An executable to process the scraped snapshot after comparison to the old-snapshot.

index.html
A symbolic link to the latest report saved in the web directory.
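
The relationship among snapshot.txt, old-snapshot.txt, added.txt, removed.txt, and retained.txt can be illustrated with comm(1). This is a sketch under the assumption that snapshots are compared line by line as sets; it is not necessarily how i-spy performs the comparison. The file contents and the old.sorted and new.sorted helper files are made up for the demonstration.

```shell
# Work in a scratch directory; file names follow the list above.
work_dir=$(mktemp -d)
cd "$work_dir" || exit 1

# Made-up snapshots from a preceding run and the current run.
printf '%s\n' 'a.tar.gz' 'b.tar.gz' 'c.tar.gz' > old-snapshot.txt
printf '%s\n' 'b.tar.gz' 'c.tar.gz' 'd.tar.gz' > snapshot.txt

# comm(1) requires sorted input.
LC_ALL=C sort -u old-snapshot.txt > old.sorted
LC_ALL=C sort -u snapshot.txt > new.sorted

# Lines only in the current snapshot.
LC_ALL=C comm -13 old.sorted new.sorted > added.txt
# Lines only in the old snapshot.
LC_ALL=C comm -23 old.sorted new.sorted > removed.txt
# Lines present in both snapshots.
LC_ALL=C comm -12 old.sorted new.sorted > retained.txt

cat added.txt
```

With these sample snapshots, d.tar.gz lands in added.txt, a.tar.gz in removed.txt, and the two remaining entries in retained.txt.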


REQUIRES

Perl 5.8, Getopt::Long, Log::File, Date::Format


SEE ALSO

perl(1)


BUGS

Send bug reports, questions, and requests to i-spy@ayradyss.org.


AUTHOR

Igor S. Livshits <mailto:i-spy@ayradyss.org>


COPYRIGHT

Copyright (C) 2006 Igor S. Livshits

Use and distribute this tool under the terms of the Artistic License.