Configuring SiteSurfer
SiteSurfer Builder has many options for configuring how both the Builder
and the applet behave. Many of these options are presented to the user in
the normal course of indexing a site. These include
- where the site is located,
- how to retrieve the pages,
- what important applet features to enable,
- what pages to index, and
- where to store the resultant index.
The above options are presented in the main steps of the SiteSurfer Builder's wizard
interface because they are the options most likely to need special
attention. At the same time, SiteSurfer lets you easily bypass most decisions
by setting reasonable default values. For instance, SiteSurfer Builder defaults
to indexing a web site directory, which is the most typical option for most users.
Also, you need not choose which pages to index; you can instead just index all
the pages SiteSurfer encounters.
Once you index a particular web site, SiteSurfer will
remember all the choices you made for that site in its profile; the last 5
sites are remembered. If you selectively indexed pages on a site, the program
will even recall which pages were and were not selected, and mark those
appropriately the next time you index the same site.
In addition to the normal set of options, SiteSurfer Builder has a vast
array of more complex customizations available. On the 4th panel, where you
set important applet features, like the site index and site map, you may open
the advanced Options dialog by pressing the button. This dialog exposes
settings for the following features:
- Indexing Fields let you enable searching for
various individual types of data in the applet. For instance, you could look
for a file of a particular size--or size range--or one whose title best matches
the words you type.
- Stop Words let you exclude certain "junk" words
from the index. These words offer little value while searching, and their inclusion
would only bloat the index size.
- Name patterns instruct the program to automatically
include or exclude pages whose names--the fully-qualified addresses--match certain
patterns. For instance, a name that matches *.GIF is probably an image file,
so it is excluded; but a name that matches *.TXT is probably a text file that is
easily processed, so it is included.
- Word sizes restrict the lengths of indexed words
to a range between a minimum and a maximum. This provides value by excluding certain "garbage"
strings that probably are not even words.
- Character settings let you specify which characters
are to be interpreted as parts of words. For specialized topics, some punctuation
characters or numbers may need to be included.
- Protocols for retrieving data may be turned on
or off. For instance, a web site accessed via HTTP may have FTP links to some
files on the same server, and you may or may not want to include those files
in the index.
- The Robots page lets you customize SiteSurfer's
adherence to the standard robots.txt and ROBOTS meta tag methods
of controlling which pages are visible to web crawlers and robots. Furthermore,
you may restrict how deep into the site SiteSurfer should crawl.
- Sample Pages let you turn on or off the generation
of sample HTML files that showcase the SiteSurfer applet when the index and
applet are generated for a site. You may also enable use of the Java Plugin,
so that the SiteSurfer applet is loaded with Sun's industry-standard JVM.
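To make a few of the settings above more concrete, the following Python sketch shows roughly how name patterns, word sizes, stop words, and character settings could combine to decide what ends up in an index. It is an illustration only, not SiteSurfer's actual implementation: the function names, the pattern lists, the stop-word list, the length limits, and the extra word characters are all hypothetical values chosen for the example. It also assumes that exclusion patterns take precedence over inclusion patterns, which the text above does not specify.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical settings; none of these names or values come from SiteSurfer.
EXCLUDE_PATTERNS = ["*.gif", "*.jpg"]       # probably image files
INCLUDE_PATTERNS = ["*.txt", "*.html"]      # easily processed text
STOP_WORDS = {"the", "a", "an", "and", "of"}
MIN_WORD_LEN, MAX_WORD_LEN = 2, 20
EXTRA_WORD_CHARS = "+#"                     # lets "c++" or "c#" count as words


def should_index(address: str) -> bool:
    """Apply name patterns to a fully-qualified address, ignoring case."""
    name = address.lower()
    # Assumption: an exclude match wins over any include match.
    if any(fnmatchcase(name, pattern) for pattern in EXCLUDE_PATTERNS):
        return False
    return any(fnmatchcase(name, pattern) for pattern in INCLUDE_PATTERNS)


def index_words(text: str) -> list[str]:
    """Split text into words, then apply the size and stop-word filters."""
    # Character settings: letters, digits, and the configured extra characters.
    word_re = re.compile(r"[A-Za-z0-9" + re.escape(EXTRA_WORD_CHARS) + r"]+")
    words = (word.lower() for word in word_re.findall(text))
    return [
        word for word in words
        if MIN_WORD_LEN <= len(word) <= MAX_WORD_LEN and word not in STOP_WORDS
    ]
```

With these example settings, should_index("http://example.com/logo.GIF") is rejected by the *.gif pattern, while an address ending in notes.TXT is accepted. An address that matches neither list, such as one ending in .zip, is skipped here; a real tool may well treat unmatched names differently.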