SearchEngine: Frequently Asked Questions
Topics
Some common problems can occur when using the SearchEngine. This chapter
lists these problems and their solutions. Questions are divided into two
categories: the SearchEngine and the Search applet.
The FAQ index
- Files are not being excluded
- The SearchEngine is reading files excluded with the -xu flag.
- SearchEngine: tags or tag attributes are being stored
in the database
- The SearchEngine is storing words which look suspiciously like tags or tag
attributes.
- SearchEngine: keywords in titles and headers are missing
- The SearchEngine is not storing words which appear in HTML
tags like <TITLE>, <H1..H6>, etc.
- SearchEngine: runs fine for a while, then slows down
- The SearchEngine parses the first few hundred files, then slows down and
starts thrashing (repeatedly accessing) the hard disk.
- SearchEngine: stops with an OutOfMemoryException
- The SearchEngine parses the first few hundred files, then displays a long
list of error messages, starting with OutOfMemoryException.
- SearchEngine: stops with a 'Too many files for the search
applet database' message
- The SearchEngine parses many hundreds of files, then displays a 'Too many
files for the search applet database' message.
- Applet: Search button remains gray, or an error message
appears
- The applet starts up, but after a few seconds, the search button appears
grayed out, or an error message is displayed.
- Applet: Clicking on a title causes the browser to issue
a 'document not found' error
- When the user double-clicks on a found document title, the browser issues
a 'document not found' error message instead of opening the document.
Questions
about the SearchEngine
- Files are not being excluded
- The SearchEngine is reading files excluded with the -xu flag.
- Take care when using the wildcard character '*'.
- The wildcard character '*' can appear at the start of the
URL, at the end of the URL, or both. Anywhere
else it is treated as an ordinary character, and
no other combinations of the wildcard character '*' are valid.
A filter definition of */extawt/*remove.* results in
a (probably useless) filter that ignores all URLs containing the literal text
/extawt/*remove., and not the probable intention of
ignoring all URLs that contain both /extawt/ and
remove.
- The SearchEngine uses case-sensitive URLs when filtering.
- Some operating systems (Windows) treat file names as case insensitive,
but the SearchEngine does not. If, for example, the filter
-xu *.zip
is used, then all files ending in .zip are excluded,
but files ending in .ZIP are not. Use both lower
case and upper case to filter file extensions:
-xu *.zip
-xu *.ZIP
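The two filter rules above can be illustrated with a minimal Java sketch. This is not the SearchEngine's own code: the class name UrlFilterSketch and the method matches are hypothetical, and the sketch assumes that a filter with no wildcard must equal the whole URL, which the text does not state.

```java
public class UrlFilterSketch {

    // Hypothetical helper: returns true when the URL would be excluded by the filter.
    static boolean matches(String filter, String url) {
        boolean leading = filter.startsWith("*");
        // A lone "*" is counted as a leading wildcard only.
        boolean trailing = filter.length() > 1 && filter.endsWith("*");
        // Strip only the first and last '*'; any '*' left in the middle stays literal.
        String literal = filter.substring(leading ? 1 : 0,
                                          filter.length() - (trailing ? 1 : 0));
        if (leading && trailing) return url.contains(literal);
        if (leading) return url.endsWith(literal);
        if (trailing) return url.startsWith(literal);
        return url.equals(literal); // assumption: no wildcard means an exact match
    }

    public static void main(String[] args) {
        // Comparison is case sensitive, so .ZIP needs its own filter.
        System.out.println(matches("*.zip", "docs/archive.zip")); // true
        System.out.println(matches("*.zip", "docs/ARCHIVE.ZIP")); // false
        // The middle '*' is literal, so this filter misses the intended URL.
        System.out.println(matches("*/extawt/*remove.*", "a/extawt/x/remove.htm")); // false
    }
}
```

To exclude URLs containing either /extawt/ or remove., the sketch suggests two separate filters, */extawt/* and *remove.*, each given with its own -xu flag.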
- Tags or tag attributes are being
stored in the database
- The SearchEngine is storing words which look suspiciously like tags or tag
attributes.
- The HTML documents may indeed contain the tag keywords
as text, for example if the document is about HTML.
- Check the documents for the offending keywords, and confirm whether they
are inside or outside HTML markup. Watch out for incorrectly
formed comment syntax.
- The HTML document may have syntax errors, which caused
the SearchEngine to store the words in the body, or ignore them completely.
- Check the documents for the offending keywords, and ensure that they
are inside the correct HTML markup. Watch out for incorrectly
formed comment syntax.
- Keywords in titles and headers
are missing
- The SearchEngine is not storing words which appear in HTML
tags like <TITLE>, <H1..H6>, etc.
- The HTML document may have syntax errors, which caused
the SearchEngine to store the words in the body, or ignore them completely.
- Check the documents for the offending keywords, and ensure that they
are inside the correct HTML markup. Watch out for incorrectly
formed comment syntax.
- Runs fine for a while, then slows
down
- The SearchEngine parses the first few hundred files, then slows down and
starts thrashing (repeatedly accessing) the hard disk.
- The SearchEngine is running out of virtual memory.
- The SearchEngine requires virtual memory of about 1.5 to 2.0 times
the size of the documents being parsed. If, say, you have 9 MB of documents,
then you will require about 13.5 to 18 MB of virtual memory.
Start the Java interpreter with as much virtual memory as needed using
the -mx switch (the default is 16 MB):
java -mx24m ruptools.SearchEngine ...
- Not enough virtual memory.
- Possible solutions are:
- Split the files up into sub-groups, and create a database for each.
- Remove word groups with -nb, -nl, -nh (in that order).
- Do both: a restricted global search, with a complete search for each sub-group.
- Increase the word exclusion list (english.exclude.html is very generic).
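The 1.5 to 2.0 times rule can be turned into a quick estimate before a run. The following Java sketch is only an illustration: the class MemoryEstimateSketch and its method are hypothetical, and rounding up to whole megabytes is an assumption made so the result can be passed to the -mx switch.

```java
public class MemoryEstimateSketch {

    // Returns {low, high} estimates in whole megabytes, rounded up.
    static long[] estimateMegabytes(double documentMegabytes) {
        long low = (long) Math.ceil(documentMegabytes * 1.5);
        long high = (long) Math.ceil(documentMegabytes * 2.0);
        return new long[] { low, high };
    }

    public static void main(String[] args) {
        long[] range = estimateMegabytes(9.0);
        System.out.println("Estimated need: " + range[0] + " to " + range[1] + " MB");
        // Use at least the upper estimate; the example above leaves headroom with -mx24m.
        System.out.println("java -mx" + range[1] + "m ruptools.SearchEngine ...");
    }
}
```

If the upper estimate is larger than the virtual memory the machine can provide, the solutions listed above apply.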
- Stops with an OutOfMemoryException
- The SearchEngine parses the first few hundred files, then displays a long
list of error messages, starting with OutOfMemoryException.
- The SearchEngine ran out of virtual memory.
- The SearchEngine requires virtual memory of about 1.5 to 2.0 times
the size of the documents being parsed. If, say, you have 9 MB of documents,
then you will require about 13.5 to 18 MB of virtual memory.
Start the Java interpreter with as much virtual memory as needed using
the -mx switch (the default is 16 MB):
java -mx24m ruptools.SearchEngine
- Not enough virtual memory.
- Possible solutions are:
- Split the files up into sub-groups, and create a database for each.
- Remove word groups with -nb, -nl, -nh (in that order).
- Do both: a restricted global search, with a complete search for each sub-group.
- Increase the word exclusion list (english.exclude.html is very generic).
- Stops with a 'Too many files for the
search applet database' message
- The SearchEngine parses many hundreds of files, then displays a 'Too many
files for the search applet database' message.
- The SearchEngine exceeded the maximum number of files the applet database can hold.
- The applet database can hold information on a maximum of 4096
HTML documents.
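One way to stay under this limit, borrowing the sub-group approach suggested for the memory problems, is to split the document list into batches of at most 4096 and build one database per batch. The Java sketch below only shows the splitting step. The class BatchSketch and its method are hypothetical, and whether separate databases suit a given site is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {

    // Limit stated above for the applet database.
    static final int MAX_DOCUMENTS = 4096;

    // Splits the list into consecutive batches of at most 'max' entries.
    static <T> List<List<T>> split(List<T> documents, int max) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < documents.size(); i += max) {
            int end = Math.min(i + max, documents.size());
            batches.add(new ArrayList<>(documents.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> documents = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            documents.add("doc" + i + ".htm"); // placeholder file names
        }
        for (List<String> batch : split(documents, MAX_DOCUMENTS)) {
            System.out.println("Batch of " + batch.size() + " documents");
        }
    }
}
```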
Questions
about the Search applet
- Search button remains gray, or
an error message appears
- The applet starts up, but after a few seconds, the search button appears
grayed out, or an error message is displayed.
- The cause of this problem is that the applet failed to find or load the
database.
- Check that the file path is correct.
- The applet looks in the path made up of the codebase
plus the database parameter value. Suppose the applet definition
is:
<applet codebase=".." archive="Search.zip"
code="ruptools.Search.class" width=100 height=20>
<param name=database value="docsearch">
and that the applet file is in the /search directory.
The applet will then look for the file in /search/../classes/docsearch.ws
or, when reduced, /classes/docsearch.ws.
If this is not the correct location of the database file, then either
copy the database to that location, or change the database
parameter value.
Remember that the database file must appear in the codebase
path of the applet, otherwise some browsers may refuse access to the
file, causing the applet to fail.
- Check the file path for spelling.
- On some operating systems (Windows) file names are case insensitive,
whilst on others (Unix) they are case sensitive. Ensure that the codebase
path and database parameter path have the same case as the
directories and filename. The database file extension is .ws,
in lower case.
- Check that the file path is within the codebase path.
- As when checking the file path, ensure that the reduced file path is
the same as, or a child directory of, the codebase, otherwise
some browsers may refuse access to the file, causing the applet to fail.
- Check the database file.
- The database file may have become corrupt, or have been replaced. Recompile
the database, and copy the file, then try running the applet again in
the browser or appletviewer.
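The path checks above can be rehearsed outside the browser. The Java sketch below resolves a database parameter value against a codebase URL, appends the .ws extension, and tests whether the reduced result still lies under the codebase. The class CodebaseCheckSketch, its methods, and the www.example.com URLs are hypothetical, and the sketch assumes the codebase URL ends with a slash.

```java
import java.net.URI;

public class CodebaseCheckSketch {

    // Codebase plus database parameter value plus the lower-case .ws extension, reduced.
    static URI databaseUri(URI codebase, String databaseParam) {
        return codebase.resolve(databaseParam + ".ws").normalize();
    }

    // True when the reduced database URL is in the codebase or in a child directory of it.
    static boolean isInsideCodebase(URI codebase, String databaseParam) {
        String base = codebase.normalize().toString();
        return databaseUri(codebase, databaseParam).toString().startsWith(base);
    }

    public static void main(String[] args) {
        URI codebase = URI.create("http://www.example.com/search/"); // placeholder URL
        System.out.println(databaseUri(codebase, "docsearch"));
        System.out.println(isInsideCodebase(codebase, "docsearch"));      // true
        System.out.println(isInsideCodebase(codebase, "data/docsearch")); // true
        System.out.println(isInsideCodebase(codebase, "../docsearch"));   // false
    }
}
```

The string comparison is case sensitive, which matches the stricter (Unix) behaviour described above.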
- Clicking on a title causes the
browser to issue a 'document not found' error
- When the user double-clicks on a found document title, the browser issues
a 'document not found' error message instead of opening the document.
- The path parameter is probably wrong or missing.
- The path parameter is used to correct the database document
URL with respect to the search applet HTML file
URL.
If, for example, when compiling the database the root file is specified
as:
-f /rational/application/search/doc/index.htm
and the root URL as:
-u http://www.ruptools.com/rup/rational/application/search/doc/index.htm
then the root file URL will be stored in the database
as:
rational/application/search/doc/index.htm
which corresponds to the identical path in both options:
-f rational/application/search/doc/index.htm
-u http://www.ruptools.com/rup/rational/application/search/doc/index.htm
If we now suppose the search applet HTML file to be at:
/rational/application/search/doc/docsearch.htm
for the local file, or
http://www.ruptools.com/rup/rational/application/search/doc/docsearch.htm
for the Internet URL, then we need to correct the document
URL references in the applet database file to move back
four directories:
<param name=path value="../../../../">
Now, when the user clicks on a link, the browser will construct the
URL as follows:
/rational/application/search/doc/
../../../../rational/application/search/doc/index.htm
for the local file, or
http://www.ruptools.com/rup/rational/application/search/doc/
../../../../rational/application/search/doc/index.htm
for the Internet URL, which reduces to:
/rational/application/search/doc/index.htm
for the local file, or
http://www.ruptools.com/rup/rational/application/search/doc/index.htm
for the Internet URL.
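The reduction above can be checked with the standard java.net.URI class, which resolves and normalizes relative references in the same way. This is a minimal sketch of the arithmetic, not the applet's own code. The class PathParamSketch and its method are hypothetical, and joining the path parameter directly in front of the stored URL is an assumption based on the example above.

```java
import java.net.URI;

public class PathParamSketch {

    // Resolves (path parameter + stored document URL) against the applet page URL.
    static URI resolveDocument(String appletPageUrl, String pathParam, String storedUrl) {
        return URI.create(appletPageUrl).resolve(pathParam + storedUrl);
    }

    public static void main(String[] args) {
        String page = "http://www.ruptools.com/rup/rational/application/search/doc/docsearch.htm";
        String stored = "rational/application/search/doc/index.htm";

        // Four "../" steps cancel the four directories of the stored path.
        System.out.println(resolveDocument(page, "../../../../", stored));

        // One step too few leaves a doubled "rational" directory, which the browser cannot find.
        System.out.println(resolveDocument(page, "../../../", stored));
    }
}
```

The number of "../" steps needed equals the number of directories between the root of the stored URL and the directory holding the search applet HTML file.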
Copyright
© 1987 - 2001 Rational Software Corporation