COMMAND |
ARGS |
DESCRIPTION |
GET |
URL |
Retrieve the contents of the provided URL using the GET method of encoding the query portion of the URL. This means that query keys and values are encoded in the URL itself that is sent to the server. |
POST |
URL |
Retrieve the contents of the provided URL using the POST method of encoding the query portion of the URL. This means that query keys and values are sent to the server separately from the URL, in the body of the request. |
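The difference between the two encodings can be illustrated outside HTMLGet. The following is a minimal sketch using Python's standard library, not HTMLGet's own code; the endpoint and the query parameters are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and query parameters, chosen only for illustration.
base_url = "http://example.com/search"
params = {"q": "screen scraping", "page": "2"}

# Both methods encode the query keys and values the same way.
encoded = urlencode(params)

# GET: the encoded query becomes part of the URL itself.
get_request = Request(base_url + "?" + encoded, method="GET")

# POST: the URL stays bare and the encoded query travels in the request body.
post_request = Request(base_url, data=encoded.encode("ascii"), method="POST")

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

The request objects are only constructed and inspected here, which is enough to show where the query ends up in each case.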
FILE |
Filename,arg[,...] |
This command executes a text file that contains other HTMLGet commands. The arguments after the first supply the values for the variables referenced in the file.
For instance, the command file foo,arg1,arg2 runs the commands in the file foo, substituting the value arg1 for each instance of $ARG:1$ and the value arg2 for each instance of $ARG:2$. |
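The $ARG:n$ substitution performed by the FILE command can be sketched as follows. This is an illustrative Python approximation, not HTMLGet's implementation: the function name substitute_args and the sample script text are hypothetical, and leaving an unmatched placeholder untouched is an assumption, since the behaviour for missing arguments is not documented here.

```python
import re

def substitute_args(script_text, args):
    """Replace each $ARG:n$ placeholder with the n-th argument (1-based)."""
    def replace(match):
        index = int(match.group(1))
        # Assumption: a placeholder with no matching argument is left as-is.
        if 1 <= index <= len(args):
            return args[index - 1]
        return match.group(0)
    return re.sub(r"\$ARG:(\d+)\$", replace, script_text)

# Hypothetical contents of the file "foo"; the command syntax is illustrative.
script = "GET http://example.com/$ARG:1$?user=$ARG:2$\ndumppage"
print(substitute_args(script, ["arg1", "arg2"]))
```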
Runtess |
UseCurrent, IsNew, Jscfile[, Args,...] |
Run TeSS. The arguments to this command indicate where the page source comes from and how the results are stored.
The arguments are:
UseCurrent - either the string true or the string false. If true, the current HTMLGet page is used as input to the screenscraper. If false, the host and URL in the JSC file are used to retrieve the source page.
IsNew - either true or false. If false, the results extracted from this run of TeSS are added to the current set of results.
Jscfile - the path to a TeSS .jsc file which is used to extract results from the page.
Args, ... - TeSS wrappers may have arguments associated with them. If the value of UseCurrent is false, the values provided here are passed to TeSS as the wrapper's set of input values.
|
Crawlandscrape |
Jscfile, query_contains, query_does_not_contain, Linktext |
This command is meant to scrape results from a series of identically formatted result pages. The JSC file is used to run TeSS on each of the result pages and extract values, which are added to the current set of scraped results.
The Jscfile is a required argument. Any of the other arguments can be omitted by using the string "null" as the value for that argument.
The crawlandscrape command decides that a link in a page points at another result page in one of the following ways:
1) the displayed link text matches the linktext regular expression, OR
2) the URL in the link matches that of the original page AND the query parameters satisfy both the query_contains and query_does_not_contain regular expressions, if they have been specified. |
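The link-selection rule can be sketched in Python as below. This is an interpretation under stated assumptions, not HTMLGet's code: the function name is_result_link is hypothetical, None stands in for the "null" argument value, "matches" is taken to mean a regular-expression search, the query is taken to satisfy query_does_not_contain when that pattern is not found in it, and two URLs are treated as the same page when their scheme, host, and path agree.

```python
import re
from urllib.parse import urlsplit

def is_result_link(link_text, link_url, page_url,
                   query_contains=None, query_does_not_contain=None,
                   linktext=None):
    """Return True if a link looks like it points at another result page."""
    # Rule 1: the displayed link text matches the linktext pattern.
    if linktext is not None and re.search(linktext, link_text):
        return True

    # Rule 2: same URL as the original page (ignoring the query string) ...
    link, page = urlsplit(link_url), urlsplit(page_url)
    if (link.scheme, link.netloc, link.path) != (page.scheme, page.netloc, page.path):
        return False
    # ... and the query satisfies whichever patterns were supplied.
    if query_contains is not None and not re.search(query_contains, link.query):
        return False
    if query_does_not_contain is not None and re.search(query_does_not_contain, link.query):
        return False
    return True

# Hypothetical result page and candidate links.
page = "http://example.com/results?q=tess&start=0"
print(is_result_link("Next", "http://example.com/other", page, linktext=r"^Next$"))
print(is_result_link("2", "http://example.com/results?q=tess&start=10", page,
                     query_contains=r"start=\d+", query_does_not_contain=r"sort="))
print(is_result_link("Sort", "http://example.com/results?q=tess&sort=date", page,
                     query_contains=r"start=\d+", query_does_not_contain=r"sort="))
```

The first link qualifies by its text alone, the second by its URL and query, and the third is rejected because its query lacks the required start parameter.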
Showpage |
|
Take the current page and render it using Java's HTML display capability. Some pages may not display correctly, or at all. |
Hidepage |
|
Remove the display of the page |
Dumppage |
|
Print the text of the current page to the standard output |
Currenturl |
|
Display the URL of the current page |
Savehistory |
Filename |
Write the history of visited URLs to a file |
Getselection |
|
Provided that the current page is rendered using showpage, this command will display an HTML document which contains only the selected portion of the document. |
Getforms |
|
Process the current page and extract the forms from it. |
Listforms |
|
List the names of all forms on the current page. This command must be called after getforms. |
Listformproperties |
Formname |
List all the properties of the given form along with the current default values. |
Submitform |
Formname, args |
Submit a form. All arguments after the name of the form are optional and will override the default values for the corresponding form element. |
Resultstodb |
Table, jdbc-url,user,password |
Take the current results, and place them in a database |
Resultstocsv |
File |
Print the screenscraping results to a file or to the screen in comma-separated value format.
|
Resulttojava |
Classname |
This command allows a callout to a Java class. The class must:
1. implement the Runnable interface
2. have a constructor which takes (java.util.Vector, HtmlGet.HtmlGet) as its arguments.
The class will be passed the current TeSS screenscraping results as a vector of Object arrays, and the current instance of the HtmlGet class. With these two arguments, the Java code will be able to access all state of the current session and programmatically alter the session by calling the methods of the HtmlGet class. |
Justtags |
TagRE[,...] |
This command takes a list of regular expressions which match HTML start element tags. The current page is filtered so that, after this command runs, it contains only these tags.
For example, justtags <INPUT filters the document so that it contains only HTML input tags.
NOTE: the filtered result may not itself be a valid HTML page. |
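The filtering behaviour can be approximated with a short Python sketch. It is not HTMLGet's implementation: the function name just_tags and the sample page are hypothetical, and both the case-insensitive matching and the one-tag-per-line output are assumptions.

```python
import re

def just_tags(page, tag_patterns):
    """Keep only the tags that match one of the given patterns."""
    # Assumption: patterns are matched case-insensitively at the start of a tag.
    compiled = [re.compile(p, re.IGNORECASE) for p in tag_patterns]
    kept = []
    # Scan every tag in document order; text between tags is discarded.
    for tag in re.findall(r"<[^>]*>", page):
        if any(pattern.match(tag) for pattern in compiled):
            kept.append(tag)
    return "\n".join(kept)

# Hypothetical page source.
page = '<form action="/go"><p>Name: <input name="user"></p><INPUT type="submit"></form>'
print(just_tags(page, ["<INPUT"]))
```

The simple tag pattern used here would be confused by a literal > inside an attribute value, and the output drops the enclosing form element, which shows why the filtered result need not be a valid HTML page.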
Betweentags |
StartRE,endRE[,startRE,endRE,...] |
Extract the contents of a page between sets of tags. The resulting document will also be marked up with comments which describe what "level" in the original document the results came from.
|