This is the concept returned by all retrieval functions (FROM) whose
main purpose is to fetch documents, when those documents happen to be
HTML pages. The content of the HTML page is available as a Stream stored
in the content element of the concept, so the source of the HTML page is
kept unaltered. One or more pieces of the page can be extracted using
regular expressions through the select method provided by the String
concept. Once the desired HTML pieces are extracted, they can be
transformed into a Series of Tags through the Tag/getTags method. The
concept also provides its own getTags method, which returns the content
of the page directly as a Series of Tags.
$page = HTMLPage.create("http://www.nolimits.ro")
PRINT $page@url
http://www.nolimits.ro
PRINT $page/getLinks
http://www.nolimits.ro/corpinfo_en.shtml
http://www.nolimits.ro/documents/index.shtml
http://www.nolimits.ro/product_en.shtml
http://www.nolimits.ro/sl_en.shtml
http://www.nolimits.ro/news_en.shtml
http://www.nolimits.ro/contact_en.shtml
http://www.nolimits.ro/search_en.shtml
http://www.nolimits.ro/forum/
http://www.nolimits.ro/help_en.shtml
http://www.nolimits.ro/ro/
http://www.nolimits.ro/
This example shows how to log in to a web page using forms and cookies.
(It assumes the required form has already been completed and stored;
if not, see the examples in HTMLForm.)
$url = "http://192.168.1.22/oursite/"
# Load the HTMLPage from the URL into $page
# We need this step only to obtain and store the cookies.
# This step does not log you in!
$page from $url + "index.php"
# Saving cookies
$cookies = $page.getCookies
each $i in $cookies do [
$i.storeCookie ]
# creating a simplex expression
$expr = Simplex.create( $url + "*" )
# collecting the stored forms using the simplex
# We need to use only the first form
$form = HTMLForm.getStoredForms($expr)/1
# submitting the form (with the POST method) logs the script in
HTMLPage.create( $form )
# finally, we can fetch the information from the URL, since we are now logged in
$page from $url + "index.php"
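For readers more familiar with a general-purpose language, the same three steps (fetch once to collect cookies, submit the form, fetch again) can be sketched with Python's standard library. This is only an analogy and not part of the HTMLPage API: the function names are hypothetical, and the sketch assumes the form posts back to index.php.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def encode_form(fields):
    """Encode form fields as an application/x-www-form-urlencoded POST body."""
    return urllib.parse.urlencode(fields).encode("ascii")


def login(opener, base_url, fields):
    """Fetch the page for its cookies, post the form, then fetch the page again."""
    page_url = base_url + "index.php"  # assumption: the form posts back to this page
    opener.open(page_url).close()                       # step 1: collect cookies only
    opener.open(page_url, encode_form(fields)).close()  # step 2: POST the form
    with opener.open(page_url) as response:             # step 3: logged-in page
        return response.read()
```

Calling login(opener, "http://192.168.1.22/oursite/", fields) with the opener from make_session() would perform the three requests; the cookie jar plays the role of the stored cookies above.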
Methods inherited from: Concept
cloneConcept, extendsConcept, fromXML, getAllInheritedConcepts, getConceptAttribute, getConceptAttributeField, getConceptAttributeFields, getConceptAttributes, getConceptConstructors, getConceptElement, getConceptElementField, getConceptElementFields, getConceptElements, getConceptLabel, getConceptMethod, getConceptMethods, getConceptOperators, getConceptType, getErrorHandler, getInheritedConcepts, getResourceURI, hasConceptAttribute, hasConceptElement, hasConceptMethod, hasPath, isHidden, loadContent, setConceptLabel, setErrorHandler, setHidden, setShowEmpty, showEmpty, toTXT, toXML, setResourceURI
Label: | contentType |
Type: | String |
Is Static: | false |
Is Hidden: | false |
Show Empty: | true |
Contains the type of the content of this page which is the value of the
content-type header field.
Label: | lastModified |
Type: | Integer |
Is Static: | false |
Is Hidden: | false |
Show Empty: | true |
This is an Integer representing the time the file was last modified,
measured in milliseconds since the epoch (00:00:00 GMT, January 1, 1970)
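A value measured in milliseconds since the epoch can be turned into a readable date as follows. This is a minimal Python sketch for illustration; the function name is hypothetical and not part of the HTMLPage concept.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_utc(millis):
    """Convert milliseconds since 1970-01-01 00:00:00 GMT to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)
```

For the value 2147368447000 (the Expires figure in the cookie example on this page), the function returns 2038-01-17 19:14:07 UTC, which agrees with the expiry date in the Set-Cookie header shown for getHeaders.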
Label: | length |
Type: | Integer |
Is Static: | false |
Is Hidden: | false |
Show Empty: | true |
Contains the length of this page.
Label: | title |
Type: | String |
Is Static: | false |
Is Hidden: | false |
Show Empty: | true |
Contains the title of this document, extracted from between the <title>...</title> tags of the HTML page.
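The extraction between the title tags can be pictured with a regular expression. The following Python sketch is an assumption about how such an extraction might work, not the concept's actual implementation; the names are hypothetical.

```python
import re

# Case-insensitive, and DOTALL so that a title spanning several lines still matches.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the text between <title> and </title>, or None if there is no title."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None
```

A page without a title element yields None rather than an empty string, so the two cases can be told apart.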
Label: | url |
Type: | String |
Is Static: | false |
Is Hidden: | false |
Show Empty: | true |
Contains the URL of the document represented by this HTMLPage concept.
Label: | content |
Type: | String |
Is Static: | false |
Is Hidden: | false |
Is Multi: | false |
Show Empty: | true |
Contains a stream with the actual page content. This is the primary way
of keeping the content of the page: regular expressions can be applied
to the stream to extract the relevant pieces of the page, and the
selected pieces can then be passed to a Tag constructor for further
processing.
Parameters: |
$url : | The URL from which this page will be constructed. |
Exceptions: |
badURLException (Error) : | If the URL is not valid. |
IOException (Error) : | If an I/O exception occurs. |
httpOperationException (Error) : | If the HTML page header or page content cannot be retrieved. |
Constructor that produces an HTMLPage concept from the URL given as a
parameter. The concept produced in this way will have no headers.
Parameters: |
$form : | The HTMLForm from which this page will be constructed. |
Exceptions: |
badURLException (Error) : | If the URL is not valid. |
httpOperationException (Error) : | If the HTML page header or page content cannot be retrieved. |
IOException (Error) : | If an I/O exception occurs. |
Constructor that produces an HTMLPage concept from the HTMLForm given
as a parameter. The concept produced in this way will have no headers.
Exceptions: |
httpOperationException (Error) : | If the HTML page content cannot be retrieved. |
Returns a String or a ByteArray representing the content of this page.
$page = HTMLPage.create("http://www.google.com")
print $page.getContent
-- the result is --
<html><head><META HTTP-EQUIV="content-type" CONTENT="text/html; charset=ISO-8859-1">
<title>Google</title><style>
<!-- body,td,a,p,.h{font-family:arial,sans-serif;} .h{font-size: 20px;} .h{color:}
.q{text-decoration:none; color:#0000cc;}
//--></style>
...
</html>
Returns a Series of Cookie concepts, one for each cookie directive
found in the HTTP header of this page.
$page = HTMLPage.create("http://www.google.com")
print $page.getCookies
-- the result is --
Name : PREF
Value : ID=01ea0cb64090ca3a:TM=1028290720:LM=1028290720:S=MnaQWUJlyfw
Domain : .google.com
Path : /
Expires: 2147368447000
Secure : FALSE
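The fields printed above correspond to the parts of a Set-Cookie directive. As an illustration only, the Python sketch below splits such a directive with the standard http.cookies module; the function name is hypothetical, and the sample value is the Set-Cookie header shown for getHeaders on this page.

```python
from http.cookies import SimpleCookie

SAMPLE = ("PREF=ID=2b87fa35752aa236:TM=1028291174:LM=1028291174:S=mGb7J7tJ95Q; "
          "domain=.google.com; path=/; expires=Sun, 17-Jan-2038 19:14:07 GMT")


def parse_set_cookie(header_value):
    """Parse one Set-Cookie header value into a list of field dictionaries."""
    jar = SimpleCookie()
    jar.load(header_value)
    return [
        {
            "name": name,
            "value": morsel.value,
            "domain": morsel["domain"],
            "path": morsel["path"],
            "expires": morsel["expires"],
        }
        for name, morsel in jar.items()
    ]
```

Note that this sketch keeps the expiry as the date string sent by the server, whereas the Cookie concept above prints it as milliseconds since the epoch.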
Returns a Series of HTMLForm concepts, one for each HTML form found in
this page.
$page = HTMLPage.create("http://www.google.com")
print $page.getForms
-- the result is --
HTMLForm
method: GET url: http://www.google.com/search name: f id: -1309137107
hl en
ie ISO-8859-1
q
btnG Google Search
btnI I'm Feeling Lucky
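To make the structure of that result concrete, here is a hedged Python sketch that collects forms and their named input fields from an HTML string. It is an analogy built on the standard html.parser module, with hypothetical names, and it makes no claim about how HTMLForm itself is implemented.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collect each <form> with its method, action and named <input> fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # the form currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "method": (attrs.get("method") or "GET").upper(),
                "action": attrs.get("action") or "",
                "fields": {},
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None and attrs.get("name"):
            self._current["fields"][attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None  # inputs after </form> belong to no form


def get_forms(html):
    """Return a list of dictionaries, one per form found in the HTML."""
    collector = FormCollector()
    collector.feed(html)
    return collector.forms
```

Only input elements are collected here; select and textarea fields are left out to keep the sketch short.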
Returns a Map containing all the HTTP header key/value pairs, as
returned by the server which served this page.
$page = HTMLPage.create("http://www.google.com")
print $page.getHeaders
-- the result is --
Map
Content-Length 3152
Server GWS/2.0
Date Fri, 02 Aug 2002 12:26:14 GMT
Content-Type text/html
Cache-control private
Set-Cookie PREF=ID=2b87fa35752aa236:TM=1028291174:LM=1028291174:S=mGb7J7tJ95Q;
domain=.google.com; path=/; expires=Sun, 17-Jan-2038 19:14:07 GMT
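The Map above is essentially the raw response header block split into name and value pairs. A minimal Python sketch of that split, using the standard email parser (which Python's own HTTP client also relies on for headers), could look like this; the function name is hypothetical.

```python
from email.parser import Parser


def parse_headers(raw):
    """Parse a raw HTTP header block into a dict of field name to value."""
    message = Parser().parsestr(raw, headersonly=True)
    # A plain dict keeps only the last value when a header is repeated.
    return dict(message.items())
```

Because a dict holds one value per key, a response with several Set-Cookie lines would need message.get_all("Set-Cookie") instead.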
Returns a Series of Elements containing all the links found in the page.
Relative links are returned too, but in their absolute form (with the
URL of the page prepended).
$page = HTMLPage.create("http://www.google.com")
print $page.getLinks
-- the result is --
http://www.google.com/imghp?hl=en&ie=UTF-8
http://www.google.com/grphp?hl=en&ie=UTF-8
http://www.google.com/dirhp?hl=en&ie=UTF-8
http://www.google.com/advanced_search?hl=en
http://www.google.com/preferences?hl=en
http://www.google.com/language_tools?hl=en
http://www.google.com/ads/
http://www.google.com/services/
http://www.google.com/news/
http://toolbar.google.com
http://www.google.com/about.html
http://www.google.com/mgyhp.html
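Turning relative links into absolute ones can be sketched in Python as follows. This is an illustration under assumptions, not the getLinks implementation: the names are hypothetical, and urljoin applies the standard URL resolution rules, which may differ from simply prepending the page URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # anchors without an href are skipped
                self.links.append(urljoin(self.base_url, href))


def get_links(base_url, html):
    """Return all link targets in the HTML as absolute URLs."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

Links that are already absolute pass through urljoin unchanged, so the result mixes both kinds in document order.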
The method returns a Series containing the Tags and Strings that result
from processing the content of this HTML page.
The rules are:
* every tag opened by the < symbol and closed by the > symbol
is transformed into a Tag concept.
* every string outside tags is transformed into a String concept.
* the CR LF, CR and LF symbols cause the parser to start a new
String concept.
$page = HTMLPage.create("http://www.google.com")
print $page.getTags
-- the result is --
<html>
<head>
<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=ISO-8859-1">
<title>
</title>
<style>
...
</html>
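The three splitting rules given for getTags can be sketched in a few lines of Python. This is a simplified illustration with hypothetical names: it does not handle a > character inside comments or attribute values, and dropping whitespace-only strings is an assumption rather than documented behaviour.

```python
import re

TAG_RE = re.compile(r"(<[^>]*>)")
LINE_BREAK_RE = re.compile(r"\r\n|\r|\n")


def split_tags_and_text(html):
    """Split HTML into a flat list of tag strings and text strings."""
    pieces = []
    # With one capturing group, re.split alternates text (even index) and tag (odd index).
    for index, chunk in enumerate(TAG_RE.split(html)):
        if index % 2 == 1:
            pieces.append(chunk)  # rule 1: <...> becomes a tag
        else:
            # rules 2 and 3: text outside tags, one string per line
            pieces.extend(line for line in LINE_BREAK_RE.split(chunk) if line.strip())
    return pieces
```

In this sketch tags and text share one list of plain strings; the Series described above distinguishes them by concept type instead.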
Returns a Series containing the tags, together with their text, that
result from processing the content of this HTMLPage.
$page = HTMLPage.create("http://www.google.com")
print $page.getTagsWithText
-- the result is --
<html>
<head>
<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=ISO-8859-1">
<title>
Google
</title>
<style>
...
</html>
Returns a Series with all URIs contained in this HTMLPage.
$page = HTMLPage.create("http://www.google.com")
print $page.getURIs
-- the result is --
http://www.google.com/imghp?hl=en&ie=UTF-8
http://www.google.com/grphp?hl=en&ie=UTF-8
http://www.google.com/dirhp?hl=en&ie=UTF-8
http://www.google.com/advanced_search?hl=en
http://www.google.com/preferences?hl=en
http://www.google.com/language_tools?hl=en
http://www.google.com/ads/
http://www.google.com/services/
http://www.google.com/news/
http://toolbar.google.com
http://www.google.com/about.html
http://www.google.com/mgyhp.html
Returns the response code from the header of this HTMLPage.
$page = HTMLPage.create("http://www.google.com")
print $page.getResponseCode
-- the result is --
200
Returns the response message from the header of this HTMLPage.
$page = HTMLPage.create("http://www.google.com")
print $page.getResponseMessage
-- the result is --
OK
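The response code and response message come from the status line of the HTTP response. For orientation, the standard pairing of codes and messages can be looked up in Python as sketched below; the function name is hypothetical, and a server is free to send a different message than the standard one.

```python
from http import HTTPStatus


def reason_phrase(code):
    """Return the standard reason phrase for an HTTP status code, or '' if unknown."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        return ""
```

The value reported for a real page is whatever the server actually sent, so this lookup is only a reference table.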
Parameters: |
$bol : | Boolean expression. |
Sets whether HTTP redirects (requests with response code 3xx) should be
automatically followed by the connection.
$page = HTMLPage.create("http://www.google.com")
print $page.isFollowRedirects
$page.setFollowRedirects(FALSE)
print $page.isFollowRedirects
-- the result is --
true
false
Returns a boolean indicating whether or not HTTP redirects (3xx) should
be automatically followed.
$page = HTMLPage.create("http://www.google.com")
print $page.isFollowRedirects
$page.setFollowRedirects(FALSE)
print $page.isFollowRedirects
-- the result is --
true
false
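The redirect switch described above has a close counterpart in Python's urllib, sketched here for comparison. The class and function names are hypothetical; the only library behaviour relied on is that a redirect handler returning None from redirect_request stops urllib from following the redirect.

```python
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow any 3xx response."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # None tells urllib not to build a follow-up request


def make_opener(follow_redirects=True):
    """Build an opener that follows redirects, or one that does not."""
    if follow_redirects:
        return urllib.request.build_opener()
    return urllib.request.build_opener(NoRedirect())
```

With the non-following opener, a 3xx response is raised as urllib.error.HTTPError, so the caller can inspect the code and the Location header itself.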