<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Class: Markup filter safe HTML</title>
</head>
<body>
<center><h1>Class: Markup filter safe HTML</h1></center>
<hr />
<ul>
<p><b>Version:</b> <tt>@(#) $Id: markup_filter_safe_html.php,v 1.39 2009/08/19 09:09:06 mlemos Exp $</tt></p>
<h2><a name="table_of_contents">Contents</a></h2>
<ul>
<li><a href="#2.1.1">Summary</a></li>
<ul>
<li><a href="#3.2.0">Name</a></li>
<li><a href="#3.2.0.0">Author</a></li>
<li><a href="#3.2.0.1">Copyright</a></li>
<li><a href="#3.2.0.2">Version</a></li>
<li><a href="#3.2.0.3">Purpose</a></li>
<li><a href="#3.2.0.4">Usage</a></li>
</ul>
<li><a href="#4.1.1">Variables</a></li>
<ul>
<li><a href="#5.2.15">error</a></li>
<li><a href="#5.2.16">error_code</a></li>
<li><a href="#5.2.17">error_position</a></li>
<li><a href="#5.2.18">buffer_length</a></li>
<li><a href="#5.2.19">ignore_syntax_errors</a></li>
<li><a href="#5.2.20">warnings</a></li>
<li><a href="#5.2.21">store_positions</a></li>
<li><a href="#5.2.22">track_lines</a></li>
<li><a href="#5.2.23">unsafe_tags</a></li>
<li><a href="#5.2.24">safe_proprietary_css_properties</a></li>
<li><a href="#5.2.25">safe_css_property_types</a></li>
<li><a href="#5.2.26">safe_css_property_functions</a></li>
<li><a href="#5.2.27">safe_url_schemes</a></li>
<li><a href="#5.2.28">allow_server_side_includes</a></li>
</ul>
<li><a href="#6.1.1">Functions</a></li>
<ul>
<li><a href="#7.2.9">SetInput</a></li>
<li><a href="#9.2.10">GetPositionLine</a></li>
<li><a href="#11.2.11">FilterStylesheet</a></li>
<li><a href="#13.2.12">GetStylesheetPositionLine</a></li>
<li><a href="#15.2.13">StartParsing</a></li>
<li><a href="#17.2.14">Parse</a></li>
<li><a href="#19.2.15">FinishParsing</a></li>
<li><a href="#19.2.16">RewriteElement</a></li>
</ul>
</ul>
<p><a href="#table_of_contents">Top of the table of contents</a></p>
</ul>
<hr />
<ul>
<h2><li><a name="2.1.1">Summary</a></li></h2>
<ul>
<h3><a name="3.2.0">Name</a></h3>
<p>Markup filter safe HTML</p>
<h3><a name="3.2.0.0">Author</a></h3>
<p>Manuel Lemos (<a href="mailto:mlemos-at-acm.org">mlemos-at-acm.org</a>)</p>
<h3><a name="3.2.0.1">Copyright</a></h3>
<p>Copyright &copy; Manuel Lemos 2009</p>
<h3><a name="3.2.0.2">Version</a></h3>
<p>@(#) $Id: markup_filter_safe_html.php,v 1.39 2009/08/19 09:09:06 mlemos Exp $</p>
<h3><a name="3.2.0.3">Purpose</a></h3>
<p>Parse an HTML document and remove all unsafe tags and CSS styles that may contain JavaScript code and other harmful HTML structures. </p>
<p> Unsafe HTML is often submitted by untrusted users to sites that accept user-submitted content. Such HTML may contain JavaScript that could be used to perform <a href="http://en.wikipedia.org/wiki/Cross-site_scripting">cross-site scripting</a> (XSS) or <a href="http://en.wikipedia.org/wiki/Cross-site_request_forgery">cross-site request forgery</a> (CSRF) attacks.</p>
<h3><a name="3.2.0.4">Usage</a></h3>
<p>Use the <tt><a href="#function_StartParsing">StartParsing</a></tt> function to initialize the parser. Then use the <tt><a href="#function_Parse">Parse</a></tt> function to make the class parse the HTML data, which may optionally be read from files. When you have finished feeding the whole document data, call the <tt><a href="#function_FinishParsing">FinishParsing</a></tt> function. </p>
<p> The <tt><a href="#function_Parse">Parse</a></tt> function returns arrays of tokens that describe each document element. The <tt><a href="#function_RewriteElement">RewriteElement</a></tt> function can be used to convert the tokens back to HTML document strings. </p>
<p> By default, the class uses the markup validator class to parse the HTML documents before it actually analyzes and filters unsafe tags from the documents. Use the <tt><a href="#function_SetInput">SetInput</a></tt> function to set a different filter object as the source of parsed document elements. </p>
<p> Element tokens are associated with their respective positions in the document. Positions are numbers that represent offsets relative to the beginning of the document. The <tt><a href="#function_GetPositionLine">GetPositionLine</a></tt> function can return the line and column number associated with a given document position if the <tt><a href="#variable_track_lines">track_lines</a></tt> variable is set to 1. </p>
<p> The class may also parse and filter individual CSS stylesheets using the function <tt><a href="#function_FilterStylesheet">FilterStylesheet</a></tt>. The function <tt><a href="#function_GetStylesheetPositionLine">GetStylesheetPositionLine</a></tt> may be used to determine the line associated to the position of an error.</p>
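<p> The sequence of calls described above could be sketched as follows. This is only an illustration under stated assumptions: <tt>rewrite_filtered_document</tt> is a hypothetical helper that is not part of the class, and <tt>$filter</tt> is assumed to be an already created filter object that provides the functions and the <tt>error</tt> variable documented here.</p>
<pre>
&lt;?php
// Hypothetical helper, not part of the class. $filter is assumed to be an
// object that provides StartParsing, Parse, RewriteElement, FinishParsing
// and the error variable as described in this document.
function rewrite_filtered_document($filter, $html)
{
    $output = '';
    $parameters = array('Data' => $html);
    if ($filter->StartParsing($parameters)) {
        $end = false;
        $elements = array();
        do {
            // Each call returns the next batch of element tokens.
            if (!$filter->Parse($end, $elements)) {
                break;
            }
            foreach ($elements as $element) {
                $markup = '';
                // Convert the token back to an HTML string.
                if (!$filter->RewriteElement($element, $markup)) {
                    break 2;
                }
                $output .= $markup;
            }
        } while (!$end);
        $filter->FinishParsing();
    }
    // On failure the message is left in $filter->error for the caller.
    return (strlen($filter->error) ? false : $output);
}
</pre>
<p> A caller would check the returned value and, when it is false, read the <tt>error</tt> variable of the filter object to find out what went wrong.</p>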
<p><a href="#table_of_contents">Table of contents</a></p>
</ul>
</ul>
<hr />
<ul>
<h2><li><a name="variables"></a><a name="4.1.1">Variables</a></li></h2>
<ul>
<li><tt><a href="#variable_error">error</a></tt></li><br />
<li><tt><a href="#variable_error_code">error_code</a></tt></li><br />
<li><tt><a href="#variable_error_position">error_position</a></tt></li><br />
<li><tt><a href="#variable_buffer_length">buffer_length</a></tt></li><br />
<li><tt><a href="#variable_ignore_syntax_errors">ignore_syntax_errors</a></tt></li><br />
<li><tt><a href="#variable_warnings">warnings</a></tt></li><br />
<li><tt><a href="#variable_store_positions">store_positions</a></tt></li><br />
<li><tt><a href="#variable_track_lines">track_lines</a></tt></li><br />
<li><tt><a href="#variable_unsafe_tags">unsafe_tags</a></tt></li><br />
<li><tt><a href="#variable_safe_proprietary_css_properties">safe_proprietary_css_properties</a></tt></li><br />
<li><tt><a href="#variable_safe_css_property_types">safe_css_property_types</a></tt></li><br />
<li><tt><a href="#variable_safe_css_property_functions">safe_css_property_functions</a></tt></li><br />
<li><tt><a href="#variable_safe_url_schemes">safe_url_schemes</a></tt></li><br />
<li><tt><a href="#variable_allow_server_side_includes">allow_server_side_includes</a></tt></li><br />
<p><a href="#table_of_contents">Table of contents</a></p>
<h3><a name="variable_error"></a><li><a name="5.2.15">error</a></li></h3>
<h3>Type</h3>
<p><tt><i>string</i></tt></p>
<h3>Default value</h3>
<p><tt>''</tt></p>
<h3>Purpose</h3>
<p>Store the message that is returned when an error occurs.</p>
<h3>Usage</h3>
<p>Check this variable to understand what happened when a call to any of the class functions has failed.</p>
<p> This class uses cumulative error handling. This means that if a class function that may fail is called while this variable is already set to an error message from a failure in a previous call to the same or another function, that function also fails and does nothing.</p>
<p> This allows programs using this class to safely call several functions that may fail and only check the failure condition after the last function call.</p>
<p> Just set this variable to an empty string to clear the error condition.</p>
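<p> The convention can be illustrated with a small stand-alone sketch. The class <tt>cumulative_error_example</tt> and its <tt>Step</tt> function are hypothetical and are not part of this package; they only mimic the behavior described above.</p>
<pre>
&lt;?php
// Hypothetical class, not part of the package: it only illustrates the
// cumulative error handling convention described above.
class cumulative_error_example
{
    var $error = '';
    var $calls = 0;

    function Step($fail)
    {
        // A previous failure makes every later call fail without doing anything.
        if (strlen($this->error)) {
            return false;
        }
        $this->calls++;
        if ($fail) {
            $this->error = 'step ' . $this->calls . ' failed';
            return false;
        }
        return true;
    }
}

$example = new cumulative_error_example;
$example->Step(false); // succeeds
$example->Step(true);  // fails and sets the error variable
$example->Step(false); // does nothing because the error is still set
if (strlen($example->error)) {
    echo 'Error: ', $example->error, "\n"; // prints "Error: step 2 failed"
}
$example->error = '';  // clear the error condition
$example->Step(false); // runs normally again
</pre>
<p> Only one check is needed after the whole sequence of calls, which is what the convention is meant to allow.</p>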
<p><a href="#variables">Variables</a></p>
<h3><a name="variable_error_code"></a><li><a name="5.2.16">error_code</a></li></h3>
<h3>Type</h3>
<p><tt><i>int</i></tt></p>
<h3>Default value</h3>
<p><tt>0</tt></p>
<h3>Purpose</h3>
<p>Store the code that is returned when an error occurs.</p>
<h3>Usage</h3>
<p>Check this variable to understand what happened when a call to any of the class functions has failed. It may be set to several possible error codes defined as constants:</p>
<p> <tt>MARKUP_FILTER_SAFE_HTML_ERROR_NONE</tt> - No error happened </p>
<p> <tt>MARKUP_FILTER_SAFE_HTML_ERROR_UNEXPECTED</tt> - A condition was found that the class is not yet ready to handle </p>
<p> <tt>MARKUP_FILTER_SAFE_HTML_ERROR_INVALID_SYNTAX</tt> - A syntax error was found </p>
<p> <tt>MARKUP_FILTER_SAFE_HTML_ERROR_INVALID_USAGE</tt> - An invalid value was passed to the class function parameters or set to the class variables </p>
<p> <tt>MARKUP_FILTER_SAFE_HTML_ERROR_UNSAFE_TAG</tt> - A tag considered unsafe was found </p>
<p> <tt>MARKUP_FILTER_SAFE_HTML_ERROR_UNSAFE_ATTRIBUTE</tt> - A tag attribute considered unsafe was found </p>
<p> <tt>MARKUP_FILTER_SAFE_HTML_ERROR_UNSAFE_CSS_STYLE</tt> - A CSS style considered unsafe was found </p>
<p> <tt>MARKUP_FILTER_SAFE_HTML_ERROR_SSI_COMMENT</tt> - An HTML comment with Server Side Include (SSI) commands was found</p>
<p><a href="#variables">Variables</a></p>
<h3><a name="variable_error_position"></a><li><a name="5.2.17">error_position</a></li></h3>
<h3>Type</h3>
<p><tt><i>int</i></tt></p>
<h3>Default value</h3>
<p><tt>-1</tt></p>
<h3>Purpose</h3>
<p>Point to the position of the markup data or file that refers to the last error that occurred.</p>
<h3>Usage</h3>
<p>Check this variable to determine the relevant position of the document when a parsing error occurs.</p>
<p><a href="#variables">Variables</a></p>
<h3><a name="variable_buffer_length"></a><li><a name="5.2.18">buffer_length</a></li></h3>
<h3>Type</h3>
<p><tt><i>int</i></tt></p>
<h3>Default value</h3>
<p><tt>8000</tt></p>
<h3>Purpose</h3>
<p>Maximum length of the chunks of markup data read from files that the class parses at one time.</p>
<h3>Usage</h3>
<p>Adjust this value according to the available memory.</p>
<p><a href="#variables">Variables</a></p>
<h3><a name="variable_ignore_syntax_errors"></a><li><a name="5.2.19">ignore_syntax_errors</a></li></h3>
<h3>Type</h3>
<p><tt><i>bool</i></tt></p>
<h3>Default value</h3>
<p><tt>1</tt></p>
<h3>Purpose</h3>
<p>Specify whether the class should ignore syntax errors in malformed documents.</p>
<h3>Usage</h3>
<p>Set this variable to 0 if it is necessary to verify whether markup data may be corrupted due to possible bugs in the program that generated the document.</p>
<p> Currently the class only ignores some types of syntax errors. Other syntax errors may still cause the <tt><a href="#function_Parse">Parse</a></tt> function to fail.</p>
<p><a href="#variables">Variables</a></p>
<h3><a name="variable_warnings"></a><li><a name="5.2.20">warnings</a></li></h3>
<h3>Type</h3>
<p><tt><i>array</i></tt></p>
<h3>Default value</h3>
<p><tt>array()</tt></p>
<h3>Purpose</h3>
<p>Return a list of positions of the original document that contain syntax errors.</p>
<h3>Usage</h3>
<p>Check this variable to retrieve any document syntax errors that were ignored because the <tt><a href="#variable_ignore_syntax_errors">ignore_syntax_errors</a></tt> variable is set to 1.</p>
<p> The indexes of this array are the positions of the errors. The array values are the corresponding syntax error messages.</p>
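<p> As a sketch of how this array might be read, the hypothetical helper below formats each ignored syntax error. It assumes <tt>$filter</tt> is a filter object that has already parsed a document, and it uses <tt><a href="#function_GetPositionLine">GetPositionLine</a></tt> to translate positions when line tracking is available.</p>
<pre>
&lt;?php
// Hypothetical helper, not part of the class. $filter is assumed to be a
// filter object that has already parsed a document.
function list_syntax_warnings($filter)
{
    $lines = array();
    // Keys are document positions, values are syntax error messages.
    foreach ($filter->warnings as $position => $message) {
        $line = $column = 0;
        if ($filter->GetPositionLine($position, $line, $column)) {
            $lines[] = 'Line ' . $line . ', column ' . $column . ': ' . $message;
        } else {
            // Line tracking is off or the position could not be located.
            $lines[] = 'Position ' . $position . ': ' . $message;
        }
    }
    return $lines;
}
</pre>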
<p><a href="#variables">Variables</a></p>
<h3><a name="variable_store_positions"></a><li><a name="5.2.21">store_positions</a></li></h3>
<h3>Type</h3>
<p><tt><i>bool</i></tt></p>
<h3>Default value</h3>
<p><tt>1</tt></p>
<h3>Purpose</h3>
<p>Tell the class to return the position of each document element token.</p>
<h3>Usage</h3>
<p>Set this variable to 0 if you do not need to know the position of each parsed markup element.</p>
<p><a href="#variables">Variables</a></p>
<h3><a name="variable_track_lines"></a><li><a name="5.2.22">track_lines</a></li></h3>
<h3>Type</h3>
<p><tt><i>bool</i></tt></p>
<h3>Default value</h3>
<p><tt>0</tt></p>
<h3>Purpose</h3>
<p>Tell the class to keep track of the position of each document line.</p>
<h3>Usage</h3>
<p>Set this variable to 1 if you need to determine the line and column number associated to a given position of the parsed document.</p>
<p><a href="#variables">Variables</a></p>
<h3><a name="variable_unsafe_tags"></a><li><a name="5.2.23">unsafe_tags</a></li></h3>
<h3>Type</h3>
<p><tt><i>array</i></tt></p>
<h3>Default value</h3>
<p><tt>array()</tt></p>
<h3>Purpose</h3>
<p>List of tags that may be unsafe.</p>
<h3>Usage</h3>
<p>Change the default list of unsafe tags only if you realize there are more tags that should be considered unsafe. </p>
<p> Currently, the tags considered unsafe are: APPLET, IFRAME, OBJECT and SCRIPT. </p>
<p> It is not necessary to add proprietary tags because those will be discarded by another class that validates the HTML according to a standard DTD. </p>
<p> Each entry in this array variable must use the name of the unsafe tag that the class should discard as its key. The entry values should be set to an empty array to allow for possible parameters in future versions of this class.</p>
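<p> The array layout described above, which is shared by the other list variables of this class, could be built as in the following sketch. The helper <tt>add_list_entries</tt> is hypothetical, <tt>FORM</tt> is only an example tag, and the letter case that the class expects for the keys is an assumption.</p>
<pre>
&lt;?php
// Hypothetical helper, not part of the class: add names to a keyed list
// such as unsafe_tags or safe_url_schemes. Each name becomes a key whose
// value is an empty array, as the text above requires.
function add_list_entries($list, $names)
{
    foreach ($names as $name) {
        $list[$name] = array();
    }
    return $list;
}

// FORM is only an example; the letter case of the keys is an assumption.
$unsafe_tags = add_list_entries(array(), array('FORM'));
</pre>
<p> With a filter object in <tt>$filter</tt>, the result would be assigned back, for instance <tt>$filter->unsafe_tags = add_list_entries($filter->unsafe_tags, array('FORM'));</tt>, so that the existing entries are preserved.</p>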
<p><a href="#variables">Variables</a></p>
<h3><a name="variable_safe_proprietary_css_properties"></a><li><a name="5.2.24">safe_proprietary_css_properties</a></li></h3>
<h3>Type</h3>
<p><tt><i>array</i></tt></p>
<h3>Default value</h3>
<p><tt>array()</tt></p>
<h3>Purpose</h3>
<p>List of proprietary CSS properties that should be considered safe to allow. Proprietary properties start with the - character.</p>
<h3>Usage</h3>
<p>Change the default list of safe CSS properties only if you realize there are more properties that should be considered safe. </p>
<p> Each entry in this array variable must use the name of the safe property that the class should allow as its key. The entry values should be set to an empty array to allow for possible parameters in future versions of this class.</p>
<p><a href="#variables">Variables</a></p>
<h3><a name="variable_safe_css_property_types"></a><li><a name="5.2.25">safe_css_property_types</a></li></h3>
<h3>Type</h3>
<p><tt><i>array</i></tt></p>
<h3>Default value</h3>
<p><tt>array()</tt></p>
<h3>Purpose</h3>
<p>List of types of expressions that should be considered safe to allow in CSS style values.</p>
<h3>Usage</h3>
<p>Change the default list of safe CSS property types only if you realize there are more types that should be considered safe. </p>
<p> Currently, the types considered safe are: delimiter, dimension, function, hash, identifier, number, percentage, string and uri. </p>
<p> Each entry in this array variable must use the name of the safe property type that the class should allow as its key. The entry values should be set to an empty array to allow for possible parameters in future versions of this class.</p>
<p><a href="#variables">Variables</a></p>
<h3><a name="variable_safe_css_property_functions"></a><li><a name="5.2.26">safe_css_property_functions</a></li></h3>
<h3>Type</h3>
<p><tt><i>array</i></tt></p>
<h3>Default value</h3>
<p><tt>array()</tt></p>
<h3>Purpose</h3>
<p>List of the names of functions that should be considered safe to allow in CSS style values.</p>
<h3>Usage</h3>
<p>Change the default list of safe CSS property functions only if you realize there are more functions that should be considered safe. </p>
<p> Each entry in this array variable must use the name of the safe function that the class should allow as its key. The entry values should be set to an empty array to allow for possible parameters in future versions of this class.</p>
<p><a href="#variables">Variables</a></p>
<h3><a name="variable_safe_url_schemes"></a><li><a name="5.2.27">safe_url_schemes</a></li></h3>
<h3>Type</h3>
<p><tt><i>array</i></tt></p>
<h3>Default value</h3>
<p><tt>array()</tt></p>
<h3>Purpose</h3>
<p>List of schemes that should be considered safe to allow in URLs.</p>
<h3>Usage</h3>
<p>Change the default list of safe URL schemes only if you realize there are more schemes that should be considered safe. </p>
<p> Each entry in this array variable must use the name of the scheme that the class should allow as its key. The entry values should be set to an empty array to allow for possible parameters in future versions of this class.</p>
<p><a href="#variables">Variables</a></p>
<h3><a name="variable_allow_server_side_includes"></a><li><a name="5.2.28">allow_server_side_includes</a></li></h3>
<h3>Type</h3>
<p><tt><i>bool</i></tt></p>
<h3>Default value</h3>
<p><tt>0</tt></p>
<h3>Purpose</h3>
<p>Tell the class whether it should parse HTML comments to determine whether they contain server side includes (SSI) that should be filtered.</p>
<h3>Usage</h3>
<p>Set this variable to 1 only if you need to retrieve all comments unfiltered, even if they may contain server side include commands.</p>
<p><a href="#variables">Variables</a></p>
<p><a href="#table_of_contents">Table of contents</a></p>
</ul>
</ul>
<hr />
<ul>
<h2><li><a name="functions"></a><a name="6.1.1">Functions</a></li></h2>
<ul>
<li><tt><a href="#function_SetInput">SetInput</a></tt></li><br />
<li><tt><a href="#function_GetPositionLine">GetPositionLine</a></tt></li><br />
<li><tt><a href="#function_FilterStylesheet">FilterStylesheet</a></tt></li><br />
<li><tt><a href="#function_GetStylesheetPositionLine">GetStylesheetPositionLine</a></tt></li><br />
<li><tt><a href="#function_StartParsing">StartParsing</a></tt></li><br />
<li><tt><a href="#function_Parse">Parse</a></tt></li><br />
<li><tt><a href="#function_FinishParsing">FinishParsing</a></tt></li><br />
<li><tt><a href="#function_RewriteElement">RewriteElement</a></tt></li><br />
<p><a href="#table_of_contents">Table of contents</a></p>
<h3><a name="function_SetInput"></a><li><a name="7.2.9">SetInput</a></li></h3>
<h3>Synopsis</h3>
<p><tt><i></i> SetInput(</tt><ul>
<tt>(input and output) <i>object</i> </tt><tt><a href="#argument_SetInput_input">input</a></tt></ul>
<tt>)</tt></p>
<h3>Purpose</h3>
<p>Set the object of the class that will be used to parse the HTML document before it is filtered by this class.</p>
<h3>Usage</h3>
<p>Use this function only if you need to override the HTML parsing class, which by default is the markup filter validator class.</p>
<h3>Arguments</h3>
<ul>
<p><tt><b><a name="argument_SetInput_input">input</a></b></tt> - Reference to the HTML parser input object.</p>
</ul>
<p><a href="#functions">Functions</a></p>
<h3><a name="function_GetPositionLine"></a><li><a name="9.2.10">GetPositionLine</a></li></h3>
<h3>Synopsis</h3>
<p><tt><i>bool</i> GetPositionLine(</tt><ul>
<tt><i>int</i> </tt><tt><a href="#argument_GetPositionLine_position">position</a></tt><tt>,</tt><br />
<tt>(output) <i>int &amp;</i> </tt><tt><a href="#argument_GetPositionLine_line">line</a></tt><tt>,</tt><br />
<tt>(output) <i>int &amp;</i> </tt><tt><a href="#argument_GetPositionLine_column">column</a></tt></ul>
<tt>)</tt></p>
<h3>Purpose</h3>
<p>Get the line number of the document that corresponds to a given position.</p>
<h3>Usage</h3>
<p>Pass the document offset number as the position to be located. Make sure the <tt><a href="#variable_track_lines">track_lines</a></tt> variable is set to 1 before parsing the document.</p>
<h3>Arguments</h3>
<ul>
<p><tt><b><a name="argument_GetPositionLine_position">position</a></b></tt> - Position of the line to be located.</p>
<p><tt><b><a name="argument_GetPositionLine_line">line</a></b></tt> - Returns the number of the line that corresponds to the given document position.</p>
<p><tt><b><a name="argument_GetPositionLine_column">column</a></b></tt> - Returns the number of the column of the line that corresponds to the given document position.</p>
</ul>
<h3>Return value</h3>
<p>This function returns 1 if the <tt><a href="#variable_track_lines">track_lines</a></tt> variable is set to 1 and it was given a valid positive position number that does not exceed the position of the last parsed document line.</p>
<p><a href="#functions">Functions</a></p>
<h3><a name="function_FilterStylesheet"></a><li><a name="11.2.11">FilterStylesheet</a></li></h3>
<h3>Synopsis</h3>
<p><tt><i>bool</i> FilterStylesheet(</tt><ul>
<tt><i>string</i> </tt><tt><a href="#argument_FilterStylesheet_stylesheet">stylesheet</a></tt><tt>,</tt><br />
<tt>(output) <i>string &amp;</i> </tt><tt><a href="#argument_FilterStylesheet_filtered">filtered</a></tt></ul>
<tt>)</tt></p>
<h3>Purpose</h3>
<p>Filter a CSS stylesheet to discard unsafe style definitions.</p>
<h3>Usage</h3>
<p>Pass a string with the text of the stylesheet to filter.</p>
<h3>Arguments</h3>
<ul>
<p><tt><b><a name="argument_FilterStylesheet_stylesheet">stylesheet</a></b></tt> - String of the stylesheet to parse.</p>
<p><tt><b><a name="argument_FilterStylesheet_filtered">filtered</a></b></tt> - Returns the filtered stylesheet without any unsafe CSS style definitions.</p>
</ul>
<h3>Return value</h3>
<p>This function returns 1 if the stylesheet string was parsed successfully.</p>
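<p> A call to this function might look like the sketch below. The helper <tt>filter_css</tt> is hypothetical, and <tt>$filter</tt> is assumed to be a filter object with <tt><a href="#variable_track_lines">track_lines</a></tt> set to 1. Whether the line lookup still succeeds while the error variable is set is an assumption, so the sketch falls back to reporting the raw position.</p>
<pre>
&lt;?php
// Hypothetical helper, not part of the class. Returns an array with the
// filtered stylesheet in 'css' and an error description in 'error'.
function filter_css($filter, $stylesheet)
{
    $filtered = '';
    if ($filter->FilterStylesheet($stylesheet, $filtered)) {
        return array('css' => $filtered, 'error' => '');
    }
    $message = $filter->error;
    $position = $filter->error_position;
    $line = $column = 0;
    // Assumption: the lookup may fail while the error variable is set.
    if ($filter->GetStylesheetPositionLine($position, $line, $column)) {
        $message .= ' (line ' . $line . ', column ' . $column . ')';
    } else {
        $message .= ' (position ' . $position . ')';
    }
    return array('css' => '', 'error' => $message);
}
</pre>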
<p><a href="#functions">Functions</a></p>
<h3><a name="function_GetStylesheetPositionLine"></a><li><a name="13.2.12">GetStylesheetPositionLine</a></li></h3>
<h3>Synopsis</h3>
<p><tt><i>bool</i> GetStylesheetPositionLine(</tt><ul>
<tt><i>int</i> </tt><tt><a href="#argument_GetStylesheetPositionLine_position">position</a></tt><tt>,</tt><br />
<tt>(output) <i>int &amp;</i> </tt><tt><a href="#argument_GetStylesheetPositionLine_line">line</a></tt><tt>,</tt><br />
<tt>(output) <i>int &amp;</i> </tt><tt><a href="#argument_GetStylesheetPositionLine_column">column</a></tt></ul>
<tt>)</tt></p>
<h3>Purpose</h3>
<p>Get the line number of a given position of the original stylesheet filtered with the <tt><a href="#function_FilterStylesheet">FilterStylesheet</a></tt> function.</p>
<h3>Usage</h3>
<p>Pass the stylesheet offset number as the position to be located. Make sure the <tt><a href="#variable_track_lines">track_lines</a></tt> variable is set to 1 before parsing the stylesheet.</p>
<h3>Arguments</h3>
<ul>
<p><tt><b><a name="argument_GetStylesheetPositionLine_position">position</a></b></tt> - Position of the line to be located.</p>
<p><tt><b><a name="argument_GetStylesheetPositionLine_line">line</a></b></tt> - Returns the number of the line that corresponds to the given stylesheet position.</p>
<p><tt><b><a name="argument_GetStylesheetPositionLine_column">column</a></b></tt> - Returns the number of the column of the line that corresponds to the given stylesheet position.</p>
</ul>
<h3>Return value</h3>
<p>This function returns 1 if the <tt><a href="#variable_track_lines">track_lines</a></tt> variable is set to 1 and it was given a valid positive position number that does not exceed the position of the last parsed stylesheet line.</p>
<p><a href="#functions">Functions</a></p>
<h3><a name="function_StartParsing"></a><li><a name="15.2.13">StartParsing</a></li></h3>
<h3>Synopsis</h3>
<p><tt><i>bool</i> StartParsing(</tt><ul>
<tt>(input and output) <i>array</i> </tt><tt><a href="#argument_StartParsing_parameters">parameters</a></tt></ul>
<tt>)</tt></p>
<h3>Purpose</h3>
<p>Initialize the state of the markup parser.</p>
<h3>Usage</h3>
<p>Call this function before starting to parse the markup document, passing the file name or the data to be parsed and, optionally, other parsing option parameters.</p>
<h3>Arguments</h3>
<ul>
<p><tt><b><a name="argument_StartParsing_parameters">parameters</a></b></tt> - Specifies a list of options that define how to parse the given document. Currently it has the following options: </p>
<p> <tt>Data</tt> - String with the markup data to be parsed </p>
<p> <tt>File</tt> - Name of the file from which the data to be parsed should be read instead of a static string. </p>
<p> <tt>OnlyBody</tt> - Determine whether the HTML document should be parsed just as the BODY section or as a complete HTML document. </p>
<p> <tt>DTDCachePath</tt> - Path of the directory where the cached DTD files will be stored to avoid the overhead of fetching the DTD files from the remote DTD sites every time an HTML document is parsed. If this parameter is missing, the DTD will not be cached. </p>
</ul>
<h3>Return value</h3>
<p>Returns 1 if all parameters are correctly defined.</p>
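<p> The options listed above are passed as entries of an associative array. The following sketch builds such an array for reading from a file. The helper name, the file name and the cache directory are hypothetical, and passing 1 as the value of <tt>OnlyBody</tt> is an assumption about the expected value form.</p>
<pre>
&lt;?php
// Hypothetical helper, not part of the class: build the parameters array
// for parsing a file as a BODY fragment with DTD caching enabled.
function file_parsing_parameters($file, $cache_directory)
{
    return array(
        'File' => $file,                    // read the markup from this file
        'OnlyBody' => 1,                    // value form is an assumption
        'DTDCachePath' => $cache_directory  // keep fetched DTD files here
    );
}

// Both paths are placeholders.
$parameters = file_parsing_parameters('comment.html', '/tmp/dtd-cache');
</pre>
<p> The resulting array would then be passed to <tt>StartParsing</tt>. It has to be stored in a variable first because the argument is documented as input and output.</p>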
<p><a href="#functions">Functions</a></p>
<h3><a name="function_Parse"></a><li><a name="17.2.14">Parse</a></li></h3>
<h3>Synopsis</h3>
<p><tt><i>bool</i> Parse(</tt><ul>
<tt>(output) <i>bool &amp;</i> </tt><tt><a href="#argument_Parse_end">end</a></tt><tt>,</tt><br />
<tt>(output) <i>array</i> </tt><tt><a href="#argument_Parse_elements">elements</a></tt></ul>
<tt>)</tt></p>
<h3>Purpose</h3>
<p>Parse the markup document.</p>
<h3>Usage</h3>
<p>Call this function iteratively until the <tt><a href="#argument_Parse_end">end</a></tt> argument is returned set to 1.</p>
<h3>Arguments</h3>
<ul>
<p><tt><b><a name="argument_Parse_end">end</a></b></tt> - Determine when the parser reached the end of the document.</p>
<p><tt><b><a name="argument_Parse_elements">elements</a></b></tt> - Return a sequence of associative arrays with entries that describe each document element that was parsed.</p>
</ul>
<h3>Return value</h3>
<p>Returns 1 if there were no fatal parsing errors.</p>
<p><a href="#functions">Functions</a></p>
<h3><a name="function_FinishParsing"></a><li><a name="19.2.15">FinishParsing</a></li></h3>
<h3>Synopsis</h3>
<p><tt><i>bool</i> FinishParsing(</tt><tt>)</tt></p>
<h3>Purpose</h3>
<p>Close any files and release any resources allocated while the document was being parsed.</p>
<h3>Usage</h3>
<p>Call this function after you are done with parsing the markup document.</p>
<h3>Return value</h3>
<p>Returns 1 if all resources were successfully released.</p>
<p><a href="#functions">Functions</a></p>
<h3><a name="function_RewriteElement"></a><li><a name="19.2.16">RewriteElement</a></li></h3>
<h3>Synopsis</h3>
<p><tt><i>bool</i> RewriteElement(</tt><ul>
<tt>(input and output) <i>array</i> </tt><tt><a href="#argument_RewriteElement_element">element</a></tt><tt>,</tt><br />
<tt>(output) <i>string &amp;</i> </tt><tt><a href="#argument_RewriteElement_markup">markup</a></tt></ul>
<tt>)</tt></p>
<h3>Purpose</h3>
<p>Generate a string for a previously parsed document markup element.</p>
<h3>Usage</h3>
<p>Call this function for each markup element when you want to regenerate an element that was just parsed and possibly filtered.</p>
<h3>Arguments</h3>
<ul>
<p><tt><b><a name="argument_RewriteElement_element">element</a></b></tt> - Associative array that defines the type and the values of the document element to be rewritten.</p>
<p><tt><b><a name="argument_RewriteElement_markup">markup</a></b></tt> - Return the string of the rewritten document element.</p>
</ul>
<h3>Return value</h3>
<p>Returns 0 if it is passed an invalid element definition.</p>
<p><a href="#functions">Functions</a></p>
<p><a href="#table_of_contents">Table of contents</a></p>
</ul>
</ul>

<hr />
<address>Manuel Lemos (<a href="mailto:mlemos-at-acm.org">mlemos-at-acm.org</a>)</address>
</body>
</html>