<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html>
<head>
 <title>Documentation for method: 
PHPCrawler::handleDocumentInfo()</title>
 <meta name="keywords" content="framework, API, manual, class reference, classreference, documentation" />
 <meta name="description" content="The class reference contains the detailed description of how to use every class, method, and property." />
 <link rel="stylesheet" type="text/css" media="screen" href="style.css">
 
 <script type="text/javascript">
 
 function show_hide_examples(mode)
 {
   if (document.getElementById("examples").style.display == "none")
   {
     document.getElementById("examples").style.display = "";
   }
   else
   {
     document.getElementById("examples").style.display = "none";
   }
 }
 </script>
 
</head>

<body>

<div id="outer">

<h1 id="head">
  <span>Method: 
PHPCrawler::handleDocumentInfo()</span>
</h1>

<h2 id="head">
 <span><a href="overview.html">&lt;&lt; Back to class overview</a></span>
</h2>

<br>






<div id="docframe">

<div id="section">

Override this method to access all information about each page or file the crawler finds and receives.
</div>

<div id="section">
<b>Signature:</b>
<p id="signature">
  
public handleDocumentInfo(PHPCrawlerDocumentInfo $PageInfo)
</p>
</div>

<div id="section">
<b>Parameters:</b>
<p>
<table id="param_list">
  
<tr><td id="paramname" width="1%"><b>$PageInfo</b>&nbsp;</td><td width="1%"><i><a href="../PHPCrawlerDocumentInfo/overview.html" class="inline">PHPCrawlerDocumentInfo</a></i>&nbsp;</td><td width="*">A PHPCrawlerDocumentInfo object containing all information about the currently received document.<br>See the reference of the <a href="../PHPCrawlerDocumentInfo/overview.html" class="inline">PHPCrawlerDocumentInfo</a> class for details.</td></tr>
</table>
</p>
</div>

<div id="section">
<b>Returns:</b>
<p>
<table id="param_list">
  
<tr> <td width="1%"><i>int</i>&nbsp;</td> <td width="*">The crawling process stops immediately if this method returns any negative value.</td></tr>
</table>
</p>
</div>
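
<div id="section">
<b>Example (sketch):</b>
<p>
A minimal sketch of the return-value contract described above: the handler records each received URL and returns -1 once a limit is reached, which, per the documented behavior, stops the crawl. So that the sketch runs on its own, the PHPCrawler and PHPCrawlerDocumentInfo classes below are hypothetical stand-ins, and simulate() is an invented driver that is not part of PHPCrawl. With the real library you would include PHPCrawl instead, delete both stubs, and start the crawl with go() as usual. The $max_documents and $visited properties are likewise illustrative names, not library properties.
</p>

```php
<?php
// HYPOTHETICAL stand-in for PHPCrawl's PHPCrawlerDocumentInfo so this sketch
// runs without the library. The real class has many more properties.
class PHPCrawlerDocumentInfo
{
    public $url;
    public $http_status_code;

    public function __construct($url, $http_status_code)
    {
        $this->url = $url;
        $this->http_status_code = $http_status_code;
    }
}

// HYPOTHETICAL stand-in for PHPCrawler. simulate() is NOT a PHPCrawl method;
// it only mimics the documented rule that a negative return value from
// handleDocumentInfo() stops the crawl immediately.
class PHPCrawler
{
    public function handleDocumentInfo(PHPCrawlerDocumentInfo $PageInfo)
    {
        return 0;
    }

    public function simulate(array $documents)
    {
        $processed = 0;
        foreach ($documents as $PageInfo) {
            $processed++;
            if ($this->handleDocumentInfo($PageInfo) < 0) {
                break; // negative value: stop at once
            }
        }
        return $processed;
    }
}

// The part you would write against the real library:
// stop the crawl after a fixed number of received documents.
class MyCrawler extends PHPCrawler
{
    public $max_documents = 3;   // hypothetical limit
    public $visited = array();   // URLs received so far

    public function handleDocumentInfo(PHPCrawlerDocumentInfo $PageInfo)
    {
        $this->visited[] = $PageInfo->url;

        // Any negative return value tells the crawler to stop.
        if (count($this->visited) >= $this->max_documents) {
            return -1;
        }
        return 0;
    }
}

// Feed ten fake documents through the stub driver.
$crawler = new MyCrawler();
$docs = array();
for ($i = 1; $i <= 10; $i++) {
    $docs[] = new PHPCrawlerDocumentInfo("http://www.example.com/page$i.html", 200);
}
$processed = $crawler->simulate($docs);
echo "Documents processed: $processed\n";
```

<p>
Here the crawl ends after the third document, because that call returns -1. Keeping the limit check inside the handler lets the stop decision depend on any property of the received document, for example its HTTP status code.
</p>
</div>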

<div id="section">
<b>Description:</b>
<p>

  
Every time the crawler finds and receives a document, it calls this method and passes all information about that page or file in a PHPCrawlerDocumentInfo object.<br><br>See the <a href="../PHPCrawlerDocumentInfo/overview.html" class="inline">PHPCrawlerDocumentInfo</a> documentation for a list of all properties that describe the received document.<br><br>Example:<br><code>class MyCrawler extends PHPCrawler<br>{<br>&nbsp; public function handleDocumentInfo(PHPCrawlerDocumentInfo $PageInfo)<br>&nbsp; {<br>&nbsp; &nbsp; // Print the URL of the document<br>&nbsp; &nbsp; echo "URL: ".$PageInfo-&gt;url."&lt;br /&gt;";<br><br>&nbsp; &nbsp; // Print the HTTP status code<br>&nbsp; &nbsp; echo "HTTP status code: ".$PageInfo-&gt;http_status_code."&lt;br /&gt;";<br><br>&nbsp; &nbsp; // Print the number of links found in this document<br>&nbsp; &nbsp; echo "Links found: ".count($PageInfo-&gt;links_found_url_descriptors)."&lt;br /&gt;";<br><br>&nbsp; &nbsp; // ...<br>&nbsp; }<br>}</code>
  
</p>
</div>





</div>


<div id="footer">Docs created with <a href="http://phpclassview.cuab.de"  target="_parent">PhpClassView</a></div>

</div>

</body>
</html>