<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Method Details</title>
<link rel="stylesheet" type="text/css" media="screen" href="style.css">
<script type="text/javascript">
function show_hide_examples(mode)
{
if (document.getElementById("examples").style.display == "none")
{
document.getElementById("examples").style.display = "";
}
else
{
document.getElementById("examples").style.display = "none";
}
}
</script>
</head>
<body>
<div id="outer">
<h1 id="head">
<span>Method:
PHPCrawler::addContentTypeReceiveRule()</span>
</h1>
<h2 id="head">
<span><a href="overview.html">&lt;&lt; Back to class-overview</a></span>
</h2>
<br>
<div id="section">
Adds a rule to the list of rules that decide, based on their content-type, which pages or files should be received.
</div>
<div id="section">
<b>Signature:</b>
<p id="signature">
public addContentTypeReceiveRule($regex)
</p>
</div>
<div id="section">
<b>Parameters:</b>
<p>
<table id="param_list">
<tr><td id="paramname" width="1%"><b>$regex</b> </td><td width="1%"><i>string</i> </td><td width="*">The rule as a regular expression, written with delimiters (e.g. "#text/html#")</td></tr>
</table>
</p>
</div>
<div id="section">
<b>Returns:</b>
<p>
<table id="param_list">
<tr> <td width="1%"><i>bool</i> </td> <td width="*"> TRUE if the rule was added to the list.<br> FALSE if the given regex is not valid.</td></tr>
</table>
</p>
</div>
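<div id="section">
<b>Example (validity check, sketch):</b>
<p>
The return value distinguishes a valid regular expression from an invalid one. The sketch below shows one way such a check can be done in plain PHP. The helper <code>isValidRegex()</code> is hypothetical and is not part of PHPCrawl; whether the crawler validates patterns in exactly this way is an assumption.
</p>
<pre><code>&lt;?php
// Hypothetical helper, not part of PHPCrawl: reports whether a pattern compiles.
function isValidRegex(string $regex): bool
{
    // preg_match() returns false on a pattern error; @ suppresses the warning.
    return @preg_match($regex, '') !== false;
}

var_dump(isValidRegex('#text/html#')); // bool(true)
var_dump(isValidRegex('text/html'));   // bool(false): the delimiters are missing
</code></pre>
<p>
In practice this suggests checking the return value of addContentTypeReceiveRule() when a pattern comes from configuration or user input, since a rejected rule is not added to the list.
</p>
</div>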
<div id="section">
<b>Description:</b>
<p>
After receiving the HTTP header of a followed URL, the crawler checks, based on the given rules, whether the content of that URL should be received.<br>
If rules were added and none of them matches the content-type of the document, the content won't be received.<br><br>
Example:<code>$crawler->addContentTypeReceiveRule("#text/html#");<br>$crawler->addContentTypeReceiveRule("#text/css#");</code><br>
These rules let the crawler receive the content/source of pages with the Content-Type "text/html" or "text/css".<br>
Pages or files with other content-types (e.g. "image/gif") won't be received, as long as these are the only rules added to the list.<br><br>
<b>IMPORTANT:</b> By default, if no rule was added to the list, the crawler receives all content regardless of its content-type.<br><br>
Note: To reduce the traffic the crawler causes, you should only add the content-types of pages/files you really want to receive.<br>
You should, however, always add at least the content-type "text/html" to this list; otherwise the crawler can't find any links.
</p>
</div>
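<div id="section">
<b>Example (rule matching, sketch):</b>
<p>
The decision described above can be illustrated in plain PHP without the crawler itself. The function <code>shouldReceive()</code> is a hypothetical stand-in and not PHPCrawl's actual code; it assumes that each rule is tested with preg_match() against the value of the Content-Type header.
</p>
<pre><code>&lt;?php
// Hypothetical stand-in for the crawler's decision, not PHPCrawl's actual code.
function shouldReceive(array $rules, string $contentType): bool
{
    // No rules added: everything is received (the documented default).
    if (count($rules) === 0) {
        return true;
    }
    // Receive as soon as one rule matches the Content-Type header value.
    foreach ($rules as $regex) {
        if (preg_match($regex, $contentType) === 1) {
            return true;
        }
    }
    return false;
}

$rules = ['#text/html#', '#text/css#'];
var_dump(shouldReceive($rules, 'text/html; charset=utf-8')); // bool(true)
var_dump(shouldReceive($rules, 'image/gif'));                // bool(false)
var_dump(shouldReceive([], 'image/gif'));                    // bool(true)
</code></pre>
<p>
Because the patterns are not anchored, "#text/html#" also matches header values that carry a charset suffix, such as "text/html; charset=utf-8".
</p>
</div>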
</div>
<div id="footer">Docs created with <a href="http://phpclassview.cuab.de" target="_parent">PhpClassView</a></div>
</body>
</html>