Robots_txt takes the URL of a page and retrieves the robots.txt file of the same site. It parses the file and looks up the rules defined in it to determine whether crawling the page is allowed.
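The lookup described above can be illustrated with a minimal sketch. It uses Python's standard `urllib.robotparser` module rather than the Robots_txt class itself, whose method names are not shown here. The helper names `robots_url_for` and `build_parser`, the user agent `MyCrawler`, and the sample rules are all assumptions made for illustration. The rules are parsed from an in-memory sample so that the example runs without network access.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url_for(page_url):
    """Return the robots.txt URL of the site that hosts page_url."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def build_parser(robots_lines):
    """Parse robots.txt content given as a list of lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser


# Hypothetical robots.txt content, used instead of a live download.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
""".splitlines()

parser = build_parser(SAMPLE_ROBOTS)

# Where a crawler would fetch the rules from for this page.
print(robots_url_for("https://example.com/docs/page.html"))

# Rule lookup: is the hypothetical agent "MyCrawler" allowed to fetch each page?
print(parser.can_fetch("MyCrawler", "https://example.com/docs/page.html"))
print(parser.can_fetch("MyCrawler", "https://example.com/private/x.html"))
```

Against a live site, a crawler would typically call `set_url()` with the result of `robots_url_for()` and then `read()` to download the file before calling `can_fetch()`.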
Robots_txt also stores the time at which a page is crawled, so that the next time a page of the same site is crawled it can check whether the request honors the site's intended crawl delay and request rate limits.
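The timing check can be sketched as follows. This is not the Robots_txt implementation: the class `CrawlTimer`, the function `effective_delay`, and the in-memory dictionary are hypothetical, and the sketch is written in Python for illustration. It assumes a request rate is expressed as a number of requests per number of seconds, and that the stricter of the two limits applies.

```python
import time
from urllib.parse import urlsplit


def effective_delay(crawl_delay=None, request_rate=None):
    """Combine a crawl delay (seconds) and a request rate given as
    (requests, seconds) into one minimum gap between requests."""
    delays = [0.0]
    if crawl_delay is not None:
        delays.append(float(crawl_delay))
    if request_rate is not None:
        requests, seconds = request_rate
        delays.append(seconds / requests)
    return max(delays)


class CrawlTimer:
    """Remember when each site was last crawled (in memory only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last_crawl = {}  # host -> time of the last recorded crawl

    def record(self, url):
        """Store the current time as the last crawl time of the URL's site."""
        self._last_crawl[urlsplit(url).netloc] = self._clock()

    def seconds_to_wait(self, url, delay):
        """Return how long to wait before crawling url without
        violating the given minimum delay for its site."""
        last = self._last_crawl.get(urlsplit(url).netloc)
        if last is None:
            return 0.0  # site not crawled yet, no wait needed
        elapsed = self._clock() - last
        return max(0.0, delay - elapsed)
```

A crawler would call `seconds_to_wait()` before each request, sleep for the returned time, fetch the page, and then call `record()`. Persisting the timestamps (for example in a file or database) would be needed if the limits must hold across separate runs.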
PHP 5.0 or higher