Name of RSS robot: rss_pundit
Removing Documents From the RSS Snapshots
rss_pundit does not want to offer access to RSS feeds whose authors do not want their material in the collection.
To remove your RSS feeds from rss_pundit, place a robots.txt file at the top level of your site (e.g. www.yourdomain.com/robots.txt) and then submit your site below.
The robots.txt file will do two things:
It will remove all of your domain's documents from rss_pundit.
It will tell us not to crawl your RSS in the future.
To exclude the rss_pundit crawler (and remove your documents from rss_pundit) while still allowing all other robots to crawl your site, your robots.txt file should contain:
User-agent: rss_pundit
Disallow: /
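To check what these two lines do before uploading them, you can feed the text to a robots.txt parser. The sketch below uses Python's standard-library urllib.robotparser, which follows the common robots.txt rules; it is an illustration only, and rss_pundit's own parser may differ in details. The helper name may_fetch and the feed URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The exclusion described above: a record that names rss_pundit and disallows everything.
ROBOTS_TXT = """\
User-agent: rss_pundit
Disallow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Hypothetical helper: True if user_agent may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse the text directly, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    feed = "http://www.yourdomain.com/feed.xml"  # hypothetical feed URL
    print(may_fetch("rss_pundit", feed))  # rss_pundit is excluded
    print(may_fetch("Googlebot", feed))   # no record applies, so other robots are allowed
```

Because the record names only rss_pundit, robots that match no record fall through to the default, which is to allow access.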
Robots.txt is the most widely used method for controlling how automated robots behave on your site (all major robots, including those of Google, AltaVista, etc., respect these exclusions).
It can block access to the whole domain or to any file or directory within it.
A large number of resources for webmasters and site owners describe this method and how to use it. Here are some:
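To exclude only part of a site, list individual paths instead of "/". The sketch below assumes a hypothetical /private/ directory and /drafts.xml feed and checks the result with Python's urllib.robotparser (an illustration, not rss_pundit's own parser). Disallow values are prefix matches, so "/drafts.xml" would also cover a path such as "/drafts.xml.bak".

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt hiding one directory and one file from rss_pundit only.
PARTIAL_ROBOTS_TXT = """\
User-agent: rss_pundit
Disallow: /private/
Disallow: /drafts.xml
"""

parser = RobotFileParser()
parser.parse(PARTIAL_ROBOTS_TXT.splitlines())

base = "http://www.yourdomain.com"  # hypothetical site
for path in ["/private/feed.xml", "/drafts.xml", "/public/feed.xml"]:
    # Blocked paths print False; everything else prints True.
    print(path, parser.can_fetch("rss_pundit", base + path))
```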
http://www.robotstxt.org/
http://pageresource.com/zine/robotstxt.htm
The robots.txt file must be placed at the root of your domain (www.yourdomain.com/robots.txt).
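Robots look for robots.txt only at the root of the host, whatever the path of the feed itself; a copy placed in a subdirectory such as /blog/robots.txt is ignored. The hypothetical helper below sketches how a crawler derives that location from a feed URL using Python's urllib.parse.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(feed_url: str) -> str:
    """Hypothetical helper: the robots.txt URL that governs feed_url.

    Keeps the scheme and host (including any port) and replaces the path,
    query and fragment with the fixed root path /robots.txt.
    """
    parts = urlsplit(feed_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url_for("http://www.yourdomain.com/blog/rss/feed.xml?x=1"))
```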
If you cannot put up a robots.txt file, read our exclusion policy.