The `robots.txt` file defines which parts of your site crawlers may access. It is placed at the root of the website. For example, for the website www.google.com, the robots.txt file is located at www.google.com/robots.txt. `robots.txt` is a plain text file that follows the Robots Exclusion Standard. It consists of one or more rules; each rule blocks or allows access for a given crawler to a specified file path on that website.
Example `robots.txt` file

    User-agent: Googlebot
    Disallow: /nogooglebot/

    User-agent: AdsBot-Google-Mobile
    Disallow: /desktop/

    User-agent: *
    Allow: /

Here’s what that robots.txt file means:
- Googlebot is not allowed to crawl any URL that starts with `/nogooglebot/`.
- AdsBot-Google-Mobile cannot crawl any URL that starts with `/desktop/`.
- All other user agents are allowed to crawl the entire site.
- The default behaviour is that user agents are allowed to crawl the entire site.
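
The interpretation above can be checked programmatically. The following is a minimal sketch using Python's standard-library `urllib.robotparser`, assuming the example file reads as reproduced in `ROBOTS_TXT`. The `can_crawl` helper, the bot name `SomeOtherBot`, and the sample paths are hypothetical names chosen for illustration, and Python's parser may not match every detail of how Google's own crawlers evaluate rules.

```python
from urllib.robotparser import RobotFileParser

# The example file from this post, reproduced as a string (assumed content).
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /nogooglebot/

User-agent: AdsBot-Google-Mobile
Disallow: /desktop/

User-agent: *
Allow: /
"""


def can_crawl(user_agent, path):
    """Return True if user_agent may fetch path under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


print(can_crawl("Googlebot", "/nogooglebot/page.html"))         # False
print(can_crawl("Googlebot", "/index.html"))                    # True
print(can_crawl("AdsBot-Google-Mobile", "/desktop/home.html"))  # False
print(can_crawl("SomeOtherBot", "/nogooglebot/page.html"))      # True
```

Parsing from a string keeps the sketch offline; for a live site, `RobotFileParser` also offers `set_url()` and `read()` to download the file first.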
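More examples:

    # Example 1: Block only Googlebot
    User-agent: Googlebot
    Disallow: /

    # Example 2: Block Googlebot and AdsBot
    User-agent: Googlebot
    User-agent: AdsBot-Google
    Disallow: /

    # Example 3: Block all crawlers except AdsBot (AdsBot crawlers must be named explicitly)
    User-agent: *
    Disallow: /
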
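
As the second example suggests, one group can name several crawlers that share the same rules. A rough sketch of how such groups might be generated is given here; `render_robots_txt` and its tuple layout are hypothetical, invented for this illustration, and the user-agent tokens are assumed to match the names the crawlers actually announce.

```python
def render_robots_txt(groups):
    """Render (comment, user_agents, disallowed_paths) tuples as robots.txt text."""
    blocks = []
    for comment, user_agents, disallowed_paths in groups:
        lines = [f"# {comment}"]
        # Several User-agent lines in a row share the rules that follow them.
        lines += [f"User-agent: {agent}" for agent in user_agents]
        lines += [f"Disallow: {path}" for path in disallowed_paths]
        blocks.append("\n".join(lines))
    # A blank line separates one group from the next.
    return "\n\n".join(blocks) + "\n"


print(render_robots_txt([
    ("Block Googlebot and AdsBot", ["Googlebot", "AdsBot-Google"], ["/"]),
]))
```

Keeping the groups as data makes it harder to forget the blank line between groups or to attach a rule to the wrong crawler.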
Important points to remember:
- Your site can have only one robots.txt file.
- The file must be named robots.txt.
- A robots.txt file must be a UTF-8 encoded text file.
- The `#` character marks the beginning of a comment.
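
The points about the file name, the encoding, and comments can be sketched in a few lines of Python. The helpers `write_robots_txt` and `strip_comments` are hypothetical names for this illustration, and a temporary directory stands in for the real site root.

```python
import tempfile
from pathlib import Path

ROBOTS_FILENAME = "robots.txt"  # the required file name


def write_robots_txt(site_root, content):
    """Write content to <site_root>/robots.txt as UTF-8 and return the path."""
    target = Path(site_root) / ROBOTS_FILENAME
    target.write_text(content, encoding="utf-8")
    return target


def strip_comments(content):
    """Return the non-empty lines of content with '#' comments removed."""
    rules = []
    for line in content.splitlines():
        rule = line.split("#", 1)[0].strip()  # drop everything after '#'
        if rule:
            rules.append(rule)
    return rules


# A temporary directory is used here in place of the website root.
with tempfile.TemporaryDirectory() as site_root:
    path = write_robots_txt(
        site_root, "# Block everything\nUser-agent: *\nDisallow: /  # whole site\n"
    )
    print(path.name)                                         # robots.txt
    print(strip_comments(path.read_text(encoding="utf-8")))  # ['User-agent: *', 'Disallow: /']
```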