I’ve seen a lot of beginners get confused about using robots.txt. Understanding why a site needs a robots.txt file is basic but important. So, do you want to learn about it and how to optimize it so that search bots can reach the important pages of your website? The first question everyone should be able to answer: what is robots.txt, and how do you use a robots.txt file for a site or blog?
Read on. This post covers all the important aspects of the robots.txt file that you need to know.
What is robots.txt in SEO?
It is a text file that tells search bots how to crawl the pages of a site, and it resides in the root folder of a domain. For example: seotutorialpoint.com/robots.txt
Now, the question may arise in your mind: how do you use a robots.txt file, and how do you optimize it?
Let’s start. As we know, all search engines use bots to crawl and index the pages of a website. When a search bot like Googlebot comes to your site, it follows all available links to crawl and index your site.
As site owners, we don’t want to block major bots from indexing the pages of our site. Two files matter here: Sitemap.xml and Robots.txt. They tell Google or MSN bot which pages you want them to access and index. This is critical for a site's search visibility, so make sure to upload both files to the root folder of your website.
When a bot visits your website, it crawls the pages of your site. Because search engines have limited resources to spend on each site, they can’t crawl every page, and some pages may end up not indexed. You can stop bots from crawling inessential pages like admin login, terms & conditions, disclaimer, etc., so that search engines give more attention to the main pages of your site, which results in deeper crawling of those pages.
Key Points about the Robots.txt File
- It does not control indexing (it is not a noindex or index directive)
- It tells search bots not to crawl specific web pages
How to check your Robots.txt file?
As we already discussed, the robots.txt file resides in the root of your web domain. You can check it by visiting a URL like www.seotutorialpoint.com/robots.txt. Another option is to check your domain's robots file with Google Webmaster Tools by going to GWT > Crawl > robots.txt Tester.
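If you want to work out where a site's robots.txt lives without typing the address by hand, a small helper can do it. This is only a sketch: the function name `robots_txt_url` is made up for this example, and it assumes https when you pass a bare domain.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(site: str) -> str:
    """Return the robots.txt URL for a site given as a bare domain or a full URL."""
    # Assumption for this sketch: use https when no scheme is given.
    if "://" not in site:
        site = "https://" + site
    parts = urlsplit(site)
    # robots.txt sits at the root of the host, whatever page was passed in.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("www.seotutorialpoint.com"))
print(robots_txt_url("https://www.seotutorialpoint.com/blog/some-post"))
```

Both calls print the same root-level address, which you can then open in a browser.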
The basic structure of a robots.txt file that avoids duplicate content should be something like this:
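Here is a rough sketch of such a file, held in the `ROBOTS_TXT` string. The paths are assumptions based on a typical WordPress layout and are not taken from any real site, so adjust them to your own URLs. The Python around the rules is just a quick way to confirm what they block, using the standard library's `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only: these WordPress-style paths are assumptions.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /trackback/
Disallow: /comments/feed/
Disallow: /page/
Disallow: /feed/
Disallow: /comments/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(path: str) -> bool:
    """True if a bot such as Googlebot may crawl the path under ROBOTS_TXT."""
    return parser.can_fetch("Googlebot", path)


# The post URL below is a made-up example of a normal content page.
for path in ("/wp-admin/options.php", "/feed/", "/what-is-robots-txt/"):
    print(path, "crawlable" if is_crawlable(path) else "blocked")
```

The admin page and the feed come back as blocked, while the ordinary post stays crawlable.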
This will prevent robots from crawling the admin folder of your site, along with trackbacks, comment feeds, pages, feeds, and comments. Always remember that this file only stops crawling; it doesn’t block indexing. To keep a page out of its index, Google relies on the noindex tag.
If your website runs on the WordPress CMS, you can use the WP SEO by Yoast plugin to add noindex to any individual post or section of your blog. For effective SEO of your domain, website, or blog, I suggest you keep your category and tag pages as noindex but dofollow.
Let’s learn the Robots.txt Format
If you want to allow crawling of everything:
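A minimal sketch of the allow-everything case: the two rule lines sit in the `ALLOW_ALL` string, and the rest is only a check with Python's standard `urllib.robotparser`. The page path being tested is a made-up example.

```python
from urllib.robotparser import RobotFileParser

# An empty Disallow value blocks nothing, so every bot may crawl every page.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

allow_all = RobotFileParser()
allow_all.parse(ALLOW_ALL.splitlines())

# Hypothetical paths, used only to show that nothing is blocked.
print(allow_all.can_fetch("Googlebot", "/"))
print(allow_all.can_fetch("Googlebot", "/any/page.html"))
```

Both checks print True, because nothing is disallowed.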
If you want to disallow crawling of a specific folder:
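A sketch of blocking one folder. The folder name `/private-folder/` is hypothetical, so swap in the folder you actually want bots to skip. As before, the Python only verifies the rules held in the string.

```python
from urllib.robotparser import RobotFileParser

# "/private-folder/" is a placeholder name chosen for this example.
BLOCK_FOLDER = """\
User-agent: *
Disallow: /private-folder/
"""

block_folder = RobotFileParser()
block_folder.parse(BLOCK_FOLDER.splitlines())

# Anything under the folder is blocked; the rest of the site is untouched.
print(block_folder.can_fetch("Googlebot", "/private-folder/file.html"))
print(block_folder.can_fetch("Googlebot", "/blog/"))
```

The first check prints False and the second prints True. The rule matches by path prefix, so it covers every URL that starts with the folder name.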
If you want to disallow crawling of everything:
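A sketch of the block-everything case. A single slash after Disallow covers the whole site, so use it with care. The paths in the check are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# "Disallow: /" matches every path, so no page may be crawled.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

block_all = RobotFileParser()
block_all.parse(BLOCK_ALL.splitlines())

# Hypothetical paths, used only to show that everything is blocked.
print(block_all.can_fetch("Googlebot", "/"))
print(block_all.can_fetch("Googlebot", "/any/page.html"))
```

Both checks print False. Keep in mind that this only stops crawling; as noted earlier, it does not by itself remove pages from a search index.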