In the Search preferences section, you will see three areas that need some advanced blogging knowledge to work with: meta tags, errors and redirections, and crawlers and indexing. In this post I'll try to give you an easy explanation of what a custom robots.txt is used for.
Robots.txt is a set of instructions that tells search engine robots which paths or content on your blog or site they may or may not crawl. It works like a filter. Every Blogspot site comes with a default robots.txt.
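If you want to see yours, go to your browser address bar and type your blog address followed by /robots.txt (for example, yourblog.blogspot.com/robots.txt, where yourblog stands for your own blog name).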
What Do These Lines Mean?
User-agent: Mediapartners-Google – This is the Google AdSense robot, which crawls your blog content. If you put AdSense on your blog, this robot helps the right ads show up on your pages.
Disallow: – This command tells a search engine robot not to visit certain pages, posts, or categories. There is no / after it here, which means the AdSense robot can crawl all of your content.
User-agent: * – This covers all search engine robots.
Disallow: /search – Robots have no permission to crawl anything under the /search folder, such as /search/label and /search?updated. Why? Because a label page is not a real post URL; it only lists posts that already exist elsewhere on your blog. Google wants people to find topics through the search box, not by clicking around labels or categories. Blocking these paths also helps avoid duplicate content.
Allow: / – All other pages can be crawled, except the paths listed under Disallow above.
Sitemap: – This is followed by your blog feed address.
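If you want to see how a robot would read these rules, here is a small Python sketch using the standard library's urllib.robotparser. The rules in it are my reconstruction of the default file from the explanations above, example.blogspot.com is only a placeholder, and the post and label paths are made up. I left the Sitemap line out because its address is different for every blog.

```python
from urllib.robotparser import RobotFileParser

# Assumed default Blogspot rules, rebuilt from the explanations in this post.
# The Sitemap line is omitted because its address differs per blog.
DEFAULT_RULES = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /
"""

# Placeholder blog address, not a real site.
BLOG = "https://example.blogspot.com"

parser = RobotFileParser()
parser.parse(DEFAULT_RULES.splitlines())


def allowed(agent, path):
    """Return True if the given robot may crawl the given path."""
    return parser.can_fetch(agent, BLOG + path)


# Hypothetical paths, used only to exercise each rule.
for agent, path in [
    ("Googlebot", "/search/label/News"),
    ("Googlebot", "/2020/01/my-post.html"),
    ("Mediapartners-Google", "/search/label/News"),
]:
    print(agent, path, "allowed" if allowed(agent, path) else "blocked")
```

With these rules, a general robot is kept out of label pages but can read normal posts, while the AdSense robot has its own section with an empty Disallow and can read everything.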
How to Block a Certain Page from Being Crawled
If you have been blogging for a long time, at some point you may want to block robots from crawling a certain page. Maybe it is private content that you only want to share with your co-workers. In the User-agent: * part, under the Disallow: /search line, you can add another Disallow line with the path of the page you want to block.
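Here is a rough sketch of what that could look like, checked again with Python's urllib.robotparser. The path /p/team-notes.html is a hypothetical page I made up for illustration, and example.blogspot.com is a placeholder, so swap in your own page path.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical page path, used only for illustration.
PRIVATE_PATH = "/p/team-notes.html"

# Assumed default rules plus one extra Disallow line for the private page.
CUSTOM_RULES = f"""\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Disallow: {PRIVATE_PATH}
Allow: /
"""

custom_parser = RobotFileParser()
custom_parser.parse(CUSTOM_RULES.splitlines())


def can_crawl(path, agent="Googlebot"):
    """Return True if the robot may crawl the path on the placeholder blog."""
    return custom_parser.can_fetch(agent, "https://example.blogspot.com" + path)


print("private page:", can_crawl(PRIVATE_PATH))
print("other page:", can_crawl("/p/about.html"))
```

I put the new Disallow line above Allow: / on purpose. Python's parser uses the first rule that matches, so this order keeps the result the same for robots that read rules top to bottom. Also note that the AdSense robot has its own section, so the new line does not apply to it.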
The image below shows an example of a search-friendly robots.txt for Blogspot. If you want to change yours to match it, go to Settings > Search preferences > Crawlers and indexing and enable Custom robots.txt. Don't forget to save your changes when you are done.