Google has released its robots.txt parsing and matching library in an attempt to encourage web developers to adopt a consistent way of telling web crawlers which pages they may access.
In case you are wondering, the C++ library powers Googlebot's handling of the Robots Exclusion Protocol (REP). The protocol lets webmasters specify, via a robots.txt file, which parts of their sites crawlers such as Googlebot may or may not visit, and therefore what ends up in Google's search index. Although the protocol has existed for over 25 years, it never became an official standard.
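To illustrate how such rules are read, here is a minimal sketch using Python's standard-library `urllib.robotparser` rather than Google's C++ library (whose API the article does not describe); the robots.txt content and URLs are hypothetical examples.

```python
from urllib.robotparser import RobotFileParser

# A minimal hypothetical robots.txt: block all crawlers from /private/,
# leave everything else crawlable.
rules = """
User-agent: *
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# A crawler checks each URL against the rules before fetching it.
print(parser.can_fetch("Googlebot", "https://example.com/private/page"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/index.html"))    # True
```

The ambiguity Google points to arises in exactly this kind of matching logic: different parsers have historically disagreed on corner cases such as overlapping rules or unusual characters in paths.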
Google’s latest release is an attempt to achieve that. The company wrote, “REP was never turned into an official Internet standard, which means that developers have interpreted the protocol somewhat differently over the years. And since its inception, the REP hasn’t been updated to cover today’s corner cases. This is a challenging problem for website owners because the ambiguous de-facto standard made it difficult to write the rules correctly.”
In the blog post, the company published some of the rules from a draft proposal describing how the protocol should be used. To produce it, the company worked with the protocol's original author, other search engines, and webmasters before submitting the draft to the Internet Engineering Task Force (IETF).
While the protocol helps webmasters control access to content on their websites, standardizing it also means internet users should see more consistent results across different search engines. However, open sourcing the parser does not by itself guarantee that the protocol will become an official standard.