
New open source robots.txt projects

Last year we released the robots.txt parser and matcher that we use in our production systems to the open source world. Since then, we've seen people build new tools with it, contribute to the open source library (effectively improving our production systems, thanks!), and release ports to new languages such as Go and Rust, which make it easier for developers to build new tools.

With the intern season ending here at Google, we wanted to highlight two new releases related to robots.txt that were made possible by two interns working on the Search Open Sourcing team, Andreea Dutulescu and Ian Dolzhanskii.

Robots.txt Specification Test

First, we are releasing a testing framework for robots.txt parser developers, created by Andreea. The project provides a testing tool that can validate whether a robots.txt parser follows the Robots Exclusion Protocol, or to what extent. Currently there is no official and thorough way to assess the correctness of a parser, so Andreea built a tool that developers can use to build robots.txt parsers that follow the protocol.
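To make the idea concrete, here is a minimal sketch of what a conformance check can look like. It is not the released framework: the `Case` record, the `toy_is_allowed` parser, and the `run_cases` helper are all hypothetical names invented for this illustration, and the toy parser handles only plain path prefixes (no wildcards or `$` anchors). Each case pairs a robots.txt body, a user agent, and a path with the verdict the protocol's matching rules call for (the longest matching rule wins, and a group naming the crawler takes priority over the `*` group).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """One hypothetical conformance case: inputs plus the expected verdict."""
    robots_txt: str
    user_agent: str
    path: str
    expected_allowed: bool


def toy_is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Toy parser under test (an assumption for this sketch): prefix rules only."""
    groups = {}         # lowercased user agent -> list of (directive, prefix)
    group_agents = []   # user agents named by the group currently being read
    in_rules = False    # True once the current group has started listing rules
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                group_agents, in_rules = [], False
            group_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif key in ("allow", "disallow"):
            in_rules = True
            if value:  # an empty value restricts nothing
                for agent in group_agents:
                    groups[agent].append((key, value))
    # A group that names the crawler takes priority over the "*" group.
    rules = groups.get(user_agent.lower(), groups.get("*", []))
    matching = [(len(prefix), directive == "allow")
                for directive, prefix in rules if path.startswith(prefix)]
    if not matching:
        return True  # no rule matches, so crawling is allowed
    # Longest prefix wins; on a tie, True (allow) sorts above False.
    return max(matching)[1]


def run_cases(is_allowed, cases):
    """Return the cases where the parser's verdict differs from the expected one."""
    return [c for c in cases
            if is_allowed(c.robots_txt, c.user_agent, c.path) != c.expected_allowed]


ROBOTS = """\
User-agent: *
Disallow: /private/
Allow: /private/public/

User-agent: FooBot
Disallow: /
"""

CASES = [
    Case(ROBOTS, "BarBot", "/index.html", True),
    Case(ROBOTS, "BarBot", "/private/a", False),
    Case(ROBOTS, "BarBot", "/private/public/a", True),
    Case(ROBOTS, "FooBot", "/index.html", False),
    Case(ROBOTS, "FooBot", "/private/public/a", False),
]

if __name__ == "__main__":
    failures = run_cases(toy_is_allowed, CASES)
    print(f"{len(CASES) - len(failures)} of {len(CASES)} cases passed")
```

Because `run_cases` takes the parser as a plain function, the same cases can be pointed at any implementation, and the list of failing cases gives a rough sense of the extent to which a parser follows the protocol.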

Java robots.txt parser and matcher

Second, we are releasing an official Java port of the C++ robots.txt parser, created by Ian. Java is the third most popular programming language on GitHub and it's extensively used at Google as well, so it's no wonder it's been the most requested language port. The parser is a 1-to-1 translation of the C++ parser in terms of functions and behavior, and it's been thoroughly tested for parity against a large corpus of robots.txt rules. Teams are already planning to use the Java robots.txt parser in Google production systems, and we hope that you'll find it useful, too.

As usual, we welcome your contributions to these projects. If you built something with the C++ robots.txt parser or with these new releases, let us know so we can potentially help you spread the word! If you found a bug, help us fix it by opening an issue on GitHub or directly contributing with a pull request. If you have questions or comments about these projects, catch us on Twitter!

It was our genuine pleasure to host Andreea and Ian, and we're sad that their internship is ending. Their contributions help make the Internet a better place and we hope that we can welcome them back to Google in the future.

Original post: https://webmasters.googleblog.com/2020/09/new-open-source-robotstxt-projects.html?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+blogspot%2FamDG+%28Official+Google+Webmaster+Central+Blog%29
