
New open source robots.txt projects

Last year we open-sourced the robots.txt parser and matcher that we use in our production systems. Since then, we've seen people build new tools with it, contribute to the open source library (effectively improving our production systems; thanks!), and release ports to other languages such as Go and Rust, which make it easier for developers to build new tools.

With the intern season ending here at Google, we wanted to highlight two new releases related to robots.txt that were made possible by two interns working on the Search Open Sourcing team, Andreea Dutulescu and Ian Dolzhanskii.

Robots.txt Specification Test

First, we are releasing a testing framework for robots.txt parser developers, created by Andreea. The project provides a testing tool that can validate whether, and to what extent, a robots.txt parser follows the Robots Exclusion Protocol. Currently there is no official, thorough way to assess the correctness of a parser, so Andreea built a tool that developers can use to build robots.txt parsers that follow the protocol.
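To make the idea of conformance testing concrete, here is a minimal, hypothetical sketch in Java: a table of (user agent, path, expected verdict) cases run against a deliberately tiny matcher. It is not the API of Google's testing framework or of the Java port; the class `RobotsSpecSketch` and the method `isAllowed` are invented for illustration. The toy matcher assumes only a few protocol rules: user-agent names match case-insensitively, a group naming the crawler takes precedence over `*` groups, the longest matching rule wins, and `Allow` wins a tie. It ignores `*` and `$` wildcards, percent-encoding, and other details a real parser must handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: a toy robots.txt matcher plus spec-style test cases. */
public class RobotsSpecSketch {

    /** Returns true if userAgent may fetch path under robotsTxt (toy rules only). */
    public static boolean isAllowed(String robotsTxt, String userAgent, String path) {
        String agent = userAgent.toLowerCase(Locale.ROOT);
        List<String[]> specific = new ArrayList<>(); // rules from groups naming this agent
        List<String[]> wildcard = new ArrayList<>(); // rules from "*" groups
        boolean inSpecific = false, inWildcard = false, lastWasAgent = false;
        boolean sawSpecificGroup = false;

        for (String rawLine : robotsTxt.split("\n")) {
            String line = rawLine.split("#", 2)[0].trim(); // drop comments and whitespace
            int colon = line.indexOf(':');
            if (colon < 0) continue; // blank or malformed line
            String key = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();

            if (key.equals("user-agent")) {
                // A user-agent line that follows rules starts a new group.
                if (!lastWasAgent) { inSpecific = false; inWildcard = false; }
                String v = value.toLowerCase(Locale.ROOT);
                if (v.equals("*")) inWildcard = true;
                else if (v.equals(agent)) { inSpecific = true; sawSpecificGroup = true; }
                lastWasAgent = true;
            } else if (key.equals("allow") || key.equals("disallow")) {
                lastWasAgent = false;
                if (value.isEmpty()) continue; // an empty rule matches nothing
                String[] rule = {key, value};
                if (inSpecific) specific.add(rule);
                if (inWildcard) wildcard.add(rule);
            }
        }

        // A group naming the agent wins over "*", even if that group has no rules.
        List<String[]> rules = sawSpecificGroup ? specific : wildcard;
        int bestLen = -1;
        boolean allowed = true; // no matching rule means the path is allowed
        for (String[] rule : rules) {
            String pattern = rule[1];
            if (!path.startsWith(pattern)) continue; // toy: plain prefix match only
            boolean isAllow = rule[0].equals("allow");
            // Longest match wins; on equal length, Allow wins.
            if (pattern.length() > bestLen || (pattern.length() == bestLen && isAllow)) {
                bestLen = pattern.length();
                allowed = isAllow;
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        String robots = String.join("\n",
                "User-agent: *",
                "Disallow: /private",
                "Allow: /private/public",
                "",
                "User-agent: FooBot",
                "Disallow: /");
        Object[][] cases = {
                {"BarBot", "/index.html", true},
                {"BarBot", "/private/data", false},
                {"BarBot", "/private/public/page", true}, // longer Allow beats Disallow
                {"FooBot", "/index.html", false},         // FooBot's own group applies
                {"foobot", "/index.html", false},         // agent match ignores case
        };
        int failures = 0;
        for (Object[] c : cases) {
            boolean got = isAllowed(robots, (String) c[0], (String) c[1]);
            boolean ok = got == (Boolean) c[2];
            if (!ok) failures++;
            System.out.printf("%-4s %-7s %-21s -> %s%n", ok ? "PASS" : "FAIL", c[0], c[1], got);
        }
        System.out.println(failures == 0 ? "all cases passed" : failures + " case(s) failed");
    }
}
```

A real conformance suite works the same way at a much larger scale: each case pins down one rule of the protocol, so a failing case points directly at the behavior a parser gets wrong.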

Java robots.txt parser and matcher

Second, we are releasing an official Java port of the C++ robots.txt parser, created by Ian. Java is the third most popular programming language on GitHub and it's used extensively at Google as well, so it's no surprise that it has been the most requested language port. The parser is a one-to-one translation of the C++ parser in terms of functions and behavior, and it has been thoroughly tested for parity against a large corpus of robots.txt rules. Teams are already planning to use the Java robots.txt parser in Google production systems, and we hope that you'll find it useful, too.

As usual, we welcome your contributions to these projects. If you built something with the C++ robots.txt parser or with these new releases, let us know so we can potentially help you spread the word! If you found a bug, help us fix it by opening an issue on GitHub or directly contributing with a pull request. If you have questions or comments about these projects, catch us on Twitter!

It was our genuine pleasure to host Andreea and Ian, and we're sad that their internship is ending. Their contributions help make the Internet a better place and we hope that we can welcome them back to Google in the future.

Original post: https://webmasters.googleblog.com/2020/09/new-open-source-robotstxt-projects.html?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+blogspot%2FamDG+%28Official+Google+Webmaster+Central+Blog%29
