
New open source robots.txt projects

Last year we released the robots.txt parser and matcher that we use in our production systems to the open source world. Since then, we've seen people build new tools with it, contribute to the open source library (effectively improving our production systems, thanks!), and release ports to other languages such as Go and Rust, which make it easier for developers to build new tools.

With the intern season ending here at Google, we wanted to highlight two new releases related to robots.txt that were made possible by two interns working on the Search Open Sourcing team, Andreea Dutulescu and Ian Dolzhanskii.

Robots.txt Specification Test

First, we are releasing a testing framework for robots.txt parser developers, created by Andreea. The project provides a testing tool that can validate whether a robots.txt parser follows the Robots Exclusion Protocol, or to what extent. Currently there is no official and thorough way to assess the correctness of a parser, so Andreea built a tool that developers can use to build robots.txt parsers that follow the protocol.
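To make the idea of a compliance check concrete, here is a minimal sketch in Python. It is not the released framework and does not reflect its design: the names `toy_is_allowed`, `CASES`, and `run_compliance` are hypothetical, and the toy matcher assumes a single `User-agent: *` group with plain prefix rules (no wildcards). It only illustrates the general pattern of running a parser against cases with known expected outcomes, using the protocol's rule that the longest matching path wins and `Allow` wins ties.

```python
from typing import Callable, List, Tuple

# Hypothetical parser interface for this sketch: (robots_txt, path) -> allowed?
Parser = Callable[[str, str], bool]


def toy_is_allowed(robots_txt: str, path: str) -> bool:
    """Toy matcher: assumes one `User-agent: *` group and prefix-only rules."""
    rules = []  # list of (is_allow, path_prefix)
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        # An empty Allow/Disallow value matches nothing, so it is skipped.
        if key in ("allow", "disallow") and value:
            rules.append((key == "allow", value))

    allowed, best_len = True, -1  # no matching rule means the path is allowed
    for is_allow, prefix in rules:
        if path.startswith(prefix):
            longer = len(prefix) > best_len
            tie_for_allow = len(prefix) == best_len and is_allow
            if longer or tie_for_allow:
                allowed, best_len = is_allow, len(prefix)
    return allowed


# Each case: (robots.txt body, path to check, expected result).
CASES: List[Tuple[str, str, bool]] = [
    ("User-agent: *\nDisallow: /private/", "/private/x", False),
    ("User-agent: *\nDisallow: /private/", "/public", True),
    ("User-agent: *\nDisallow: /a\nAllow: /a/b", "/a/b/c", True),  # longest match
    ("User-agent: *\nAllow: /x\nDisallow: /x", "/x", True),        # tie: Allow wins
    ("User-agent: *\nDisallow:", "/anything", True),               # empty Disallow
]


def run_compliance(parser: Parser) -> List[Tuple[str, str, bool]]:
    """Return the cases where the parser disagrees with the expected result."""
    return [case for case in CASES if parser(case[0], case[1]) != case[2]]


if __name__ == "__main__":
    failures = run_compliance(toy_is_allowed)
    print(f"{len(CASES) - len(failures)}/{len(CASES)} cases passed")
```

Any parser that can be wrapped in the same two-argument function can be passed to `run_compliance`, and the number of failing cases gives a rough sense of "to what extent" it follows the protocol for the behaviors covered.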

Java robots.txt parser and matcher

Second, we are releasing an official Java port of the C++ robots.txt parser, created by Ian. Java is the third most popular programming language on GitHub and it's extensively used at Google as well, so it's no wonder it's been the most requested language port. The parser is a 1-to-1 translation of the C++ parser in terms of functions and behavior, and it's been thoroughly tested for parity against a large corpus of robots.txt rules. Teams are already planning to use the Java robots.txt parser in Google production systems, and we hope that you'll find it useful, too.

As usual, we welcome your contributions to these projects. If you built something with the C++ robots.txt parser or with these new releases, let us know so we can potentially help you spread the word! If you found a bug, help us fix it by opening an issue on GitHub or directly contributing with a pull request. If you have questions or comments about these projects, catch us on Twitter!

It was our genuine pleasure to host Andreea and Ian, and we're sad that their internship is ending. Their contributions help make the Internet a better place and we hope that we can welcome them back to Google in the future.

Original post: https://webmasters.googleblog.com/2020/09/new-open-source-robotstxt-projects.html?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+blogspot%2FamDG+%28Official+Google+Webmaster+Central+Blog%29
