
robotstxt

robotstxt is a Go implementation of the robots.txt exclusion protocol. It includes a utility for checking robots.txt compliance.

Description

The robotstxt tool provides an implementation of the robots.txt exclusion protocol in Go (golang). This protocol is used by websites to instruct web crawlers and other bots on which parts of the site they are allowed or disallowed to access. The package is available in two forms: a development package with dev files and a runtime binary.

The primary binary, robots.txt-check, allows users to verify how a specific bot would be treated by a site's robots.txt file. This is useful for web developers, SEO specialists, and security researchers to understand crawler permissions and potential information disclosure issues.

Use cases include testing crawler access restrictions, auditing website configurations for unintended exposures, and ensuring compliance with robots.txt directives during web reconnaissance or penetration testing.
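For reference, a minimal robots.txt file of the kind the tool evaluates might look like the following (the agent names and paths here are purely illustrative):

```text
# Default group: applies to every crawler
User-agent: *
Disallow: /admin/

# Group for a specific crawler; overrides the default group for it
User-agent: GoogleBot
Allow: /public/
Disallow: /
```

A crawler picks the most specific matching User-agent group and applies only that group's Allow and Disallow rules.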

How It Works

The tool parses robots.txt files according to the exclusion protocol specification. It interprets directives like User-agent, Allow, and Disallow to determine permissions for specified bots. The robots.txt-check command simulates a bot's request against a remote robots.txt file, evaluating access rules based on the protocol implemented in Go.
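The evaluation described above can be sketched in Go. The code below is a simplified, self-contained illustration of the idea, not the actual implementation used by robots.txt-check: it matches rules by simple prefix in file order, whereas real implementations also handle `*` wildcards, `$` anchors, and longest-match precedence.

```go
package main

import (
	"fmt"
	"strings"
)

// rule is a single Allow/Disallow line from a robots.txt group.
type rule struct {
	allow bool
	path  string
}

// parseRules extracts the rules that apply to the given user-agent,
// from groups named either "*" or the agent itself.
func parseRules(robots, agent string) []rule {
	var rules []rule
	matched := false
	for _, line := range strings.Split(robots, "\n") {
		// Strip comments and surrounding whitespace.
		if i := strings.Index(line, "#"); i >= 0 {
			line = line[:i]
		}
		parts := strings.SplitN(strings.TrimSpace(line), ":", 2)
		if len(parts) != 2 {
			continue
		}
		field := strings.ToLower(strings.TrimSpace(parts[0]))
		value := strings.TrimSpace(parts[1])
		switch field {
		case "user-agent":
			matched = value == "*" || strings.EqualFold(value, agent)
		case "allow":
			if matched {
				rules = append(rules, rule{allow: true, path: value})
			}
		case "disallow":
			if matched {
				rules = append(rules, rule{allow: false, path: value})
			}
		}
	}
	return rules
}

// allowed reports whether the agent may fetch path. With no matching
// rule, access is allowed by default, as the protocol specifies.
func allowed(robots, agent, path string) bool {
	for _, r := range parseRules(robots, agent) {
		if r.path != "" && strings.HasPrefix(path, r.path) {
			return r.allow
		}
	}
	return true
}

func main() {
	robots := "User-agent: *\nDisallow: /private/\n"
	fmt.Println(allowed(robots, "GoogleBot", "/private/data")) // false
	fmt.Println(allowed(robots, "GoogleBot", "/index.html"))   // true
}
```

robots.txt-check performs the same kind of evaluation, but fetches the robots.txt file from the URL given with -robots-url and reports the result for the bot named with -bot.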

Installation

bash
sudo apt install robotstxt

Flags

-bot
    Specifies the bot name to check against robots.txt rules (default "GoogleBot")
-robots-url
    Specifies the URL of the robots.txt file to check

Examples

Displays the help message showing usage and available flags
robots.txt-check -h
Checks the default robots.txt (current host) for GoogleBot permissions
robots.txt-check -bot GoogleBot
Checks the robots.txt file at the specified URL using default GoogleBot
robots.txt-check -robots-url https://example.com/robots.txt
Checks permissions for BingBot against the specified robots.txt file
robots.txt-check -bot BingBot -robots-url https://example.com/robots.txt
Checks permissions for any bot (wildcard user-agent) against default robots.txt
robots.txt-check -bot '*'
Evaluates CustomBot permissions against target site's robots.txt
robots.txt-check -robots-url https://target.com/robots.txt -bot CustomBot
Updated 2026-04-16 (kali.org)