Parsero
Parsero audits Robots.txt files by parsing Disallow entries and checking their HTTP status codes. It reveals potentially sensitive directories or files that search engines are instructed not to index.
Description
Parsero is a Python script that reads the Robots.txt file from a web server and examines the Disallow entries. These entries specify directories or files that should not be indexed by search engines like Google, Bing, or Yahoo. For instance, 'Disallow: /portal/login' prevents crawlers from indexing content at www.example.com/portal/login, helping administrators protect sensitive information from being shared publicly.
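For illustration, a robots.txt file of the kind Parsero parses might look like this (a made-up example, not taken from any real site):

```text
User-agent: *
Disallow: /portal/login
Disallow: /admin/
Disallow: /backup/db.sql
```

Each Disallow line names a path that well-behaved crawlers should skip; Parsero treats every such path as a candidate target to probe.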
The tool is useful for security audits, identifying exposed paths that might contain private data, admin panels, or other restricted areas. It can analyze a single URL or a list of domains, and optionally search Bing for Disallow entries the search engine has indexed anyway, revealing which "hidden" content is in fact publicly discoverable despite the directives.
By simulating requests to these disallowed paths, Parsero reports HTTP status codes such as 200 OK, 404 Not Found, or redirects, providing insight into server configurations and potential misconfigurations.
How It Works
Parsero fetches the Robots.txt file from the target URL, parses the Disallow directives, and sends HTTP requests to each listed path. It reports the response status codes (e.g., 200 OK, 404 Not Found, 301 Moved Permanently). With -sb, it searches Bing for indexed Disallow entries from Robots.txt. The tool uses Python libraries like BeautifulSoup (bs4) and urllib3 for parsing and HTTP requests.
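The fetch-parse-probe loop described above can be sketched in plain Python using only the standard library. This is a simplified illustration of the technique, not Parsero's actual source (which uses bs4 and urllib3); the sample robots.txt body and base URL are invented for the demo:

```python
from urllib.parse import urljoin

def parse_disallows(robots_txt: str) -> list:
    """Extract the path from every Disallow directive in a robots.txt body."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()   # drop trailing comments
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:                           # a bare 'Disallow:' allows everything
                paths.append(path)
    return paths

# Hypothetical robots.txt content; Parsero would fetch this over HTTP instead.
sample = """\
User-agent: *
Disallow: /portal/login
Disallow: /admin/   # staff only
Disallow:
"""

base = "http://www.example.com"
targets = [urljoin(base, p) for p in parse_disallows(sample)]
print(targets)

# Parsero's next step is to request each target and report its status code,
# e.g. via urllib.request.urlopen(target) and reading the response status.
```

The network step is left as a comment so the sketch stays self-contained; in practice each request's result (200 OK, 404 Not Found, a 301 redirect, and so on) is what reveals whether a disallowed path is actually reachable.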
Installation
sudo apt install parsero
Flags
-h          Show the help message and exit
-u URL      URL to be analyzed
-o          Show only the "HTTP 200" status code results
-f FILE     Scan a list of domains from a file
-sb         Search in Bing for indexed Disallow entries
Examples
parsero -u www.bing.com -sb
parsero -u www.bing.com
parsero -u www.bing.com -o
parsero -f domains.txt
parsero -u example.com -sb -o
parsero -h