getallurls
Fetches known URLs from AlienVault’s Open Threat Exchange, the Wayback Machine, and Common Crawl for any given domain. Inspired by Tomnomnom’s waybackurls.
Description
getallurls is a tool designed to retrieve historical and known URLs associated with a specific domain from multiple public sources. It queries AlienVault’s Open Threat Exchange (OTX), the Wayback Machine, and Common Crawl to compile a comprehensive list of URLs that have been indexed for the target domain.
This tool is particularly useful in reconnaissance phases of security assessments, bug bounty hunting, and OSINT investigations. By gathering URLs that may no longer be live or easily discoverable through standard crawling, security researchers can identify potential attack surfaces, forgotten endpoints, or archived content that could reveal sensitive information.
The tool provides flexibility through various output formats and provider selection, making it adaptable to different workflows and analysis needs.
How It Works
getallurls operates by querying three data sources: AlienVault’s Open Threat Exchange (OTX), the Internet Archive's Wayback Machine, and Common Crawl. For a given domain, it retrieves indexed URLs from these archives, which contain historical web data, aggregates and de-duplicates them, and outputs the result in plain text or JSON, with options to select providers, route requests through an HTTP proxy, and save results to a file.
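The query-building step described above can be sketched as follows. This is an illustrative Python sketch, not the tool's actual source: the endpoint paths are the public APIs each archive documents, but the parameter choices and the Common Crawl index name are assumptions.

```python
from urllib.parse import urlencode


def wayback_query(domain: str) -> str:
    # Wayback Machine CDX API: returns one archived URL per output line.
    params = {
        "url": f"*.{domain}/*",  # match the domain and its subdomains
        "output": "text",
        "fl": "original",        # keep only the original-URL column
        "collapse": "urlkey",    # de-duplicate equivalent URLs
    }
    return "https://web.archive.org/cdx/search/cdx?" + urlencode(params)


def otx_query(domain: str, page: int = 1) -> str:
    # AlienVault OTX url_list endpoint for a domain indicator, paginated.
    return ("https://otx.alienvault.com/api/v1/indicators/domain/"
            f"{domain}/url_list?limit=100&page={page}")


def commoncrawl_query(domain: str, index: str = "CC-MAIN-2024-33") -> str:
    # Common Crawl index server; the index name changes with each crawl,
    # so the default here is an assumption.
    params = {"url": f"*.{domain}/*", "output": "json"}
    return f"https://index.commoncrawl.org/{index}-index?" + urlencode(params)


if __name__ == "__main__":
    for build in (wayback_query, otx_query, commoncrawl_query):
        print(build("example.com"))
```

A real implementation would fetch each of these URLs (honoring any configured proxy), follow OTX and Common Crawl pagination, and merge the resulting URL lists before printing them.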
Installation
sudo apt install getallurls
Flags
  -json               Output results as JSON
  -o <file>           Write results to the given file
  -p <url>            Route requests through an HTTP proxy
  -providers <list>   Comma-separated list of providers to query (wayback, otx, commoncrawl)
Examples
getallurls example.com
getallurls -json example.com
getallurls -o results.txt example.com
getallurls -p http://proxy:8080 example.com
getallurls -providers wayback example.com
getallurls -providers otx,commoncrawl example.com
getallurls -json -o urls.json example.com