Information Gathering (tags: urls, wayback, otx, commoncrawl, recon, osint)

getallurls

Fetches known URLs from AlienVault’s Open Threat Exchange, the Wayback Machine, and Common Crawl for any given domain. Inspired by Tomnomnom’s waybackurls.

Description

getallurls retrieves historical and known URLs associated with a specific domain from multiple public sources. It queries AlienVault’s Open Threat Exchange (OTX), the Wayback Machine, and Common Crawl, and combines the URLs each source has indexed for the target domain into a single list.

This tool is particularly useful in reconnaissance phases of security assessments, bug bounty hunting, and OSINT investigations. By gathering URLs that may no longer be live or easily discoverable through standard crawling, security researchers can identify potential attack surfaces, forgotten endpoints, or archived content that could reveal sensitive information.
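Raw URL lists from these sources are often large, so a common next step is to narrow them down. The helpers below are a minimal sketch of that triage step, assuming one URL per line on stdin. The function names `filter_interesting` and `with_params` and the extension list are hypothetical illustrations, not part of getallurls.

```bash
#!/usr/bin/env bash
# Hypothetical post-processing helpers for a URL list (one URL per line on stdin).
# The extension list below is an illustrative assumption, not part of getallurls.

# Keep URLs whose path (the part before any ? or #) ends in an extension that
# often points at scripts, data dumps, backups, or configuration files.
filter_interesting() {
  grep -Ei '^[^?#]*\.(js|json|xml|bak|old|sql|zip|tar|gz|env|conf|config|log)([?#]|$)'
}

# Keep URLs that carry at least one name=value query parameter,
# which are candidates for parameter-based testing.
with_params() {
  grep -E '\?[^=#]+='
}
```

For example, `getallurls example.com | filter_interesting | sort -u` would list the unique URLs that end in one of the listed extensions. Checking the extension only before `?` avoids matching values inside query strings such as `?file=x.sql`.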

Results can be printed as plain text or JSON, written to stdout or to a file, and restricted to a chosen subset of providers, so the tool fits into different workflows and analysis pipelines.

How It Works

getallurls queries three data sources: AlienVault’s Open Threat Exchange (OTX), the Internet Archive’s Wayback Machine, and Common Crawl. For a given domain, it retrieves the URLs these archives of historical web data have indexed, aggregates them, and outputs them as plain text or JSON. Options let you choose which providers to query, route requests through an HTTP proxy, and save results to a file.
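To make the query-and-aggregate flow concrete, here is a minimal sketch, not getallurls’ actual implementation. `wayback_cdx_url` builds a request to the Wayback Machine CDX API using documented CDX parameters (`url`, `output`, `fl`, `collapse`); getallurls may send different ones. `aggregate_urls` is a hypothetical helper that merges per-provider URL lists and removes duplicates.

```bash
#!/usr/bin/env bash
# Sketch only: build a Wayback Machine CDX API query for every archived URL
# under a domain. These are documented CDX options; getallurls may use others.
wayback_cdx_url() {
  local domain="$1"
  printf 'https://web.archive.org/cdx/search/cdx?url=%s/*&output=txt&fl=original&collapse=urlkey\n' "$domain"
}

# Hypothetical helper: merge URL lists from several providers (one file per
# provider, one URL per line) into one sorted, de-duplicated list.
aggregate_urls() {
  cat -- "$@" \
    | sed 's/[[:space:]]*$//' \
    | grep -v '^$' \
    | sort -u
}

# Example (requires network access and curl):
#   curl -s "$(wayback_cdx_url example.com)" > wayback.txt
#   aggregate_urls wayback.txt otx.txt commoncrawl.txt
```

The `sed` step strips trailing whitespace (including carriage returns) so that the same URL from two providers collapses into one line under `sort -u`.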

Installation

```bash
sudo apt install getallurls
```

Flags

-json: write output as json
-o string: filename to write results to
-p string: HTTP proxy to use
-providers string: providers to fetch urls for (default "wayback,otx,commoncrawl")

Examples

Fetches URLs for example.com from all default providers (wayback, otx, commoncrawl)
getallurls example.com
Fetches URLs for example.com and outputs results in JSON format
getallurls -json example.com
Fetches URLs for example.com and saves results to results.txt file
getallurls -o results.txt example.com
Fetches URLs for example.com using specified HTTP proxy
getallurls -p http://proxy:8080 example.com
Fetches URLs for example.com only from Wayback Machine provider
getallurls -providers wayback example.com
Fetches URLs for example.com from OTX and Common Crawl providers only
getallurls -providers otx,commoncrawl example.com
Fetches URLs for example.com in JSON format and saves to urls.json
getallurls -json -o urls.json example.com
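Each example above targets a single domain. For a list of domains, a small wrapper can loop over them. This is a minimal sketch assuming a text file with one domain per line; `collect_for_domains` is a hypothetical name, and the wrapper uses only the `-providers` and `-o` flags listed above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run getallurls for every domain in a file (one per line)
# and write each result to <outdir>/<domain>.txt.
# Usage: collect_for_domains domains.txt outdir [providers]
collect_for_domains() {
  local list="$1" outdir="$2" providers="${3:-wayback,otx,commoncrawl}"
  mkdir -p -- "$outdir"
  local domain
  # "|| [ -n ... ]" also processes a final line that lacks a trailing newline.
  while IFS= read -r domain || [ -n "$domain" ]; do
    domain="${domain%%[[:space:]]*}"          # drop trailing whitespace or CR
    [ -z "$domain" ] && continue              # skip blank lines
    case "$domain" in \#*) continue ;; esac   # skip comment lines
    getallurls -providers "$providers" -o "$outdir/$domain.txt" "$domain"
  done < "$list"
}
```

For example, `collect_for_domains scope.txt results wayback` would query only the Wayback Machine for each domain in `scope.txt`. Writing one file per domain keeps results separable if a later run needs to be repeated for a single target.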
Updated 2026-04-16 (source: kali.org)