
HTTrack

HTTrack is an offline browser utility that copies a website to a local directory, recursively rebuilding its directory structure and downloading HTML, images, and other files. It preserves the original site's relative link structure for offline browsing.

Description

HTTrack allows users to download entire World Wide Web sites from the Internet to a local directory, including HTML, images, and other files. The tool arranges the original site's relative link structure, so users can browse the mirrored website as if online simply by opening a page in a browser. It supports updating existing mirrors and resuming interrupted downloads, and it is fully configurable, with an integrated help system.
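As a minimal sketch (www.example.com, the ./mirror output directory, and the firefox browser are placeholders), a first mirror and an offline browse might look like:

httrack https://www.example.com/ -O ./mirror
firefox ./mirror/index.html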

Common use cases include creating offline copies for analysis, archiving websites, or accessing content without internet connectivity. It's particularly useful in cybersecurity for information gathering, such as mirroring target websites for reconnaissance or offline examination. Additional packages like webhttrack provide a web interface, while proxytrack serves archived content via a proxy server.
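For example, the web interface can be launched after installing its package, and ProxyTrack can serve a finished mirror's cache over HTTP (the port and cache path below are illustrative assumptions based on ProxyTrack's documented -p usage):

sudo apt install webhttrack
webhttrack
proxytrack -p 8080 /path/to/mirror/hts-cache/new.zip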

HTTrack handles proxy configurations, limits, flow control, and various parsing options, making it versatile for both simple downloads and complex mirroring tasks.
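For instance, a more constrained run might route through a proxy while capping connections, timeouts, retries, and bandwidth (-A is the maximum transfer rate in bytes per second; host names and values are placeholders):

httrack https://www.example.com/ -O ./mirror -P proxy.example.com:8080 -c4 -T30 -R3 -A25000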

How It Works

HTTrack recursively downloads a website by following links up to a specified depth, parsing HTML, JavaScript, and other content to build a local mirror. It uses multiple simultaneous connections for efficiency, respects robots.txt and meta robots tags by default, handles proxies and cookies, and rewrites links so the mirror stays browsable offline (configurable via the -K option). A cache mechanism enables updates and resumed downloads, while MIME-type filters, external-link depth limits, and flow controls such as timeouts and retries keep the operation robust. The tool also generates index files and logs for navigation and debugging.
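A sketch of how these behaviours map to flags, with illustrative values: -rN caps the recursion depth, %eN caps the external-link depth, -sN controls robots.txt handling (s2 always follows it), and -KN controls link rewriting (K0 keeps links relative):

httrack https://www.example.com/ -O ./mirror -r3 %e0 -s2 -K0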

Installation

sudo apt install httrack
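To confirm the installation, print the built-in option summary:

httrack --help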

Flags

-O   path for mirror/logfiles+cache (-O path_mirror[,path_cache_and_logfiles])
-w   mirror web sites (default)
-W   mirror web sites, semi-automatic (asks questions)
-g   just get files (saved in the current directory)
-i   continue an interrupted mirror using the cache
-rN  set the mirror depth to N (default r9999)
-P   proxy use (-P proxy:port or -P user:pass@proxy:port)
-cN  number of multiple connections (default c8)
%P   extended parsing, attempt to parse all links, even in unknown tags or JavaScript
-F   user-agent field sent in HTTP headers
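As an illustration combining several of the flags above (the URL, output path, and user-agent string are placeholders):

httrack www.someweb.com/bob/ -O ./bob-mirror -r5 -c8 %P -F "Mozilla/5.0 (X11; Linux x86_64)"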

Examples

mirror site www.someweb.com/bob/ and only this site
httrack www.someweb.com/bob/
mirror the two sites together (with shared links) and accept any .jpg files on .com sites
httrack www.someweb.com/bob/ www.anothertest.com/mike/ +*.com/*.jpg -mime:application/*
get all files starting from bobby.html, with a link depth of 6 and the possibility of going anywhere on the web
httrack www.someweb.com/bob/bobby.html +* -r6
run the spider on www.someweb.com/bob/bobby.html using a proxy
httrack www.someweb.com/bob/bobby.html --spider -P proxy.myhost.com:8080
update a mirror in the current folder
httrack --update
continue an interrupted mirror in the current folder
httrack --continue
enter the interactive mode (httrack will ask questions)
httrack