bulk-extractor
bulk_extractor scans disk images, files, or directories to extract useful information without parsing file system structures. It generates feature files and histograms for easy inspection and analysis.
Description
bulk_extractor is a high-performance C++ program designed for digital forensics that extracts information such as emails, URLs, credit card numbers, and other features directly from disk images or files. It operates without needing to parse the file system, making it efficient for large datasets. Results are stored in feature files that can be inspected manually or processed with automated tools, with histograms highlighting the most common and potentially important features.
Use cases include forensic investigations where rapid extraction of contact information, network artifacts, and sensitive data is needed from unallocated space or fragmented files. It supports multi-threaded processing and provides progress updates during analysis, making it suitable for large disk images.
The tool reports features such as credit card numbers (CCNs), domains, email addresses, IP addresses, telephone numbers, and URLs, so analysts can identify key evidence quickly, including in data that tools relying on file system structures would not reach.
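As a rough illustration of processing feature files with automated tools, the sketch below reads one feature file and counts its most frequent entries. It assumes the tab-separated layout of offset, feature, and context, with comment lines starting with "#". The function names read_features and top_features are hypothetical helpers, not part of bulk_extractor.

```python
from collections import Counter


def read_features(path):
    """Yield (offset, feature, context) tuples from one feature file.

    Assumes tab-separated fields and '#' comment lines; the offset is kept
    as a string because it is not guaranteed to be a plain integer.
    """
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            line = line.rstrip("\n")
            # Skip blank lines and the commented header.
            if not line or line.startswith("#"):
                continue
            fields = line.split("\t")
            offset = fields[0]
            feature = fields[1] if len(fields) > 1 else ""
            context = fields[2] if len(fields) > 2 else ""
            yield offset, feature, context


def top_features(path, limit=10):
    """Return the most frequent features (case-folded) with their counts."""
    counts = Counter(feature.lower() for _, feature, _ in read_features(path))
    return counts.most_common(limit)
```

For example, top_features("bulk-out/email.txt", 5) would list the five most common entries, assuming the output directory contains a feature file with that name.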
How It Works
bulk_extractor processes the input disk image or file in three phases. In Phase 1, the enabled scanners (e.g., ccn, email, url) apply regular expressions and other pattern matching to the raw data, detecting features regardless of file system boundaries. The work is split across threads, and the tool reports progress by offset along with an estimated completion time. Phase 2 shuts down the scanners, and Phase 3 generates histograms for the detected features. Each feature type is written to its own file in the output directory with a context window around every match; the hash algorithm and scanner settings are configurable. The workload is CPU-bound, so it benefits from multi-core systems.
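The scanning idea can be sketched in a few lines of Python. This is a toy model, not bulk_extractor's implementation: the email pattern is deliberately simple, the 16-byte context window is an assumed value, and scan_buffer and histogram are hypothetical names. It shows a regular expression running over raw bytes with no notion of files, recording the offset and surrounding context of each match, and then tallying the results.

```python
import re
from collections import Counter

# Deliberately simple pattern for illustration; a real scanner is stricter.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
CONTEXT = 16  # bytes of context kept on each side of a match (assumed value)


def scan_buffer(data, base_offset=0):
    """Return (offset, feature, context) for every match in a raw byte buffer."""
    features = []
    for match in EMAIL_RE.finditer(data):
        start = max(0, match.start() - CONTEXT)
        end = min(len(data), match.end() + CONTEXT)
        features.append(
            (base_offset + match.start(), match.group(), data[start:end])
        )
    return features


def histogram(features):
    """Count how often each feature occurs, most frequent first."""
    counts = Counter(feature.lower() for _, feature, _ in features)
    return counts.most_common()


if __name__ == "__main__":
    # Raw bytes with binary noise around the text, as in unallocated space.
    raw = (b"\x00\x01junk alice@example.com \xff\xfe more "
           b"bob@example.org\x00 alice@example.com")
    found = scan_buffer(raw)
    for offset, feature, context in found:
        print(offset, feature.decode(), context, sep="\t")
    for feature, count in histogram(found):
        print(f"n={count}\t{feature.decode()}")
```

Because the pattern is applied to the byte stream directly, a match is found whether the bytes belong to an intact file, a fragment, or unallocated space.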
Installation
sudo apt install bulk-extractor
Examples
bulk_extractor -o bulk-out xp-laptop-2005-07-04-1430.img
bulk_extractor -h
bulk_extractor -x accts image_name
bulk_extractor -e base16 image_name
bulk_extractor -S ssn_mode=1 image_name
bulk_extractor -S word_min=6 -S word_max=16 image_name
bulk_extractor -S hash_alg=sha1 image_name
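When such invocations are scripted, it can help to assemble the argument list programmatically. The sketch below uses only the options that appear in the examples above (-o, -e, -x, -S); build_command is a hypothetical helper, and the code prints the command rather than running it.

```python
import shlex


def build_command(image, outdir=None, enable=(), disable=(), settings=None):
    """Assemble a bulk_extractor argument list from the options shown above."""
    cmd = ["bulk_extractor"]
    if outdir is not None:
        cmd += ["-o", outdir]            # output directory
    for scanner in enable:
        cmd += ["-e", scanner]           # enable a scanner
    for scanner in disable:
        cmd += ["-x", scanner]           # disable a scanner
    for key, value in (settings or {}).items():
        cmd += ["-S", f"{key}={value}"]  # scanner setting
    cmd.append(image)                    # the image comes last
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "image_name",
        outdir="bulk-out",
        settings={"word_min": 6, "word_max": 16},
    )
    print(shlex.join(cmd))
    # To run it for real (requires bulk_extractor on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the result to subprocess.run as a list avoids shell quoting problems with image paths that contain spaces.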