
pdf-parser

pdf-parser parses PDF files to identify fundamental elements without rendering the document. It is designed for static analysis of PDF structures and objects.

Description

pdf-parser is a tool that parses PDF documents to identify the fundamental elements used in the analyzed file, such as comments, XREF tables, trailers, indirect objects, and specific PDF dictionary entries like /Catalog, /Font, /Page. It does not render the PDF, making it suitable for security analysis where rendering could be risky.

Use cases include malware analysis, particularly inspecting PDF documents for malicious content or anomalies, as referenced in MITRE D3FEND tactics for basic static malware analysis. The tool provides statistics on object counts and locations, helping analysts understand the document's structure.
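To make the idea of structural statistics without rendering concrete, here is a minimal Python sketch that counts a few structural markers directly in the raw bytes. It is not how pdf-parser is implemented: the function name `count_pdf_elements`, the regular expressions, and the `SAMPLE` document are assumptions made for illustration, and a real parser tokenizes the file rather than pattern-matching it.

```python
import re
from collections import Counter

# Illustrative patterns only. A real parser tokenizes the file, so these
# byte-level matches can be fooled by data inside streams.
PATTERNS = {
    "comment": re.compile(rb"^%", re.MULTILINE),
    "xref": re.compile(rb"^xref\b", re.MULTILINE),
    "trailer": re.compile(rb"^trailer\b", re.MULTILINE),
    "indirect object": re.compile(rb"\b\d+\s+\d+\s+obj\b"),
}
# Captures the name that follows a /Type key, e.g. /Catalog or /Page.
TYPE_RE = re.compile(rb"/Type\s*(/[A-Za-z]+)")


def count_pdf_elements(data: bytes) -> Counter:
    """Count structural markers and /Type values in raw PDF bytes."""
    counts = Counter()
    for name, pattern in PATTERNS.items():
        counts[name] = len(pattern.findall(data))
    for match in TYPE_RE.finditer(data):
        counts[match.group(1).decode("ascii")] += 1
    return counts


# Hypothetical, hand-written document used only to exercise the sketch.
SAMPLE = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R >>\nendobj\n"
    b"xref\n0 4\n"
    b"trailer\n<< /Size 4 /Root 1 0 R >>\n"
    b"startxref\n0\n%%EOF\n"
)

if __name__ == "__main__":
    for name, total in count_pdf_elements(SAMPLE).items():
        print(f"{name}: {total}")
```

Because nothing is rendered or executed, a count like this is safe to run on an untrusted file; the trade-off is that it only sees what is visible in the raw bytes.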

It supports searching strings in indirect objects and filtering streams through specific decoders, aiding in detailed forensic examination of PDF files, zip files, or URLs.
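The "search indirect objects except streams" behaviour can be sketched as follows. This is a simplified illustration under stated assumptions, not pdf-parser's own code: `search_objects` is a hypothetical helper, the regular expressions assume well-formed `obj`/`endobj` and `stream`/`endstream` pairs, and matching case-insensitively is a choice made for this sketch.

```python
import re

# Assumes each object is delimited by "N G obj" ... "endobj" and that stream
# data does not itself contain the keyword "endobj".
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj\b", re.DOTALL)
STREAM_RE = re.compile(rb"stream\r?\n.*?endstream", re.DOTALL)


def search_objects(data: bytes, needle: str) -> list[int]:
    """Return numbers of objects whose non-stream content contains needle."""
    wanted = needle.lower().encode("latin-1")
    hits = []
    for match in OBJ_RE.finditer(data):
        # Drop the stream body so only the dictionary part is searched.
        body = STREAM_RE.sub(b"", match.group(3))
        if wanted in body.lower():
            hits.append(int(match.group(1)))
    return hits


# Hypothetical sample: object 1 names JavaScript in its dictionary,
# object 2 mentions it only inside a stream.
SEARCH_SAMPLE = (
    b"1 0 obj\n<< /Type /Action /S /JavaScript /JS 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Length 24 >>\nstream\n"
    b"app.alert('JavaScript')\nendstream\nendobj\n"
)

if __name__ == "__main__":
    print(search_objects(SEARCH_SAMPLE, "javascript"))
```

Skipping stream bodies keeps the search focused on dictionary keys such as /JavaScript or /OpenAction, which are often the first thing an analyst looks for.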

How It Works

pdf-parser reads PDF files, zip files, or URLs and parses their structure to extract and count elements such as indirect objects, XREF tables, trailers, and dictionary keys (/Catalog, /Font, and so on). It lists object numbers and reports statistics without rendering the document. Optionally, it searches for a string in indirect objects (stream contents excluded) or passes stream objects through the supported filters: FlateDecode, ASCIIHexDecode, ASCII85Decode, LZWDecode, and RunLengthDecode.
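What "passing a stream through a filter" means can be shown with Python's standard library. The sketch below is an illustration, not pdf-parser's implementation: `decode_stream` and `apply_filters` are hypothetical names, LZWDecode is deliberately left out, and the code assumes the raw stream bytes have already been extracted from the object.

```python
import base64
import binascii
import zlib


def _run_length_decode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        length = data[i]
        i += 1
        if length == 128:      # end-of-data marker
            break
        if length < 128:       # copy the next length + 1 bytes literally
            out += data[i:i + length + 1]
            i += length + 1
        else:                  # repeat the next byte 257 - length times
            out += data[i:i + 1] * (257 - length)
            i += 1
    return bytes(out)


def decode_stream(data: bytes, filter_name: str) -> bytes:
    """Decode one stream with one named filter (sketch, no LZWDecode)."""
    if filter_name == "FlateDecode":
        return zlib.decompress(data)
    if filter_name == "ASCIIHexDecode":
        # Remove whitespace, stop at the '>' end marker, pad odd input with 0.
        digits = b"".join(data.split()).partition(b">")[0]
        if len(digits) % 2:
            digits += b"0"
        return binascii.unhexlify(digits)
    if filter_name == "ASCII85Decode":
        payload = data.strip()
        if payload.startswith(b"<~"):
            payload = payload[2:]
        if payload.endswith(b"~>"):
            payload = payload[:-2]
        return base64.a85decode(payload)
    if filter_name == "RunLengthDecode":
        return _run_length_decode(data)
    raise NotImplementedError(f"filter not covered by this sketch: {filter_name}")


def apply_filters(data: bytes, filter_names: list[str]) -> bytes:
    """Apply filters in the order they are listed for the stream."""
    for name in filter_names:
        data = decode_stream(data, name)
    return data
```

For example, a stream whose dictionary lists ASCIIHexDecode followed by FlateDecode would be decoded with `apply_filters(raw, ["ASCIIHexDecode", "FlateDecode"])`. Chained filters like this are a common way to hide content from a plain string search, which is why decoding matters during analysis.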

Installation

sudo apt install pdf-parser

Flags

-a                          Display statistics for the given PDF file
-h, --help                  Show this help message and exit
-m, --man                   Print manual
-s SEARCH, --search=SEARCH  String to search in indirect objects (except streams)
-f, --filter                Pass stream object through filters (FlateDecode, ASCIIHexDecode, ASCII85Decode, LZWDecode and RunLengthDecode only)
--version                   Show program's version number and exit

Examples

Display statistics for the given PDF file, showing counts of comments, XREF, trailer, indirect objects, and breakdowns by dictionary keys like /Catalog, /Font, /Page
pdf-parser -a /usr/share/doc/texmf/fonts/lm/lm-info.pdf
Show help message and usage information
pdf-parser -h
Print manual
pdf-parser -m
Show program's version number and exit
pdf-parser --version
Search for a string in indirect objects (except streams) of the PDF file
pdf-parser -s SEARCH pdf-file
Pass stream objects through supported filters like FlateDecode
pdf-parser -f pdf-file
Parse a PDF file to identify fundamental elements
pdf-parser pdf-file
Parse a zip-file containing PDFs
pdf-parser zip-file
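When these commands need to be run from a script, for instance across a folder of samples, a small wrapper can build the argument list. The following is a hedged sketch: `build_command` and `run_pdf_parser` are hypothetical helper names, the wrapper assumes a `pdf-parser` executable is on the PATH (as installed by the apt package above), it uses only the flags listed in the Flags section, and whether those flags can be combined in one call is an assumption not confirmed here.

```python
import shutil
import subprocess


def build_command(pdf_path, *, stats=False, search=None, filter_streams=False):
    """Build an argv list using only the -a, -s and -f flags listed above."""
    argv = ["pdf-parser"]
    if stats:
        argv.append("-a")
    if search is not None:
        argv += ["-s", search]
    if filter_streams:
        argv.append("-f")
    argv.append(pdf_path)
    return argv


def run_pdf_parser(pdf_path, **options):
    """Run pdf-parser and return its standard output as text."""
    # Assumption: the executable is named pdf-parser and is on the PATH.
    if shutil.which("pdf-parser") is None:
        raise FileNotFoundError("pdf-parser not found on PATH")
    result = subprocess.run(
        build_command(pdf_path, **options),
        capture_output=True,
        text=True,
        errors="replace",  # decoded stream output may not be valid text
        check=True,
    )
    return result.stdout
```

Passing the arguments as a list, rather than a shell string, avoids quoting problems when a sample's file name contains spaces or shell metacharacters.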
Updated 2026-04-16. Source: kali.org