pdf-parser

pdf-parser parses PDF files to identify fundamental elements without rendering the document. It is designed for static analysis of PDF structures and objects.

Description

pdf-parser is a tool that parses PDF documents to identify the fundamental elements used in the analyzed file, such as comments, XREF tables, trailers, indirect objects, and specific PDF dictionary entries like /Catalog, /Font, /Page. It does not render the PDF, making it suitable for security analysis where rendering could be risky.

The main use case is malware analysis, in particular inspecting PDF documents for malicious content or anomalies; this corresponds to basic static malware analysis as referenced in MITRE D3FEND. The tool also reports statistics on object counts and locations, which helps analysts understand the document's structure.

It supports searching strings in indirect objects and filtering streams through specific decoders, aiding in detailed forensic examination of PDF files, zip files, or URLs.

How It Works

pdf-parser reads PDF files, zip-files, or URLs and parses their structure to extract and count elements like indirect objects, XREF, trailers, and dictionary keys (/Catalog, /Font, etc.). It lists object numbers and provides statistics without rendering, optionally searching strings in non-stream indirect objects or applying filters (FlateDecode, ASCIIHexDecode, ASCII85Decode, LZWDecode, RunLengthDecode) to stream objects.
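
To make the filter step concrete, the following is a minimal Python sketch of what passing a stream through three of the listed filters involves. It is not pdf-parser's own code: the function name `decode_stream` and the sample payload are made up for illustration, LZWDecode and RunLengthDecode are left out, and decode parameters such as predictors are ignored.

```python
import base64
import binascii
import zlib
from typing import Iterable


def decode_stream(data: bytes, filters: Iterable[str]) -> bytes:
    """Apply a subset of PDF stream filters, in the order given.

    Hypothetical helper for illustration only; not part of pdf-parser.
    """
    for name in filters:
        if name == "/FlateDecode":
            # FlateDecode is zlib/deflate compression.
            data = zlib.decompress(data)
        elif name == "/ASCIIHexDecode":
            # Drop whitespace, stop at the '>' end-of-data marker.
            digits = b"".join(data.split()).split(b">", 1)[0]
            if len(digits) % 2:
                digits += b"0"  # a trailing odd digit is treated as followed by 0
            data = binascii.unhexlify(digits)
        elif name == "/ASCII85Decode":
            # PDF streams end with '~>' and normally carry no '<~' prefix.
            body = data.strip()
            if body.endswith(b"~>"):
                body = body[:-2]
            data = base64.a85decode(body)
        else:
            raise ValueError("filter not handled in this sketch: " + name)
    return data


# Build a sample stream that is deflated and then hex-encoded,
# so the filter chain must be undone in the listed order.
payload = b"hello pdf"
encoded = binascii.hexlify(zlib.compress(payload)) + b">"
decoded = decode_stream(encoded, ["/ASCIIHexDecode", "/FlateDecode"])
```

Filters are applied in the order they appear in the stream dictionary's /Filter entry, which is why the sketch takes a sequence rather than a single name.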

Installation

sudo apt install pdf-parser

Flags

-a                            Display statistics for the given PDF file

-h, --help                    Show this help message and exit

-m, --man                     Print manual

-s SEARCH, --search=SEARCH    String to search in indirect objects (except streams)

-f, --filter                  Pass stream object through filters (FlateDecode, ASCIIHexDecode, ASCII85Decode, LZWDecode and RunLengthDecode only)

--version                     Show program's version number and exit

Examples

Display statistics for the given PDF file, showing counts of comments, XREF, trailer, indirect objects, and breakdowns by dictionary keys like /Catalog, /Font, /Page

pdf-parser -a /usr/share/doc/texmf/fonts/lm/lm-info.pdf
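
As a rough idea of what such statistics involve, the Python sketch below counts structural keywords and /Type values directly in the raw bytes, without rendering anything. It is a simplified illustration and not pdf-parser's algorithm: the function name `count_elements` and the sample document are hypothetical, and a plain regex scan like this can be fooled by keywords that appear inside stream data.

```python
import re
from collections import Counter

# Patterns for the structural elements counted in this sketch.
OBJ_RE = re.compile(rb"\b\d+\s+\d+\s+obj\b")    # indirect object headers
XREF_RE = re.compile(rb"\bxref\b")              # does not match 'startxref'
TRAILER_RE = re.compile(rb"\btrailer\b")
TYPE_RE = re.compile(rb"/Type\s*/(\w+)")        # e.g. /Type /Catalog


def count_elements(pdf_bytes: bytes) -> Counter:
    """Count structural elements in raw PDF bytes (illustrative only)."""
    counts = Counter()
    counts["obj"] = len(OBJ_RE.findall(pdf_bytes))
    counts["xref"] = len(XREF_RE.findall(pdf_bytes))
    counts["trailer"] = len(TRAILER_RE.findall(pdf_bytes))
    for type_name in TYPE_RE.findall(pdf_bytes):
        counts["/" + type_name.decode("ascii")] += 1
    return counts


# A tiny hand-written document skeleton used as sample input.
sample_pdf = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R >>\nendobj\n"
    b"xref\n0 4\n"
    b"trailer\n<< /Size 4 /Root 1 0 R >>\n"
    b"startxref\n0\n%%EOF\n"
)
stats = count_elements(sample_pdf)
```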

Show help message and usage information

pdf-parser -h

Print manual

pdf-parser -m

Show program's version number and exit

pdf-parser --version

Search for a string in indirect objects (except streams) of the PDF file

pdf-parser -s SEARCH pdf-file
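
The idea of searching indirect objects while skipping stream data can be sketched in Python as follows. This is an approximation for illustration, not pdf-parser's implementation: `search_objects` is a hypothetical name, matching is case-insensitive as a choice of the sketch, and the regex-based object splitting assumes the word endobj does not occur inside stream data.

```python
import re
from typing import List

# One indirect object: "<num> <gen> obj ... endobj".
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj\b", re.DOTALL)
# Stream payload, from the 'stream' keyword up to 'endstream'.
STREAM_RE = re.compile(rb"stream\r?\n.*?endstream", re.DOTALL)


def search_objects(pdf_bytes: bytes, needle: bytes) -> List[int]:
    """Return object numbers whose non-stream content contains needle."""
    hits = []
    for match in OBJ_RE.finditer(pdf_bytes):
        # Remove stream data so only dictionaries and other
        # non-stream content are searched.
        body = STREAM_RE.sub(b"", match.group(3))
        if needle.lower() in body.lower():
            hits.append(int(match.group(1)))
    return hits


# Object 1 mentions JavaScript in its dictionary; object 2 only
# contains the word inside its stream data, so it is not reported.
sample_objects = (
    b"1 0 obj\n<< /S /JavaScript /JS 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Length 10 >>\nstream\njavascript\nendstream\nendobj\n"
)
found = search_objects(sample_objects, b"javascript")
```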

Pass stream objects through supported filters like FlateDecode

pdf-parser -f pdf-file

Parse a PDF file to identify fundamental elements

pdf-parser pdf-file

Parse a zip-file containing PDFs

pdf-parser zip-file
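
When several files need the same treatment, the commands above can be driven from a script. The Python sketch below builds the argument list from the flags listed on this page (-a, -s, -f) and runs the tool with subprocess. The helper names `build_command` and `run_pdf_parser` are hypothetical, and the sketch assumes a `pdf-parser` executable is on the PATH, as after the apt installation shown earlier.

```python
import subprocess
from typing import List, Optional


def build_command(target: str, stats: bool = False,
                  search: Optional[str] = None,
                  filter_streams: bool = False) -> List[str]:
    """Build a pdf-parser argument list from the flags documented above."""
    cmd = ["pdf-parser"]
    if stats:
        cmd.append("-a")            # display statistics
    if search is not None:
        cmd += ["-s", search]       # search indirect objects (except streams)
    if filter_streams:
        cmd.append("-f")            # pass streams through supported filters
    cmd.append(target)              # PDF file, zip-file, or URL
    return cmd


def run_pdf_parser(target: str, **options) -> str:
    """Run pdf-parser and return its standard output as text."""
    result = subprocess.run(
        build_command(target, **options),
        capture_output=True,
        text=True,
        errors="replace",   # decoded stream data may not be valid text
        check=True,         # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

For example, `run_pdf_parser("sample.pdf", stats=True)` would run `pdf-parser -a sample.pdf`, where `sample.pdf` is a placeholder file name. Passing the arguments as a list rather than a shell string avoids quoting problems with unusual file names.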

Updated 2026-04-16. Source: kali.org