pdf-parser
pdf-parser parses PDF files to identify fundamental elements without rendering the document. It is designed for static analysis of PDF structures and objects.
Description
pdf-parser is a tool that parses PDF documents to identify the fundamental elements used in the analyzed file: comments, XREF tables, trailers, indirect objects, and specific PDF dictionary entries such as /Catalog, /Font, and /Page. It does not render the PDF, which makes it suitable for security analysis where rendering a document could be risky.
Use cases include malware analysis, in particular inspecting PDF documents for malicious content or anomalies; this kind of inspection is referenced in MITRE D3FEND under basic static malware analysis. The tool reports statistics on object counts and locations, which helps analysts understand the document's structure.
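To make the idea of object counts and locations concrete, here is a minimal Python sketch. It is not pdf-parser's own code: the sample bytes, the regular expressions, and the function name element_stats are assumptions made for illustration, and a real parser has to cope with binary streams, compressed object streams, and malformed input that this sketch ignores.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, hand-written PDF fragment used only for illustration.
SAMPLE = b"""%PDF-1.4
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages /Kids [3 0 R] /Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R >>
endobj
xref
0 4
trailer
<< /Root 1 0 R /Size 4 >>
startxref
0
%%EOF
"""

# "N G obj ... endobj" delimits one indirect object (simplified).
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj\b", re.S)
# The /Type entry of an object's dictionary, e.g. /Type /Page.
TYPE_RE = re.compile(rb"/Type\s*/(\w+)")


def element_stats(data: bytes):
    """Count basic elements and record which objects carry which /Type."""
    counts = Counter()
    locations = defaultdict(list)
    # Lines starting with '%' are treated as comments (a simplification).
    counts["comment"] = len(re.findall(rb"^%", data, re.M))
    counts["xref"] = len(re.findall(rb"^xref\b", data, re.M))
    counts["trailer"] = len(re.findall(rb"^trailer\b", data, re.M))
    for num, _gen, body in OBJ_RE.findall(data):
        counts["obj"] += 1
        match = TYPE_RE.search(body)
        if match:
            locations["/" + match.group(1).decode("ascii")].append(int(num))
    return counts, locations


counts, locations = element_stats(SAMPLE)
print(dict(counts))
print(dict(locations))
```

Nothing here renders or interprets the document; it only pattern-matches on the raw bytes, which is the property that makes this style of analysis safe to run on suspicious files.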
It supports searching for strings in indirect objects and filtering streams through specific decoders, which aids detailed forensic examination. The input can be a PDF file, a zip file, or a URL.
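The string search can be pictured with the following Python sketch. It is an illustration under stated assumptions, not the tool's implementation: the sample bytes and the function name search_objects are hypothetical, stream data is skipped before matching, and the case-insensitive comparison is a choice made for this sketch.

```python
import re

# Hypothetical PDF fragment: object 2 mentions /OpenAction only inside
# its stream data, which the search below deliberately ignores.
SAMPLE = b"""1 0 obj
<< /Type /Catalog /OpenAction 3 0 R >>
endobj
2 0 obj
<< /Length 12 >>
stream
/OpenAction
endstream
endobj
3 0 obj
<< /S /JavaScript /JS 4 0 R >>
endobj
"""

OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj\b", re.S)
STREAM_RE = re.compile(rb"\bstream\r?\n.*?endstream", re.S)


def search_objects(data: bytes, needle: bytes) -> list:
    """Return the numbers of indirect objects whose non-stream part contains needle."""
    hits = []
    for num, _gen, body in OBJ_RE.findall(data):
        # Drop the stream payload so only the dictionary part is searched.
        non_stream = STREAM_RE.sub(b"", body)
        if needle.lower() in non_stream.lower():
            hits.append(int(num))
    return hits


print(search_objects(SAMPLE, b"/OpenAction"))
print(search_objects(SAMPLE, b"javascript"))
```

Returning object numbers rather than matched text mirrors the typical workflow: the analyst first locates the interesting objects, then inspects each one individually.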
How It Works
pdf-parser reads PDF files, zip files, or URLs and parses their structure to extract and count elements such as indirect objects, XREF tables, trailers, and dictionary keys (/Catalog, /Font, etc.). It lists object numbers and provides statistics without rendering the document. Optionally, it searches for strings in non-stream indirect objects or applies filters (FlateDecode, ASCIIHexDecode, ASCII85Decode, LZWDecode, RunLengthDecode) to stream objects.
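As a rough picture of what applying filters to a stream involves, the Python sketch below chains decoders from the standard library. It is a simplified illustration, not pdf-parser's code: the function names and the DECODERS table are assumptions, only three of the five listed filters are covered (LZWDecode and RunLengthDecode are left out for brevity), and decode parameters such as predictors are ignored.

```python
import base64
import binascii
import zlib


def ascii_hex_decode(data: bytes) -> bytes:
    """Decode ASCIIHexDecode data: hex digits, optional whitespace, '>' terminator."""
    hexdata = b"".join(data.split()).rstrip(b">")
    if len(hexdata) % 2:
        # An odd number of digits is padded with a trailing zero.
        hexdata += b"0"
    return binascii.unhexlify(hexdata)


def ascii85_decode(data: bytes) -> bytes:
    """Decode ASCII85Decode data, tolerating the optional '<~' and the '~>' marker."""
    data = data.strip()
    if data.startswith(b"<~"):
        data = data[2:]
    if data.endswith(b"~>"):
        data = data[:-2]
    return base64.a85decode(data)


# Filter name (as it appears in a stream's /Filter entry) -> decoder.
DECODERS = {
    "FlateDecode": zlib.decompress,
    "ASCIIHexDecode": ascii_hex_decode,
    "ASCII85Decode": ascii85_decode,
}


def apply_filters(data: bytes, filters: list) -> bytes:
    """Apply each named filter in order, as a /Filter array would specify."""
    for name in filters:
        data = DECODERS[name](data)
    return data


# Round trip: compress, then ASCII85-encode, then decode both layers.
payload = b"hello pdf"
encoded = base64.a85encode(zlib.compress(payload)) + b"~>"
decoded = apply_filters(encoded, ["ASCII85Decode", "FlateDecode"])
print(decoded)
```

Streams can declare several filters at once, so the decoders are applied in the listed order; a stream that was compressed and then ASCII85-encoded is decoded with ASCII85Decode first and FlateDecode second.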
Installation
sudo apt install pdf-parser
Flags
Examples
pdf-parser -a /usr/share/doc/texmf/fonts/lm/lm-info.pdf
pdf-parser -h
pdf-parser -m
pdf-parser --version
pdf-parser -s SEARCH pdf-file
pdf-parser -f pdf-file
pdf-parser pdf-file
pdf-parser zip-file