In: Vasconcelos, V., Domingues, I., Paredes, S. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2023. Lecture Notes in Computer Science, vol 14469. Springer, Cham.
ISSN/ISBN: 978-3-031-49018-7 (ISBN). DOI: 10.1007/978-3-031-49018-7_4
Abstract: The data exchange between different sectors of society has led to the development of electronic documents supported by different reading formats, notably the Portable Document Format (PDF). These documents have characteristics similar to those used in programming languages, allowing the incorporation of potentially malicious code, which makes them a vector for cyberattacks. Thus, detecting anomalies in digital documents, such as PDF files, has become crucial in several domains, such as finance, digital forensic analysis and law enforcement. Currently, detection methods are mostly based on machine learning and are characterised by being complex, slow and mainly inefficient in detecting zero-day attacks. This paper proposes a model based on Benford's Law (BL) to uncover manipulated PDF documents by analysing potential anomalies in the first digit extracted from the PDF document's characteristics. The proposed model was evaluated using the CIC Evasive PDFMAL2022 dataset, consisting of 1191 documents (278 benign and 918 malicious). To classify the PDF documents, based on BL, as malicious or benign, three statistical models were used in conjunction with the mean absolute deviation: the parametric Pearson model and the non-parametric Spearman and Cramér-von Mises models. The results show a maximum F1 score of 87.63% in detecting malicious documents using Pearson's model, demonstrating the suitability and effectiveness of applying Benford's Law in detecting anomalies in digital documents to maintain the accuracy and integrity of information and promote trust in systems and institutions.
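Illustrative note: the first-digit comparison summarised in the abstract can be sketched as follows. This is a minimal sketch, not the authors' implementation. It assumes the numeric characteristics of a document are already available as a list of numbers; which PDF characteristics the paper uses, and any decision threshold, are not stated in this record and are not reproduced here. All function names are hypothetical. Benford's Law gives the expected frequency of leading digit d as log10(1 + 1/d); the sketch compares observed frequencies against it using the mean absolute deviation and Pearson's correlation.

```python
import math


def benford_expected():
    """Expected leading-digit frequencies for d = 1..9 under Benford's Law."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(value):
    """Return the first non-zero digit of a number, or None if it has none."""
    for ch in str(abs(value)):
        if ch in "123456789":
            return int(ch)
    return None


def first_digit_frequencies(values):
    """Relative frequency of each leading digit 1..9 in `values`."""
    counts = [0] * 9
    for v in values:
        d = first_digit(v)
        if d is not None:
            counts[d - 1] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values with a non-zero leading digit")
    return [c / total for c in counts]


def mean_absolute_deviation(observed, expected):
    """Mean of |observed - expected| over the nine digit classes."""
    return sum(abs(o - e) for o, e in zip(observed, expected)) / len(expected)


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


# Hypothetical input: powers of two, a sequence known to follow Benford's Law closely.
# In the paper's setting these would be values extracted from a PDF document.
sample = [2 ** k for k in range(100)]
observed = first_digit_frequencies(sample)
expected = benford_expected()
mad = mean_absolute_deviation(observed, expected)
r = pearson(observed, expected)
print(f"MAD = {mad:.4f}, Pearson r = {r:.4f}")
```

A small deviation and a correlation close to 1 indicate agreement with Benford's Law. How such statistics are turned into a benign or malicious decision is specific to the paper and is not shown here.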
Bibtex:
@InProceedings{,
author="Fernandes, Pedro
and {\'O} Ciardhu{\'a}in, S{\'e}amus
and Antunes, M{\'a}rio",
editor="Vasconcelos, Ver{\'o}nica
and Domingues, In{\^e}s
and Paredes, Sim{\~a}o",
title="Uncovering Manipulated Files Using Mathematical Natural Laws",
booktitle="Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications",
year="2024",
publisher="Springer Nature Switzerland",
address="Cham",
pages="46--62",
isbn="978-3-031-49018-7"
}
Reference Type: Conference Paper
Subject Area(s): Image Processing