Nakatoh, T, Suzuki, T, Kamimasu, T and Hirokawa, S (2020)

Detection of Unnatural Parts of Statistical Data

Information Engineering Express 6(2), pp. 20 – 36.

ISSN/ISBN: Not available at this time. DOI: Not available at this time.

Abstract: Ensuring the authenticity of statistical data is important because such data are used for various decision-making tasks. However, in practical applications, several types of data alterations have been reported. Therefore, it is necessary to validate the accuracy of statistical data. Benford's law is a well-known method for detecting unnatural numerical data. According to Benford's law, the occurrence probability of the first significant digits follows a particular distribution. However, the unnatural parts of data cannot be accurately identified. In this study, we attempted to identify the unnatural parts of statistical data available in tabular format. A subset of the target data was specified using the row and column names that define each cell in the table or the words displayed in the table title. By measuring the divergence of the subsets, we identified the unnatural subsets. In this paper, we present the results of the identification of unnatural subsets using the agricultural data acquired from the China Statistical Yearbook.
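As a rough illustration of the idea in the abstract, the sketch below computes the Benford first-digit distribution, P(d) = log10(1 + 1/d), and scores subsets of a table by how far their observed first-digit frequencies diverge from it. The abstract does not say which divergence measure the authors use; Kullback-Leibler divergence is an assumption here, as are the function names and the representation of a table as a mapping from (row name, column name) to a cell value. It is not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

# Benford's law: probability that the first significant digit equals d (1..9).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Return the first significant digit of a finite, nonzero number."""
    if x == 0:
        raise ValueError("zero has no first significant digit")
    # Scientific notation places the first significant digit at index 0.
    # 15 decimals avoids rounding values such as 9999999.9 up to 1.0e+07.
    return int(f"{abs(x):.15e}"[0])


def benford_divergence(values):
    """KL divergence (an assumed measure) of observed first digits from Benford."""
    digits = [first_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("no nonzero values to analyse")
    n = len(digits)
    counts = Counter(digits)
    # Digits that never occur contribute 0 to the sum, so they are skipped.
    return sum((c / n) * math.log((c / n) / BENFORD[d]) for d, c in counts.items())


def divergence_by_label(cells):
    """Score each row-name and column-name subset of a table.

    `cells` is a hypothetical representation: {(row_name, col_name): value}.
    """
    groups = defaultdict(list)
    for (row, col), value in cells.items():
        if value == 0:
            continue  # zeros carry no first-digit information
        groups[("row", row)].append(value)
        groups[("col", col)].append(value)
    return {label: benford_divergence(vals) for label, vals in groups.items()}
```

Subsets with the largest scores would be the candidates for closer inspection. With few cells per subset the observed frequencies are noisy, so in practice a minimum subset size would probably be needed before comparing scores.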

@article{,
  author = {Tetsuya Nakatoh and Takahiko Suzuki and Tsukasa Kamimasu and Sachio Hirokawa},
  title = {Detection of Unnatural Parts of Statistical Data},
  year = {2020},
  journal = {Information Engineering Express},
  volume = {6},
  number = {2},
  pages = {20--36},
  doi = {},
  url = {},
}

Reference Type: Journal Article

Subject Area(s): Statistics