Suzuki, T, Kamimasu, T, Nakatoh, T and Hirokawa, S (2018)

Identification of Unnatural Subsets in Statistical Data

7th International Congress on Advanced Applied Informatics (IIAI-AAI), pp. 74-80.

ISSN/ISBN: Not available at this time. DOI: 10.1109/IIAI-AAI.2018.00024

Abstract: Benford's law is an observation about the frequency distribution of the first significant digits in natural numerical data. We can measure the unnaturalness of data by evaluating how far the frequency distribution of its leading digits diverges from Benford's distribution. However, this measure alone cannot pinpoint which part of the data is unnatural. In this study, we focus on the fact that statistical data are generally provided in tabular form. We specify a subset of the target data by using the row and column item names that define each cell of the table, or words appearing in the table title. By measuring the degree of divergence of each subset from Benford's distribution, we can identify unnatural subsets. We apply this method to agriculture-related data from the China Statistical Yearbook and succeed in identifying unnatural subsets.
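The abstract combines two ingredients: a score for how far a set of numbers departs from Benford's first-digit distribution, P(d) = log10(1 + 1/d) for d = 1..9, and subsets of table cells selected by row/column item names or title words. The sketch below wires these together under stated assumptions. Kullback-Leibler divergence stands in for whatever divergence measure the paper actually uses. Cells are stored in a hypothetical `(title, row, column) -> value` dictionary. The names `benford_divergence`, `subset_by_keyword`, `rank_subsets`, and the `min_size` threshold are illustrative, not the authors' code.

```python
import math
import random
from collections import Counter

# Benford's law: expected share of leading digit d is log10(1 + 1/d), d = 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Return the first significant digit of x, or None for 0, inf, or NaN."""
    x = abs(float(x))
    if x == 0 or not math.isfinite(x):
        return None
    # Scientific notation puts the leading significant digit first ("3.0...e-01"),
    # avoiding floating-point errors from dividing by powers of ten.
    return int(f"{x:.14e}"[0])


def benford_divergence(values):
    """KL divergence of the observed leading-digit distribution from Benford's.

    ASSUMPTION: KL divergence is used for illustration; the paper's own
    divergence measure may differ.
    """
    digits = [d for d in map(first_digit, values) if d is not None]
    if not digits:
        raise ValueError("no nonzero finite values")
    counts = Counter(digits)
    n = len(digits)
    # Only digits that actually occur contribute (0 * log 0 is taken as 0).
    return sum((c / n) * math.log((c / n) / BENFORD[d]) for d, c in counts.items())


def subset_by_keyword(cells, keyword):
    """Select cells whose table title contains `keyword`, or whose row or
    column item name equals it. `cells` maps (title, row, column) -> value,
    a hypothetical layout for flattened statistical tables."""
    return [value for (title, row, col), value in cells.items()
            if keyword in title or keyword in (row, col)]


def rank_subsets(cells, keywords, min_size=30):
    """Score each keyword-defined subset and sort, most unnatural first.

    ASSUMPTION: subsets smaller than `min_size` are skipped because their
    digit frequencies are too noisy to compare meaningfully.
    """
    scores = {}
    for kw in keywords:
        subset = subset_by_keyword(cells, kw)
        if len(subset) >= min_size:
            scores[kw] = benford_divergence(subset)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    random.seed(0)
    cells = {}
    # Synthetic data only: log-uniform values over five decades follow Benford's law.
    for i in range(200):
        cells[("Grain output by region", f"region{i}", "2016")] = 10 ** random.uniform(0, 5)
    # Synthetic data only: values spread uniformly over 100-999 do not.
    for i in range(200):
        cells[("Cash crop output by region", f"region{i}", "cotton")] = random.uniform(100, 999)
    for kw, score in rank_subsets(cells, ["Grain", "cotton"]):
        print(f"{kw}: {score:.3f}")
```

On the synthetic table above, the uniformly spread "cotton" column should score well above the Benford-like "Grain" table, which is the kind of contrast the method relies on. Real yearbook tables would need parsing into the assumed dictionary layout first.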

@INPROCEEDINGS{, author={Takahiko {Suzuki} and Tsukasa {Kamimasu} and Tetsuya {Nakatoh} and Sachio {Hirokawa}}, booktitle={2018 7th International Congress on Advanced Applied Informatics (IIAI-AAI)}, title={Identification of Unnatural Subsets in Statistical Data}, year={2018}, pages={74-80}, doi={10.1109/IIAI-AAI.2018.00024}, month={July},}

Reference Type: Conference Paper

Subject Area(s): Statistics