Real-World Data Difficulty Estimation with the Use of Entropy

Authors:	Przemysław Juszczuk, Jan Kozak, Grzegorz Dziczkowski, Szymon Głowania, Tomasz Jach, Barbara Probierz

Institution:	1. Systems Research Institute, Polish Academy of Sciences, Newelska 6, 01-447 Warsaw, Poland; 2. Faculty of Informatics and Communication, Department of Machine Learning, University of Economics in Katowice, 1 Maja 50, 40-287 Katowice, Poland (J.K., G.D., S.G., T.J., B.P.)

Abstract:	In the era of the Internet of Things and big data, we face a flood of information that must be managed. The complexity and volume of the data presented to decision-makers are enormous, and existing methods often fail to extract nonredundant information quickly. As a result, selecting the most satisfactory set of solutions is often difficult. This article investigates the possibility of using an entropy measure as an indicator of data difficulty. To do so, we focus on real-world data from various fields, including markets (the real estate market and financial markets), sports, fake news, and more. The problem is twofold. First, since we deal with unprocessed, inconsistent data, additional preprocessing is necessary. Second, we use the entropy-based measure to capture the nonredundant, noncorrelated core information in the data. The research uses well-known classification algorithms to evaluate the quality of the solutions obtained from the initially preprocessed data and from the information indicated by the entropy measure. Finally, the best 25% of attributes (according to the entropy measure) are selected, the whole classification procedure is performed once again, and the results are compared.
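The abstract does not define the exact entropy-based measure, nor whether a higher or lower score counts as "best". The following is only a minimal sketch of the general idea of ranking attributes with an entropy criterion and keeping the top 25%. It assumes discrete (already preprocessed) attribute values and swaps in information gain, H(D) − H(D | A), as the ranking score; the function names `entropy`, `conditional_entropy`, and `select_top_attributes`, the rounding-up of 25%, and the toy decision table are all hypothetical illustrations, not the authors' method.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conditional_entropy(attribute, decision):
    """H(decision | attribute): weighted decision entropy within each attribute value."""
    n = len(decision)
    groups = {}
    for a, d in zip(attribute, decision):
        groups.setdefault(a, []).append(d)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def select_top_attributes(table, decision, fraction=0.25):
    """Rank attributes by information gain (an assumed stand-in for the
    paper's entropy measure) and keep the top `fraction`, rounded up.

    table: dict mapping attribute name -> list of discrete values, one per object.
    decision: list of decision-class labels, one per object.
    """
    base = entropy(decision)
    gains = {name: base - conditional_entropy(col, decision) for name, col in table.items()}
    ranked = sorted(gains, key=gains.get, reverse=True)  # highest gain first
    k = max(1, math.ceil(fraction * len(ranked)))
    return ranked[:k], gains


# Hypothetical toy decision table: four attributes, four objects.
toy_table = {
    "a1": ["x", "x", "y", "y"],  # perfectly determines the decision
    "a2": ["p", "q", "p", "q"],  # independent of the decision
    "a3": ["m", "m", "m", "m"],  # constant, carries no information
    "a4": ["u", "u", "u", "v"],  # only partly informative
}
toy_decision = [0, 0, 1, 1]

top, gains = select_top_attributes(toy_table, toy_decision)
print(top)  # ['a1']
```

With four attributes, the top 25% is a single attribute, here `a1`. Information gain is only one possible entropy-based criterion; it is known to favour attributes with many distinct values, so a normalised variant may be preferable on real-world data.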

Keywords:	entropy measure; real-world data; preprocessing; decision table; classification
