Found 20 similar documents; search took 15 ms.
1.
Mohammad Azad Igor Chikalov Shahid Hussain Mikhail Moshkov Beata Zielosko 《Entropy (Basel, Switzerland)》2021,23(12)
Conventional decision trees use queries, each of which is based on a single attribute. In this study, we also examine decision trees that handle additional queries based on hypotheses; this kind of query is similar to the equivalence queries considered in exact learning. Earlier, we designed dynamic programming algorithms for computing the minimum depth and the minimum number of internal nodes in decision trees with hypotheses. The modifications of these algorithms considered in the present paper permit us to build decision trees with hypotheses that are optimal with respect to depth or to the number of internal nodes. We compare the length and coverage of decision rules extracted from optimal decision trees with hypotheses against those extracted from optimal conventional decision trees, in order to choose the ones preferable as a tool for representing information. To this end, we conduct computer experiments on various decision tables from the UCI Machine Learning Repository, as well as on decision tables for randomly generated Boolean functions. The collected results show that, in many cases, the decision rules derived from decision trees with hypotheses are better than the rules extracted from conventional decision trees.
2.
3.
This research explores several theoretical aspects of decision trees that give support to applications built on them. The first concerns the many splitting criteria available during tree growth: the splitting bias introduced by missing values and by variables with many possible values is studied, and the results show that the Gini index is superior to information entropy in that it suffers less from this bias. The second is that noise variables with more missing values have a better chance of being chosen for splits, while informative variables do not. The third concerns the effect of noise variables on the computational complexity of tree building: the results show that the increase in computational complexity is linear in the number of noise variables. Consequently, methods that decompose more information from the original data, even at the cost of increasing the variable dimension, can also be considered in real applications.
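As an illustration of the two splitting criteria compared above, the following sketch computes the Gini index and the Shannon entropy of a candidate binary split; the toy label arrays are invented for illustration only.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a label vector: 1 - sum_k p_k^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Shannon entropy of a label vector: -sum_k p_k * log2(p_k)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def split_score(y_left, y_right, impurity):
    """Weighted impurity of a binary split under a given criterion."""
    n = len(y_left) + len(y_right)
    return (len(y_left) / n) * impurity(y_left) + (len(y_right) / n) * impurity(y_right)

# Toy split: class labels of the two child nodes (illustrative data only).
left, right = np.array([0, 0, 1]), np.array([1, 1, 1, 0])
print("Gini-weighted impurity:   ", split_score(left, right, gini))
print("Entropy-weighted impurity:", split_score(left, right, entropy))
```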
4.
The string-matching paradigm is applied in virtually every branch of computer science, and in science more generally. The existence of a plethora of string-matching algorithms makes it hard to choose the best one for any particular case. Expressing, measuring, and testing algorithm efficiency is a challenging task with many potential pitfalls. Algorithm efficiency can be measured based on the usage of different resources. In software engineering, algorithmic productivity is a property of an algorithm execution identified with the computational resources the algorithm consumes; resource usage during execution can be determined, and for maximum efficiency the goal is to minimize it. We are guided by the fact that standard measures of algorithm efficiency, such as execution time, directly depend on the number of executed actions. Without addressing power consumption or memory usage, which also depend on the algorithm type and the techniques used in its development, we have developed a methodology that enables researchers to choose an efficient algorithm for a specific domain. The efficiency of string searching algorithms is usually assessed independently of the domain texts being searched. This paper presents the idea that algorithm efficiency depends on the properties of the searched string and of the texts being searched, accompanied by a theoretical analysis of the proposed approach. In the proposed methodology, algorithm efficiency is expressed through a character comparison count metric, a formal quantitative measure independent of algorithm implementation subtleties and computer platform differences. The model is developed for a particular problem domain using appropriate domain data (patterns and texts) and provides, for that domain, a ranking of algorithms according to the patterns' entropy. The proposed approach is limited to on-line exact string-matching problems and is based on the information entropy of the search pattern. Meticulous empirical testing illustrates the implementation of the methodology and supports its soundness.
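A minimal sketch of the two quantities the methodology rests on: the character-comparison count of a matcher and the Shannon entropy of the search pattern. The naive matcher below merely stands in for the algorithms being ranked, and the text and pattern are invented examples.

```python
import math
from collections import Counter

def naive_search_comparisons(text: str, pattern: str):
    """Naive exact matching; returns (match positions, character comparisons made)."""
    comparisons, matches = 0, []
    for i in range(len(text) - len(pattern) + 1):
        for j, ch in enumerate(pattern):
            comparisons += 1
            if text[i + j] != ch:
                break
        else:
            matches.append(i)
    return matches, comparisons

def pattern_entropy(pattern: str):
    """Shannon entropy (bits/char) of the pattern's character distribution."""
    counts = Counter(pattern)
    n = len(pattern)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

text, pattern = "abracadabra" * 100, "cadab"
hits, cmps = naive_search_comparisons(text, pattern)
print(f"{len(hits)} matches, {cmps} character comparisons, "
      f"pattern entropy = {pattern_entropy(pattern):.3f} bits/char")
```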
5.
The research concerns data collected in independent sets—more specifically, in local decision tables. A possible approach to managing these data is to build local classifiers based on each table individually. Many approaches to combining the final prediction results of independent classifiers can be found in the literature, but insufficient effort has been devoted to studying cooperation between tables and the formation of coalitions. Benefits were expected on two levels. First, an impact on classification quality: the ability to build combined classifiers for coalitions of tables should allow more generalized concepts to be learned, which in turn should affect the quality of classification of new objects. Second, combining tables into coalitions reduces computational complexity, because fewer classifiers are built. The paper proposes a new method for creating coalitions of local tables and generating an aggregated classifier for each coalition. Coalitions are generated by determining certain characteristics of attribute values occurring in the local tables and applying the Pawlak conflict analysis model. In this study, classification and regression trees with the Gini index are built from the aggregated table of each coalition. The system has a hierarchical structure: in the next stage, the decisions generated by the coalition classifiers are aggregated using majority voting. The classification quality of the proposed system was compared with an approach that does not use cooperation between local tables and coalition creation; in that baseline, the structure of the system is parallel and decision trees are built independently for the local tables. The paper shows that the proposed approach provides a significant improvement in classification quality and execution time. The Wilcoxon test confirmed that the differences in accuracy between the proposed method and the approach without coalitions are significant, at p = 0.005. The average accuracy obtained for the proposed approach and for the approach without coalitions is 0.847 and 0.812, respectively, so the difference is quite large. Moreover, the algorithm implementing the proposed approach ran up to 21 times faster than the algorithm implementing the approach without coalitions.
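A hedged sketch of the two classification stages described above, assuming the coalition partition has already been produced by the Pawlak conflict analysis step (which is omitted here); labels are assumed to be non-negative integers, and all data in the usage example are invented.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_coalition_trees(coalitions, X_parts, y_parts):
    """Train one CART (Gini) tree per coalition of local tables.

    coalitions: list of lists of local-table indices (assumed given here;
    the paper derives them via Pawlak conflict analysis).
    X_parts / y_parts: features and labels of each local decision table.
    """
    trees = []
    for group in coalitions:
        X = np.vstack([X_parts[i] for i in group])        # aggregated table
        y = np.concatenate([y_parts[i] for i in group])
        trees.append(DecisionTreeClassifier(criterion="gini").fit(X, y))
    return trees

def majority_vote(trees, X_new):
    """Second stage: aggregate coalition classifiers by majority voting."""
    votes = np.stack([t.predict(X_new) for t in trees])    # (n_trees, n_samples)
    return np.apply_along_axis(
        lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)

# Illustrative usage with four random local tables grouped into two coalitions.
rng = np.random.default_rng(0)
X_parts = [rng.normal(size=(30, 4)) for _ in range(4)]
y_parts = [rng.integers(0, 2, size=30) for _ in range(4)]
trees = train_coalition_trees([[0, 1], [2, 3]], X_parts, y_parts)
print(majority_vote(trees, rng.normal(size=(5, 4))))
```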
6.
Rebecca M. Kuiper 《Entropy (Basel, Switzerland)》2022,24(11)
Meta-analysis techniques allow researchers to aggregate effect sizes—such as standardized mean differences, correlations, or odds ratios—of different studies. This leads to overall effect-size estimates and their confidence intervals. Additionally, researchers can aim for theory development or theory evaluation; that is, they may not only be interested in these overall estimates but also in a specific ordering or size of them, which then reflects a theory. Researchers may have expectations regarding the ordering of standardized mean differences or about the (ranges of) sizes of an odds ratio or Hedges' g. Such theory-based hypotheses most probably contain inequality constraints and can be evaluated with an AIC-type (Akaike's information criterion type) confirmatory model selection criterion called the generalized order-restricted information criterion (GORICA). This paper introduces and illustrates how the GORICA can be applied to meta-analyzed estimates. Additionally, it compares the use of the GORICA to that of classical null hypothesis testing and the AIC, that is, the use of theory-based hypotheses versus null hypotheses. By using the GORICA, researchers from all types of fields (e.g., psychology, sociology, political science, biomedical science, and medicine) can quantify the support for theory-based hypotheses specified a priori. This leads to increased statistical power, because of (i) the use of theory-based hypotheses (cf. one-sided vs. two-sided testing) and (ii) the use of meta-analyzed results (which are based on multiple studies and thus increase the combined sample size). The quantification of support and the power increase aid in, for instance, evaluating and developing theories and, therewith, developing evidence-based treatments and policy.
7.
Wojciech Wieczorek Jan Kozak Łukasz Strąk Arkadiusz Nowakowski 《Entropy (Basel, Switzerland)》2021,23(12)
A new two-stage method for the construction of a decision tree is developed. The first stage is based on the definition of a minimum query set, which is the smallest set of attribute-value pairs for which any two objects can be distinguished. To obtain this set, an appropriate linear programming model is proposed. The queries from this set are the building blocks of the second stage, in which we try to find an optimal decision tree using a genetic algorithm. In a series of experiments, we show that for some databases, our approach should be considered an alternative to classical methods (CART, C4.5) and other heuristic approaches in terms of classification quality.
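The abstract does not spell out the paper's linear programming model, but a plausible 0/1 set-cover-style formulation of the minimum query set can be sketched as follows (using the PuLP library; the tiny table is invented for illustration).

```python
import itertools
import pulp  # pip install pulp

def minimum_query_set(objects):
    """One plausible 0/1 model for the smallest set of attribute-value pairs
    that distinguishes every pair of objects (the paper's exact formulation
    is not given in the abstract)."""
    queries = {(a, row[a]) for row in objects for a in row}
    x = {q: pulp.LpVariable(f"q_{i}", cat="Binary") for i, q in enumerate(queries)}

    prob = pulp.LpProblem("min_query_set", pulp.LpMinimize)
    prob += pulp.lpSum(x.values())                      # minimize the number of queries
    for u, v in itertools.combinations(objects, 2):
        # A query (attribute, value) distinguishes u and v if exactly one satisfies it.
        distinguishing = [q for q in queries
                          if (u.get(q[0]) == q[1]) != (v.get(q[0]) == q[1])]
        prob += pulp.lpSum(x[q] for q in distinguishing) >= 1
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [q for q in queries if x[q].value() == 1]

# Tiny illustrative decision table (attribute -> value dictionaries).
table = [{"outlook": "sunny", "windy": "no"},
         {"outlook": "rain",  "windy": "no"},
         {"outlook": "rain",  "windy": "yes"}]
print(minimum_query_set(table))
```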
8.
Aziz Khan Shougi S. Abosuliman Saleem Abdullah Muhammad Ayaz 《Entropy (Basel, Switzerland)》2021,23(4)
Spherical hesitant fuzzy sets have recently become more popular in various fields. They were proposed as a generalization of picture hesitant fuzzy sets and Pythagorean hesitant fuzzy sets in order to deal with uncertain and fuzzy information. Aggregation techniques are among the most useful tools for fusing such information, with crucial application areas such as decision-making, data mining, medical diagnosis, and pattern recognition. Keeping in view the importance of the logarithmic function and of aggregation operators, we propose a novel algorithm to tackle multi-attribute decision-making (MADM) problems. First, novel logarithmic operational laws are developed based on the logarithmic, t-norm, and t-conorm functions. Using these operational laws, we develop a list of logarithmic spherical hesitant fuzzy weighted averaging/geometric aggregation operators to aggregate spherical hesitant fuzzy information. Furthermore, we develop a spherical hesitant fuzzy entropy to determine the unknown attribute weights. Finally, design principles for spherical hesitant fuzzy decision-making are developed, and a practical case study of hotel recommendation based on online consumer reviews is used to illustrate the validity and superiority of the presented approach. In addition, a validity test is conducted to reveal the advantages and effectiveness of the developed approach. The results indicate that the proposed method is suitable and effective for the decision process of evaluating the best alternative.
9.
The Korean river design standards set general design standards for rivers and river-related projects in Korea, systematizing the technologies and methods involved in such projects. They include measurement methods for quantities needed in river design, but no information on shear stress. Shear stress is one of the most important hydraulic factors in river design and operation, especially for artificial channel design; it is calculated from the frictional force caused by viscosity and fluctuating fluid velocity. Current methods are based on past calculations, but factors such as boundary shear stress or the energy gradient are difficult to measure or estimate in practice, and the point velocity throughout the entire cross-section is needed to calculate the velocity gradient. In other words, the current Korean river design standards use tractive force and critical tractive force instead of shear stress, because shear stress is harder to calculate with the current methods. However, it is difficult to obtain an exact value due to the limitations of the formula for the tractive force, and in practice the tractive force relies on an empirically identified base value. The present study therefore focuses on modeling the shear-stress distribution in open-channel turbulent flow using entropy theory, and it suggests a shear-stress distribution formula that can easily be used in practice after calculating the river-specific factor T. The tractive force and critical tractive force in the Korean river design standards should be replaced by the shear stress obtained with the proposed shear-stress distribution method. The shear-stress distribution model is tested using a wide range of forty-two experimental runs collected from the literature, and an error analysis is performed to further evaluate the accuracy of the proposed model. The results reveal a correlation coefficient of approximately 0.95–0.99, indicating that the proposed method can estimate the shear-stress distribution accurately. On this basis, the shear-stress distributions obtained after calculating the river-specific factors show correlation coefficients of about 0.86 to 0.98, which suggests that the equation can be applied in practice.
10.
Mostafa Rostaghi Mohammad Mahdi Khatibi Mohammad Reza Ashory Hamed Azami 《Entropy (Basel, Switzerland)》2021,23(11)
Bearing vibration signals typically have nonlinear components due to interaction and coupling effects, friction, damping, and nonlinear stiffness. Bearing faults affect signal complexity at various scales, so measuring signal complexity at different scales is helpful for diagnosing bearing faults. Numerous studies have investigated multiscale algorithms; nevertheless, multiscale algorithms that use only the first moment lose important complexity information. Accordingly, generalized multiscale algorithms have recently been introduced. The present research examined the use of refined composite generalized multiscale dispersion entropy (RCGMDispEn) based on the second moment (variance) and the third moment (skewness), along with refined composite multiscale dispersion entropy (RCMDispEn), in bearing fault diagnosis. Moreover, a multiclass FCM-ANFIS, which is a combination of adaptive network-based fuzzy inference systems (ANFIS), was developed to improve the efficiency of rotating machinery fault classification. According to the results, it is recommended that generalized multiscale algorithms based on variance and skewness be examined for diagnosis alongside multiscale algorithms, and that they be used together to improve the results. The simultaneous use of the multiscale algorithm and the generalized multiscale algorithms improved the results on all three real datasets used in this study.
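To make the generalized multiscale idea concrete, the sketch below shows only the coarse-graining step, replacing the usual first-moment (mean) statistic with the variance or skewness of each non-overlapping window; the dispersion-entropy computation and the refined-composite averaging that follow in RCGMDispEn are omitted, and the random signal is merely a stand-in for a bearing vibration record.

```python
import numpy as np
from scipy.stats import skew

def coarse_grain(x, scale, moment="mean"):
    """Coarse-grain a signal at the given scale using the mean (classical
    multiscale), variance (second moment), or skewness (third moment)."""
    n = len(x) // scale
    windows = np.asarray(x[: n * scale]).reshape(n, scale)
    if moment == "mean":
        return windows.mean(axis=1)
    if moment == "variance":
        return windows.var(axis=1)
    if moment == "skewness":
        return skew(windows, axis=1)
    raise ValueError(moment)

rng = np.random.default_rng(0)
signal = rng.normal(size=2048)                 # stand-in for a vibration signal
for m in ("mean", "variance", "skewness"):
    print(m, coarse_grain(signal, scale=5, moment=m)[:3])
```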
11.
Sample entropy, an approximation of the Kolmogorov entropy, was proposed to characterize the complexity of a time series. It is essentially defined as -ln(A/B), where B denotes the number of matched template pairs of length m and A denotes the number of matched template pairs of length m+1, for a predetermined positive integer m. It has been widely used to analyze physiological signals. As computing sample entropy is time consuming, the box-assisted, bucket-assisted, x-sort, assisted sliding box, and kd-tree-based algorithms were proposed to accelerate its computation. The computational complexity of these algorithms still grows with N, the length of the time series analyzed, so when N is big their computational costs are large. We propose a super fast algorithm to estimate sample entropy based on Monte Carlo methods, with computational costs independent of N and with the estimate converging to the exact sample entropy as the number of repeated experiments becomes large. The convergence rate of the algorithm is also established. Numerical experiments are performed on electrocardiogram time series, electroencephalogram time series, cardiac inter-beat time series, mechanical vibration signals (MVS), meteorological data (MD), and noise. The numerical results show that the proposed algorithm can achieve a 100–1000 times speedup compared to the kd-tree and assisted sliding box algorithms while providing satisfactory approximation accuracy.
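A hedged sketch of the Monte Carlo idea: sample random template pairs, estimate the matching fractions at lengths m and m+1, and average -ln(A/B) over repeated draws. The cost is set by the number of sampled pairs rather than by N; this illustrates the principle only and is not the paper's exact sampling scheme or convergence analysis.

```python
import numpy as np

def sampen_monte_carlo(x, m=2, r=0.2, n_pairs=20000, n_repeats=10, seed=0):
    """Monte Carlo estimate of sample entropy from randomly sampled template
    pairs (a simplified sketch, not the published algorithm)."""
    x = np.asarray(x, dtype=float)
    r = r * x.std()                        # common convention: r as a fraction of SD
    rng = np.random.default_rng(seed)
    last = len(x) - m - 1                  # last start index with an (m+1)-template
    estimates = []
    for _ in range(n_repeats):
        i = rng.integers(0, last, size=n_pairs)
        j = rng.integers(0, last, size=n_pairs)
        keep = i != j
        i, j = i[keep], j[keep]
        d_m = np.max(np.abs(np.stack([x[i + k] - x[j + k] for k in range(m)])), axis=0)
        d_m1 = np.maximum(d_m, np.abs(x[i + m] - x[j + m]))
        b = np.count_nonzero(d_m < r)      # matches at length m
        a = np.count_nonzero(d_m1 < r)     # matches at length m + 1
        if a > 0 and b > 0:
            estimates.append(-np.log(a / b))
    return float(np.mean(estimates))

print(sampen_monte_carlo(np.random.default_rng(1).normal(size=5000)))
```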
12.
Liuhai Wang Xin Du Bo Jiang Weifeng Pan Hua Ming Dongsheng Liu 《Entropy (Basel, Switzerland)》2022,24(5)
Software maintenance is indispensable in the software development process. Developers need to spend a lot of time and energy understanding the software they maintain, which increases the difficulty of maintenance. Understanding the software through its key classes is a feasible approach, and identifying these key classes can help developers understand the software more quickly. Existing key class identification techniques mainly use static analysis to extract software structure information. Such structure information may contain redundant relationships that do not exist when the software runs and ignores the actual number of interactions between classes. In this paper, we propose an approach based on dynamic analysis and entropy-based metrics to identify key classes in Java GUI software systems, called KEADA (identifying KEy clAsses based on Dynamic Analysis and entropy-based metrics). First, KEADA extracts software structure information by recording the calling relationships between classes while the software runs; this structure information takes the actual interaction of classes into account. Second, KEADA represents the structure information as a weighted directed network and calculates the importance of each node using an entropy-based metric, the One-order Structural Entropy (OSE). Third, KEADA ranks classes in descending order of their OSE values and selects a small number of classes as key class candidates. To verify the effectiveness of our approach, we conducted experiments on three Java GUI software systems and compared it with seven state-of-the-art approaches. We used the Friedman test to evaluate all approaches, and the results demonstrate that our approach performs best on all software systems.
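A loosely hedged sketch of the ranking pipeline only: build a weighted directed class-interaction network from recorded call events and rank nodes by an entropy of their outgoing edge-weight distribution. This simple score is a stand-in and is not the paper's One-order Structural Entropy, whose exact definition is given in the paper; the call records below are invented.

```python
import math
from collections import Counter
import networkx as nx

def build_call_network(call_events):
    """call_events: iterable of (caller_class, callee_class) records captured
    while the software runs; edge weights count the actual interactions."""
    weights = Counter(call_events)
    g = nx.DiGraph()
    for (u, v), w in weights.items():
        g.add_edge(u, v, weight=w)
    return g

def out_weight_entropy(g, node):
    """Shannon entropy of a node's outgoing edge-weight distribution
    (a simple stand-in for the paper's OSE metric)."""
    ws = [d["weight"] for _, _, d in g.out_edges(node, data=True)]
    total = sum(ws)
    if total == 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total) for w in ws)

events = [("Editor", "Document"), ("Editor", "Toolbar"), ("Editor", "Document"),
          ("Toolbar", "Document"), ("Document", "Storage")]
g = build_call_network(events)
ranking = sorted(g.nodes, key=lambda n: out_weight_entropy(g, n), reverse=True)
print("key class candidates:", ranking[:2])
```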
13.
Lina Zhao Jianqing Li Xiangkui Wan Shoushui Wei Chengyu Liu 《Entropy (Basel, Switzerland)》2021,23(9)
Entropy algorithms are important nonlinear methods for cardiovascular disease detection due to their power in analyzing short-term time series. In a previous study, we proposed a new entropy-based atrial fibrillation (AF) detector, EntropyAF, which showed high classification accuracy in identifying AF and non-AF rhythms. As a variation of entropy measures, EntropyAF has two parameters that need to be initialized before the calculation: (1) the tolerance threshold r and (2) the similarity weight n. In this study, a comprehensive analysis of the determination of these two parameters is presented, aiming to achieve high detection accuracy for AF events. Data were taken from the MIT-BIH AF database. RR interval recordings were segmented using a 30-beat time window. The parameters r and n were initialized at relatively small values and then gradually increased, and the best parameter combination was finally determined by grid search. AUC (area under the curve) values from the receiver operating characteristic (ROC) curve were compared under different combinations of the parameters r and n, and the results demonstrated that the selection of these two parameters plays an important role in AF/non-AF classification. Small values of r and n lead to better detection accuracy than other choices. The best AUC value for AF detection was 98.15%, and the corresponding parameter combinations for EntropyAF were as follows: r = 0.01 and n = 0.0625, 0.125, 0.25, or 0.5; r = 0.05 and n = 0.0625, 0.125, or 0.25; and r = 0.10 and n = 0.0625 or 0.125.
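A sketch of the grid search described above, assuming a hypothetical entropy_af() function that returns the EntropyAF value of a 30-beat RR segment (its actual definition is in the cited paper); scikit-learn's roc_auc_score stands in for the AUC computation, and the grid values are illustrative.

```python
from itertools import product
import numpy as np
from sklearn.metrics import roc_auc_score

def entropy_af(rr_segment, r, n):
    """Placeholder for the EntropyAF feature of a 30-beat RR segment;
    the actual definition is given in the cited paper."""
    raise NotImplementedError

def grid_search_r_n(segments, labels,
                    r_grid=(0.01, 0.05, 0.10, 0.15, 0.20),
                    n_grid=(0.0625, 0.125, 0.25, 0.5, 1.0)):
    """Evaluate every (r, n) combination by the AUC of the resulting
    EntropyAF values against the AF/non-AF labels, as in the abstract."""
    best = (None, -np.inf)
    for r, n in product(r_grid, n_grid):
        scores = [entropy_af(seg, r, n) for seg in segments]
        auc = roc_auc_score(labels, scores)
        if auc > best[1]:
            best = ((r, n), auc)
    return best   # ((r, n), best AUC)
```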
14.
Emma Lhermitte Mirvana Hilal Ryan Furlong Vincent O'Brien Anne Humeau-Heurtier 《Entropy (Basel, Switzerland)》2022,24(11)
In the domain of computer vision, entropy—defined as a measure of irregularity—has been proposed as an effective method for analyzing the texture of images. Several studies have shown that, with specific parameter tuning, entropy-based approaches achieve high accuracy in terms of classification results for texture images, when associated with machine learning classifiers. However, few entropy measures have been extended to studying color images. Moreover, the literature is missing comparative analyses of entropy-based and modern deep learning-based classification methods for RGB color images. In order to address this matter, we first propose a new entropy-based measure for RGB images based on a multivariate approach. This multivariate approach is a bi-dimensional extension of the methods that have been successfully applied to multivariate signals (unidimensional data). Then, we compare the classification results of this new approach with those obtained from several deep learning methods. The entropy-based method for RGB image classification that we propose leads to promising results. In future studies, the measure could be extended to study other color spaces as well.
15.
Jazmín S. De la Cruz-García Juan Bory-Reyes Aldo Ramirez-Arellano 《Entropy (Basel, Switzerland)》2022,24(5)
Decision trees are decision support data mining tools that create, as the name suggests, a tree-like model. The classical C4.5 decision tree, based on Shannon entropy, is a simple algorithm that calculates the gain ratio and then splits the attributes according to this entropy measure. Tsallis and Rényi entropies (instead of Shannon entropy) can be employed to generate a decision tree with better results. In practice, the entropic index parameter of these entropies is tuned to outperform the classical decision trees. However, this tuning is carried out by testing a range of values for a given database, which is time-consuming and unfeasible for massive data. This paper introduces a decision tree based on a two-parameter fractional Tsallis entropy. We propose a constructionist approach to representing databases as complex networks, which enables an efficient computation of the parameters of this entropy using the box-covering algorithm and renormalization of the complex network. The experimental results support the conclusion that the two-parameter fractional Tsallis entropy is a more sensitive measure than its parametric Rényi, Tsallis, and Gini-index precedents for a decision tree classifier.
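For reference, the classical single-parameter forms of the entropies mentioned above can be computed as follows; the paper's contribution, the two-parameter fractional Tsallis entropy, is a further generalization not shown here, and the class distribution is a toy example.

```python
import numpy as np

def shannon(p):
    """Shannon entropy (bits) of a probability vector."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def renyi(p, alpha):
    """Rényi entropy: log2(sum p_k^alpha) / (1 - alpha), alpha != 1."""
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

def tsallis(p, q):
    """Tsallis entropy: (1 - sum p_k^q) / (q - 1), q != 1."""
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

p = np.array([0.5, 0.3, 0.2])             # toy class distribution at a tree node
print(shannon(p), renyi(p, alpha=2.0), tsallis(p, q=2.0))
```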
16.
Mingwei Huang Zijing Zhang Jiaheng Xie Jiahuan Li Yuan Zhao 《Entropy (Basel, Switzerland)》2021,23(11)
Photon counting lidar for long-range detection faces the problem of declining ranging performance caused by background noise. Current anti-noise methods are not robust enough in the case of a weak signal and strong background noise, resulting in large ranging errors. In this work, based on the characteristics of the uncertainty of the echo signal and noise in photon counting lidar, an entropy-based anti-noise method is proposed to reduce the ranging error under high background noise. First, the photon counting entropy, which serves as the feature distinguishing signal from noise, is defined to quantify the uncertainty of the fluctuation among photon events registered by the Geiger-mode avalanche photodiode. Then, the photon counting entropy is combined with a windowing operation to enhance the difference between signal and noise, so as to mitigate the effect of background noise and estimate the time of flight of the laser pulses. Simulation and experimental analyses show that the proposed method has good anti-noise performance, and the experimental results demonstrate that it effectively mitigates the effect of background noise and reduces the ranging error despite high background noise.
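The paper defines its own photon counting entropy over photon events from the Geiger-mode detector; as a loose illustration of the windowed-entropy idea only, the sketch below computes the Shannon entropy of the normalized counts inside a sliding window of a photon-arrival histogram. A window containing a concentrated echo yields lower entropy than one containing only uniform background, so the minimum-entropy window marks a candidate time-of-flight bin. All data are simulated toy values.

```python
import numpy as np

def windowed_histogram_entropy(counts, window):
    """Shannon entropy of the normalized photon counts inside each
    sliding window, assigned to the window's center bin."""
    ent = np.full(len(counts), np.nan)
    for start in range(len(counts) - window + 1):
        w = counts[start:start + window].astype(float)
        if w.sum() == 0:
            continue
        p = w / w.sum()
        p = p[p > 0]
        ent[start + window // 2] = -np.sum(p * np.log2(p))
    return ent

# Toy histogram: uniform background plus a narrow echo around bin 120.
rng = np.random.default_rng(0)
hist = rng.poisson(2.0, size=300)
hist[118:123] += rng.poisson(15, size=5)
ent = windowed_histogram_entropy(hist, window=15)
print("candidate time-of-flight bin:", np.nanargmin(ent))
```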
17.
The uncertainty of information is an important issue that must be faced when dealing with decision-making problems, and randomness and fuzziness are the two most common types of uncertainty. In this paper, we propose a multicriteria group decision-making method based on intuitionistic normal clouds and cloud distance entropy. First, a backward cloud generation algorithm for intuitionistic normal clouds is designed to transform the intuitionistic fuzzy decision information given by all experts into an intuitionistic normal cloud matrix, avoiding loss and distortion of information. Second, the distance measurement of the cloud model is introduced into information entropy theory, and the concept of cloud distance entropy is proposed. Then, a distance measure for intuitionistic normal clouds based on numerical features is defined and its properties are discussed, on the basis of which a method for determining criterion weights under intuitionistic normal cloud information is proposed. In addition, the VIKOR method, which integrates group utility and individual regret, is extended to the intuitionistic normal cloud environment, yielding the ranking of the alternatives. Finally, the effectiveness and practicality of the proposed method are demonstrated by two numerical examples.
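For orientation, the sketch below implements the classical crisp VIKOR ranking (group utility S, individual regret R, compromise index Q); the paper's extension replaces the crisp decision matrix with intuitionistic normal cloud information and entropy-derived weights, which are not reproduced here. The matrix and weights are invented examples, and all criteria are assumed to be benefit criteria.

```python
import numpy as np

def vikor(F, w, v=0.5):
    """Classical crisp VIKOR. F: (alternatives x criteria) matrix of benefit
    criteria, w: criterion weights summing to 1, v: strategy weight."""
    f_best, f_worst = F.max(axis=0), F.min(axis=0)
    norm = (f_best - F) / (f_best - f_worst)        # normalized regret per criterion
    S = (w * norm).sum(axis=1)                       # group utility
    R = (w * norm).max(axis=1)                       # individual regret
    Q = (v * (S - S.min()) / (S.max() - S.min())
         + (1 - v) * (R - R.min()) / (R.max() - R.min()))
    return S, R, Q, np.argsort(Q)                    # lower Q = better compromise

F = np.array([[7.0, 8.0, 6.5],
              [6.0, 9.0, 7.0],
              [8.0, 6.5, 7.5]])
w = np.array([0.4, 0.35, 0.25])
S, R, Q, ranking = vikor(F, w)
print("ranking (best first):", ranking)
```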
18.
Yuta Nakahara Shota Saito Akira Kamatsuka Toshiyasu Matsushima 《Entropy (Basel, Switzerland)》2022,24(3)
The recursive and hierarchical structure of full rooted trees is applicable to statistical models in various fields, such as data compression, image processing, and machine learning. In most of these cases, the full rooted tree is not a random variable; as such, model selection to avoid overfitting is problematic. One method to solve this problem is to assume a prior distribution on the full rooted trees. This enables the optimal model selection based on Bayes decision theory. For example, by assigning a low prior probability to a complex model, the maximum a posteriori estimator prevents the selection of the complex one. Furthermore, we can average all the models weighted by their posteriors. In this paper, we propose a probability distribution on a set of full rooted trees. Its parametric representation is suitable for calculating the properties of our distribution using recursive functions, such as the mode, expectation, and posterior distribution. Although such distributions have been proposed in previous studies, they are only applicable to specific applications. Therefore, we extract their mathematically essential components and derive new generalized methods to calculate the expectation, posterior distribution, etc.
19.
Zhe Li Yahui Cui Longlong Li Runlin Chen Liang Dong Juan Du 《Entropy (Basel, Switzerland)》2022,24(3)
In order to detect incipient faults of rolling bearings and to effectively identify fault characteristics, an enhanced method based on amplitude-aware permutation entropy (AAPE), named hierarchical amplitude-aware permutation entropy (HAAPE), is proposed in this paper for analyzing the dynamic changes of complex time series. First, hierarchical analysis and AAPE are combined to extract multilevel fault information from both the low-frequency and high-frequency components of the abnormal bearing vibration signal. Second, the experimental analysis shows that HAAPE is sensitive to early failures of rolling bearings, which makes it suitable for evaluating the performance degradation of a bearing over its run-to-failure life cycle. Finally, a fault feature selection strategy based on HAAPE is put forward to select the bearing fault characteristics after applying the least common multiple in singular value decomposition (LCM-SVD) method to the fault vibration signal. Moreover, several other entropy-based methods are introduced for a comparative analysis of the experimental data, and the results demonstrate that HAAPE can extract fault features more effectively and with higher accuracy.
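A hedged sketch of the two building blocks: an averaging/differencing decomposition of the kind used in hierarchical entropy analysis to obtain low- and high-frequency components, and plain permutation entropy as a simplified stand-in for AAPE (which additionally weights each ordinal pattern by amplitude information). Parameter values and the test signal are illustrative.

```python
import math
from itertools import permutations
import numpy as np

def permutation_entropy(x, m=3, tau=1):
    """Normalized plain permutation entropy; a stand-in for AAPE, which
    additionally weights each ordinal pattern by amplitude information."""
    patterns = {p: 0 for p in permutations(range(m))}
    for i in range(len(x) - (m - 1) * tau):
        window = x[i:i + m * tau:tau]
        patterns[tuple(np.argsort(window).tolist())] += 1
    total = sum(patterns.values())
    probs = [c / total for c in patterns.values() if c > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(math.factorial(m))

def hierarchical_components(x, levels=2):
    """Averaging/differencing decomposition: each level splits a node into a
    low-frequency (average) and a high-frequency (difference) child."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        nxt = []
        for s in nodes:
            even, odd = s[0::2][: len(s) // 2], s[1::2][: len(s) // 2]
            nxt += [(even + odd) / 2.0, (even - odd) / 2.0]
        nodes = nxt
    return nodes

sig = np.random.default_rng(0).normal(size=1024)   # stand-in for a vibration signal
print([round(permutation_entropy(c), 3) for c in hierarchical_components(sig)])
```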
20.
Krzysztof Okarma Wojciech Chlewicki Mateusz Kopytek Beata Marciniak Vladimir Lukin 《Entropy (Basel, Switzerland)》2021,23(11)
Quality assessment of stitched images is an important element of many virtual reality and remote sensing applications, where panoramic images may be used as a background as well as for navigation purposes. The quality of stitched images may be decreased by several factors, including geometric distortions, ghosting, blurring, and color distortions. Nevertheless, the specificity of such distortions differs from those typical of general-purpose image quality assessment, so the need to develop new objective image quality metrics for this type of emerging application becomes obvious. The method proposed in the paper is based on combining features used in some recently proposed metrics with the results of local and global image entropy analysis. The results obtained with the proposed combined metric have been verified using the ISIQA database, containing 264 stitched images of 26 scenes together with the respective subjective Mean Opinion Scores, leading to a significant increase in its correlation with subjective evaluation results.
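A minimal sketch of the local and global entropy features mentioned above for an 8-bit grayscale image; the block size and the random test image are assumptions, and the combination with the other stitching-quality features is not shown.

```python
import numpy as np

def shannon_entropy(gray, bins=256):
    """Shannon entropy (bits) of the intensity histogram of an 8-bit block."""
    hist, _ = np.histogram(gray, bins=bins, range=(0, 256))
    p = hist[hist > 0] / hist.sum()
    return -np.sum(p * np.log2(p))

def global_and_local_entropy(gray, block=32):
    """Global entropy of the whole image and the mean entropy of
    non-overlapping local blocks."""
    h, w = gray.shape
    local = [shannon_entropy(gray[r:r + block, c:c + block])
             for r in range(0, h - block + 1, block)
             for c in range(0, w - block + 1, block)]
    return shannon_entropy(gray), float(np.mean(local))

img = np.random.default_rng(0).integers(0, 256, size=(256, 384), dtype=np.uint8)
print(global_and_local_entropy(img))
```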