Similar Documents
20 similar documents found (search time: 31 ms).
1.
There is a growing interest in applying mathematical theories and methods from topology, computational geometry, differential equations, fluid dynamics, quantum statistics, etc. to describe and to analyze scientific regularities of diverse, massive, complex, nonlinear, and fast-changing data accumulated continuously around the world and in discovering and revealing valid, insightful, and valuable knowledge that the data imply. With increasingly solid mathematical foundations, various methods and techniques have been studied and developed for data mining, modeling, and processing, and knowledge representation, organization, and verification; different systems and mechanisms have been designed to perform data-intensive tasks in many application fields for classification, prediction, recommendation, ranking, filtering, etc. This special focus of Mathematics in Computer Science is organized to stimulate original research on the interaction of mathematics with data and knowledge, in particular the exploration of new mathematical theories and methodologies for data modeling and analysis and knowledge discovery and management, the study of mathematical models of big data and complex knowledge, and the development of novel solutions and strategies to enhance the performance of existing systems and mechanisms for data and knowledge processing. The present foreword provides a short review of some key ideas and techniques on how mathematics interacts with data and knowledge, together with a few selected research directions and problems and a brief introduction to the four papers published in this special focus.

2.
As an important application of remote sensing technology, remote sensing image classification plays a significant role in the development of the field. Given the characteristics of remote sensing image data, the BP (backpropagation) neural network is currently the main nonlinear method used, but BP networks are sensitive to the initial weights and thresholds, tend to fall into local minima, and converge slowly. To improve classification accuracy, this paper therefore proposes an MEA-BP model for remote sensing image classification: the mind evolutionary algorithm (MEA) first replaces the standard BP initialization to search for good initial weights and thresholds, and an improved BP algorithm then refines the optimized weights and thresholds further. A BP neural network classification model based on the mind evolutionary algorithm is thus built and applied to remote sensing image data. Simulation results show that the new model effectively improves classification accuracy, providing a new approach to remote sensing image classification that is of broad research value.
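To make the two-stage idea concrete, here is a minimal Python sketch in which a population of random initializations, scored by a short training run, stands in for the mind evolutionary algorithm, after which full backpropagation training refines the winner. The data, network size, and the use of random seeds as the "population" are illustrative assumptions, not the paper's setup.

```python
# Two-stage sketch: cheap search over initializations, then full BP training.
import warnings
import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

warnings.filterwarnings("ignore", category=ConvergenceWarning)

X, y = make_classification(n_samples=600, n_features=8, n_classes=3,
                           n_informative=5, random_state=0)  # toy "pixels"
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Stage 1: score a population of initial weight settings with a short run
# (a crude stand-in for the mind evolutionary algorithm's global search).
def short_run_score(seed):
    net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=15,
                        random_state=seed)
    net.fit(X_tr, y_tr)                 # only a few BP epochs
    return net.score(X_val, y_val)

best_seed = max(range(20), key=short_run_score)

# Stage 2: refine the winning initialization with full BP training.
final = MLPClassifier(hidden_layer_sizes=(16,), max_iter=800,
                      random_state=best_seed).fit(X_tr, y_tr)
print("validation accuracy:", final.score(X_val, y_val))
```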

3.
Classification models can be developed by statistical or mathematical programming discriminant analysis techniques. Variable selection extensions of these techniques allow the development of classification models with a limited number of variables. Although stepwise statistical variable selection methods are widely used, the performance of the resultant classification models may not be optimal because of the stepwise selection protocol and the nature of the group separation criterion. A mixed integer programming approach for selecting variables for maximum classification accuracy is developed in this paper and the performance of this approach, measured by the leave-one-out hit rate, is compared with the published results from a statistical approach in which all possible variable subsets were considered. Although this mixed integer programming approach can only be applied to problems with a relatively small number of observations, it may be of great value where classification decisions must be based on a limited number of observations.
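A hedged sketch of this kind of formulation (not the paper's exact model): binary variables z_j switch predictors on, binary variables d_i flag misclassified observations, and the objective maximizes the hit rate through big-M constraints. The data, big-M value, margin, and weight bound are illustrative choices; PuLP with its bundled CBC solver is assumed available.

```python
# Variable-selection MIP for two-group discriminant analysis (schematic).
import numpy as np
import pulp

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (15, 4)), rng.normal(1.2, 1, (15, 4))])
g = np.array([0] * 15 + [1] * 15)           # group labels
n, p = X.shape
M, EPS, U, K = 50.0, 0.1, 5.0, 2            # big-M, margin, |w| bound, max vars

prob = pulp.LpProblem("mip_variable_selection", pulp.LpMinimize)
w = [pulp.LpVariable(f"w{j}", -U, U) for j in range(p)]   # discriminant weights
c = pulp.LpVariable("cutoff")
z = [pulp.LpVariable(f"z{j}", cat="Binary") for j in range(p)]
d = [pulp.LpVariable(f"d{i}", cat="Binary") for i in range(n)]

prob += pulp.lpSum(d)                       # minimize misclassifications
for i in range(n):
    score = pulp.lpSum(w[j] * float(X[i, j]) for j in range(p))
    if g[i] == 0:
        prob += score <= c - EPS + M * d[i]
    else:
        prob += score >= c + EPS - M * d[i]
for j in range(p):                          # w_j nonzero only if z_j = 1
    prob += w[j] <= U * z[j]
    prob += w[j] >= -U * z[j]
prob += pulp.lpSum(z) <= K                  # at most K selected variables

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("selected:", [j for j in range(p) if z[j].value() > 0.5],
      "hits:", n - sum(v.value() for v in d))
```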

4.
The development of credit risk assessment models is often considered within a classification context. Recent studies on the development of classification models have shown that a combination of methods often provides improved classification results compared to a single-method approach. Within this context, this study explores the combination of different classification methods in developing efficient models for credit risk assessment. A variety of methods are considered in the combination, including machine learning approaches and statistical techniques. The results illustrate that combined models can outperform individual models for credit risk analysis. The analysis also covers important issues such as the impact of using different parameters for the combined models, the effect of attribute selection, and the effects of combining strong or weak models.
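As one illustration of the combination idea, the sketch below stacks several standard classifiers over a synthetic, imbalanced "credit" data set; the choice of base learners and meta-learner is an assumption for illustration, not the paper's configuration.

```python
# Combining heterogeneous classifiers for credit scoring via stacking.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=12, weights=[0.8],
                           random_state=0)   # stand-in for good/bad loans

base = [("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=5)),
        ("svm", SVC(probability=True)),
        ("rf", RandomForestClassifier(n_estimators=100))]
combined = StackingClassifier(estimators=base,
                              final_estimator=LogisticRegression())

for name, model in base + [("stack", combined)]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name:5s} AUC = {auc:.3f}")
```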

5.
The Bayesian data reduction algorithm (BDRA) is compared to traditional classification methods as well as feed forward artificial neural networks through a rigorous experiment. The BDRA performs comparably to alternative techniques and approaches theoretical optimal classification rates. Furthermore, it has a fundamentally different method for determining class membership. This study is novel in that it explores how the BDRA relates to established techniques, how it might be used in an explanatory manner, and how best to use it.

6.
Cancer classification using genomic data is one of the major research areas in the medical field, and a number of binary classification methods have been proposed in recent years. The Top Scoring Pair (TSP) method is one of the most promising techniques: it classifies genomic data in a lower-dimensional subspace using a simple decision rule. In the present paper, we propose a supervised classification technique that utilizes incremental generalized eigenvalue and top scoring pair classifiers to obtain higher classification accuracy with a small training set. We validate our method by applying it to well known microarray data sets.
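A minimal sketch of the TSP decision rule on synthetic expression data: find the pair (i, j) whose ordering X_i < X_j best separates the two classes, then classify a new profile with that single comparison. The planted informative pair and all sizes are artificial assumptions.

```python
# Top Scoring Pair: one pairwise comparison as the whole classifier.
import numpy as np

rng = np.random.default_rng(0)
n_genes, n0, n1 = 30, 20, 20
X0 = rng.normal(size=(n0, n_genes))          # class-0 expression profiles
X1 = rng.normal(size=(n1, n_genes))
X1[:, 3] += 2.0                              # plant an informative pair (3, 7)
X1[:, 7] -= 2.0

def tsp_fit(X0, X1):
    """Return the pair (i, j) maximizing |P(Xi < Xj | 0) - P(Xi < Xj | 1)|."""
    p0 = (X0[:, :, None] < X0[:, None, :]).mean(axis=0)  # ordering frequencies
    p1 = (X1[:, :, None] < X1[:, None, :]).mean(axis=0)
    i, j = np.unravel_index(np.argmax(np.abs(p0 - p1)), p0.shape)
    return (i, j), p0[i, j], p1[i, j]

(i, j), p0_ij, p1_ij = tsp_fit(X0, X1)

def tsp_predict(x):
    # Vote for the class whose typical ordering matches this sample's.
    return 0 if (x[i] < x[j]) == (p0_ij > p1_ij) else 1

print("pair:", (i, j), "prediction for a class-1 profile:", tsp_predict(X1[0]))
```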

7.
There is much excitement about high-precision Pattern Recognition methods, since this problem area is a well-established field of Operations Research (O.R.). Recent work by some researchers has shown that O.R. methods in general, and Optimisation methods in particular, can be applied to give some very good results. Thus this research area has been won back from the Artificial Intelligence community and is quickly becoming once more a fast-growing field in O.R. The aim of this review is to examine the early success of classification and Pattern Recognition methods, consider their downfall, and examine the new techniques that have been applied to revive the field like a resurgent Phoenix. It will be shown that optimisation methods, if carried out properly through a formal analysis of their structure and requirements, can achieve correct classification with probability one. Many researchers make the task more difficult for themselves by not formalising it, and so resort to heuristics adapted to the problem. Computational results on recognition instances taken from the Irvine Repository will be presented in evidence. The outline of the paper is as follows. After the introduction, a historical sketch of the field is presented. In Section 3, the need for formal methods is argued and various results on formal requirements, such as convergence, are derived. Many of these formal requirements are of course related to the best-unbiased-estimate (b.u.e.) requirements in Statistics. In Section 4, some popular algorithms for Pattern Recognition are presented and their degree of satisfaction of the formal requirements is assessed, allowing many applications to be presented in Section 5, so that conclusions can be reached in Section 6. It will be found that satisfaction of the formal requirements is a necessary and sufficient condition for recognition with probability one.

8.
This paper presents an empirical comparison of three classification methods: neural networks, decision tree induction and linear discriminant analysis. The comparison is based on seven datasets with different characteristics, four being real, and three artificially created. Analysis of variance was used to detect any significant differences between the performance of the methods. There is also some discussion of the problems involved with using neural networks, in particular overfitting of the training data. A comparison between two methods to prevent overfitting is presented: finding the most appropriate network size, and using an independent validation set to determine when to stop training the network.
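The validation-set approach mentioned above can be sketched with scikit-learn's built-in early stopping, which holds out a fraction of the training data and stops once the validation score stops improving; the data set, network size, and patience are illustrative choices.

```python
# Early stopping on an independent validation split to curb overfitting.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=800, n_features=20, random_state=1)

net = MLPClassifier(hidden_layer_sizes=(32,),
                    early_stopping=True,        # monitor a held-out split
                    validation_fraction=0.2,    # independent validation set
                    n_iter_no_change=10,        # patience before stopping
                    max_iter=1000, random_state=1)
net.fit(X, y)
print("stopped after", net.n_iter_, "iterations;",
      "best validation score:", round(net.best_validation_score_, 3))
```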

9.
The wide availability of computer technology and large electronic storage media has led to an enormous proliferation of databases in almost every area of human endeavour. This naturally creates an intense demand for powerful methods and tools for data analysis. Current methods and tools are primarily oriented toward extracting numerical and statistical data characteristics. While such characteristics are very important and useful, they are often insufficient. A decision maker typically needs an interpretation of these findings, and this has to be done by a data analyst. With the growth in the amount and complexity of the data, making such interpretations is an increasingly difficult problem. As a potential solution, this paper advocates the development of methods for conceptual data analysis. Such methods aim at semi-automating the processes of determining high-level data interpretations, and discovering qualitative patterns in data. It is argued that these methods could be built on the basis of algorithms developed in the area of machine learning. An exemplary system utilizing such algorithms, INLEN, is discussed. The system integrates machine learning and statistical analysis techniques with database and expert system technologies. Selected capabilities of the system are illustrated by examples from implemented modules.

10.
For more than a decade, the number of research works dealing with ensemble methods applied to bankruptcy prediction has been increasing. Ensemble techniques have characteristics that, in most situations, allow them to achieve better forecasts than single models. However, the difference in performance between an ensemble and its base classifier, and between different ensembles, is often small. This is why we studied a way to design an ensemble method that might achieve better forecasts than traditional ensembles. It relies on quantifying the data that characterize the financial situation of a sample of companies using a set of self-organizing neural networks, where each network has two main characteristics: its size is chosen randomly, and the variables used to estimate its weights are selected by a criterion that ensures the fit between the structure of the network and the data used over the learning process. The results of our study show that this technique makes it possible to significantly reduce both the type I and type II errors obtained with conventional methods.
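A loose sketch of the quantification-ensemble idea, with k-means prototypes deliberately standing in for the self-organizing networks (to keep the example dependency-free) and synthetic features standing in for financial ratios; the random size range and ensemble size are arbitrary choices, not the paper's design.

```python
# Ensemble of randomly sized quantizers feeding a bankruptcy classifier.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=2)

rng = np.random.default_rng(2)
codes = []
for _ in range(10):                       # ensemble of quantizers
    k = int(rng.integers(4, 16))          # randomly chosen "network" size
    km = KMeans(n_clusters=k, n_init=5, random_state=2).fit(X)
    codes.append(km.transform(X))         # distances to the k prototypes
Z = np.hstack(codes)                      # each firm -> quantified description

clf = LogisticRegression(max_iter=1000)
print("raw ratios :", cross_val_score(clf, X, y, cv=5).mean().round(3))
print("quantified :", cross_val_score(clf, Z, y, cv=5).mean().round(3))
```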

11.
Stochastic Global Optimization: Problem Classes and Solution Techniques
There is a lack of a representative set of test problems for comparing global optimization methods. To remedy this, a classification of essentially unconstrained global optimization problems into unimodal, easy, moderately difficult, and difficult problems is proposed. The problem features giving this classification are the chance to miss the region of attraction of the global minimum, embeddedness of the global minimum, and the number of minimizers. Classifications of some often-used test problems are given, and it is recognized that most of them are easy and some even unimodal. The solution techniques treated are global, local, and adaptive search, and their use for tackling different classes of problems is discussed. The problem of fair comparison of methods is then addressed. Possible components of a general global optimization tool based on the problem classes and solution techniques are presented.
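The interplay of global and local search, and the chance of missing the global minimum's region of attraction, can be illustrated with a simple multistart scheme on the Rastrigin test function; the number of starts and the search box are illustrative.

```python
# Multistart: a global phase of random starts, a local phase of BFGS.
import numpy as np
from scipy.optimize import minimize

def rastrigin(x):
    """Multimodal test function; global minimum 0 at the origin."""
    x = np.asarray(x)
    return 10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

rng = np.random.default_rng(0)
best = None
for _ in range(50):                            # global phase: random starts
    x0 = rng.uniform(-5.12, 5.12, size=2)
    res = minimize(rastrigin, x0, method="BFGS")   # local phase
    if best is None or res.fun < best.fun:
        best = res
print("best minimum found:", round(best.fun, 6), "at", best.x.round(3))
```

A single local search from one start would typically land in one of the many local minima, which is exactly the "chance to miss the region of attraction" feature used in the proposed classification.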

12.
We present a very fast algorithm for general matrix factorization of a data matrix for use in the statistical analysis of high-dimensional data via latent factors. Such data are prevalent across many application areas and generate an ever-increasing demand for methods of dimension reduction in order to undertake the statistical analysis of interest. Our algorithm uses a gradient-based approach which can be used with an arbitrary loss function provided the latter is differentiable. The speed and effectiveness of our algorithm for dimension reduction is demonstrated in the context of supervised classification of some real high-dimensional data sets from the bioinformatics literature.
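A minimal sketch of gradient-based matrix factorization with a squared-error loss (any differentiable loss would do, with its own gradient). Plain gradient descent here stands in for whatever acceleration the authors' algorithm uses, and all sizes and the learning rate are illustrative.

```python
# Factorize X ~ A @ B by gradient descent on 0.5 * ||A @ B - X||_F^2.
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 100, 500, 5                     # samples, variables, latent factors
X = rng.normal(size=(n, q)) @ rng.normal(size=(q, p))  # low-rank test data

A = 0.01 * rng.normal(size=(n, q))        # factor scores
B = 0.01 * rng.normal(size=(q, p))        # factor loadings
lr = 1e-4
for _ in range(3000):
    R = A @ B - X                         # residual of the current fit
    gA, gB = R @ B.T, A.T @ R             # gradients of the squared-error loss
    A -= lr * gA
    B -= lr * gB
print("relative error:", np.linalg.norm(A @ B - X) / np.linalg.norm(X))
```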

13.
Statistics education is under review at all educational levels. Statistical concepts, as well as the use of statistical methods and techniques, can be taught in at least two contrasting ways. Specifically, (1) teaching can be theoretically and mathematically oriented, or (2) it can be less mathematically oriented, focused instead on application and the use of data to solve real-world problems. The second approach is growing in practice and new goals have recently emerged. At present, statistics courses stress probability concepts, data analysis, and the interpretation and communication of results. Understanding the process of statistical investigation is established as a way of improving mastery of statistical reasoning. In this context, a project-based approach allows the design and implementation of participatory learning scenarios in order to understand the statistical methodology and, as a consequence, improve research. This approach emphasizes that statistics is a rational methodology used to solve practical problems. The purpose of this paper is to present the design and results of an applied statistics course for PhD students in ecology and systematics using a project-based approach. Examples involving character coding, species classification, and the interpretation of geographical variation, which are the principal systematic analyses requiring statistical techniques, are presented using the results from student projects. In addition, an example from conservation ecology is presented. Results indicate that the students understood the concepts and applied the systematic and statistical techniques accurately using a data-oriented approach.

14.
This paper discusses the applications of certain combinatorial and probabilistic techniques to the analysis of machine learning. Probabilistic models of learning initially addressed binary classification (or pattern classification). Subsequently, analysis was extended to regression problems, and to classification problems in which the classification is achieved by using real-valued functions (where the concept of a large margin has proven useful). Another development, important in obtaining more applicable models, has been the derivation of data-dependent bounds. Here, we discuss some of the key probabilistic and combinatorial techniques and results, focusing on those of most relevance to researchers in discrete applied mathematics.
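As one concrete instance of the combinatorial bounds surveyed here, a standard form of the VC generalization bound is reproduced below; constants differ across references, so treat this as one common variant rather than the paper's own statement.

```latex
% For a class F of {0,1}-valued functions with VC dimension d and an
% i.i.d. sample of size m, with probability at least 1 - \delta every
% f in F satisfies
\[
\operatorname{er}(f)\;\le\;\widehat{\operatorname{er}}(f)
  +\sqrt{\frac{8}{m}\left(d\ln\frac{2em}{d}+\ln\frac{4}{\delta}\right)},
\]
% where er(f) is the true error and \hat{er}(f) the sample error; the
% data-dependent bounds mentioned above replace d by quantities
% estimated from the sample itself.
```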

15.
One issue in data classification problems is to find an optimal subset of instances to train a classifier. Training sets that represent well the characteristics of each class have better chances to build a successful predictor. There are cases where data are redundant or take large amounts of computing time in the learning process. To overcome this issue, instance selection techniques have been proposed. These techniques remove examples from the data set so that classifiers are built faster and, in some cases, with better accuracy. Some of these techniques are based on nearest neighbors, ordered removal, random sampling and evolutionary methods. The weaknesses of these methods generally involve lack of accuracy, overfitting, lack of robustness when the data set size increases, and high complexity. This work proposes a simple and fast immune-inspired suppressive algorithm for instance selection, called SeleSup. According to self-regulation mechanisms, those cells unable to neutralize danger tend to disappear from the organism. Therefore, by analogy, data not relevant to the learning of a classifier are eliminated from the training process. The proposed method was compared with three important instance selection algorithms on a number of data sets. The experiments showed that our mechanism substantially reduces the data set size and is accurate and robust, especially on larger data sets.
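One classical member of the nearest-neighbor family of instance selection methods cited above is Hart's condensed nearest neighbor, sketched below on synthetic data; it is shown as representative background, not as the SeleSup algorithm itself.

```python
# Condensed nearest neighbor: keep a subset S such that 1-NN trained on S
# still classifies every original instance correctly.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=6, random_state=0)

def condensed_nn(X, y, rng):
    order = rng.permutation(len(X))
    keep = [order[0]]                      # start with one stored instance
    changed = True
    while changed:                         # sweep until no absorption occurs
        changed = False
        for i in order:
            knn = KNeighborsClassifier(n_neighbors=1).fit(X[keep], y[keep])
            if knn.predict(X[i:i + 1])[0] != y[i]:
                keep.append(i)             # absorb misclassified instance
                changed = True
    return np.array(keep)

keep = condensed_nn(X, y, np.random.default_rng(0))
print(f"kept {len(keep)} of {len(X)} instances")
knn = KNeighborsClassifier(n_neighbors=1).fit(X[keep], y[keep])
print("accuracy of the condensed 1-NN on the full set:", knn.score(X, y))
```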

16.
With the rapid growth of databases in many modern enterprises, data mining has become an increasingly important approach for data analysis. The operations research community has contributed significantly to this field, especially through the formulation and solution of numerous data mining problems as optimization problems, and several operations research applications can also be addressed using data mining methods. This paper provides a survey of the intersection of operations research and data mining. The primary goals of the paper are to illustrate the range of interactions between the two fields, present some detailed examples of important research work, and provide comprehensive references to other important work in the area. The paper thus looks at both the different optimization methods that can be used for data mining and the data mining process itself, showing how operations research methods can be used in almost every step of this process. Promising directions for future research are also identified throughout the paper. Finally, the paper looks at some applications related to the area of management of electronic services, namely customer relationship management and personalization.

17.
Statistical methods of discrimination and classification are used for the prediction of protein structure from amino acid sequence data. This provides information for the establishment of new paradigms of carcinogenesis modeling on the basis of gene expression. Feed forward neural networks and standard statistical classification procedures are used to classify proteins into fold classes. Logistic regression, additive models, and projection pursuit regression from the family of methods based on posterior probabilities; linear, quadratic, and flexible discriminant analysis from the class of methods based on class-conditional probabilities; and the nearest-neighbors classification rule are applied to a data set of 268 sequences. From analyzing the prediction error obtained with a test sample (n = 125) and with a cross validation procedure, we conclude that standard linear discriminant analysis and nearest-neighbor methods are at the same time statistically feasible and potent competitors to the more flexible tools of feed forward neural networks. Further research is needed to explore the gain obtainable from statistical methods by the application to larger sets of protein sequence data and to compare the results with those from biophysical approaches.
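The reported comparison can be mimicked in outline with scikit-learn; synthetic features stand in for the 268 protein sequences and four classes stand in for the fold classes, so the numbers are illustrative only.

```python
# LDA versus the nearest-neighbour rule under cross-validation.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=268, n_features=20, n_classes=4,
                           n_informative=8, random_state=0)

for name, clf in [("LDA ", LinearDiscriminantAnalysis()),
                  ("1-NN", KNeighborsClassifier(n_neighbors=1))]:
    acc = cross_val_score(clf, X, y, cv=10).mean()
    print(f"{name}: cross-validated accuracy = {acc:.3f}")
```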

18.
Traditional clustering methods cannot capture local correspondences between samples and variables and perform poorly on high-dimensional, sparse data. Biclustering was therefore proposed: based on the local relationships between samples and variables, it clusters both simultaneously, producing clustering results in the form of a series of submatrices. Biclustering has developed rapidly in recent years and is widely used in gene expression analysis, text clustering, recommender systems, and other fields. This paper first reviews and organizes biclustering methods, focusing on three families: sparse biclustering, spectral biclustering, and information-theoretic biclustering; it analyzes the differences and connections among them and surveys their development in the integrative analysis of multi-source data, multilayer clustering, semi-supervised learning, and ensemble learning. It then reviews applications of biclustering in gene expression analysis, text clustering, and recommender systems. Finally, combining the characteristics of data in the big data era with the open problems of existing methods, it discusses future research directions for biclustering.
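As a small illustration of the spectral family reviewed above, the sketch below runs spectral co-clustering on a matrix with planted sample-by-variable blocks; the matrix shape, block count, and noise level are arbitrary choices.

```python
# Spectral co-clustering recovers planted sample-by-variable submatrices.
import numpy as np
from sklearn.cluster import SpectralCoclustering
from sklearn.datasets import make_biclusters

data, rows, cols = make_biclusters(shape=(120, 80), n_clusters=4,
                                   noise=5, random_state=0)

model = SpectralCoclustering(n_clusters=4, random_state=0)
model.fit(data)

# Reorder rows and columns by bicluster label to expose the block structure.
reordered = data[np.argsort(model.row_labels_)]
reordered = reordered[:, np.argsort(model.column_labels_)]
print("row cluster sizes   :", np.bincount(model.row_labels_))
print("column cluster sizes:", np.bincount(model.column_labels_))
```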

19.
For degradation data in reliability analysis, estimation of the first-passage time (FPT) distribution to a threshold provides valuable information on reliability characteristics. Recently, Balakrishnan and Qin (2019; Applied Stochastic Models in Business and Industry, 35:571–590) studied a nonparametric method to approximate the FPT distribution of such degradation processes when the underlying process type is unknown. In this article, we propose some improved techniques based on saddlepoint approximation, which enhance those existing methods. Numerical examples and Monte Carlo simulation studies are used to illustrate the advantages of the proposed techniques. Limitations of the improved techniques are discussed and possible solutions are proposed. Some concluding remarks and practical recommendations are provided based on the results.
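For orientation, the generic Daniels saddlepoint approximation underlying such techniques is recalled below; this is the textbook formula, not the authors' specific construction for FPT distributions.

```latex
% Daniels' saddlepoint approximation to the density of a random variable
% with cumulant generating function K(t) = \log E[e^{tX}]:
\[
\hat{f}(x) = \bigl(2\pi K''(\hat{t})\bigr)^{-1/2}
             \exp\!\bigl\{K(\hat{t}) - \hat{t}\,x\bigr\},
\qquad \text{where } K'(\hat{t}) = x,
\]
% i.e. the saddlepoint \hat{t} solves the stationarity equation, and the
% approximation typically needs only the first few cumulants estimated
% from the data.
```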

20.
This paper describes some aspects of cost-effectiveness methodology and operational research as they have been applied in a system design study for a military communications system. There are two main areas of interest from an operational research point of view: (a) the attempt to use cost-effectiveness analysis as an integral part of system design, and (b) the development and application of new techniques (notably in stochastic network analysis and simulation) which are potentially of much wider application. There are several ways of attacking the problem of multiple objectives encountered in a cost-effectiveness analysis. These are briefly described, and the preferred method of a single measure of effectiveness is discussed in detail. The measure used in the communications system design study is presented and the method of evaluating it by simulation is described. The next step after evaluation of the effectiveness is optimization, and here the use of Lagrange multipliers is introduced. This method requires iteration on the values of performance parameters and their costs, which becomes very time-consuming if a simulation must be performed each time. It is here that the novel methods of analysing networks were developed. The main use of these methods of analysis, or reduction rules, has been in reducing the size and complexity of the simulations. The technique which has contributed most to reducing the number of simulations required to arrive at an optimum disposition of resources is a method of carrying out a sensitivity analysis based on data collected during a single simulation run. This hybrid analytical-cum-simulation technique is discussed in detail with reference to a communications system, and its application to a wider range of problems, such as probabilistic PERT, is indicated.
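A schematic of the Lagrange-multiplier step described here, in assumed notation (E for effectiveness, C for cost, B for the budget), since the paper's exact formulation is not reproduced in the abstract:

```latex
% Maximize effectiveness E(x) over performance parameters x subject to a
% budget B on cost C(x), via the Lagrangian
\[
L(x,\lambda) = E(x) - \lambda\,\bigl(C(x) - B\bigr),
\qquad
\frac{\partial E}{\partial x_i} = \lambda\,\frac{\partial C}{\partial x_i}
\quad \text{at the optimum,}
\]
% so iterating on \lambda equalizes effectiveness gained per unit cost
% across parameters -- and each evaluation of E(x) is, in the study, a
% simulation run, which is why reduction rules and single-run sensitivity
% analysis matter.
```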
