Similar Documents
20 similar documents found (search time: 375 ms)
1.
The features used may have an important effect on the performance of credit scoring models. The process of choosing the best set of features for credit scoring models is usually unsystematic, dominated by somewhat arbitrary trial and error. This paper presents an empirical study of four machine learning feature selection methods, which provide an automatic data mining technique for reducing the feature space. The study illustrates how the four feature selection methods—‘ReliefF’, ‘Correlation-based’, ‘Consistency-based’ and ‘Wrapper’ algorithms—help to improve three aspects of the performance of scoring models: model simplicity, model speed and model accuracy. The experiments are conducted on real data sets using four classification algorithms—‘model tree (M5)’, ‘neural network (multi-layer perceptron with back-propagation)’, ‘logistic regression’, and ‘k-nearest-neighbours’.
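The ‘Wrapper’ method mentioned in this abstract scores candidate feature subsets by the accuracy of the classifier itself. As a rough illustration only (the paper's own experiments use established implementations; the toy data and helper names below are invented), a minimal greedy forward-selection wrapper around a leave-one-out 1-nearest-neighbour classifier might look like:

```python
def accuracy_1nn(X, y, feats):
    """Leave-one-out accuracy of 1-NN restricted to the given feature subset."""
    correct = 0
    for i, xi in enumerate(X):
        best_d, pred = None, None
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = sum((xi[f] - xj[f]) ** 2 for f in feats)
            if best_d is None or d < best_d:
                best_d, pred = d, y[j]
        correct += pred == y[i]
    return correct / len(X)

def wrapper_forward_select(X, y, n_features):
    """Greedy forward selection: repeatedly add the feature that most
    improves wrapped-classifier accuracy; stop when no feature helps."""
    selected, best_acc = [], 0.0
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        acc, f = max((accuracy_1nn(X, y, selected + [f]), f) for f in candidates)
        if acc <= best_acc:
            break
        selected.append(f)
        best_acc = acc
    return selected, best_acc
```

On a toy data set in which only one feature carries class information, the procedure keeps just that feature, which illustrates the model-simplicity benefit the abstract reports.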

2.
We studied a population of paraplegic patients in order to highlight a possible relationship between the topography of their spinal lesion and the occurrence of particular articular diseases (P.O.A.). According to the motor and sensory state of the spinal cord, we first tried to obtain a classification of these lesions (the usual classification schematically separates ‘flaccid’ from ‘rigid’ paraplegics). We mainly put the emphasis on this clustering step of the study.

3.
Databases require a management system capable of retrieving and storing information as efficiently as possible. The data placement problem is concerned with obtaining an optimal assignment of data tuples onto secondary storage devices. Such tuples have complicated interrelationships, which make it difficult to find an exact solution to the problem in realistic time. We therefore consider heuristic methods, three of which are discussed and compared: the ‘greedy’ graph-collapsing method, the probabilistic hill-climbing method of simulated annealing, and a third ‘greedy’ heuristic, the random improvement method, which is a local search heuristic. Overall, the best performance is obtained from the graph-collapsing method for the less complicated situations, but for larger-scale problems with complex interrelationships between tuples the simulated annealing and random improvement algorithms give better results.
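The simulated annealing heuristic compared here is easy to sketch. The toy cost model below (interrelationships reduced to a list of related tuple pairs, cost = number of related pairs split across devices) and all names are invented for illustration and are not the authors' formulation:

```python
import math
import random

def placement_cost(assignment, related_pairs):
    """Cost = number of strongly related tuple pairs split across devices."""
    return sum(assignment[a] != assignment[b] for a, b in related_pairs)

def anneal_placement(n_tuples, n_devices, related_pairs,
                     t0=2.0, cooling=0.95, steps=500, seed=0):
    """Probabilistic hill-climbing: accept worse placements with a
    probability that shrinks as the temperature t cools."""
    rng = random.Random(seed)
    assignment = [rng.randrange(n_devices) for _ in range(n_tuples)]
    cost = placement_cost(assignment, related_pairs)
    t = t0
    for _ in range(steps):
        i = rng.randrange(n_tuples)
        old = assignment[i]
        assignment[i] = rng.randrange(n_devices)
        new_cost = placement_cost(assignment, related_pairs)
        if new_cost > cost and rng.random() >= math.exp((cost - new_cost) / t):
            assignment[i] = old  # reject the uphill move
        else:
            cost = new_cost      # accept (always for improvements)
        t *= cooling
    return assignment, cost
```

With two clusters of mutually related tuples, the annealer tends to co-locate each cluster on one device.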

4.

K-Nearest Neighbours (k-NN) is a popular classification and regression algorithm, yet one of its main limitations is the difficulty of choosing the number of neighbours. We present a Bayesian algorithm to compute the posterior probability distribution for k given a target point within a data set, efficiently and without the use of Markov Chain Monte Carlo (MCMC) methods or simulation—alongside an exact solution for distributions within the exponential family. The central idea is that data points around our target are generated by the same probability distribution, extending outwards over the appropriate, though unknown, number of neighbours. Once the data are projected onto a distance metric of choice, the choice of k becomes a change-point detection problem, for which there is an efficient solution: we recursively compute the probability of the last change-point as we move towards our target, and thus de facto compute the posterior probability distribution over k. Applying this approach to both a classification and a regression UCI data set, our method compares favourably and, most importantly, by removing the need for simulation, computes the posterior probability of k exactly and rapidly. As an example, the computational time for the Ripley data set is a few milliseconds, compared to a few hours when using an MCMC approach.
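The change-point view of choosing k can be sketched in a much-simplified form. The sketch below is an invented illustration, not the paper's recursive algorithm: it assumes binary class labels, a Beta–Bernoulli model on each side of a single change-point, and a uniform prior over k, and scores each k by the product of the two marginal likelihoods:

```python
from math import exp, lgamma

def log_beta_bernoulli_evidence(heads, tails, a=1.0, b=1.0):
    """Log marginal likelihood of a Bernoulli sequence under a Beta(a, b) prior."""
    return (lgamma(a + b) - lgamma(a) - lgamma(b)
            + lgamma(a + heads) + lgamma(b + tails)
            - lgamma(a + b + heads + tails))

def posterior_over_k(labels):
    """labels: binary labels of neighbours sorted by increasing distance from
    the target. Model: the first k labels share one Bernoulli rate, the rest
    share another; uniform prior over k = 1..n."""
    n = len(labels)
    log_scores = []
    for k in range(1, n + 1):
        near, far = labels[:k], labels[k:]
        s = log_beta_bernoulli_evidence(sum(near), k - sum(near))
        s += log_beta_bernoulli_evidence(sum(far), len(far) - sum(far))
        log_scores.append(s)
    m = max(log_scores)
    weights = [exp(s - m) for s in log_scores]
    z = sum(weights)
    return [w / z for w in weights]  # posterior P(k = 1), ..., P(k = n)
```

For a run of four ‘1’ labels followed by four ‘0’ labels, the posterior mass concentrates on k = 4, the point where the local distribution changes.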


5.
Traditional classification methods are divided into two broad types: hierarchical methods and non-hierarchical methods, in which the number of classes has to be fixed in advance. Both types can handle both quantitative and qualitative data. A third type finds a partition optimizing a linear classification criterion (e.g. the Condorcet criterion); here the number of classes does not have to be fixed in advance, but the data must be qualitative. A recent generalization, the ‘S theory’, can handle simultaneously both quantitative and qualitative data, and both linear and non-linear classification criteria (in the space of paired comparisons of elements). With this ‘S theory’ the partition is obtained in order n (in terms of memory space and elementary operations), n being the number of elements to classify.

6.
In this paper we combine the ideas of the ‘power steady model’, ‘discount factor’ and ‘power prior’ for a general class of filter models, more specifically within a class of dynamic generalized linear models (DGLM). We show an optimality property of our proposed method and present a particle filter algorithm for DGLM as an alternative to Markov chain Monte Carlo methods. We also present two applications: one on dynamic Poisson models for hurricane count data in the Atlantic Ocean, and the other on a dynamic Poisson regression model for longitudinal count data.

7.
Measuring the performance of microfinance institutions (MFIs) is challenging, as MFIs must achieve the twin objectives of outreach and sustainability. We propose a new measure that captures the performance of MFIs by placing their twin achievements in a 2 × 2 classification matrix. To make a dichotomous classification, MFIs that meet both of their twin objectives are classified as ‘1’ and MFIs that cannot meet their dual objectives simultaneously are designated as ‘0’. Six classifiers are applied to analyze the operating and financial characteristics of MFIs that can offer a predictive modeling solution for achieving their objectives, and the results of the classifiers are compared using the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) to identify an appropriate classifier based on a ranking of performance measures. Of the six classifiers applied in the study, kernel support vector machines achieved the highest accuracy and the lowest classification error rate in discriminating the MFIs that best achieve their twin objectives. MFIs can use both of these steps to identify, from their operating characteristics, whether they are on the right path to attaining their multiple objectives.
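The TOPSIS step used to rank the classifiers is a standard multi-criteria technique and can be sketched compactly. The matrix and criteria below are invented for illustration (e.g. one benefit criterion such as accuracy and one cost criterion such as error rate), not the study's data:

```python
from math import sqrt

def topsis(matrix, weights, benefit):
    """Rank alternatives with TOPSIS.
    matrix[i][j]: score of alternative i on criterion j;
    benefit[j]: True if larger is better for criterion j."""
    n_alt, n_crit = len(matrix), len(matrix[0])
    # Vector-normalize each criterion column, then apply the weights.
    norms = [sqrt(sum(matrix[i][j] ** 2 for i in range(n_alt)))
             for j in range(n_crit)]
    v = [[weights[j] * matrix[i][j] / norms[j] for j in range(n_crit)]
         for i in range(n_alt)]
    ideal = [max(col) if benefit[j] else min(col)
             for j, col in enumerate(zip(*v))]
    anti = [min(col) if benefit[j] else max(col)
            for j, col in enumerate(zip(*v))]
    scores = []
    for row in v:
        d_pos = sqrt(sum((x - p) ** 2 for x, p in zip(row, ideal)))
        d_neg = sqrt(sum((x - q) ** 2 for x, q in zip(row, anti)))
        scores.append(d_neg / (d_pos + d_neg))
    return scores  # closer to 1 = closer to the ideal solution
```

The alternative with the highest relative closeness score is the preferred classifier.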

8.
The reinstallation of different plant species, and their evolution during a ten-year period in a heathland after a fire, has been studied using the algorithm of hierarchical classification based on correlation (ABC) introduced by Tallur [1]. Classification under a contiguity restraint of the set of observation points (subintervals of an experimental observation line) enables one to determine the ‘patches’ having a uniform vegetation structure. Lerman's ‘local’ and ‘global’ statistics are used to condense the classification tree to its significant nodes and to choose the most significant partition. χ² statistics are proposed to test whether a given patch has a ‘significant’ vegetation structure and whether the association between a patch and a plant species is significant. The evolution of the horizontal structure of the vegetation is studied by comparing the sets of patches obtained at successive observation dates and the corresponding dominant species.

9.
We introduce a new approach to assigning bank account holders to ‘good’ or ‘bad’ classes based on their future behaviour. Traditional methods simply treat the classes as qualitatively distinct, and seek to predict them directly, using statistical techniques such as logistic regression or discriminant analysis based on application data or observations of previous behaviour. We note, however, that the ‘good’ and ‘bad’ classes are defined in terms of variables such as the amount overdrawn at the time at which the classification is required. This permits an alternative, ‘indirect’, form of classification model in which, first, the variables defining the classes are predicted, for example using regression, and then the class membership is derived deterministically from these predicted values. We compare traditional direct methods with these new indirect methods using both real bank data and simulated data. The new methods appear to perform very similarly to the traditional methods, and we discuss why this might be. Finally, we note that the indirect methods also have certain other advantages over the traditional direct methods.
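The ‘indirect’ idea reduces to two steps: regress the class-defining variable, then derive the class deterministically by thresholding the prediction. A minimal sketch with an invented one-predictor example (the variable and threshold are hypothetical, not the bank's definitions):

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def indirect_classify(x, coeffs, threshold):
    """Step 1: predict the class-defining variable (e.g. amount overdrawn).
    Step 2: derive class membership deterministically from the prediction."""
    a, b = coeffs
    predicted_overdraft = a + b * x
    return 'bad' if predicted_overdraft > threshold else 'good'
```

A direct method would instead fit a classifier straight to the ‘good’/‘bad’ labels; the indirect route makes the dependence on the defining variable explicit.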

10.
The relationships between the concept of an exercise, the formulation of its text, the solution of an exercise, and the final result are analysed.

For mathematical exercises, deductive directed reasoning is of special importance. A classification of reasonings gives a basis for a division of the exercises. Among problem exercises, ‘open problems’ and solutions of exercises extended by an approach called in German the ‘Methode der erzeugenden Probleme’ (method of generating problems) are important from the point of view of didactics.

Empirical studies concerning the influence of an exercise's formulation on the efficiency of its solution have been carried out for a set of about 16 thousand exercises.

11.
We extend Whitney's Theorem that every plane triangulation without separating triangles is hamiltonian by allowing some separating triangles. More precisely, we define a decomposition of a plane triangulation G into 4-connected ‘pieces’, and show that if each piece shares a triangle with at most three other pieces then G is hamiltonian. We provide an example to show that the hypothesis that each piece shares a triangle with ‘at most three other pieces’ cannot be weakened to ‘at most four other pieces’. As part of our proof, we also obtain new results on Tutte cycles through specified vertices in planar graphs. © 2002 Wiley Periodicals, Inc. J Graph Theory 41: 138–150, 2002

12.
The curse of dimensionality refers to the fact that high-dimensional data are often difficult to work with. A large number of features can increase the noise of the data and thus the error of a learning algorithm. Feature selection is a solution for such problems when there is a need to reduce the data dimensionality. Different feature selection algorithms may yield feature subsets that can be considered local optima in the space of feature subsets. Ensemble feature selection combines independent feature subsets and may give a better approximation to the optimal subset of features. We propose an ensemble feature selection approach based on an assessment of the feature selectors' reliability. It aims at providing a unique and stable feature selection without ignoring predictive accuracy. A classification algorithm is used as an evaluator to assign a confidence to the features selected by ensemble members, based on the associated classification performance. We compare our proposed approach to several existing techniques and to individual feature selection algorithms. Results show that our approach often improves classification performance and feature selection stability for high-dimensional data sets.
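One common way to combine ensemble members is reliability-weighted rank aggregation. The sketch below is an invented illustration of that general idea, not the authors' algorithm: each selector produces per-feature scores, its ranking is weighted by a reliability (e.g. the cross-validated accuracy of a classifier built on its selection), and the lowest combined rank wins:

```python
def rank_features(scores):
    """Return the rank position (0 = best) of each feature given its score."""
    order = sorted(range(len(scores)), key=lambda f: -scores[f])
    ranks = [0] * len(scores)
    for pos, f in enumerate(order):
        ranks[f] = pos
    return ranks

def ensemble_select(score_lists, reliabilities, top_k):
    """Combine several feature-scoring runs into one selection.
    Each selector's ranking contributes in proportion to its reliability;
    a lower weighted rank sum means a more trusted feature."""
    n = len(score_lists[0])
    combined = [0.0] * n
    for scores, w in zip(score_lists, reliabilities):
        for f, r in enumerate(rank_features(scores)):
            combined[f] += w * r
    return sorted(range(n), key=lambda f: combined[f])[:top_k]
```

An unreliable selector (low weight) then has little power to pull an irrelevant feature into the final subset, which is the stability benefit the abstract targets.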

13.
Harmonic oscillator equations of the form ÿ + ω²y = h(t), where ω is a real constant and h(t) is a continuous, piecewise smooth, periodic ‘forcing’ function, are considered. The exact solution, obtained through the Laplace transform, is cumbersome to handle over long t intervals, and thus solving ‘term-by-term’ by replacing h(t) by its Fourier series is an attractive and accurate alternative. But this solution is an infinite series involving sums of sine and cosine terms, and thus one should worry about the convergence of a solution in this form. In the article, it is shown that such a series solution indeed converges uniformly over the entire real line and is twice continuously differentiable, the derivatives being calculated ‘term-by-term’. Only results commonly available in the undergraduate literature are used to verify this, and in so doing a non-trivial application of these results is given. Also included are some interesting problems suitable for undergraduate research.
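For concreteness, the term-by-term construction can be written out explicitly. The notation here is ours, not necessarily the article's: ω for the oscillator constant, 2π/Ω for the forcing period, and the non-resonance assumption ω ≠ nΩ for every n. Expanding the forcing in its Fourier series and solving each mode separately gives

```latex
\ddot{y} + \omega^2 y \;=\; \frac{a_0}{2}
  + \sum_{n=1}^{\infty}\bigl(a_n \cos n\Omega t + b_n \sin n\Omega t\bigr),
\qquad
y_p(t) \;=\; \frac{a_0}{2\omega^2}
  + \sum_{n=1}^{\infty}\frac{a_n \cos n\Omega t + b_n \sin n\Omega t}
                            {\omega^2 - n^2\Omega^2}.
```

Each summand of y_p solves the equation forced by the corresponding Fourier mode (differentiating twice multiplies it by −n²Ω², so ÿ + ω²y picks up the factor ω² − n²Ω², which cancels the denominator); the uniform convergence of this series and of its first two term-wise derivatives is precisely what the article establishes.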

14.
15.
The Monster tower ([MZ01], [MZ10]), known as the Semple Tower in Algebraic Geometry ([Sem54], [Ber10]), is a tower of fibrations canonically constructed over an initial smooth n-dimensional base manifold. Each consecutive fiber is a projective (n − 1)-space. Each level of the tower is endowed with a rank n distribution, that is, a subbundle of its tangent bundle. The pseudogroup of diffeomorphisms of the base acts on each level so as to preserve the fibration and the distribution. The main problem is to classify orbits (equivalence classes) relative to this action. Analytic curves in the base can be prolonged (= Nash blown-up) to curves in the tower which are integral for the distribution. Prolongation yields a dictionary between singularity classes of curves in the base n-space and orbits in the tower. This dictionary yielded a rather complete solution to the classification problem for n = 2 ([MZ10]). A key part of this solution was the construction of the ‘RVT’ classes, a discrete set of equivalence classes built by verifying conditions of transversality or tangency to the fiber at each level ([MZ10]). Here we define analogous ‘RC’ classes for n > 2, indexed by words in the two letters R (for regular, or transverse) and C (for critical, or tangent). There are 2^{k−1} such classes of length k and they exhaust the tower at level k. The codimension of such a class is the number of C's in its word. We attack the classification problem by codimension, rather than by level. The codimension 0 class is open and dense and its structure is well known. We prove that any point of any codimension 1 class is realized by a curve having a classical A_{2k} singularity (k depending on the type of class). Following [MZ10], we define what it means for a singularity class in the tower to be “tower simple”. The codimension 0 and 1 classes are tower simple, and tower simple implies simple in the usual sense of singularity theory.
Our main result is a classification of the codimension 2 tower simple classes in any dimension n. A key step in the classification asserts that any point of any codimension 2 singularity is realized by a curve of multiplicity 3 or 4. Central tools used in the classification are the listings of curve singularities due to Arnol'd ([Arn99]), Bruce–Gaffney ([BG82]), and Gibson–Hobbs ([GH93]). We also classify the first occurring truly spatial singularities as subclasses of the codimension 2 classes. (A point or a singularity class is “spatial” if there is no curve which realizes it and which can be made to lie in some smooth surface.) As a step in the classification theorem we establish the existence of a canonical arrangement of hyperplanes at each point, lying in the distribution n-plane at that point. This arrangement leads to a coding scheme finer than the RC coding. Using the arrangement coding we establish a lower bound of 29 for the number of distinct orbits in the case n = 3 and level 4. Finally, Mormul ([Mor04], [Mor09]) has defined a different coding scheme for singularity classes in the tower, and in an appendix we establish some relations between our coding and his.

16.
We show that, under certain smoothness conditions, a Brownian martingale, when evaluated at a fixed time, can be represented via an exponential formula at a later time. The time-dependent generator of this exponential operator only depends on the second order Malliavin derivative operator evaluated along a ‘frozen path’. The exponential operator can be expanded explicitly to a series representation, which resembles the Dyson series of quantum mechanics. Our continuous-time martingale representation result can be proven independently by two different methods. In the first method, one constructs a time-evolution equation, by passage to the limit of a special case of a backward Taylor expansion of an approximating discrete-time martingale. The exponential formula is a solution of the time-evolution equation, but we emphasize in our article that the time-evolution equation is a separate result of independent interest. In the second method, we use the property of denseness of exponential functions. We provide several applications of the exponential formula, and briefly highlight numerical applications of the backward Taylor expansion.

17.
In this paper we apply stochastic programming modelling and solution techniques to planning problems for a consortium of oil companies. A multiperiod supply, transformation and distribution scheduling problem—the Depot and Refinery Optimization Problem (DROP)—is formulated for strategic or tactical level planning of the consortium's activities. This deterministic model is used as a basis for implementing a stochastic programming formulation with uncertainty in the product demands and spot supply costs (DROPS), whose solution process utilizes the deterministic equivalent linear programming problem. We employ our STOCHGEN general-purpose stochastic problem generator to ‘recreate’ the decision (scenario) tree for the unfolding future as this deterministic equivalent. To project random demands for oil products at different spatial locations into the future, and to generate random fluctuations in their future prices/costs, a stochastic input data simulator is developed and calibrated to historical industry data. The models are written in the modelling language XPRESS-MP and solved by the XPRESS suite of linear programming solvers. From the viewpoint of implementing large-scale stochastic programming models, this study involves decisions in both space and time and a careful revision of the original deterministic formulation. The first part of the paper treats the specification, generation and solution of the deterministic DROP model. The stochastic version of the model (DROPS) and its implementation are studied in detail in the second part, and a number of related research questions and implications are discussed.

18.
We consider a two-dimensional homogeneous elastic state in the arch-like region a ≤ r ≤ b, 0 ≤ θ ≤ α, where (r, θ) denotes plane polar coordinates. We assume that three of the edges are traction-free, while the fourth edge is subjected to an in-plane self-equilibrated load. The Airy stress function φ satisfies a fourth-order differential equation in the plane polar coordinates with appropriate boundary conditions. We develop a method which allows us to treat in a unified way the two problems corresponding to self-equilibrated loads distributed on the straight and curved edges of the region. In fact, we introduce an appropriate change of the variable r and of the Airy stress function to reduce the corresponding boundary value problem to a simpler one, which allows us to define an appropriate measure of the solution valid for both types of boundary value problem. In terms of such measures we are able to establish spatial estimates describing the spatial behavior of the Airy stress function. In particular, our spatial decay estimates show a clear relationship with Saint-Venant's principle on such regions.

19.
The system dynamics concept of ‘generic structure’ is divisible into three sub-types. This paper analyses the validity of these three, using both practical and theoretical perspectives. Firstly, a new set of measures is developed for generating validity—‘confidence’—amongst a group using generic structures in a practical modelling situation. It is concluded that different confidence criteria are implicitly employed; that there is an argument for trading off model precision and analytical quality for simplicity and ease of use; and that future research is needed to combine these ‘process’ and ‘content’ aspects of confidence. From a theoretical stance it is shown that with two of the sub-types a scientific notion of confidence is achievable, whereas the third (‘archetypes’) involves merely metaphorical thinking. It is concluded that the theoretical status of archetypes requires further development, whilst ensuring that its benefits are retained.

20.
The analytic network process (ANP) addresses multi-attribute decision-making where attributes exhibit dependencies. A principal characteristic of such problems is that pairwise comparisons are needed for attributes that have interdependencies. We propose that before such comparison matrices are used—in addition to a test that assesses the consistency of a pairwise comparison matrix—a test must also be conducted to assess ‘consistency’ across interdependent matrices. We call such a cross-matrix consistency test a compatibility test. In this paper, we design a compatibility test for interdependent matrices between two clusters of attributes. We motivate our exposition by addressing compatibility in Sinarchy, a special form of ANP in which interdependency exists between the last and next-to-last levels. The developed compatibility test is applicable to any pair of interdependent matrices that are part of an ANP.
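The single-matrix consistency test the abstract assumes as a prerequisite is Saaty's standard consistency ratio; the cross-matrix compatibility test is the paper's own construction and is not reproduced here. As background, a minimal pure-Python sketch of the single-matrix test (helper names invented; RI values are Saaty's tabulated random indices, shown for n = 3..5):

```python
def principal_eigenvalue(matrix, iters=100):
    """Estimate the principal eigenvalue of a positive matrix by power
    iteration with max-norm rescaling."""
    n = len(matrix)
    v = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(w)            # rescaling factor converges to lambda_max
        v = [x / lam for x in w]
    return lam

def consistency_ratio(pc):
    """Saaty consistency ratio CR = CI / RI for a pairwise comparison
    matrix pc, with CI = (lambda_max - n) / (n - 1). CR <= 0.1 is the
    conventional acceptability threshold."""
    ri_table = {3: 0.58, 4: 0.90, 5: 1.12}
    n = len(pc)
    ci = (principal_eigenvalue(pc) - n) / (n - 1)
    return ci / ri_table[n]
```

A perfectly consistent reciprocal matrix (every entry a_ij = a_ik * a_kj) has lambda_max = n and hence CR = 0; judgement inconsistency pushes CR above zero.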


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号