Similar Documents
20 similar documents found (search time: 203 ms)
1.
Remote sensing image classification is an important application of remote sensing and plays a significant role in the technology's development. Given the characteristics of remote sensing image data, the BP neural network is the nonlinear model most commonly used, but it is sensitive to the initial weights and thresholds, prone to local minima, and slow to converge. To improve classification accuracy, an MEA-BP model is therefore proposed: the mind evolutionary algorithm (MEA) first performs a global search for good initial weights and thresholds, an improved BP algorithm then refines them, and the resulting MEA-based BP network classification model is applied to remote sensing image classification. Simulation results show that the new model effectively improves classification accuracy and offers a new approach to remote sensing image classification, with broad research value.

2.
Solving the imbalanced data classification problem has far-reaching practical significance. The Mahalanobis-Taguchi System (MTS) constructs a reference space and a measurement scale from the single normal class alone and builds its classification model on them, which makes it well suited to imbalanced classification. Building on the traditional MTS, this paper combines the signal-to-noise ratio with classification metrics such as F-value and G-mean to formulate a genetic-algorithm-based model for optimizing the reference space, and applies the Bagging ensemble method to construct an improved MTS algorithm, GBMTS. Experiments with different classification methods on several data sets show that GBMTS handles imbalanced classification more effectively than the other algorithms.
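As a rough illustration of the F-value and G-mean criteria that GBMTS optimizes, the standard definitions can be computed directly from confusion-matrix counts (a generic sketch of the textbook formulas, not the paper's code):

```python
import math

def f_value(tp, fp, fn, beta=1.0):
    """F-value: weighted harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def g_mean(tp, fp, fn, tn):
    """G-mean: geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)

# Toy confusion counts for an imbalanced problem (10 positives, 90 negatives).
tp, fn, fp, tn = 8, 2, 9, 81
print(round(f_value(tp, fp, fn), 3))     # prints 0.593
print(round(g_mean(tp, fp, fn, tn), 3))  # prints 0.849
```

Unlike overall accuracy, both metrics stay low when the minority class is poorly recognized, which is why they are natural fitness criteria for an imbalanced-data method.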

3.
The naive Bayes (NB) classifier is simple and effective, and particularly well suited to small- and medium-scale data. However, as a traditional method guided by overall classification accuracy, it classifies the minority class poorly on imbalanced data. To address this problem, this paper uses attribute weighting to strengthen naive Bayes on the minority class. The class-dependent attribute-weighted naive Bayes (class-specific at...
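A minimal sketch of the attribute-weighting idea, assuming categorical features and Laplace smoothing; the toy data and the flat weighting scheme are illustrative only, not the paper's class-specific method:

```python
import math
from collections import Counter, defaultdict

def train_weighted_nb(X, y):
    """Estimate log-priors and per-(class, attribute) value counts."""
    n = len(y)
    classes = sorted(set(y))
    prior = {c: math.log(sum(1 for t in y if t == c) / n) for c in classes}
    cond = defaultdict(Counter)          # (class, attribute index) -> value counts
    for xi, c in zip(X, y):
        for j, v in enumerate(xi):
            cond[(c, j)][v] += 1
    return classes, prior, cond

def predict_weighted_nb(model, x, w):
    """Attribute-weighted NB: each attribute's log-likelihood term is
    scaled by its weight w[j] before being added to the class score."""
    classes, prior, cond = model
    best, best_score = None, -math.inf
    for c in classes:
        score = prior[c]
        for j, v in enumerate(x):
            counts = cond[(c, j)]
            total = sum(counts.values())
            vocab = len(counts) + 1
            p = (counts[v] + 1) / (total + vocab)   # Laplace smoothing
            score += w[j] * math.log(p)
        if score > best_score:
            best, best_score = c, score
    return best

X = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
y = ["no", "no", "yes", "yes"]
model = train_weighted_nb(X, y)
print(predict_weighted_nb(model, ("rain", "mild"), w=[1.0, 1.0]))  # prints yes
```

With all weights at 1.0 this reduces to ordinary naive Bayes; raising the weight of attributes that discriminate the minority class shifts decisions toward it.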

4.
杨鑫  吴密霞 《数学学报》2023,(2):263-276
This paper considers distributed statistical inference for linear models under multi-source heterogeneous big data. First, a communication-efficient distributed aggregation estimator of the model parameters and its algorithm are proposed, and the optimality and asymptotic normality of the resulting estimator are proved under regularity conditions. Second, a distributed test is given for the heterogeneity testing problem in the model. Finally, numerical simulations verify the good performance of the proposed estimation and testing methods.
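One classical communication-efficient aggregation scheme, one-shot averaging of local least-squares estimates, can illustrate the setting: each machine fits its local data and transmits only its coefficient vector. This is a generic sketch of the idea, not necessarily the aggregation rule the paper proposes:

```python
import random

def ols_1d(xs, ys):
    """Closed-form OLS for y = a + b*x on one machine's local data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

random.seed(0)
machines = []
for _ in range(5):   # 5 machines, each with local samples of y = 1 + 2x + noise
    xs = [random.uniform(0, 10) for _ in range(200)]
    ys = [1 + 2 * x + random.gauss(0, 1) for x in xs]
    machines.append(ols_1d(xs, ys))

# Communication-efficient aggregation: each machine sends only (a, b).
a_bar = sum(m[0] for m in machines) / len(machines)
b_bar = sum(m[1] for m in machines) / len(machines)
print(round(a_bar, 2), round(b_bar, 2))   # close to the true (1, 2)
```

The appeal is that only a few numbers cross the network per machine, rather than the raw data; handling heterogeneity across sources, as in the paper, requires more than this simple average.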

5.
Factor analysis (FA) is a popular statistical technique for extracting common factors from multiple variables, but it applies only to vector-valued data (each data point is a vector). When FA is applied to matrix-valued data (each data point is a matrix), a common practice is to first vectorize the matrix-valued observations. However, vectorization leaves factor analysis facing two problems: poorer interpretability and susceptibility to the curse of dimensionality. To solve both, this paper starts from the matrix structure inherent in matrix-valued data and proposes bilinear factor analysis (bilinear FA, BFA). The novelty is that BFA uses a bilinear transformation, which greatly reduces the number of model parameters, effectively overcoming the curse of dimensionality while extracting the row-variable and column-variable common factors of interest. Two efficient algorithms are developed for maximum likelihood estimation of the BFA model parameters; the theoretical properties of the estimates are discussed and an explicit analytical expression of the Fisher information matrix is derived to compute the accuracy of the parameter estimates; model selection for BFA is also studied. Unlike traditional factor scores, which are vectors, BFA factor scores are matrices, and the paper provides computation and visualization methods for matrix factor scores. Finally, empirical studies are constructed to understand the proposed BFA model and compare it with related methods. The results demonstrate the superiority and practicality of BFA for matrix-valued data analysis.

6.
Three newer machine learning classification algorithms for the big-data setting (support vector machines, boosted decision trees, and random forests) are compared with three traditional classification algorithms (logistic regression, k-nearest neighbors, and linear discriminant analysis). Seven real data sets from different industries are analyzed with all six algorithms, computing the overall misclassification probability on the test set and the two types of error rates. The results show that, in terms of prediction, the newer machine learning algorithms, especially random forests and boosted decision trees, clearly outperform the traditional classification algorithms on big data.

7.
李培志  董清利 《运筹与管理》2021,30(11):168-175
Box office forecasting has always been an important and complex task for management agencies. The complexity and volatility of box-office-related variables and the difficulty of obtaining data are the main constraints on current research. In contrast, web search data, structured data released by internet companies to record users' search behavior, can reflect trends objectively and in a timely manner. This study builds a hybrid forecasting model based on web search data. First, the training data most similar to the test set are matched to construct an optimal training set (OTS). Second, the imperialist competitive algorithm (ICA) is applied to select the best parameter combination for a least squares support vector machine (LSSVM). Finally, the optimized model is used for forecasting. To test the model, simulation experiments are conducted with box office data of films released in mainland China. The results show that the hybrid model achieves higher forecasting accuracy. The model is suitable for box office forecasting in the Chinese film industry and can provide decision support for the relevant agencies.
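The OTS matching step might be sketched as follows, assuming similarity is measured by Euclidean distance between feature vectors; the distance measure, the cutoff `k`, and the toy data are illustrative assumptions, not taken from the paper:

```python
def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def optimal_training_set(train_X, train_y, test_X, k):
    """Keep the k training samples closest (on average) to the test
    feature vectors: one simple reading of the OTS matching step."""
    def avg_dist(x):
        return sum(euclidean(x, t) for t in test_X) / len(test_X)
    ranked = sorted(range(len(train_X)), key=lambda i: avg_dist(train_X[i]))
    keep = ranked[:k]
    return [train_X[i] for i in keep], [train_y[i] for i in keep]

# Hypothetical search-index features and box-office outcomes.
train_X = [[1, 1], [1, 2], [9, 9], [10, 8], [2, 1]]
train_y = [10, 12, 90, 95, 11]
test_X = [[9, 8], [10, 9]]
X_ots, y_ots = optimal_training_set(train_X, train_y, test_X, k=2)
print(y_ots)   # the two records most similar to the test films
```

Restricting training to the most similar historical films reduces the influence of dissimilar regimes before the LSSVM is fitted.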

8.
For the classification of continuous data streams, an online logistic regression algorithm is proposed based on online learning theory. Online logistic regression with a regularization term is studied, an online logistic-l2 regression model is proposed, and a theoretical bound estimate is given. Experimental results show that as the number of online iterations increases, the proposed model and algorithm reach the classification performance of offline prediction. This work provides a new, effective method for classifying massive streaming data.
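A minimal sketch of online logistic regression with an l2 penalty, taking one stochastic gradient step per arriving example; the step size, penalty strength, and synthetic stream are illustrative assumptions, and the paper's theoretical bounds are not reproduced here:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def online_logistic_l2(stream, dim, lam=0.01, eta=0.1):
    """One pass over the stream: for each (x, y) take a gradient step
    on the instantaneous log-loss plus (lam/2)*||w||^2."""
    w = [0.0] * dim
    for x, y in stream:                       # y in {0, 1}
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for j in range(dim):
            grad = (p - y) * x[j] + lam * w[j]
            w[j] -= eta * grad
    return w

random.seed(1)
stream = []
for _ in range(2000):                         # linearly separable toy stream
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    y = 1 if x[0] + x[1] > 0 else 0
    stream.append((x, y))
w = online_logistic_l2(stream, dim=2)
acc = sum((sigmoid(sum(wi * xi for wi, xi in zip(w, x))) > 0.5) == (y == 1)
          for x, y in stream) / len(stream)
print(round(acc, 2))   # approaches offline accuracy as the stream grows
```

Each example is touched once and then discarded, which is what makes the scheme suitable for streams too large to store.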

9.
Semi-supervised sentiment classification of online review texts based on topic models
To address the class imbalance, lack of labels, and irregularity in the sentiment classification of online review texts, a topic-based semi-supervised learning model with threshold adjustment is proposed. Topic features are extracted from the unstructured text, a classifier is trained on a small amount of sentiment-labeled text, and the decision threshold is adjusted by optimizing the evaluation metric, so as to identify the sentiment orientation of user reviews. Simulation studies show that the threshold-adjusted semi-supervised model adapts well to imbalanced and unlabeled data. In an empirical study, a sentiment classifier built on hotel review data shows that the model can effectively predict the sentiment polarity of minority-class review samples, confirming the applicability and feasibility of the topic-model-based, threshold-adjusted semi-supervised sentiment classification model in practice.
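The threshold-adjustment idea can be illustrated with a generic sketch that scans candidate thresholds and keeps the one maximizing G-mean; the scores, labels, and the choice of G-mean as the tuning metric are assumptions for illustration:

```python
import math

def best_threshold(scores, labels, grid=None):
    """Pick the decision threshold that maximizes G-mean instead of
    using the default 0.5: useful when the positive class is rare."""
    grid = grid or [i / 100 for i in range(1, 100)]
    def gmean(t):
        tp = sum(s >= t and y for s, y in zip(scores, labels))
        fn = sum(s < t and y for s, y in zip(scores, labels))
        tn = sum(s < t and not y for s, y in zip(scores, labels))
        fp = sum(s >= t and not y for s, y in zip(scores, labels))
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        return math.sqrt(sens * spec)
    return max(grid, key=gmean)

# Classifier scores where the rare positive reviews all score below 0.5.
scores = [0.45, 0.40, 0.35, 0.30, 0.20, 0.15, 0.10, 0.05]
labels = [1,    1,    0,    0,    0,    0,    0,    0]
t = best_threshold(scores, labels)
print(t)   # a threshold below 0.5 that separates the classes
```

At the default 0.5 every positive here would be missed; lowering the threshold recovers the minority class, which is the effect the model exploits.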

10.
This paper studies parameter estimation and statistical diagnostics for fixed-effects quantile regression models for longitudinal data. An MM iterative algorithm for parameter estimation is given first; the equivalence of the case deletion model (CDM) and the mean shift outlier model (MSOM) in statistical diagnostics is then discussed; finally, an anti-inflammatory analgesic drug data set illustrates the application of the methods.
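The check (pinball) loss that defines quantile regression can be illustrated in the location-only case, where minimizing it recovers the sample quantile; this is a textbook sketch, not the paper's MM algorithm:

```python
def check_loss(u, tau):
    """Quantile-regression check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

def fit_quantile(ys, tau, grid):
    """Minimize the summed check loss over a grid of candidate values;
    for location-only data this recovers the tau-th sample quantile."""
    return min(grid, key=lambda q: sum(check_loss(y - q, tau) for y in ys))

ys = [1, 2, 3, 4, 100]                 # heavy right tail
grid = [i / 10 for i in range(0, 1001)]
print(fit_quantile(ys, 0.5, grid))     # prints 3.0 (the median, robust to 100)
```

Because the loss is piecewise linear rather than quadratic, the fitted value resists the outlier at 100, which is the robustness property quantile regression inherits.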

11.
This article introduces a classification tree algorithm that can simultaneously reduce tree size, improve class prediction, and enhance data visualization. We accomplish this by fitting a bivariate linear discriminant model to the data in each node. Standard algorithms can produce fairly large tree structures because they employ a very simple node model, wherein the entire partition associated with a node is assigned to one class. We reduce the size of our trees by letting the discriminant models share part of the data complexity. Being themselves classifiers, the discriminant models can also help to improve prediction accuracy. Finally, because the discriminant models use only two predictor variables at a time, their effects are easily visualized by means of two-dimensional plots. Our algorithm does not simply fit discriminant models to the terminal nodes of a pruned tree, as this does not reduce the size of the tree. Instead, discriminant modeling is carried out in all phases of tree growth and the misclassification costs of the node models are explicitly used to prune the tree. Our algorithm is also distinct from the “linear combination split” algorithms that partition the data space with arbitrarily oriented hyperplanes. We use axis-orthogonal splits to preserve the interpretability of the tree structures. An extensive empirical study with real datasets shows that, in general, our algorithm has better prediction power than many other tree or nontree algorithms.

12.
Fitting semiparametric clustering models to dissimilarity data
The cluster analysis problem of partitioning a set of objects from dissimilarity data is here handled with the statistical model-based approach of fitting the “closest” classification matrix to the observed dissimilarities. A classification matrix represents a clustering structure expressed in terms of dissimilarities. In cluster analysis there is a lack of methodologies widely used to directly partition a set of objects from dissimilarity data. In real applications, a hierarchical clustering algorithm is applied on dissimilarities and subsequently a partition is chosen by visual inspection of the dendrogram. Alternatively, a “tandem analysis” is used by first applying a Multidimensional Scaling (MDS) algorithm and then by using a partitioning algorithm such as k-means applied on the dimensions specified by the MDS. However, neither the hierarchical clustering algorithms nor the tandem analysis is specifically defined to solve the statistical problem of fitting the closest partition to the observed dissimilarities. This lack of appropriate methodologies motivates this paper, in particular, the introduction and the study of three new object partitioning models for dissimilarity data, their estimation via least-squares and the introduction of three new fast algorithms.

13.
This paper develops local learning algorithms to solve a classification task with the help of biologically inspired mathematical models of spiking neural networks involving the mechanism of spike-timing-dependent plasticity (STDP). The advantages of the models are their simplicity and, hence, the potential ability to be hardware-implemented in low-energy-consuming biomorphic computing devices. The methods developed are based on two key effects observed in neurons with STDP: mean firing rate stabilization and memorizing repeating spike patterns. As a result, two algorithms to solve a classification task with a spiking neural network are proposed: the first based on rate encoding of the input data and the second based on temporal encoding. The accuracy of the algorithms is tested on the benchmark classification tasks of Fisher's Iris and Wisconsin breast cancer, with several combinations of input data normalization and preprocessing. The respective accuracies are 99% and 94% by F1-score.
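The rate-encoding step used by the first algorithm can be sketched generically: a normalized feature value becomes a Bernoulli spike train whose firing probability per time step is proportional to the value. The parameters and seed here are illustrative assumptions, not taken from the paper:

```python
import random

def rate_encode(features, t_steps=100, max_rate=0.5, seed=0):
    """Rate encoding: each normalized feature value in [0, 1] becomes a
    binary spike train firing with probability value * max_rate per step."""
    rng = random.Random(seed)
    trains = []
    for v in features:
        trains.append([1 if rng.random() < v * max_rate else 0
                       for _ in range(t_steps)])
    return trains

trains = rate_encode([0.1, 0.9], t_steps=1000)
rates = [sum(t) / len(t) for t in trains]
print(rates)   # the stronger feature fires roughly nine times as often
```

Downstream STDP neurons then see input intensity as firing frequency, which is what the mean-firing-rate-stabilization effect acts on.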

14.
A Gaussian kernel approximation algorithm for a feedforward neural network is presented. The approach used by the algorithm, which is based on a constructive learning algorithm, is to create the hidden units directly so that automatic design of the architecture of neural networks can be carried out. The algorithm is defined using the linear summation of input patterns and their randomized input weights. Hidden-layer nodes are defined so as to partition the input space into homogeneous regions, where each region contains patterns belonging to the same class. The largest region is used to define the center of the corresponding Gaussian hidden nodes. The algorithm is tested on three benchmark data sets of different dimensionality and sample sizes to compare the approach presented here with other algorithms. Real medical diagnoses and a biological classification of mushrooms are used to illustrate the performance of the algorithm. These results confirm the effectiveness of the proposed algorithm.

15.
High-dimensional data are increasingly used in many real-world applications. Such data are obtained from different feature extractors, which represent distinct perspectives on the data, and classifying them efficiently is a challenge. Although millions of unlabeled data samples may exist, it is believed that labeling a handful of data, as in the semisupervised scheme, will remarkably improve performance. However, the performance of semisupervised data classification relies heavily on the proposed models and the related numerical methods. Following the extension of the Mumford–Shah–Potts-type model in the spatially continuous setting, we propose efficient data classification algorithms based on the alternating direction method of multipliers and the primal-dual method to deal efficiently with the nonsmooth terms in the proposed model. The convergence of the proposed data classification algorithms is established under the framework of variational inequalities. Balanced and unbalanced classification problems are tested, demonstrating the efficiency of the proposed algorithms.

16.
In the Knowledge Discovery Process, classification algorithms are often used to help create models with training data that can be used to predict the classes of untested data instances. While there are several factors involved with classification algorithms that can influence classification results, such as the node splitting measures used in making decision trees, feature selection is often used as a pre-classification step when using large data sets to help eliminate irrelevant or redundant attributes in order to increase computational efficiency and possibly to increase classification accuracy. One important factor common to both feature selection as well as to classification using decision trees is attribute discretization, which is the process of dividing attribute values into a smaller number of discrete values. In this paper, we will present and explore a new hybrid approach, ChiBlur, which involves the use of concepts from both the blurring and χ2-based approaches to feature selection, as well as concepts from multi-objective optimization. We will compare this new algorithm with algorithms based on the blurring and χ2-based approaches.

17.
The features used may have an important effect on the performance of credit scoring models. The process of choosing the best set of features for credit scoring models is usually unsystematic and dominated by somewhat arbitrary trial. This paper presents an empirical study of four machine learning feature selection methods. These methods provide an automatic data mining technique for reducing the feature space. The study illustrates how four feature selection methods—‘ReliefF’, ‘Correlation-based’, ‘Consistency-based’ and ‘Wrapper’ algorithms help to improve three aspects of the performance of scoring models: model simplicity, model speed and model accuracy. The experiments are conducted on real data sets using four classification algorithms—‘model tree (M5)’, ‘neural network (multi-layer perceptron with back-propagation)’, ‘logistic regression’, and ‘k-nearest-neighbours’.
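A stripped-down version of a correlation-based filter ranks features by the absolute Pearson correlation with the class label; the toy credit features below are invented for illustration and the full 'Correlation-based' method in the paper also accounts for inter-feature redundancy:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def rank_features(X_cols, y):
    """Rank feature columns by |correlation with the class label|,
    most relevant first."""
    scored = [(abs(pearson(col, y)), j) for j, col in enumerate(X_cols)]
    return [j for _, j in sorted(scored, reverse=True)]

# Toy credit data: feature 0 tracks default status, feature 1 is noise.
y       = [1, 1, 1, 0, 0, 0]
income  = [1, 2, 1, 8, 9, 7]   # strongly (negatively) related to default
shoe_sz = [7, 9, 8, 7, 9, 8]   # unrelated
print(rank_features([income, shoe_sz], y))   # prints [0, 1]
```

Filters of this kind are fast because they score each feature independently of any classifier, at the cost of missing feature interactions that wrapper methods can detect.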

18.
Bayesian networks are graphical models that represent the joint distribution of a set of variables using directed acyclic graphs. The graph can be manually built by domain experts according to their knowledge. However, when the dependence structure is unknown (or partially known) the network has to be estimated from data by using suitable learning algorithms. In this paper, we deal with a constraint-based method to perform Bayesian networks structural learning in the presence of ordinal variables. We propose an alternative version of the PC algorithm, which is one of the most known procedures, with the aim to infer the network by accounting for additional information inherent to ordinal data. The proposal is based on a nonparametric test, appropriate for ordinal variables. A comparative study shows that, in some situations, the proposal discussed here is a slightly more efficient solution than the PC algorithm.

19.
Multi-dimensional classification aims at finding a function that assigns a vector of class values to a given vector of features. In this paper, this problem is tackled by a general family of models, called multi-dimensional Bayesian network classifiers (MBCs). This probabilistic graphical model organizes class and feature variables as three different subgraphs: class subgraph, feature subgraph, and bridge (from class to features) subgraph. Under the standard 0-1 loss function, the most probable explanation (MPE) must be computed, for which we provide theoretical results in both general MBCs and in MBCs decomposable into maximal connected components. Moreover, when computing the MPE, the vector of class values is covered by following a special ordering (gray code). Under other loss functions defined in accordance with a decomposable structure, we derive theoretical results on how to minimize the expected loss. Besides these inference issues, the paper presents flexible algorithms for learning MBC structures from data based on filter, wrapper and hybrid approaches. The cardinality of the search space is also given. New performance evaluation metrics adapted from the single-class setting are introduced. Experimental results with three benchmark data sets are encouraging, and they outperform state-of-the-art algorithms for multi-label classification.  相似文献
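The gray code ordering used to sweep the vector of class values can be sketched for the binary-class case (binary classes are an assumption here): consecutive candidate vectors differ in exactly one class variable, so the MPE score can be updated incrementally instead of being recomputed from scratch.

```python
def gray_codes(n_bits):
    """Enumerate all n-bit class-value vectors so that consecutive
    vectors differ in exactly one position (binary-reflected gray code)."""
    for i in range(2 ** n_bits):
        g = i ^ (i >> 1)           # standard gray-code transform
        yield [(g >> b) & 1 for b in reversed(range(n_bits))]

codes = list(gray_codes(3))
print(codes[:4])   # prints [[0, 0, 0], [0, 0, 1], [0, 1, 1], [0, 1, 0]]
```

Because each step flips a single class variable, only the factors of the joint distribution involving that variable need to be re-evaluated when scoring the next candidate.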

20.
Increasingly, fuzzy partitions are being used in multivariate classification problems as an alternative to the crisp classification procedures commonly used. One such fuzzy partition, the grade of membership model, partitions individuals into fuzzy sets using multivariate categorical data. Although the statistical methods used to estimate fuzzy membership for this model are based on maximum likelihood methods, large sample properties of the estimation procedure are problematic for two reasons. First, the number of incidental parameters increases with the size of the sample. Second, estimated parameters fall on the boundary of the parameter space with non-zero probability. This paper examines the consistency of the likelihood approach when estimating the components of a particular probability model that gives rise to a fuzzy partition. The results of the consistency proof are used to determine the large sample distribution of the estimates. Common methods of classifying individuals based on multivariate observations attempt to place each individual into crisply defined sets. The fuzzy partition allows for individual to individual heterogeneity, beyond simply errors in measurement, by defining a set of pure type characteristics and determining each individual's distance from these pure types. Both the profiles of the pure types and the heterogeneity of the individuals must be estimated from data. These estimates empirically define the fuzzy partition. In the current paper, this data is assumed to be categorical data. Because of the large number of parameters to be estimated and the limitations of categorical data, one may be concerned about whether or not the fuzzy partition can be estimated consistently. This paper shows that if heterogeneity is measured with respect to a fixed number of moments of the grade of membership scores of each individual, the estimated fuzzy partition is consistent.
