Similar Articles
20 similar articles found.
1.
The recursive and hierarchical structure of full rooted trees is applicable to statistical models in various fields, such as data compression, image processing, and machine learning. In most of these cases, the full rooted tree is not a random variable; as such, model selection to avoid overfitting is problematic. One method to solve this problem is to assume a prior distribution on the full rooted trees. This enables optimal model selection based on Bayes decision theory. For example, by assigning a low prior probability to a complex model, the maximum a posteriori estimator avoids selecting it. Furthermore, we can average all the models weighted by their posteriors. In this paper, we propose a probability distribution on a set of full rooted trees. Its parametric representation is suitable for calculating properties of the distribution, such as the mode, expectation, and posterior distribution, using recursive functions. Although such distributions have been proposed in previous studies, they are only applicable to specific applications. Therefore, we extract their mathematically essential components and derive new generalized methods to calculate the expectation, posterior distribution, etc.
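The recursive computations described above can be sketched in Python. This is a minimal illustration, not the authors' parametrization: it assumes binary full rooted trees with a depth bound `max_depth`, a single split probability `g` shared by all nodes (the general model may use node-specific parameters), and a hypothetical per-node leaf likelihood `leaf_lik`. `prior` evaluates the prior of one tree, `all_trees` enumerates trees to check normalization, and `map_tree` finds the mode of prior times likelihood by recursion.

```python
from itertools import product

# Trees are nested tuples: None is a leaf, (left, right) is an internal node.
# Assumption: every node above max_depth splits with the same probability g.

def prior(tree, g, max_depth, depth=0):
    """Prior probability of a full binary tree under the split-probability model."""
    if depth == max_depth:
        return 1.0 if tree is None else 0.0  # nodes at the depth bound must be leaves
    if tree is None:
        return 1.0 - g
    left, right = tree
    return g * prior(left, g, max_depth, depth + 1) * prior(right, g, max_depth, depth + 1)

def all_trees(max_depth, depth=0):
    """Enumerate every full binary tree whose leaves lie at depth <= max_depth."""
    if depth == max_depth:
        return [None]
    subtrees = all_trees(max_depth, depth + 1)
    return [None] + [(l, r) for l, r in product(subtrees, repeat=2)]

def map_tree(leaf_lik, g, max_depth, node=""):
    """Recursively find the tree maximizing prior * likelihood.

    leaf_lik(node) is a hypothetical likelihood of the data at `node` if it is a leaf;
    nodes are bit strings ("" is the root, "0" and "1" its children).
    Returns (best score, best subtree).
    """
    if len(node) == max_depth:
        return leaf_lik(node), None
    stop = (1.0 - g) * leaf_lik(node)
    s0, t0 = map_tree(leaf_lik, g, max_depth, node + "0")
    s1, t1 = map_tree(leaf_lik, g, max_depth, node + "1")
    split = g * s0 * s1
    return (split, (t0, t1)) if split > stop else (stop, None)
```

Returning `stop + split` instead of the larger of the two turns the same recursion into the normalizer needed for posterior model averaging, which is the recursion behind context tree weighting.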

2.
Decision trees are well-known machine learning methods that are widely applied in classification and recognition. In this paper, with label uncertainty handled by belief functions, a new decision tree method based on belief entropy is proposed and then extended to random forests. With a Gaussian mixture model, the tree method can handle continuous attribute values directly, without discretization as a preprocessing step. Specifically, the tree method adopts belief entropy, an uncertainty measure based on the basic belief assignment, as a new attribute selection tool. To improve classification performance, we construct a random forest from the basic trees and discuss different strategies for combining predictions. Numerical experiments on UCI machine learning datasets indicate that the proposed method achieves good classification accuracy in different situations, especially on data with high uncertainty.
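The abstract does not name the belief entropy it uses; one widely used choice is Deng entropy, so the sketch below assumes it. It scores a basic belief assignment (BBA) that maps focal sets of class labels to masses, and averages child-node entropies the way an attribute-selection criterion would. The Gaussian-mixture handling of continuous attributes and the random-forest combination are not shown.

```python
import math

def deng_entropy(bba):
    """Belief (Deng) entropy of a basic belief assignment.

    bba maps focal elements (frozensets of class labels) to masses summing to 1.
    E_d(m) = -sum_A m(A) * log2(m(A) / (2**|A| - 1))
    """
    return -sum(m * math.log2(m / (2 ** len(a) - 1)) for a, m in bba.items() if m > 0)

def split_belief_entropy(children):
    """Sample-weighted belief entropy of the child nodes of a candidate split.

    children is a list of (n_samples, bba) pairs; lower means a purer split.
    """
    total = sum(n for n, _ in children)
    return sum(n / total * deng_entropy(bba) for n, bba in children)
```

With only singleton focal elements, Deng entropy reduces to Shannon entropy; mass placed on multi-class sets (label uncertainty) increases it.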

3.
4.
Decision trees are decision-support data mining tools that create, as the name suggests, a tree-like model. The classical C4.5 decision tree, based on the Shannon entropy, is a simple algorithm that calculates the gain ratio and then splits on attributes according to this entropy measure. Tsallis and Rényi entropies (instead of Shannon) can be employed to generate a decision tree with better results. In practice, the entropic index parameter of these entropies is tuned to outperform the classical decision trees. However, this tuning is carried out by testing a range of values for a given database, which is time-consuming and infeasible for massive data. This paper introduces a decision tree based on a two-parameter fractional Tsallis entropy. We propose a constructionist approach to representing databases as complex networks, which enables efficient computation of the parameters of this entropy using the box-covering algorithm and renormalization of the complex network. The experimental results support the conclusion that the two-parameter fractional Tsallis entropy is a more sensitive measure than its parametric Rényi, Tsallis, and Gini index precedents for a decision tree classifier.
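To make the entropic-index idea concrete, here is a sketch of the gain ratio with Shannon, Tsallis, or Rényi entropy plugged in. It does not implement the two-parameter fractional Tsallis entropy or the network-based parameter estimation, whose formulas are not given in the abstract; keeping C4.5's Shannon split information while swapping only the impurity used in the gain is an assumption.

```python
import math
from collections import Counter

def probs(labels):
    """Empirical class distribution of a list of labels."""
    n = len(labels)
    return [c / n for c in Counter(labels).values()]

def shannon(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def tsallis(p, q):
    """Tsallis entropy with entropic index q (q -> 1 recovers Shannon)."""
    if q == 1:
        return shannon(p)
    return (1 - sum(x ** q for x in p)) / (q - 1)

def renyi(p, a):
    """Renyi entropy of order a (a -> 1 recovers Shannon)."""
    if a == 1:
        return shannon(p)
    return math.log(sum(x ** a for x in p)) / (1 - a)

def gain_ratio(labels, groups, entropy):
    """C4.5-style gain ratio; groups are the label lists produced by a split.

    Assumption: the chosen entropy is used for the gain, while the split
    information stays Shannon as in classical C4.5.
    """
    n = len(labels)
    conditional = sum(len(g) / n * entropy(probs(g)) for g in groups)
    gain = entropy(probs(labels)) - conditional
    split_info = shannon([len(g) / n for g in groups])
    return gain / split_info if split_info > 0 else 0.0
```

Tsallis entropy with q = 2 equals the Gini index, so tuning q moves between familiar impurity criteria.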

5.
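Based on the imaging characteristics of satellite interferometric multispectral images, a partial SPIHT spectral image compression algorithm is proposed that uses classified-weight rate-distortion optimized truncation and adaptive coding-depth control. First, according to the type of interference region and the importance of each coding plane, different importance weights are assigned to each coding pass of each zerotree. Then each zerotree is coded independently with the partial SPIHT algorithm; during coding, one of three coding modes is selected adaptively according to the statistical probability of significant coefficients in each bit plane, while the coding depth of each zerotree is controlled adaptively according to its importance weight and a depth-control factor. Finally, within the coding depth, the bitstream is optimally truncated with the classified-weight rate-distortion method according to the distortion that zerotrees in different interference regions contribute to the recovered spectrum, so that bit allocation and distortion are jointly optimized. Experimental results show that the algorithm preserves spectral information better than conventional algorithms.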
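The rate-distortion optimized truncation step can be illustrated with the standard Lagrangian approach (as in JPEG 2000's post-compression rate-distortion optimization). This is a swapped-in generic technique, not necessarily the paper's classified-weight method. Each zerotree is assumed to offer candidate truncation points `(rate, distortion)`, a hypothetical importance weight scales its distortion, and bisection on the multiplier `lam` meets the bit budget.

```python
def choose_points(trees, lam):
    """For each tree, pick the truncation point minimizing weight * D + lam * R.

    trees is a list of (weight, [(rate, distortion), ...]) pairs.
    """
    choice = []
    for weight, points in trees:
        costs = [weight * d + lam * r for r, d in points]
        choice.append(costs.index(min(costs)))
    return choice

def total_rate(trees, choice):
    return sum(points[k][0] for (_, points), k in zip(trees, choice))

def truncate(trees, budget, iters=60):
    """Bisection on the Lagrange multiplier so that the total rate fits the budget."""
    lo, hi = 0.0, 1e6  # assumes hi is large enough that every tree picks its cheapest point
    for _ in range(iters):
        mid = (lo + hi) / 2
        if total_rate(trees, choose_points(trees, mid)) > budget:
            lo = mid  # too many bits: penalize rate more
        else:
            hi = mid
    return choose_points(trees, hi)
```

The Lagrangian search only reaches truncation points on each tree's lower convex hull; a practical coder would prune the other points first.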

6.
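In recent years, with the successive implementation of major spectroscopic survey projects, the volume of observed celestial spectra has grown dramatically, and large spectroscopic surveys place higher demands on automatic spectral classification and analysis. This paper converts the classification problem into a regression problem and proposes a spectral class prediction method based on a deep residual network to predict the spectral subtypes of stellar spectra. The network mainly consists of 25 convolutional layers, one max-pooling layer, one average-pooling layer, fully connected layers, and 12 residual blocks. The max-pooling layer selects features, the convolutional layers extract features, and the average-pooling layer reduces the number of model parameters to improve efficiency. The residual blocks prevent network degradation, allow a deeper network to extract high-level abstract features, and speed up training. Because the data may contain mislabeled or corrupted samples with nonzero probability, the Log-Cosh loss function is used to reduce the negative influence of bad samples. The experimental data are 80,000 spectra drawn at random from LAMOST DR5; owing to spectral quality and other factors, the number of spectra per spectral type varies. After bad values were removed and the flux normalized, the data were split 7:1:2 into training, validation, and test sets. The experiments have two parts. In the first, the network is trained to predict spectral subtypes, and the maximum absolute error, mean absolute error, and standard deviation are used to compare convolution kernels of different shapes. With predicted values on the horizontal axis and labels on the vertical axis, a second-order polynomial fit to all test-set points yields a curve that coincides with the line y = x, indicating that the model predicts spectral subtypes well. The second part analyzes the model internally: class activation mapping is used to examine the main features the model attends to when predicting A, F, G, and K type spectra, which gives the model interpretability. On this dataset, the prediction error is within 0.5 subtypes for 91.4% of the spectra, and the mean absolute error is 0.3 subtypes. Compared on the same dataset with three methods (nonparametric regression, AdaBoost regression trees, and K-Means), the proposed method predicts spectral subtypes well, faster, and with higher accuracy.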
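The Log-Cosh loss behaves like squared error for small residuals and like absolute error for large ones, which is why it dampens mislabeled samples. A numerically stable sketch follows (the rewrite avoids overflow in `cosh` for large residuals), together with a hypothetical helper `split_sizes` for the 7:1:2 split; the residual network itself is not reproduced.

```python
import math

def log_cosh(residual):
    """log(cosh(r)), computed stably as |r| + log1p(exp(-2|r|)) - log(2)."""
    a = abs(residual)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def log_cosh_loss(preds, labels):
    """Mean Log-Cosh loss over a batch of predicted and labelled subtypes."""
    return sum(log_cosh(p - y) for p, y in zip(preds, labels)) / len(preds)

def log_cosh_grad(residual):
    """Derivative of log(cosh(r)) is tanh(r), bounded in [-1, 1]."""
    return math.tanh(residual)

def split_sizes(n, ratios=(7, 1, 2)):
    """Hypothetical helper: train/validation/test sizes for a 7:1:2 split."""
    total = sum(ratios)
    train = n * ratios[0] // total
    val = n * ratios[1] // total
    return train, val, n - train - val
```

Because the gradient is bounded by 1 in magnitude, a single badly labelled spectrum cannot dominate an update the way it can under squared error.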

7.
In information theory, lossless compression of general data is based on an explicit assumption of a stochastic generative model for the target data. In lossless image compression, however, researchers have mainly focused on the coding procedure that outputs the coded sequence from the input image, and the assumption of a stochastic generative model is implicit. In these studies, it is difficult to discuss the difference between the expected code length and the entropy of the stochastic generative model. We resolve this difficulty for a class of images that are non-stationary across segments. In this paper, we propose a novel stochastic generative model of images by redefining the implicit stochastic generative model in a previous coding procedure. Our model is based on a quadtree, so it effectively represents the variable-block-size segmentation of images. We then construct the Bayes code that is optimal for the proposed stochastic generative model. It requires summing over all possible quadtrees weighted by their posteriors. In general, the computational cost of this summation increases exponentially with the image size. However, we introduce an efficient algorithm that calculates it in time polynomial in the image size without loss of optimality. As a result, the derived algorithm achieves a better average coding rate than JBIG.
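The key recursion, a Bayes mixture over all quadtrees computed bottom-up instead of by enumeration, can be sketched as follows. Assumptions not taken from the abstract: a square binary image whose side is a power of two, a constant split probability `g`, and an i.i.d. Bernoulli leaf model with the Krichevsky-Trofimov (KT) estimator. The paper's actual leaf model and arithmetic coder are not shown; the function returns the total mixture probability, whose negative log2 is the ideal code length.

```python
import math

def kt_prob(n0, n1):
    """Krichevsky-Trofimov probability of a binary block with n0 zeros and n1 ones."""
    return math.exp(
        math.lgamma(n0 + 0.5) + math.lgamma(n1 + 0.5)
        - math.log(math.pi) - math.lgamma(n0 + n1 + 1)
    )

def mixture_prob(img, row, col, size, g):
    """Bayes mixture over all quadtree segmentations of the size x size block at (row, col).

    Each block is either a leaf (probability 1 - g) coded with the KT estimator,
    or split into four quadrants (probability g); 1x1 blocks are always leaves.
    Each block is visited once, so the cost is polynomial in the image size.
    """
    ones = sum(img[i][j] for i in range(row, row + size) for j in range(col, col + size))
    leaf = kt_prob(size * size - ones, ones)
    if size == 1:
        return leaf
    half = size // 2
    split = 1.0
    for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
        split *= mixture_prob(img, row + dr, col + dc, half, g)
    return (1.0 - g) * leaf + g * split

def code_length_bits(img, g=0.5):
    """Ideal code length in bits of the whole image under the quadtree mixture."""
    return -math.log2(mixture_prob(img, 0, 0, len(img), g))
```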

8.
A phenomenology-based virtual metrology (VM) for monitoring SiO2 etching depth was proposed by Park (2015). It achieved high prediction accuracy by introducing newly developed plasma information (PI) variables as designated inputs, an approach called PI-VM. The PI variables represent the state of the plasma, the sheath, and the target during the process. We investigate how a PI variable helps to improve the prediction accuracy of VM and what special role it plays in statistical variable selection. To focus the investigation, we consider only PIEEDF among the three PI variables. PIEEDF is determined from the ratio of line intensities in optical emission spectroscopy (OES). We apply Pearson's correlation filter (PCF), principal component analysis (PCA), and stepwise variable selection (SVS) as statistical selection methods to variable sets with and without PIEEDF. Multilinear regression is used to model the VM. This study reveals that PIEEDF is a good variable in terms of both its independence from other input variables and its explanatory power for the output variable. In particular, VM using SVS on variable sets that include PIEEDF achieves the highest accuracy, comparable to Park's PI-VM. This study shows that PIEEDF is particularly useful for monitoring fine variations in semiconductor manufacturing processes and that it also extends the utilization of OES sensor data.
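The PCF-then-regression pipeline can be sketched in plain Python. The filter rule here (keep features whose |r| with the target is at least `min_target_r`, then drop any feature whose |r| with an already kept feature is at least `max_mutual_r`) and both thresholds are assumptions; the abstract does not give the exact criterion. `fit_ols` solves the multilinear regression normal equations directly, which is adequate for a handful of variables. PCA and SVS are not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pearson_filter(features, y, min_target_r=0.3, max_mutual_r=0.9):
    """Keep relevant, non-redundant features; features maps name -> values."""
    ranked = sorted(features, key=lambda k: abs(pearson(features[k], y)), reverse=True)
    kept = []
    for name in ranked:
        if abs(pearson(features[name], y)) < min_target_r:
            continue  # weakly related to the output
        if all(abs(pearson(features[name], features[k])) < max_mutual_r for k in kept):
            kept.append(name)  # not redundant with any feature already kept
    return kept

def fit_ols(columns, y):
    """Multilinear regression with intercept via normal equations (Gauss-Jordan)."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] + [sum(r[i] * t for r, t in zip(X, y))]
         for i in range(p)]
    for i in range(p):
        piv = max(range(i, p), key=lambda r: abs(A[r][i]))  # partial pivoting
        A[i], A[piv] = A[piv], A[i]
        for r in range(p):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * b for a, b in zip(A[r], A[i])]
    return [A[i][p] / A[i][i] for i in range(p)]  # [intercept, coef_1, ...]
```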

9.
Selection of the Calibration Set for Modeling in Near-Infrared Spectral Analysis   Cited: 5 (self-citations: 0; by others: 5)
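The concept and method of the maximal linearly independent set are introduced into near-infrared (NIR) spectral analysis to address the selection of representative samples, i.e., calibration-set samples, when building quantitative analysis models. Using 2,652 tobacco powder samples as experimental material, 1,001 samples were randomly selected as the prediction set, and the remaining 1,651 samples formed the candidate pool of representative samples. MATLAB was used to find a maximal linearly independent set of the spectral matrix of the candidate pool, and these samples were taken as the representative samples forming the calibration set. A PLS regression model for quantitative analysis of total sugar content in tobacco powder was built and used to predict the total sugar content of the 1,001 prediction-set samples. The results show that when the calibration set contains more than 32 samples, every model's mean relative prediction error on the prediction set is below 4% and the mean correlation coefficient exceeds 0.96. The total sugar predictions of models built with 32 and with 146 representative samples showed no statistically significant difference (α = 0.05), indicating that the maximal linearly independent set method can select a calibration set that is small yet informative. In addition, calibration sets of the same sizes (28, 32, 41, 76, 146, and 163 samples) were selected both by the maximal linearly independent set method and at random, and the resulting models were compared; models calibrated on sets chosen by the maximal linearly independent set method predicted better than those calibrated on randomly chosen sets.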
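A maximal linearly independent set of spectra can be selected greedily with Gram-Schmidt orthogonalization; this is a sketch of the idea rather than the authors' MATLAB procedure. The relative tolerance `tol` is an assumption: with noisy real spectra almost every row is numerically independent up to the rank of the matrix, so the tolerance (or a preceding wavelength selection) effectively controls how many samples are kept.

```python
import math

def independent_rows(rows, tol=1e-8):
    """Indices of a maximal linearly independent subset of rows, chosen greedily.

    Each row is projected onto the orthonormal basis of the rows kept so far;
    it is kept only if the residual norm exceeds tol relative to the row norm.
    """
    basis = []  # orthonormal vectors spanning the kept rows
    kept = []
    for idx, row in enumerate(rows):
        residual = [float(v) for v in row]
        for b in basis:
            dot = sum(r * x for r, x in zip(residual, b))
            residual = [r - dot * x for r, x in zip(residual, b)]
        norm = math.sqrt(sum(r * r for r in residual))
        row_norm = math.sqrt(sum(float(v) ** 2 for v in row))
        if norm > tol * max(1.0, row_norm):
            basis.append([r / norm for r in residual])
            kept.append(idx)
    return kept
```

As with pivoting in row reduction, the selected subset depends on the order in which the samples are visited.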

10.
This research explores several theoretical aspects of decision trees that give theoretical support to decision-tree-based applications. First, many splitting criteria are available during tree growing. We study the splitting bias, caused by missing values and by variables with many possible values, that influences which criterion to choose. Results show that the Gini index is superior to information entropy because it is less biased by these influences. Second, noise variables with more missing values have a better chance of being chosen, whereas informative variables do not. Third, involving many noise variables in the tree-building process affects its computational complexity. Results show that the computational complexity increases linearly with the number of noise variables. Therefore, methods that extract more information from the original data at the cost of a higher variable dimension can also be considered in real applications.
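The bias toward many-valued attributes can be reproduced in a small simulation: with purely random labels, a noise attribute with many distinct values still earns a larger impurity reduction than a binary one. The sample size, trial count, and seed are arbitrary assumptions, and the sketch does not reproduce the paper's Gini-versus-entropy comparison or its missing-value experiments.

```python
import math
import random
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_gain(attr, labels, impurity):
    """Reduction in impurity obtained by splitting on every distinct value of attr."""
    groups = defaultdict(list)
    for a, y in zip(attr, labels):
        groups[a].append(y)
    n = len(labels)
    return impurity(labels) - sum(len(g) / n * impurity(g) for g in groups.values())

def mean_noise_gain(n_values, impurity, n=200, trials=20, seed=0):
    """Average gain of a pure-noise attribute with n_values levels on random labels."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        labels = [rng.randint(0, 1) for _ in range(n)]
        attr = [rng.randrange(n_values) for _ in range(n)]  # unrelated to labels
        total += impurity_gain(attr, labels, impurity)
    return total / trials
```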

11.
Solid State Communications, 2002, 121(2-3): 111-115
Based on Landau theory, the first-order phase transition properties of ferroelectric thin films have been studied, taking into account the effects of a uniaxial stress distribution. Following experimental results, the stress is assumed to decrease exponentially from the interface to the surface. Tensile stress is shown to decrease the polarization and the Curie temperature, whereas compressive stress increases both. A stress-driven phase transition is found at a critical stress. Our predictions are compared with available experimental results.
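For illustration only, the Landau picture can be sketched as a toy layer-by-layer model. Assumptions beyond the abstract: the standard first-order expansion F = αP²/2 + βP⁴/4 + γP⁶/6 with β < 0 and γ > 0; a linear stress coupling α = a0(T − T0) + κσ(z), with the sign chosen so that tensile stress (σ > 0) lowers the transition temperature as reported; an exponential profile σ(z) = σ0·exp(−z/decay) measured from the interface; and no gradient or surface terms. All parameter values are arbitrary dimensionless numbers.

```python
import math

# Toy model: every coefficient below is an arbitrary dimensionless assumption.

def equilibrium_p(alpha, beta, gamma):
    """Stable |P| minimizing F = alpha/2 P^2 + beta/4 P^4 + gamma/6 P^6 (beta < 0, gamma > 0)."""
    if alpha >= 3 * beta ** 2 / (16 * gamma):
        return 0.0  # the paraelectric state P = 0 is the global minimum
    x = (-beta + math.sqrt(beta ** 2 - 4 * alpha * gamma)) / (2 * gamma)  # x = P^2
    return math.sqrt(x)

def curie_temperature(sigma, a0=1.0, T0=400.0, beta=-1.0, gamma=1.0, kappa=1.0):
    """Local first-order transition temperature, where alpha = 3 beta^2 / (16 gamma)."""
    return T0 + (3 * beta ** 2 / (16 * gamma) - kappa * sigma) / a0

def mean_polarization(T, sigma0, n_layers=50, decay=0.2,
                      a0=1.0, T0=400.0, beta=-1.0, gamma=1.0, kappa=1.0):
    """Average |P| over a film of unit thickness with sigma(z) = sigma0 * exp(-z / decay).

    z = 0 is the film-substrate interface; sigma0 > 0 is tensile, < 0 compressive.
    Gradient and surface energy terms are ignored, so layers are independent.
    """
    total = 0.0
    for k in range(n_layers):
        z = (k + 0.5) / n_layers
        sigma = sigma0 * math.exp(-z / decay)
        alpha = a0 * (T - T0) + kappa * sigma
        total += equilibrium_p(alpha, beta, gamma)
    return total / n_layers
```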

12.
Numerous researchers have used the isotopic signatures of C, H, and O in tree rings to provide a long-term record of changes in the physiological status, climate, or water-source use of trees. The frequently limiting element N is also found in tree rings, and variation in its isotopic signature may provide insight into long-term changes in the soil N availability of a site. However, research has suggested that N is readily translocated among tree rings of different years; such infidelity between the isotopic composition of the N taken up from the soil and that of the N contained in the ring of that growth year would obscure the long-term N isotopic record. We used a 15-year ¹⁵N-tracer study to assess the degree of N translocation among tree rings in ponderosa pine (Pinus ponderosa) trees growing in a young, mixed-conifer plantation. We also measured δ¹³C and δ¹⁵N values in unlabeled trees to assess the degree of their covariance in wood tissue and to explore the potential for a biological linkage between them. We found that the maximum δ¹⁵N values in rings from the labeled trees occurred in the ring formed one year after the ¹⁵N was applied to the roots. The δ¹⁵N values of rings from labeled trees declined exponentially and bidirectionally from this peak, toward both younger and older rings. The unlabeled trees showed considerable interannual variation in the δ¹⁵N values of their rings (up to 3‰ and 5‰), but these values correlated poorly between trees over time and differed by as much as 6‰. Removal of extractives from the wood reduced its δ¹⁵N value, but the change was fairly small and consistent among unlabeled trees. The δ¹³C and δ¹⁵N values of tree rings were correlated over time in only one of the unlabeled trees.
Across all trees, both the δ¹³C values of tree rings and annual stem wood production were well correlated with annual precipitation, suggesting that soil water balance is an important environmental factor controlling both net C gain and transpirational water loss at this site. Our results suggest that interannual translocation of N among tree rings is substantial but may be predictable enough to remove this source of variation from the tree-ring record, potentially allowing assessment of long-term changes in the soil N availability of a site.
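Because the tracer signal decayed exponentially on both sides of its peak, that redistribution can in principle be characterized by a two-parameter fit. The sketch below assumes the tracer excess (labeled-tree δ¹⁵N minus an unlabeled baseline, a hypothetical input here) follows A·exp(−|year − peak|/τ) with the same τ toward younger and older rings, and fits it by linear regression on the logarithm. The model form and function names are illustrative, not taken from the study.

```python
import math

def fit_decay(years, excess, peak_year):
    """Fit excess = A * exp(-|year - peak_year| / tau) by least squares on log(excess).

    excess values must be positive (e.g., labeled-tree delta15N minus a baseline).
    Returns (A, tau).
    """
    xs = [abs(y - peak_year) for y in years]
    ys = [math.log(e) for e in excess]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return math.exp(intercept), -1.0 / slope

def predicted_excess(year, A, tau, peak_year):
    """Modelled tracer excess for one ring, usable to subtract translocation effects."""
    return A * math.exp(-abs(year - peak_year) / tau)
```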

13.
Deep Neural Networks (DNNs) usually work in an end-to-end manner. This makes trained DNNs easy to use, but their decision process remains opaque for every test case. Unfortunately, the interpretability of decisions is crucial in some scenarios, such as medical or financial data mining and decision-making. In this paper, we propose a Tree-Network-Tree (TNT) learning framework for explainable decision-making, in which knowledge is alternately transferred between tree models and DNNs. Specifically, the proposed TNT framework exploits the advantages of different models at different stages: (1) a novel James–Stein Decision Tree (JSDT) is proposed to generate better knowledge representations for DNNs, especially when the input data are low-frequency or low-quality; (2) the DNNs produce high-performing predictions from the knowledge-embedding inputs and act as a teacher model for the subsequent tree model; and (3) a novel distillable Gradient Boosted Decision Tree (dGBDT) is proposed to learn interpretable trees from the soft labels and make predictions comparable to those of the DNNs. Extensive experiments on various machine learning tasks demonstrate the effectiveness of the proposed method.
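The abstract does not give the James-Stein Decision Tree's formula. As a hedged sketch of the underlying idea, the positive-part James-Stein estimator below shrinks noisy per-leaf estimates toward a common target (for example, the parent-node mean), which helps most when leaves hold few samples. The known noise variance `sigma2` and the choice of target are assumptions.

```python
def james_stein(x, target, sigma2):
    """Positive-part James-Stein estimate shrinking x toward target.

    x: noisy estimates (e.g., per-leaf means); target: shrinkage target of the
    same length (e.g., the parent-node mean repeated); sigma2: assumed known
    noise variance of each estimate. Requires at least 3 components.
    """
    p = len(x)
    if p < 3:
        raise ValueError("James-Stein shrinkage needs at least 3 components")
    dist2 = sum((xi - ti) ** 2 for xi, ti in zip(x, target))
    if dist2 == 0:
        return list(target)
    factor = max(0.0, 1.0 - (p - 2) * sigma2 / dist2)  # clip at zero: positive part
    return [ti + factor * (xi - ti) for xi, ti in zip(x, target)]
```

Larger noise or estimates closer to the target produce stronger shrinkage; the clipping prevents overshooting past the target.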

14.
This paper proposes a new multichannel time reversal focusing (MTRF) method for circumferential Lamb waves, based on a modified time reversal algorithm, and applies it to detecting different kinds of defects in large-diameter thick-walled pipes. The principle of time reversal of circumferential Lamb waves in a pipe is presented, along with the influence of multiple guided wave modes and propagation paths. An experimental study is carried out on a large-diameter thick-walled pipe with three artificial defects: two axial notches, on its inner and outer surfaces respectively, and a corrosion-like defect on its outer surface. With the proposed MTRF method, the multichannel signals focus at the defects, increasing the amplitude of the defect-scattered signal. In addition, alongside the multichannel focusing, a second energy focus arises in the direct signal during the MTRF process, because the process partially compensates for the dispersion and multimode behavior of circumferential Lamb waves. Using the direct focus as a time reference, defects are accurately localized. Second, the paper reports a new phenomenon: after time reversal, the defect-scattered wave packet appears just before the right boundary of the truncation window, and two plausible explanations are given. This phenomenon can serve as a theoretical basis for identifying defect-scattered waves in the time-reversal response signal. Finally, to detect defects without prior knowledge of their exact positions, a large-range truncation window is used in the proposed method. As a result, the experimental operation of the MTRF method is simplified, and defects are successfully detected and localized.
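The basic time reversal principle behind the method can be shown with discrete signals: a pulse sent through channel impulse responses h_c is recorded, reversed, and re-emitted through the same channels, and the contributions add coherently at one instant. This sketch covers only the textbook principle under the assumption of reciprocal channels with equal-length impulse responses; the modified MTRF algorithm, truncation windows, dispersion, and mode conversion are not modeled.

```python
def convolve(a, b):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def time_reversal_focus(impulse_responses, pulse):
    """Multichannel time reversal: sum over channels of h_c * reverse(h_c * pulse).

    Assumes reciprocal channels and equal-length impulse responses, so every
    channel's contribution peaks at the same index and the peaks add up.
    """
    if len({len(h) for h in impulse_responses}) != 1:
        raise ValueError("impulse responses must have equal length")
    total = None
    for h in impulse_responses:
        received = convolve(h, pulse)            # forward propagation
        reemitted = convolve(h, received[::-1])  # reverse in time and send back
        total = reemitted if total is None else [a + b for a, b in zip(total, reemitted)]
    return total
```

With a unit pulse, each channel's output is the autocorrelation of its impulse response, which peaks at zero lag; the peak value equals the channel energy, so the sum focuses there.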

15.
Quantitative Health Assessment of Ginkgo Based on Leaf Reflectance Spectral Features   Cited: 1 (self-citations: 0; by others: 1)
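Accurately diagnosing tree health is the foundation of urban forest tree management and a technique urgently needed in current practice. Diagnosing tree health from soil and plant nutrient analyses is unreliable, and surveying morphological traits is time-consuming and labor-intensive; rapid, accurate, and nondestructive diagnosis has therefore become a major technical bottleneck in urban tree health management. Using ginkgo trees in Beijing as the study object, we investigated tree health diagnosis based on leaf reflectance spectral features. Clustering on 13 external morphological traits divided tree health into four grades: healthy, sub-healthy, moderately healthy, and unhealthy. Leaf pigment content differed extremely significantly among health grades (p < 0.001), and because chlorophyll content is correlated with spectral reflectance, judging tree health from leaf reflectance spectral features is feasible. Using factor analysis, 15 leaf reflectance spectral indicators were combined into a greenness index, a pigment index, and a three-edge index that jointly characterize the leaf reflectance spectrum. Both the individual spectral indicators and the three indices differed extremely significantly among health grades (p < 0.001). A multivariate quadratic model for ginkgo health assessment was therefore built from the three indices; validation showed a prediction accuracy of 79%, so the model can be used for rapid diagnosis of ginkgo health. The selected spectral indicators are fairly comprehensive and the method is simple. Through comprehensive analysis, the core morphological indicators of each health grade and the composite scores and score ranges of the greenness, pigment, and three-edge indices were determined, providing a standard for applying the method directly in practice to diagnose ginkgo health.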

16.
Smart transportation is an important part of smart cities, and travel characteristics analysis and traffic prediction modeling are two key technical means of building smart transportation systems. Although online car-hailing has developed rapidly and has a large number of users, most studies of travel characteristics focus not on online car-hailing but on taxis, buses, metros, and other traditional means of transportation. Moreover, the traditional univariate hybrid time series traffic prediction model based on the autoregressive integrated moving average (ARIMA) ignores other explanatory variables. To fill the research gap in online car-hailing travel characteristics analysis and to overcome this shortcoming of the univariate ARIMA-based hybrid model, we analyzed online car-hailing travel characteristics from multiple dimensions, such as district, time, traffic jams, weather, air quality, and temperature, using online car-hailing operational data sets. This paper proposes a traffic prediction method suited to multivariate hybrid time series modeling, which uses the maximal information coefficient (MIC) for feature selection and fuses the autoregressive integrated moving average with explanatory variables (ARIMAX) model and long short-term memory (LSTM) for regression. The effectiveness of the proposed multivariate hybrid time series traffic prediction model was verified on the online car-hailing operational data sets.
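The ARIMAX component can be sketched as an ARX regression on a differenced series (the "I" in ARIMA with d = 1). The model order (one autoregressive lag, one exogenous regressor such as temperature, no moving-average terms) is an assumption for brevity; the MIC feature selection and the LSTM that would model the ARIMAX residuals in the hybrid are not shown.

```python
def difference(series):
    """First difference, the d = 1 integration step of ARIMA."""
    return [b - a for a, b in zip(series, series[1:])]

def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a small square system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        for r in range(n):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [a - f * c for a, c in zip(M[r], M[i])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_arx(y, x):
    """Least-squares fit of y[t] = c + phi * y[t-1] + beta * x[t]; returns [c, phi, beta]."""
    rows = [[1.0, y[t - 1], x[t]] for t in range(1, len(y))]
    targets = [y[t] for t in range(1, len(y))]
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    return solve(XtX, Xty)
```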

17.
Hyperspectral Estimation of Apple Tree Canopy LAI Based on SVM and RF   Cited: 7 (self-citations: 0; by others: 7)
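Leaf area index (LAI) is a good dynamic indicator of crop population size. Hyperspectral techniques were used to estimate apple canopy LAI rapidly and nondestructively, providing a reference for monitoring apple tree growth and estimating yield. Taking Red Fuji apple trees at the full-fruiting stage as the study object, canopy spectral reflectance and LAI of 90 apple trees in 30 orchards were measured for two consecutive years in the Qixia study area, Yantai, Shandong Province, using an ASD field spectroradiometer and an LAI-2200 plant canopy analyzer. Optimal vegetation indices were constructed and screened by correlation analysis, and LAI estimation models were built with support vector machine (SVM) and random forest (RF) multiple regression. The five newly constructed vegetation indices (GNDVI527, NDVI676, RVI682, FD-NVI656, and GRVI517) and two previously established indices (NDVI670 and NDVI705) were all extremely significantly correlated with LAI. For the RF regression model, the calibration-set and validation-set coefficients of determination (C-R² and V-R²) were 0.920 and 0.889, which are 0.045 and 0.033 higher than those of the SVM model; the calibration-set and validation-set root-mean-square errors (C-RMSE and V-RMSE) were 0.249 and 0.236, which are 0.054 and 0.058 lower than those of the SVM model; and the calibration-set and validation-set relative analysis errors (C-RPD and V-RPD) reached 3.363 and 2.520, which are 0.598 and 0.262 higher than those of the SVM model. The slopes of the trend lines in the measured-versus-predicted scatter plots for the calibration and validation sets (C-S and V-S) were both close to 1. The RF regression model thus outperformed SVM and is suitable for estimating the LAI of Red Fuji apple trees at the full-fruiting stage.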
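The evaluation metrics reported above can be computed as follows. Definitions vary between papers; this sketch assumes R² = 1 − SSE/SST, RMSE averaged over n samples, and RPD = sample standard deviation of the observations divided by RMSE. `normalized_difference` is the generic form behind NDVI-type indices; the exact band pairs for indices such as NDVI676 are not given in the abstract, so the reflectances are left to the caller.

```python
import math
from statistics import stdev

def normalized_difference(r_a, r_b):
    """Generic normalized difference index (r_a - r_b) / (r_a + r_b)."""
    return (r_a - r_b) / (r_a + r_b)

def regression_metrics(observed, predicted):
    """R^2, RMSE, and RPD for measured vs. predicted values (assumed definitions)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(sse / n)
    return {"R2": 1 - sse / sst, "RMSE": rmse, "RPD": stdev(observed) / rmse}
```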

18.
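Load prediction plays a very important role in fault management. Predicting CPU load and memory usage enables real-time system monitoring, anticipation of resource availability in upcoming periods, and anomaly alerts. This paper proposes a weighted, improved autoregressive (AR) model: the parameters obtained by least squares are weighted and, combined with time series analysis theory, used to build a load prediction model for CPU load and memory usage. Experiments show that weighting the AR model parameters improves parameter estimation and reduces prediction error by 60% to 80%.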
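A plain least-squares AR(2) fit with a one-step forecast is sketched below. The paper's parameter-weighting scheme is not described in the abstract; the optional `decay` argument, which down-weights older equations exponentially, is an illustrative assumption and not the authors' method.

```python
def fit_ar2(y, decay=1.0):
    """Fit y[t] = a1 * y[t-1] + a2 * y[t-2] by (optionally weighted) least squares.

    decay < 1 gives the equation for time t the weight decay ** (age), so recent
    observations count more; decay = 1 is ordinary least squares.
    """
    n = len(y)
    s11 = s12 = s22 = b1 = b2 = 0.0
    for t in range(2, n):
        w = decay ** (n - 1 - t)
        u, v = y[t - 1], y[t - 2]
        s11 += w * u * u
        s12 += w * u * v
        s22 += w * v * v
        b1 += w * u * y[t]
        b2 += w * v * y[t]
    det = s11 * s22 - s12 * s12  # 2x2 normal equations solved by Cramer's rule
    return (b1 * s22 - b2 * s12) / det, (s11 * b2 - s12 * b1) / det

def forecast(y, a1, a2):
    """One-step-ahead AR(2) prediction."""
    return a1 * y[-1] + a2 * y[-2]
```

This model has no intercept, so a real CPU-load or memory-usage series should be mean-centered before fitting and the mean added back to the forecast.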

19.
The main cause of degradation and breakdown in silicone rubber (SIR) is electrical treeing. Based on a series of experiments, this paper discusses the morphology of electrical trees. The morphological types of electrical trees in SIR are summarized, and the factors that determine the initial tree type are explored. The propagation characteristics are also studied through long-term electrical tree ageing experiments. These results are compared with electrical trees found in on-site cable accessories and with those in PE, which are more familiar to researchers. Based on these experimental results, an explanatory mechanism is proposed.

20.
Conventional decision trees use queries, each of which is based on one attribute. In this study, we also examine decision trees that handle additional queries based on hypotheses. This kind of query is similar to the equivalence queries considered in exact learning. Earlier, we designed dynamic programming algorithms for computing the minimum depth and the minimum number of internal nodes in decision trees with hypotheses. The modification of these algorithms considered in the present paper allows us to build decision trees with hypotheses that are optimal relative to the depth or relative to the number of internal nodes. We compare the length and coverage of decision rules extracted from optimal decision trees with hypotheses and of decision rules extracted from optimal conventional decision trees, in order to choose the ones preferable as a tool for representing information. To this end, we conduct computer experiments on various decision tables from the UCI Machine Learning Repository. In addition, we consider decision tables for randomly generated Boolean functions. The collected results show that the decision rules derived from decision trees with hypotheses are in many cases better than the rules extracted from conventional decision trees.
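The dynamic programming idea for minimum depth can be sketched over subtables, each identified by the set of its row indices. Only conventional attribute queries are modeled; the hypothesis queries studied in the paper are omitted. The memoized recursion is exponential in the worst case and is meant only for small tables.

```python
from functools import lru_cache

def min_depth(table):
    """Minimum depth of a conventional decision tree for a decision table.

    table is a list of (attribute_tuple, decision) rows. Returns float("inf")
    if no tree exists (identical attribute tuples with different decisions).
    """
    rows = tuple(table)
    n_attrs = len(rows[0][0])

    @lru_cache(maxsize=None)
    def solve(subset):
        if len({rows[i][1] for i in subset}) <= 1:
            return 0  # subtable is already decided: a leaf suffices
        best = float("inf")
        for a in range(n_attrs):
            parts = {}
            for i in subset:
                parts.setdefault(rows[i][0][a], []).append(i)
            if len(parts) < 2:
                continue  # this attribute does not split the subtable
            depth = 1 + max(solve(frozenset(p)) for p in parts.values())
            best = min(best, depth)
        return best

    return solve(frozenset(range(len(rows))))
```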


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号