A total of 20 similar documents were retrieved (search time: 0 ms)
1.
Alexander N. Gorban, Bogdan Grechuk, Evgeny M. Mirkes, Sergey V. Stasenko, Ivan Y. Tyukin 《Entropy (Basel, Switzerland)》2021,23(8)
This work is driven by a practical question: the correction of Artificial Intelligence (AI) errors. These corrections should be quick and non-iterative. To solve this problem without modifying a legacy AI system, we propose special ‘external’ devices, correctors. Elementary correctors consist of two parts: a classifier that separates situations with a high risk of error from situations in which the legacy AI system works well, and a new decision that should be recommended for situations with potential errors. Input signals for the correctors can be the inputs of the legacy AI system, its internal signals, and its outputs. If the intrinsic dimensionality of the data is high enough, then the classifiers for correcting a small number of errors can be very simple. According to the blessing-of-dimensionality effects, even simple and robust Fisher’s discriminants can be used for one-shot learning of AI correctors. Stochastic separation theorems provide the mathematical basis for this one-shot learning. However, as the number of correctors needed grows, the cluster structure of the data becomes important and a new family of stochastic separation theorems is required. We reject the classical hypothesis of the regularity of the data distribution and assume that the data can have a rich fine-grained structure with many clusters and corresponding peaks in the probability density. New stochastic separation theorems for data with fine-grained structure are formulated and proved. On the basis of these theorems, multi-correctors for granular data are proposed. The advantages of the multi-corrector technology are demonstrated by examples of correcting errors and learning new classes of objects with a deep convolutional neural network on the CIFAR-10 dataset. The key problems of non-classical high-dimensional data analysis are reviewed together with the basic preprocessing steps, including the correlation transformation, supervised Principal Component Analysis (PCA), semi-supervised PCA, transfer component analysis, and a new domain adaptation PCA.
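An elementary corrector, as described above, is essentially a linear separating functional plus a decision threshold. As a hedged illustration of that idea only (toy data and function names are placeholders, not the authors' implementation), a regularized Fisher-discriminant corrector could be sketched as:

```python
import numpy as np

def fisher_corrector(X_ok, X_err, reg=1e-3):
    """Fit a Fisher linear discriminant separating 'error' situations from
    situations the legacy system handles well; a one-shot corrector needs
    only a few rows in X_err."""
    mu_ok, mu_err = X_ok.mean(axis=0), X_err.mean(axis=0)
    # Pooled within-class covariance, regularized for numerical stability.
    Sw = np.cov(X_ok, rowvar=False) + np.cov(X_err, rowvar=False)
    Sw += reg * np.eye(Sw.shape[0])
    w = np.linalg.solve(Sw, mu_err - mu_ok)   # discriminant direction
    b = -0.5 * w @ (mu_ok + mu_err)           # threshold at the class midpoint
    return w, b

def flag_for_correction(x, w, b):
    """Return True if the corrector should override the legacy decision."""
    return float(w @ x + b) > 0.0

# Toy usage: 200-dimensional features, 500 'good' situations, 3 known errors.
rng = np.random.default_rng(0)
X_ok = rng.normal(size=(500, 200))
X_err = rng.normal(loc=0.4, size=(3, 200))
w, b = fisher_corrector(X_ok, X_err)
print(flag_for_correction(X_err[0], w, b))
```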
2.
Gavriel Segre 《International Journal of Theoretical Physics》2004,43(6):1397-1408
Expanding on Remark 5.2.7 of Segre (Segre, G. (2002). Algorithmic Information Theoretic Issues in Quantum Mechanics, PhD Thesis, Dipartimento di Fisica Nucleare e Teorica, Pavia, Italy. quant-ph/0110018), noncommutative Bayesian statistical inference from one wedge of a bifurcate Killing horizon is analyzed, with attention to its interrelation with the Unruh effect.
3.
Through mathematical modeling and analysis of Chua's memristive circuit, the problem of dimensionality reduction in the dynamical modeling of memristive circuits is raised. Taking a memristive circuit containing two flux-controlled memristors as an example, dimension-reduced modeling of the circuit is carried out, yielding a three-dimensional system model. Based on this model, the equilibrium points and stability of the memristive circuit are analyzed, and the dynamical behavior of the circuit under varying circuit parameters is studied. Furthermore, the analysis results of the conventional model of the memristive circuit with two flux-controlled memristors are compared with those of its dimension-reduced model. The results show that the dimension of the reduced model depends only on the numbers of capacitors and inductors and is independent of the number of memristors; that nonlinear phenomena such as coexisting bifurcation modes appear when the circuit parameters vary; and that dimension-reduced modeling lowers the complexity of system modeling and facilitates analysis of the system dynamics, but removes the influence of the initial conditions of the memristors' internal state variables on the circuit's dynamical behavior.
4.
5.
Vladimir Majerník 《Central European Journal of Physics》2008,6(2):363-371
In this article we use a new entropic function, derived from an f-divergence between two probability distributions, for the construction of an alternative entropic uncertainty relation. After a brief review of some existing f-divergences, a new f-divergence and the corresponding entropic function derived from it are introduced, and their useful characteristics are presented. This entropic function is then applied to construct an alternative uncertainty relation for two non-commuting observables in quantum physics. An explicit expression for such an uncertainty relation is found for the case of two observables which are the x- and z-components of the angular momentum of the spin-1/2 system.
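For orientation only, the standard Shannon-entropy form of an uncertainty relation for the same pair of spin-1/2 observables can be checked numerically. This hedged sketch uses the familiar Maassen–Uffink bound, not the f-divergence-based entropic function introduced in the paper, and all names are placeholders:

```python
import numpy as np

def spin_half_probs(theta, phi):
    """Measurement probabilities for the x- and z-components of spin-1/2
    in the pure state |psi> = cos(theta/2)|0> + e^{i*phi} sin(theta/2)|1>."""
    psi = np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])
    z_basis = np.eye(2)                              # eigenvectors of sigma_z
    x_basis = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # eigenvectors of sigma_x
    pz = np.abs(z_basis.conj().T @ psi) ** 2
    px = np.abs(x_basis.conj().T @ psi) ** 2
    return px, pz

def shannon(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

px, pz = spin_half_probs(theta=0.3, phi=1.1)
# Maassen-Uffink bound for these complementary observables: H_x + H_z >= ln 2.
print(shannon(px) + shannon(pz), ">=", np.log(2))
```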
6.
Wyner’s common information is a measure that quantifies and assesses the commonality between two random variables. Based on this, we introduce a novel two-step procedure to construct features from data, referred to as Common Information Components Analysis (CICA). The first step can be interpreted as an extraction of Wyner’s common information. The second step is a form of back-projection of the common information onto the original variables, leading to the extracted features. A free parameter controls the complexity of the extracted features. We establish that, in the case of Gaussian statistics, CICA precisely reduces to Canonical Correlation Analysis (CCA), where the parameter determines the number of CCA components that are extracted. In this sense, we establish a novel rigorous connection between information measures and CCA, and CICA is a strict generalization of the latter. It is shown that CICA has several desirable features, including a natural extension beyond just two data sets.
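The Gaussian special case mentioned in the abstract is ordinary CCA, with the free parameter playing the role of the number of components. A minimal scikit-learn sketch of that special case (toy two-view data; this is not an implementation of CICA itself):

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
# Two views sharing a 2-dimensional latent "common" signal (toy data).
z = rng.normal(size=(300, 2))
X = z @ rng.normal(size=(2, 10)) + 0.3 * rng.normal(size=(300, 10))
Y = z @ rng.normal(size=(2, 8)) + 0.3 * rng.normal(size=(300, 8))

# n_components stands in for the complexity parameter in the Gaussian case.
cca = CCA(n_components=2)
Xc, Yc = cca.fit_transform(X, Y)

# Correlation of the paired canonical variates.
for k in range(2):
    print(k, np.corrcoef(Xc[:, k], Yc[:, k])[0, 1])
```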
7.
8.
Multi-modal fusion can achieve better predictions through the amalgamation of information from different modalities. To improve prediction accuracy, a method based on Higher-order Orthogonal Iteration Decomposition and Projection (HOIDP) is proposed. In the fusion process, a higher-order orthogonal iteration decomposition algorithm and factor-matrix projection are used to remove redundant information duplicated across modalities and to produce fewer parameters with minimal information loss. The performance of the proposed method is verified on three different multi-modal datasets. The numerical results validate its accuracy, showing a 0.4% to 4% improvement in sentiment analysis, a 0.3% to 8% improvement in personality trait recognition, and a 0.2% to 25% improvement in emotion recognition on the three multi-modal datasets compared with five other methods.
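The decomposition at the core of the method is the higher-order orthogonal iteration (HOOI) for a Tucker model. Purely as a hedged sketch of that generic decomposition step (plain NumPy, a toy tensor, hypothetical names; not the paper's HOIDP fusion pipeline):

```python
import numpy as np

def mode_dot(T, M, mode):
    """Mode-n product: multiply matrix M (J x I_n) along the given mode."""
    Tm = np.moveaxis(T, mode, 0)
    out = (M @ Tm.reshape(Tm.shape[0], -1)).reshape((M.shape[0],) + Tm.shape[1:])
    return np.moveaxis(out, 0, mode)

def hooi(T, ranks, n_iter=20):
    """Higher-order orthogonal iteration for a Tucker decomposition."""
    # Initialise factors with truncated SVDs of the unfoldings (HOSVD).
    U = [np.linalg.svd(np.moveaxis(T, n, 0).reshape(T.shape[n], -1),
                       full_matrices=False)[0][:, :r]
         for n, r in enumerate(ranks)]
    for _ in range(n_iter):
        for n in range(T.ndim):
            Y = T
            for m in range(T.ndim):
                if m != n:
                    Y = mode_dot(Y, U[m].T, m)   # project onto the other factors
            Yn = np.moveaxis(Y, n, 0).reshape(T.shape[n], -1)
            U[n] = np.linalg.svd(Yn, full_matrices=False)[0][:, :ranks[n]]
    core = T
    for m in range(T.ndim):
        core = mode_dot(core, U[m].T, m)
    return core, U

# Toy usage: compress a random 12x10x8 tensor to a (4, 3, 2) core.
rng = np.random.default_rng(0)
T = rng.normal(size=(12, 10, 8))
core, U = hooi(T, ranks=(4, 3, 2))
approx = core
for m in range(3):
    approx = mode_dot(approx, U[m], m)
print("relative error:", np.linalg.norm(T - approx) / np.linalg.norm(T))
```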
9.
Roberto Trotta 《Contemporary Physics》2013,54(2):71-104
The application of Bayesian methods in cosmology and astrophysics has flourished over the past decade, spurred by data sets of increasing size and complexity. In many respects, Bayesian methods have proven to be vastly superior to more traditional statistical tools, offering the advantage of higher efficiency and of a consistent conceptual basis for dealing with the problem of induction in the presence of uncertainty. This trend is likely to continue in the future, when the way we collect, manipulate and analyse observations and compare them with theoretical models will assume an even more central role in cosmology. This review is an introduction to Bayesian methods in cosmology and astrophysics and recent results in the field. I first present Bayesian probability theory and its conceptual underpinnings, Bayes' Theorem and the role of priors. I discuss the problem of parameter inference and its general solution, along with numerical techniques such as Markov Chain Monte Carlo methods. I then review the theory and application of Bayesian model comparison, discussing the notions of Bayesian evidence and effective model complexity, and how to compute and interpret those quantities. Recent developments in cosmological parameter extraction and Bayesian cosmological model building are summarised, highlighting the challenges that lie ahead.
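As a pocket-sized reminder of the parameter-inference machinery the review covers (prior, likelihood, posterior, MCMC), here is a hedged random-walk Metropolis sketch for a one-parameter toy model; it is illustrative only and not drawn from the review:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=0.7, scale=1.0, size=50)    # toy observations

def log_prior(mu):                                 # flat prior on [-5, 5]
    return 0.0 if -5.0 < mu < 5.0 else -np.inf

def log_likelihood(mu):                            # Gaussian model, known sigma=1
    return -0.5 * np.sum((data - mu) ** 2)

def log_posterior(mu):
    return log_prior(mu) + log_likelihood(mu)

# Random-walk Metropolis sampler.
chain, mu = [], 0.0
for _ in range(20000):
    prop = mu + 0.3 * rng.normal()
    if np.log(rng.uniform()) < log_posterior(prop) - log_posterior(mu):
        mu = prop
    chain.append(mu)

samples = np.array(chain[5000:])                   # discard burn-in
print("posterior mean:", samples.mean(), "std:", samples.std())
```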
10.
The paper considers a time-efficient implementation of the k nearest neighbours (kNN) algorithm. A well-known approach for accelerating the kNN algorithm is to utilise dimensionality reduction methods based on space-filling curves. In this paper, we take this approach further and propose an algorithm that employs multiple space-filling curves and is faster (with comparable quality) than the kNN algorithm that uses kd-trees to determine the nearest neighbours. A specific method for constructing multiple Peano curves is outlined, and statements are given about the preservation of object proximity information in the course of dimensionality reduction. An experimental comparison with known kNN implementations using kd-trees was performed on test and real-life data.
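The underlying idea — map each point to a position on a space-filling curve, sort once, then scan only a window around the query — can be sketched as follows. The paper constructs multiple Peano curves; this toy uses a single Morton (Z-order) curve as a stand-in, and all names and parameters are placeholders:

```python
import numpy as np

def morton_key(point, bits=10):
    """Map a point in [0, 1]^d to a 1-D key by interleaving the bits of its
    quantised coordinates (Z-order curve; the paper builds Peano curves)."""
    coords = (point * (2 ** bits - 1)).astype(int)
    key = 0
    for b in range(bits):
        for d, c in enumerate(coords):
            key |= int((c >> b) & 1) << (b * len(coords) + d)
    return key

def knn_curve(query, points, keys, order, k=5, window=64):
    """Approximate kNN: scan only a window around the query's curve position."""
    pos = np.searchsorted(keys, morton_key(query))
    lo, hi = max(0, pos - window), min(len(points), pos + window)
    cand = order[lo:hi]
    dist = np.linalg.norm(points[cand] - query, axis=1)
    return cand[np.argsort(dist)[:k]]

rng = np.random.default_rng(0)
points = rng.uniform(size=(10000, 3))              # data rescaled to [0, 1]^3
all_keys = np.array([morton_key(p) for p in points])
order = np.argsort(all_keys)
keys = all_keys[order]

query = rng.uniform(size=3)
approx = knn_curve(query, points, keys, order)
exact = np.argsort(np.linalg.norm(points - query, axis=1))[:5]
print(sorted(approx), "vs", sorted(exact))
```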
11.
Manfred Opper 《Annalen der Physik》2019,531(3)
The statistical inference of the state variable and the drift function of stochastic differential equations (SDE) from sparsely sampled observations is discussed herein. A variational approach is used to approximate the distribution over the unknown path of the SDE conditioned on the observations. This approach also provides approximations for the intractable likelihood of the drift. The method is combined with a nonparametric Bayesian approach which is based on a Gaussian process prior over drift functions.
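For intuition only, the following hedged sketch simulates a one-dimensional SDE and recovers a crude drift estimate by Gaussian-process regression on the increments; it echoes the Gaussian process prior over drift functions but is not the variational smoothing method of the paper, and the drift, parameters, and data are toy placeholders:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
dt, n, sigma = 0.01, 4000, 0.5

def drift(x):                                      # toy double-well drift
    return 4.0 * x * (1.0 - x ** 2)

# Euler-Maruyama simulation of dX = f(X) dt + sigma dW.
x = np.zeros(n)
for t in range(n - 1):
    x[t + 1] = x[t] + drift(x[t]) * dt + sigma * np.sqrt(dt) * rng.normal()

# Crude drift estimate: GP regression of the increments/dt on the state,
# echoing the Gaussian process prior over drift functions.
X, y = x[:-1].reshape(-1, 1), np.diff(x) / dt
gp = GaussianProcessRegressor(kernel=RBF(0.5) + WhiteKernel(), normalize_y=True)
gp.fit(X[::10], y[::10])                           # subsample for speed

grid = np.linspace(-1.5, 1.5, 7).reshape(-1, 1)
print(np.c_[grid, gp.predict(grid).reshape(-1, 1), drift(grid)])
```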
12.
Near-infrared (NIR) spectroscopy is one of the most popular food-testing methods. Analysis of such high-dimensional spectral data usually requires dimensionality reduction algorithms to extract features, yet the vast majority of algorithms can only analyze a single dataset. Although contrastive principal component analysis, based on contrastive learning, has been successfully applied to NIR detection of pesticide residues on the surfaces of different fruits, that method can only combine the original features linearly, so its feature extraction is limited, and it requires tuning a contrast parameter to control the influence of the background set, which incurs a larger time cost. cVAE (contrastive variational autoencoder) is an improved algorithm based on contrastive learning and the variational autoencoder; it has been used in image denoising and RNA sequence analysis. It retains the ability to analyze multiple datasets and, because it incorporates a neural-network probabilistic generative model, it can extract nonlinear latent features. The cVAE algorithm was applied to NIR spectral analysis, and an accurate dimensionality reduction model for NIR spectral data was established. In practical validation, the cVAE algorithm was used to detect melamine adulteration in purchased pure milk of different brands and batches. The results show that the VAE algorithm can only distinguish different brands and batches of pure milk, while the key information of whether melamine has been added cannot be revealed; when cVAE is used for the analysis, the samples with and without melamine adulteration can be clearly separated, because the added background dataset separates out the irrelevant variables. This shows that cV...
13.
Spectral classification based on generalized discriminant analysis
A method based on generalized discriminant analysis (GDA) is proposed for classifying the spectra of stars, galaxies, and quasars. GDA combines the kernel trick with Fisher discriminant analysis: the sample set is mapped into a high-dimensional feature space F through a nonlinear mapping, and linear discriminant analysis is then carried out in F. Experiments compared the spectral classification performance of the LDA, GDA, PCA, and KPCA algorithms on stars, galaxies, and quasars. The results show that the GDA-based algorithm achieves the highest classification accuracy for these three types of spectra, followed by LDA; although KPCA is also a kernel-based method, its performance is poor when only a few principal components are selected, even falling below LDA; PCA-based classification performs worst.
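To make the compared baselines concrete, a hedged scikit-learn sketch of a four-way comparison is given below. GDA (kernel Fisher discriminant) has no direct scikit-learn implementation, so a Nystroem kernel feature map followed by LDA stands in for it, and the synthetic data merely plays the role of the stellar spectra; all parameters are placeholders:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.kernel_approximation import Nystroem
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Toy 3-class "spectra" standing in for Star/Galaxy/Quasar samples.
X, y = make_classification(n_samples=600, n_features=100, n_informative=10,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

pipelines = {
    "LDA": make_pipeline(LinearDiscriminantAnalysis(), KNeighborsClassifier()),
    "GDA (kernel map + LDA)": make_pipeline(
        Nystroem(gamma=0.01, n_components=200),
        LinearDiscriminantAnalysis(), KNeighborsClassifier()),
    "PCA": make_pipeline(PCA(n_components=3), KNeighborsClassifier()),
    "KPCA": make_pipeline(KernelPCA(n_components=3, kernel="rbf", gamma=0.01),
                          KNeighborsClassifier()),
}
for name, pipe in pipelines.items():
    print(name, cross_val_score(pipe, X, y, cv=5).mean())
```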
14.
Near-infrared (NIR) spectral analysis offers the advantages of convenient testing, non-destructive measurement, and rapid response, but many complicating factors in band distribution and structural analysis make it difficult to extract characteristic spectral information. At present, although a variety of spectral dimensionality reduction methods are widely used, these traditional methods share one limitation: the reduction is performed on a single dataset only. When several key factors in a dataset interfere with one another, the results of dimensionality reduction and classification are often unsatisfactory and the desired information cannot be obtained. This problem leads to very poor dimensionality reduction models for NIR spectra, which cannot correctly predict and classify the samples. Contrastive principal component analysis (cPCA) is an improved algorithm based on principal component analysis (PCA); it originates from contrastive learning and has been applied to genomic data analysis. The advantage of cPCA is that it generalizes dimensionality reduction on a single dataset to reduction between two related datasets, so that the key information in the dataset can be obtained. The cPCA algorithm was applied to NIR spectral processing, and an accurate dimensionality reduction model for NIR spectral data was established. In experimental validation, cPCA was used to analyze pesticide residues on the surfaces of different types of fruit (apples and pears). The results show that, when analyzing pesticide residues on different fruit types, dimensionality reduction with PCA can only distinguish the fruit types, while the key feature of whether pesticide has been sprayed on the fruit surface cannot be recovered; when cPCA is used, the constraint imposed by the background spectra allows samples with and without sprayed pesticide to be clearly separated. This shows that cPCA has a clear advantage in NIR spectral dimensionality reduction: it resolves the problems of dataset limitation and feature extraction in NIR dimensionality reduction models, so that an accurate NIR spectral dimensionality reduction model can be established.
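The cPCA computation itself is compact: it takes the leading eigenvectors of the target covariance minus α times the background covariance. A hedged NumPy sketch with toy data standing in for the fruit spectra (the contrast parameter α and all names are placeholders):

```python
import numpy as np

def cpca(target, background, alpha=1.0, n_components=2):
    """Contrastive PCA: directions that maximise target variance while
    penalising background variance (eigenvectors of C_t - alpha * C_b)."""
    t = target - target.mean(axis=0)
    b = background - background.mean(axis=0)
    C_t = t.T @ t / (len(t) - 1)
    C_b = b.T @ b / (len(b) - 1)
    vals, vecs = np.linalg.eigh(C_t - alpha * C_b)
    order = np.argsort(vals)[::-1][:n_components]   # largest eigenvalues first
    return vecs[:, order]

# Toy "spectra": background = samples with only nuisance variation,
# target = mixed samples whose interesting variation is weak under plain PCA.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 300))
background = base[:100]
target = base[100:] + np.outer(np.repeat([0, 1], 50), 0.3 * rng.normal(size=300))

W = cpca(target, background, alpha=2.0)
scores = (target - target.mean(axis=0)) @ W
print(scores[:3])
```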
15.
Time-series generated by complex systems (CS) are often characterized by phenomena such as chaoticity, fractality and memory effects, which pose difficulties in their analysis. The paper explores the dynamics of multidimensional data generated by a CS. The Dow Jones Industrial Average (DJIA) index is selected as a test-bed. The DJIA time-series is normalized and segmented into several time window vectors. These vectors are treated as objects that characterize the DJIA dynamical behavior. The objects are then compared by means of different distances to generate proper inputs to dimensionality reduction and information visualization algorithms. These computational techniques produce meaningful representations of the original dataset according to the (dis)similarities between the objects. The time is displayed as a parametric variable and the non-locality can be visualized by the corresponding evolution of points and the formation of clusters. The generated portraits reveal a complex nature, which is further analyzed in terms of the emerging patterns. The results show that the adoption of dimensionality reduction and visualization tools for processing complex data is a key modeling option with the current computational resources.
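The processing chain described above (normalise, segment into window vectors, compute pairwise distances, embed for visualisation) can be sketched in a few lines; this is a hedged illustration with a synthetic series standing in for the DJIA, not the paper's exact pipeline:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

# Toy stand-in for the normalised DJIA series (the real index is not bundled here).
rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=3000))
series = (series - series.mean()) / series.std()

# Segment into non-overlapping time-window vectors (the "objects").
w = 50
windows = series[: len(series) // w * w].reshape(-1, w)

# Compare objects with a chosen distance, then embed with MDS for visualisation.
D = squareform(pdist(windows, metric="euclidean"))
emb = MDS(n_components=2, dissimilarity="precomputed",
          random_state=0).fit_transform(D)

# Time enters as a parametric variable: the row index orders the embedded points.
print(emb[:5])
```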
16.
In professional soccer, the choices made in forming a team lineup are crucial for achieving good results. Players are characterized by different skills and their relevance depends on the position that they occupy on the pitch. Experts can recognize similarities between players and their styles, but the procedures adopted are often subjective and prone to misclassification. The automatic recognition of players’ styles based on their diversity of skills can help coaches and technical directors to prepare a team for a competition, to substitute injured players during a season, or to hire players to fill gaps created by teammates that leave. The paper adopts dimensionality reduction, clustering and computer visualization tools to compare soccer players based on a set of attributes. The players are characterized by numerical vectors embedding their particular skills and these objects are then compared by means of suitable distances. The intermediate data is processed to generate meaningful representations of the original dataset according to the (dis)similarities between the objects. The results show that the adoption of dimensionality reduction, clustering and visualization tools for processing complex datasets is a key modeling option with current computational resources.
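In the same spirit, the player-comparison step (skill vectors, a distance, then clustering into styles) might be sketched as follows; the attributes and cluster count are hypothetical placeholders rather than the paper's dataset:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical skill vectors: one row per player, one column per attribute
# (e.g. passing, tackling, finishing, ...); the values are placeholders.
rng = np.random.default_rng(0)
n_players, n_skills = 120, 15
players = rng.uniform(0, 100, size=(n_players, n_skills))

# Compare players with a suitable distance and cluster the linkage tree.
D = pdist(players, metric="euclidean")
Z = linkage(D, method="ward")
labels = fcluster(Z, t=6, criterion="maxclust")

for c in np.unique(labels):
    print(f"style cluster {c}: {np.sum(labels == c)} players")
```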
17.
Hyperspectral data combine imagery and spectra in a single dataset and are large in volume, so dimensionality reduction is a major research direction. Band selection and feature extraction are currently the main approaches to hyperspectral dimensionality reduction; this work experiments with and discusses methods for extracting lithological features from hyperspectral imagery. Based on the self-similarity of hyperspectral images, the application of a fractal signal algorithm to lithological feature extraction from CASI hyperspectral data is explored. Taking CASI hyperspectral image data as the study object, a modified carpet-covering method is used to compute the fractal signal value of every pixel in the image. The experimental results show that, compared with other classification algorithms, the fractal signal algorithm enhances the image features of the hyperspectral data and describes the distinguishability of different spectra in finer detail from another perspective. To a certain extent, the fractal signal image better highlights lithological features in areas of exposed bedrock, and can thus achieve the goal of extracting surface lithological features from the imagery. The morphological characteristics of the original spectral curves, the choice of the initial scale, and the iteration step size all affect the fractal signal and the fractal characteristic scale. At present, research on the fractal signal characteristics of spectral curves is still limited, and their physical meaning and quantitative analysis require further in-depth study.
18.
Johann Summhammer 《Foundations of Physics Letters》1988,1(2):113-137
In quantum physics all experimental information is discrete and stochastic. But the values of physical quantities are considered to depict definite properties of the physical world. Thus physical quantities should be identified with mathematical variables which are derived from the experimental data, but which exhibit as little randomness as possible. We look for such variables in two examples by investigating how it is possible to arrive at a value of a physical quantity from intrinsically stochastic data. With the aid of standard probability calculus and elementary information theory, we are necessarily led to the quantum theoretical phases and state vectors as the first candidates for physical quantities.
19.
20.
Ramie variety identification based on hyperspectral parameters and stepwise discriminant analysis
To explore hyperspectral methods for identifying and classifying ramie varieties, a total of 927 leaf hyperspectral measurements were collected from four ramie varieties of different genotypes grown under field conditions. Based on the hyperspectral reflectance curves of ramie leaves, two groups of characteristic parameters were selected: parameters based on the peak/valley reflectances and positions of the hyperspectral waveform (group V1), and parameters based on skewness and kurtosis (group V2). Using stepwise discriminant analysis, different numbers of variables were screened by setting different F values, several Fisher linear discriminant functions were built from the two groups of parameters, and the resulting discriminant functions were analyzed and compared in terms of computational cost, accuracy, and stability. Conclusions: (1) the overall average accuracy of the discriminant functions across all combinations was 91.1%, with an overall mean standard deviation of 1.2%; (2) on balance, among all combinations, group V2 with 8 ≤ n ≤ 14 variables gave the best discrimination: moderate computational cost, with accuracy and stability both above average; within this range, the Fisher discriminant function with 13 variables achieved the highest average accuracy of 94.2% and the lowest standard deviation of 0%; (3) if accuracy is the priority, group V1 with 15 ≤ n ≤ 22 variables gave the highest accuracy, with a maximum average accuracy of 95.5%, but the computational cost was relatively high and the stability moderate, with the lowest standard deviation being 0.9%. The study shows that identifying ramie varieties using hyperspectral parameters combined with stepwise discriminant analysis is feasible.
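As a rough illustration of the modelling pattern (Fisher discriminant functions built on variable subsets of different sizes), the following hedged scikit-learn sketch uses synthetic data and cross-validated forward selection in place of the paper's F-value-based stepwise screening; all parameters are placeholders:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score

# Toy stand-in for the leaf hyperspectral parameters (4 varieties, 927 leaves).
X, y = make_classification(n_samples=927, n_features=25, n_informative=12,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

lda = LinearDiscriminantAnalysis()
for n_vars in (8, 13, 22):
    # The paper screens variables by F-to-enter thresholds; cross-validated
    # forward selection is used here as an accessible stand-in.
    sfs = SequentialFeatureSelector(lda, n_features_to_select=n_vars,
                                    direction="forward", cv=5)
    X_sel = sfs.fit_transform(X, y)
    acc = cross_val_score(lda, X_sel, y, cv=5).mean()
    print(f"{n_vars} variables: CV accuracy = {acc:.3f}")
```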