Similar documents
20 similar documents found; search time: 218 ms
1.
All features of any data type are universally equipped with a categorical nature, revealed through histograms. A contingency table framed by two histograms affords directional and mutual associations, based on rescaled conditional Shannon entropies, for any feature pair. The heatmap of the mutual association matrix of all features becomes a roadmap showing which features are highly associated with which others. We develop our data analysis paradigm, called categorical exploratory data analysis (CEDA), with this heatmap as a foundation. CEDA is demonstrated to provide new resolutions for two topics: multiclass classification (MCC) with a single categorical response variable, and response manifold analytics (RMA) with multiple response variables. In both topics we compute visible and explainable information contents with multiscale, heterogeneous, deterministic and stochastic structures. MCC involves all feature-group-specific mixing geometries of labeled high-dimensional point clouds. For each identified feature group, we devise an indirect distance measure, a robust label embedding tree (LET), and a series of tree-based binary competitions to discover and present asymmetric mixing geometries. A chain of complementary feature groups then offers a collection of mixing-geometry pattern categories with multiple perspective views. RMA studies a system's regulating principles via multidimensional manifolds jointly constituted by the targeted response features and selected major covariate features. This manifold is marked with categorical localities reflecting major effects. Diverse minor effects are checked and identified across all localities for heterogeneity. Both MCC and RMA compute the data's information content, with predictive inferences as by-products. We illustrate CEDA developments via the Iris data and demonstrate its applications on data taken from the PITCHf/x database.
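
A minimal sketch of a rescaled conditional-entropy association computed from a contingency table. It assumes the rescaling 1 − H(Y|X)/H(Y) (Theil's uncertainty coefficient); CEDA's exact rescaling, and its mutual (symmetric) version, may differ. Swapping the roles of the two features generally gives a different value, which is the directional asymmetry the abstract refers to.

```python
import math


def entropy(counts):
    """Shannon entropy (nats) of a list of nonnegative counts."""
    total = sum(counts)
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)


def directional_association(table):
    """Association of the column feature with the row feature of a contingency table.

    Computed as 1 - H(col | row) / H(col), i.e. Theil's uncertainty coefficient.
    This rescaling is an assumption; CEDA's exact rescaling may differ.
    """
    n = sum(sum(row) for row in table)
    h_col = entropy([sum(col) for col in zip(*table)])
    if h_col == 0:
        return 0.0  # constant column feature: nothing left to explain
    # H(col | row) = sum over rows of P(row) * H(col | that row)
    h_cond = sum(sum(row) / n * entropy(row) for row in table if sum(row) > 0)
    return 1.0 - h_cond / h_col


def transpose(table):
    """Swap the roles of the two features."""
    return [list(col) for col in zip(*table)]


# Rows: 3 categories of feature X; columns: 2 categories of feature Y.
t = [[5, 0], [5, 0], [0, 5]]
print(directional_association(t), directional_association(transpose(t)))
```

Here X fully determines Y (value 1.0), but Y only partly determines X (roughly 0.58), so the two directions disagree.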

2.
We present computer simulation and theoretical results for a system of $N$ quantum hard sphere (QHS) particles of diameter $\sigma$ and mass $m$ at temperature $T$, confined between parallel hard walls separated by a distance $H\sigma$, with $H \ge 1$. Semiclassical Monte Carlo simulations adapted to a confined space were performed, considering effects in terms of the particle density $\rho^* = N/V$, where $V$ is the accessible volume, the inverse length $H^{-1}$, and the de Broglie thermal wavelength $\lambda_B = h/\sqrt{2\pi m k T}$, where $k$ and $h$ are the Boltzmann and Planck constants, respectively. For the cases of extreme and maximum confinement, $0.5 < H^{-1} < 1$ and $H^{-1} = 1$, respectively, analytical results can be given based on an extension to quantum systems of the Helmholtz free energies of the corresponding classical systems.
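
The thermal wavelength sets the strength of the quantum effects. A small sketch evaluating $\lambda_B = h/\sqrt{2\pi m k T}$ with the exact SI constants; the helium-4 mass and the 4 K temperature are illustrative choices (not from the abstract), and forming the reduced $\lambda_B^* = \lambda_B/\sigma$ would additionally require a hard-sphere diameter $\sigma$.

```python
import math

H_PLANCK = 6.62607015e-34   # J s (exact in the 2019 SI)
K_BOLTZMANN = 1.380649e-23  # J/K (exact in the 2019 SI)


def thermal_de_broglie(mass_kg, temperature_k):
    """lambda_B = h / sqrt(2 pi m k T), in metres."""
    return H_PLANCK / math.sqrt(2 * math.pi * mass_kg * K_BOLTZMANN * temperature_k)


m_he4 = 6.6464731e-27  # kg, helium-4 atom (illustrative)
lam = thermal_de_broglie(m_he4, 4.0)
print(f"lambda_B = {lam * 1e9:.3f} nm")
```

This gives roughly 0.44 nm; because $\lambda_B \propto T^{-1/2}$, quadrupling the temperature halves it.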

3.
This article deals with the compression of binary sequences with a given number of ones, which can also be considered as lists of indexes of a given length. The first part of the article shows that the entropy $H$ of random $n$-element binary sequences with exactly $k$ elements equal to one satisfies the inequalities $k\log_2(0.48\cdot n/k) < H < k\log_2(2.72\cdot n/k)$. Based on this result, we propose a simple coding using fixed-length words. Its main application is the compression of random binary sequences with a large disproportion between the number of zeros and the number of ones. Importantly, the proposed solution allows much faster decompression than Golomb-Rice coding, with a relatively small decrease in compression efficiency. The proposed algorithm can be particularly useful for database applications in which the speed of decompression is much more important than the degree of index list compression.
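
Since all $\binom{n}{k}$ such sequences are equally likely, the entropy in question is $\log_2\binom{n}{k}$. The quoted bounds are implied by the standard estimates $(n/k)^k \le \binom{n}{k} \le (en/k)^k$, because $0.48 < 1$ and $e < 2.72$. A quick numerical check (the $(n, k)$ pairs are arbitrary; the fixed-length coding itself is not sketched here):

```python
import math


def exact_entropy_bits(n, k):
    """Entropy (bits) of a uniformly random n-bit sequence with exactly k ones."""
    return math.log2(math.comb(n, k))


def entropy_bounds_bits(n, k):
    """Bounds from the abstract: k*log2(0.48 n/k) < H < k*log2(2.72 n/k)."""
    return k * math.log2(0.48 * n / k), k * math.log2(2.72 * n / k)


for n, k in [(1000, 10), (10**6, 100), (64, 32)]:
    lo, hi = entropy_bounds_bits(n, k)
    h = exact_entropy_bits(n, k)
    print(f"n={n:>7} k={k:>3}  {lo:8.1f} < {h:8.1f} < {hi:8.1f}")
```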

4.
The $q$-exponential form $e_q^x \equiv [1+(1-q)x]^{1/(1-q)}$ ($e_1^x = e^x$) is obtained by optimizing the nonadditive entropy $S_q \equiv k\,\frac{1-\sum_i p_i^q}{q-1}$ (with $S_1 = S_{BG} \equiv -k\sum_i p_i \ln p_i$, where BG stands for Boltzmann–Gibbs) under simple constraints, and emerges in wide classes of natural, artificial and social complex systems. However, in experiments, observations and numerical calculations, it rarely appears in its pure mathematical form. It appears instead exhibiting crossovers to, or mixed with, other similar forms. We first discuss departures from $q$-exponentials within crossover statistics, by linearly combining them, or by linearly combining the corresponding $q$-entropies. Then, we discuss departures originating from double-index nonadditive entropies containing $S_q$ as a particular case.
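
A direct transcription of the two definitions, assuming the usual cutoff $e_q^x = 0$ when $1 + (1-q)x \le 0$ for $q < 1$ (the abstract does not state a convention). Both reduce to their Boltzmann–Gibbs counterparts as $q \to 1$.

```python
import math


def q_exp(x, q):
    """q-exponential e_q^x = [1 + (1 - q) x]^(1 / (1 - q)), with e_1^x = e^x."""
    if q == 1:
        return math.exp(x)
    base = 1 + (1 - q) * x
    if base <= 0:
        if q < 1:
            return 0.0  # assumed cutoff convention [z]_+ = max(z, 0)
        raise ValueError("e_q^x is not finite here for q > 1")
    return base ** (1 / (1 - q))


def tsallis_entropy(p, q, k=1.0):
    """S_q = k (1 - sum_i p_i^q) / (q - 1); S_1 = -k sum_i p_i ln p_i."""
    if q == 1:
        return -k * sum(pi * math.log(pi) for pi in p if pi > 0)
    return k * (1 - sum(pi ** q for pi in p if pi > 0)) / (q - 1)


p = [0.2, 0.3, 0.5]
print(q_exp(0.7, 1 + 1e-6), math.exp(0.7))           # q -> 1 limit
print(tsallis_entropy(p, 1 + 1e-7), tsallis_entropy(p, 1))
```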

5.
Based on orthogonal Latin cubes, an image cryptosystem with a confusion–diffusion–confusion cipher architecture has been proposed recently (Inf. Sci. 2019, 478, 1–14). However, we find four fatal vulnerabilities in this image cryptosystem that leave it open to cryptanalysis. In this paper, we propose a reference-validation inference algorithm and design screening-based rules to efficiently break the image cryptosystem. Compared with an existing cryptanalysis algorithm, the proposed method requires fewer pairs of chosen plain-cipher images and behaves stably, since its performance is not affected by the keys, the positions of the chosen bits, or the contents of the plain images. Experimental results show that our cryptanalysis algorithm requires only 8×H×W3+3 pairs of chosen plain-cipher images, where $H \times W$ is the image resolution. Comparative studies demonstrate the effectiveness and superiority of the proposed cryptanalysis algorithm.

6.
In order to study the spread of an epidemic over a region as a function of time, we introduce an entropy ratio $U$ describing the uniformity of infections over various states and their districts, and an entropy concentration coefficient $C = 1 - U$. The latter is a multiplicative version of the Kullback–Leibler distance, with values between 0 and 1. For product measures and self-similar phenomena, it does not depend on the measurement level. Hence, $C$ is an alternative to Gini's concentration coefficient for measures with variation on different levels. Simple examples concern population density and gross domestic product. An application to time-series patterns is indicated with a Markov chain. For the Covid-19 pandemic, entropy ratios indicate a homogeneous distribution of infections and the potential of local action when compared to measures for a whole region.
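
One normalization consistent with the level-independence claim is $U = H(p)/\log n$, the Shannon entropy of the infection shares over $n$ units divided by its maximum, so that $C = 1 - U = D_{KL}(p \,\|\, \text{uniform})/\log n$. This is an assumption: the abstract gives no formula, and the paper's definition (for instance how districts of unequal size are weighted, or in what sense $C$ is multiplicative) may differ. For a product measure, both $H$ and $\log n$ scale with the number of levels, so $U$ is unchanged, which the sketch checks.

```python
import math


def entropy_ratio(p):
    """Assumed U = H(p) / log(n): entropy of the shares p relative to its maximum."""
    n = len(p)
    h = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return h / math.log(n)


def concentration_coefficient(p):
    """C = 1 - U (equals KL(p || uniform) / log n under the assumed U)."""
    return 1 - entropy_ratio(p)


# Product measure: 2 regions, each split into 2 districts with the same 70/30 shares.
level1 = [0.7, 0.3]
level2 = [a * b for a in level1 for b in level1]
print(entropy_ratio(level1), entropy_ratio(level2))  # equal up to rounding
```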

7.
We present a finite-order system of recurrence relations for the permanent of circulant matrices containing a band of $k$ any-value diagonals on top of a uniform matrix (for $k = 1, 2$ and $3$), together with the method for deriving such recurrence relations, which is based on the permanents of matrices with defects. The proposed system of linear recurrence equations with variable coefficients provides a powerful tool for analyzing circulant permanents, computing them quickly in linear time, and finding their asymptotics in the large-matrix-size limit. The latter is a fundamental open problem. Its solution would be tremendously important for a unified analysis of a wide range of #P-hard problems in nature, including problems in the physics of many-body systems, critical phenomena, quantum computing, quantum field theory, the theory of chaos, fractals, graph theory, number theory, combinatorics, cryptography, etc.
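
To check values produced by such recurrences on small matrices, a brute-force reference helps. The sketch uses Ryser's inclusion–exclusion formula (exponential time, so small $n$ only) and a hypothetical helper, `banded_circulant`, that places band values on the first diagonals of a circulant matrix over a uniform background; the paper's exact band convention may differ. For instance, the 3×3 matrix with $x$ on the diagonal and 1 elsewhere has permanent $x^3 + 3x + 2$, i.e. 16 at $x = 2$.

```python
from itertools import combinations


def permanent(a):
    """Permanent via Ryser's formula: (-1)^n * sum_S (-1)^|S| prod_i sum_{j in S} a[i][j]."""
    n = len(a)
    total = 0
    for size in range(1, n + 1):
        for cols in combinations(range(n), size):
            prod = 1
            for row in a:
                prod *= sum(row[j] for j in cols)
            total += (-1) ** size * prod
    return (-1) ** n * total


def banded_circulant(n, band, base=1):
    """Hypothetical helper: circulant matrix equal to `base` everywhere except the first
    len(band) cyclic diagonals (main diagonal first), which take the band values."""
    first_row = [base] * n
    for d, value in enumerate(band):
        first_row[d % n] = value
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]


print(permanent(banded_circulant(3, [2])))  # x^3 + 3x + 2 at x = 2
print(permanent(banded_circulant(4, [0])))  # zero diagonal: number of derangements of 4
```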

8.
Path integral Monte Carlo and closure computations are used to study real-space triplet correlations in the quantum hard-sphere system. The conditions range from the normal fluid phase to the face-centered cubic (FCC) and cI16 solid phases (de Broglie wavelengths $0.2 \le \lambda_B^* < 2$, densities $0.1 \le \rho_N^* \le 0.925$). The focus is on the equilateral and isosceles features of the path-integral centroid and instantaneous structures. Complementary calculations of the associated pair structures are also carried out to strengthen structural identifications and facilitate closure evaluations. The three closures employed are the Kirkwood superposition, the Jackson–Feenberg convolution, and their average (AV3). A large body of new data is reported, and conclusions are drawn regarding (i) the remarkable performance of AV3 for the centroid and instantaneous correlations, (ii) the correspondences between the fluid and FCC salient features on the coexistence line, and (iii) the most conspicuous differences between FCC and cI16 at the pair and triplet levels at moderately high densities ($\rho_N^* = 0.9$, $0.925$). This research is expected to provide low-temperature insights useful for future studies of properties of real systems (e.g., helium, alkali metals, and general colloidal systems).
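
The Kirkwood superposition closure approximates the triplet function by a product of pair functions, $g^{(3)}(r_{12}, r_{13}, r_{23}) \approx g(r_{12})\,g(r_{13})\,g(r_{23})$. A minimal sketch, with a toy low-density hard-sphere $g(r)$ standing in for the simulated pair structures (an assumption); the Jackson–Feenberg convolution and AV3 are not reproduced here.

```python
def kirkwood_g3(g, r12, r13, r23):
    """Kirkwood superposition approximation: g3 ~ g(r12) * g(r13) * g(r23)."""
    return g(r12) * g(r13) * g(r23)


def g_hs_dilute(r, sigma=1.0):
    """Toy pair function: dilute hard spheres (0 inside the core, 1 outside)."""
    return 0.0 if r < sigma else 1.0


# Equilateral triplets of side s (in units of sigma)
for s in (0.9, 1.0, 1.5):
    print(s, kirkwood_g3(g_hs_dilute, s, s, s))
```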

9.
We develop a simple Quantile Spacing (QS) method for accurate probabilistic estimation of one-dimensional entropy from equiprobable random samples, and compare it with the popular Bin-Counting (BC) and Kernel Density (KD) methods. In contrast to BC, which uses equal-width bins with varying probability mass, the QS method uses estimates of the quantiles that divide the support of the data-generating probability density function (pdf) into equal-probability-mass intervals. Whereas BC and KD each require optimal tuning of a hyper-parameter whose value varies with sample size and the shape of the pdf, QS only requires specifying the number of quantiles to be used. Results indicate, for the class of distributions tested, that the optimal number of quantiles is a fixed fraction of the sample size (empirically determined to be ~0.25–0.35), and that this value is relatively insensitive to distributional form or sample size. This provides a clear advantage over BC and KD, since hyper-parameter tuning is not required. Further, unlike KD, there is no need to select an appropriate kernel type, so QS is applicable to pdfs of arbitrary shape, including those with discontinuous slope and/or magnitude. Bootstrapping is used to approximate the sampling variability distribution of the resulting entropy estimate, and is shown to accurately reflect the true uncertainty. For the four distributional forms studied (Gaussian, Log-Normal, Exponential and Bimodal Gaussian Mixture), the expected estimation bias is less than 1% and uncertainty is low even for samples of as few as 100 data points; in contrast, for KD the small-sample bias can be as large as 10%, and for BC as large as 50%. We speculate that estimating quantile locations, rather than bin probabilities, makes more efficient use of the information in the data to approximate the underlying shape of an unknown data-generating pdf.
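
A minimal, stdlib-only sketch of the core quantile-spacing idea, assuming its simplest variant: sample quantiles split the data into $m$ equal-mass intervals, the density is taken as constant on each, and $H \approx \frac{1}{m}\sum_i \log(m\,w_i)$ for interval widths $w_i$. It omits the paper's refinements and bootstrap. With only a few points per interval this naive version is biased low, so the example uses about 20 points per interval rather than the 0.25–0.35 fraction reported for the full method.

```python
import math
import random
import statistics


def qs_entropy(samples, n_intervals):
    """Naive quantile-spacing estimate of differential entropy (nats).

    Each interval carries mass 1/m, so its density is (1/m) / width and
    H = -sum (1/m) * log((1/m) / width) = mean(log(m * width)).
    """
    xs = sorted(samples)
    cuts = statistics.quantiles(xs, n=n_intervals, method="inclusive")
    edges = [xs[0], *cuts, xs[-1]]
    widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
    if min(widths) <= 0:
        raise ValueError("zero-width interval (ties?); use fewer intervals")
    return sum(math.log(n_intervals * w) for w in widths) / n_intervals


random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(20_000)]
true_h = 0.5 * math.log(2 * math.pi * math.e)  # entropy of N(0, 1)
print(qs_entropy(data, 1_000), true_h)
```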

10.
11.
In this paper, we study the entropy functions on extreme rays of the polymatroidal region that contain a matroid, i.e., matroidal entropy functions. We introduce variable strength orthogonal arrays indexed by a connected matroid $M$ and a positive integer $v$, which can be regarded as expanding the classic combinatorial structure of orthogonal arrays. Interestingly, they are equivalent to the partition-representations of the matroid $M$ with degree $v$ and to the $(M, v)$ almost affine codes. Thus, a synergy among four fields (information theory, matroid theory, combinatorial design, and coding theory) is developed, which may lead to potential applications in information problems such as network coding and secret sharing. Leveraging the construction of variable strength orthogonal arrays, we characterize all matroidal entropy functions of order $n \le 5$, with the exception of $\log 10 \cdot U_{2,5}$ and $\log v \cdot U_{3,5}$ for some $v$.
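
A matroidal entropy function has the form $h(A) = \log v \cdot r_M(A)$, where $r_M$ is the rank function of $M$; for the uniform matroid $U_{k,n}$, $r(A) = \min(|A|, k)$. The sketch brute-forces the polymatroid axioms (normalization, monotonicity, submodularity) for the open case $\log 10 \cdot U_{2,5}$. Passing this check is necessary but not sufficient for being entropic, which is the question the paper addresses; the helper names are illustrative.

```python
import math
from itertools import combinations


def uniform_matroid_rank(k):
    """Rank function of the uniform matroid U_{k,n}: r(A) = min(|A|, k)."""
    return lambda subset: min(len(subset), k)


def is_polymatroid(h, n, tol=1e-9):
    """Check h(empty) = 0, monotonicity and submodularity over all subsets of {0..n-1}."""
    subsets = [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]
    if abs(h(frozenset())) > tol:
        return False
    for a in subsets:
        for b in subsets:
            if a <= b and h(a) > h(b) + tol:
                return False  # not monotone
            if h(a) + h(b) < h(a | b) + h(a & b) - tol:
                return False  # not submodular
    return True


def scaled_u25(subset):
    """log(10) * rank of U_{2,5} (log base 2, i.e. bits)."""
    return math.log2(10) * uniform_matroid_rank(2)(subset)


print(is_polymatroid(scaled_u25, 5))  # True: a polymatroid; entropicness is the hard part
```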

12.
13.
This paper investigates the achievable per-user degrees of freedom (DoF) in the uplink of multi-cloud-based sectored hexagonal cellular networks (M-CRAN). The network consists of $N$ base stations (BSs) and $K \le N$ baseband unit pools (BBUPs), which function as independent cloud centers. The BSs and BBUPs communicate over finite-capacity fronthaul links of capacity $C_F = \mu_F \cdot \frac{1}{2}\log(1+P)$, with $P$ denoting the transmit power. In the system model, BBUPs have a limited processing capacity $C_{BBU} = \mu_{BBU} \cdot \frac{1}{2}\log(1+P)$. We propose two achievability schemes based on dividing the network into non-interfering parallelogram and hexagonal clusters, respectively. The minimum number of users in a cluster is determined by the ratio of BBUPs to BSs, $r = K/N$. Both the parallelogram and hexagonal schemes are based on practically implementable beamforming and adapt the way clusters are formed to the sectorization of the cells. The proposed coding schemes improve the sum rate over naive approaches that ignore cell sectorization, both at finite signal-to-noise ratio (SNR) and in the high-SNR limit. We derive a lower bound on the per-user DoF as a function of $\mu_{BBU}$, $\mu_F$, and $r$. We show that the cut-set bound is attained in several cases; that the achievability gap between the lower and cut-set bounds decreases with the inverse of the BBUP-BS ratio, $1/r$, for $\mu_F \le 2M$ irrespective of $\mu_{BBU}$; and that the per-user DoF achieved through hexagonal clustering cannot exceed that of parallelogram clustering for any value of $\mu_{BBU}$ and $r$ as long as $\mu_F \le 2M$. Since the achievability gap decreases with the inverse of the BBUP-BS ratio for small and moderate fronthaul capacities, the cut-set bound is almost achieved even for small cluster sizes in this range of fronthaul capacities. For higher fronthaul capacities, the achievability gap is not always tight but decreases with processing capacity. However, the cut-set bound, e.g., at $5M/6$, can be achieved with a moderate cluster size.

14.
In this paper, the high-dimensional linear regression model is considered, where the covariates are measured with additive noise. Unlike most other methods, which assume that the true covariates are fully observed, the results in this paper only require that the corrupted covariate matrix be observed. Using information theory, the minimax rates of convergence for estimation are investigated in terms of the $\ell_p$-losses ($1 \le p < \infty$) under a general sparsity assumption on the underlying regression parameter and some regularity conditions on the observed covariate matrix. The established lower and upper bounds on the minimax risks agree up to constant factors when $p = 2$, which together provide the information-theoretic limits of estimating a sparse vector in the high-dimensional linear errors-in-variables model. An estimator for the underlying parameter is also proposed and shown to be minimax optimal in the $\ell_2$-loss.

15.
We explore the quadratic form of the $f(R) = R + bR^2$ gravitational theory to derive rotating $N$-dimensional black hole solutions with rotation parameters $a_i$, $i \ge 1$. Here, $R$ is the Ricci scalar and $b$ is the dimensional parameter. We assume that the $N$-dimensional spacetime is static and has flat horizons with a zero-curvature boundary. We investigate the physics of the black holes by calculating the relations among physical quantities such as the horizon radius and mass. We also demonstrate that, in the four-dimensional case, the higher-order curvature does not contribute to the black hole, i.e., the black hole does not depend on the dimensional parameter $b$, whereas for $N > 4$ it does depend on $b$, owing to the contribution of the $R^2$ correction term. We analyze the conserved quantities, energy and angular momentum, of the black hole solutions by applying the relocalization method. Additionally, we calculate thermodynamic quantities such as temperature and entropy, examine the local stability of the black hole solutions, and show that they are thermodynamically stable. Moreover, the entropy calculations constrain the parameter $b$ to b<116Λ in order to obtain a positive entropy.

16.
Recently, it has been shown that the information flow and causality between two time series can be inferred in a rigorous and quantitative sense and, moreover, that the resulting causality can be normalized. A corollary is that, in the linear limit, causation implies correlation, while correlation does not imply causation. Now suppose there is an event $A$ taking a harmonic form (sine/cosine), which generates, through some process, another event $B$ such that $B$ always lags $A$ by a phase of $\pi/2$. Here the causality is evident, yet the computed correlation is zero. This apparent contradiction is rooted in the fact that a harmonic system always leaves a single point on the Poincaré section; it does not add information. That is to say, although the absolute information flow from $A$ to $B$ is zero, i.e., $T_{A\to B} = 0$, the total information increase of $B$ is also zero, so the normalized $T_{A\to B}$, denoted $\tau_{A\to B}$, takes the form $0/0$. By slightly perturbing the system with some noise, solving a stochastic differential equation, and letting the perturbation go to zero, it can be shown that $\tau_{A\to B}$ approaches 100%, just as one would expect.
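
The zero-correlation, zero-flow situation is easy to reproduce numerically. The sketch assumes Liang's bivariate linear estimator of the information-flow rate, $T_{2\to1} = (C_{11}C_{12}C_{2,d1} - C_{12}^2 C_{1,d1})/(C_{11}^2 C_{22} - C_{11}C_{12}^2)$, with a forward-difference derivative; the grid and helper names are illustrative. For $A = \sin t$ and $B = \sin(t - \pi/2)$ over one full period, both the correlation and $T_{A\to B}$ come out at floating-point zero, which is why the normalized $\tau_{A\to B}$ is the indeterminate $0/0$.

```python
import math


def cov(x, y):
    """Population covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((u - mx) * (v - my) for u, v in zip(x, y)) / n


def corr(x, y):
    return cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))


def liang_flow(target, source, dt):
    """Information-flow rate source -> target (assumed form of Liang's linear estimator).

    The target's time derivative is a forward difference, wrapped periodically
    because the example series cover exactly one period.
    """
    n = len(target)
    d_target = [(target[(i + 1) % n] - target[i]) / dt for i in range(n)]
    c11, c22, c12 = cov(target, target), cov(source, source), cov(target, source)
    c1d1, c2d1 = cov(target, d_target), cov(source, d_target)
    return (c11 * c12 * c2d1 - c12 ** 2 * c1d1) / (c11 ** 2 * c22 - c11 * c12 ** 2)


N = 1000
dt = 2 * math.pi / N
t = [i * dt for i in range(N)]
a = [math.sin(x) for x in t]                # event A
b = [math.sin(x - math.pi / 2) for x in t]  # event B lags A by pi/2
print(corr(a, b), liang_flow(b, a, dt))     # both zero up to rounding
```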

17.
KamLAND measured the flux of $\bar{\nu}_e$ from distant nuclear reactors and found fewer events than expected from standard assumptions about $\bar{\nu}_e$ propagation, at the 99.998% confidence level (C.L.). The observed energy spectrum disagrees with the expected spectral shape at the 99.6% C.L. and prefers the distortion from neutrino oscillation effects. A two-flavor oscillation analysis of the data from KamLAND and solar neutrino experiments, assuming CPT invariance, yields $\Delta m^2 = 7.9^{+0.6}_{-0.5} \times 10^{-5}\ \mathrm{eV}^2$ and $\tan^2\theta = 0.40^{+0.10}_{-0.07}$. All solutions to the solar neutrino problem except the large mixing angle (LMA) region are excluded. KamLAND also succeeded in detecting geoneutrinos produced by the decays of $^{238}$U and $^{232}$Th within the Earth. The total observed number of geoneutrinos, 4.5 to 54.2 assuming a Th/U mass concentration ratio of 3.9, is consistent with the 19 predicted by geophysical models. This detection allows better estimation of the abundances and distributions of radioactive elements in the Earth, and of the Earth's overall heat budget.
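
The fitted parameters plug into the standard two-flavour vacuum survival probability $P = 1 - \sin^2 2\theta \, \sin^2(1.267\,\Delta m^2 L/E)$, with $\Delta m^2$ in eV², $L$ in km and $E$ in GeV. The sketch converts the quoted $\tan^2\theta = 0.40$ to $\sin^2 2\theta$ and evaluates $P$. It ignores matter effects and the spectrum and flux weighting a real analysis needs, and the baselines and 4 MeV energy are illustrative choices, not KamLAND values.

```python
import math


def sin2_2theta(tan2_theta):
    """Convert tan^2(theta) to sin^2(2 theta)."""
    s2 = tan2_theta / (1 + tan2_theta)  # sin^2(theta)
    return 4 * s2 * (1 - s2)


def survival_probability(length_km, energy_gev, dm2_ev2, s22):
    """Two-flavour vacuum survival probability 1 - sin^2(2 theta) sin^2(1.267 dm2 L / E)."""
    phase = 1.267 * dm2_ev2 * length_km / energy_gev
    return 1 - s22 * math.sin(phase) ** 2


s22 = sin2_2theta(0.40)  # central value quoted above, about 0.816
for length_km in (50, 100, 180, 300):  # illustrative baselines
    print(length_km, round(survival_probability(length_km, 0.004, 7.9e-5, s22), 3))
```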

18.
19.
We propose a novel framework to describe the time evolution of dilute classical and quantum gases, initially out of equilibrium and with spatial inhomogeneities, towards equilibrium. Briefly, we divide the system into small cells and adopt the local equilibrium hypothesis. We then define a global functional that is the sum of the cell H-functionals. Each cell functional recovers the corresponding Maxwell–Boltzmann, Fermi–Dirac, or Bose–Einstein distribution function, depending on the classical or quantum nature of the gas. The time evolution of the system is described by the relation $dH/dt \le 0$, with equality holding when the system is in the equilibrium state. Using the variational method, we prove this relation, which may be regarded as an extension of the H-theorem to inhomogeneous systems, for both classical and quantum gases. Furthermore, the H-functionals are in agreement with the correspondence principle. We discuss how the H-functionals can be identified with the system's entropy and analyze the relaxation processes of out-of-equilibrium systems.
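
For reference, a sketch of the textbook per-state densities whose constrained extremization yields the three distributions: stationarity of $\sum h(f)$ at fixed particle number and energy gives $h'(f) = -(\varepsilon - \mu)/kT$. These forms (including the $-f$ term in the classical case, which only shifts the chemical potential) are assumed standard choices, not necessarily the paper's exact cell functionals; the code checks the stationarity condition by finite differences.

```python
import math


def h_density(f, stats):
    """Per-state contribution to a cell H-functional; stats is 'MB', 'FD' or 'BE'."""
    if stats == "MB":
        return f * math.log(f) - f
    if stats == "FD":  # requires 0 < f < 1
        return f * math.log(f) + (1 - f) * math.log(1 - f)
    if stats == "BE":
        return f * math.log(f) - (1 + f) * math.log(1 + f)
    raise ValueError(stats)


def equilibrium_occupation(x, stats):
    """Occupation at x = (energy - mu) / kT; it satisfies dh/df = -x."""
    if stats == "MB":
        return math.exp(-x)
    if stats == "FD":
        return 1 / (math.exp(x) + 1)
    if stats == "BE":  # requires x > 0
        return 1 / (math.exp(x) - 1)
    raise ValueError(stats)


def dh_df(f, stats, step=1e-6):
    """Central finite-difference derivative of h_density."""
    return (h_density(f + step, stats) - h_density(f - step, stats)) / (2 * step)


for s in ("MB", "FD", "BE"):
    f = equilibrium_occupation(1.3, s)
    print(s, round(f, 4), dh_df(f, s))  # derivative is close to -1.3 in each case
```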

20.
This paper focuses on $K$-receiver discrete-time memoryless broadcast channels (DM-BCs) with private messages, where the transmitter wishes to convey $K$ private messages to $K$ receivers. A general inner bound on the capacity region is proposed based on exhaustive message splitting and a $K$-level modified Marton's coding. The key idea is to split every message into j=1KKj1 submessages, each corresponding to a set of users who are assigned to recover them, and then send these submessages via codewords chosen from a $K$-level structured codebook. To guarantee joint typicality among all transmitted codewords, a sufficient condition on the subcodebooks' sizes is derived through a newly established hierarchical covering lemma, which extends the 2-level multivariate covering lemma to the $K$-level case with more intricate dependences. As the number of auxiliary random variables and rate conditions both increase exponentially with $K$, the standard Fourier–Motzkin elimination procedure becomes infeasible when $K$ is large. To tackle this problem, we obtain a closed form of the achievable rate region by exploiting an observation about disjoint unions of sets that constitute the power set of $\{1, \ldots, K\}$. The proposed achievable rate region allows arbitrary input probability mass functions and improves on previously known achievable (closed-form) rate regions for $K$-receiver ($K \ge 3$) BCs.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417