Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
A simple algorithm for finding a collective distance between arbitrary assemblies of points in a vector space is defined at various levels of complexity. The most straightforward procedure is related to a sum of Euclidean distances, which can be easily obtained from any Gram matrix of the point vectors. A similar but more involved formulation can be obtained with the tensor products of an indefinite number of vectors. Simple and elaborate examples are provided to illustrate the procedures throughout the text. The use of collective distances in reference to quantum similarity is discussed as an important application, and a numerical example is given.
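The Gram-matrix route mentioned in this abstract can be illustrated with a minimal sketch. It assumes that the collective distance is the plain sum of all pairwise Euclidean distances; the abstract does not give the exact definition, so the function names `gram_matrix` and `collective_distance` and that choice of sum are illustrative assumptions. The only identity relied on is the standard one, d_ij^2 = G_ii + G_jj - 2 G_ij.

```python
import math


def gram_matrix(points):
    """Gram matrix G[i][j] = <p_i, p_j> of a list of coordinate tuples."""
    return [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]


def collective_distance(gram):
    """Sum of pairwise Euclidean distances, using only Gram matrix entries.

    Assumption: 'collective distance' is taken here as the plain sum over
    all pairs i < j; the abstract may define it differently.
    """
    n = len(gram)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            squared = gram[i][i] + gram[j][j] - 2.0 * gram[i][j]
            total += math.sqrt(max(squared, 0.0))  # guard against rounding below 0
    return total


points = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]  # a 3-4-5 right triangle
print(collective_distance(gram_matrix(points)))
```

For the right triangle used here the three pairwise distances are 5, 3 and 4, so the sum is 12. Because only Gram matrix entries enter the computation, the same function would work for any objects for which inner products are available, not just coordinate vectors.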

2.
A novel representation of proteins is introduced. It is independent of arbitrary decisions with respect to the choice of labels to be assigned to the 20 natural amino acids. The approach is based on an assignment of 20 unit vectors in 20-dimensional vector space to the 20 natural amino acids. Proteins are then represented by a walk, that is, a sequence of steps in the 20-dimensional space analogous to a walk in the (x, y) plane in the case of binary strings. A straightforward numerical characterization of proteins is obtained from the distance matrix associated with the walk representing the protein in 20-dimensional space, which combines the information on the Euclidean distances between the various amino acids in the protein sequence. The Line Distance matrix offers additional numerical characterization of proteins, while the lengths of steps of the walk in 20-D space allow construction of a "protein profile," which represents the distribution of average lengths of the steps and their powers.
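A minimal sketch of the walk and its Euclidean distance matrix, as described in this abstract, might look as follows. The axis order in `AMINO_ACIDS` and the function names are assumptions made for illustration; the Line Distance matrix and the "protein profile" are not reproduced because the abstract does not define them.

```python
import math

# One axis per natural amino acid; the order is arbitrary (assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def walk_positions(sequence):
    """Positions visited by the walk: each residue adds one unit step
    along the axis assigned to that amino acid."""
    position = [0] * len(AMINO_ACIDS)
    path = []
    for residue in sequence:
        position[AMINO_ACIDS.index(residue)] += 1
        path.append(tuple(position))
    return path


def distance_matrix(path):
    """Euclidean distances between all pairs of points on the walk."""
    return [[math.dist(p, q) for q in path] for p in path]


d = distance_matrix(walk_positions("ACA"))
print(d[0][1], d[0][2])
```

Permuting the axes permutes coordinates without changing any Euclidean distance, which is consistent with the claim that the characterization does not depend on how the amino acids are labeled.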

3.
A digitized fluorescence spectrum containing n points may be considered as a vector in n-dimensional space. This concept is used to develop a vector model employing angular distance in n-space as a parameter for the comparison of fluorescence spectra of weathered oil. The vector model is applied to the problem of forensic oil identification by comparing the spectra of oil-spill samples with the spectra of unweathered and laboratory-weathered oils from likely suspects. The vector representing the oil-spill spectrum is projected onto the hyperplane formed by the spectra of the unweathered and corresponding laboratory-weathered oils to locate the vector which has the smallest angular deviation from the spill vector ("best fit").
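The angular comparison and the projection step can be sketched as below. This is an illustration under assumptions: the "hyperplane" is taken to be the plane spanned by the two reference spectra, the projection is the ordinary orthogonal (least-squares) projection, and no constraint such as non-negative mixing coefficients is imposed. The function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def angle(a, b):
    """Angle in radians between two spectra treated as vectors."""
    cosine = dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))
    return math.acos(max(-1.0, min(1.0, cosine)))  # clamp rounding noise


def project_onto_plane(spill, unweathered, weathered):
    """Orthogonal projection of `spill` onto span{unweathered, weathered}.

    Solves the 2x2 normal equations by Cramer's rule; the two reference
    spectra must be linearly independent.
    """
    uu = dot(unweathered, unweathered)
    uw = dot(unweathered, weathered)
    ww = dot(weathered, weathered)
    us = dot(unweathered, spill)
    ws = dot(weathered, spill)
    det = uu * ww - uw * uw
    a = (us * ww - ws * uw) / det
    b = (ws * uu - us * uw) / det
    return [a * x + b * y for x, y in zip(unweathered, weathered)]


u = [1.0, 0.0, 0.0]   # toy "unweathered" spectrum
w = [0.0, 1.0, 0.0]   # toy "laboratory-weathered" spectrum
s = [1.0, 1.0, 1.0]   # toy "spill" spectrum
best_fit = project_onto_plane(s, u, w)
print(best_fit, math.degrees(angle(s, best_fit)))
```

Among all vectors in the plane, the orthogonal projection makes the smallest angle with the spill vector, which is why it serves as the "best fit" in this sketch.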

4.
A new method, based on generalized Fourier analysis, is described that utilizes the concept of "molecular basis sets" to represent chemical space within an abstract vector space. The basis vectors in this space are abstract molecular vectors. Inner products among the basis vectors are determined using an ansatz that associates molecular similarities between pairs of molecules with their corresponding inner products. Moreover, the fact that similarities between pairs of molecules are, in essentially all cases, nonzero implies that the abstract molecular basis vectors are nonorthogonal, but since the similarity of a molecule with itself is unity, the molecular vectors are normalized to unity. A symmetric orthogonalization procedure, which optimally preserves the character of the original set of molecular basis vectors, is used to construct appropriate orthonormal basis sets. Molecules can then be represented, in general, by sets of orthonormal "molecule-like" basis vectors within a proper Euclidean vector space. However, the dimension of the space can become quite large. Thus, the work presented here assesses the effect of basis set size on a number of properties, including the average squared error and average norm of molecular vectors represented in the space; the results clearly show the expected reduction in average squared error and increase in average norm as the basis set size is increased. Several distance-based statistics are also considered. These include the distribution of distances and their differences with respect to basis sets of differing size and several comparative distance measures such as Spearman rank correlation and Kruskal stress. All of the measures show that, even though the dimension can be high, the chemical spaces they represent nonetheless behave in a well-controlled and reasonable manner.
Other abstract vector spaces analogous to that described here can also be constructed, provided that the appropriate inner products can be directly evaluated, as is the case in this work, a problem that is well known in kernel-based machine learning.

5.
The superdirect configuration interaction (Sup-CI) method has the usual versatility and stability of the CI methods with computational efficiency typical of the many-body methods, such as the many-body perturbation theory (MBPT). The Hamilton operator is projected into a space of a few trial vectors, such as Krylov, Nesbet, or Møller–Plesset correction vectors. In this space, Hamiltonian matrix elements may be directly computed in the many-body fashion, as weighted sums of integral products over orbital indices. The variation-perturbation method based on the first-order wave function is equivalent to the Sup-CI method with a single correction vector of the Møller–Plesset type. Different points of view on the superdirect CI method are discussed, and a version in which third-order contributions are computed for a relatively small (10–100) space of reference and correction vectors is tested. Selection of the best “effective first-order spaces” and size-extensivity corrections in Sup-CI are briefly discussed. Møller–Plesset, Epstein–Nesbet, and other correction vectors are included in the model calculations on the symmetric stretch of bonds in water, acetylene, and the NH2 molecule. Errors are almost independent of molecular geometry, and the method appears to be superior to the multireference second-order perturbation methods. © 1994 John Wiley & Sons, Inc.

6.
7.
A theorem concerning the Gram matrix of a general set of vectors, a vector polyhedron, is described. Generally speaking, the demonstration is performed by means of a non-negative vector definition, the variance vector, which can be associated with any vector polyhedron belonging to a vector space where a scalar product can be defined. The complete sum of the variance vector, the condensed variance index of any polyhedron, permits the construction of the Gram matrix theorem.

8.
Two numerical codes, a complex face vector F and a real face vector D, are developed for the characterization of square-cell configurations (lattice animals), used for representing the shapes of molecular monolayers and cross sections of molecular surfaces. The real face vector D represents all the intrinsic properties, size, and shape of the lattice animal. The complex face vector F contains complete information about the size, the shape, and also the placement of the particular lattice animal with respect to the lattice. Based on the properties of the face vectors, a method is developed for the classification of similar animals into equivalence classes. The face vector method is proposed for an algorithmic, nonvisual computer analysis of similarity of shapes of molecular monolayers and planar domains of cross sections of molecular surfaces, approximated by lattice animals.

9.
In this paper we propose a new algorithm for subgraph isomorphism based on the representation of molecular structures as colored graphs and the representation of these graphs as vectors in n-dimensional spaces. The presented process, which obtains all maximum common substructures, is based on the solution of a constraint satisfaction problem defined as the common m-dimensional space (m ≤ n) in which the vectors representing the matched graphs can be defined.

10.
This paper describes the program ASSAM, which has been developed to search for patterns of amino acid side-chains in the 3D structures in the Protein Data Bank. ASSAM represents an amino acid by a vector drawn from the main chain towards the functional part of the amino acid and then computes a graph representation of a protein in which the individual side-chain vectors are the nodes and the intervector distances are the edges. The presence of a query pattern in a Protein Data Bank structure can then be searched for by means of a subgraph isomorphism algorithm. Recent enhancements to ASSAM allow searches to include the following: the main-chain structure in addition to the side-chains; the secondary structure and solvent accessibility of side-chains; allowable distances from a known binding-site; disulfide bridges; and improved generic and wild-card queries. The effectiveness of these approaches is demonstrated by extensive searches of the Protein Data Bank for typical 3D query patterns.

11.
We introduce a graphical representation of DNA primary sequences by taking four special vectors in a 3-D space to represent the four nucleic acid bases in DNA sequences, so that a DNA primary sequence is denoted in 3-D space by a successive vector sequence which is a directed walk in that space. It is demonstrated that this representation has no overlap or intersection and allows numerical characterization.

12.
13.
Protein function is related to its chemical reaction to the surrounding environment, including other proteins. In turn, this depends on the spatial shape and tertiary structure of the protein and the folding of its constituent components in space. The correct identification of a protein domain fold solely from information extracted from the protein sequence is a complicated and controversial task in current computational biology. In this article, a combined classifier based on the information content of features extracted from the primary structure of the protein is introduced to address this challenging problem. In the first stage of our proposed two-tier architecture, there are several classifiers, each of which is trained with a different sequence-based feature vector. Apart from the predicted secondary structure, hydrophobicity, van der Waals volume, polarity, polarizability, and different dimensions of pseudo-amino acid composition vectors applied in similar studies, the position specific scoring matrix (PSSM) has also been used to improve the correct classification rate (CCR) in this study. Using K-fold cross validation on a training dataset related to 27 famous folds of SCOP, the 28 dimensional probability output vector from each evidence theoretic K-NN classifier is used to determine the information content or expertness of the corresponding feature for discrimination in each fold class. In the second stage, the outputs of the classifiers for the test dataset are fused using the Sugeno fuzzy integral operator to make a better decision for the target fold class. The expertness factor of each classifier in each fold class has been used to calculate the fuzzy integral operator weights. The results make it possible to provide a deeper interpretation of the effectiveness of each feature for discrimination in target classes for query proteins.

14.
It is well established that the photoacoustic effect based on absorption of electromagnetic radiation into thermal waves allows surface depth profiling. However, limited knowledge exists concerning its spatial resolution. The spiral-stepwise (SSW) approach combined with phase rotational analysis is utilized to determine surface depth profiling of homogeneous and nonhomogeneous multilayered polymeric surfaces in a step-scan photoacoustic FT-IR experiment. In this approach, the thermal wave propagating to the surface is represented as the integral of all heat wave vectors propagating across the sampling depth $x_n$, and the spiral function $K'\beta(\lambda)\,e^{-\beta(\lambda)x_n}\,e^{-x_n/\mu_{th}}\,e^{i(\omega t - x_n/\mu_{th})}$ represents the amplitude and phase of the heat wave vector propagating to the surface. The SSW approach can be applied to heterogeneous surfaces by representing thermal waves propagating to the surface as the sum of the thermal waves propagating through homogeneous layers that are integrals of all heat vectors from a given sampling depth. The proposed model is tested on multilayered polymeric surfaces and shows that the SSW approach allows semiquantitative surface imaging with the spatial resolution ranging from micrometer to 500 nm levels, and the spatial resolution is a function of the penetration depth.

15.
A new procedure with a high ability to enhance the prediction of multivariate calibration models with a small number of interpretable variables is presented. The core of this methodology is to sort the variables from an informative vector, followed by a systematic investigation of PLS regression models with the aim of finding the most relevant set of variables by comparing the cross‐validation parameters of the models obtained. In this work, seven main informative vectors, i.e. regression vector, correlation vector, residual vector, variable influence on projection (VIP), net analyte signal (NAS), covariance procedures vector (CovProc), signal‐to‐noise ratios vector (StN), and their combinations were automated and tested with the main purpose of feature selection. Six data sets from different sources were employed to validate this methodology. They originated from near‐infrared (NIR) spectroscopy, Raman spectroscopy, gas chromatography (GC), fluorescence spectroscopy, quantitative structure‐activity relationships (QSAR) and computer simulation. The results indicate that all vectors and their combinations were able to enhance prediction capability with respect to the full data sets. However, regression and NAS informative vectors from partial least squares (PLS) regression, both built using more latent variables than when building the model presented in most of the tested data sets, were the best informative vectors for variable selection. In all the applications, the selected variables were quite effective and useful for interpretation. Copyright © 2008 John Wiley & Sons, Ltd.

16.
A coarse graining procedure aimed at reproducing both the chain structure and dynamics in melts of linear monodisperse polymers is presented. The reference system is a bead-spring-type representation of the melt. The level of coarse graining is selected equal to the number of beads in the entanglement segment, Ne. The coarse model is still discrete and contains blobs, each representing Ne consecutive beads in the fine scale model. The mapping is defined by the following conditions: the probability of a given state of the coarse system is equal to that of all fine system states compatible with the respective coarse state; the dissipation per coarse grained object is similar in the two systems; and constraints to the motion of a representative chain exist in the fine phase space, and the coarse phase space is adjusted so as to represent them. Specifically, the chain inner blobs are constrained to move along the backbone of the coarse grained chain, while the end blobs move in the three-dimensional embedding space. The end blobs continuously redefine the diffusion path for the inner blobs. The input parameters governing the dynamics of the coarse grained system are calibrated based on the fine scale model behavior. Although the coarse model cannot reproduce the whole thermodynamics of the fine system, it ensures that the pair and end-to-end distribution functions, the rate of relaxation of segmental and end-to-end vectors, the Rouse modes, and the diffusion dynamics are properly represented.

17.
18.
With projection based calibration approaches, such as partial least squares (PLS) and principal component regression (PCR), the calibration space is spanned by respective basis vectors (latent vectors). Up to rank k basis vectors are formed, where k ≤ min(m,n) with m and n denoting the number of calibration samples and measured variables. The user needs to decide how many and which respective basis vectors to use (tuning parameters). To avoid the second issue, basis vectors are selected top‐down, starting with the first and sequentially adding until model criteria are satisfied. Ridge regression (RR) avoids the issues by using the full set of basis vectors. Another approach is to select a subset from the total available. The presented work develops a process based on the L1 vector norm to select basis vectors. Specifically, the L1 norm is used to select singular value decomposition (SVD) basis set vectors for PCR (LPCR). Because PCR, PLS, RR, and others can be expressed as linear combinations of the SVD basis vectors, the focus is on selection and comparison using the SVD basis set. Results based on respective tuning parameter selections and weights applied to the SVD basis vectors for LPCR, top‐down PCR, correlation PCR (CPCR), PLS, and RR are compared for calibration and calibration updating using spectroscopic data sets. The methods are found to predict equivalently. In particular, the L1 norm produces similar results to those obtained by the well‐studied CPCR process. Thus, the new method provides a different theoretical framework than CPCR for selecting basis vectors. Copyright © 2016 John Wiley & Sons, Ltd.

19.
Index-based search algorithms are an important part of genomic search, and how indices are constructed is the key to an index-based search algorithm that computes similarities between two DNA sequences. In this paper, we propose an efficient query processing method that uses special transformations to construct an index. It uses little storage and rapidly finds the similarity between two sequences in a DNA sequence database. First, a sequence is partitioned into equal-length windows. We select the likely subsequences by computing the Hamming distance to the query sequence. The algorithm then transforms the subsequences in each window into a multidimensional vector space by indexing the frequencies of the characters, including the positional information of the characters in the subsequences. The results of our experiments show that the algorithm has a faster run time than other heuristic algorithms based on an index structure. Also, the algorithm is as accurate as those heuristic algorithms.
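The three steps named in this abstract (windowing, Hamming-distance filtering, and transformation to a frequency vector with positional information) can be sketched as follows. The abstract does not specify the actual transformation, so the encoding in `feature_vector` (count plus mean position per base) is a hypothetical stand-in, as are the function names and the toy data.

```python
def windows(sequence, size):
    """Partition a sequence into consecutive, non-overlapping windows of
    equal length; a trailing remainder shorter than `size` is dropped."""
    return [sequence[i:i + size]
            for i in range(0, len(sequence) - size + 1, size)]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def feature_vector(subsequence):
    """Hypothetical encoding: for each base, its count and the mean index
    at which it occurs (0.0 when the base is absent)."""
    vector = []
    for base in "ACGT":
        positions = [i for i, ch in enumerate(subsequence) if ch == base]
        vector.append(len(positions))
        vector.append(sum(positions) / len(positions) if positions else 0.0)
    return vector


query = "ACGT"
parts = windows("ACGTACGAAAAA", len(query))
candidates = [w for w in parts if hamming(w, query) <= 1]
print(parts, candidates, feature_vector("ACGA"))
```

The mismatch threshold of 1 is arbitrary here. In an index-based search, the feature vectors of the surviving candidates would be the keys stored in the index.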

20.
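Error propagation in multivariate analysis requires a simple, accurate, and numerical form of expression. In vector space, the random error of a linear multivariate mixed signal can be expressed as the behavior of a random vector in the true-value subspace. In the true-value subspace formed by the vectors corresponding to the multiple variables of the system, the spatial angle θ between the vector of interest and the subspace of the other vectors is an important parameter for describing the multivariate system. If the relationship between the vector of interest and the subspace of the other vectors is fixed and the overall error of the system is normally distributed, then the error on the vector of interest is also normally distributed, and the ratio of the standard deviation of its multivariate statistical analysis result to the standard deviation of the system error is 1/(2·sin(θ/2)). The conclusion is verified with constructed numerical examples and with the ultraviolet spectrophotometric analysis of a mixture of o-, m-, and p-dihydroxybenzene (catechol, resorcinol, and hydroquinone).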
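As a small numerical illustration, the ratio stated in this abstract, 1/(2·sin(θ/2)), can be evaluated for a few angles. The sketch takes the formula exactly as given in the abstract and does not derive or verify it; the function name `error_ratio` is an assumption.

```python
import math


def error_ratio(theta):
    """Ratio of the result's standard deviation to the system error's
    standard deviation, 1 / (2 sin(theta / 2)), as stated in the abstract.

    `theta` is the angle in radians between the vector of interest and
    the subspace of the other vectors; it must be greater than zero.
    """
    return 1.0 / (2.0 * math.sin(theta / 2.0))


for degrees in (10, 30, 60, 90):
    print(degrees, round(error_ratio(math.radians(degrees)), 3))
```

The ratio grows as θ shrinks, that is, as the vector of interest becomes closer to the subspace of the other components; under this formula it equals 1 at θ = 60°.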
