Similar articles
1.
Gene and protein names follow few, if any, true naming conventions and are subject to great variation in different occurrences of the same name. This gives rise to two important problems in natural language processing. First, can one locate the names of genes or proteins in free text, and second, can one determine when two names denote the same gene or protein? The first of these problems is a special case of the problem of named entity recognition, while the second is a special case of the problem of automatic term recognition (ATR). We study the second problem, that of gene or protein name variation. Here we describe a system which, given a query gene or protein name, identifies related gene or protein names in a large list. The system is based on a dynamic programming algorithm for sequence alignment in which the mutation matrix is allowed to vary under the control of a fully trainable hidden Markov model.
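
The dynamic programming alignment mentioned in this abstract can be illustrated with a minimal sketch. It assumes a fixed match/mismatch/gap scoring scheme rather than the trainable, HMM-controlled mutation matrix the abstract describes; the function names and score values are hypothetical.

```python
def alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with fixed (assumed) scores."""
    a, b = a.lower(), b.lower()
    # prev[j] is the best score for aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


def rank_names(query, candidates):
    """Sort candidate names by decreasing alignment score against the query."""
    return sorted(candidates,
                  key=lambda name: alignment_score(query, name),
                  reverse=True)
```

Calling `rank_names("IL-2", ["TNF", "IL2", "il-2"])` puts the case variant and the hyphen-free variant ahead of the unrelated name. A learned mutation matrix would replace the constant `match` and `mismatch` values.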

2.
We have produced an open-source, freely available algorithm (Open Parser for Systematic IUPAC Nomenclature, OPSIN) that interprets the majority of organic chemical nomenclature in a fast and precise manner. This has been achieved using an approach based on a regular grammar. This grammar is used to guide tokenization, a potentially difficult problem in chemical names. From the parsed chemical name, an XML parse tree is constructed that is operated on in a stepwise manner until the structure has been reconstructed from the name. Results from OPSIN on various computer-generated name/structure pair sets are presented. These show exceptionally high precision (99.8%+) and, when using general organic chemical nomenclature, high recall (98.7-99.2%). This software can serve as the basis for future open-source developments of chemical name interpretation.
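
To show why tokenizing chemical names needs guidance, here is a toy sketch of greedy longest-match tokenization. It is not OPSIN's grammar or API: the `TOKENS` lexicon, the token kinds, and the `tokenize` function are all hypothetical and cover only a handful of name fragments.

```python
# Hypothetical mini-lexicon; a real nomenclature parser needs a far larger one.
TOKENS = {
    "locant": ["1", "2", "3", "4"],
    "hyphen": ["-"],
    "substituent": ["chloro", "bromo", "methyl", "ethyl"],
    "stem": ["meth", "eth", "prop", "but"],
    "suffix": ["ane", "ene", "anol"],
}


def tokenize(name):
    """Split a name into (kind, text) pairs by greedy longest match."""
    # Try longer fragments first so that "methyl" wins over "meth".
    lookup = sorted(
        ((text, kind) for kind, texts in TOKENS.items() for text in texts),
        key=lambda pair: len(pair[0]),
        reverse=True,
    )
    tokens, pos = [], 0
    while pos < len(name):
        for text, kind in lookup:
            if name.startswith(text, pos):
                tokens.append((kind, text))
                pos += len(text)
                break
        else:
            # No lexicon entry matches at this position.
            raise ValueError(f"cannot tokenize {name!r} at position {pos}")
    return tokens
```

For example, `tokenize("2-chloropropane")` yields a locant, a hyphen, a substituent, a stem, and a suffix. Greedy matching does not check that the token kinds appear in a legal order, which is the kind of constraint a grammar can supply.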

3.
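Metal oxide semiconductor (MOS) gas sensors, which offer small size, low power consumption, high sensitivity, and good compatibility with silicon processing, are now widely used in military applications, scientific research, and many sectors of the national economy. However, the low selectivity of MOS sensors limits their prospects in the Internet of Things (IoT) era. This paper therefore reviews research progress on improving the selectivity of MOS sensors, focusing on three technical approaches: enhancing the performance of sensing materials, electronic noses, and thermal modulation. It describes the current problems and future development trends of each approach. The paper also compares the mainstream pattern recognition and machine learning algorithms used in machine olfaction: principal component analysis (PCA), linear discriminant analysis (LDA), and neural networks (NN). Finally, the review looks ahead to the application in gas identification of convolutional neural network (CNN) deep learning algorithms, which provide dimensionality reduction, feature extraction, and robust recognition and classification. Improved sensing materials, the combination of multiple modulation methods with array technology, and the latest advances in deep learning in artificial intelligence (AI) will greatly enhance the ability of non-selective MOS sensors to identify volatile organic compound (VOC) molecules.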

4.
The enumeration of polyhexes can be easily carried out by utilizing a compact name (CN) approach to code chemical structures. A Fortran program that generates benzenoid hydrocarbons with 1–10 rings is presented. The computed structures are divided into various classes according to their cata- or peri- as well as non- or radicaloid character. Use of the additive nodal increments (ANI) approach leads to an algorithm producing representative samples of the polyhexes' sets, which can be applied in testing various topological formulae.

5.
A novel approach for CE data analysis based on pattern recognition techniques in the wavelet domain is presented. Low-resolution, denoised electropherograms are obtained by applying several preprocessing algorithms including denoising, baseline correction, and detection of the region of interest in the wavelet domain. The resultant signals are mapped into character sequences using first derivative information and multilevel peak height quantization. Next, a local alignment algorithm is applied to the coded sequences for peak pattern recognition. We also propose 2-D and 3-D representations of the found patterns for fast visual evaluation of the variability of chemical substance concentrations in the analyzed samples. The proposed approach is tested on the analysis of intracerebral microdialysate data obtained by CE and LIF detection, achieving a correct detection rate of about 85% with a processing time of less than 0.3 s per 25,000-point electropherogram. Using a local alignment algorithm on low-resolution denoised electropherograms might have a great impact on high-throughput CE, since the proposed methodology will substitute automatic fast pattern recognition analysis for slow, time-consuming, human-based visual pattern recognition methods.
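
The two steps named in this abstract, coding a signal as characters and locally aligning the coded sequences, can be sketched as follows. The coding scheme (one letter per height level, upper case when the signal is rising) and the alignment scores are assumptions made for illustration, not the authors' scheme; Smith-Waterman stands in for the unspecified local alignment algorithm.

```python
def encode(signal, n_levels=4):
    """Code each point as a letter (a, b, c, ... by height level).

    Upper case marks a positive first difference (rising signal).
    """
    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0  # avoid division by zero for flat signals
    chars = []
    for k, value in enumerate(signal):
        level = min(int((value - lo) / span * n_levels), n_levels - 1)
        letter = chr(ord("a") + level)
        rising = k > 0 and value > signal[k - 1]
        chars.append(letter.upper() if rising else letter)
    return "".join(chars)


def local_alignment_score(s, t, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman: best score over all pairs of substrings of s and t."""
    best = 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        curr = [0]
        for j in range(1, len(t) + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            # Scores never drop below zero, so an alignment can start anywhere.
            score = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

A peak shape such as `[0, 1, 4, 1, 0]` becomes a short string, and a high local alignment score between two coded signals indicates a shared peak pattern even when the surrounding baseline differs.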

6.
Nonlinear unmixing of hyperspectral reflectance data is one of the key problems in quantitative imaging of painted works of art. The approach presented is to interrogate a hyperspectral image cube by first decomposing it into a set of reflectance curves representing pure basis pigments and second estimating the scattering and absorption coefficients of each pigment in a given pixel to produce estimates of the component fractions. This two-step algorithm uses a deep neural network to qualitatively identify the constituent pigments in any unknown spectrum and then, based on the pigment(s) present and Kubelka–Munk theory, estimates the pigment concentration on a per-pixel basis. Using hyperspectral data acquired on a set of mock-up paintings and a well-characterized illuminated folio from the 15th century, the performance of the proposed algorithm is demonstrated for pigment recognition and quantitative estimation of concentration.
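
The Kubelka–Munk relation that this abstract relies on links the reflectance R of an opaque layer to the ratio of absorption to scattering, K/S = (1 - R)^2 / (2R). The sketch below works at a single wavelength and assumes the common additive mixing rule, in which K and S of a mixture are concentration-weighted sums; the function names are hypothetical and the abstract does not confirm this exact formulation.

```python
import math


def ks_from_reflectance(r):
    """Kubelka-Munk function K/S = (1 - R)^2 / (2R), for 0 < R <= 1."""
    return (1.0 - r) ** 2 / (2.0 * r)


def reflectance_from_ks(ks):
    """Inverse of the Kubelka-Munk function: R = 1 + K/S - sqrt((K/S)^2 + 2 K/S)."""
    return 1.0 + ks - math.sqrt(ks * ks + 2.0 * ks)


def mixture_reflectance(fractions, k, s):
    """Reflectance of a mixture, assuming K and S add in proportion to the fractions."""
    k_mix = sum(c * ki for c, ki in zip(fractions, k))
    s_mix = sum(c * si for c, si in zip(fractions, s))
    return reflectance_from_ks(k_mix / s_mix)
```

Because reflectance depends on the ratio of two sums, the mixture reflectance is not a linear blend of the pure pigment reflectances, which is why the unmixing problem is nonlinear.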

7.
A new strategy to solve the Kohn-Sham equations of density functional theory is presented which avoids diagonalization within a finite basis-set expansion. The implementation is based on an expansion of orbitals in terms of Gaussian functions and it is shown that the algorithm is competitive with more conventional approaches. The new approach is based on conjugated gradients optimization augmented by an approximate second-order update together with convergence acceleration. Computational advantages of the new algorithm are discussed under the special aspect of parallel computing. © 1997 John Wiley & Sons, Inc.

8.
A rigorous and practical approach for simulations of nonadiabatic quantum dynamics is introduced. The algorithm involves a natural extension of the matching-pursuit/split-operator Fourier-transform (MP/SOFT) method [Y. Wu and V. S. Batista, J. Chem. Phys. 121, 1676 (2004)] recently developed for simulations of adiabatic quantum dynamics in multidimensional systems. The MP/SOFT propagation scheme, extended to nonadiabatic dynamics, recursively applies the time-evolution operator as defined by the standard perturbation expansion to first- or second-order accuracy. The expansion is implemented in dynamically adaptive coherent-state representations, generated by an approach that combines the matching-pursuit algorithm with a gradient-based optimization method. The accuracy and efficiency of the resulting propagation method are demonstrated as applied to the canonical model systems introduced by Tully for testing simulations of dual curve-crossing nonadiabatic dynamics.

9.
The application of the Σ-separation method to the calculation of multicenter two-electron molecular integrals with Slater-type basis functions is reported. The approach is based on the approximation of a scalar component of the two-center atomic density by a two-center expansion over Slater-type functions. A least-squares fit was used to determine the coefficients of the expansion. The angular multipliers of the atomic density were treated exactly. It is shown that this approach can serve as a sufficiently accurate and fast algorithm for the calculation of multicenter two-electron molecular integrals with Slater-type basis functions. © 1995 John Wiley & Sons, Inc.

10.
Biological networks are powerful representations of topological features in biological systems. Finding network motifs in biological networks is a computationally hard problem because of the huge size of the networks and the abrupt growth of the search space as motif size increases. Motivated by the computational challenges of network motif discovery and considering the importance of this topic, an efficient and scalable network motif discovery algorithm based on induced subgraphs in a dynamic expansion tree is proposed. This algorithm uses a pruning strategy to overcome the space limitation of the static expansion tree. The proposed algorithm can identify large network motifs up to size 15 by significantly reducing the computationally expensive subgraph isomorphism checks. Further, the present work avoids the unnecessary growth of patterns that do not have any statistical significance. The proposed algorithm runs faster than most of the existing algorithms for large network motifs.

11.
We present a new approach to automatically define a quasi-optimal minimal set of pharmacophoric points mapping the interaction properties of a user-defined ligand binding site. The method is based on a fitting algorithm where a grid of sampled interaction energies of the target protein with small chemical fragments in the binding site is approximated by a linear expansion of Gaussian functions. A heuristic approximation selects from this expansion the smallest possible set of Gaussians required to describe the interaction properties of the binding site within a prespecified accuracy. We have evaluated the performance of the approach by comparing the computed Gaussians with the positions of aromatic sites found in experimental protein-ligand complexes. For a set of 53 complexes, good correspondence is found in general. At a 95% significance level, approximately 65% of the predicted interaction points have an aromatic binding site within 1.5 Å. We then studied the utility of these points in docking using the program DOCK. Short docking times, with an average of approximately 0.18 s per conformer, are obtained, while retaining, both for rigid and flexible docking, the ability to sample native-like binding modes for the ligand. An average 4-5-fold speed-up in docking times and a similar success rate is estimated with respect to the standard DOCK protocol.

12.
The simultaneous determination of NH₄⁺ and K⁺ in solution has been attempted using a potentiometric sensor array and multivariate calibration. The sensors used are rather non-specific and of all-solid-state type, employing polymeric (PVC) membranes. The subsequent data processing is based on a multilayer artificial neural network (ANN). This approach is called an "electronic tongue" because it mimics the sense of taste in animals. The sensors incorporate, as recognition elements, neutral carriers belonging to the family of the ionophoric antibiotics. In this work the ANN type is optimized by studying its topology, the training algorithm, and the transfer functions. Also, different pretreatments of the starting data are evaluated. The chosen ANN has 8 input neurons, 20 neurons in the hidden layer, and 2 neurons in the output layer. The transfer function selected was sigmoidal for the hidden layer and linear for the output layer. It is also recommended to scale the starting data before training. A correct fit for the test data set is obtained when the network is trained with the Bayesian regularization algorithm. The viability of determining ammonium and potassium ions in synthetic samples was evaluated; cumulative prediction errors of approximately 1% (relative values) were obtained. These results were comparable with those obtained with a generalized regression ANN as a reference algorithm. In a final application, results close to the expected values were obtained for the two considered ions, with concentrations between 0 and 40 mmol L⁻¹.
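
The network topology stated in this abstract (8 inputs, 20 sigmoid hidden neurons, 2 linear outputs) corresponds to the forward pass sketched below. The weights are random placeholders and the sensor readings are made up; training with Bayesian regularization, as the abstract describes, is not shown.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights (n_out rows of n_in values) and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, hidden, output):
    """Sigmoid hidden layer followed by a linear output layer."""
    w1, b1 = hidden
    w2, b2 = output
    # Hidden activations: sigmoid of the weighted sum of the inputs.
    h = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
         for row, b in zip(w1, b1)]
    # Linear outputs: one value per target ion.
    return [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(w2, b2)]


rng = random.Random(0)
hidden = make_layer(8, 20, rng)    # 8 sensor inputs -> 20 hidden neurons
output = make_layer(20, 2, rng)    # 20 hidden neurons -> 2 outputs
readings = [0.1, 0.4, 0.3, 0.9, 0.5, 0.2, 0.7, 0.6]  # made-up scaled sensor signals
prediction = forward(readings, hidden, output)
```

With trained weights, the two entries of `prediction` would be the estimated ammonium and potassium concentrations. Scaling the inputs before they reach `forward` matches the recommendation in the abstract.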

13.
Molecular recognition events in solution are affected by many different factors that have hampered the development of an understanding of intermolecular interactions at a quantitative level. Our tendency is to partition these effects into discrete phenomenological fields that are classified, named, and divorced: aromatic interactions, cation-pi interactions, CH-O hydrogen bonds, short strong hydrogen bonds, and hydrophobic interactions to name a few. To progress in the field, we need to develop an integrated quantitative appreciation of the relative magnitudes of all of the different effects that might influence the molecular recognition behavior of a given system. In an effort to navigate undergraduates through the vast and sometimes contradictory literature on the subject, I have developed an approach that treats theoretical ideas and experimental observations about intermolecular interactions in the gas phase, the solid state, and solution from a single simplistic viewpoint. The key features are outlined here, and although many of the ideas will be familiar, the aim is to provide a semiquantitative thermodynamic ranking of these effects in solution at room temperature.

14.
The paired-permanent approach for VB theory is extensively developed. The canonical expansion of a paired-permanent is deduced. Furthermore, it is shown that a paired-permanent may be expressed in terms of the products of sub-paired-permanents of any given order and their corresponding minors. An ab initio spin-free valence bond program, called Xiamen, is implemented using the paired-permanent approach. Test calculations show that the Xiamen package is more efficient than some other programs based on the traditional VB algorithm, and it provides a new practical tool for quantum chemistry.

15.
The paired-permanent approach for VB theory is extensively developed. The canonical expansion of a paired-permanent is deduced. Furthermore, it is shown that a paired-permanent may be expressed in terms of the products of sub-paired-permanents of any given order and their corresponding minors. An ab initio spin-free valence bond program, called Xiamen, is implemented using the paired-permanent approach. Test calculations show that the Xiamen package is more efficient than some other programs based on the traditional VB algorithm, and it provides a new practical tool for quantum chemistry.

16.
Linearized mixed quantum-classical simulations are a promising approach for calculating time-correlation functions. At the moment, however, they suffer from some numerical problems that may compromise their efficiency and reliability in applications to realistic condensed-phase systems. In this paper, we present a method that improves upon the convergence properties of the standard algorithm for linearized calculations by implementing a cumulant expansion of the relevant averages. The effectiveness of the new approach is tested by applying it to the challenging computation of the diffusion of an excess electron in a metal-molten salt solution.

17.
Practical application of pattern recognition techniques is often limited because of the way in which variables are selected and the need for supplementation of missing data. The software package comprising the MICA transformation technique and the SOLOMON classification program is extended to include an algorithm for estimating and supplementing missing data. The performance of this package is compared with that of known classification techniques on simulated data sets. The supplementation algorithm is tested with a data set for hyperlipoproteinemia patients.

18.
Protein function prediction is one of the central problems in computational biology. We present a novel automated protein structure-based function prediction method using libraries of local residue packing patterns that are common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph where residue vertices (residue name used as a vertex label) are connected by geometrical proximity edges. The approach employs two steps. First, it uses a fast subgraph mining algorithm to find all occurrences of family-specific labeled subgraphs for all well-characterized protein structural and functional families. Second, it queries a new structure for occurrences of a set of motifs characteristic of a known family, using a graph index to speed up Ullmann's subgraph isomorphism algorithm. The confidence of function inference from structure depends on the number of family-specific motifs found in the query structure compared with their distribution in a large non-redundant database of proteins. This method can assign a new structure to a specific functional family in cases where sequence alignments, sequence patterns, structural superposition and active site templates fail to provide accurate annotation.
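
The graph representation described in this abstract, with residues as labeled vertices and proximity as edges, can be sketched as below. The sketch assumes one representative coordinate per residue and a plain distance cutoff; the `contact_graph` name, the 8.0 Å default, and the input layout are hypothetical, and the abstract does not state how proximity is actually defined.

```python
import math
from itertools import combinations


def contact_graph(residues, cutoff=8.0):
    """Build vertex labels and proximity edges from (name, (x, y, z)) records.

    Returns a dict mapping vertex index to residue name and a set of
    (i, j) index pairs with i < j whose points lie within the cutoff.
    """
    labels = {i: name for i, (name, _) in enumerate(residues)}
    edges = set()
    for (i, (_, p)), (j, (_, q)) in combinations(enumerate(residues), 2):
        if math.dist(p, q) <= cutoff:  # Euclidean distance, Python 3.8+
            edges.add((i, j))
    return labels, edges
```

For three residues at x = 0, 3, and 20 on a line, only the first two are connected at the default cutoff. Subgraph mining and isomorphism search would then operate on the returned labels and edges.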

19.
Protein function prediction is one of the central problems in computational biology. We present a novel automated protein structure-based function prediction method using libraries of local residue packing patterns that are common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph where residue vertices (residue name used as a vertex label) are connected by geometrical proximity edges. The approach employs two steps. First, it uses a fast subgraph mining algorithm to find all occurrences of family-specific labeled subgraphs for all well-characterized protein structural and functional families. Second, it queries a new structure for occurrences of a set of motifs characteristic of a known family, using a graph index to speed up Ullmann's subgraph isomorphism algorithm. The confidence of function inference from structure depends on the number of family-specific motifs found in the query structure compared with their distribution in a large non-redundant database of proteins. This method can assign a new structure to a specific functional family in cases where sequence alignments, sequence patterns, structural superposition and active site templates fail to provide accurate annotation.

20.
Most electronic structure methods express the wavefunction as an expansion of N-electron basis functions that are chosen to be either Slater determinants or configuration state functions. Although the expansion coefficient of a single determinant may be readily computed from configuration state function coefficients for small wavefunction expansions, traditional algorithms are impractical for systems with a large number of electrons and spatial orbitals. In this work, we describe an efficient algorithm for the evaluation of a single determinant expansion coefficient for wavefunctions expanded as a linear combination of graphically contracted functions. Each graphically contracted function has significant multiconfigurational character and depends on a relatively small number of variational parameters called arc factors. Because the graphically contracted function approach expresses the configuration state function coefficients as products of arc factors, a determinant expansion coefficient may be computed recursively more efficiently than with traditional configuration interaction methods. Although the cost of computing determinant coefficients scales exponentially with the number of spatial orbitals for traditional methods, the algorithm presented here exploits two levels of recursion and scales polynomially with system size. Hence, as demonstrated through applications to systems with hundreds of electrons and orbitals, it may readily be applied to very large systems. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2009
