Similar literature
Found 20 similar documents; search took 546 ms.
1.
2.
A novel shape-based method has been developed for overlaying a series of molecular surfaces into a common reference frame. The surfaces are represented by a set of circular patches of approximately constant curvature. Two molecules are overlaid by using a clique-detection algorithm to find a set of corresponding patches on the two surfaces and then superimposing the molecules so that the similar patches coincide. The method is thus able to detect areas of local, rather than global, similarity. A consensus overlay for a group of molecules is obtained by examining the scores of all pairwise overlays and performing a set of overlays with the highest scores. The utility of the method has been examined by comparing the overlaid and experimental configurations of four sets of molecules for which there are X-ray crystal structures of the molecules bound to a protein active site. Results for the overlays are generally encouraging. Of particular note is the correct prediction of the 'reverse orientation' for ligands binding to human rhinovirus coat protein HRV14.

3.
Determining a one-to-one atom correspondence between two chemical compounds is important for measuring molecular similarity and for finding compounds with similar biological activities. This calculation can be formalized as the maximum common substructure (MCS) problem, which is well studied and has been shown to be NP-complete. Although many rigorous and heuristic algorithms have been developed, none of them is sufficiently fast and accurate. We developed a new program, called "kcombu", that uses a build-up algorithm, a type of greedy heuristic algorithm. The program can search for connected and disconnected MCSs as well as the topologically constrained disconnected MCS (TD-MCS), which is introduced in this study. To evaluate the performance of our program, we prepared two reference standards: the exact correspondences generated by maximum clique algorithms and the 3D correspondences obtained from superimposed 3D structures of the molecules in complex with the same protein. For the five sets of molecules taken from the protein structure database, the agreement between the build-up and the exact correspondences for the connected MCS is sufficiently high, while the computation time of the build-up algorithm is much shorter than that of the exact algorithm. The comparison between the build-up and the 3D correspondences shows that the TD-MCS has the best agreement among the MCS types. Additionally, we observed a strong correlation between molecular similarity and agreement with the exact and 3D correspondences: more similar molecule pairs are matched more correctly. For molecular pairs with a Tanimoto similarity above 40%, more than half of the atoms can be matched correctly with respect to the 3D correspondences.

4.
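A new method for determining the structure of protein-short peptide complexes (total citations: 1; self-citations: 1; citations by others: 0)
The three-dimensional structures of most protein-protein complexes show good geometric matching at the contact surface. Because the surface geometry of a protein and some of its other physicochemical properties play a major role in specific molecular interactions, geometric complementarity of the contact surfaces is often regarded as the basis of protein molecular recognition. In general, the geometric match at a protein contact surface involves only about five to somewhat more than ten closely packed amino acid residues. Recognition calculations between a protein and its protein ligand can therefore be carried out as recognition calculations between the protein and the ligand peptide segments that surround the mutation site or are in close contact with the protein surface. Stoddard et al. [1] used an octapeptide selected from MBP to successfully compute an MBP-receptor complex close to the crystal structure. Many …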

5.
A new method for the computerized search and identification of infrared spectra has been developed and evaluated. Based on cross-correlation, the search system uses all spectral information in a digitized spectrum when it attempts to match an unknown spectrum to one in a small library of known spectra. To evaluate a spectral match, the search program calculates the cross-correlation function between the unknown and each known (library) spectrum, which indicates their degree of similarity and allows library spectra to be ranked in order of probability of match to the unknown spectrum. In this study, several small infrared spectral libraries of structurally similar compounds were searched under conditions that examined the sensitivity of the search method to chemical and instrumental variations. Because the correlation technique is slower than conventional file-searching methods, it will probably find greatest use in searching small collections of similar spectra or as a match-ranking procedure following preliminary selection by a faster search method.
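The ranking step described in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the cross-correlation function is normalized or which lags are considered, so the choices here (mean-centred, normalized correlation; the maximum over a small window of integer lags as the score) are assumptions, and the names `normalized_xcorr_max`, `rank_library`, and `max_lag` are hypothetical.

```python
import math

def normalized_xcorr_max(a, b, max_lag=5):
    """Maximum normalized cross-correlation of two equally sampled spectra
    over integer lags in [-max_lag, max_lag] (assumed scoring rule)."""
    n = len(a)
    if n != len(b):
        raise ValueError("spectra must be sampled on the same grid")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]  # mean-centred unknown spectrum
    db = [y - mean_b for y in b]  # mean-centred library spectrum
    norm = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if norm == 0.0:
        return 0.0  # a flat spectrum carries no shape information
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        s = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                s += da[i] * db[j]
        best = max(best, s / norm)
    return best

def rank_library(unknown, library, max_lag=5):
    """Return (name, score) pairs sorted from best to worst match."""
    scores = [(name, normalized_xcorr_max(unknown, spec, max_lag))
              for name, spec in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Because every library spectrum is compared point by point at several lags, the cost grows with library size and spectrum length, which is consistent with the abstract's remark that the technique suits small libraries or a second-stage ranking.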

6.
The performance of the algorithm COMPLX for detecting protein-ligand or other macromolecular complexes has been tested on highly complex data sets. These data contain m/z values for ions of proteins from the SWISS-PROT database within simulated biological mixtures in which each component shares a similar molecular weight and/or isoelectric point (pI). As many as 1600 ion signals were entered to challenge the algorithm to identify ion signals associated with a single protein complex that has been ionised and detected within a mass spectrometer. Despite the complexity of such data sets, the algorithm is shown to be able to identify the presence of individual bimolecular complexes. The output data can be re-evaluated by the user as necessary in light of any additional information that is known about the nature of the predicted associations, as well as the quality of the data set in terms of errors in m/z values that arise directly from the mass calibration or resolution achieved. The data presented illustrate that the best results are obtained when output results are ranked according to the largest continuous series of ion pairs detected for a protein or macromolecule and its complex for which the ligand mass is assigned the lowest mass error.

7.
A new algorithm has been designed and tested to identify protein, or any other macromolecular, complexes that have been widely reported in mass spectral data. The program takes advantage of the appearance of multiply charged ions that are common to both electrospray ionization and, to a lesser extent, matrix-assisted laser desorption/ionization (MALDI) mass spectra. The algorithm, known as COMPLX for the COMposition of Protein-Ligand compleXes, is capable of identifying complexes for any protein or macromolecule with a binding partner of molecular mass up to 100 000 Da. It does so by identifying ion pairs present in a mass spectrum that, when they share a common charge, have an m/z value difference that is an integer fraction of a ligand or binding partner molecular mass. Several additional criteria must be met for a result to be ranked in the output file: all m/z values for ions of the protein or complex must have progressively lower values as their assigned charge increases; the difference between the m/z values for adjacent charge states (z, z + 1) must decrease as the assigned charge state increases; and the ratio of any two m/z values assigned to a protein or complex must equal the inverse ratio of their charges. The entries that satisfy these criteria are then ranked according to the appearance of ions in the mass spectrum associated with the binding partner, the length of a continuous series of charges across any set of ions for a protein and complex, and the lowest error recorded for the molecular mass of the ligand or binding partner. A diverse range of hypothetical and experimental mass spectral data was used to implement and test the program, including data recorded for antibody-peptide, protein-peptide and protein-heme complexes. Spectra of increasing complexity, in terms of the number of ions input, were also successfully analysed in which the number of input m/z values far exceeds the few associated with a macromolecular complex. Thus the program will be of value in a future goal of proteomics, where mass spectrometry already plays a central role, for the direct analysis of protein and other associations within biological extracts.
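The core ion-pair criterion in the abstract above (two ions sharing a charge z differ in m/z by the ligand mass divided by z) can be sketched as follows. This is not the COMPLX implementation: it assumes the ligand mass is already known, it omits the additional ranking criteria, and the function names, the charge range `max_charge`, and the tolerance `tol` are hypothetical choices made for illustration.

```python
def find_ion_pairs(mz_values, ligand_mass, max_charge=30, tol=0.5):
    """Return (charge, mz_protein, mz_complex) triples whose m/z spacing,
    multiplied by the charge, matches ligand_mass within tol (in Da)."""
    peaks = sorted(mz_values)
    hits = []
    for i, low in enumerate(peaks):
        for high in peaks[i + 1:]:
            delta = high - low
            for z in range(1, max_charge + 1):
                # Same charge on both ions: delta = ligand_mass / z
                if abs(delta * z - ligand_mass) <= tol:
                    hits.append((z, low, high))
    return hits

def longest_charge_series(hits):
    """Length of the longest run of consecutive charge states among the hits,
    a simplified stand-in for the 'continuous series' ranking criterion."""
    charges = sorted({z for z, _, _ in hits})
    best = run = 0
    prev = None
    for z in charges:
        run = run + 1 if prev is not None and z == prev + 1 else 1
        best = max(best, run)
        prev = z
    return best
```

In a crowded spectrum this pairwise test alone would produce many chance matches, which is presumably why the abstract adds the charge-ordering and ratio criteria before ranking.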

8.
No universally accepted score is currently available to determine when a matrix-assisted laser desorption ionization (MALDI) peptide mass fingerprint (PMF) experiment has been successfully carried out. We describe a software program (ChemApplex) based on a calculated parameter (Combined Protein Score) that takes into account (1) peak intensity, (2) the mass accuracy of the match, and (3) ChemScore, a theoretical intensity factor that estimates the probability of observing a particular peptide based on a combination of chemical considerations, in particular the amino acid composition of the peptide and the sequence of the amino acids that span the cleavage site. When these three factors are taken into account both at the level of individual peptides and at the protein level, protein components in mixtures whose peptides contribute less than 1% of the total intensity can often be correctly identified, as is demonstrated for mixtures of standard proteins. Moreover, it is possible to make robust database identifications that are nearly independent of the number of masses submitted and the mass error threshold used for matching. Protein scoring based on Combined Protein Score is orthogonal to many of the commonly used probability-based scoring schemes, and makes it possible to archive a more complete set of parameters that more thoroughly characterize the validity of the database match, which increases the confidence in the identifications.

9.
Combinatorial chemistry is widely used in drug discovery. Once a lead compound has been identified, a series of R-groups and reagents can be selected and combined to generate new potential drugs. The combinatorial nature of this problem usually leads to chemical libraries containing a very large number of virtual compounds, far too many to synthesize. Therefore, one often wants to select a subset of "good" reagents for each R-group and synthesize all their possible combinations. Two difficulties arise. First, the reagents have to be selected such that the compounds of the resulting sublibrary simultaneously optimize a series of chemical properties. For each compound, a desirability index, a concept proposed by Harrington (20), is used to summarize those properties in one fitness value. A loss function is then used as the objective criterion to quantify the overall quality of a sublibrary. Second, there is a huge number of possible sublibraries, and the solution space has to be explored as fast as possible. The WEALD algorithm proposed in this paper starts with a random solution and iterates by applying exchanges, a simple method proposed by Fedorov (13) and often used in the generation of optimal designs. Those exchanges are guided by a weighting of the reagents that is adapted recursively as the solution space is explored. The algorithm is applied to a real database and is shown to converge rapidly. It is compared with results given by two other algorithms presented in the combinatorial chemistry literature: the Ultrafast algorithm of D. Agrafiotis and V. Lobanov and the Piccolo algorithm of W. Zheng et al.

10.
In many modern chemoinformatics systems, molecules are represented by long binary fingerprint vectors recording the presence or absence of particular features or substructures, such as labeled paths or trees, in the molecular graphs. These long fingerprints are often compressed to much shorter fingerprints using a simple modulo operation. As the length of the fingerprints decreases, their typical density and overlap tend to increase, and so does any similarity measure based on overlap, such as the widely used Tanimoto similarity. Here we show that this correlation between shorter fingerprints and higher similarity can be thought of as a systematic error introduced by the fingerprint folding algorithm and that this systematic error can be corrected mathematically. More precisely, given two molecules and their compressed fingerprints of a given length, we show how a better estimate of their uncompressed overlap, and hence of their similarity, can be derived to correct for this bias. We show how the correction can be implemented not only for the Tanimoto measure but also for all other commonly used measures. Experiments on various data sets and fingerprint sizes demonstrate how, with a negligible computational overhead, the correction noticeably improves the sensitivity and specificity of chemical retrieval.
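The folding bias described in the abstract above is easy to reproduce on a toy example. The sketch below represents a fingerprint as the set of its on-bit indices, folds it with a modulo operation, and compares Tanimoto similarity before and after folding. The bit positions and the folded length of 64 are made-up illustration values, and the paper's correction formula is not reproduced here.

```python
def fold(on_bits, length):
    """Compress a fingerprint (set of on-bit indices) by index modulo length."""
    return {bit % length for bit in on_bits}

def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

# Hypothetical long fingerprints sharing a single feature (bit 3)
fp_a = {3, 70, 130, 515}
fp_b = {3, 66, 200, 900}

unfolded = tanimoto(fp_a, fp_b)                    # 1 shared bit out of 7
folded = tanimoto(fold(fp_a, 64), fold(fp_b, 64))  # collisions add overlap
print(unfolded, folded)
```

Here bits 130 and 66 both fold to position 2, creating a shared bit that does not exist in the uncompressed fingerprints, so the folded similarity (2/5) exceeds the unfolded one (1/7).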

11.
Nonlinear least-squares regression is a valuable tool for gaining chemical insights into complex systems. Yet the success of nonlinear regression, as measured by residual sum of squares (RSS), correlation, and reproducibility of fit parameters, depends strongly on the availability of a good initial solution. Without one, iterative algorithms quickly become trapped in an unfavorable local RSS minimum. To determine an initial solution, a high-dimensional parameter space needs to be screened, a process that is very time-consuming but can be parallelized. A second advantage of parallelization is equally important: after the initial solutions have been determined, each processor can be tasked with optimizing one initial guess. Even if several of these optimizations become stuck in a shallow local RSS minimum, other processors continue and improve the regression outcome. A software package for parallel processing-based constrained nonlinear regression (RegressionLab) has been developed, implemented, and tested on a variety of hardware configurations. As proof of principle, microalgae-environment interactions have been studied by infrared attenuated total reflection spectroscopy. Additionally, light microscopy has been used to monitor cell production. It is shown that spectroscopic data sets with 10,000s of data points and >1000 nonlinear model parameters, as well as imaging data with 100,000s of data points and >2000 nonlinear model parameters, may now be investigated by constrained nonlinear regression. Acceleration factors of up to 8.1 have been obtained, which is of high practical relevance when computations take weeks on single-processor machines. Through parallel processing alone, the RSS values may be improved by up to a factor of 5.5.
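The multi-start idea in the abstract above (optimize several initial guesses independently and keep the best) can be sketched on a toy one-parameter problem. Nothing here reflects RegressionLab itself: the sine model, the synthetic data, the step-halving local search, and all names are assumptions chosen so that the RSS surface has several local minima. A thread pool is used only to show the structure; in standard CPython a process pool would be needed for real CPU parallelism.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Synthetic data from y = sin(w * x) with true frequency w = 2.0 (assumed model)
XS = [0.1 * i for i in range(100)]
YS = [math.sin(2.0 * x) for x in XS]

def rss(w):
    """Residual sum of squares of the model sin(w * x) against the data."""
    return sum((math.sin(w * x) - y) ** 2 for x, y in zip(XS, YS))

def local_search(w, step=0.1, min_step=1e-8):
    """Greedy step-halving descent; it can stall in a local RSS minimum."""
    best = rss(w)
    while step > min_step:
        moved = False
        for cand in (w - step, w + step):
            val = rss(cand)
            if val < best:
                w, best, moved = cand, val, True
                break
        if not moved:
            step /= 2.0
    return w, best

def multistart(starts, workers=4):
    """Optimize every start independently and return the best (w, rss) pair."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(local_search, starts))
    return min(results, key=lambda r: r[1])

best_w, best_rss = multistart([0.3, 0.9, 1.5, 2.1, 2.7, 3.3])
```

Each start is an independent task, so a run that stalls in a poor minimum does not affect the others, which mirrors the abstract's point about processors continuing while some optimizations are stuck.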

12.
Herbal medicines are again becoming more popular in developed countries because they are "natural", and people thus often assume that they are inherently safe. Herbs have also been used worldwide for many centuries in traditional medicine. Concern about their safety and efficacy has grown with increasing Western interest. Herbal materials and their extracts are very complex, often including hundreds of compounds. A thorough understanding of their chemical composition is essential for conducting a safety risk assessment. However, herbal material can show considerable variability. The chemical constituents of a herb and their amounts can differ because of growing conditions, such as climate and soil, the drying process, the harvest season, etc. Among the analytical methods, chromatographic fingerprinting has been recommended as a promising and reliable methodology for the identification and quality control of herbal medicines. Identification is needed to avoid fraud and adulteration. Currently, analyzing chromatographic herbal fingerprint data sets has become one of the most widely applied tools in the quality assessment of herbal materials. Mostly, the entire chromatographic profiles are used to identify or to evaluate the quality of the herbs investigated. Occasionally only a limited number of compounds are considered. One approach to safety risk assessment is to determine whether the herbal material is substantially equivalent to a reference material, that is, one that is readily consumed in the diet, has a history of application, or has earlier been commercialized. To help determine substantial equivalence using fingerprint approaches, a quantitative measurement of similarity is required. In this paper, different (dis)similarity approaches, such as (dis)similarity metrics or exploratory analysis approaches applied to herbal medicinal fingerprints, are discussed and illustrated with several case studies.

13.
As more complex systems become accessible to quantum chemical calculations, the reliability of the algorithms used becomes increasingly important. Trust-region strategies comprise a large family of optimization algorithms that combine robustness with applicability to a great variety of problems. The objective of this work is to provide a basic algorithm and an adequate theoretical framework for the application of globally convergent trust-region methods to electronic structure calculations. Closed-shell restricted Hartree-Fock calculations are addressed as finite-dimensional nonlinear programming problems with weighted orthogonality constraints. A Levenberg-Marquardt-like modification of a trust-region algorithm for constrained optimization is developed for solving this problem. It is proved that this algorithm is globally convergent. The subproblems that ensure global convergence are easy-to-compute projections and depend only on the structure of the constraints, and can thus be extended to other problems. Numerical experiments are presented, which confirm the theoretical predictions. The structure of the algorithm is such that acceleration schemes can easily be added without affecting the convergence properties.

14.
Instead of the usual rationale for chromatographic fingerprint-based sample identification, which relies on visual inspection or principal component analysis of raw or aligned chromatograms, a novel nonparametric statistical measure of fingerprint set homogeneity is proposed. A randomization test is applied for significance analysis of fingerprint set homogeneity, with the average maximum cross-correlation used as the merit function. Chromatogram sets generated by random selection from the standard and unknown sample chromatogram collections are compared, with respect to merit function values, with the set of chromatograms that represents the standard and/or unknown sample. The significance of fingerprint homogeneity is then given by the fraction of random chromatogram sets that have higher merit values than the standard and/or unknown sample sets. A set of peptide maps corresponding to different haemoglobin variants was selected to evaluate the proposed test. The approach is compared with chromatogram alignment based on correlation optimized warping coupled with principal component or cluster analysis. The proposed method is a simple, straightforward sample identification procedure whose reliability has been evaluated here. The impact of this approach on peptide mapping validation and system suitability analysis is discussed.

15.
The large sizes of today's chemical databases require efficient algorithms to perform similarity searches. Comparing two large chemical databases can be very time-consuming. This paper builds on existing research by describing a novel strategy for accelerating existing search algorithms when comparing large chemical collections. The quest for efficiency has focused on developing better indexing algorithms, using heuristics that search an individual chemical against a chemical library while detecting and eliminating needless similarity calculations. For comparing two chemical collections, these algorithms simply execute a search for each chemical in the query set sequentially. The strategy presented in this paper achieves a speedup over these algorithms by indexing the set of all query chemicals, so that redundant calculations that arise in sequential searches are eliminated. We implement this novel algorithm in a similarity search program called Symmetric inDexing, or SymDex. SymDex shows a maximum speedup of over 232% compared with the state-of-the-art single-query search algorithm on real data for various fingerprint lengths. Considerable speedup is seen even for batch searches in which query set sizes are relatively small compared with typical database sizes. To the best of our knowledge, SymDex is the first search algorithm designed specifically for comparing chemical libraries. It can be adapted to most, if not all, existing indexing algorithms and shows potential for accelerating future similarity search algorithms for comparing chemical databases.

16.
We describe an algorithm for the automated generation of molecular structures subject to geometric and connectivity constraints. The method relies on simulated annealing and simplex optimization of a penalty function that contains a variety of conditions, and it can be useful in structure-based drug design projects. The procedure controls the diversity and complexity of the generated molecules. Structure selection filters are an integral part of the algorithm and drive it. Several procedures have been developed to achieve reliable control. A number of template sets can be defined and combined to control the range of molecules that are searched. Ring systems are predefined. Normally, ring-system complexity is one of the most elusive and difficult factors to control when fusion, bridge, and spiro structures are built by joining templates. Here this is not an issue: the decision about which systems are acceptable, and which are not, is made before the run is initiated. Queries for inclusion and exclusion spheres are incorporated into the objective function, and, by using a flexible notation, the structure generation can be directed and made more focused. Simulated annealing is a reliable optimizer and converges asymptotically to the global minimum. The objective functions used here are degenerate, so it is likely that each run will produce a different set of good solutions.

17.
18.
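A method is proposed for the quantitative analysis of two-component preservative mixtures (sodium benzoate and potassium sorbate) using ultraviolet absorbance data at continuous wavelengths. Using the continuous-wavelength information, the method treats both the molar absorption coefficients and the unknown concentrations as independent variables and builds a nonlinear optimization model. For mixtures in which the component concentrations differ only slightly, the unknown concentrations are obtained from a single optimization run; for mixtures with a large concentration difference, about four optimization iterations progressively reduce the error to give the result, and the relative error can be kept within 1.52%. The results show that the method is stable, accurate, simple, fast, practical, and flexible, and that it can be used for the quantitative determination of food preservatives.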
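As a sketch of the kind of model the abstract above describes, one can start from the additive Beer-Lambert law for a two-component mixture. The abstract does not state the objective function, so the least-squares form below is an assumption, as are the symbols: $A(\lambda_i)$ is the measured absorbance at wavelength $\lambda_i$, $\varepsilon_1, \varepsilon_2$ are the molar absorption coefficients of sodium benzoate and potassium sorbate, $c_1, c_2$ are their concentrations, and $l$ is the optical path length.

```latex
% Additive Beer-Lambert law for a two-component mixture at wavelength \lambda_i
A(\lambda_i) = l \left[ \varepsilon_1(\lambda_i)\, c_1 + \varepsilon_2(\lambda_i)\, c_2 \right],
\qquad i = 1, \dots, n

% Assumed least-squares objective over all n sampled wavelengths
\min_{\varepsilon_1,\, \varepsilon_2,\, c_1,\, c_2}
\; \sum_{i=1}^{n} \Bigl( A(\lambda_i)
   - l \left[ \varepsilon_1(\lambda_i)\, c_1 + \varepsilon_2(\lambda_i)\, c_2 \right] \Bigr)^{2}
```

With the coefficients fixed, the problem is linear in $c_1$ and $c_2$; it becomes nonlinear only because the products $\varepsilon_k(\lambda_i)\, c_k$ contain two unknowns each, which matches the abstract's description of treating both quantities as variables.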

19.
Summary: Mutual binding between a ligand of low molecular weight and its macromolecular receptor demands structural complementarity of both species at the recognition site. To predict binding properties of new molecules before synthesis, information about possible conformations of drug molecules at the active site is required, especially if the 3D structure of the receptor is not known. The statistical analysis of small-molecule crystal data allows one to elucidate conformational preferences of molecular fragments and accordingly to compile libraries of putative ligand conformations. A comparison of geometries adopted by corresponding fragments in ligands bound to proteins shows similar distributions in conformation space. We have developed an automatic procedure that generates different conformers of a given ligand. The entire molecule is decomposed into its individual ring and open-chain torsional fragments, each used in a variety of favorable conformations. These conformations are produced according to the library information about conformational preferences. During this building process, an extensive energy ranking is applied. Conformers ranked as energetically favorable are subjected to an optimization in torsion angle space. During minimization, unfavorable van der Waals interactions are removed while the open-chain torsion angles are kept as close as possible to the experimentally most frequently observed values. To assess how well the generated conformers map conformation space, a comparison with experimental data has been performed. This comparison gives some confidence in the efficiency and completeness of the approach. For some ligands that had been structurally characterized by protein crystallography, the program was used to generate sets of some 10 to 100 conformers. Among these, geometries are found that fall convincingly close to the conformations actually adopted by these ligands at the binding site.

20.
Four different two-dimensional fingerprint types (MACCS, Unity, BCI, and Daylight) and nine methods of selecting optimal cluster levels from the output of a hierarchical clustering algorithm were evaluated for their ability to select clusters that represent chemical series present in some typical examples of chemical compound data sets. The methods were evaluated using Ward's clustering algorithm on subsets of the publicly available National Cancer Institute HIV data set, as well as on compounds from our corporate data set. We make a number of observations and recommendations about the choice of fingerprint type and cluster level selection method for use in this type of clustering.
