Similar literature
 Found 20 similar documents; search took 93 ms
1.
In small-molecule drug discovery projects, the receptor structure is not always available. In such cases it is enormously useful to be able to align known ligands in the way they bind in the receptor. Here we present an algorithm for the alignment of multiple small-molecule ligands. This algorithm takes pre-generated conformers as input and proposes aligned assemblies of the ligands. The algorithm consists of two stages: the first stage performs alignments for each pair of ligands; the second stage uses the results of the first to build up multiple-ligand alignment assemblies using a novel iterative procedure. The scoring functions are improved versions of the one mentioned in our previous work. We have compared our results with some recent publications. While an exact comparison is impossible, it is clear that our algorithm is fast and produces very competitive results.

2.
We propose an algorithm for global multiple sequence alignment that is based on a measure of what we call information discrepancy. The algorithm, MSAID, follows a progressive alignment iteration strategy that makes use of what we call a function of degree of disagreement (FDOD). MSAID begins by calculating pairwise sequence distances, using FDOD as a numerical scoring measure. In the next step, the resulting distance matrix is used to construct a guide tree via the neighbor-joining method. The tree is then used to produce a multiple alignment. The current alignment is then used to produce a new matrix and a new tree (again with the FDOD scoring measure). This iterative process continues until convergence criteria (or a stopping rule) are satisfied. MSAID was tested and compared with prior methods using reference alignments from BAliBASE 2.01. For alignments with no large N/C-terminal extensions or internal insertions, MSAID received the top overall average in the tests. Moreover, the test results indicate that MSAID performs as well as other alignment methods, with an occasional tendency to perform better than these prior techniques. We therefore believe that MSAID is a solid and reliable method of choice, which is often (if not always) superior to other global alignment techniques.
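The guide-tree step in the abstract above relies on neighbor joining. As a rough illustration only, the sketch below shows how neighbor joining picks the first pair of sequences to merge from a distance matrix, using the standard Q criterion. The function name `nj_pick_pair` and the toy distances are assumptions made for this example; the distances are not FDOD values and nothing here is taken from MSAID itself.

```python
def nj_pick_pair(dist):
    """Return ((i, j), q) for the pair minimizing the neighbor-joining Q criterion.

    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q


# Hypothetical symmetric distance matrix for five sequences (illustrative values).
toy_distances = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

pair, q_value = nj_pick_pair(toy_distances)
print(pair, q_value)  # (0, 1) -50
```

In a full progressive scheme the chosen pair would be merged into a new node, the matrix updated, and the selection repeated until the tree is complete; that loop is omitted here.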

3.
Multiple sequence alignment (MSA) is one of the fundamental research topics in computational biology. Alignments help us to determine functional assignments, evolutionary history, and conserved regions. Previous methods use a substitution matrix and do not incorporate knowledge of the sequences being aligned. Therefore, they do not ensure the alignment of similar structures and common patterns in the sequences. We have been investigating a solution to this problem in multiple sequence alignment that makes use of knowledge of the sequences being aligned, including patterns in the Prosite databank and the Blocks+ and eBlocks databases, as well as motif and structural information. A pattern-constrained algorithm has been developed. Experiments with protein sequences have shown that incorporating the domain knowledge available for the sequences yields more accurate alignments.

4.
All currently leading protein secondary structure prediction methods use a multiple protein sequence alignment to predict the secondary structure of the top sequence. In most of these methods, alignment positions showing a gap in the top sequence are deleted prior to prediction, which shrinks the alignment and loses position-specific information. In this paper we investigate the effect of this removal of information on secondary structure prediction accuracy. To this end, we have designed SymSSP, an algorithm that post-processes the predicted secondary structure of all sequences in a multiple sequence alignment by (i) making use of the alignment's evolutionary information and (ii) re-introducing most of the information that would otherwise be lost. The post-processed information is then given to a new dynamic programming routine that produces an optimally segmented consensus secondary structure for each of the multiple alignment sequences. We have tested our method on the state-of-the-art secondary structure prediction methods PHD, PROFsec, SSPro2 and JNET using the HOMSTRAD database of reference alignments. Our consensus-deriving dynamic programming strategy is consistently better at improving the segmentation quality of the predictions than the commonly used majority voting technique. In addition, we have applied several weighting schemes from the literature to our novel consensus-deriving dynamic programming routine. Finally, we have investigated the level of noise introduced by prediction errors into the consensus and show that predictions of the edges of helices and strands are wrong half the time for all four tested prediction methods.
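The abstract above compares its dynamic programming consensus against majority voting. The sketch below illustrates only that baseline, a column-wise majority vote over aligned secondary structure strings; it is not SymSSP. The function name, the gap symbol `-`, the H/E/C alphabet, and the tie-breaking rule (first state seen in the column wins) are all assumptions made for this example.

```python
from collections import Counter


def majority_vote_consensus(predictions, gap="-"):
    """Column-wise majority vote over equal-length secondary structure strings."""
    if not predictions:
        return ""
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same aligned length")

    consensus = []
    for column in zip(*predictions):
        states = [s for s in column if s != gap]  # gaps do not vote
        if not states:
            consensus.append(gap)
            continue
        counts = Counter(states)
        top = max(counts.values())
        # Assumed tie-break: the first state encountered among the tied ones.
        consensus.append(next(s for s in states if counts[s] == top))
    return "".join(consensus)


# Hypothetical aligned predictions (H = helix, E = strand, C = coil).
aligned = ["HHHHCC", "HHCCCC", "HH-CEE"]
print(majority_vote_consensus(aligned))  # HHHCCC
```

Because each column is decided independently, a vote like this can produce fragmented segments, which is the kind of segmentation problem a consensus derived over the whole sequence is meant to reduce.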

5.
In biological applications, sequence alignment is an important strategy for analyzing DNA and protein sequences. Multiple sequence alignment is an essential methodology for studying biological data, with uses such as homology modeling and phylogenetic reconstruction. However, multiple sequence alignment is an NP-hard problem. In past decades, the progressive approach has been proposed to align multiple sequences through iterative pairwise alignments. Owing to the rapid growth of next-generation sequencing technologies, a large number of sequences can be produced in a short period of time. When the problem instance is large, progressive alignment becomes time consuming. Parallel computing is a suitable solution for such applications, and the GPU is one of the important architectures in contemporary parallel computing research. Therefore, in this work we propose a GPU version of ClustalW v2.0.11, called CUDA ClustalW v1.0. The experimental results show that CUDA ClustalW v1.0 achieves more than a 33× speedup in overall execution time compared with ClustalW v2.0.11.

6.
Summary: A new database of conserved amino acid residues is derived from the multiple sequence alignment of over 84 families of protein sequences that have been reported in the literature. This database contains sequences of conserved hydrophobic core patterns which are probably important for structure and function, since they are conserved for most sequences in that family. This database differs from other single-motif or signature databases reported previously, since it contains multiple patterns for each family. The new database is used to align a new sequence with the conserved regions of a family. This is analogous to reports in the literature where multiple sequence alignments are used to improve a sequence alignment. A program called HomologyPlot (suitable for IBM or compatible computers) uses this database to find homology of a new sequence to a family of protein sequences. There are several advantages to using multiple patterns. First, the program correctly identifies a new sequence as a member of a known family. Second, the search of the entire database is rapid and requires less than one minute. This is similar to performing a multiple sequence alignment of a new sequence to all of the known protein family sequences. Third, the alignment of a new sequence to family members is reliable and can reproduce the alignment of conserved regions already described in the literature. The speed and efficiency of this method are enhanced, since there is no need to score insertions or deletions as is done in the more commonly used sequence alignment methods. In this method only the patterns are aligned. HomologyPlot also provides general information on each family, as well as a listing of patterns in a family.

7.
This study focuses on improving the multi-objective memetic algorithm for protein–protein interaction (PPI) network alignment, Optimizing Network Aligner (OptNetAlign), via integration with other existing network alignment methods such as SPINAL, NETAL and HubAlign. The output of this algorithm is an elite set of aligned networks, all of which are optimal with respect to multiple user-defined criteria. However, OptNetAlign is an unsupervised genetic algorithm that initiates its search with completely random solutions, and it requires substantial running times to generate an elite set of solutions that have high scores with respect to the given criteria. To improve running time, the search space of the algorithm can be narrowed down by focusing on well-qualified alignments and trying to optimize the most desired criteria on a more limited set of solutions. The method presented in this study improves OptNetAlign in a supervised fashion by utilizing the alignment results of different network alignment algorithms with varying parameters that depend upon user preferences. Therefore, the user can prioritize certain objectives over others and achieve better running time performance while optimizing the secondary objectives.
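An "elite set" that is optimal with respect to several criteria at once is usually read as a Pareto (non-dominated) set. The sketch below shows that idea in isolation, assuming every objective is to be maximized. The function names and the two-objective score tuples are hypothetical; this is not OptNetAlign's code or its actual objectives.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Keep only the score tuples that no other tuple dominates (maximization)."""
    return [s for s in scores if not any(dominates(t, s) for t in scores)]


# Hypothetical (objective_1, objective_2) scores for five candidate alignments.
candidates = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.2, 0.9), (0.9, 0.1)]
print(pareto_front(candidates))  # [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9)]
```

Seeding a search with alignments from other tools, as the abstract describes, would amount to starting from candidates that already sit near such a front rather than from random solutions.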

8.
Modern protein secondary structure prediction methods are based on exploiting evolutionary information contained in multiple sequence alignments. Critical steps in the secondary structure prediction process are (i) the selection of a set of sequences that are homologous to a given query sequence, (ii) the choice of the multiple sequence alignment method, and (iii) the choice of the secondary structure prediction method. Because of the close relationship between these three steps and their critical influence on the prediction results, secondary structure prediction has received increased attention from the bioinformatics community over the last few years. In this treatise, we discuss recent developments in computational methods for protein secondary structure prediction and multiple sequence alignment, focus on the integration of these methods, and provide some recommendations for state-of-the-art secondary structure prediction in practice.

9.
Point Accepted Mutation (PAM) is the Markov model of amino acid replacements in proteins introduced by Dayhoff and her co-workers (Dayhoff et al., 1978). The PAM matrices and other matrices based on the PAM model have been widely accepted as the standard scoring system of protein sequence similarity in protein sequence alignment tools. Here, we present Contact Accepted mutatiOn (CAO), a Markov model of protein residue contact mutations. The CAO model simulates the interchanging of structurally defined side-chain contacts, and introduces additional structural information into protein sequence alignments. Therefore, similarities between structurally conserved sequences can be detected even without apparent sequence similarity. CAO has been benchmarked on the HOMSTRAD database and a subset of the CATH database, by comparing sequence alignments with reference alignments derived from structural superposition. CAO yields scores that reflect coherently the structural quality of sequence alignments, which has implications particularly for homology modelling and threading techniques.
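In a Markov model of this kind, the transition matrix for n evolutionary steps is the one-step matrix raised to the n-th power, which is how PAM250 is obtained from PAM1. The sketch below illustrates that with a made-up two-state matrix. The values, the row-stochastic convention, and the function names are assumptions for the example; real PAM matrices cover 20 amino acids, and nothing here reproduces the CAO model.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def mat_pow(m, n):
    """Raise a square matrix to a non-negative integer power by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


# Toy one-step transition matrix (rows sum to 1); purely illustrative values.
toy_step = [
    [0.99, 0.01],
    [0.02, 0.98],
]

toy_250_steps = mat_pow(toy_step, 250)
for row in toy_250_steps:
    print([round(p, 3) for p in row])
```

After many steps the rows of the toy matrix approach a common stationary distribution, which is why high-numbered matrices of this type retain less information about the starting residue.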

10.
Herein, we describe a method to flexibly align molecules (FLAME = FLexibly Align MolEcules). FLAME aligns two molecules by first finding maximum common pharmacophores between them using a genetic algorithm. The resulting alignments are then subjected to simultaneous optimizations of their internal energies and an alignment score. The utility of the method in pairwise alignment, multiple molecule flexible alignment, and database searching was examined. For pairwise alignment, two carboxypeptidase ligands (Protein Data Bank codes and ), two estrogen receptor ligands ( and ), and two thrombin ligands ( and ) were used as test sets. Alignments generated by FLAME starting from CONCORD structures compared very well to the X-ray structures (average root-mean-square deviation = 0.36 Å) even without further minimization in the presence of the protein. For multiple flexible alignments, five structurally diverse D3 receptor ligands were used as a test set. The FLAME alignment automatically identified three common pharmacophores: a base, a hydrogen-bond acceptor, and a hydrophobe/aromatic ring. The best alignment was then used to search the MDDR database. The search results were compared to the results using atom pair and Daylight fingerprint similarity. A similar database search comparison was also performed using estrogen receptor modulators. In both cases, hits identified by FLAME were structurally more diverse compared to those from the atom pair and Daylight fingerprint methods.
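The root-mean-square deviation quoted above measures how far matched atoms lie from one another after alignment. A minimal sketch follows, assuming the two coordinate lists are already in the same frame and that atom i in one list corresponds to atom i in the other. No superposition step is performed, and the coordinates are invented for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical two-atom example: every atom is displaced by 1.0 along z.
pose_model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pose_xray = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(pose_model, pose_xray))  # 1.0
```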

11.
Despite recent advances in fold recognition algorithms that identify template structures with distant homology to the target sequence, the quality of the target-template alignment can be a major problem for distantly related proteins in comparative modeling. Here we report for the first time on the use of ensembles of pairwise alignments obtained by stochastic backtracking as a means to improve three-dimensional comparative protein models. In every one of the 35 cases, the ensemble produced by the program probA resulted in alignments that were closer to the structural alignment than those obtained from the optimal alignment. In addition, we examined the lowest energy structure among these ensembles from four different structural assessment methods and compared these with the optimal and structural alignment model. The structural assessment methods consisted of the DFIRE, DOPE, and ProsaII statistical potential energies and the potential energy from the CHARMM protein force field coupled to a Generalized Born implicit solvent model. The results demonstrate that the generation of alignment ensembles through stochastic backtracking using probA combined with one of the statistical potentials for assessing three-dimensional structures can be used to improve comparative models.

12.
A new type of molecular representation is introduced that is based on activity class characteristic substructures extracted from random fragment populations. Mapping of characteristic substructures is used to determine atom match rates in active molecules. Comparison of match rates of bonded atoms defines a hierarchical molecular fragmentation scheme. Active compounds are encoded as fragmentation pathways isolated from core trees. These paths are amenable to biological sequence alignment methods in combination with substructure-based scoring functions. From multiple core path alignments, consensus fragment sequences are derived that represent compound activity classes. Consensus fragment sequences weighted by increasing structural specificity can also be used to map molecules and search databases for active compounds.

13.
We describe and demonstrate a method for the simultaneous, fully flexible alignment of multiple molecules with a common biological activity. The key aspect of the algorithm is that the alignment problem is first solved in a lower dimensional space, in this case using the one-dimensional representations of the molecules. The three-dimensional alignment is then guided by constraints derived from the one-dimensional alignment. We demonstrate, using 10 hERG channel blockers with a total of 72 rotatable bonds, that the one-dimensional alignment is able to effectively isolate key conserved pharmacophoric features and that these conserved features can effectively guide the three-dimensional alignment. Further, using 10 estrogen receptor agonists and 5 estrogen receptor antagonists with publicly available cocrystal structures, we show that the method is able to produce superpositions comparable to those derived from crystal structures. Finally, we demonstrate, using examples from peptidic CXCR3 agonists, that the method is able to generate reasonable binding hypotheses.

14.
The superfamily of ligand-gated ion channels (LGICs) has been implicated in anesthetic and alcohol responses. Mutations within glycine and GABA receptors have demonstrated that possible sites of anesthetic action exist within the transmembrane subunits of these receptors. The exact molecular arrangement of this transmembrane region remains at intermediate resolution with current experimental techniques. Homology modeling methods were therefore combined with experimental data to produce a more exact model of this region. A consensus from multiple bioinformatics techniques predicted the topology within the transmembrane domain of a glycine alpha one receptor (GlyRa1) to be alpha helical. This fold information was combined with sequence information using the SeqFold algorithm to search for modeling templates. Independently, the FoldMiner algorithm was used to search for templates that had structural folds similar to published coordinates of the homologous nAChR (1OED). Both SeqFold and FoldMiner identified the same modeling template. The GlyRa1 sequence was aligned with this template using multiple scoring criteria. Refinement of the alignment closed gaps to produce agreement with labeling studies carried out on the homologous receptors of the superfamily. Structural assignment and refinement were achieved using Modeler. The final structure demonstrated a cavity within the core of a four-helix bundle. Residues known to be involved in modulating anesthetic potency converge on and line this cavity. This suggests that the binding sites for volatile anesthetics in the LGICs are the cavities formed within the core of transmembrane four-helix bundles.

15.
X-ray-based alignments of bioactive compounds are commonly used to correlate structural changes with changes in potencies, ultimately leading to three-dimensional quantitative structure–activity relationships such as CoMFA or CoMSIA models that can provide further guidance for the design of new compounds. We have analyzed data sets where the alignment of the compounds is entirely based on experimentally derived ligand poses from X-ray crystallography. We developed CoMFA and CoMSIA models from these X-ray-determined receptor-bound conformations and compared the results with models generated from ligand-centric Template CoMFA, finding that the fluctuations in the positions and conformations of compounds that dominate X-ray-based alignments can yield poorer predictions than those from the self-consistent template CoMFA alignments. Also, when there exist multiple different binding modes, structural interpretation in terms of binding site constraints can often be simpler with template-based alignments than with X-ray-based alignments.

16.
17.
Triptycenes have general applicability for increasing the alignment of fluorescent and dichroic dyes in LC hosts. Dyes containing varying numbers of triptycenes were synthesized to study the effect of free-volume alignment of triptycenes on the alignment of dyes. These dyes were designed such that multiple triptycenes could be incorporated and the triptycene-free volume is coincident to the aspect ratio of the dye, allowing a cooperative effect to increase their overall average alignment. With increasing triptycene incorporation, a stepwise increase in the alignment parameters of each dye was seen. It was also found that the attachment of one triptycene group has a negligible effect on the optical switching response times of the dyes. This can be a powerful tool for designing dyes with higher alignments for a variety of applications including guest-host reflective LCDs and holographic data storage.

18.
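Classification modeling and recognition of protein fold types   Total citations: 2 (self-citations: 0; citations by others: 2)
Liu Yue, Li Xiaoqin, Xu Haisong, Qiao Hui. Acta Physico-Chimica Sinica (《物理化学学报》), 2009, 25(12): 2558-2564
How the amino acid sequence of a protein determines its spatial structure is one of the central questions in contemporary life science research. The fold type reflects the topological pattern of a protein's core structure, and fold recognition is an important part of protein sequence-structure research. We studied 36 fold types that cannot be modeled independently and that account for 41.8% of the α, β and α/β class proteins in the Astral 1.65 sequence database. Samples with sequence identity below 25% were selected as the training set and hierarchically clustered using root-mean-square deviation (RMSD) as the metric, producing a number of fold subclasses. For each subclass, a profile hidden Markov model (profile-HMM) was built from structural alignments generated with the multiple structural alignment algorithm MUSTANG. Using the 9505 samples in Astral 1.65 with sequence identity below 95% as the test set, the average recognition sensitivity for the 36 fold types was 90%, the specificity was 99%, and the Matthews correlation coefficient (MCC) was 0.95. The results show that, for fold types with many members for which a unified model cannot be built, RMSD-based systematic classification modeling achieves recognition with high accuracy, offering a new method and approach for protein fold recognition and laying a foundation for further research.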
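The three evaluation measures reported above (sensitivity, specificity and the Matthews correlation coefficient) all follow from a binary confusion matrix. The sketch below computes them from counts of true and false positives and negatives. The function names and the counts are hypothetical and are not the study's data.

```python
import math


def sensitivity(tp, fn):
    """Fraction of true members of a class that are recognized: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-members that are correctly rejected: TN / (TN + FP)."""
    return tn / (tn + fp)


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator vanishes."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for one fold type (illustrative only).
tp, fn, tn, fp = 90, 10, 990, 10
print(sensitivity(tp, fn))          # 0.9
print(specificity(tn, fp))          # 0.99
print(matthews_cc(tp, tn, fp, fn))  # 0.89
```

Unlike sensitivity or specificity alone, the MCC uses all four counts, so it stays informative when a fold type has far fewer members than non-members.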

19.
Summary: P450SU1 and P450SU2 are herbicide-inducible bacterial cytochrome P450 enzymes from Streptomyces griseolus. They have two of the highest sequence identities to camphor hydroxylase (P450cam from Pseudomonas putida), the cytochrome P450 with the first known crystal structure. We have built several models of these two proteins to investigate the variability in the structures that can arise from using different modeling protocols. We looked at variability due to alignment methods, backbone loop conformations and refinement methods. We constructed two models for each protein using two alignment algorithms, and then an additional model using an identical alignment but different loop conformations for both buried and surface loops. The alignments used to build the models were created using the Needleman-Wunsch method, adapted for multiple sequences, and a manual method that utilized both a dot-matrix search matrix and the Needleman-Wunsch method. After constructing the initial models, several energy minimization methods were used to explore the variability in the final models caused by the choice of minimization techniques. Features of cytochrome P450cam and the cytochrome P450 superfamily, such as the ferredoxin binding site, the heme binding site and the substrate binding site, were used to evaluate the validity of the models. Although the final structures were very similar between the models with different alignments, active-site residues were found to be dependent on the conformations of buried loops and early stages of energy minimization. We show which regions of the active site are the most dependent on the particular methods used, and which parts of the structures seem to be independent of the methods.
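The Needleman-Wunsch method named above is a dynamic programming recurrence for global alignment. The sketch below computes only the optimal pairwise score, with a simple linear gap cost. The match, mismatch and gap values are arbitrary assumptions, and the multiple-sequence adaptation the abstract mentions is not shown.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of strings a and b with a linear gap cost."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(
                diagonal,           # align a[i-1] with b[j-1]
                prev[j] + gap,      # a[i-1] aligned to a gap
                curr[j - 1] + gap,  # b[j-1] aligned to a gap
            )
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("ACGT", "ACGT"))  # 4
print(needleman_wunsch_score("ACGT", "AGT"))   # 2
```

Only two rows of the table are kept, which is enough for the score; recovering the alignment itself would require storing the full table or traceback pointers.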

20.
Profile-profile alignment algorithms have proven powerful for recognizing remote homologs and generating alignments by effectively integrating sequence evolutionary information into scoring functions. In comparison with scoring functions, the development of gap penalty functions has rarely been addressed in profile-profile alignment algorithms. Although indel frequency profiles have been used to construct profile-based variable gap penalties in some profile-profile alignment algorithms, there has been no fair comparison between variable gap penalties and traditional linear gap penalties to quantify the improvement in alignment accuracy. We compared two linear gap penalty functions, the traditional affine gap penalty (AGP) and the bilinear gap penalty (BGP), with two profile-based variable gap penalty functions, the Profile-based Gap Penalty used in SP(5) (SPGP) and a new Weighted Profile-based Gap Penalty (WPGP) developed by us, on several well-established benchmark datasets. Our results show that profile-based variable gap penalties yield only limited improvements over linear gap penalties, whether or not secondary structure information is incorporated. Secondary structure information appears to be less effective when incorporated into gap penalties than when incorporated into scoring functions. Analysis of gap length distributions indicates that each gap penalty stably maintains a corresponding distribution of gap lengths in its alignments, but the difference between that distribution and the one in the reference alignments does not reflect the performance of the gap penalty. There is useful information in indel frequency profiles, but it is still not good enough to improve alignment accuracy when used in profile-based variable gap penalties. All of the methods tested in this work are freely accessible at http://protein.cau.edu.cn/gppat/.
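To make the two linear penalty families above concrete, the sketch below writes each as a function of gap length. The affine form (an opening cost plus a per-residue extension cost) is the conventional one, though conventions differ on whether the first gapped position also pays the extension cost. The bilinear form is only one plausible reading, assumed here to be a two-slope piecewise-linear function with a knee at a fixed length; the abstract does not define it. All parameter values and names are hypothetical.

```python
def affine_gap_penalty(length, gap_open=10.0, gap_extend=1.0):
    """Affine penalty: gap_open for the first gapped position, gap_extend for each further one."""
    if length <= 0:
        return 0.0
    return gap_open + gap_extend * (length - 1)


def bilinear_gap_penalty(length, gap_open=10.0, extend_short=1.0,
                         extend_long=0.5, knee=5):
    """Assumed two-slope penalty: extend_short up to `knee` positions, extend_long beyond."""
    if length <= 0:
        return 0.0
    short_part = min(length, knee)
    long_part = max(length - knee, 0)
    return gap_open + extend_short * (short_part - 1) + extend_long * long_part


for gap_length in (1, 5, 8):
    print(gap_length, affine_gap_penalty(gap_length), bilinear_gap_penalty(gap_length))
# 1 10.0 10.0
# 5 14.0 14.0
# 8 17.0 15.5
```

Under this reading the two functions agree for short gaps and differ only past the knee, where the second slope makes long gaps cheaper. A profile-based variable penalty would instead make the open and extend costs depend on the alignment position.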
