Similar articles
1.
It is often difficult to differentiate effectively between related G-protein coupled receptors and their subtypes when doing ligand-based drug design. GALAHAD uses a multi-objective scoring system to generate multiple alignments involving alternative trade-offs between the conflicting desires to minimize internal strain while maximizing pharmacophoric and steric (pharmacomorphic) concordance between ligands. The various overlays obtained can be associated with different subtypes by examination, even when the ligands available do not discriminate completely between receptors and when no specificity information has been used to bias the alignment process. This makes GALAHAD a potentially powerful tool for identifying discriminating models, as is illustrated here using a set of dopaminergic agonists that vary in their D1 vs. D2 receptor selectivity.
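
The trade-off between internal strain and pharmacophoric/steric concordance can be illustrated with a Pareto filter. The following is only a minimal sketch, not GALAHAD's implementation: the `Overlay` type, the two objective names, and the candidate values are hypothetical, and the abstract does not say how the objectives are actually ranked.

```python
from typing import NamedTuple


class Overlay(NamedTuple):
    """Hypothetical candidate alignment with two competing objectives."""
    name: str
    strain: float       # internal strain, lower is better
    concordance: float  # pharmacophoric/steric agreement, higher is better


def dominates(a: Overlay, b: Overlay) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.strain <= b.strain and a.concordance >= b.concordance
    better = a.strain < b.strain or a.concordance > b.concordance
    return no_worse and better


def pareto_front(overlays: list[Overlay]) -> list[Overlay]:
    """Keep every overlay that no other overlay dominates."""
    return [o for o in overlays if not any(dominates(p, o) for p in overlays)]


# Made-up values for illustration only.
candidates = [
    Overlay("A", strain=2.0, concordance=0.90),
    Overlay("B", strain=5.0, concordance=0.95),
    Overlay("C", strain=6.0, concordance=0.80),  # worse than A on both objectives
    Overlay("D", strain=1.0, concordance=0.60),
]
front = pareto_front(candidates)
print([o.name for o in front])
```

Each surviving overlay represents a different compromise, which is the sense in which several alternative alignments can be examined side by side.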

2.
We propose an algorithm of global multiple sequence alignment that is based on a measure of what we call information discrepancy. The algorithm follows a progressive alignment iteration strategy that makes use of what we call a function of degree of disagreement (FDOD). MSAID begins with distance calculation of pairwise sequences, based on FDOD as a numerical scoring measure. In the next step, the resulting distance matrix is used to construct a guide tree via the neighbor-joining method. The tree is then used to produce a multiple alignment. The current alignment is then used to produce a new matrix and a new tree (again with the FDOD scoring measure). This iterative process continues until convergence criteria (or a stopping rule) are satisfied. MSAID was tested and compared with other prior methods by using reference alignments from BAliBASE 2.01. For the alignments with no large N/C-terminal extensions or internal insertions MSAID received the top overall average in the tests. Moreover, the results of testing indicate that MSAID performs as well as other alignment methods with an occasional tendency to perform better than these prior techniques. We, therefore, believe that MSAID is a solid and reliable method of choice, which is often (if not always) superior to other global alignment techniques.

3.
Triptycenes have general applicability for increasing the alignment of fluorescent and dichroic dyes in LC hosts. Dyes containing varying numbers of triptycenes were synthesized to study the effect of free-volume alignment of triptycenes on the alignment of dyes. These dyes were designed such that multiple triptycenes could be incorporated and the triptycene-free volume is coincident to the aspect ratio of the dye, allowing a cooperative effect to increase their overall average alignment. With increasing triptycene incorporation, a stepwise increase in the alignment parameters of each dye was seen. It was also found that the attachment of one triptycene group has a negligible effect on the optical switching response times of the dyes. This can be a powerful tool for designing dyes with higher alignments for a variety of applications including guest-host reflective LCDs and holographic data storage.

4.
Point Accepted Mutation (PAM) is the Markov model of amino acid replacements in proteins introduced by Dayhoff and her co-workers (Dayhoff et al., 1978). The PAM matrices and other matrices based on the PAM model have been widely accepted as the standard scoring system of protein sequence similarity in protein sequence alignment tools. Here, we present Contact Accepted mutatiOn (CAO), a Markov model of protein residue contact mutations. The CAO model simulates the interchanging of structurally defined side-chain contacts, and introduces additional structural information into protein sequence alignments. Therefore, similarities between structurally conserved sequences can be detected even without apparent sequence similarity. CAO has been benchmarked on the HOMSTRAD database and a subset of the CATH database, by comparing sequence alignments with reference alignments derived from structural superposition. CAO yields scores that reflect coherently the structural quality of sequence alignments, which has implications particularly for homology modelling and threading techniques.
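
The Markov property behind PAM-style models means that a matrix for n evolutionary steps is the n-th power of the one-step matrix. The sketch below shows this on a toy two-letter alphabet. The matrix values are invented, and the row convention (rows are the original state, columns the replacement) is an assumption of this example; the abstract does not give the CAO matrices, which would range over contact types instead of single residues.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, steps):
    """Raise a square matrix to a non-negative integer power."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(steps):
        result = mat_mul(result, m)
    return result


# Toy one-step replacement probabilities; each row sums to 1.
pam1 = [[0.99, 0.01],
        [0.02, 0.98]]

# Probabilities after 250 steps, in the spirit of a PAM250-style matrix.
pam250 = mat_pow(pam1, 250)
for row in pam250:
    print([round(p, 3) for p in row])
```

After many steps the rows approach the stationary composition of the chain, which is why matrices for large evolutionary distances carry less position-specific signal.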

5.
Capillary electrophoresis–mass spectrometry (CE–MS) is a powerful technique for the analysis of small soluble compounds in biological fluids. A major drawback of CE is the poor migration time reproducibility, which makes it difficult to combine data from different experiments and correctly assign compounds. A number of alignment algorithms have been developed but not all of them can cope with large and irregular time shifts between CE–MS runs. Here we present a genetic algorithm designed for alignment of CE–MS data using accurate mass information. The utility of the algorithm was demonstrated on real data, and the results were compared with one of the existing packages. The new algorithm showed a significant reduction of elution time variation in the aligned datasets. The importance of mass accuracy for the performance of the algorithm was also demonstrated by comparing alignments of datasets from a standard time-of-flight (TOF) instrument with those from the new ultrahigh resolution TOF maXis (Bruker Daltonics).

6.
New empirical scoring functions have been developed to estimate the binding affinity of a given protein-ligand complex with known three-dimensional structure. These scoring functions include terms accounting for van der Waals interaction, hydrogen bonding, deformation penalty, and hydrophobic effect. A special feature is that three different algorithms have been implemented to calculate the hydrophobic effect term, which results in three parallel scoring functions. All three scoring functions are calibrated through multivariate regression analysis of a set of 200 protein-ligand complexes and they reproduce the binding free energies of the entire training set with standard deviations of 2.2 kcal/mol, 2.1 kcal/mol, and 2.0 kcal/mol, respectively. These three scoring functions are further combined into a consensus scoring function, X-CSCORE. When tested on an independent set of 30 protein-ligand complexes, X-CSCORE is able to predict their binding free energies with a standard deviation of 2.2 kcal/mol. The potential application of X-CSCORE to molecular docking is also investigated. Our results show that this consensus scoring function improves the docking accuracy considerably when compared to the conventional force field computation used for molecular docking.

7.
We have investigated the performance of five well known scoring functions in predicting the binding affinities of a diverse set of 205 protein-ligand complexes with known experimental binding constants, and also on subsets of mutually similar complexes. We have found that the overall performance of the scoring functions on the diverse set is disappointing, with none of the functions achieving r² values above 0.32 on the whole dataset. Performance on the subsets was mixed, with four of the five functions predicting fairly well the binding affinities of 35 proteinases, but none of the functions producing any useful correlation on a set of 38 aspartic proteinases. We consider two algorithms for producing consensus scoring functions, one based on a linear combination of scores from the five individual functions and the other on averaging the rankings produced by the five functions. We find that both algorithms produce consensus functions that generally perform slightly better than the best individual scoring function on a given dataset.
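
The second consensus scheme, averaging the rankings from several scoring functions, can be sketched as follows. This is an illustrative example with made-up scores for three functions and three complexes; the function names, the tie-breaking rule, and the convention that a higher score is better are assumptions, not details taken from the abstract.

```python
def ranks(scores):
    """Return the rank of each item; rank 1 is the highest score.

    Ties are broken by input order, an arbitrary choice for this sketch.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def rank_by_rank(score_table):
    """Average the per-function ranks of each complex.

    score_table holds one list of scores per scoring function, with the
    complexes in the same order in every list.
    """
    all_ranks = [ranks(scores) for scores in score_table]
    n_complexes = len(score_table[0])
    return [sum(r[i] for r in all_ranks) / len(all_ranks)
            for i in range(n_complexes)]


# Hypothetical scores: rows are scoring functions, columns are complexes.
table = [
    [7.1, 5.0, 6.2],
    [6.0, 6.5, 4.0],
    [8.0, 5.5, 7.0],
]
mean_ranks = rank_by_rank(table)
print(mean_ranks)
```

Working on ranks avoids having to put the individual functions on a common scale, which a linear combination of raw scores would require.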

8.
In small molecule drug discovery projects, the receptor structure is not always available. In such cases it is enormously useful to be able to align known ligands in the way they bind in the receptor. Here we shall present an algorithm for the alignment of multiple small molecule ligands. This algorithm takes pre-generated conformers as input, and proposes aligned assemblies of the ligands. The algorithm consists of two stages: the first stage is to perform alignments for each pair of ligands, the second stage makes use of the results from the first stage to build up multiple ligand alignment assemblies using a novel iterative procedure. The scoring functions are improved versions of the one mentioned in our previous work. We have compared our results with some recent publications. While an exact comparison is impossible, it is clear that our algorithm is fast and produces very competitive results.

9.
Summary. Today, more than 40 protein amino acid (AA) sequences of membrane receptors coupled to guanine nucleotide binding proteins (G-proteins) are available. For those working in the field of medicinal chemistry, these sequences present a new type of information that should be taken into consideration. To make maximal use of sequence data it is essential to be able to compare different protein sequences in a similar way to that used for small molecules. A prerequisite, however, is the availability of a processing environment that enables one to handle sequences in an easy way, both by hand and by computer. In order to meet these ends, the package CGEMA (Colour Graphics Editor for Multiple Alignment) was developed in our laboratory. The programme uses a user-definable colour coding for the different AAs. Sequences can be aligned by hand or by computer, using VGAP, and both approaches can be combined. VGAP is a novel in-house written alignment programme with a variable gap penalty that also handles consecutive alignments using one sequence as a probe. In addition, secondary structure prediction tools are available. From the 20 protein sequences available for the muscarinic acetylcholine receptor, 13 different sequences were selected, covering the subtypes m1 to m5. By comparing the sequences, two major groups are revealed that correspond to those found by considering the transducing system coupled to the various receptor subtypes. Different parts of the protein sequences are identified as characterizing the subtype and binding the ligands, respectively.

10.
A Genetic Algorithm (GA) is a stochastic optimization technique based on the mechanisms of biological evolution. These algorithms have been successfully applied in many fields to solve a variety of complex nonlinear problems. While they have been used with some success in chemical problems such as fitting spectroscopic and kinetic data, many have avoided their use due to the unconstrained nature of the fitting process. In engineering, this problem is now being addressed through incorporation of adaptive penalty functions, but their transfer to other fields has been slow. This study updates the Nanakorn Adaptive Penalty function theory, expanding its validity beyond maximization problems to minimization as well. The expanded theory, using a hybrid genetic algorithm with an adaptive penalty function, was applied to analyze variable temperature variable field magnetic circular dichroism (VTVH MCD) spectroscopic data collected on exchange coupled Fe(II)Fe(II) enzyme active sites. The data obtained are described by a complex nonlinear multimodal solution space with at least 6 to 13 interdependent variables and are costly to search efficiently. The use of the hybrid GA is shown to improve the probability of detecting the global optimum. It also provides large gains in computational and user efficiency. This method allows a full search of a multimodal solution space, greatly improving the quality and confidence in the final solution obtained, and can be applied to other complex systems such as fitting of other spectroscopic or kinetics data.

11.
Single-strand conformation polymorphism (SSCP) analysis was employed to screen for sequence heterogeneity in the second internal transcribed spacer (ITS-2) of ribosomal (r) DNA of Labiostrongylus longispicularis, a parasitic strongylid nematode occurring in some species of kangaroo in different geographical regions of Australia. The results showed that most of the nematodes screened had different SSCP profiles, which were subsequently shown to correspond to polymorphisms and/or an indel in the ITS-2 sequence. These variable sites related mainly to unpaired regions of the predicted secondary structure of the precursor rRNA molecule. SSCP profiles could be used to distinguish L. longispicularis in Macropus robustus robustus (New South Wales) from L. longispicularis in Macropus robustus erubescens and Macropus rufus (South Australia). This difference corresponded to a transversional change in the ITS-2 sequence at alignment position 82. The study demonstrated clearly the effectiveness of SSCP analysis for future large-scale population genetic studies of L. longispicularis in order to test the hypothesis that L. longispicularis from different geographical regions represents multiple sibling species.

12.
Molecular docking is a powerful computational method that has been widely used in many biomolecular studies to predict geometry of a protein-ligand complex. However, while its conformational search algorithms are usually able to generate correct conformation of a ligand in the binding site, the scoring methods often fail to discriminate it among many false variants. We propose to treat this problem by applying more precise ligand-specific scoring filters to re-rank docking solutions. In this way specific features of interactions between protein and different types of compounds can be implicitly taken into account. New scoring functions were constructed including hydrogen bonds, hydrophobic and hydrophilic complementarity terms. These scoring functions also discriminate ligands by the size of the molecule, the total hydrophobicity, and the number of peptide bonds for peptide ligands. Weighting coefficients of the scoring functions were adjusted using a training set of 60 protein-ligand complexes. The proposed method was then tested on the results of docking obtained for an additional 70 complexes. In both cases the success rate was 5-8% better compared to the standard functions implemented in popular docking software.

14.
MOTIVATION: Virtual screening of molecular compound libraries is a potentially powerful and inexpensive method for the discovery of novel lead compounds for drug development. The major weakness of virtual screening, the inability to consistently identify true positives (leads), is likely due to our incomplete understanding of the chemistry involved in ligand binding and the subsequently imprecise scoring algorithms. It has been demonstrated that combining multiple scoring functions (consensus scoring) improves the enrichment of true positives. Previous efforts at consensus scoring have largely focused on empirical results, but they have yet to provide a theoretical analysis that gives insight into real features of combinations and data fusion for virtual screening. RESULTS: We demonstrate that combining multiple scoring functions improves the enrichment of true positives only if (a) each of the individual scoring functions has relatively high performance and (b) the individual scoring functions are distinctive. Notably, these two prediction variables are previously established criteria for the performance of data fusion approaches using either rank or score combinations. This work, thus, establishes a potential theoretical basis for the probable success of data fusion approaches to improve yields in in silico screening experiments. Furthermore, it is similarly established that the second criterion (b) can, in at least some cases, be functionally defined as the area between the rank versus score plots generated by the two (or more) algorithms. Because rank-score plots are independent of the performance of the individual scoring function, this establishes a second theoretically defined approach to determining the likely success of combining data from different predictive algorithms.
This approach is, thus, useful in practical settings in the virtual screening process when the performance of at least two individual scoring functions (such as in criterion a) can be estimated as having a high likelihood of having high performance, even if no training sets are available. We provide initial validation of this theoretical approach using data from five scoring systems with two evolutionary docking algorithms on four targets, thymidine kinase, human dihydrofolate reductase, and estrogen receptors of antagonists and agonists. Our procedure is computationally efficient, able to adapt to different situations, and scalable to a large number of compounds as well as to a greater number of combinations. Results of the experiment show a fairly significant improvement (vs single algorithms) in several measures of scoring quality, specifically "goodness-of-hit" scores, false positive rates, and "enrichment". This approach (available online at http://gemdock.life.nctu.edu.tw/dock/download.php) has practical utility for cases where the basic tools are known or believed to be generally applicable, but where specific training sets are absent.
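
Criterion (b), distinctiveness measured as the area between two rank versus score plots, might be computed along these lines. This is a sketch under stated assumptions: scores are min-max normalised to [0, 1], the area is approximated by the mean absolute gap between the two curves at each rank, and the function names and sample scores are hypothetical. The abstract does not specify the normalisation or the exact area formula.

```python
def rank_score_function(scores):
    """Normalised scores listed from rank 1 (best) to rank n (worst).

    Assumes the scores are not all identical, so max > min.
    """
    low, high = min(scores), max(scores)
    normalised = [(s - low) / (high - low) for s in scores]
    return sorted(normalised, reverse=True)


def diversity(scores_a, scores_b):
    """Mean absolute gap between two rank-score curves of equal length."""
    curve_a = rank_score_function(scores_a)
    curve_b = rank_score_function(scores_b)
    return sum(abs(x - y) for x, y in zip(curve_a, curve_b)) / len(curve_a)


# Made-up scores from two scoring functions for the same five compounds.
linear_scores = [10.0, 8.0, 6.0, 4.0, 2.0]   # scores fall off evenly with rank
steep_scores = [10.0, 3.0, 2.5, 2.2, 2.0]    # one clear hit, then a flat tail

print(diversity(linear_scores, linear_scores))
print(round(diversity(linear_scores, steep_scores), 4))
```

Because the curves are sorted by rank before comparison, the measure depends only on how each function distributes its scores, not on which compounds it ranks highly, which is consistent with the statement that rank-score plots are independent of individual performance.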

15.
All currently leading protein secondary structure prediction methods use a multiple protein sequence alignment to predict the secondary structure of the top sequence. In most of these methods, prior to prediction, alignment positions showing a gap in the top sequence are deleted, consequently leading to shrinking of the alignment and loss of position-specific information. In this paper we investigate the effect of this removal of information on secondary structure prediction accuracy. To this end, we have designed SymSSP, an algorithm that post-processes the predicted secondary structure of all sequences in a multiple sequence alignment by (i) making use of the alignment's evolutionary information and (ii) re-introducing most of the information that would otherwise be lost. The post-processed information is then given to a new dynamic programming routine that produces an optimally segmented consensus secondary structure for each of the multiple alignment sequences. We have tested our method on the state-of-the-art secondary structure prediction methods PHD, PROFsec, SSPro2 and JNET using the HOMSTRAD database of reference alignments. Our consensus-deriving dynamic programming strategy is consistently better at improving the segmentation quality of the predictions compared to the commonly used majority voting technique. In addition, we have applied several weighting schemes from the literature to our novel consensus-deriving dynamic programming routine. Finally, we have investigated the level of noise introduced by prediction errors into the consensus and show that predictions of edges of helices and strands are wrong half the time for all four tested prediction methods.
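
For reference, the majority voting baseline that the dynamic programming routine is compared against can be sketched in a few lines. This illustrates only the baseline, not SymSSP itself; the H/E/C alphabet, the use of '-' for alignment gaps, and the sample predictions are assumptions made for the example.

```python
from collections import Counter


def majority_vote(predictions):
    """Column-wise majority consensus over aligned secondary structure strings.

    predictions: equal-length strings, one per aligned sequence, using
    H (helix), E (strand), C (coil) and '-' for an alignment gap.
    Gaps do not vote; a column made only of gaps stays a gap.
    """
    consensus = []
    for column in zip(*predictions):
        votes = Counter(state for state in column if state != "-")
        consensus.append(votes.most_common(1)[0][0] if votes else "-")
    return "".join(consensus)


# Hypothetical per-sequence predictions projected onto one alignment.
predicted = [
    "HHHHCCEEE",
    "HHHCCCEEE",
    "HHHH-CEEC",
]
print(majority_vote(predicted))
```

Each column is decided independently here, so nothing prevents implausibly short or fragmented segments; that limitation is what a segmentation-aware consensus is meant to address.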

16.
17.
The peaks of magnetic resonance (MR) spectra can be shifted due to variations in physiological and experimental conditions, and correcting for misaligned peaks is an important part of data processing prior to multivariate analysis. In this paper, five warping algorithms (icoshift, COW, fastpa, VPdtw and PTW) are compared for their feasibility in aligning spectral peaks in three sets of high resolution magic angle spinning (HR-MAS) MR spectra with different degrees of misalignments, and their merits are discussed. In addition, extraction of information that might be present in the shifts is examined, both for simulated data and the real MR spectra. The generic evaluation methodology employs a number of frequently used quality criteria for evaluation of the alignments, together with PLS-DA to assess the influence of alignment on the classification outcome. Peak alignment greatly improved the internal similarity of the data sets. Especially icoshift and COW seem suitable for aligning HR-MAS MR spectra, possibly because they perform alignment segment-wise. The choice of reference spectrum can influence the alignment result, and it is advisable to test several references. Information from the peak shifts was extracted, and in one case cancer samples were successfully discriminated from normal tissue based on shift information only. Based on these findings, general recommendations for alignment of HR-MAS MRS data are presented. Where possible, observations are generalized to other data types (e.g. chromatographic data).

18.
A first step toward predicting the structure of a protein is to determine its secondary structure. The secondary structure information is generally used as starting point to solve protein crystal structures. In the present study, a machine learning approach based on a complete set of two-class scoring functions was used. Such functions discriminate between two specific structural classes or between a single specific class and the rest. The approach uses a hierarchical scheme of scoring functions and a neural network. The parameters are determined by optimizing the recall of learning data. Quality control is performed by predicting separate independent test data. A first set of scoring functions is trained to correlate the secondary structures of residues with profiles of sequence windows of width 15, centered at these residues. The sequence profiles are obtained by multiple sequence alignment with PSI-BLAST. A second set of scoring functions is trained to correlate the secondary structures of the center residues with the secondary structures of all other residues in the sequence windows used in the first step. Finally, a neural network is trained using the results from the second set of scoring functions as input to make a decision on the secondary structure class of the residue in the center of the sequence window. Here, we consider the three-class problem of helix, strand, and other secondary structures. The corresponding prediction scheme "SPARROW" was trained with the ASTRAL40 database, which contains protein domain structures with less than 40% sequence identity. The secondary structures were determined with DSSP. In a loose assignment, the helix class contains all DSSP helix types (α, 3-10, π), the strand class contains β-strand and β-bridge, and the third class contains the other structures. In a tight assignment, the helix and strand classes contain only α-helix and β-strand classes, respectively. 
A 10-fold cross validation showed less than 0.8% deviation in the fraction of correct structure assignments between true prediction and recall of data used for training. Using sequences of 140,000 residues as a test data set, 80.46% ± 0.35% of secondary structures are predicted correctly in the loose assignment, a prediction performance that is very close to the best results in the field. Most applications are done with the loose assignment. However, the tight assignment yields 2.25% better prediction performance. With each individual prediction, we also provide a confidence measure providing the probability that the prediction is correct. The SPARROW software can be used and downloaded on the Web page http://agknapp.chemie.fu-berlin.de/sparrow/.
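
The sequence windows of width 15 centered at each residue can be generated as in the sketch below. It works on plain residue letters for readability; in the method described, each window position would hold a PSI-BLAST profile column instead. The padding symbol "X" for positions beyond the chain ends and the example sequence are assumptions of this sketch.

```python
def sequence_windows(sequence, width=15, pad="X"):
    """Return one window per residue, with that residue at the center.

    Window positions that fall outside the sequence are filled with pad.
    """
    half = width // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + width] for i in range(len(sequence))]


# Hypothetical short sequence, for illustration only.
seq = "MKTAYIAKQR"
windows = sequence_windows(seq)
for residue, window in zip(seq, windows):
    print(residue, window)
```

With a width of 15, the center residue always sits at index 7 of its window, so the first-stage scoring functions see seven neighbours on each side.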

19.
Sequence–structure alignment for protein sequences is an important task for the template-based modeling of 3D structures of proteins. Building a reliable sequence–structure alignment is a challenging problem, especially for remote homologue target proteins. We built a method of sequence–structure alignment called CRFalign, which improves upon a base alignment model based on HMM-HMM comparison by employing pairwise conditional random fields in combination with nonlinear scoring functions of structural and sequence features. The nonlinear scoring part is implemented by a set of gradient boosted regression trees. In addition to sequence profile features, various position-dependent structural features are employed including secondary structures and solvent accessibilities. Training is performed on reference alignments at superfamily levels or twilight zone chosen from the SABmark benchmark set. We found that the CRFalign method produces a relative improvement in terms of average alignment accuracies for validation sets of the SABmark benchmark. We also tested CRFalign on 51 sequence–structure pairs involving 15 FM target domains of CASP14, where we could see that CRFalign leads to an improvement in average modeling accuracies in these hard targets (TM-CRFalign 42.94%) compared with that of HHalign (TM-HHalign 39.05%) and also that of MRFalign (TM-MRFalign 36.93%). CRFalign was incorporated into our template search framework called CRFpred and was tested for a random target set of 300 target proteins consisting of Easy, Medium and Hard sets, which showed a reasonable template search performance.

20.
A combination of singular value decomposition, entropy minimization, and simulated annealing was applied to a synthetic 7-species spectroscopic data set with added white noise. The pure spectra were highly overlapping. Global minima for selected objective functions were obtained for the transformation of the first seven right singular vectors. Simple Shannon type entropy functions were used in the objective functions and realistic physical constraints were imposed in the penalties. It was found that good first approximations for the pure component spectra could be obtained without the use of any a priori information. The present method outperformed the two widely used routines, namely Simplisma and OPA-ALS, as well as IPCA. These results indicate that a combination of SVD, entropy minimization, and simulated annealing is a potentially powerful tool for spectral reconstructions from large real experimental systems.
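
A Shannon type entropy of a spectrum estimate can be written as below. This is a generic sketch, not the objective function of the study: normalising the absolute intensities to sum to 1 is an assumption, and the penalty terms and the simulated annealing search are omitted.

```python
import math


def shannon_entropy(spectrum):
    """Shannon entropy of a spectrum after normalising |intensity| to sum to 1.

    Zero-intensity channels contribute nothing (0 * log 0 is taken as 0).
    Assumes at least one channel is non-zero.
    """
    magnitudes = [abs(x) for x in spectrum]
    total = sum(magnitudes)
    probabilities = [m / total for m in magnitudes if m > 0]
    return -sum(p * math.log(p) for p in probabilities)


# Made-up four-channel examples.
flat = [1.0, 1.0, 1.0, 1.0]      # intensity spread over every channel
peaked = [0.0, 4.0, 0.0, 0.0]    # a single sharp band

print(shannon_entropy(flat))
print(shannon_entropy(peaked))
```

Minimizing such an entropy favours estimates with a few well-defined bands over diffuse mixtures, which is the intuition behind using it to recover pure component spectra.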
