Similar documents
20 similar documents found.
1.
The protein structure prediction problem is a classical NP-hard problem in bioinformatics, and the lack of an effective global optimization method is the key obstacle to solving it. Tabu search (TS), a global optimization algorithm, has been applied successfully to many optimization problems. Building on the original TS algorithm, we define a new neighborhood conformation, a new tabu object and new acceptance criteria for the current conformation, yielding an improved TS algorithm. By integrating a heuristic initialization mechanism, a heuristic conformation-updating mechanism and the gradient method into the improved TS algorithm, we present a heuristic-based tabu search (HTS) algorithm for predicting two-dimensional (2D) protein folding structures in the AB off-lattice model, which consists of hydrophobic (A) and hydrophilic (B) monomers. Tabu search minimization leads to the basins of local minima, near which a local search mechanism is then applied to search for lower-energy conformations. To test the performance of the proposed algorithm, experiments were performed on four Fibonacci sequences and two real protein sequences. The results show that the proposed algorithm finds the lowest-energy conformations reported so far for the three shorter Fibonacci sequences and improves on the best-known results for the longest one and for the two real protein sequences, demonstrating that HTS is promising for finding the ground states of AB off-lattice model proteins.
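The abstract does not state the objective that HTS minimizes. The sketch below computes the energy of a 2D AB off-lattice conformation and assumes the commonly used formulation:

- unit-length bonds;
- a bending term of 1/4 (1 − cos θ) at each interior residue;
- a Lennard-Jones-like term 4 (r⁻¹² − C r⁻⁶) for non-adjacent pairs, with C(A,A) = 1, C(B,B) = 0.5 and C(A,B) = −0.5.

The helper names are hypothetical, and the tabu search itself is not shown.

```python
import math


def coupling(a: str, b: str) -> float:
    """Species-dependent coefficient C(a, b); assumed standard AB-model values."""
    if a == "A" and b == "A":
        return 1.0
    if a == "B" and b == "B":
        return 0.5
    return -0.5  # mixed A-B pairs


def coordinates(angles):
    """Build 2D coordinates from unit bonds; angles[k] is the turn at residue k + 1."""
    points = [(0.0, 0.0), (1.0, 0.0)]
    heading = 0.0
    for theta in angles:
        heading += theta
        x, y = points[-1]
        points.append((x + math.cos(heading), y + math.sin(heading)))
    return points


def ab_energy(seq: str, angles) -> float:
    """Bending term plus Lennard-Jones-like term over non-adjacent residue pairs."""
    if len(angles) != len(seq) - 2:
        raise ValueError("need one bend angle per interior residue")
    points = coordinates(angles)
    bending = sum(0.25 * (1.0 - math.cos(t)) for t in angles)
    nonbonded = 0.0
    n = len(seq)
    for i in range(n - 2):
        for j in range(i + 2, n):
            r = math.dist(points[i], points[j])
            nonbonded += 4.0 * (r ** -12 - coupling(seq[i], seq[j]) * r ** -6)
    return bending + nonbonded
```

A search method would perturb `angles` and keep conformations that lower `ab_energy(seq, angles)`.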

2.
Index-based search algorithms are an important part of genomic search, and the way indices are constructed is key to how efficiently such an algorithm computes similarities between two DNA sequences. In this paper, we propose an efficient query processing method that uses special transformations to construct an index. It requires little storage and rapidly finds the similarity between two sequences in a DNA sequence database. First, each sequence is partitioned into equal-length windows, and likely subsequences are selected by computing their Hamming distance to the query sequence. The algorithm then transforms the subsequences in each window into a multidimensional vector space by indexing the character frequencies, including the positional information of the characters in the subsequences. Our experiments show that the algorithm runs faster than other index-based heuristic algorithms while being as accurate as those algorithms.
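The windowing, Hamming-distance filtering and frequency-vector steps can be sketched as follows. This is a hypothetical illustration, not the paper's transformation. It assumes:

- windows are non-overlapping and as long as the query;
- "positional information" is encoded as the mean 1-based position of each base.

```python
from collections import Counter

ALPHABET = "ACGT"


def windows(seq: str, w: int):
    """Split seq into non-overlapping windows of length w (a trailing partial window is dropped)."""
    return [seq[i:i + w] for i in range(0, len(seq) - w + 1, w)]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def freq_vector(s: str):
    """Base counts followed by the mean 1-based position of each base (0.0 if absent)."""
    counts = Counter(s)
    vec = [counts[c] for c in ALPHABET]
    for c in ALPHABET:
        positions = [i + 1 for i, ch in enumerate(s) if ch == c]
        vec.append(sum(positions) / len(positions) if positions else 0.0)
    return vec


def candidates(db_seq: str, query: str, max_mismatch: int):
    """(window index, window) pairs whose Hamming distance to the query is small enough."""
    return [(k, win) for k, win in enumerate(windows(db_seq, len(query)))
            if hamming(win, query) <= max_mismatch]
```

The surviving windows would then be mapped through `freq_vector` and stored in a multidimensional index.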

3.
The shortest common supersequence (SCS) problem is a classical NP-hard problem that is normally solved by heuristic algorithms. One important heuristic, inspired by the process of chemical reactions in nature, is chemical reaction optimization (CRO), whose SCS algorithm is known as CRO_SCS. In this paper we propose a novel CRO algorithm, dubbed IMCRO, to solve the SCS problem efficiently. Two new operators are introduced in two of the four CRO reactions: a new circular shift operator is added to the decomposition reaction, and a new two-step crossover operator is included in the inter-molecular ineffective collision reaction. Experimental results show that IMCRO performs better on random and real sequences than well-known heuristic algorithms such as ant colony optimization, deposition and reduction, enhanced beam search, and CRO_SCS. Additionally, it outperforms its baseline CRO_SCS on DNA instances, with an average SCS length reduction of 1.02 and a maximum reduction of up to 2.1.
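The sketch below shows the basic objects an SCS metaheuristic manipulates:

- a supersequence check;
- a hypothetical reading of a "circular shift" move, which rotates a segment right by one;
- a simple greedy reduction step that deletes characters while the string remains a common supersequence.

None of this is IMCRO's actual operator set.

```python
def is_supersequence(s: str, t: str) -> bool:
    """True if t is a subsequence of s; `ch in it` consumes the iterator up to the match."""
    it = iter(s)
    return all(ch in it for ch in t)


def is_common_supersequence(s: str, strings) -> bool:
    return all(is_supersequence(s, t) for t in strings)


def circular_shift(s: str, i: int, j: int) -> str:
    """Hypothetical move: rotate s[i..j] right by one, i.e. move s[j] to position i."""
    if not 0 <= i < j < len(s):
        raise ValueError("require 0 <= i < j < len(s)")
    return s[:i] + s[j] + s[i:j] + s[j + 1:]


def reduce_greedy(s: str, strings) -> str:
    """Delete characters left to right whenever the result is still a common supersequence."""
    i = 0
    while i < len(s):
        candidate = s[:i] + s[i + 1:]
        if is_common_supersequence(candidate, strings):
            s = candidate  # keep the deletion and retry the same position
        else:
            i += 1
    return s
```

For example, starting from the concatenation "ACAG" of the inputs "AC" and "AG", `reduce_greedy` shortens it to "ACG".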

4.
Computational protein design depends on an energy function and an algorithm to search the sequence/conformation space. We compare three stochastic search algorithms: a heuristic, Monte Carlo (MC), and Replica Exchange Monte Carlo (REMC). The heuristic performs a steepest-descent minimization from thousands of random starting points. The methods are applied to nine test proteins from three structural families, using a fixed backbone structure, a molecular mechanics energy function, and 1, 5, 10, 20, 30, or all amino acids allowed to mutate. Results are compared with an exact “Cost Function Network” method that identifies the global minimum energy conformation (GMEC) in favorable cases. The designed sequences accurately reproduce experimental sequences in the hydrophobic core. The heuristic and REMC agree closely and, with a few exceptions, reproduce the GMEC when it is known. Plain MC performs well in most cases, occasionally departing from the GMEC by 3–4 kcal/mol. With REMC, the diversity of the sampled sequences agrees with exact enumeration where the latter is possible, i.e., up to 2 kcal/mol above the GMEC. Beyond that range, room-temperature replicas sample sequences up to 10 kcal/mol above the GMEC, providing thermal averages and a solution to the inverse protein folding problem. © 2016 Wiley Periodicals, Inc.
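The abstract does not give the acceptance rules. The sketch below uses the textbook criteria, with β = 1/(k_B T):

- plain MC accepts a move with the Metropolis probability min(1, exp(−β ΔE));
- REMC additionally swaps configurations between replicas at inverse temperatures β_i and β_j with probability min(1, exp[(β_i − β_j)(E_i − E_j)]).

The function names are hypothetical.

```python
import math
import random


def metropolis_accept(delta_e: float, beta: float, rng: random.Random) -> bool:
    """Accept downhill moves always, uphill moves with probability exp(-beta * delta_e)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-beta * delta_e)


def swap_probability(beta_i: float, e_i: float, beta_j: float, e_j: float) -> float:
    """Replica-exchange acceptance probability, written to avoid overflow in exp."""
    x = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if x >= 0 else math.exp(x)


def attempt_neighbor_swaps(betas, energies, states, rng: random.Random) -> None:
    """Try exchanging configurations between adjacent temperatures (lists updated in place)."""
    for k in range(len(betas) - 1):
        p = swap_probability(betas[k], energies[k], betas[k + 1], energies[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
            states[k], states[k + 1] = states[k + 1], states[k]
```

Between swap attempts, each replica would run ordinary MC steps using `metropolis_accept` at its own β.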

5.
De novo and inverse folding predictions of protein structure and dynamics (total citations: 6; self-citations: 0; citations by others: 6)
Summary: In the last two years, the use of simplified models has facilitated major progress on the globular protein folding problem, viz., the prediction of the three-dimensional (3D) structure of a globular protein from its amino acid sequence. A number of groups have addressed the inverse folding problem, in which one examines the compatibility of a given sequence with a given (already determined) structure. A comparison of extant inverse protein-folding algorithms is presented, and methodologies for identifying sequences likely to adopt identical folding topologies, even when they lack sequence homology, are described. The extension to structural templates or fingerprints derived from idealized structures is discussed, and for eight-membered β-barrel proteins it is shown that idealized fingerprints constructed from simple topology diagrams can correctly identify sequences having the appropriate topology. Furthermore, this inverse folding algorithm is generalized to predict elements of supersecondary structure, including β-hairpins, helical hairpins and α/β/α fragments. We then describe a very high coordination number lattice model that can predict the 3D structure of a number of globular proteins de novo, i.e., using just the amino acid sequence. Applications to sequences designed by DeGrado and co-workers [Biophys. J., 61 (1992) A265] predict folding intermediates, native states and relative stabilities in accord with experiment. The methodology has also been applied to the four-helix bundle designed by Richardson and co-workers [Science, 249 (1990) 884] and to a redesigned monomeric version of rop, a naturally occurring four-helix dimer. Compared with the rop dimer, the simulations predict conformations with rms deviations of 3–4 Å from native. Furthermore, the de novo algorithms can assess the stability of the folds predicted by the inverse algorithm, while the inverse folding algorithms can assess the quality of the de novo models.
Thus, the synergism of the de novo and inverse folding algorithm approaches provides a set of complementary tools that will facilitate further progress on the protein-folding problem.

6.
The identification of protein complexes in protein–protein interaction (PPI) networks has greatly advanced our understanding of biological organisms. Existing computational methods for detecting protein complexes are usually based on specific topological properties of PPI networks. However, owing to the inherent complexity of network structures, protein complexes may not be fully identified by a single topological property. In this study, we propose a novel MultiObjective Evolutionary Programming Genetic Algorithm (MOEPGA) that integrates multiple network topological features to detect biologically meaningful protein complexes. Our approach first systematically analyzes the identification of protein complexes from PPI networks as a multiobjective problem, and then constructs the objective function of the iterative algorithm from three common topological properties of protein complexes in the benchmark dataset. The algorithm itself consists of three main steps: population initialization, subgraph mutation and subgraph selection. To show the utility of our method, we compared MOEPGA with several state-of-the-art algorithms on two yeast PPI datasets. The results demonstrate that the proposed method not only finds more protein complexes but also achieves higher accuracy in terms of F-score. Moreover, our approach covers a certain number of proteins in the input PPI network in terms of the normalized clustering score. Taken together, our method can serve as a powerful framework for detecting protein complexes in yeast PPI networks, thereby facilitating the identification of the underlying biological functions.

7.
Classical sequencing by hybridization takes into account only binary information about sequence composition: a given element of an oligonucleotide library either is or is not part of the target sequence. However, DNA chip technology has since been developed that makes it possible to obtain partial information about the multiplicity of each oligonucleotide the analyzed sequence consists of. Currently, such data cannot be measured exactly, but even partial information should be very useful. Two realistic multiplicity information models are considered in this paper. The first, called “one and many”, assumes that it is possible to determine whether a given oligonucleotide occurs in the reconstructed sequence once or more than once. Under the second model, called “one, two and many”, a biochemical experiment reveals whether a given oligonucleotide is present in the analyzed sequence once, twice or at least three times. An ant colony optimization algorithm has been implemented to verify these models and to compare its performance with existing sequencing-by-hybridization algorithms that utilize the additional information. The proposed algorithm solves the problem with any kind of hybridization error. Computational experiments confirm that using even partial information about multiplicity increases the quality of the reconstructed sequences. Moreover, they show that the more precise model yields better solutions and that the ant colony optimization algorithm outperforms the existing ones. Test data sets and the proposed ant colony optimization algorithm are available at: http://bioserver.cs.put.poznan.pl/download/ACO4mSBH.zip.
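The information models can be sketched as one error-free spectrum function. It counts how often each k-mer occurs and truncates the count at a cap:

- cap = 1 reproduces classical binary SBH;
- cap = 2 reproduces the "one and many" model, where 2 means "two or more";
- cap = 3 reproduces the "one, two and many" model, where 3 means "three or more".

This is an illustrative assumption. Real hybridization data also contain positive and negative errors, which are not modelled here.

```python
from collections import Counter


def multiplicity_spectrum(seq: str, k: int, cap: int) -> dict:
    """Map every k-mer of seq to its occurrence count, truncated at cap.

    cap=1 -> classical SBH (present / absent)
    cap=2 -> "one and many"       (value 2 means "two or more")
    cap=3 -> "one, two and many"  (value 3 means "three or more")
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {oligo: min(c, cap) for oligo, c in counts.items()}
```

For "AAAAA" and k = 2, the oligo "AA" occurs four times. It is therefore reported as 1, "many" or "three or more", depending on the model.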

8.
Determining a one-to-one atom correspondence between two chemical compounds is important for measuring molecular similarity and for finding compounds with similar biological activities. This calculation can be formalized as the maximum common substructure (MCS) problem, which is well studied and has been shown to be NP-complete. Although many rigorous and heuristic algorithms have been developed, none of them is sufficiently fast and accurate. We developed a new program, called "kcombu", using a build-up algorithm, which is a type of greedy heuristic algorithm. The program can search for connected and disconnected MCSs as well as for the topologically constrained disconnected MCS (TD-MCS) introduced in this study. To evaluate the performance of our program, we prepared two correct standards: the exact correspondences generated by maximum clique algorithms, and the 3D correspondences obtained by superimposing the 3D structures of molecules bound to the same protein in complex 3D structures. For the five sets of molecules taken from the protein structure database, the agreement between the build-up and the exact correspondences for the connected MCS is sufficiently high, while the computation time of the build-up algorithm is much smaller than that of the exact algorithm. The comparison between the build-up and the 3D correspondences shows that the TD-MCS has the best agreement value among all types of MCS. Additionally, we observed a strong correlation between molecular similarity and agreement with the exact and 3D correspondences: more similar molecule pairs are matched more correctly. For molecular pairs with a Tanimoto similarity above 40%, more than half of the atoms can be correctly matched with respect to the 3D correspondences.
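The two quantities the abstract relies on are Tanimoto similarity and agreement with a reference atom correspondence. Both can be sketched in a few lines, under these assumptions:

- fingerprints are sets of feature IDs;
- "agreement" is the fraction of reference atom pairs that a predicted mapping reproduces.

Both definitions and the function names are assumptions, not kcombu's API.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """|A ∩ B| / |A ∪ B| for fingerprints given as sets of feature IDs."""
    union = fp_a | fp_b
    if not union:
        return 0.0  # assumption: two empty fingerprints are treated as dissimilar
    return len(fp_a & fp_b) / len(union)


def atom_agreement(mapping: dict, reference: dict) -> float:
    """Fraction of reference atom pairs (atom in molecule 1 -> atom in molecule 2) reproduced by mapping."""
    if not reference:
        return 0.0
    hits = sum(1 for atom, partner in reference.items() if mapping.get(atom) == partner)
    return hits / len(reference)
```

Under these definitions, the 40% threshold in the abstract corresponds to `tanimoto(...) > 0.4`, and "more than half of the atoms" to `atom_agreement(...) > 0.5`.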

9.
Nature-inspired evolutionary algorithms have proven effective for solving feature selection and classification problems. Artificial Bee Colony (ABC) is a relatively new swarm intelligence method. In this paper, we propose a new hybrid gene selection method, the Genetic Bee Colony (GBC) algorithm, which combines a Genetic Algorithm (GA) with the Artificial Bee Colony (ABC) algorithm in order to integrate the advantages of both. The proposed algorithm is applied to microarray gene expression profiles to select the most predictive and informative genes for cancer classification. To test the accuracy of the proposed algorithm, extensive experiments were conducted on three binary microarray datasets (colon, leukemia and lung) and three multi-class microarray datasets (SRBCT, lymphoma and leukemia). The results of the GBC algorithm are compared with our recently proposed technique, mRMR combined with the Artificial Bee Colony algorithm (mRMR-ABC). We also compared it with mRMR combined with GA (mRMR-GA) and with Particle Swarm Optimization (mRMR-PSO). In addition, we compared the GBC algorithm with other related algorithms recently published in the literature, using all benchmark datasets. The GBC algorithm shows superior performance, achieving the highest classification accuracy together with the lowest average number of selected genes. This demonstrates that the GBC algorithm is a promising approach to the gene selection problem in both binary and multi-class cancer classification.

10.
In this paper, the problem of isothermic DNA sequencing by hybridization (SBH) is considered. Isothermic SBH uses a new type of oligonucleotide library, consisting of oligonucleotides whose lengths differ depending on their nucleotide content. Every oligonucleotide in such a library is assumed to have the same melting temperature. Each nucleotide adds its increment to the oligonucleotide temperature: A and T are assumed to add 2 °C, and C and G to add 4 °C. Because of the expected similarity of melting temperatures, a hybridization experiment using isothermic libraries should produce data with fewer errors. From the computational point of view, isothermic DNA sequencing with errors is hard, like its classical counterpart. Hence, there is a need for heuristic algorithms that construct good suboptimal solutions. The aim of this paper is to propose a heuristic algorithm based on the tabu search approach. The algorithm handles both positive and negative errors. Results of an extensive computational experiment are presented, which demonstrate the high quality of the proposed method.
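The temperature rule stated above (A, T: +2 °C; C, G: +4 °C) is enough to enumerate an isothermic library and to compute an idealized, error-free isothermic spectrum of a target sequence. The sketch below is an illustration of that rule only. The function names are hypothetical, and the tabu search is not shown.

```python
TEMP = {"A": 2, "T": 2, "C": 4, "G": 4}  # per-nucleotide increments in °C, as stated above


def melting_temp(oligo: str) -> int:
    """Melting temperature of an oligonucleotide under the 2/4 °C rule."""
    return sum(TEMP[b] for b in oligo)


def isothermic_library(target: int):
    """All oligonucleotides (of varying length) whose temperature equals target."""
    out = []

    def extend(prefix: str, temp: int) -> None:
        if temp == target:
            out.append(prefix)
            return
        for base, inc in TEMP.items():
            if temp + inc <= target:
                extend(prefix + base, temp + inc)

    extend("", 0)
    return out


def isothermic_spectrum(seq: str, target: int) -> set:
    """Substrings of seq with temperature exactly target (ideal, error-free hybridization)."""
    found = set()
    for i in range(len(seq)):
        temp = 0
        for j in range(i, len(seq)):
            temp += TEMP[seq[j]]
            if temp >= target:  # increments are positive, so stop at the first crossing
                if temp == target:
                    found.add(seq[i:j + 1])
                break
    return found
```

For a target of 8 °C, the library mixes 2-mers such as "CG", 3-mers such as "ACT" and 4-mers such as "ATAT". This illustrates why isothermic oligonucleotides have different lengths.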

11.
The protein folding problem, i.e., the prediction of the tertiary structures of protein molecules from their amino acid sequences, is one of the most important problems in computational biology and biochemistry. However, the extremely difficult optimization problem arising from the energy function is a key challenge in protein folding simulation. The energy landscape paving (ELP) method has been applied very successfully to off-lattice protein models and to other optimization problems with complex energy landscapes in continuous space. By improving the ELP method and then incorporating into it a neighborhood strategy based on the pull-move set, we propose a heuristic ELP algorithm for finding low-energy conformations of 3D HP lattice model proteins in discrete space. The algorithm is tested on three sets of 3D HP benchmark instances consisting of 31 sequences. For eleven sequences with 27 monomers, the proposed method explores the conformation surfaces more efficiently than other methods and finds new lower energies in several cases. For ten 48-monomer sequences, it finds the lowest energies reported so far, and the algorithm converges rapidly and efficiently. For all ten 64-monomer sequences, the algorithm finds lower energies than previous methods within comparable computation times. Numerical results show that the heuristic ELP method is a competitive tool for protein folding simulation in the 3D lattice model. To the best of our knowledge, this is the first application of ELP to 3D discrete space.
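The objective that the heuristic ELP method minimizes is the standard HP-model energy: −1 for every pair of H residues that are lattice neighbours but not chain neighbours. The sketch below computes it on the 3D cubic lattice and validates that the conformation is a self-avoiding walk. ELP and the pull-move set are not shown, and the function name is hypothetical.

```python
def hp_energy(seq: str, coords) -> int:
    """Energy of a 3D cubic-lattice HP conformation: -1 per non-bonded H-H contact."""
    if len(seq) != len(coords):
        raise ValueError("one lattice site per residue is required")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for a, b in zip(coords, coords[1:]):
        if sum(abs(p - q) for p, q in zip(a, b)) != 1:
            raise ValueError("consecutive residues must occupy adjacent lattice sites")

    index = {site: i for i, site in enumerate(coords)}
    energy = 0
    for i, (x, y, z) in enumerate(coords):
        if seq[i] != "H":
            continue
        # Look only in the positive directions so each contact is counted once.
        for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            j = index.get((x + dx, y + dy, z + dz))
            if j is not None and seq[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy
```

For example, "HPPH" folded into a unit square has one H–H contact and an energy of −1, while a straight chain has no contacts.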

12.
13.
The 3-SAT problem is NP-complete, and many algorithms based on DNA computing have been proposed for solving it since Adleman's pioneering work. This paper presents a new algorithm based on the literal string strategy proposed by Sakamoto et al. Simulation results show that the maximum number of literal strings produced during the computation is greatly reduced. Moreover, the length of the literal strings is reduced from m to at most n.

14.
A new biomimetic algorithm, the Chemical Ant Colony Algorithm, has been developed; it features intelligent search, global optimization, robustness, distributed computation and easy combination with other heuristics. The proposed method has been successfully applied to the spectroscopic analysis of the Zn2+, Cd2+, Pb2+–porphyrin tribasic color system with supramolecular properties, with errors within ±8.0%.

15.
Protein structure prediction is a fundamental issue in computational molecular biology. In this paper, the AB off-lattice model is adopted to transform protein structure prediction into a numerical optimization problem. We present a balance-evolution artificial bee colony (BE-ABC) algorithm to address the problem, with the aim of finding the structure with the minimal free energy for a given protein sequence. This is achieved by using convergence information during the optimization process to adaptively adjust the search intensity. In addition, an overall degradation procedure is introduced as part of the BE-ABC algorithm to prevent premature convergence. Comprehensive simulation experiments on the well-known artificial Fibonacci sequence set and several real sequences from the Protein Data Bank were carried out to compare the performance of BE-ABC against other algorithms. Our numerical results show that the BE-ABC algorithm outperforms many state-of-the-art approaches and can be employed effectively for protein structure optimization.

16.
Several efficient correspondence graph-based algorithms for determining the maximum common substructure (MCS) of a pair of molecules have been published in the literature. The extension of the problem to three or more molecules is, however, nontrivial: heuristics used to increase efficiency in the two-molecule case are either inapplicable to the many-molecule case or do not provide significant speedups. Our specific algorithmic contribution is twofold. First, we show how the correspondence graph approach for the two-molecule case can be generalized to obtain an algorithm that is guaranteed to find the optimum connected MCS of multiple molecules and that runs fast on most families of molecules, using a new divide-and-conquer strategy not previously reported in this context. Second, we characterize the compound families for which the algorithm might run slowly and provide a heuristic for speeding up computations on these families. We also extend the algorithm to a heuristic algorithm for finding the disconnected MCS of multiple molecules and to an algorithm for clustering molecules into groups, each sharing a substantial MCS. Our methods are flexible in that they provide exquisite control over the various matching criteria used to define a common substructure.

17.
Liangliang Liu. Tetrahedron, 2008, 64(25): 5885–5890
Expansion of DNA repeat sequences is associated with many human genetic diseases. Bulged DNA structures have been implicated as intermediates in DNA slippage within DNA repeat regions. Owing to the wedge-shaped structure of 1,1′-bi-2-naphthol, two new binaphthol aminosugars were synthesized for the first time as DNA bulge binders to study triplet repeat expansion. Both compounds were structurally characterized by 1D and 2D NMR. They showed remarkable fluorescence enhancement upon binding to bulged DNA and stimulated slippage synthesis of the ATT·AAT trinucleotide repeat DNA sequence.

18.
The docking procedure between inhibitors and a protein is a very sophisticated optimization problem. It is very difficult to carry out the minimization with gradient methods such as the steepest descent method or the Gauss-Newton method, which very easily fall into local potential wells and find it very difficult to escape from them. Some heuristic methods have therefore been introduced into studies of molecular recognition. How to choose an adequate optimization method for the docking procedure is critical to the calculation results. In this paper, we describe the implementation and compari…

19.
A common requirement in conformational analysis is the identification of a molecule's lowest energy conformations. The application of the A* algorithm to this problem is examined. The algorithm uses heuristic information about the problem domain to direct the search and has been implemented in a system for performing automated conformational analysis. The method is detailed and sample results are presented. Some limitations of the approach are identified.

20.
An automated NMR chemical shift assignment algorithm was developed using multi-objective optimization techniques. The problem is modeled as a combinatorial optimization problem, and its objective parameters are defined separately in different score functions. Several heuristic approaches from evolutionary optimization are employed in this problem model. Both a conventional genetic algorithm and multi-objective methods, i.e., the non-dominated sorting genetic algorithms II and III (NSGA2 and NSGA3), are applied to the problem. The multi-objective approaches consider each objective parameter separately, whereas the genetic algorithm follows the conventional approach of combining all objectives into one score function. Several improvement steps and repetitions of these algorithms are performed, and their combinations are also created as a hyper-heuristic approach to the problem. Additionally, a hill-climbing algorithm is applied after the evolutionary algorithm steps. The algorithms are tested on several datasets with a set of 11 commonly used spectra. The test results show that our algorithm can assign both side-chain and backbone atoms fully automatically, without any manual interaction. Our approaches achieve a success rate of around 65% and can assign some atoms that could not be assigned by other methods.
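The difference between the two strategies comes down to Pareto dominance:

- NSGA2/NSGA3 rank candidate assignments with Pareto dominance;
- the conventional GA collapses all objectives into one score.

The sketch below implements NSGA-II-style fast non-dominated sorting. It assumes every score function is to be minimized. Crowding distance and NSGA3's reference points are omitted, and the function names are hypothetical.

```python
def dominates(a, b) -> bool:
    """a dominates b if it is no worse in every objective and strictly better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_fronts(scores):
    """Fast non-dominated sorting: return Pareto fronts as lists of indices into scores."""
    n = len(scores)
    beaten = [[] for _ in range(n)]  # beaten[i]: solutions that i dominates
    dominated_count = [0] * n        # number of solutions dominating i
    fronts = [[]]
    for i in range(n):
        for j in range(n):
            if dominates(scores[i], scores[j]):
                beaten[i].append(j)
            elif dominates(scores[j], scores[i]):
                dominated_count[i] += 1
        if dominated_count[i] == 0:
            fronts[0].append(i)
    k = 0
    while fronts[k]:
        next_front = []
        for i in fronts[k]:
            for j in beaten[i]:
                dominated_count[j] -= 1
                if dominated_count[j] == 0:
                    next_front.append(j)
        k += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front
```

With scores [(1, 2), (2, 1), (2, 2), (3, 3)], the first two are mutually non-dominated and form the first front. A weighted sum, by contrast, would have to choose between them using arbitrary weights.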


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP No. 09084417