Similar literature
1.
For biological applications, sequence alignment is an important strategy for analyzing DNA and protein sequences. Multiple sequence alignment is an essential methodology for studying biological data, with applications such as homology modeling and phylogenetic reconstruction. However, multiple sequence alignment is an NP-hard problem. In past decades, the progressive approach was proposed to align multiple sequences successfully through iterative pairwise alignments. Owing to the rapid growth of next-generation sequencing technologies, a large number of sequences can be produced in a short period of time. When the problem instance is large, progressive alignment becomes time consuming. Parallel computing is a suitable solution for such applications, and the GPU is one of the important architectures in contemporary parallel computing research. Therefore, in this work we propose a GPU version of ClustalW v2.0.11, called CUDA ClustalW v1.0. The experimental results show that CUDA ClustalW v1.0 achieves more than 33× speedup in overall execution time compared with ClustalW v2.0.11.
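
The pairwise stage mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a plain Needleman-Wunsch global alignment score with a linear gap penalty and made-up scoring values; ClustalW itself uses substitution matrices and affine gap penalties, so this is not its scoring scheme. The function names are hypothetical. Every pair of sequences is scored independently, which is why this stage lends itself to parallel execution.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not ClustalW defaults.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def pairwise_score_matrix(seqs):
    """Score every pair of sequences; each pair is independent of the others."""
    n = len(seqs)
    scores = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = global_alignment_score(seqs[i], seqs[j])
            scores[i][j] = scores[j][i] = s
    return scores
```

For N sequences the matrix needs N(N-1)/2 independent alignments, so the work grows quadratically with the number of sequences.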

2.
Multiple sequence alignment is a basic tool in computational genomics. The art of multiple sequence alignment is about placing gaps. This paper presents a heuristic algorithm that iteratively improves multiple protein sequence alignments. A consistency-based objective function is used to evaluate the candidate moves. During the iterative optimization, well-aligned regions can be detected and kept intact. Columns of gaps are inserted to help the algorithm escape from locally optimal alignments. The algorithm has been evaluated using the BAliBASE benchmark alignment database. Results show that the performance of the algorithm depends little on the initial or seed alignments. Given a perfect consistency library, the algorithm is able to produce alignments that are close to the global optimum. We demonstrate that the algorithm is able to refine alignments produced by other software, including ClustalW, SAGA and T-COFFEE. The program is available upon request.

3.
This study focuses on improving the multi-objective memetic algorithm for protein–protein interaction (PPI) network alignment, Optimizing Network Aligner (OptNetAlign), via integration with other existing network alignment methods such as SPINAL, NETAL and HubAlign. The output of this algorithm is an elite set of aligned networks, all of which are optimal with respect to multiple user-defined criteria. However, OptNetAlign is an unsupervised genetic algorithm that initiates its search with completely random solutions, and it requires substantial running times to generate an elite set of solutions that score highly on the given criteria. To improve running time, the search space of the algorithm can be narrowed by focusing on well-qualified alignments and optimizing the most desired criteria on a more limited set of solutions. The method presented in this study improves OptNetAlign in a supervised fashion by utilizing the alignment results of different network alignment algorithms with varying parameters that depend upon user preferences. The user can therefore prioritize certain objectives over others and achieve better running time while still optimizing the secondary objectives.

4.
Nuclear magnetic resonance (NMR) analysis of complex samples, such as biofluid samples, is accompanied by variations in peak position and peak shape not directly related to the sample. This is due to variations in the background matrix of the sample and to instrumental instabilities. These variations complicate and limit the interpretation and analysis of NMR data by multivariate methods. Alignment of the NMR signals may circumvent these limitations and is an important preprocessing step prior to multivariate analysis. Previous alignment methods reduce the spectral resolution, are very computer-intensive for this kind of data (65k data points in one spectrum), or rely on peak detection. The method presented in this work requires neither data reduction nor preprocessing such as peak detection. The alignment is achieved by taking each segment of the spectrum individually, shifting it sideways, and linearly interpolating it to stretch or shrink until the best correlation with a corresponding reference spectrum segment is obtained. The segments are picked out automatically by a routine that avoids cutting through a peak, and the optimization is carried out by a genetic algorithm (GA). The peak alignment routine is applied to NMR metabonomic data.
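
The shift-and-stretch step described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: an exhaustive grid search stands in for the genetic algorithm, edges are clamped, and the names `warp` and `align_segment` are hypothetical rather than taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def warp(segment, shift, stretch):
    """Resample so that output[i] = segment((i - shift) / stretch).

    Values between sample points are linearly interpolated; positions
    outside the segment are clamped to its first or last point.
    """
    n = len(segment)
    out = []
    for i in range(n):
        pos = min(max((i - shift) / stretch, 0.0), n - 1.0)
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(segment[lo] * (1 - frac) + segment[hi] * frac)
    return out


def align_segment(segment, reference, shifts, stretches):
    """Grid search (a stand-in for the GA) over shift and stretch.

    Returns (best_correlation, shift, stretch).
    """
    best = None
    for s in shifts:
        for k in stretches:
            r = pearson(warp(segment, s, k), reference)
            if best is None or r > best[0]:
                best = (r, s, k)
    return best
```

A grid search is only practical for small parameter ranges; the abstract reports using a GA for the optimization.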

5.
Multiple Sequence Alignment (MSA) is a key task in bioinformatics because it is used in several important biological analyses, such as function and structure prediction of unknown proteins. There are several approaches to performing MSA, and metaheuristics stand out because of their search ability, which generally leads to good results in a reasonable amount of time. This paper presents a Systematic Literature Review (SLR) on metaheuristics for MSA, compiling relevant works published between 2014 and 2019. The results of our SLR show sustained interest in this subject, reflected in several recent publications that use different metaheuristics to obtain more accurate alignments. Moreover, the final results of our SLR show a trend toward multi-objective and hybrid approaches, which generally lead these methods to even better results. Thus, we show in this work that the use of metaheuristics to perform MSA remains an important and promising open research field.

6.
We propose an algorithm for global multiple sequence alignment that is based on a measure of what we call information discrepancy. The algorithm follows a progressive alignment iteration strategy that makes use of what we call a function of degree of disagreement (FDOD). MSAID begins with distance calculation for pairs of sequences, using FDOD as a numerical scoring measure. In the next step, the resulting distance matrix is used to construct a guide tree via the neighbor-joining method. The tree is then used to produce a multiple alignment. The current alignment is then used to produce a new matrix and a new tree (again with the FDOD scoring measure). This iterative process continues until convergence criteria (or a stopping rule) are satisfied. MSAID was tested and compared with prior methods using reference alignments from BAliBASE 2.01. For the alignments with no large N/C-terminal extensions or internal insertions, MSAID received the top overall average in the tests. Moreover, the test results indicate that MSAID performs as well as other alignment methods, with an occasional tendency to perform better than these prior techniques. We therefore believe that MSAID is a solid and reliable method of choice, which is often (if not always) superior to other global alignment techniques.
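
The guide-tree step in the abstract above relies on neighbor joining. A minimal sketch of one joining step is shown here, assuming the standard Q-criterion; the distance matrix in the usage note is an illustrative textbook-style example, not FDOD output, and the function names are hypothetical.

```python
def nj_pick_pair(d):
    """Return ((i, j), q): the pair minimising the neighbor-joining Q-criterion.

    Q(i, j) = (n - 2) * d[i][j] - sum_k d[i][k] - sum_k d[j][k]
    """
    n = len(d)
    totals = [sum(row) for row in d]
    best, best_q = None, None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - totals[i] - totals[j]
            if best_q is None or q < best_q:
                best, best_q = (i, j), q
    return best, best_q


def nj_merge(d, i, j):
    """Replace taxa i and j by their new parent node, appended as the last row.

    Distance from the new node u to any other taxon k:
    d(u, k) = (d[i][k] + d[j][k] - d[i][j]) / 2
    """
    keep = [k for k in range(len(d)) if k not in (i, j)]
    new_row = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
    reduced = [
        [d[a][b] for b in keep] + [new_row[idx]]
        for idx, a in enumerate(keep)
    ]
    reduced.append(new_row + [0.0])
    return reduced
```

Repeating pick-and-merge until three nodes remain yields the tree topology; in an iterative scheme like the one described, the matrix would be recomputed from the current alignment before each new tree is built.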

7.
We present a ligand-based virtual screening technique (PhAST) for rapid hit and lead structure searching in large compound databases. Molecules are represented as strings encoding the distribution of pharmacophoric features on the molecular graph. In contrast to other text-based methods using SMILES strings, we introduce a new form of text representation that describes the pharmacophore of molecules. This string representation opens the opportunity for revealing functional similarity between molecules by sequence alignment techniques in analogy to homology searching in protein or nucleic acid sequence databases. We favorably compared PhAST with other current ligand-based virtual screening methods in a retrospective analysis using the BEDROC metric. In a prospective application, PhAST identified two novel inhibitors of 5-lipoxygenase product formation with minimal experimental effort. This outcome demonstrates the applicability of PhAST to drug discovery projects and provides an innovative concept of sequence-based compound screening with substantial scaffold hopping potential.

8.
In this article, we describe a representation for the process of multiple sequence alignment (MSA) and use it to solve the MSA problem. With this representation, we take every possible alignment into account by defining the representation of gap insertion, the value of heuristic information on every optional path, and the scoring rule. On the basis of the proposed multidimensional graph, we use the ant colony algorithm to find a better path, which denotes a better alignment. We present instances of a three‐dimensional graph and a four‐dimensional graph and put forward a special ichnographic representation to analyze MSA. The software is still experimental, and we give an example of finding the best alignment using the three‐dimensional graph and the ant colony algorithm. Experimental results show that our method can improve the solution quality on MSA benchmarks. © 2009 Wiley Periodicals, Inc. J Comput Chem 2009

9.
In this paper, we present a simple and efficient whole genome alignment method using maximal exact matches (MEMs). The major problem with the use of MEM anchors is that the number of hits in non-homologous regions increases exponentially when shorter MEM anchors are used to detect more homologous regions. To deal with this problem, we have developed a fast and accurate anchor filtering scheme based on simple match extension with minimum percent identity and extension length criteria. Due to its simplicity and accuracy, all MEM anchors in a pair of genomes can be exhaustively tested and filtered. In addition, by incorporating the translation technique, the alignment quality and speed of our genome alignment algorithm have been further improved. As a result, our genome alignment algorithm, GAME (Genome Alignment by Match Extension), performs competitively with existing algorithms and can align large whole genomes, e.g., A. thaliana, without the large memory and parallel processors typically required. This is shown in an experiment that compares the performance of BLAST, BLASTZ, PatternHunter, MUMmer and our algorithm in aligning all 45 pairs of 10 microbial genomes. The scalability of our algorithm is shown in another experiment in which all pairs of the five chromosomes of A. thaliana were compared.
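
The anchor filtering idea in the abstract above can be illustrated with a minimal sketch. It assumes an ungapped extension on both flanks of an exact match and illustrative thresholds; the function names, parameters and default values are hypothetical and are not taken from GAME.

```python
def extension_identity(a, b, pos_a, pos_b, length, ext_len):
    """Extend an exact match without gaps by up to ext_len on each side.

    The match covers a[pos_a:pos_a+length] and b[pos_b:pos_b+length].
    Returns (identity of the flanking bases, number of flanking bases compared).
    """
    matches = 0
    compared = 0
    # Left flank, walking away from the match.
    for k in range(1, ext_len + 1):
        i, j = pos_a - k, pos_b - k
        if i < 0 or j < 0:
            break
        compared += 1
        matches += int(a[i] == b[j])
    # Right flank, walking away from the match.
    for k in range(ext_len):
        i, j = pos_a + length + k, pos_b + length + k
        if i >= len(a) or j >= len(b):
            break
        compared += 1
        matches += int(a[i] == b[j])
    identity = matches / compared if compared else 0.0
    return identity, compared


def keep_anchor(a, b, anchor, ext_len=20, min_identity=0.7, min_extension=20):
    """Keep an anchor only if its flanks are long enough and similar enough."""
    pos_a, pos_b, length = anchor
    identity, compared = extension_identity(a, b, pos_a, pos_b, length, ext_len)
    return compared >= min_extension and identity >= min_identity
```

Because each test is a short linear scan, every anchor can be checked exhaustively; spurious short matches in non-homologous regions tend to fail the identity criterion on their flanks.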

10.
The preprocessing of chromatograms, such as the alignment of retention time shifts, is often a crucial step in the data analysis chain. Here, an efficient approach to aligning shifted chromatographic signals, longest distance shifting, is presented. The performance of this novel strategy was demonstrated using both simulated chromatograms that covered the different kinds of retention time shifts and real experimental chromatograms of Pudilan Xiaoyan Tablets obtained by high‐performance liquid chromatography with photodiode array detection. The averaged correlation coefficients for the experimental chromatograms were in the range of 0.9517–0.9840 and the peak factor was 0.9989. For comparison, all the chromatograms were also aligned using the correlation optimized warping and Interval Correlation Optimized Shifting algorithms. The results indicate that the longest distance shifting algorithm is simpler, faster and more effective, and is potentially suitable for the alignment of other types of signals.

11.
Highly efficient and rapid proteolytic digestion of proteins into peptides is a crucial step in the shotgun-based proteome-analysis strategy. Tandem digestion by two or more proteases has been shown to increase digestion efficiency and decrease missed cleavages, which results in more peptides that are compatible with mass-spectrometry analysis. Compared to conventional solution digestion, immobilized protease digestion has the obvious advantages of short digestion time, no self-proteolysis, and reusability. We propose a multiple-immobilized-proteases digestion strategy that combines the advantages of the two digestion strategies mentioned above. Graphene-oxide (GO)-based immobilized trypsin and endoproteinase Glu-C were prepared by covalently attaching them to the GO surface. The prepared GO-trypsin and GO-Glu-C were successfully applied in standard protein digestion and in multiple-immobilized-proteases digestion of the total proteins of Thermoanaerobacter tengcongensis. Compared to 12-hour solution digestion using trypsin or Glu-C, improvements of 14% and 7%, respectively, were obtained in the sequence coverage of BSA by one-minute digestion using GO-trypsin and GO-Glu-C. Multiple-immobilized-proteases digestion of the total proteins of Thermoanaerobacter tengcongensis showed 24.3% and 48.7% increases in the number of identified proteins over those obtained using GO-trypsin or GO-Glu-C alone. The ultra-fast and highly efficient digestion can be attributed to the high loading capacity of protease on GO, which leads to fewer missed cleavages and more complete digestion. As a result, improved protein identification and sequence coverage can be expected.

12.
We introduce the online server for PRALINE (http://ibium.cs.vu.nl/programs/pralinewww/), an iterative, versatile, progressive multiple sequence alignment (MSA) tool. PRALINE provides various MSA optimisation strategies, including weighted global and local profile pre-processing, secondary structure-guided alignment and a reliability measure for individual aligned residue positions. The latter can also be used to optimise the alignment when the profile pre-processing strategies are iterated. In addition, we have modelled the server output to enable comprehensive visualisation of the generated alignment and easy figure generation for publications. Depending on the options set, the alignment is represented in five default colour schemes based on residue type, position conservation, position reliability, residue hydrophobicity and secondary structure. We have also implemented a custom colour scheme that allows the user to select which colour will represent one or more amino acids in the alignment. The grouping of sequences on which the alignment is based can also be visualised as a dendrogram. The PRALINE algorithm is designed to work as a toolkit for MSA rather than as a one-step process.

13.
Protein phosphorylation is a post-translational modification performed by a group of enzymes known as the protein kinases or phosphotransferases (Enzyme Commission classification 2.7). It is essential to the correct functioning of both proteins and cells, being involved in enzyme control, cell signalling and apoptosis. The major problem when attempting to predict these sites is the broad substrate specificity of the enzymes. This study employs back-propagation neural networks (BPNNs), the decision tree algorithm C4.5 and the reduced bio-basis function neural network (rBBFNN) to predict phosphorylation sites. The aim is to compare the prediction efficiency of the three algorithms for this problem and to examine their knowledge extraction capability. All three algorithms are effective for phosphorylation site prediction. Results indicate that rBBFNN is the fastest and most sensitive of the algorithms. BPNN has the highest area under the ROC curve and is therefore the most robust, and C4.5 has the highest prediction accuracy. C4.5 also reveals that the amino acid 2 residues upstream from the phosphorylation site is important for serine/threonine phosphorylation, whilst the amino acid 3 residues upstream is important for tyrosine phosphorylation.

14.
The design and application of a scaffolding ligand that promotes branch and diastereoselective hydroformylation of terminal olefins as well as the regio- and diastereoselective hydroformylation of disubstituted olefins is reported. It is shown that the ligand covalently and reversibly bonds to the substrate, allowing for directed hydroformylation. As the substrate ligand interaction is dynamic, hydroformylations are catalytic in ligand and do not require any additional synthetic steps to add or remove the directing group. Using a catalytic quantity of a scaffolding ligand (20-25 mol %), excellent regioselectivity for disubstituted olefins (up to 98:2) and high branch selectivity (up to 88:12) for terminal olefins were obtained.

15.
GCALIGNER 1.0 is a computer program designed to produce a preliminary data comparison matrix from chemical data obtained by GC without MS information. The alignment algorithm is based on comparing the retention times of each detected compound in a sample. In this paper, we test the efficiency of GCALIGNER on three datasets of the chemical secretions of bumble bees. The algorithm performs the alignment with a low error rate (<3%). GCALIGNER 1.0 is a useful, simple and free program based on an algorithm that enables the alignment of table‐type data from GC.
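
To make the idea of retention-time-based alignment concrete, here is a minimal sketch. It is not GCALIGNER's algorithm, which the abstract does not specify; it assumes a simple greedy grouping in which peaks from different samples share a row when their retention times lie within a tolerance of the row's running mean. The function name, data layout and tolerance are hypothetical.

```python
def align_by_retention_time(samples, tolerance=0.05):
    """Build a comparison table from per-sample peak lists.

    samples: dict mapping sample name -> list of (retention_time, area).
    Returns a list of (mean_retention_time, {sample_name: area}) rows,
    ordered by retention time.
    """
    peaks = sorted(
        (rt, name, area)
        for name, peak_list in samples.items()
        for rt, area in peak_list
    )
    rows = []
    for rt, name, area in peaks:
        if rows:
            last = rows[-1]
            mean_rt = sum(last["rts"]) / len(last["rts"])
            # Join the current row only if the peak is close enough and this
            # sample has not already contributed a peak to the row.
            if abs(rt - mean_rt) <= tolerance and name not in last["areas"]:
                last["rts"].append(rt)
                last["areas"][name] = area
                continue
        rows.append({"rts": [rt], "areas": {name: area}})
    return [(sum(r["rts"]) / len(r["rts"]), r["areas"]) for r in rows]
```

Each row of the result corresponds to one putative compound, and a sample missing from a row's dictionary did not show a peak near that retention time.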

16.
Summary: A straightforward and simple strategy for the selection of the two systems in two-dimensional, high-performance thin-layer chromatography (HPTLC) is presented. The choice is based on the absolute values of the correlation matrix elements. The response function, expressed as a percentage, is also based on the correlation coefficient. The applicability and usefulness of this approach are demonstrated by the separation of fourteen local anesthetics. After identification, one-dimensional HPTLC can be used for quantitation by UV reflectance scanning at the optimum wavelength.
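
The correlation-based choice in the summary above can be sketched as follows. The sketch assumes that the preferred pair of systems is the one whose retention values are least correlated (smallest absolute Pearson coefficient), which is the usual reading for two-dimensional separations but is not stated explicitly in the summary. The function names and data are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def least_correlated_pair(rf):
    """Pick the two systems whose retention values are least correlated.

    rf: dict mapping system name -> list of retention values, with the
    compounds listed in the same order for every system.
    Returns ((system_1, system_2), correlation).
    """
    pair = min(
        combinations(sorted(rf), 2),
        key=lambda p: abs(pearson(rf[p[0]], rf[p[1]])),
    )
    return pair, pearson(rf[pair[0]], rf[pair[1]])
```

Two systems with |r| near 1 order the compounds in almost the same way, so running them in sequence would spread the spots along a diagonal rather than across the plate.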

17.
A novel method for the efficient discovery of new types of minor actinide (MA) ligands is based on the unique combination of "tea bag" split pool combinatorial chemistry and screening based on the inherent radioactivity of the complexed cations. Four multicoordinating Am³⁺ chelating groups, namely CMPO ((diphenylcarbamoylmethyl)phosphine oxide), PICO (picolinamide), DGA (N,N'-dimethyldiglycoldiamide), and MPMA (N-methyl-N-phenylmalonamide), on a trityl platform immobilized on TentaGelS served as a model library for the development of the screening method. This model library was screened under various conditions (i.e., 0.001 M ≤ [HNO3] ≤ 3 M, [NaNO3] ≤ 4 M, and [Eu] ≤ 10 × [ligand]), showing competitive extraction of the four ligands. Other libraries of 9 and 72 members were synthesized by functionalization of the trityl platform with ligating groups that are composed of four building blocks (including at least one amide and one (phosphoric) hydrazone moiety). The screening of these two libraries resulted in the discovery of two multicoordinate ligands that contain ligating groups previously not known to complex Am³⁺. Both are N-isopropyl amides terminated with a p-methoxyphenyl hydrazide (A2B1C1D10, K_D(Am) = 2197) or a p-nitrophenyl hydrazide (A2B1C1D11, K_D(Am) = 1989) moiety, respectively. They are more efficient than the immobilized tritylCMPO ligand (K_D(Am) = 1280) at 3 M HNO3. This method has the advantages of high analytical sensitivity and direct comparison of the extraction results. The method also allows the competitive screening of multiple nuclides, which can be quantified by their radioactive emission spectrum.

18.
The Interval Correlation Optimised Shifting algorithm (icoshift) has recently been introduced for the alignment of nuclear magnetic resonance spectra. The method is based on an insertion/deletion model to shift intervals of spectra/chromatograms and relies on an efficient Fast Fourier Transform based computation core that allows the alignment of large data sets in a few seconds on a standard personal computer. The potential of this programme for the alignment of chromatographic data is outlined, with a focus on the model used for the correction function. The efficacy of the algorithm is demonstrated on a chromatographic data set with 45 chromatograms of 64,000 data points. Computation time is significantly reduced compared to the Correlation Optimised Warping (COW) algorithm, which is widely used for the alignment of chromatographic signals. Moreover, icoshift performed better than COW in terms of alignment quality (viz. simplicity and peak factor), without the need for the computationally expensive optimisation of warping meta-parameters required by COW. Principal component analysis (PCA) is used to show how a significant reduction in data complexity was achieved, improving the ability to highlight chemical differences amongst the samples.
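
The core operation described in the abstract above, finding the integer shift of an interval that maximises its cross-correlation with a reference, can be sketched as follows. This is an illustration only: a direct loop replaces the FFT-based computation, padding with the edge value is an assumed choice for the insertion/deletion step, and the function names are hypothetical rather than taken from icoshift.

```python
def best_shift(signal, reference, max_shift):
    """Return the shift s in [-max_shift, max_shift] maximising
    sum_i signal[i - s] * reference[i] over the overlapping indices.

    Positive s moves the signal to the right. Ties go to the smallest s.
    """
    n = len(reference)
    best_s, best_score = 0, None
    for s in range(-max_shift, max_shift + 1):
        score = sum(
            signal[i - s] * reference[i]
            for i in range(max(0, s), min(n, n + s))
        )
        if best_score is None or score > best_score:
            best_s, best_score = s, score
    return best_s


def apply_shift(signal, s):
    """Shift by s points, keeping the length fixed.

    Points pushed past one end are deleted and the gap at the other end
    is filled with the nearest edge value (an assumed padding choice).
    """
    n = len(signal)
    if s > 0:
        return [signal[0]] * s + list(signal[: n - s])
    if s < 0:
        return list(signal[-s:]) + [signal[-1]] * (-s)
    return list(signal)
```

The direct loop costs on the order of n × max_shift operations per interval; computing the same cross-correlation with an FFT brings this down to roughly n log n, which is the source of the speed reported in the abstract.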

19.
The effects of four solvents (hexane, dichloromethane, ethyl acetate and methanol) and their mixtures on the separation of metabolites in crude extracts of Erythrina speciosa Andrews leaves were investigated using two strategies for open column chromatography. The classical extraction procedure was compared with mobile phases prepared according to a mixture design in order to explore the effects of solvent interactions on metabolite separations. Principal component analysis was used to compare the UV spectra obtained from RP-HPLC-DAD and to estimate the number of independent factors contained in the chromatographic data of the extracts. The results showed that, in addition to solvent polarity, solvent mixtures play an important role in metabolite separation. When pure solvents are used, larger groups of similar spectra are observed in the factor analysis score graphs, indicating the same or a limited number of metabolite classes. In contrast, solvent mixtures produced score graphs with a larger number of clusters, indicating greater metabolic diversity. Besides resulting in more peaks than the pure solvents, the chromatographic data of the design mixtures resulted in larger numbers of significant principal components, confirming the greater chemical diversity of their extracts. Thus, if the objective of an analysis is to obtain metabolites of the same class, one should use pure solvents. On the other hand, binary and ternary solvent mixtures are recommended for more efficient investigations of class diversity and richer metabolite fingerprints.

20.