Similar literature
 Found 20 similar documents; search took 31 ms
1.
With the accelerated accumulation of genomic sequence data, there is a pressing need to develop computational methods and advanced bioinformatics infrastructure for reliable and large-scale protein annotation and biological knowledge discovery. The Protein Information Resource (PIR) provides an integrated public resource of protein informatics to support genomic and proteomic research. PIR produces the Protein Sequence Database of functionally annotated protein sequences. The annotation problems are addressed by a classification-driven and rule-based method with evidence attribution, coupled with an integrated knowledge base system being developed. The approach allows sensitive identification, consistent and rich annotation, and systematic detection of annotation errors, as well as distinction of experimentally verified and computationally predicted features. The knowledge base consists of two new databases, sequence analysis tools, and graphical interfaces. PIR-NREF, a non-redundant reference database, provides a timely and comprehensive collection of all protein sequences, totaling more than 1,000,000 entries. iProClass, an integrated database of protein family, function, and structure information, provides extensive value-added features for about 830,000 proteins with rich links to over 50 molecular databases. This paper describes our approach to protein functional annotation with case studies and examines common identification errors. It also illustrates that data integration in PIR supports exploration of protein relationships and may reveal protein functional associations beyond sequence homology.

2.
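Li Xun, Wang Renxiao. Chinese Journal of Chemistry (《中国化学》), 2009, 27(1): 23-28
We developed a method named KIAb (Keyword-based Identification of Antibodies) for automatically identifying antibody structures in the Protein Data Bank (PDB). The method reads PDB-format files, searches them for specific antibody-related keywords, and makes a decision. Using this method, we identified 780 structure files in the PDB; manual inspection confirmed that 767 of them are antibodies, a success rate of 98.3%. The results cover essentially all entries in the antibody structure database Summary of Antibody Crystal Structures (SACS) and also include 34 antibody structures that SACS does not list. The method therefore identifies antibodies in the PDB more completely and has a very low false-positive rate.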
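
As a rough illustration of keyword-based screening of this kind, the sketch below scans the annotation records of a PDB-format file for antibody-related terms. The keyword list, the choice of record types, and the function names are assumptions made for the example; they are not the published KIAb rules, which the abstract does not spell out. The last line simply re-derives the reported success rate from 767 confirmed antibodies among 780 flagged files.

```python
# Hypothetical keyword list; the actual KIAb keywords are not given in the abstract.
ANTIBODY_KEYWORDS = ("ANTIBODY", "IMMUNOGLOBULIN", "FAB FRAGMENT", "FV FRAGMENT")

# PDB record types that carry free-text annotation (an assumption about where to look).
TEXT_RECORDS = ("HEADER", "TITLE", "COMPND", "KEYWDS")


def looks_like_antibody(pdb_text: str) -> bool:
    """Return True if any annotation record mentions an antibody keyword."""
    for line in pdb_text.splitlines():
        if line.startswith(TEXT_RECORDS):
            upper = line.upper()
            if any(keyword in upper for keyword in ANTIBODY_KEYWORDS):
                return True
    return False


def success_rate(confirmed: int, flagged: int) -> float:
    """Percentage of flagged entries that manual inspection confirmed."""
    return 100.0 * confirmed / flagged


example = (
    "HEADER    IMMUNE SYSTEM\n"
    "TITLE     CRYSTAL STRUCTURE OF AN ANTIBODY FAB FRAGMENT\n"
)
print(looks_like_antibody(example))
print(round(success_rate(767, 780), 1))
```

A filter this simple would be expected to miss entries whose annotation never uses the listed terms, which is why a manual check of the flagged files, as reported in the abstract, remains useful.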

3.
Melting points and fusion enthalpies are predicted for a series of 81 compounds by combining experimental solubilities in a variety of solvents, analyzed according to the theory of mobile order and disorder (MOD), with the total phase change entropy estimated by a group additivity method. The error associated with predicting melting points depends on the magnitude of the temperature predicted. An error of ±12 K (±1σ) was obtained for compounds melting between ambient temperature and 350 K (24 entries). This error increased to ±23 K when the temperature range was expanded to 400 K (46 entries) and to ±39 K for the temperature range 298-555 K (79 entries). Fusion enthalpies were predicted within ±2σ of the experimental values (±6.4 kJ mol⁻¹) for 79 entries. The uncertainty in the fusion enthalpy did not appear to depend on the magnitude of the melting point. Two outliers, adamantane and camphor, have significant phase transitions that occur below room temperature. Estimates of melting temperature and fusion enthalpy for these compounds were characterized by significantly larger errors.
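
The link between solubility, fusion enthalpy, and melting point can be illustrated with the ideal-solution limit. The relations below are a simplified sketch under that assumption (no heat-capacity correction, a single fusion transition, symbols chosen here for illustration); the MOD treatment used in the paper involves additional terms that the abstract does not give.

```latex
\ln x^{\mathrm{ideal}} = -\frac{\Delta H_{\mathrm{fus}}}{R}\left(\frac{1}{T} - \frac{1}{T_{\mathrm{fus}}}\right),
\qquad
\Delta S_{\mathrm{fus}} = \frac{\Delta H_{\mathrm{fus}}}{T_{\mathrm{fus}}}
\quad\Longrightarrow\quad
T_{\mathrm{fus}} = T\left(1 - \frac{R \ln x^{\mathrm{ideal}}}{\Delta S_{\mathrm{fus}}}\right)
```

Here $x^{\mathrm{ideal}}$ is the mole-fraction solubility at temperature $T$ and $R$ is the gas constant. With an estimated entropy and a measured solubility, $T_{\mathrm{fus}}$ follows from the last expression and $\Delta H_{\mathrm{fus}} = T_{\mathrm{fus}}\,\Delta S_{\mathrm{fus}}$. A compound with solid-solid transitions below $T$ does not satisfy the single-transition assumption, which may be related to the larger errors reported for adamantane and camphor.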

4.
MODULEWRITER is a Perl object-relational mapping (ORM) tool that automatically generates database-specific application programming interfaces (APIs) for SQL databases. The APIs consist of a package of modules providing access to each table row and column. Methods for retrieving, updating and saving entries are provided, as well as other generally useful methods (such as retrieval of the highest numbered entry in a table). MODULEWRITER provides for the inclusion of user-written code, which can be preserved across multiple runs of the MODULEWRITER program.
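
The idea of deriving a table-specific API from the database schema can be sketched as follows. This is an illustration in Python with SQLite, not MODULEWRITER's generated Perl code: the function `make_table_class`, its method names, and the `sample` table are all hypothetical, and the class is built at run time rather than written out as module files.

```python
import sqlite3


def make_table_class(conn: sqlite3.Connection, table: str):
    """Build a small accessor class for one table by reading its column list.

    Table and column names are interpolated into SQL, so they must come from
    trusted schema metadata, never from user input.
    """
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

    class Row:
        def __init__(self, **values):
            for column in columns:
                setattr(self, column, values.get(column))

        def save(self):
            """Insert this row, or replace an existing row with the same key."""
            placeholders = ", ".join("?" for _ in columns)
            conn.execute(
                f"INSERT OR REPLACE INTO {table} ({', '.join(columns)}) "
                f"VALUES ({placeholders})",
                [getattr(self, column) for column in columns],
            )

        @classmethod
        def retrieve(cls, key_column, key):
            """Fetch one row by key, or None if it does not exist."""
            found = conn.execute(
                f"SELECT {', '.join(columns)} FROM {table} WHERE {key_column} = ?",
                (key,),
            ).fetchone()
            return cls(**dict(zip(columns, found))) if found else None

        @classmethod
        def highest(cls, column):
            """Return the largest value stored in a column."""
            return conn.execute(f"SELECT MAX({column}) FROM {table}").fetchone()[0]

    Row.__name__ = table.capitalize()
    return Row


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, name TEXT)")
Sample = make_table_class(conn, "sample")
Sample(id=1, name="first").save()
Sample(id=7, name="seventh").save()
print(Sample.retrieve("id", 7).name, Sample.highest("id"))
```

Generating source files, as the abstract describes, has the advantage that user-written additions can live alongside the generated code; a run-time class like the one above has no such place to keep them.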

5.
New data, tools and services recently made available on the web server (http://cactus.nci.nih.gov) of the Computer-Aided Drug Design (CADD) Group, NCI, NIH, developed in the context of chemoinformatics and drug development work, are presented. These tools are designed for searching for structures in very large databases of small molecules. One of them is a web service, the Chemical Structure Look-up Service (CSLS), for very rapid structure look-up in an aggregated collection of more than 80 databases comprising more than 27 million unique structures at the time of this writing. CSLS contains pointers to the entries in toxicology-related databases, catalogues of commercially available samples, drugs, assay results data sets, and databases in several other categories. CSLS allows the user to find out very rapidly in which one(s) of all these databases a given structure occurs, independent of the representation of the input structure, by making use of InChIs as well as new CACTVS hashcode-based identifiers. These latter, calculable identifiers are designed to take into account tautomerism, different resonance structures drawn for charged species, and the presence of additional fragments. They make possible fine-tunable yet rapid compound identification and database overlap analyses in very large compound collections.

6.
Protein identification methods in proteomics   Total citations: 30 (self-citations: 0, citations by others: 30)
A combination of high-resolution two-dimensional (2-D) polyacrylamide gel electrophoresis, highly sensitive biological mass spectrometry, and the rapidly growing protein and DNA databases has paved the way for high-throughput proteomics. This review concentrates on protein identification. We first discuss the use of protein electroblotting and Edman sequencing as tools for de novo sequencing and protein identification. In the second part, we highlight matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) as one of the main contemporary analytical methods for linking gel-separated proteins to entries in sequence databases. In this context we describe the two main MALDI-MS-based identification methods: (i) peptide mass fingerprinting, and (ii) post-source decay (PSD) analysis. In the last part, we briefly emphasize the importance of sample preparation for obtaining highly sensitive and high-quality MALDI-MS spectra.

7.
Considering the uncertainty of measurement when assessing compliance with reference values given in compositional specifications and statutory limits is still a controversial matter. In theory, assessing compliance requires considering both type I (false positive) and type II (false negative) errors. The closer the concentration of the analyte in the sample under investigation is to the allowed concentration limit, the more critical it is to consider both types of errors. This paper describes how this could be done. The matter is discussed in the light of the most recent literature information.
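
One common way to take both error types into account is to place a guard band on each side of the limit. The sketch below is a generic illustration of that idea, not the procedure proposed in the paper: it assumes an upper concentration limit, a normally distributed measurement with known standard uncertainty, and hypothetical names (`assess_upper_limit`, `alpha`, `beta`).

```python
from statistics import NormalDist


def assess_upper_limit(measured, std_uncertainty, limit, alpha=0.05, beta=0.05):
    """Classify a result against an upper limit using one-sided guard bands.

    alpha bounds the risk of declaring non-compliance for a compliant sample
    (false positive); beta bounds the risk of declaring compliance for a
    non-compliant sample (false negative).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(1 - beta)
    if measured - z_alpha * std_uncertainty > limit:
        return "non-compliant"
    if measured + z_beta * std_uncertainty < limit:
        return "compliant"
    # Too close to the limit to decide at the requested error rates.
    return "inconclusive"


for value in (10.0, 9.4, 9.0):
    print(value, assess_upper_limit(value, std_uncertainty=0.2, limit=9.5))
```

The width of the inconclusive zone grows with the standard uncertainty, which reflects the point made in the abstract: results near the limit are the ones for which both error types matter most.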

8.
High-throughput DNA sequencing has resulted in increasing input in protein sequence databases. Today more than 20 genomes have been sequenced and many more will be completed in the near future, including the largest of them all, the human genome. Presently, sequence databases contain entries for more than 425,000 protein sequences. However, the cellular functions are determined by the set of proteins expressed in the cell: the proteome. Two-dimensional gel electrophoresis, mass spectrometry and bioinformatics have become important tools in correlating the proteome with the genome. The current dominant strategies for identification of proteins from gels, based on peptide mass spectrometric fingerprinting and partial sequencing by mass spectrometry, are described. After identification of the proteins, the next challenge in proteome analysis is characterization of their post-translational modifications. The general problems associated with characterizing these directly from gel-separated proteins are described, and the current state of the art for the determination of phosphorylation, glycosylation and proteolytic processing is illustrated.

9.
Protein-protein interactions are fundamental in mediating biological processes including metabolism, cell growth, and signaling. To be able to selectively inhibit or induce protein activity or complex formation is a key feature in controlling disease. For those situations in which protein-protein interactions derive substantial affinity from short linear peptide sequences, or motifs, we can develop search algorithms for peptidomimetic compounds that resemble the short peptide's structure but are not compromised by poor pharmacological properties. SAAMCO is a Web service (http://bioware.ucd.ie/~saamco) that facilitates the screening of motifs with known structures against bioactive compound databases. It is built on an algorithm that defines compound similarity based on the presence of appropriate amino acid side chain fragments and a favorable Root Mean Squared Deviation (RMSD) between compound and motif structure. The methodology is efficient because the available compound databases are preprocessed and fast regular expression searches filter potential matches before time-intensive 3D superposition is performed. The required input information is minimal, and the compound databases have been selected to maximize the availability of information on biological activity. "Hits" are accompanied by a visualization window and links to source database entries. Motif matching can be defined on partial or full similarity, which will increase or reduce, respectively, the number of potential mimetic compounds. The Web server provides the functionality for rapid screening of known or putative interaction motifs against prepared compound libraries using a novel search algorithm. The tabulated results can be analyzed by linking to appropriate databases and by visualization.
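
The two-stage idea described here, a cheap text filter followed by a geometric comparison, can be sketched as below. Everything specific is an assumption for illustration: the encoding of each compound as a string of one-letter side-chain codes, the function names, and the example data. The RMSD function expects paired points that are already superposed; it does not perform the 3D superposition itself.

```python
import math
import re


def prefilter(fragment_strings, pattern):
    """Keep only compounds whose fragment string matches the motif pattern."""
    compiled = re.compile(pattern)
    return [
        name
        for name, fragments in fragment_strings.items()
        if compiled.search(fragments)
    ]


def rmsd(coords_a, coords_b):
    """Root mean squared deviation between paired, pre-aligned (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical preprocessed library: compound name -> side-chain fragment codes.
library = {"cmpd_a": "FLW", "cmpd_b": "GGA"}
candidates = prefilter(library, r"F.W")
print(candidates)
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))
```

Running the regular expression first means the more expensive geometric step only has to be carried out for the compounds that survive the filter.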

10.
11.
High-throughput DNA sequencing has resulted in increasing input in protein sequence databases. Today more than 20 genomes have been sequenced and many more will be completed in the near future, including the largest of them all, the human genome. Presently, sequence databases contain entries for more than 425,000 protein sequences. However, the cellular functions are determined by the set of proteins expressed in the cell: the proteome. Two-dimensional gel electrophoresis, mass spectrometry and bioinformatics have become important tools in correlating the proteome with the genome. The current dominant strategies for identification of proteins from gels, based on peptide mass spectrometric fingerprinting and partial sequencing by mass spectrometry, are described. After identification of the proteins, the next challenge in proteome analysis is characterization of their post-translational modifications. The general problems associated with characterizing these directly from gel-separated proteins are described, and the current state of the art for the determination of phosphorylation, glycosylation and proteolytic processing is illustrated. Received: 16 December 1999 / Accepted: 17 December 1999

12.
Tandem mass spectrometry (MS/MS) has been widely used in proteomics studies. Multiple algorithms have been developed for assessing matches between MS/MS spectra and peptide sequences in databases. However, it is still a challenge to reduce false negative rates without compromising the high confidence of peptide identification. In this study, we developed a score, Oscore, by logistic regression using SEQUEST and AMASS variables to identify fully tryptic peptides. Since these variables showed complicated associations with each other, combining them rather than applying them to a threshold model improved the classification of correct and incorrect peptide identifications. Oscore achieved both a lower false negative rate and a lower false positive rate than PeptideProphet on datasets from 18 known protein mixtures and several proteome-scale samples of different complexity, database size and separation methods. By a three-way comparison among Oscore, PeptideProphet and another logistic regression model which made use of PeptideProphet's variables, the main contributor to the improvement made by Oscore is discussed.
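
To show what combining several search variables in a logistic model looks like, compared with thresholding each one separately, here is a minimal sketch of the scoring step only. The feature names, weights, and intercept are invented for the example; the fitted Oscore coefficients and its full variable set are not given in the abstract, and no model fitting is performed here.

```python
import math


def logistic_score(features, weights, intercept):
    """Probability-like score from a weighted sum of search-engine variables."""
    linear = intercept + sum(
        weights[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients; a real model would estimate them from labelled matches.
weights = {"xcorr": 1.2, "delta_cn": 4.0}
intercept = -4.0

strong_match = {"xcorr": 3.0, "delta_cn": 0.25}
weak_match = {"xcorr": 1.5, "delta_cn": 0.05}
print(logistic_score(strong_match, weights, intercept))
print(logistic_score(weak_match, weights, intercept))
```

Because the score depends on the weighted sum, a match that falls slightly short on one variable can still score well if another variable is strong, which a set of independent per-variable thresholds cannot express.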

13.
A new approach is developed for estimating the limit of detection in second-order bilinear calibration with the generalized rank annihilation method (GRAM). The proposed estimator is based on recently derived expressions for prediction variance and bias. It follows the latest IUPAC recommendations in the sense that it concisely accounts for the probabilities of committing both type I and type II errors, i.e. false positive and false negative declarations, respectively. The estimator has been extensively validated with simulated data, yielding promising results.
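
For orientation, the univariate form of a detection limit that accounts for both error types can be written as below. This is the generic textbook expression, offered as an assumption about the underlying idea; the GRAM-specific estimator, which relies on the paper's own variance and bias expressions, is not reproduced in the abstract.

```latex
L_D = z_{1-\alpha}\,\sigma_0 + z_{1-\beta}\,\sigma_D
```

Here $\sigma_0$ is the standard deviation of the estimated concentration when the analyte is absent, $\sigma_D$ is the standard deviation at the detection limit, and $z_{1-\alpha}$, $z_{1-\beta}$ are the standard normal quantiles for the chosen false positive and false negative probabilities. With $\alpha = \beta = 0.05$ and a constant standard deviation, this reduces to $L_D \approx 3.29\,\sigma_0$.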

14.
Near infrared (NIR) reflectance spectroscopy was used to develop a non-destructive and rapid qualitative method for the analysis of plastic films used by the pharmaceutical industry for blistering. Three types of films were investigated: 250 μm PVC [poly(vinyl chloride)] films, 250 μm PVC films coated with 40 g m⁻² of PVDC [poly(vinylidene dichloride)] and 250 μm PVC films coated with 5 g m⁻² of TE (Thermoelast) and 90 g m⁻² of PVDC. Three analyses were carried out using different pre-treatment options and a PLS (partial least squares) algorithm. Each analysis was aimed at identifying one type of film and rejecting all types of false sample (different thickness, colour or layer). True and false samples from four plastics manufacturers were included in the calibration sets in order to obtain robust methods that were suitable regardless of the supplier. Specificity was demonstrated by testing validation sets against the methods. The tests showed 0% type I errors (false negative identification) and 1% type II errors (false positive identification) for the PVC method, 13% and 3%, respectively, for the PVC-PVDC method, and no errors for the PVC-TE-PVDC method. Type II errors, mostly due to the slight sensitivity of the methods to film thickness, are easily corrected by simple thickness measurements. This study demonstrates that NIR spectroscopy is an excellent tool for the identification of PVC-based films. The three methods can be used by the pharmaceutical industry or plastics manufacturers for the quality control of films used in blister packaging.

15.
Summary: A modular method for pursuing structure-based inhibitor design in the framework of a design cycle is presented. The approach entails four stages: (1) a design pathway is defined in the three-dimensional structure of a target protein; (2) this pathway is divided into subregions; (3) complementary building blocks, also called fragments, are designed in each subregion; complementarity is defined in terms of shape, hydrophobicity, hydrogen bond properties and electrostatics; and (4) fragments from different subregions are linked into potential lead compounds. Stages (3) and (4) are qualitatively guided by force-field calculations. In addition, the designed fragments serve as entries for retrieving existing compounds from chemical databases. This linked-fragment approach has been applied in the design of potentially selective inhibitors of triosephosphate isomerase from Trypanosoma brucei, the causative agent of sleeping sickness.

16.
Publicly available compound and bioactivity databases provide an essential basis for data-driven applications in life-science research and drug design. By analyzing several bioactivity repositories, we discovered differences in compound and target coverage, advocating the combined use of data from multiple sources. Using data from ChEMBL, PubChem, IUPHAR/BPS, BindingDB, and Probes & Drugs, we assembled a consensus dataset focusing on small molecules with bioactivity on human macromolecular targets. This allowed an improved coverage of compound space and targets, and an automated comparison and curation of structural and bioactivity data to reveal potentially erroneous entries and increase confidence. The consensus dataset comprises more than 1.1 million compounds with over 10.9 million bioactivity data points with annotations on assay type and bioactivity confidence, providing a useful ensemble for computational applications in drug design and chemogenomics.

17.
Loop-mediated isothermal amplification (LAMP) is a nucleic acid amplification technique performed under isothermal conditions. The output of this amplification technique includes multiple different sizes of deoxyribonucleic acid (DNA) structures, which are identified by a banding pattern on gel electrophoresis plots. Although this is a specific amplification technique, the complexity of the primer design and amplification still leads to false-positive results, especially when a positive reading is determined solely by whether there is any banding pattern in the gel electrophoresis plot. Here, we first performed extensive LAMP experiments and evaluated the DNA structures using microchip electrophoresis. We then developed a mathematical model, derived from the various components that make up an entire LAMP structure, to predict the full LAMP structure size in base pairs. This model can be implemented by users to make predictions for specific, DNA-size-dependent banding patterns on their gel electrophoresis plots. Each prediction is specific to the target sequence and primers used and therefore reduces incorrect diagnosis errors by distinguishing true-positive from false-positive results. The model was tested in house with multiple primer sets and proved accurate, and it was also translatable to different DNA and RNA types in previously published literature. The mathematical model can ultimately be used to reduce false-positive LAMP diagnosis errors for applications ranging from tuberculosis diagnostics to E. coli to numerous other infectious diseases.

18.
Production of chemical reaction databases is a multistep process, with the possibility of errors at each of these steps. VET is a tool developed to trap errors in the chemical reactions identified as a part of this process. VET has been designed to minimize the acceptance of incorrect reactions, while still supporting various common practices in reaction depiction, including unbalanced reactions, suppressed components, and reactions with alternative products. We discuss the assumptions made in its construction, a general overview of its structure, and some performance characteristics.

19.
Activity data for small molecules are invaluable in chemoinformatics. Various bioactivity databases exist containing detailed information on target proteins and quantitative binding data for small molecules extracted from journals and patents. In the current work, we have merged several public and commercial bioactivity databases into one bioactivity metabase. The molecular presentation, target information, and activity data of the vendor databases were standardized. The main motivation of the work was to create a single relational database which allows fast and simple data retrieval by in-house scientists. Second, we wanted to know the amount of overlap between databases from commercial and public vendors to see whether the former contain data complementing the latter. Third, we quantified the degree of inconsistency between data sources by comparing data points derived from the same scientific article cited by more than one vendor. We found that each data source contains unique data, which is due to different scientific articles being cited by the vendors. When comparing data derived from the same article, we found that inconsistencies between the vendors are common. In conclusion, using databases of different vendors is still useful since the data overlap is not complete. It should be noted that this can be partially explained by the inconsistencies and errors in the source data.

20.
The identities of 45 protein spots representing 32 orthologues within the Ochrobactrum anthropi proteome, within a pH 4-7 gradient and a mass range of 5-90 kDa, were determined across species boundaries. These proteins could be classified into 13 functional categories and establish metabolic, regulatory and translatory systems including amino acid biosynthesis, electron transport and the potential for plant symbiosis in a molecularly understudied organism. Amino acid composition and/or peptide mass fingerprinting were employed as a means to search the Swiss-Prot and OWL protein sequence databases for similarity within a broad taxonomic class of bacteria. Candidate matches from database searches could be compared, and a simple multiplication matrix based on co-occurrence and rank within the top 96 most similar entries was used to provide statistical confidence. This mathematical matrix was evaluated with respect to the characterisation of O. anthropi, an unsequenced and understudied bacterium, in the light of the recent influx of DNA sequence information.
