Similar Documents
20 similar documents found.
1.
Efficient recognition of tautomeric forms of compounds in large corporate or commercially available compound databases is a difficult and labor-intensive task. Our data indicate that up to 0.5% of commercially available compound collections for bioscreening contain tautomers. Although large registry databases such as Beilstein and CAS identify tautomers automatically using high-performance computational technologies, real-time tautomer recognition in nonregistry corporate databases generally remains problematic. We have developed an effective tautomer-searching algorithm based on a proprietary chemoinformatics platform. The algorithm reduces each compound to a canonical structure, which enables rapid, automated computer searching for most of the known tautomeric transformations that occur in databases of organic compounds. A further useful extension of this methodology is the ability to search effectively for different forms of compounds that contain ionic and semipolar bonds. The computations run in a Windows environment on a standard personal computer, which is a practical advantage. The practical application of the methodology is illustrated by several examples of successful recovery of tautomers and different forms of ionic compounds from real commercially available nonregistry databases.
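The abstract does not describe its algorithm, but the general idea of "reduce each compound to a canonical form, then group records by that form" can be sketched as follows. This is a minimal illustration, assuming the input SMILES strings are otherwise already canonical and using a single hypothetical rewrite rule (pentavalent versus charge-separated nitro notation) as a stand-in for the ionic/semipolar case; real tautomer canonicalization has to operate on the molecular graph, not on text.

```python
from collections import defaultdict

# Hypothetical rewrite rules mapping alternative spellings of the same group
# onto one canonical spelling. Real tools normalize the molecular graph instead.
NORMALIZATION_RULES = [
    ("N(=O)=O", "[N+](=O)[O-]"),  # pentavalent nitro -> charge-separated nitro
]


def canonical_key(smiles: str) -> str:
    """Apply every rewrite rule so equivalent spellings collapse to one key."""
    for pattern, replacement in NORMALIZATION_RULES:
        smiles = smiles.replace(pattern, replacement)
    return smiles


def group_equivalent_forms(records):
    """Group record IDs whose structures share a canonical key.

    records: iterable of (record_id, smiles) pairs.
    Returns only keys shared by more than one record (the hidden duplicates).
    """
    groups = defaultdict(list)
    for record_id, smiles in records:
        groups[canonical_key(smiles)].append(record_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

For example, `group_equivalent_forms([("cmpd-1", "c1ccccc1N(=O)=O"), ("cmpd-2", "c1ccccc1[N+](=O)[O-]"), ("cmpd-3", "c1ccccc1O")])` groups the two nitrobenzene records under one key and leaves phenol out. Keying on a canonical form turns the search into a single dictionary pass rather than pairwise structure comparison.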

2.
A challenging practical problem in medicinal chemistry is the transfer of SAR information from one chemical series to another. Currently, no computational methods are available to rationalize or support this process. Herein, we present a data mining approach that identifies alternative analog series with different core structures, corresponding substitution patterns, and comparable potency progression. Scaffolds can be exchanged between these series, and new analogs that incorporate preferred R-groups can be suggested. The methodology can be applied to search for alternative analog series when one series is known or, alternatively, to systematically assess SAR transfer potential in compound databases.
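One simple way to make "comparable potency progression" concrete is to check, for every pair of R-groups shared by two series, whether both series rank that pair the same way. The sketch below does this with a pairwise concordance fraction; this measure is an illustrative assumption, not the authors' published criterion, and the R-group labels and potency values in the usage note are hypothetical.

```python
from itertools import combinations


def potency_concordance(series_a, series_b):
    """Fraction of shared R-group pairs whose potency order agrees in both series.

    series_a, series_b: dicts mapping an R-group label to a potency value
    (e.g. pIC50). Returns None if fewer than two R-groups are shared.
    Ties in either series count as disagreement.
    """
    shared = sorted(set(series_a) & set(series_b))
    pairs = list(combinations(shared, 2))
    if not pairs:
        return None
    agree = 0
    for r1, r2 in pairs:
        delta_a = series_a[r1] - series_a[r2]
        delta_b = series_b[r1] - series_b[r2]
        # Same sign: both series rank this pair of R-groups the same way.
        if delta_a * delta_b > 0:
            agree += 1
    return agree / len(pairs)
```

With hypothetical series `{"Me": 6.0, "Cl": 6.5, "OMe": 7.2}` and `{"Me": 5.1, "Cl": 5.8, "OMe": 6.4, "F": 5.0}`, all three shared pairs are ordered the same way, so the concordance is 1.0, which would flag the second series as a candidate for SAR transfer.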

4.
Because of the high rate of data production and researchers' need for rapid access to new data, public databases have become the major medium through which genome mapping and sequencing data, as well as macromolecular structural data, are published. There are now more than 250 databases of biomolecular, structural, genetic, or phenotypic data, many of which are doubling in size annually. These databases, many of which were created and are maintained by experimentalists for their own research use, provide valuable collections of organized, validated data. However, the sheer number and diversity of databases now make efficient data resource discovery as important as effective data resource use. Existing autonomous biological databases contain related data that are more valuable when interconnected than when isolated. Political and scientific realities dictate that these databases will be built by different teams, in different locations, for different purposes, and using different data models and supporting DBMSs. As a consequence, connecting the related data they contain is not straightforward. Experience with existing biological databases indicates that it is possible to form useful queries across these databases, but doing so usually requires expertise in the semantic structure of each source database. Advancing to the next level of integration among biological information resources poses significant technical and sociological challenges.

5.
One of the aims of the emerging glycomics projects is to create a cell-by-cell catalogue of detected glycan structures. Mass spectrometry (MS) and NMR in combination with separation techniques are the most intensively applied experimental methods for the analysis of carbohydrates. Unlike genome and proteome databases, carbohydrate databases have only recently gained broader attention. However, no spectral libraries of suitable pure and homogeneous standards have been compiled so far. The difficulty of describing complex carbohydrate structures is discussed, and an overview of currently available data collections and applications is given. The current situation in glycobiology is characterized by a nearly complete loss of all primary analytical data. The Internet has fundamentally changed the practical and economic realities of collecting and distributing scientific data. Four approaches to organizing the updating process for analytical data collections in the glycosciences are discussed. It is anticipated that open-access data collections will improve the dissemination of scientific data, accelerate scientific discovery, ensure better data quality, and give rise to new initiatives that explore the available experimental data to address various scientific questions. Therefore, any new initiative in the glycosciences should be established under the open-access philosophy.

6.
In routine analysis, screening methods based on real-time PCR are most commonly used for the detection of genetically modified (GM) plant material in food and feed. In this paper, it is shown that the combination of five DNA target sequences can be used as a universal screening approach for at least 81 GM plant events that are authorised or unauthorised for placing on the market and described in publicly available databases. Except for maize event LY038, soybean events DP-305423 and BPS-CV127-9, and cotton event 281-24-236 × 3006-210-23, at least one of the five genetic elements has been inserted in these GM plants and is targeted by this screening approach. For the detection of these sequences, fully validated real-time PCR methods have been selected. A screening table is presented that describes the presence or absence of the target sequences for most of the listed GM plants. These data have been verified either theoretically against available databases or experimentally using available reference materials. The screening table will be updated regularly by a network of German enforcement laboratories.
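A screening table of this kind is essentially a presence/absence matrix of events against target elements, and the coverage question ("is at least one target present in each event?") is a set intersection per row. The sketch below assumes the table is stored as a dict of sets; the event and element names in the usage note are hypothetical placeholders, not entries from the published table.

```python
def uncovered_events(screening_table, targets):
    """Return events in which none of the screened target elements is present.

    screening_table: dict mapping event name -> set of genetic elements present
    targets: iterable of the target elements covered by the screening PCRs
    """
    target_set = set(targets)
    return sorted(event for event, elements in screening_table.items()
                  if not elements & target_set)


def expected_pattern(screening_table, targets):
    """Presence (+) / absence (-) string per event across the ordered targets,
    i.e. one row of the screening table."""
    return {event: "".join("+" if t in elements else "-" for t in targets)
            for event, elements in screening_table.items()}
```

For a hypothetical table `{"event_1": {"element_A", "element_C"}, "event_2": {"element_B"}, "event_3": {"element_X"}}` screened for `["element_A", "element_B"]`, `uncovered_events` returns `["event_3"]`, the analogue of the few events the abstract lists as not detected by the five-target approach.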

7.
This review focuses on the possibilities and limits of nontarget screening of emerging contaminants, with emphasis on recent applications and developments in data evaluation and compound identification by liquid chromatography-high-resolution mass spectrometry (HRMS). The general workflow includes determination of the elemental composition from accurate mass, a search for the molecular formula in compound libraries or general chemical databases, and a ranking of the proposed structures using further information, e.g., from mass spectrometry (MS) fragmentation and retention times. The success of nontarget screening is limited to some extent by the preselection of relevant compounds from a large data set. Recently developed approaches show that statistical analysis in combination with suspect and nontarget screening is a useful way to preselect relevant compounds. Currently, the unequivocal identification of unknowns still requires information from an authentic standard, which either has to be measured or is already available in user-defined MS/MS reference databases or libraries containing HRMS spectral information and retention times. In this context, we discuss the advantages and future needs of publicly available MS and MS/MS reference databases and libraries, most of which were created for the metabolomics field. Computer-based tools represent a big step forward for cases in which no MS library or MS database entry exists for a compound: the numerous search results from a large chemical database can be condensed to only a few by in silico fragmentation. This has been demonstrated for selected compounds and metabolites in recent publications. Still, very few compounds have been identified or tentatively identified in environmental samples by nontarget screening. The availability of comprehensive MS libraries with a focus on environmental contaminants would tremendously improve the situation.
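The first workflow step, matching an accurate mass to candidate elemental compositions, amounts to computing each candidate's theoretical m/z and keeping those within a ppm tolerance. The sketch below is a simplified illustration that assumes CHNO-only formulas, a single [M+H]+ adduct, a 5 ppm default tolerance, and no isotope-pattern scoring; in practice the candidate list would come from a formula generator or a chemical database.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes of C, H, N, O.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.007276466812  # mass of a proton (u)


def parse_formula(formula):
    """Parse a simple formula such as 'C8H10N4O2' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def mh_plus(formula):
    """Theoretical m/z of the singly protonated ion [M+H]+."""
    neutral = sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())
    return neutral + PROTON


def ppm_error(measured, theoretical):
    return (measured - theoretical) / theoretical * 1e6


def match_formulas(measured_mz, candidates, tol_ppm=5.0):
    """Candidates whose [M+H]+ lies within tol_ppm of the measured m/z, best first."""
    scored = sorted((abs(ppm_error(measured_mz, mh_plus(f))), f) for f in candidates)
    return [f for err, f in scored if err <= tol_ppm]
```

For a measured m/z of 195.0877, `match_formulas(195.0877, ["C8H10N4O2", "C10H14N2"])` keeps only C8H10N4O2 (caffeine, [M+H]+ about 195.08765, roughly 0.25 ppm off). The later ranking steps described above (fragmentation, retention time) would then operate on this shortened list.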

8.
We have systematically enumerated graph representations of scaffold topologies for up to eight-ring molecules and four-valence atoms, thus providing coverage of the lower portion of the chemical space of small molecules (Pollock et al. J. Chem. Inf. Model., this issue). Here, we examine scaffold topology distributions for several databases: ChemNavigator and PubChem for commercially available chemicals; the Dictionary of Natural Products; a set of 2742 launched drugs; WOMBAT, a database of medicinal chemistry compounds; and two subsets of PubChem, "actives" and DSSTox, comprising toxic substances. We also examined a virtual database of exhaustively enumerated small organic molecules, GDB (Fink et al. Angew. Chem., Int. Ed. 2005, 44, 1504-1508), and we contrast the scaffold topology distributions of these collections with the complete coverage of up to eight-ring molecules. Perhaps for reasons related to synthetic accessibility and complexity, scaffolds with six or more rings are poorly represented. Among all collections examined, PubChem has the greatest scaffold topological diversity, whereas GDB is the most limited. More than 50% of all entries (13 000 000+ actual and 13 000 000+ virtual compounds) exhibit only eight distinct topologies, one of which is the nonscaffold topology that represents all treelike structures. However, most of the topologies are represented by a single example or a very small number of examples. Within topologies, we found that three-way scaffold connections (3-nodes) are much more frequent than four-way (4-node) connections. Fused rings have a slightly higher frequency in biologically oriented databases. Scaffold topologies can serve as the first step toward an efficient coarse-grained classification scheme for the molecules found in chemical databases.
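A common first step toward a scaffold graph is to strip side chains by repeatedly deleting terminal (degree-1) atoms: what survives is the ring systems plus their linkers, and a treelike molecule vanishes entirely, matching the "nonscaffold" topology. The sketch below assumes molecules are given as plain adjacency dicts of heavy atoms; it only counts branch atoms (3-nodes and 4-nodes) in the pruned graph and does not reproduce the authors' full topology enumeration.

```python
from collections import Counter


def from_bonds(bonds):
    """Build an adjacency dict (atom -> set of neighbours) from (a, b) bond pairs."""
    adjacency = {}
    for a, b in bonds:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def prune_to_scaffold(adjacency):
    """Iteratively delete atoms with at most one neighbour.

    The result keeps only ring systems and the linkers between them; an empty
    dict means the molecule is treelike (acyclic). The input is not modified.
    """
    graph = {atom: set(nbrs) for atom, nbrs in adjacency.items()}
    changed = True
    while changed:
        changed = False
        for atom in [a for a, nbrs in graph.items() if len(nbrs) <= 1]:
            for nbr in graph[atom]:
                graph[nbr].discard(atom)
            del graph[atom]
            changed = True
    return graph


def branch_node_counts(scaffold):
    """Count scaffold atoms with three or four scaffold neighbours (3-/4-nodes)."""
    degrees = Counter(len(nbrs) for nbrs in scaffold.values())
    return {"3-node": degrees[3], "4-node": degrees[4]}
```

For example, methylcyclohexane prunes to its six ring atoms with no branch nodes, butane prunes to an empty graph, and decalin keeps two 3-nodes at the ring fusion.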

10.
Two databases have been constructed to facilitate applications of cheminformatics and molecular modeling to medicinal plants. The first contains data on known chemical constituents of 240 commonly used Chinese herbs; the second contains information on the target specificities of bioactive plant compounds. Structures are available for all compounds. In the Chinese herbal constituents database, further details include trivial and systematic names, compound class and skeletal type, botanical and Chinese (pinyin) names of the associated herb(s), CAS registry number, chirality, pharmacological and toxicological information, and chemical references. For the bioactive plant compounds database, details of molecular target(s), IC50 and related measures, and associated botanical species are given. For Chinese herbs, approximately 7000 unique compounds are listed; because some are found in more than one herb, the total across all herbs is 8264. For bioactive plant compounds, 2597 compounds active against 78 molecular targets are covered. Statistical relationships within and between the two databases are explored.

11.
Protein-protein interactions are fundamental in mediating biological processes including metabolism, cell growth, and signaling. The ability to selectively inhibit or induce protein activity or complex formation is key to controlling disease. For protein-protein interactions that derive substantial affinity from short linear peptide sequences, or motifs, we can develop search algorithms for peptidomimetic compounds that resemble the short peptide's structure but are not compromised by poor pharmacological properties. SAAMCO is a Web service (http://bioware.ucd.ie/~saamco) that facilitates the screening of motifs with known structures against bioactive compound databases. It is built on an algorithm that defines compound similarity based on the presence of appropriate amino acid side chain fragments and a favorable root-mean-square deviation (RMSD) between compound and motif structure. The methodology is efficient because the available compound databases are preprocessed and fast regular expression searches filter potential matches before time-intensive 3D superposition is performed. The required input information is minimal, and the compound databases have been selected to maximize the availability of information on biological activity. "Hits" are accompanied by a visualization window and links to source database entries. Motif matching can require partial or full similarity, which respectively increases or reduces the number of potential mimetic compounds. The Web server provides the functionality for rapid screening of known or putative interaction motifs against prepared compound libraries using a novel search algorithm. The tabulated results can be analyzed by linking to appropriate databases and by visualization.
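The efficiency argument (cheap regular-expression filtering first, expensive 3D comparison only for survivors) can be sketched as a two-stage loop. This is not SAAMCO's implementation: it assumes each compound carries a SMILES-like text string and a list of 3D points already superposed onto the motif in matching order, so the superposition step itself (e.g. Kabsch alignment) is omitted; the fragment patterns and the 1.0 RMSD cutoff are hypothetical choices.

```python
import math
import re


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of already superposed 3D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty point lists of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def screen(motif_points, required_patterns, compounds, max_rmsd=1.0):
    """Two-stage screen: regex filter on a text representation, then a 3D check.

    compounds: list of (name, text_repr, points) tuples, where points are
    assumed to be pre-aligned to the motif and ordered like motif_points.
    Returns (name, rmsd rounded to 3 decimals) for every accepted compound.
    """
    regexes = [re.compile(p) for p in required_patterns]
    hits = []
    for name, text_repr, points in compounds:
        if not all(r.search(text_repr) for r in regexes):
            continue  # rejected cheaply, no 3D work done
        value = rmsd(motif_points, points)
        if value <= max_rmsd:
            hits.append((name, round(value, 3)))
    return hits
```

Usage, with a guanidine-like and a carboxylic acid pattern as hypothetical side-chain fragments: `screen([(0, 0, 0), (1, 0, 0), (0, 1, 0)], [r"NC\(=N\)N", r"C\(=O\)O"], compounds)`; only compounds whose strings contain both fragments ever reach the `rmsd` call.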

12.
Advantages such as intuitive interpretation, objectivity, general applicability, and easy automated calculation make rmsd (root-mean-square deviation) the measure of choice for investigating the accuracy of conformational model generators. For comparing conformations of a single molecule, it is clearly superior. Single-molecule analysis is, however, a rare scenario. Typically, conformations are generated for huge, highly diverse corporate or external vendor databases, which are then further investigated with high-throughput computational methods such as docking or pharmacophore searching in virtual screening campaigns. Representative subsets for accuracy investigations of computational methods need to mimic this diversity. Averaged rmsd values over these data sets are frequently used to assess the accuracy of the methods. There are, however, significant weaknesses in rmsd comparisons for such data sets. For example, interpretation is no longer intuitive, because what counts as a good or bad rmsd value depends crucially on the composition of the data set, such as the size or number of rotatable bonds of the underlying molecules. Further, rmsd lacks normalization, which can result in very high averaged rmsd values for highly flexible molecules and thus completely skew the results. We have developed a novel measure for comparing conformations of molecules called Torsion Fingerprint Deviation (TFD). It extracts, weights, and compares Torsion Fingerprints of a query molecule and the generated conformations, taking into account acyclic bonds as well as ring systems. TFD is alignment-free and overcomes major limitations of rmsd while retaining its advantages.
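The normalization problem described above can be illustrated with a simplified torsion-based deviation: each torsion difference is taken on the circle, divided by its maximum possible value of 180°, and averaged with per-torsion weights, so the result always lies between 0 and 1 regardless of molecule size. This is a sketch in the spirit of TFD, not the published implementation: here the weights are supplied by the caller, and ring torsions and symmetry-equivalent torsions receive no special treatment.

```python
def torsion_difference(a, b):
    """Smallest absolute difference between two torsion angles, in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return 360.0 - d if d > 180.0 else d


def torsion_fingerprint_deviation(reference, conformer, weights=None):
    """Weighted mean of torsion deviations, each normalized by 180 degrees.

    reference, conformer: torsion angles (degrees) for the same ordered bonds.
    weights: optional per-torsion weights; equal weights are used if omitted.
    Returns 0 for identical torsions and 1 if every torsion is flipped by 180.
    """
    if len(reference) != len(conformer):
        raise ValueError("torsion lists must have equal length")
    if weights is None:
        weights = [1.0] * len(reference)
    total = sum(weights)
    return sum(w * torsion_difference(r, c) / 180.0
               for w, r, c in zip(weights, reference, conformer)) / total
```

For example, torsions `[60, 180]` against `[70, -170]` give 10° deviations for both bonds (180° and -170° are only 10° apart on the circle), so the value is 10/180, about 0.056, whereas a naive per-angle difference would report 350° for the second bond.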

13.
Many laboratories identify proteins by searching tandem mass spectrometry data against genomic or protein sequence databases. These database searches typically use either the measured peptide masses or the derived peptide sequence; in this paper, we focus on the latter. We study the minimum peptide sequence data required for definitive protein identification from protein sequence databases. Accurate mass measurements are not needed for definitive protein identification, even when only a limited amount of sequence data is available for searching. This finding has implications for mass spectrometry performance (and cost), database search strategies, and proteomics research.
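The question "is this much sequence enough?" can be phrased as: does the derived sequence tag occur in exactly one protein of the database? The sketch below checks this by substring search. The protein IDs and sequences in the usage note are hypothetical, and folding isoleucine into leucine is an added assumption reflecting that the two residues have identical mass and are usually not distinguished by MS/MS.

```python
def proteins_containing(tag, protein_db):
    """IDs of proteins whose sequence contains the peptide tag as a substring.

    Isoleucine and leucine are isobaric, so both are mapped to 'L' before matching.
    """
    def normalize(seq):
        return seq.upper().replace("I", "L")

    query = normalize(tag)
    return sorted(pid for pid, seq in protein_db.items() if query in normalize(seq))


def is_definitive(tag, protein_db):
    """True if the tag matches exactly one protein in the database."""
    return len(proteins_containing(tag, protein_db)) == 1
```

With a hypothetical database `{"P1": "MKTAYIAKQR", "P2": "MKTAYLLKQE", "P3": "GGSWPEA"}`, the tag `TAY` is ambiguous (P1 and P2), while the one-residue-longer tag `AYIA` identifies P1 alone; lengthening the tag until `is_definitive` returns True gives an empirical minimum for a given database.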

14.
We have developed a program, HookSpace, that provides a simple approach to assessing the diversity of molecular databases. The spatial relationship between pairs of intramolecular functional groups can be analysed in a variety of ways to provide both qualitative and quantitative measures of diversity. Results are described and contrasted for two commercially available databases and a combinatorial library of benzodiazepine derivatives. HookSpace highlights the main differences in molecular content among these data sets.
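One quantitative diversity measure built on pairs of intramolecular functional groups is the number of distinct (group-type pair, distance bin) cells a database occupies. The sketch below uses only inter-group distances with a hypothetical 1 Å bin width; the abstract does not specify HookSpace's actual descriptors, so this is a simplified stand-in, and the group labels and coordinates in the usage note are hypothetical.

```python
import math
from itertools import combinations


def pair_keys(groups, bin_width=1.0):
    """Binned (type pair, distance bin) keys for every pair of functional groups.

    groups: list of (group_type, (x, y, z)) tuples for one molecule.
    """
    keys = set()
    for (t1, p1), (t2, p2) in combinations(groups, 2):
        distance = math.dist(p1, p2)  # Euclidean distance (Python 3.8+)
        keys.add((tuple(sorted((t1, t2))), int(distance // bin_width)))
    return keys


def coverage(database, bin_width=1.0):
    """Number of distinct (type pair, distance bin) cells occupied by a database."""
    occupied = set()
    for groups in database:
        occupied |= pair_keys(groups, bin_width)
    return len(occupied)
```

Two molecules that place an amine and an acid 2.5 Å and 2.7 Å apart fall into the same cell, so adding the second one only increases coverage through its other group pairs; comparing `coverage` values across databases gives a crude, size-sensitive contrast of their molecular content.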

15.
A number of significant improvements in the electrophoretic performance and design of DNA sequencing devices have culminated in the introduction of truly industrial-grade, production-scale instruments. These instruments have been the workhorses behind the massive increase in genomic sequencing data available in public and private databases. We highlight the recent progress in capillary electrophoresis (CE) that has enabled these achievements. In addition, we summarize recent developments in the use of microfabricated devices for DNA sequencing that promise the next leap in productivity.

18.
A number of different approaches have been described for identifying proteins from tandem mass spectrometry (MS/MS) data. The most common approaches rely on available databases to match experimental MS/MS data. These methods suffer from several drawbacks and cannot be used to identify proteins from unknown genomes. In this communication, we describe a new de novo sequencing software package, PEAKS, that extracts amino acid sequence information without the use of databases. PEAKS uses a new model and a new algorithm to efficiently compute the peptide sequences whose fragment ions best explain the peaks in the MS/MS spectrum. The software outputs amino acid sequences with confidence scores for the entire sequences, as well as a novel positional scoring scheme for portions of the sequences. The performance of PEAKS is compared with that of Lutefisk, a well-known de novo sequencing program, using quadrupole time-of-flight (Q-TOF) data obtained for several tryptic peptides from standard proteins.
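Any de novo method needs a way to judge how well a candidate sequence's fragment ions explain the observed peaks. The sketch below shows only that basic scoring primitive (theoretical singly charged b- and y-ions, counted against observed peaks within a tolerance); it is not PEAKS's model or search algorithm. It covers a subset of residues, treats leucine's mass as also standing for isoleucine, and uses a hypothetical 0.02 u tolerance.

```python
# Monoisotopic residue masses (u) for a subset of amino acids (L also covers I).
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
           "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
           "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
           "R": 156.10111}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Singly charged b- and y-ion m/z values for a candidate peptide."""
    masses = [RESIDUE[aa] for aa in peptide]
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b_ions, y_ions


def score(peptide, peaks, tol=0.02):
    """Number of observed peaks explained by at least one b- or y-ion."""
    b_ions, y_ions = fragment_ions(peptide)
    theoretical = b_ions + y_ions
    return sum(any(abs(p - t) <= tol for t in theoretical) for p in peaks)
```

For observed peaks `[58.029, 129.066, 106.050, 300.0]`, the candidate `GAS` explains three peaks (b1, b2, y1) while the reordered `GSA` explains only one, so a search over candidates would prefer `GAS`.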

19.
Eight large chemical databases have been analyzed and compared with each other. Central to this comparison is the open National Cancer Institute (NCI) database, consisting of approximately 250 000 structures. The other databases analyzed are the Available Chemicals Directory ("ACD," from MDL, release 1.99, 3D version); the ChemACX ("ACX," from CamSoft, Version 4.5); the Maybridge Catalog and the Asinex database (both as distributed by CamSoft as part of ChemInfo 4.5); the Sigma-Aldrich Catalog (CD-ROM, 1999 Version); the World Drug Index ("WDI," Derwent, version 1999.03); and the organic part of the Cambridge Crystallographic Database ("CSD," from the Cambridge Crystallographic Data Centre, 1999 Version 5.18). The database properties analyzed include internal duplication rates; compounds unique to each database; cumulative occurrence of compounds in an increasing number of databases; overlap of identical compounds between two databases; similarity overlap; and diversity. The crystallographic database CSD and the WDI show somewhat less overlap with the other databases than those databases show with each other. In particular, the collections of commercial compounds and compilations of vendor catalogs overlap substantially with each other. Still, no database is a complete subset of any other; each appears to have its own niche and thus its own "raison d'être". The NCI database has by far the highest number of unique compounds: approximately 200 000 of the NCI structures were not found in any of the other databases analyzed.
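Once every structure has been reduced to a canonical identifier, several of the properties listed above (internal duplication rate, compounds unique to each database, pairwise overlap of identical compounds, and how many databases each compound occurs in) reduce to set operations. The sketch below assumes that canonicalization (e.g. canonical SMILES or InChIKeys) has already been done; the database names and keys in the usage note are hypothetical. Similarity overlap and diversity would additionally need a fingerprint similarity measure and are not covered.

```python
from collections import Counter


def database_stats(databases):
    """Duplication rates, unique-to-database counts, pairwise overlaps, and
    the number of compounds found in exactly n databases.

    databases: dict mapping database name -> list of canonical structure keys.
    """
    sets = {name: set(keys) for name, keys in databases.items()}
    stats = {}
    for name, keys in databases.items():
        others = set().union(*(s for other, s in sets.items() if other != name))
        stats[name] = {
            "duplication_rate": 1 - len(sets[name]) / len(keys) if keys else 0.0,
            "unique": len(sets[name] - others),
        }
    names = sorted(sets)
    overlaps = {(a, b): len(sets[a] & sets[b])
                for i, a in enumerate(names) for b in names[i + 1:]}
    occurrence = Counter(key for s in sets.values() for key in s)
    found_in_n = Counter(occurrence.values())  # n -> number of compounds
    return stats, overlaps, found_in_n
```

For hypothetical inputs `{"A": ["k1", "k2", "k2", "k3"], "B": ["k2", "k4"], "C": ["k3", "k5"]}`, database A has a duplication rate of 0.25 and one unique compound, A and B share one compound, and two compounds occur in two databases.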

20.
Peptide sequencing by mass spectrometry is gaining increasing importance for peptide chemistry and proteomics. However, the available tools for interpreting matrix-assisted laser desorption/ionization post-source decay (MALDI-PSD) mass spectra depend on databases and identify peptides by matching experimental data with spectra calculated from database sequences. This severely obstructs the identification of proteins and peptides that are not listed in databases, or of variants such as mutated proteins. Here we report the development of a new computer program for database-independent peptide sequencing by MALDI-PSD mass spectrometry. The program was validated by determining the correct sequences of various peptides, including peptides listed in sequence databases as well as peptides that deviate from database sequences or are completely artificial. This strategy should substantially facilitate the identification of novel or variant peptides and proteins, and increase the power of MALDI-PSD analyses in proteomics.
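Database-independent sequencing commonly relies on reading residues from the mass differences between consecutive peaks of one fragment-ion series. The abstract does not describe how the reported program works, so the following is only a generic mass-ladder sketch: it assumes the peaks have already been assigned to a single ion series with no noise peaks, uses a residue subset (L also standing for I), and marks gaps that match no residue, or more than one, with `?`. The 0.02 u tolerance is a hypothetical choice.

```python
# Monoisotopic residue masses (u) for a subset of amino acids (L also covers I).
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
           "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
           "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
           "R": 156.10111}


def read_tag(peaks, tol=0.02):
    """Read residues from consecutive mass differences in a fragment-ion ladder.

    peaks: m/z values assumed to belong to one singly charged ion series.
    A gap matching no residue (or more than one) is written as '?'.
    """
    ladder = sorted(peaks)
    tag = []
    for low, high in zip(ladder, ladder[1:]):
        diff = high - low
        matches = [aa for aa, mass in RESIDUE.items() if abs(mass - diff) <= tol]
        tag.append(matches[0] if len(matches) == 1 else "?")
    return "".join(tag)
```

For the b-ion ladder `[58.02874, 129.06585, 216.09788, 315.16629]`, the gaps of about 71.037, 87.032 and 99.068 u read as the tag `ASV`, with no reference to any sequence database.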


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417