Similar Literature
20 similar documents found (search time: 46 ms)
1.
Over the last two decades, the volumes of chemical and biological data have grown steadily. Converting these data sets into knowledge is both expensive and time-consuming; as a result, workflow technologies built on platforms such as KNIME have emerged to facilitate searching across multiple heterogeneous data sources, filtering on specific criteria, and extracting hidden information from large data sets. Manual data curation is strongly recommended before any QSAR modeling. This is feasible for small data sets, but for the extensive data now accumulating in public databases, manual processing of big data is hardly feasible. In this work, we propose KNIME as an automated workflow solution for the curation, development, and validation of predictive QSAR models from a very large data set. Starting from 250,250 structures in the NCI database, only 3520 compounds passed safely through our workflow with their corresponding experimental log P values; this property was investigated as a case study to improve some existing log P calculation algorithms.
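The kind of automated curation step such a workflow performs can be sketched in a few lines. A minimal Python illustration follows; the record field names (`smiles`, `exp_logp`) and the log P validity range are assumptions for the sketch, not details from the paper:

```python
# Hypothetical sketch of an automated curation pass: deduplicate by
# structure key and keep only records with a usable experimental log P.
def curate(records, logp_range=(-10.0, 15.0)):
    """Keep one record per structure with a valid experimental log P."""
    seen, curated = set(), []
    for rec in records:
        smiles = rec.get("smiles")
        logp = rec.get("exp_logp")
        if not smiles or smiles in seen:
            continue  # drop empty or duplicate structures
        try:
            value = float(logp)
        except (TypeError, ValueError):
            continue  # drop records without a numeric measurement
        if logp_range[0] <= value <= logp_range[1]:
            seen.add(smiles)
            curated.append({"smiles": smiles, "exp_logp": value})
    return curated

raw = [
    {"smiles": "CCO", "exp_logp": "-0.31"},
    {"smiles": "CCO", "exp_logp": "-0.30"},    # duplicate structure
    {"smiles": "c1ccccc1", "exp_logp": "2.13"},
    {"smiles": "CC(=O)O", "exp_logp": "n/a"},  # unparseable value
]
print(curate(raw))
```

A real workflow would of course canonicalize structures and cross-check units before deduplicating; the point is only that each filter is mechanical and therefore automatable.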

2.
3.
4.
Guided data capture software (GDC) is described for mass-scale abstraction from the literature of experimental thermophysical and thermochemical property data for organic chemical systems involving one, two, and three components, chemical reactions, and chemical equilibria. Property values are captured with a strictly hierarchical system based upon rigorous application of the thermodynamic constraints of the Gibbs phase rule with full traceability to source documents. Key features of the program and its adherence to scientific principles are described with particular emphasis on data-quality issues, both in terms of data accuracy and database integrity.

5.
One of the most commonly performed in vitro ADME assays during the lead generation and lead optimization stages of drug discovery is metabolic stability evaluation. Metabolic stability is typically assessed in liver microsomes, which contain Phase I metabolizing enzymes, mainly cytochrome P450 enzymes (CYPs). The amount of parent drug metabolized by these CYPs is determined by LC/MS/MS. The metabolic stability data are typically used to rank-order compounds for in vivo evaluation. We describe a streamlined and intelligent workflow for the metabolic stability assay that permits high-throughput analyses while maintaining a high standard of quality. This is accomplished in the following ways: a novel post-incubation pooling strategy based on cLogD3.0 values, coupled with ultra-performance liquid chromatography/tandem mass spectrometry (UPLC/MS/MS), enables sample analysis times to be reduced significantly while ensuring adequate chromatographic separation of compounds within a group, so as to reduce the likelihood of compound interference. Assay quality and fast turnaround of data reports are ensured by performing automated real-time intelligent re-analysis of discrete samples for compounds that do not pass user-definable criteria during the pooling analysis. Intelligent, user-independent data acquisition and data evaluation are accomplished via a custom Visual Basic program that ties together every step in the workflow, including cassette compound selection, compound incubation, compound optimization, sample analysis and re-analysis (when appropriate), data processing, data quality evaluation, and database upload. The workflow greatly reduces labor and improves data turnaround time while maintaining high data quality.
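The cLogD-based pooling idea can be sketched generically: sort compounds by predicted lipophilicity, then assign them to cassettes with a stride so that compounds sharing a cassette (and hence an injection) sit far apart in cLogD and are less likely to co-elute. This is an illustrative reconstruction, not the authors' actual algorithm:

```python
import math

def pool_by_clogd(compounds, cassette_size=4):
    """Group compounds into cassettes of at most `cassette_size`.

    Sorting by cLogD and slicing with a stride spreads each cassette
    across the whole lipophilicity range, maximizing the cLogD gap
    between cassette-mates.
    """
    ordered = sorted(compounds, key=lambda c: c["clogd"])
    n_cassettes = math.ceil(len(ordered) / cassette_size)
    return [ordered[i::n_cassettes] for i in range(n_cassettes)]

compounds = [{"id": i, "clogd": d} for i, d in
             enumerate([3.1, 0.2, 4.5, 1.8, 2.6, 5.0, 0.9, 3.8])]
for cassette in pool_by_clogd(compounds, cassette_size=4):
    print([c["clogd"] for c in cassette])
```

With eight compounds and a cassette size of four, each cassette spans the full cLogD range with gaps of well over one log unit between members, which is the chromatographic-separation property the pooling strategy relies on.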

6.
This work provides a curated database of experimental and calculated hydration free energies for small neutral molecules in water, along with molecular structures, input files, references, and annotations. We call this the Free Solvation Database, or FreeSolv. Experimental values were taken from prior literature and will continue to be curated, with updated experimental references and data added as they become available. Calculated values are based on alchemical free energy calculations using molecular dynamics simulations. These used the GAFF small molecule force field in TIP3P water with AM1-BCC charges. Values were calculated with the GROMACS simulation package, with full details given in references cited within the database itself. This database builds in part on a previous, 504-molecule database containing similar information. However, additional curation of both experimental data and calculated values has been done here, and the total number of molecules is now up to 643. Additional information is now included in the database, such as SMILES strings, PubChem compound IDs, accurate reference DOIs, and others. One version of the database is provided in the Supporting Information of this article, but as ongoing updates are envisioned, the database is now versioned and hosted online. In addition to providing the database, this work describes its construction process. The database is available free-of-charge via http://www.escholarship.org/uc/item/6sd403pz.
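A typical use of such a database is benchmarking calculated against experimental values. A minimal sketch follows; the field names mimic FreeSolv's paired experimental/calculated hydration free energies but are assumptions here, and the numbers are made up:

```python
import math

def rmse_and_bias(entries):
    """Root-mean-square error and mean signed error (calc - expt)."""
    diffs = [e["calc"] - e["expt"] for e in entries]
    mse = sum(d * d for d in diffs) / len(diffs)
    bias = sum(diffs) / len(diffs)
    return math.sqrt(mse), bias

# Illustrative hydration free energies in kcal/mol (invented values)
entries = [
    {"smiles": "CCO", "expt": -5.0, "calc": -4.2},
    {"smiles": "c1ccccc1", "expt": -0.9, "calc": -0.5},
    {"smiles": "CC(=O)N", "expt": -9.7, "calc": -10.3},
]
rmse, bias = rmse_and_bias(entries)
```

Tracking both RMSE and signed bias is useful here because force-field errors in hydration free energies are often systematic (e.g. consistently under-solvating a chemical class), not just random.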

7.
The exponential growth of large-scale molecular sequence data and of the PubMed scientific literature has prompted active research in biological literature mining and information extraction to facilitate genome/proteome annotation and improve the quality of biological databases. Motivated by the promise of text mining methodologies, but at the same time, the lack of adequate curated data for training and benchmarking, the Protein Information Resource (PIR) has developed a resource for protein literature mining—iProLINK (integrated Protein Literature INformation and Knowledge). As PIR focuses its effort on the curation of the UniProt protein sequence database, the goal of iProLINK is to provide curated data sources that can be utilized for text mining research in the areas of bibliography mapping, annotation extraction, protein named entity recognition, and protein ontology development. The data sources for bibliography mapping and annotation extraction include mapped citations (PubMed ID to protein entry and feature line mapping) and annotation-tagged literature corpora. The latter includes several hundred abstracts and full-text articles tagged with experimentally validated post-translational modifications (PTMs) annotated in the PIR protein sequence database. The data sources for entity recognition and ontology development include a protein name dictionary, word token dictionaries, protein name-tagged literature corpora along with tagging guidelines, as well as a protein ontology based on PIRSF protein family names. iProLINK is freely accessible at http://pir.georgetown.edu/iprolink, with hypertext links for all downloadable files.
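A dictionary-based protein name tagger of the kind such name dictionaries and tagged corpora support can be sketched as follows; the dictionary entries and the longest-match-first strategy are illustrative, not taken from iProLINK:

```python
import re

def tag_protein_names(text, name_dictionary):
    """Return (start, end, name) spans for dictionary names found in text.

    Longer names are matched first so a multi-word entry wins over a
    shorter entry starting at the same position.
    """
    matches, claimed = [], set()
    for name in sorted(name_dictionary, key=len, reverse=True):
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text):
            span = range(m.start(), m.end())
            if claimed.isdisjoint(span):  # skip overlaps with longer hits
                claimed.update(span)
                matches.append((m.start(), m.end(), name))
    return sorted(matches)

text = "Phosphorylation of p53 by ATM stabilizes p53 in vivo."
print(tag_protein_names(text, {"p53", "ATM", "MDM2"}))
```

Real protein named entity recognition must also handle case variants, synonyms, and ambiguous short symbols, which is precisely why curated dictionaries and tagged corpora like iProLINK's are needed for training and benchmarking.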

8.
9.
Existing toxicological data may be used for a variety of purposes such as hazard and risk assessment or toxicity prediction. The potential use of such data is, in part, dependent upon their quality. Consideration of data quality is of key importance with respect to the application of chemicals legislation such as REACH. Whether data are being used to make regulatory decisions or build computational models, the quality of the output is reflected by the quality of the data employed. Therefore, the need to assess data quality is an important requirement for making a decision or prediction with an appropriate level of confidence. This study considers the biological and chemical factors that may impact upon toxicological data quality and discusses the assessment of data quality. Four general quality criteria are introduced and existing data quality assessment schemes are discussed. Two case study datasets of skin sensitization data are assessed for quality providing a comparison of existing assessment methods. This study also discusses the limitations and difficulties encountered during quality assessment, including the use of differing quality schemes and the global versus chemical-specific assessments of quality. Finally, a number of recommendations are made to aid future data quality assessments.
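Purely as an illustration of how such a criteria-based assessment can be operationalized, here is a checklist-to-category sketch; the criterion names are assumptions, and the category labels are borrowed from Klimisch-style reliability scoring, not from this study's own four criteria:

```python
CRITERIA = (  # hypothetical checklist fields on a toxicity record
    "test_substance_identified",
    "method_documented",
    "endpoint_clearly_reported",
    "results_plausible",
)

def reliability_category(record):
    """Map how many quality criteria a record meets to a coarse label."""
    met = sum(1 for c in CRITERIA if record.get(c))
    if met == len(CRITERIA):
        return "reliable without restriction"
    if met == len(CRITERIA) - 1:
        return "reliable with restrictions"
    if met > 0:
        return "not reliable"
    return "not assignable"

record = {"test_substance_identified": True, "method_documented": True,
          "endpoint_clearly_reported": True, "results_plausible": False}
print(reliability_category(record))
```

As the study notes, different schemes draw these category boundaries differently, which is one reason quality assessments by different assessors can diverge on the same data.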

10.
11.

12.
13.
Fluid Phase Equilibria 2006, 242(1): 43–56
The purpose of this work is to evaluate the potential of modeling the self-diffusion coefficient (SDC) of real fluids in all fluid states based on Lennard–Jones analytical relationships involving the SDC, the temperature, the density, and the pressure. To that end, we generated an equation of state (EOS) that interrelates the self-diffusion coefficient, the temperature, and the density of the Lennard–Jones (LJ) fluid. We fit the parameters of this LJ–SDC–EOS using recent wide-ranging molecular simulation data for the LJ fluid. We also used a LJ pressure–density–temperature EOS, which we combined with the LJ–SDC–EOS to make possible the calculation of LJ–SDC values from a given temperature and pressure. Both EOSs are written in terms of LJ dimensionless variables, which are defined in terms of the LJ parameters ɛ and σ; these parameters are meaningful at the molecular level. By combining both EOSs, we generated LJ corresponding-states charts, which make it possible to conclude that the LJ fluid captures the observed behavioral patterns of the self-diffusion coefficient of real fluids over a wide range of conditions. In this work, we also performed predictions of the SDC of real fluids in all fluid states. For that, we assumed that a given real fluid behaves as a Lennard–Jones fluid which exactly matches the experimental critical temperature Tc and the experimental critical pressure Pc of the real fluid. This assumption implies average true prediction errors on the order of 10% for vapors, light supercritical fluids, some dense supercritical fluids, and some liquids. These results make it possible to conclude that it is worthwhile to use the LJ fluid as a reference for modeling the self-diffusion coefficient of real fluids over a wide range of conditions, without resorting to non-LJ correlations for the density–temperature–pressure relationship. The database considered here contains more than 1000 experimental data points; its practical reduced temperature range is 0.53 to 2.4, and its practical reduced pressure range is 0 to 68.4.
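The critical-point mapping implied above, choosing ɛ and σ so the LJ fluid matches a real fluid's Tc and Pc, can be sketched directly. The LJ critical constants used here (Tc* ≈ 1.31, Pc* ≈ 0.13) are approximate literature values and are not taken from this paper:

```python
# Map a real fluid onto a Lennard-Jones fluid via its critical point,
# using the reduced-variable definitions T* = kB*T/eps, P* = P*sigma^3/eps.
K_B = 1.380649e-23  # Boltzmann constant, J/K
TC_STAR, PC_STAR = 1.31, 0.13  # approximate LJ critical constants (assumed)

def lj_parameters(tc_kelvin, pc_pascal):
    """Return (eps/kB in K, sigma in m) matching the real critical point."""
    eps_over_kb = tc_kelvin / TC_STAR
    sigma3 = PC_STAR * K_B * tc_kelvin / (TC_STAR * pc_pascal)
    return eps_over_kb, sigma3 ** (1.0 / 3.0)

def reduced_state(t_kelvin, p_pascal, tc_kelvin, pc_pascal):
    """Reduced (T*, P*) at which an LJ-SDC-EOS would be evaluated."""
    eps_over_kb, sigma = lj_parameters(tc_kelvin, pc_pascal)
    t_star = t_kelvin / eps_over_kb
    p_star = p_pascal * sigma ** 3 / (eps_over_kb * K_B)
    return t_star, p_star

# Example: argon (Tc ~ 150.69 K, Pc ~ 4.863 MPa)
eps_over_kb, sigma = lj_parameters(150.69, 4.863e6)
```

For argon this mapping yields ɛ/kB near 115 K and σ near 3.5 Å, close to commonly quoted argon LJ parameters, which is why matching only Tc and Pc already gives the roughly 10% prediction errors reported above.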

14.
An inter-laboratory comparison exercise was conducted under the European Union funded project Screening Methods for Water Data Information in Support of the Implementation of the Water Framework Directive (SWIFT-WFD), coordinated by the Consejo Superior de Investigaciones Científicas (CSIC), in order to evaluate the reproducibility of different toxicity tests based on the bioluminescence inhibition of Vibrio fischeri for rapid water toxicity assessment. For the first time in Europe, this type of exercise was organized using different tests based on the same principle. Ten laboratories from eight countries (Austria, Cyprus, Germany, Greece, Italy, Portugal, Romania, and Spain) took part, and a total of 360 samples were distributed.

During the exercise, six series of six samples were analyzed over 5 months. Each batch consisted of three real samples and three standard solutions. The real samples were a raw influent and the effluent of a wastewater treatment plant (WWTP), and a sample from a primary settlement stage of the WWTP spiked with a mixture of toxicant standards. A total of 330 samples (91.7%) were analyzed, 3300 values in duplicate were collected, and the results for each sample were expressed as 50% effective concentration (EC50) values calculated from five-point dilution inhibition curves after 5 and 15 min incubation times.

A statistical study was performed using 660 results. The mean values, standard deviations (σ), variances (σ²), and upper and lower warning limits (UWL and LWL) were obtained using the EC50 values calculated from the results of the participating laboratories. The main objectives of this toxicity ring study were to evaluate the repeatability (r) and reproducibility (R) when different laboratories conduct the test, the influence of complex sample matrices, and the variability between different tests based on the same principle, and to determine the rate at which participating laboratories successfully completed the tests they initiated.

In this exercise, 3.93% of the toxicity values were outliers according to the Z-score values and the Dixon test. The samples with the greatest number of outliers were those with the smallest coefficient of variation, corresponding to the highest and lowest toxicity levels. Cluster analysis found no relation between the final results and the different commercial devices involved. Testing with multiple commercial devices did not appear to reduce the precision of the results, and the coefficient of variation for the exercise was close to the average value of past editions carried out at the national level, where the different participants used the same commercial device.

Sample stability was also monitored during the exercise. While statistically significant differences were not found for most samples, a significant decrease in toxicity was observed for the WWTP influent sample over the course of the study; nevertheless, this sample retained a high toxicity level throughout the exercise. Finally, to characterize the real samples chemically, they were analyzed by chromatographic techniques using different sequential solid-phase extraction (SSPE) procedures, followed by liquid chromatography coupled with mass spectrometry (LC-MS) and gas chromatography–mass spectrometry (GC-MS). Good agreement was found between the chemical analysis results and the toxicity levels of the samples.
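The EC50-from-five-point-dilution-curve computation described above can be sketched as a simple log-linear interpolation; this is a generic illustration, and the study's own curve-fitting procedure may well differ:

```python
import math

def ec50_from_curve(concentrations, inhibition_pct):
    """Interpolate the concentration giving 50% inhibition.

    Expects concentrations in ascending order with inhibition rising
    through 50% somewhere inside the tested range (e.g. five dilutions).
    """
    for (c_lo, i_lo), (c_hi, i_hi) in zip(
        zip(concentrations, inhibition_pct),
        zip(concentrations[1:], inhibition_pct[1:]),
    ):
        if i_lo < 50.0 <= i_hi:
            # linear interpolation on a log-concentration axis
            frac = (50.0 - i_lo) / (i_hi - i_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10.0 ** log_c
    raise ValueError("curve does not cross 50% inhibition in range")

# Synthetic five-point dilution series with a true EC50 of 10
concs = [1.0, 3.0, 10.0, 30.0, 100.0]
inhib = [100.0 * c / (c + 10.0) for c in concs]
print(round(ec50_from_curve(concs, inhib), 2))  # → 10.0
```

Computing EC50 per laboratory from the same dilution points, then comparing the resulting values, is exactly what the ring study's repeatability/reproducibility statistics operate on.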

15.
The crystallographic community is in many ways an exemplar of the benefits and practices of sharing data. Since the inception of the technique, virtually every published crystal structure has been made available to others. This has been achieved through the establishment of several specialist data centres, including the Cambridge Crystallographic Data Centre, which produces the Cambridge Structural Database. Containing curated structures of small organic molecules, some containing a metal, the database has been produced for almost 50 years. This has required the development of complex informatics tools and an environment allowing expert human curation. As importantly, a financial model has evolved which has, to date, ensured the sustainability of the resource. However, the opportunities afforded by technological changes and changing attitudes to sharing data make it an opportune moment to review current practices.

16.
Curating the data underlying quantitative structure–activity relationship models is a never-ending struggle. Some curation can now be automated but much cannot, especially where data as complex as those pertaining to molecular absorption, distribution, metabolism, excretion, and toxicity are concerned (vide infra). The authors discuss some particularly challenging problem areas in terms of specific examples involving experimental context, incompleteness of data, confusion of units, problematic nomenclature, tautomerism, and misapplication of automated structure recognition tools.

17.
Ready access to the data embodied in the scientific literature is essential for those attempting to devise new applications for analytical methods. Many of the compilations of polarographic data which have been made in the past are now incomplete and obsolete. A more comprehensive and enduring database for polarography is now being compiled.

18.
A flexible data analysis tool for chemical genetic screens (total citations: 1; self-citations: 0; citations by others: 1)
High-throughput assays generate immense quantities of data that require sophisticated data analysis tools. We have created a freely available software tool, SLIMS (Small Laboratory Information Management System), for chemical genetics which facilitates the collection and analysis of large-scale chemical screening data. Compound structures, physical locations, and raw data can be loaded into SLIMS. Raw data from high-throughput assays are normalized using flexible analysis protocols, and systematic spatial errors are automatically identified and corrected. Various computational analyses are performed on tested compounds, and dilution-series data are processed using standard or user-defined algorithms. Finally, published literature associated with active compounds is automatically retrieved from Medline and processed to yield potential mechanisms of actions. SLIMS provides a framework for analyzing high-throughput assay data both as a laboratory information management system and as a platform for experimental analysis.
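Automatic correction of systematic spatial errors on screening plates, as mentioned above, is commonly done with row/column median corrections. A generic sketch of that idea follows; this is not SLIMS's actual algorithm:

```python
from statistics import median

def correct_spatial(plate):
    """One pass of row/column median correction on a plate of raw signals.

    Subtracting each row's median and then each column's median removes
    additive edge and gradient artifacts common in high-throughput plates.
    """
    rows = [list(r) for r in plate]  # copy so the input is untouched
    for row in rows:
        row_med = median(row)
        for j in range(len(row)):
            row[j] -= row_med
    for j in range(len(rows[0])):
        col_med = median(row[j] for row in rows)
        for row in rows:
            row[j] -= col_med
    return rows

# A plate whose signal rises by 2 per row (a pure spatial gradient):
# the correction reduces it to all zeros.
corrected = correct_spatial([[1, 2, 3], [3, 4, 5], [5, 6, 7]])
```

Medians rather than means are used so that a few genuinely active wells do not drag the correction toward themselves.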

19.
This article describes how the Bureau of Drugs laboratories and offices obtain chemical data from the scientific literature, from user complaints and product-defect reporting systems, from drug manufacturers, from analyses of drug samples collected from the market, and from analytical research. The chemical data thus educed have been used successfully to develop new analytical methods, establish better specifications of drug quality, remove adulterated drugs from the marketplace, prosecute purveyors of substandard drugs, and, in general, assure that consumers are provided with safe and effective drugs of high quality.

20.
Warmr: a data mining tool for chemical data (total citations: 5; self-citations: 0; citations by others: 5)
