Similar literature
 Found 20 similar documents (search took 203 ms)
1.
2.
With the public availability of large data sources such as ChEMBLdb and the Open PHACTS Discovery Platform, retrieving data sets with consistent assay conditions for protein targets of interest is no longer a time-consuming process. In particular, workflow engines such as KNIME or Pipeline Pilot allow complex queries and make it possible to search for several targets simultaneously. The data can then be used directly as input to various ligand- and structure-based studies. In this contribution, using in-house projects on P-gp inhibition, transporter selectivity, and TRPV1 modulation, we outline how incorporating linked life science data into the daily execution of projects allowed us to expand our approaches from conventional Hansch analysis to complex, integrated multilayer models.

3.
The scientific literature is an important source of experimental and chemical structure data. Very often these data have been harvested into smaller or larger data collections, leaving data quality and curation issues on the shoulders of users. The current research presents a systematic and reproducible workflow for collecting series of data points from the scientific literature and assembling a database that is suitable for high-quality modelling and decision support. The quality assurance aspect of the workflow is concerned with the curation of both chemical structures and associated toxicity values at (1) the level of single data points and (2) the level of collections of data points. The assembly of a database employs a novel “timeline” approach. The workflow is implemented as a software solution, and its applicability is demonstrated on the example of the Tetrahymena pyriformis acute aquatic toxicity endpoint. A literature collection of 86 primary publications for T. pyriformis was found to contain 2,072 chemical compounds and 2,498 unique toxicity values, which divide into 2,440 numerical and 58 textual values. Every chemical compound was assigned a preferred toxicity value. Examples of the most common chemical and toxicological data curation scenarios are discussed.

4.
We developed a read-across workflow using the OECD QSAR Toolbox for the prediction of skin irritation and corrosion. In the workflow, we gathered analogues using an improved profiler for skin irritation and corrosion to define valid categories. In addition, we refined categories by removing chemicals based on melting points and structural features. Finally, prediction results were obtained using our self-determined rule for read-across. In this rule, we set the number of analogues from which the read-across is performed, the analogue selection criterion (i.e. high similarity vs. near log Pow) and the prediction rule (i.e. majority vs. unanimity). We created a program for the optimization of read-across workflows. We applied this program to 313 chemicals in the training set and sought the optimized workflows among >1000 possible choices of profilers and ways of subcategorization and data gap filling. Use of the optimized workflows provided highly accurate, unbiased, user-independent and reproducible read-across predictions. The prediction results obtained from read-across workflows can be used for the selection of in vitro test methods or as part of the weight-of-evidence approaches in the Integrated Approach on Testing and Assessment for skin irritation and corrosion. Moreover, these results can be used for screening purposes and/or preliminary hazard assessment.
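The read-across rule described in this abstract (number of analogues, selection by similarity, majority vs. unanimity) can be illustrated with a minimal sketch. The function name `read_across`, the `(similarity, label)` data layout, and the example labels are hypothetical and are not taken from the OECD QSAR Toolbox or from the authors' program.

```python
from collections import Counter

def read_across(analogues, k=3, rule="majority"):
    """Predict a class label from the k most similar analogues.

    analogues: list of (similarity, label) pairs (hypothetical layout).
    Returns the predicted label, or None when the rule yields no prediction.
    """
    # Keep the k analogues with the highest similarity to the target chemical.
    top = sorted(analogues, key=lambda a: a[0], reverse=True)[:k]
    labels = [label for _, label in top]
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    if rule == "unanimity":
        # Predict only if every selected analogue agrees.
        return label if count == len(labels) else None
    if rule == "majority":
        # Predict only if more than half of the selected analogues agree.
        return label if count > len(labels) / 2 else None
    raise ValueError(f"unknown rule: {rule}")

analogues = [
    (0.9, "irritant"),
    (0.8, "irritant"),
    (0.7, "non-irritant"),
    (0.2, "non-irritant"),
]
print(read_across(analogues, k=3, rule="majority"))   # irritant
print(read_across(analogues, k=3, rule="unanimity"))  # None
```

Under the unanimity rule the sketch declines to predict when the analogues disagree, which trades coverage for stricter agreement among analogues.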

5.
Traditional quantitative structure-activity relationship (QSAR) models aim to capture global structure-activity trends present in a data set. In many situations, there may be groups of molecules which exhibit a specific set of features which relate to their activity or inactivity. Such a group of features can be said to represent a local structure-activity relationship. Traditional QSAR models may not recognize such local relationships. In this work, we investigate the use of local lazy regression (LLR), which obtains a prediction for a query molecule using its local neighborhood, rather than considering the whole data set. This modeling approach is especially useful for very large data sets because no a priori model need be built. We applied the technique to three biological data sets. In the first case, the root-mean-square error (RMSE) for an external prediction set was 0.94 log units versus 0.92 log units for the global model. However, LLR was able to characterize a specific group of anomalous molecules with much better accuracy (0.64 log units versus 0.70 log units for the global model). For the second data set, the LLR technique resulted in a decrease in RMSE from 0.36 log units to 0.31 log units for the external prediction set. In the third case, we obtained an RMSE of 2.01 log units versus 2.16 log units for the global model. In all cases, LLR led to a few observations being poorly predicted compared to the global model. We present an analysis of why this was observed and possible improvements to the local regression approach.
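The idea of local lazy regression, fitting a model only on the neighbourhood of the query at prediction time, can be sketched as follows. This is a simplified illustration that assumes a single numeric descriptor and an ordinary least-squares line; the function name `local_lazy_predict` and the data are hypothetical and do not reproduce the authors' implementation.

```python
def local_lazy_predict(train, x_query, k=3):
    """Predict an activity from a linear fit on the k nearest training points.

    train: list of (descriptor, activity) pairs (single descriptor assumed).
    No global model is built; the fit is done lazily for each query.
    """
    # Select the local neighbourhood of the query in descriptor space.
    neighbours = sorted(train, key=lambda p: abs(p[0] - x_query))[:k]
    xs = [x for x, _ in neighbours]
    ys = [y for _, y in neighbours]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # All neighbours share one descriptor value: fall back to their mean.
        return mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y + slope * (x_query - mean_x)

# Two local trends that one global straight line would fit poorly.
train = [(0, 0), (1, 2), (2, 4), (3, 6), (7, 13), (8, 12), (9, 11), (10, 10)]
print(local_lazy_predict(train, 1.5))  # uses the rising region
print(local_lazy_predict(train, 9.0))  # uses the falling region
```

Because each prediction depends on only a few neighbours, a sparse or noisy neighbourhood can give a poor local fit, which is consistent with the abstract's remark that some observations were predicted worse than by the global model.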

6.
Ecotoxicity assessment is essential before placing new chemical substances on the market. An investigation of the use of the chromatographic retention (log k) in biopartitioning micellar chromatography (BMC) as an in vitro approach to evaluate the bioconcentration factor (BCF) of pesticides in fish is proposed. A heterogeneous set of 85 pesticides from six chemical families was used. For pesticides exhibiting bioconcentration in fish (experimental log BCF > 2), a quantitative retention-activity relationship (QRAR) model is able to perform precise log BCF estimations for new pesticides. Considering the present data, the results based on log k seem to be more reliable than those from available software (BCFWIN and KOWWIN) and from log P (quantitative structure-activity relationships (QSAR)). It is also possible to perform risk assessment tasks by fixing a threshold value for log k, which substitutes for two common threshold values, log P and experimental log BCF, avoiding the experimental problems associated with these two parameters.

7.
A novel method (in the context of quantitative structure-activity relationships (QSAR)) based on the k nearest neighbour (kNN) principle has recently been introduced for the derivation of predictive structure-activity relationships. Its performance has been tested for estimating the estrogen binding affinity of a diverse set of 142 organic molecules. Highly predictive models have been obtained. Moreover, it has been demonstrated that consensus-type kNN QSAR models, derived from the arithmetic mean of individual QSAR models, were statistically robust and provided more accurate predictions than the great majority of the individual QSAR models. Finally, the consensus QSAR method was tested with 3D QSAR and log P data from a widely used steroid benchmark data set.
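The consensus step mentioned in this abstract, taking the arithmetic mean of individual kNN model predictions, can be sketched as below. This is a simplification under stated assumptions: each individual model is defined here by a value of k and a descriptor subset, the kNN prediction is an unweighted mean of neighbour activities, and all names and data are hypothetical rather than the published method.

```python
import math

def knn_predict(train, query, k, features):
    """Mean activity of the k nearest training compounds.

    train: list of (descriptor_tuple, activity) pairs.
    features: indices of the descriptors this individual model uses.
    """
    def distance(a, b):
        return math.sqrt(sum((a[i] - b[i]) ** 2 for i in features))

    nearest = sorted(train, key=lambda row: distance(row[0], query))[:k]
    return sum(activity for _, activity in nearest) / len(nearest)

def consensus_predict(train, query, models):
    """Arithmetic mean of the predictions of several individual kNN models.

    models: list of (k, features) pairs, one per individual model.
    """
    predictions = [knn_predict(train, query, k, feats) for k, feats in models]
    return sum(predictions) / len(predictions)

train = [
    ((0.0, 0.0), 1.0),
    ((1.0, 0.0), 2.0),
    ((0.0, 1.0), 3.0),
    ((1.0, 1.0), 4.0),
]
models = [(1, (0, 1)), (2, (0,))]  # hypothetical individual models
print(consensus_predict(train, (0.1, 0.1), models))  # 1.5
```

Averaging tends to damp the error of any single model, which is one plausible reading of why the consensus models were reported to be more robust than most individual ones.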

8.
9.
Reversed-phase thin-layer chromatography with RP-8, RP-18, and RP-18W stationary phases was used in quantitative structure-activity relationship (QSAR) studies of new antimycotic compounds. The retention behavior of 10 dihydroxythiobenzanilides was examined for acquisition of log k' data. With water-acetone mixtures as the mobile phases, the concentration range for which the correlation between log k' and acetone concentration is linear was established for each stationary phase and used to determine hydrophobicity parameters log k'w by linear extrapolation. The effect of substituents on retention constants was quantitated by using the group contribution parameters tau W. On the basis of QSAR equations obtained from these studies, log k'w data can be used to predict antifungal activities of dihydroxythiobenzanilides with satisfactory accuracy.
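The linear extrapolation used to obtain log k'w can be sketched as a least-squares fit of log k' against the acetone volume fraction, with the intercept at zero acetone taken as log k'w. The function name and the retention data below are hypothetical illustrations, not values from the study.

```python
def extrapolate_log_kw(acetone_fractions, log_k_values):
    """Fit log k' = log k'w + slope * fraction and return (log k'w, slope).

    Only points from the concentration range where the relationship is
    linear should be passed in; choosing that range is left to the caller.
    """
    n = len(acetone_fractions)
    mean_x = sum(acetone_fractions) / n
    mean_y = sum(log_k_values) / n
    sxx = sum((x - mean_x) ** 2 for x in acetone_fractions)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(acetone_fractions, log_k_values))
    slope = sxy / sxx
    # Intercept at zero organic modifier, i.e. pure water as mobile phase.
    log_kw = mean_y - slope * mean_x
    return log_kw, slope

fractions = [0.5, 0.6, 0.7]   # hypothetical acetone volume fractions
log_k = [1.0, 0.6, 0.2]       # hypothetical log k' values
log_kw, slope = extrapolate_log_kw(fractions, log_k)
print(round(log_kw, 3), round(slope, 3))  # 3.0 -4.0
```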

10.
11.
12.
Predicting the log of the partition coefficient P is a long-standing benchmark problem in Quantitative Structure-Activity Relationships (QSAR). In this paper we show that a relatively simple molecular representation (using 14 variables) can be combined with leading-edge machine learning algorithms to predict log P for new compounds more accurately than existing benchmark algorithms that use complex molecular representations.

13.
QSAR-generated data appear to be an attractive alternative to experimental data, as foreseen in the proposed new chemicals legislation REACH. A preliminary risk assessment for the aquatic environment can be based on a few factors, i.e. the octanol-water partition coefficient (Kow), the vapour pressure (VP) and the potential biodegradability of the compound, in combination with the predicted no-effect concentration (PNEC) and the actual tonnage in which the substance is produced. Application of partial order ranking, which allows simultaneous inclusion of several parameters, leads to a mutual prioritisation of the investigated substances; this prioritisation can be analysed further through the concept of linear extensions and average ranks. The ranking uses endpoint values (log Kow and log VP) derived from strictly linear 'noise-deficient' QSAR models as input parameters. Biodegradation estimates were adopted from the BioWin module of the EPI Suite. The population growth impairment of Tetrahymena pyriformis was used as a surrogate for fish lethality.
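Partial order ranking with linear extensions and average ranks can be illustrated by a brute-force sketch. It assumes that a substance ranks above another only when it is at least as high in every parameter and differs in at least one, and it enumerates all orderings, which is feasible only for a handful of substances. The function names and parameter values are hypothetical.

```python
from itertools import permutations

def dominates(b, a):
    """True if b is >= a in every parameter and the two are not identical."""
    return all(y >= x for x, y in zip(a, b)) and a != b

def average_ranks(objects):
    """Average rank of each object over all linear extensions.

    objects: dict mapping a name to a tuple of parameter values.
    Rank 1 is the lowest position; incomparable objects share ranks on average.
    """
    names = list(objects)
    totals = {name: 0 for name in names}
    count = 0
    for order in permutations(names):
        position = {name: i for i, name in enumerate(order)}
        # Keep only total orders that respect every dominance relation.
        consistent = all(
            position[a] < position[b]
            for a in names for b in names
            if dominates(objects[b], objects[a])
        )
        if consistent:
            count += 1
            for name in names:
                totals[name] += position[name] + 1
    return {name: totals[name] / count for name in names}

# Hypothetical (log Kow, log VP) style values for three substances.
substances = {"A": (1, 1), "B": (2, 3), "C": (3, 2)}
print(average_ranks(substances))  # {'A': 1.0, 'B': 2.5, 'C': 2.5}
```

Here B and C are incomparable, so the two admissible orderings (A, B, C) and (A, C, B) give them the same average rank.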

14.
15.
16.
Human dihydrofolate reductase (hDHFR) inhibitors have for decades been a popular subject of research aimed at anti-cancer, anti-malarial, and antibacterial drugs. Besides quantitative structure-activity relationships (QSAR), artificial intelligence (AI) has recently been introduced in many areas of biological research, such as molecular drug design and biological activity prediction. In this study, we construct a deep-learning workflow for designing novel hDHFR inhibitors. The workflow consists mainly of two networks. The first is an artificial neural network, trained on molecules selected from the ChEMBL database with experimental hDHFR inhibition as the label, which evaluates the bioactivity of the molecular structures produced by the second network. The second network is a conditional generative adversarial network (cGAN) that generates candidate molecules with the desired properties. Finally, the candidate molecules with high hDHFR inhibition are subjected to molecular docking to verify their binding patterns and affinity strengths inside the active site of hDHFR. In the end, we have successfully identified several novel drug-like compounds with hDHFR inhibition comparable to that of compounds currently used in clinics. We present a new tool to effectively design new drug-like compounds through an AI approach.

17.
In this study we have investigated the relative correlation potential of the Wiener (W), Szeged (Sz), and molecular connectivity indices (0chiR, 1chiR and 2chiR) in developing quantitative structure-activity relationships (QSAR); log P values of benzoic acid and its nuclear-substituted derivatives were used for this purpose. The statistical analyses for univariate and multivariate correlations indicated that both W and Sz are closely related to the connectivity indices (mchiR) and that the W, Sz, and 1chiR indices have similar modeling potentials. 1chiR gives slightly better results than both W and Sz. The other connectivity indices, 0chiR and 2chiR, correlate poorly with log P.
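The Wiener and Szeged indices named in this abstract can be computed from a hydrogen-suppressed molecular graph, as in the following sketch. The adjacency-list representation, the function names, and the example molecules (n-butane and cyclohexane carbon skeletons) are assumptions for illustration, and the graph is assumed to be connected.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, start):
    """Shortest-path (bond count) distances from start to every atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def wiener_index(adj):
    """Sum of shortest-path distances over all unordered atom pairs."""
    dist = {a: bfs_distances(adj, a) for a in adj}
    return sum(dist[a][b] for a, b in combinations(adj, 2))

def szeged_index(adj):
    """Sum over bonds (u, v) of n_u * n_v.

    n_u counts atoms strictly closer to u than to v, and n_v the reverse.
    """
    dist = {a: bfs_distances(adj, a) for a in adj}
    total = 0
    for u, v in combinations(adj, 2):
        if v in adj[u]:
            n_u = sum(1 for w in adj if dist[w][u] < dist[w][v])
            n_v = sum(1 for w in adj if dist[w][v] < dist[w][u])
            total += n_u * n_v
    return total

butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cyclohexane = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(wiener_index(butane), szeged_index(butane))            # 10 10
print(wiener_index(cyclohexane), szeged_index(cyclohexane))  # 27 54
```

For acyclic skeletons the two indices coincide, as the butane example shows; they differ once rings are present, which is the case for benzoic acid derivatives.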

18.
Tandem mass spectrometry is commonly used to identify peptides (and thereby proteins) that are present in complex mixtures. Peptide identification from tandem mass spectra is partially automated, but still requires human curation to resolve "borderline" peptide-spectrum matches (PSMs). SILVER is web-based software that assists manual curation of tandem mass spectra, using a recently developed intensity-based machine-learning approach to scoring PSMs (Elias et al.). In this method, a large training set of peptide, fragment, and peak-intensity properties for both matched and mismatched PSMs was used to develop a score measuring consistency between each predicted fragment ion of a candidate peptide and its corresponding observed spectral peak intensity. The SILVER interface provides a visual representation of match quality between each candidate fragment ion and the observed spectrum, thereby expediting manual curation of tandem mass spectra. SILVER is available online at http://llama.med.harvard.edu/Software.html.

19.
20.