Similar literature
 20 similar articles found
1.
The toxicity of chemicals induced by different factors is an important consideration, especially during the drug research and development process. Thus, there is an urgent need to develop computationally effective models that can predict the toxicity or adverse effects of a specific class of chemicals. In this study, random forest (RF) was used to classify five toxicity data sets from the Distributed Structure‐Searchable Toxicity database network, using substructure fingerprints calculated directly from simple molecular structures. Three model validation approaches (the out‐of‐bag validation incorporated in RF, fivefold cross‐validation, and an independent validation set) were used to assess the prediction capability of the models. The chemical space of the data sets was explored with multidimensional scaling plots, and outlying molecules were detected by the proximity measure in RF. The important substructure fingerprints recognized by RF also gave some insight into the structural features related to the toxicity of chemicals. The results showed that these in silico classification models based on substructure patterns and RF are applicable to the prediction of potential toxicity of chemical compounds. Copyright © 2012 John Wiley & Sons, Ltd.

2.
DNA methylation is the most promising biomarker for estimating human age. There are various methods used for analyzing DNA methylation. Among those, the SNaPshot assay-based method provides a semi-quantitative measurement of DNA methylation using capillary electrophoresis on genetic analyzers. However, DNA methylation measures produced using different types of genetic analyzers have never been compared, although differences in methylation values can directly affect age estimates. To evaluate the differences between the results generated by different genetic analyzers, we analyzed the same blood, saliva, and control methylated DNA using three genetic analyzers (the Applied Biosystems 3130, 3500, and SeqStudio) and compared the methylation values at five CpG sites: ELOVL2, FHL2, KLF14, MIR29B2C, and TRIM59. The methylation value at each of the five CpG sites decreased in the order 3130, 3500, and SeqStudio. The differences in the results produced by the different genetic analyzers resulted in significant errors when applying the 3500 and SeqStudio data to a previous age estimation model constructed using the 3130 Genetic Analyzer data. Therefore, DNA methylation measurements from the 3500 and SeqStudio were corrected using the regression functions obtained by plotting the DNA methylation data of one instrument versus the other, to facilitate the application of DNA methylation data from one instrument to the age prediction model based on another instrument. The age prediction accuracy obtained by applying corrected 3500 and SeqStudio data to the existing age estimation model was as high as that observed with the 3130 data.
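
The instrument-to-instrument correction described in this abstract can be illustrated with a least-squares line. This is a minimal sketch, not the authors' code: the abstract does not state the form of the regression functions, so a straight line is assumed here, and the function names and paired methylation values are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def correct(values, slope, intercept):
    """Map values from the new instrument onto the reference instrument's scale."""
    return [slope * v + intercept for v in values]


# Hypothetical methylation fractions for the same samples on two instruments.
new_instrument = [0.10, 0.30, 0.50, 0.70]
reference = [0.15, 0.37, 0.59, 0.81]

slope, intercept = fit_line(new_instrument, reference)
corrected = correct(new_instrument, slope, intercept)
```

The corrected values, rather than the raw ones, would then be passed to an age model trained on the reference instrument's data.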

3.
Unsupervised methods, such as principal component analysis, have gained popularity and widespread acceptance in the chemometrics and applied statistics communities. Unsupervised random forest is an additional method capable of discovering underlying patterns in the data. However, the number of applications of unsupervised random forest in chemometrics has been limited. One possible cause for this is the belief that random forest can only be used in a supervised analysis setting. This tutorial introduces the basic concepts of unsupervised random forest and illustrates several applications in chemometrics through worked examples. Copyright © 2016 John Wiley & Sons, Ltd.

4.
Age prediction is of great importance for criminal investigation and judicial expertise. DNA methylation status is considered a promising means of inferring tissue age by virtue of age-dependent changes at methylation sites. In recent years, forensic scientists have established various models to predict the chronological age of blood, saliva, and semen based on DNA methylation status. However, hair-inferred age has not been studied in the field of forensic science. In this study, we measured the methylation statuses of potential age-related CpG sites by using the multiplex methylation SNaPshot method. A total of 10 CpG sites from the LAG3, SCGN, ELOVL2, KLF14, C1orf132, SLC12A5, GRIA2, and PDE4C genes were found to be tightly associated with age in hair follicles. A correlation coefficient above 0.7 was found for four CpG sites (cg24724428 and Chr6:11044628 in ELOVL2, cg25148589 in GRIA2, and cg07547549 in SLC12A5). Among four age-prediction models, the multiple linear regression model consisting of 10 CpG sites provided the best-fitting results, with a median absolute deviation of 3.68 years. It is feasible to obtain both human identification and age information from a single scalp hair follicle. No significant differences in methylation degree were found between different sexes, hair types, or hair colors. In conclusion, we established a method to evaluate chronological age by assessing DNA methylation status in hair follicles.

5.
We constructed six new models to analyze DNA sequences. First, we regarded a DNA primary sequence as a random process in t and gave three ways to define the nucleotides' random distribution functions. We extracted some parameters from the linear model and analyzed the changes in the nucleotides' distributions. To facilitate the comparison of DNA sequences, we proposed two ways to measure their similarities. Finally, we compared the six models by analyzing the similarities of the DNA primary sequences presented in Table 1 and selected the optimal one.

6.
Ninety‐six acidic phosphorus‐containing molecules with pKa values from 1.88 to 6.26 were collected and divided into training and test sets by random sampling. Structural parameters were obtained by density functional theory calculations on the molecules. The relationship between the experimental pKa values and the structural parameters was obtained by multiple linear regression fitting on the training set and was tested with the test set; the R² values were 0.974 and 0.966 for the training and test sets, respectively. This regression equation, which quantitatively describes the influence of structural parameters on pKa and can be used to predict the pKa values of similar structures, is significant for the design of new acidic phosphorus‐containing extractants. © 2016 Wiley Periodicals, Inc.
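
The random train/test split and the R² statistic reported in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the abstract gives neither the split ratio nor the data, so the 25% test fraction, the seed, the function names, and the pKa values below are all hypothetical.

```python
import random


def split_train_test(items, test_fraction, seed):
    """Randomly split items into (train, test) lists, reproducibly for a given seed."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# 96 molecule indices, split with an assumed 25% test fraction.
train_ids, test_ids = split_train_test(range(96), test_fraction=0.25, seed=0)

# Hypothetical experimental and predicted pKa values for a handful of molecules.
observed_pka = [2.0, 3.0, 4.0, 5.0, 6.0]
predicted_pka = [2.1, 2.9, 4.0, 5.2, 5.8]
r2 = r_squared(observed_pka, predicted_pka)
```

Computing R² separately on the held-out set, as the abstract does, indicates whether the fitted equation generalizes beyond the molecules used to fit it.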

7.
8.
Squared prediction errors (SPE) in are discussed in relation to the conventional PLSR versus bidiagonalization model and algorithm issue concerning residual and prediction consistency, with focus on process monitoring and fault detection. Our analysis leads to the conclusion that conventional PLSR based on the NIPALS algorithm is ambiguous in SPE values caused by process faults. The basic reason for this is that the sample residuals are not found as projections onto the orthogonal complement of the space where the scores and regression solution are located, and where also the statistical limit is defined. The alternative non‐orthogonalized PLSR and bidiagonalization (Bidiag2) algorithms, as well as a simple re‐formulation of the NIPALS algorithm (RE‐PLSR), give unambiguous SPE values, and the last two of these also retain orthogonal score vectors. While prediction results from all of these methods in theory are identical, our conclusion is that methods where the and SPE values for process faults are uncorrelated should be preferred. Tests with added errors on real data do not indicate that this conclusion should be altered because of such errors. Copyright © 2011 John Wiley & Sons, Ltd.

9.
10.
11.
The goal of this study is to explore the application of epigenetic markers in the identification of biofluids that are commonly found at crime scenes. A series of genetic loci were examined in order to define epigenetic markers that display differential methylation patterns between blood, saliva, semen, and epithelial tissue. Among the different loci tested, we have identified a panel of markers, C20orf117, ZC3H12D, BCAS4, and FGF7, that can be used in the determination of these four tissue types. Since methylation modifications occur at cytosine bases that are immediately followed by guanine bases (CpG sites), methylation levels were measured at CpG sites spanning each marker. Up to 11 samples of each tissue type were collected and subjected to bisulfite modification to convert unmethylated CpG-associated cytosine bases to thymine bases. The bisulfite-modified DNA was then amplified via nested PCR using a primer set in which one primer was biotin labeled. Biotinylated PCR products were in turn analyzed, and the methylation level at each CpG site was quantitated by pyrosequencing. The percent methylation values at each CpG site were determined and averaged for each tissue type. The results indicated significant methylation differences between the tissue types. The methylation patterns at the ZC3H12D and FGF7 loci differentiated sperm from blood, saliva, and epithelial cells. The C20orf117 locus differentiated blood from sperm, saliva, and epithelial cells, and saliva was differentiated from blood, sperm, and epithelial cells at a fourth locus, BCAS4. The results of this study demonstrate the applicability of epigenetic markers as a novel tool for the determination of biofluids using bisulfite modification and pyrosequencing.

12.
T lymphocytes (T cells) are a very important component of the human immune system. T-cell epitopes can be used to accurately monitor immune responses activated by the major histocompatibility complex (MHC) and to rationally design vaccines. Therefore, accurate prediction of T-cell epitopes is crucial for vaccine development and clinical immunology. In the current study, two types of peptide features, i.e., amino acid properties and chemical molecular features, were used to represent T-cell epitope peptides. Based on these features, the random forest (RF) algorithm, a powerful machine learning algorithm, was used to classify T-cell epitopes and non-T-cell epitopes. The classification accuracy, sensitivity, specificity, Matthews correlation coefficient (MCC), and area under the curve (AUC) values for the proposed method are 97.54%, 97.22%, 97.60%, 0.9193, and 0.9868, respectively. These results indicate that the current method, based on the combined features and RF, is effective for T-cell epitope prediction.
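
For readers unfamiliar with the performance measures quoted in this abstract, the sketch below shows how accuracy, sensitivity, specificity, and MCC follow from a binary confusion matrix. The function name and the counts are hypothetical and are not taken from the study; AUC is omitted because it requires ranked prediction scores rather than counts.

```python
from math import sqrt


def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts: 100 epitopes and 100 non-epitopes.
metrics = classification_metrics(tp=90, fp=20, tn=80, fn=10)
```

MCC is reported alongside accuracy because it stays informative when the two classes are of very different sizes.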

13.
Two new methods for the simultaneous determination of acetylsalicylic acid, acetaminophen, and caffeine, based on total absorbance measurements and their processing by multiple linear regression and partial least-squares regression, are proposed. The concentration ranges used to construct the calibration matrix were 4.0-12.0, 2.0-10.0, and 0.9-6.0 μg ml−1 for acetylsalicylic acid, acetaminophen, and caffeine, respectively. The proposed methods were validated by using a set of synthetic sample mixtures and subsequently applied to the determination of the three active principles in three different pharmaceutical preparations.

14.
15.
The use of some unconventional non-linear modeling techniques, i.e., methods based on classification and regression trees and on multivariate adaptive regression splines, was explored to model the blood-brain barrier (BBB) passage of drugs and drug-like molecules. The data set contains BBB passage values for 299 structurally and pharmacologically diverse drugs, originating from a structured knowledge-based database. Models were built using boosted regression trees (BRT) and multivariate adaptive regression splines (MARS), as well as their respective combinations with stepwise multiple linear regression (MLR) and partial least squares (PLS) regression in two-step approaches. The best models were obtained using combinations of MARS with either stepwise MLR or PLS. It could be concluded that combining a linear with a non-linear modeling technique results in some improved properties compared to the individual linear and non-linear models and that, when the use of such a combination is appropriate, combinations using MARS as the non-linear technique should be preferred over those with BRT, owing to some serious drawbacks of the BRT approaches.

16.
Ambiguous alteration patterns of 5‐methylcytosine (5mC) and 5‐hydroxymethylcytosine (5hmC) in Alzheimer's disease (AD) have obstructed investigation of the mechanism of this neurological disorder from an epigenetic viewpoint. Here, we applied a fully quantitative and validated LC‐MS/MS method to determine genomic 5mC and 5hmC in the brain cortex of AD model mice at three ages (12, 15, and 18 months). We found significant increases in 5mC and 5hmC levels in AD mice at each age compared with age‐matched wild‐type controls, with a rising trend from 12‐month to 18‐month AD mice, supporting a positive correlation between genomic DNA methylation and hydroxymethylation and the development of AD.

17.
This paper tests the performance of a simple empirical scoring function on a set of candidate designs produced by a de novo design package. The scoring function calculates approximate ligand-receptor binding affinities given a putative binding geometry. To our knowledge this is the first substantial test of an empirical scoring function of this type on a set of molecular designs which were then subsequently synthesised and assayed. The performance illustrates that the methods used to construct the scoring function and the reliance on plausible, yet potentially false, binding modes can lead to significant over-prediction of binding affinity in bad cases. This is anticipated on theoretical grounds and provides caveats on the reliance which can be placed when using the scoring function as a screen in the choice of molecular designs. To improve the predictability of the scoring function and to understand experimental results, it is important to perform subsequent Quantitative Structure-Activity Relationship (QSAR) studies. In this paper, Bayesian regression is performed to improve the predictability of the scoring function in the light of the assay results. Bayesian regression provides a rigorous mathematical framework for the incorporation of prior information, in this case information from the original training set, into a regression on the assay results of the candidate molecular designs. The results indicate that Bayesian regression is a useful and practical technique when relevant prior knowledge is available and that the constraints embodied in the prior information can be used to improve the robustness and accuracy of regression models. We believe this to be the first application of Bayesian regression to QSAR analysis in chemistry.
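
How prior information enters a Bayesian regression can be seen in the simplest textbook case: one coefficient, a Gaussian prior, and a known noise variance. The sketch below is only that textbook case and does not reproduce the model in the paper, whose details the abstract does not give; the function name, prior, and data are hypothetical.

```python
def posterior_slope(x, y, prior_mean, prior_var, noise_var):
    """Posterior mean and variance of w in y = w * x + noise.

    Assumes a Gaussian prior w ~ N(prior_mean, prior_var) and independent
    Gaussian noise with known variance noise_var (conjugate update).
    """
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    precision = 1.0 / prior_var + sum_xx / noise_var
    mean = (prior_mean / prior_var + sum_xy / noise_var) / precision
    return mean, 1.0 / precision


# Hypothetical assay data whose least-squares slope is exactly 2.0.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]

# A prior centred on 1.0 stands in for information from an earlier training set.
post_mean, post_var = posterior_slope(x, y, prior_mean=1.0, prior_var=1.0, noise_var=1.0)
```

The posterior mean lies between the prior mean and the least-squares estimate, and moves toward the latter as more assay data accumulate; this is the sense in which the prior constrains the regression.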

18.
19.
The model organism Hydra has been used for molecular studies for more than 20 years; however, its DNA base composition has not yet been determined. We have analyzed DNA and total RNA of the freshwater polyp Hydra magnipapillata with two independent procedures of high accuracy and sensitivity: fluorescence labeling of nucleotides followed by CE‐LIF detection, and 32P‐postlabeling. DNA of Hydra was digested either to deoxyribonucleoside‐5′‐monophosphates or to deoxyribonucleoside‐3′‐monophosphates, selectively conjugated with the fluorescent dye 4,4‐difluoro‐5,7‐dimethyl‐4‐bora‐3a,4a‐diaza‐s‐indacene‐3‐propionyl ethylene diamine hydrochloride (BODIPY FL EDA), and separated and detected using CE‐LIF. Both versions of the assay revealed a high A+T composition of 78 and 71%, whereas total DNA methylation (5‐methyldeoxycytidine) was 2.6 and 3.1%. Total Hydra RNA showed the highest base level for guanine (33%) and a level of 1.4% for pseudouracil. All values were in good agreement with those determined by the 32P‐postlabeling method.

20.
A separation‐free single‐base extension (SBE) assay utilizing fluorescence resonance energy transfer (FRET) was developed for rapid and convenient interrogation of DNA methylation status at specific cytosine and guanine dinucleotide sites. In this assay, the SBE was performed in a tube using an allele‐specific oligonucleotide primer (i.e., extension primer) labeled with Cy3 as a FRET donor fluorophore at the 5′‐end, a nucleotide terminator (dideoxynucleotide triphosphate) labeled with Cy5 as a FRET acceptor, a PCR amplicon derived from bisulfite‐converted genomic DNA, and a DNA polymerase. A single base‐extended primer (i.e., SBE product) that was 5′‐Cy3‐ and 3′‐Cy5‐tagged was formed by incorporation of the Cy5‐labeled terminator into the 3′‐end of the extension primer, but only if the terminator added was complementary to the target nucleotide. The resulting SBE product brought the Cy3 donor and the Cy5 acceptor into close proximity. Illumination of the Cy3 donor resulted in successful FRET and excitation of the Cy5 acceptor, generating fluorescence emission from the acceptor. The capacity of the developed assay to discriminate as low as 10% methylation from a mixture of methylated and unmethylated DNA was demonstrated at multiple cytosine and guanine dinucleotide sites.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号