Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
Outlier detection is crucial in building a highly predictive model. In this study, we proposed an enhanced Monte Carlo outlier detection method by establishing cross‐prediction models based on determinate normal samples and analyzing the distribution of prediction errors individually for dubious samples. One simulated and three real datasets were used to illustrate and validate the performance of our method, and the results indicated that this method outperformed Monte Carlo outlier detection in outlier diagnosis. After these outliers were removed, the value of validation by Kovats retention indices and the root mean square error of prediction decreased from 3.195 to 1.655, and the average cross‐validation prediction error decreased from 2.0341 to 1.2780. This method helps establish a good model by eliminating outliers. © 2015 Wiley Periodicals, Inc.

2.
High throughput analysis of differential gene expression is a powerful tool that can be applied to many areas in molecular cell biology, including differentiation, development, physiology, and pharmacology. In recent years, a variety of techniques have been developed to analyze differential gene expression, including comparative expressed sequence tag sequencing, differential display, representational difference analysis, cDNA or oligonucleotide arrays, and serial analysis of gene expression. This review explains the technologies, their scopes, impact on science, as well as their costs and possible limitations. The application of differential display is presented as a tool to identify genes induced by darkness or yellowing process in rice leaves.

3.
4.
A new strategy of outlier detection for QSAR/QSPR   Total citations: 1, self-citations: 0, citations by others: 1
The crucial step in building a high-performance QSAR/QSPR model is the detection of outliers in the model. Detecting outliers in a multivariate point cloud is not trivial, especially when several outliers coexist in the model. The classical identification methods do not always identify them, because they are based on the sample mean and covariance matrix, which are themselves influenced by the outliers. Moreover, existing methods emphasize only some types of outliers, not all of them. To avoid these problems and detect all kinds of outliers simultaneously, we provide a new strategy based on Monte‐Carlo cross‐validation, termed the MC method. The MC method inherently provides a feasible way to detect different kinds of outliers by establishing many cross‐predictive models. With the help of the distribution of predictive residuals thus obtained, it seems to be able to reduce the risk caused by the masking effect. In addition, a new display is proposed, in which the absolute values of the mean of the predictive residuals are plotted versus the standard deviations of the predictive residuals. The plot divides the data into normal samples, y direction outliers and X direction outliers. Several examples are used to demonstrate the detection ability of the MC method through comparison with different diagnostic methods. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2010
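
The Monte‐Carlo cross‐validation idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-variable least-squares line stands in for the QSAR/QSPR model, and the function names, the 70/30 split, the number of runs and the synthetic data are all assumptions made for illustration. For each sample, the sketch collects the prediction residuals from every run in which that sample was left out, then reports the absolute mean and the standard deviation of those residuals, the two quantities the abstract plots against each other.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (assumed stand-in for the model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mc_residual_stats(xs, ys, n_runs=500, train_frac=0.7, seed=0):
    """Return {sample index: (|mean residual|, std of residuals)}."""
    rng = random.Random(seed)
    n = len(xs)
    n_train = int(train_frac * n)
    residuals = {i: [] for i in range(n)}
    for _ in range(n_runs):
        idx = list(range(n))
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Only left-out samples contribute a predictive residual in this run.
        for i in test:
            residuals[i].append(ys[i] - (a + b * xs[i]))
    return {
        i: (abs(statistics.fmean(r)), statistics.pstdev(r))
        for i, r in residuals.items()
        if len(r) > 1
    }


# Hypothetical data: y = 2x + 1 with small noise; sample 10 is shifted in y.
xs = [float(i) for i in range(20)]
noise = random.Random(1)
ys = [2.0 * x + 1.0 + noise.gauss(0, 0.2) for x in xs]
ys[10] += 15.0

stats = mc_residual_stats(xs, ys)
worst = max(stats, key=lambda i: stats[i][0])
print("largest |mean residual| at sample", worst, stats[worst])
```

In this toy setting the shifted sample shows a large absolute mean residual, which is how a y direction outlier would appear in the display described above. Distinguishing X direction outliers would require a multivariate model and is not covered by this sketch.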

5.
A novel outlier detection method in partial least squares based on random sample consensus is proposed. The proposed algorithm repeatedly generates partial least squares solutions estimated from random samples and then tests each solution for the support from the complete dataset for consistency. A comparative study of the proposed method and leave-one-out cross validation in outlier detection on simulated data and near-infrared data of pharmaceutical tablets is presented. In addition, a comparison between the proposed method and PLS, RSIMPLS, PRM is provided. The obtained results demonstrate that the proposed method is highly efficient.
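
The random sample consensus loop described in this abstract (fit on a random subset, then count how many samples of the complete dataset support the fit) can be sketched as follows. This is an illustration under stated assumptions, not the published algorithm: a straight line through two randomly chosen points replaces the partial least squares solution, and the tolerance `tol`, the iteration count and the data are hypothetical.

```python
import random


def ransac_line(xs, ys, n_iter=200, tol=1.0, seed=0):
    """Return (inlier indices, outlier indices) for the best-supported line."""
    rng = random.Random(seed)
    n = len(xs)
    best_inliers = []
    for _ in range(n_iter):
        # Minimal random sample: two points define a candidate line.
        i, j = rng.sample(range(n), 2)
        if xs[i] == xs[j]:
            continue
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        # Support = samples of the full dataset within tol of the candidate.
        inliers = [k for k in range(n) if abs(ys[k] - (a + b * xs[k])) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    outliers = [k for k in range(n) if k not in best_inliers]
    return best_inliers, outliers


# Hypothetical data: y = 0.5x + 3 with tiny alternating noise and two gross errors.
xs = [float(k) for k in range(20)]
ys = [0.5 * x + 3.0 + 0.05 * (-1) ** k for k, x in enumerate(xs)]
ys[3] += 8.0
ys[15] -= 6.0

inliers, outliers = ransac_line(xs, ys)
print("flagged as outliers:", outliers)
```

Samples that never join the largest consensus set are the ones flagged, so the result does not depend on a mean or covariance that the outliers themselves have distorted.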

6.
By combining the advantages of RT-PCR with the sensitivity of bioluminescence using the photoprotein aequorin, a bioluminescence assay has been applied to the determination of message regulation during infectious disease. The bioluminescence produced by the aequorin conjugate covers more than seven logs of concentration, of which approximately five logs produce a linear relationship between product and bioluminescence signal. Aequorin-based bioluminescent detection protocols for mRNA are sensitive into the attomolar range, which requires fewer cycles of PCR and avoids the plateau effect traditionally associated with other noncompetitive RT-PCR techniques. Additional advantages of aequorin-based bioluminescence methods are ease of automation, compatibility with microtiter plate format, low cost, and flexibility.

7.
8.
Near-infrared (NIR) spectrometry will be a more promising tool for quantitative measurement if the robustness and predictive ability of the partial least squares (PLS) model are improved. To this end, we present a new algorithm for simultaneous wavelength selection and outlier detection; at the same time, the problems of background and noise in multivariate calibration are also solved. The strategy is a combination of continuous wavelet transform (CWT) and modified iterative predictors and objects weighting PLS (mIPOW-PLS). CWT is performed as a pretreatment tool for eliminating background and noise simultaneously; then, mIPOW-PLS is proposed to remove both the useless wavelengths and the multiple outliers in the CWT domain. After pretreatment with CWT-mIPOW-PLS, a final PLS model is built for prediction. The results indicate that the combination of CWT and mIPOW-PLS produces robust and parsimonious regression models with very few wavelengths.

9.
10.
11.
A general procedure based on variation of experimental design for checking robustness in the validation of analytical methods is presented. This procedure, which is easy to apply, consists in estimating the main total effects, in detecting outliers, checking the curvature and in determining the main side effects. Two methodologies based on the analysis of a) the residuals from the reduced model and b) the replicates from the reconstructed design were employed for the detection of outliers. In further studies, general experimental design principles were applied using two- and three-level factorial designs. In some cases, a dummy variable was introduced in order not to modify the structure of the designs utilized.

12.
Ortiz MC, Sarabia LA, Herrero A. Talanta, 2006, 70(3): 499-512
The validation of an analytical procedure means the evaluation of some performance criteria such as accuracy, sensitivity, linear range, capability of detection, selectivity, calibration curve, etc. This implies the use of different statistical methodologies, some of them related to statistical regression techniques, which may be robust or not. The presence of outlier data has a significant effect on the determination of sensitivity, linear range or capability of detection, amongst others, when these figures of merit are evaluated with non-robust methodologies. In this paper some of the robust methods used for calibration in analytical chemistry are reviewed: the Huber M-estimator; the Andrews, Tukey and Welsh GM-estimators; the fuzzy estimators; the constrained M-estimators, CM; the least trimmed squares, LTS. The paper also shows that the mathematical properties of the least median squares (LMS) regression can be of great interest in the detection of outlier data in chemical analysis. A comparative analysis is made of the results obtained by applying these regression methods to synthetic and real data. There is also a review of some applications where this robust regression works in a suitable and simple way that proves very useful to secure an objective detection of outliers. The use of a robust regression is recommended in ISO 5725-5.
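
Of the robust methods listed in this abstract, the Huber M-estimator is the simplest to sketch. The example below is an assumption-laden illustration rather than the procedure reviewed in the paper: it fits a straight calibration line by iteratively reweighted least squares, uses the conventional tuning constant 1.345 and a MAD-based scale estimate, and runs on made-up calibration data with one gross error. All function names are hypothetical.

```python
import statistics


def weighted_line(xs, ys, ws):
    """Weighted least squares for y = a + b*x."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def huber_line(xs, ys, c=1.345, n_iter=50):
    """Huber M-estimate of a line via iteratively reweighted least squares."""
    ws = [1.0] * len(xs)
    a = b = 0.0
    for _ in range(n_iter):
        a, b = weighted_line(xs, ys, ws)
        res = [y - (a + b * x) for x, y in zip(xs, ys)]
        med = statistics.median(res)
        # Robust scale: median absolute deviation, rescaled for normal errors.
        scale = statistics.median([abs(r - med) for r in res]) / 0.6745
        scale = max(scale, 1e-12)
        # Huber weights: 1 inside the threshold, shrinking as c*scale/|r| outside.
        ws = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in res]
    return a, b, ws


# Hypothetical calibration data: response = 0.5 + 2*conc, one gross error at index 8.
conc = [float(k) for k in range(10)]
dev = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05, 0.1, -0.1]
resp = [0.5 + 2.0 * x + d for x, d in zip(conc, dev)]
resp[8] += 10.0

a_ols, b_ols = weighted_line(conc, resp, [1.0] * len(conc))
a_hub, b_hub, weights = huber_line(conc, resp)
print("OLS slope:", round(b_ols, 3), "Huber slope:", round(b_hub, 3))
print("final weight of sample 8:", round(weights[8], 3))
```

The final weights double as an outlier diagnostic: a sample whose weight ends far below 1 is one the robust fit has largely discounted. The Huber estimator bounds the influence of large residuals but not of high-leverage points, which is one reason the abstract also lists LTS and LMS.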

13.
Outlier detection is a prerequisite for identifying aberrant samples in a given set of data. The identification of such samples is particularly significant for multivariate data analysis, where increasing data dimensionality can easily hinder data exploration and such outliers often go undetected. This paper aims to introduce a novel Mahalanobis distance measure (namely, a pseudo-distance) termed the locally centred Mahalanobis distance, derived by centering the covariance matrix at each data sample rather than at the data centroid as in the classical covariance matrix. Two parameters, called Remoteness and Isolation degree, were derived from the resulting pairwise distance matrix, and their salient features facilitated a better identification of atypical samples isolated from the rest of the data, thus reflecting their potential application to outlier detection. The Isolation degree was shown to be able to detect a new kind of outlier, that is, isolated samples within the data domain, thus resulting in a useful diagnostic tool to evaluate the reliability of predictions obtained by local models (e.g. k-NN models).
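
One possible reading of "centering the covariance matrix at each data sample" is sketched below for two-dimensional data. The abstract does not give the formula, so the scatter-matrix normalization by n - 1, the function names and the data are assumptions, and the Remoteness and Isolation degree parameters are not reproduced because the abstract does not define them. The sketch only builds the pairwise matrix and shows why the measure is a pseudo-distance: the value from sample i to sample j uses the matrix centred at i, so it generally differs from the value from j to i.

```python
import math


def local_cov(points, i):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix centred at sample i."""
    cx, cy = points[i]
    n = len(points)
    sxx = sum((x - cx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - cy) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - cx) * (y - cy) for x, y in points) / (n - 1)
    return sxx, sxy, syy


def local_mahalanobis(points):
    """Pairwise matrix: dist[i][j] uses the matrix centred at sample i."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sxx, sxy, syy = local_cov(points, i)
        det = sxx * syy - sxy * sxy  # assumed non-zero (points not collinear)
        for j in range(n):
            dx = points[j][0] - points[i][0]
            dy = points[j][1] - points[i][1]
            # Quadratic form d^T S^{-1} d with the explicit 2x2 inverse.
            q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
            dist[i][j] = math.sqrt(max(q, 0.0))
    return dist


# Hypothetical data: a small cluster plus one remote sample (index 5).
points = [(0.0, 0.0), (1.0, 0.2), (0.2, 1.0), (1.0, 1.1), (0.5, 0.5), (6.0, 6.0)]
dist = local_mahalanobis(points)
print("cluster -> remote:", round(dist[0][5], 3))
print("remote -> cluster:", round(dist[5][0], 3))
```

Seen from a cluster member, the remote sample is far away; seen from the remote sample, every other sample lies at a similar, moderate distance, so the two directions give different values.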

14.
A general procedure based on variation of experimental design for checking robustness in the validation of analytical methods is presented. This procedure, which is easy to apply, consists in estimating the main total effects, in detecting outliers, checking the curvature and in determining the main side effects. Two methodologies based on the analysis of a) the residuals from the reduced model and b) the replicates from the reconstructed design were employed for the detection of outliers. In further studies, general experimental design principles were applied using two- and three-level factorial designs. In some cases, a dummy variable was introduced in order not to modify the structure of the designs utilized. Received: 30 October 1998 / Revised: 19 April 1999 / Accepted: 6 May 1999

15.
Robustness tests are usually based on an experimental design approach. As designed experiments generally lead to a large variability among the results, erroneous results are often not readily detected. As a consequence, the ordinary least squares (OLS) estimates of the effects of the robustness test can be biased. Here, two robustness tests are studied, which both contain a suspicious result. Moreover, simulated datasets are considered to examine the influence of the extent of the outlier as well as the influence of multiple outliers. On the one hand, different methods are applied to inspect the results of the experiments for outliers: the half-normal plot of the OLS residuals, the normal probability plot of the effects and a method, which is based on experimental design reconstruction. On the other hand, two robust regression methods are applied to calculate the effects with a minimum influence of possible outliers. The different methods are compared and it is evaluated under which circumstances they can be applied.

16.
17.
Mining patterns of co-expressed genes across a subset of conditions helps to narrow down the search space for the analysis of gene expression data. Identifying condition-specific key genes from large-scale gene expression data is a challenging task. A condition-specific key gene signifies the functional behavior of a group of co-expressed genes across a subset of conditions and can act as a biomarker of disease. In this paper, we propose a novel approach for the identification of condition-specific key genes in Basal-Like Breast Cancer (BLBC) using a biclustering algorithm and a Gene Co-expression Network (GCN). The proposed approach has two stages. In the first stage, significant biclusters are extracted with the help of the ‘runibic’ biclustering algorithm. The second stage identifies condition-specific key genes from the extracted significant biclusters with the help of the GCN. Using a difference matrix and a gene correlation matrix, we constructed a biologically meaningful and statistically strong GCN. We also present the proposed approach with the help of a process diagram and demonstrate the procedure with the example of bicluster number 93 (Bic93). From the experimental results, we observed that 95% and 85% of the extracted biclusters are biologically significant at p-values less than 0.05 and 0.01, respectively. We compared the proposed approach with the Weighted Gene Co-expression Network Analysis (WGCNA) based approach. In the comparison, our approach performed effectively and extracted biologically significant biclusters. It also identified condition-specific key genes that cannot be extracted using the WGCNA-based approach. Some of the important known key genes identified are PIK3CA, SHC3, ERBB2, SHC4, PTOV1, STAG1, ZNF215, etc. These key genes can be used as diagnostic and prognostic biomarkers for BLBC after rigorous analysis.
The identified condition-specific key genes can help to reduce the analysis time and increase the accuracy of further research such as biomarker identification, drug target discovery, etc.

18.
This paper describes the application of plasmonics-based nanoprobes that combine the modulation of the plasmonics effect to change the surface-enhanced Raman scattering (SERS) of a Raman label and the specificity of a DNA hairpin loop sequence to recognize and discriminate a variety of molecular target sequences. Hybridization with target DNA opens the hairpin and physically separates the Raman label from the metal nanoparticle, thus reducing the plasmonics effect and quenching the SERS signal of the label. We have successfully demonstrated the specificity and selectivity of the nanoprobes in the detection of a single-nucleotide polymorphism (SNP) in the breast cancer BRCA1 gene in a homogeneous solution at room temperature. In addition, the potential application of plasmonics nanoprobes for quantitative DNA diagnostic testing is discussed.

19.
We demonstrate the application of differential pulse voltammetry (DPV) for the electrochemical detection of perchloroethylene (PCE) on an unmodified glassy carbon electrode surface. Detection sensitivity was substantially improved using DPV, in which dechlorination was denoted by a cathodic peak observed at approximately −0.6 V (vs Ag/AgCl). Peak current intensity was found to correlate linearly with concentration over a tested range of 0 to 10 μM. The utility of this technique was subsequently evaluated for PCE-spiked environmental samples containing either Methylobacterium adhaesivum (1 × 10^6 cells/mL) or creek water (10% v/v). In all environmental samples, a linear dynamic range was also observed from approximately 0 to 10 μM. The limit of detection was determined to be 0.3 μM in blank buffer, 0.4 μM in bacteria-containing samples and 1.2 μM in creek water samples.

20.
A procedure is described to determine the limit of detection of DSC instruments by using tiny signals from spontaneous polymorphic transitions of CsCl, K2Cr2O7 and Na2SO4. It is shown how such signals can be found well-resolved in DSC diagrams of powder samples. To distinguish them from the baseline noise they should exhibit a height at least twice that of the baseline width. For the instrument employed the corresponding smallest amount of heat, i.e., the limit of detection, was found to be 0.1 mJ. The authors thank Mr. H. Maltry for technical help and the Deutsche Forschungsgemeinschaft for support.
