Similar literature
Found 20 similar articles (search time: 15 ms)
1.
2.
In plants, excess light has the potential to damage the photosynthetic apparatus. The damage is caused in part by reactive oxygen species (ROS) generated by electrons leaking from the photosynthetic electron transport system. To investigate the mechanisms that higher plants use to reduce high-light (HL) stress, we surveyed the response of 7000 Arabidopsis genes to HL using recently developed microarray technology. Our analysis revealed that 110 genes responded positively to a 3 h treatment at a light intensity of 150 W m⁻². In addition to the ROS-scavenging enzymes, genes involved in the biosynthesis of lignins and flavonoids were activated by HL, which resulted in increased accumulation of lignins and anthocyanins. Comparing the HL-responsive genes with drought-inducible genes identified with the same microarray system revealed an extensive overlap between HL- and drought-inducible genes. In addition, we identified 10 genes that were upregulated by HL, drought, cold and salt stress. These genes include RD29A, ERD7, ERD10, KIN1, LEA14 and COR15a, most of which are thought to be involved in the protection of cellular components.

3.
Normalization of cDNA microarray data is an obligatory step in microarray experiments because non-specific errors are relatively frequent. Normalization is generally based on a null hypothesis and a variance model. Yang's model (Yang et al., 2001) includes at least two types of noise, one additive and one multiplicative. Background is usually treated as additive noise on the signal, and the variation between signal pixels as the representative multiplicative noise. In this study, the relation between the signal (spot intensity minus background intensity) and the background was observed, and the influence of background, as a representative additive factor, on normalization was investigated. Although this relation has not previously been considered a factor affecting normalization, taking the signal/background ratio into account during normalization could improve the accuracy of microarray data. Background-dependent normalization decreased the number of genes whose expression levels changed significantly and made their distribution more consistent across the whole range of signal intensities. Printing-pin-dependent normalization was also carried out, treating the printing pin as a representative source of multiplicative noise. It improved the distribution of spots in the Cy3-Cy5 scatter plot, but the effect was slight. These results suggest that the signals influence the local backgrounds and that this influence must be considered when normalizing cDNA microarray data.

4.
We analyze publicly available data on Affymetrix microarray spike-in experiments on the human HGU133 chipset in which sequences are added in solution at known concentrations. The spike-in set contains sequences of bacterial, human, and artificial origin. Our analysis is based on a recently introduced molecular-based model (Carlon, E.; Heim, T. Physica A 2006, 362, 433) that takes into account both probe-target hybridization and target-target partial hybridization in solution. The hybridization free energies are obtained from the nearest-neighbor model with experimentally determined parameters. The molecular-based model suggests a rescaling that should result in a "collapse" of the data at different concentrations into a single universal curve. We indeed find such a collapse, with the same parameters as obtained previously for the older HGU95 chip set. The quality of the collapse varies according to the probe set considered. Artificial sequences, chosen by Affymetrix to be as different as possible from any other human genome sequence, generally show a much better collapse and thus a better agreement with the model than all other sequences. This suggests that the observed deviations from the predicted collapse are related to the choice of probes or have a biological origin rather than being a problem with the proposed model.

5.
6.
Microarrays are used to simultaneously determine the expressions of thousands of genes. An important application of microarrays is in the classification of samples into classes of interest (e.g. either healthy cells or tumour cells). Discriminant partial least squares (DPLS) has often been used for this purpose. In this paper, we describe an improvement to DPLS that uses kernel-based probability density functions and the Bayes rule to classify samples whilst keeping the option of not classifying the sample if this cannot be done with sufficient confidence. With this approach, those samples outside the boundaries of the known classes or from the ambiguity region between classes are rejected and only samples with a high probability of being correctly classified are indeed classified. The optimal model is found by simultaneously minimizing the misclassification and rejection costs. The method (p-DPLS with reject option) was tested with two datasets. For the human cancers dataset the accuracy (obtained by leave-one-out cross-validation) was improved from 97% to 99% when compared to p-DPLS without reject option. For the breast cancer dataset, p-DPLS with reject option was able to reject 100% of the test samples that did not belong to any of the modelled classes. These samples would have been misclassified if the reject option had not been considered.
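
The reject-option idea in this abstract can be illustrated with a minimal sketch. It is not the authors' p-DPLS implementation: it assumes each sample has already been reduced to a single score (standing in for a PLS prediction), and the function names, the Gaussian kernel, the bandwidth and both thresholds are hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_kde_1d(train, x, bandwidth):
    """Kernel density estimate at points x from 1-D training scores."""
    z = (x - train[:, None]) / bandwidth
    norm = len(train) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z ** 2).sum(axis=0) / norm


def classify_with_reject(scores_by_class, x, bandwidth=0.5,
                         min_posterior=0.9, min_density=1e-3):
    """Return a class label per sample, or None when the sample is rejected.

    scores_by_class: dict mapping label -> training scores (assumed input).
    A sample is rejected if it lies far from every class (low total density)
    or in the ambiguity region (no posterior reaches min_posterior).
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    labels = list(scores_by_class)
    n_total = sum(len(v) for v in scores_by_class.values())
    # Bayes rule numerator: prior times class-conditional density.
    joint = np.array([
        len(scores_by_class[c]) / n_total
        * gaussian_kde_1d(np.asarray(scores_by_class[c], dtype=float), x, bandwidth)
        for c in labels
    ])
    evidence = joint.sum(axis=0)
    decisions = []
    for j in range(len(x)):
        if evidence[j] < min_density:
            decisions.append(None)  # outside the known classes
            continue
        posterior = joint[:, j] / evidence[j]
        best = int(np.argmax(posterior))
        decisions.append(labels[best] if posterior[best] >= min_posterior else None)
    return decisions
```

In this sketch the two thresholds play the role of the rejection cost: raising `min_posterior` rejects more ambiguous samples, while raising `min_density` rejects more outliers.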

7.
Gene expression data sets hold the promise of providing cancer diagnosis on the molecular level. However, using all the gene profiles for diagnosis may be suboptimal. Detection of the molecular signatures not only reduces the number of genes needed for discrimination purposes, but may elucidate the roles they play in the biological processes. Therefore, a central part of diagnosis is to detect a small set of tumor biomarkers which can be used for accurate multiclass cancer classification. This task calls for effective multiclass classifiers with a built-in biomarker selection mechanism. We propose the sparse optimal scoring (SOS) method for multiclass cancer characterization. SOS is a simple prototype classifier based on linear discriminant analysis, in which predictive biomarkers can be automatically determined together with accurate classification. Thus, SOS differentiates itself from many other commonly used classifiers, where gene preselection must be applied before classification. We obtain satisfactory performance when applying SOS to several public data sets.

8.
A novel approach for CE data analysis based on pattern recognition techniques in the wavelet domain is presented. Low-resolution, denoised electropherograms are obtained by applying several preprocessing algorithms including denoising, baseline correction, and detection of the region of interest in the wavelet domain. The resultant signals are mapped into character sequences using first derivative information and multilevel peak height quantization. Next, a local alignment algorithm is applied on the coded sequences for peak pattern recognition. We also propose 2-D and 3-D representations of the found patterns for fast visual evaluation of the variability of chemical substance concentrations in the analyzed samples. The proposed approach is tested on the analysis of intracerebral microdialysate data obtained by CE and LIF detection, achieving a correct detection rate of about 85% with a processing time of less than 0.3 s per 25,000-point electropherogram. Using a local alignment algorithm on low-resolution denoised electropherograms might have a great impact on high-throughput CE, since the proposed methodology substitutes fast, automatic pattern recognition analysis for slow, time-consuming, human-based visual pattern recognition.
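
The two steps named in this abstract, coding peaks as characters and locally aligning the coded sequences, can be sketched as follows. This is an illustration under stated assumptions, not the published method: `encode_peaks` is a hypothetical, simplified coding scheme (sign change of the first difference plus a three-level height quantization, on a baseline-corrected signal with a positive maximum), and the alignment is the standard Smith-Waterman algorithm with arbitrary scoring values.

```python
def encode_peaks(signal, n_levels=3):
    """Map each local maximum to a letter by quantised height ('A' = lowest)."""
    top = max(signal)
    codes = []
    for i in range(1, len(signal) - 1):
        rising = signal[i] - signal[i - 1] > 0    # first difference before the point
        falling = signal[i + 1] - signal[i] <= 0  # first difference after the point
        if rising and falling:
            level = min(int(signal[i] / top * n_levels), n_levels - 1)
            codes.append(chr(ord("A") + level))
    return "".join(codes)


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

Because the alignment runs on short character strings rather than on the raw 25,000-point traces, its quadratic cost stays small, which is consistent with the fast processing time the abstract reports.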

9.
DNA microarray data has been widely used in cancer research because it has helped to distinguish successfully between tumor classes. However, typical gene expression data is high-dimensional and imbalanced, which poses a severe challenge for traditional machine learning methods in constructing a robust classifier that performs well on both the minority and majority classes. As one of the most successful feature weighting techniques, Relief is considered particularly well suited to handling high-dimensional problems. Unfortunately, almost no Relief-based methods take the imbalanced class distribution into account. This study shows that existing Relief-based algorithms may underestimate the features that discriminate minority classes and ignore the distribution characteristics of minority class samples. As a result, an additional bias towards classification into the majority classes can be introduced. To this end, a new method, named imRelief, is proposed for efficiently handling high-dimensional imbalanced gene expression data. imRelief can correct the bias towards the majority classes and considers the scattered distribution of minority class samples when estimating feature weights. In this way, imRelief is able to reward the features that perform well at separating the minority classes from other classes. Experiments on four microarray gene expression data sets demonstrate the effectiveness of imRelief in both feature weighting and feature subset selection applications.
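
For readers unfamiliar with the baseline that imRelief modifies, here is a minimal sketch of the classic two-class Relief weighting. It is not imRelief and contains none of its imbalance corrections; the function name, the Manhattan distance and the min-max scaling are illustrative assumptions, and each class is assumed to contain at least two samples.

```python
import numpy as np


def relief_weights(X, y, n_iter=None, seed=0):
    """Classic Relief: reward features that differ at the nearest miss
    and penalise features that differ at the nearest hit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                      # avoid division by zero
    Xs = (X - X.min(axis=0)) / span            # scale every feature to [0, 1]
    rng = np.random.default_rng(seed)
    picks = rng.permutation(n)[: (n_iter or n)]
    w = np.zeros(d)
    for i in picks:
        dist = np.abs(Xs - Xs[i]).sum(axis=1)  # Manhattan distance to sample i
        dist[i] = np.inf                       # never pick the sample itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class sample
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class sample
        w += np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])
    return w / len(picks)
```

Every sampled instance contributes equally here, so with uniform sampling most updates come from majority-class instances. That is the kind of bias the abstract says imRelief is designed to correct.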

10.
The classification of cancer is a major research topic in bioinformatics. The high dimensionality and small sample size associated with gene expression data, however, make the classification quite challenging. Although principal component analysis (PCA) is of particular interest for high-dimensional data, it may overemphasize some aspects and ignore other important information contained in the richly complex data, because it displays only the difference in the first two- or three-dimensional PC subsp...

11.
Li-Juan Tang  Hai-Long Wu 《Talanta》2009,79(2):260-1694
One problem with discriminant analysis of microarray data is representation of each sample by a large number of genes that are possibly irrelevant, insignificant or redundant. Methods of variable selection are, therefore, of great significance in microarray data analysis. To circumvent the problem, a new gene mining approach is proposed based on the similarity between probability density functions on each gene for the class of interest with respect to the others. This method allows the ascertainment of significant genes that are informative for discriminating each individual class rather than maximizing the separability of all classes. Then one can select genes containing important information about the particular subtypes of diseases. Based on the mined significant genes for individual classes, a support vector machine with local kernel transform is constructed for the classification of different diseases. The combination of the gene mining approach with support vector machine is demonstrated for cancer classification using two public data sets. The results reveal that significant genes are identified for each cancer, and the classification model shows satisfactory performance in training and prediction for both data sets.

12.
Class prediction based on DNA microarray data has emerged as one of the most important applications of bioinformatics for diagnostics and prognostics. Robust classifiers are needed that use the most biologically relevant genes embedded in the data. A consensus approach that combines multiple classifiers has attributes that mitigate this difficulty compared to a single classifier. A new classification method, named consensus analysis of multiple classifiers using non-repetitive variables (CAMCUN), was proposed for the analysis of hyper-dimensional gene expression data. The CAMCUN method combined multiple classifiers, each of which was built from distinct, non-repeated genes that were selected for effectiveness in class differentiation. Thus, CAMCUN utilized the most biologically relevant genes in the final classifier. The CAMCUN algorithm was demonstrated to give consistently more accurate predictions for two well-known datasets for prostate cancer and leukemia. Importantly, the CAMCUN algorithm employed an integrated 10-fold cross-validation and randomization test to assess the degree of confidence of the predictions for unknown samples.

13.
Many microarray experiments involve examining the time elapsed prior to the occurrence of a specific event. One purpose of these studies is to relate the gene expressions to the survival times. The Cox proportional hazards model has been the major tool for analyzing such data. The transformation model provides a viable alternative to the classical Cox model. We investigate the use of transformation models for microarray survival data in this paper. The transformation model, which can be viewed as a generalization of the proportional hazards model and the proportional odds model, is more robust than the proportional hazards model, because it is not susceptible to erroneous results when the assumption of proportional hazards is violated. We analyze a gene expression dataset from Beer et al. [Beer, D.G., Kardia, S.L., Huang, C.C., Giordano, T.J., Levin, A.M., Misek, D.E., Lin, L., Chen, G., Gharib, T.G., Thomas, D.G., Lizyness, M.L., Kuick, R., Hayasaka, S., Taylor, J.M., Iannettoni, M.D., Orringer, M.B., Hanash, S., 2002. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat. Med. 8 (8), 816-824] and show that the transformation model provides higher prediction precision than the proportional hazards model.
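
The sense in which a transformation model generalizes both the proportional hazards and the proportional odds models can be written out briefly. The notation below ($H$, $Z$, $\beta$, $\varepsilon$) is the usual one for linear transformation models and is an assumption here, since the abstract itself gives no formulas.

```latex
% Linear transformation model: H is an unknown increasing function,
% Z the covariate vector (e.g. gene expressions), beta the coefficients,
% and epsilon an error term with a known distribution.
H(T) = -\beta^{\top} Z + \varepsilon

% Survival function implied by the model:
S(t \mid Z) = P\bigl(\varepsilon > H(t) + \beta^{\top} Z\bigr)

% Extreme-value error, P(\varepsilon > x) = \exp(-e^{x}):
S(t \mid Z) = \exp\bigl(-e^{H(t)}\, e^{\beta^{\top} Z}\bigr)
\quad\text{(proportional hazards, with } \Lambda_0(t) = e^{H(t)}\text{)}

% Standard logistic error, P(\varepsilon > x) = 1 / (1 + e^{x}):
\frac{1 - S(t \mid Z)}{S(t \mid Z)} = e^{H(t)}\, e^{\beta^{\top} Z}
\quad\text{(proportional odds)}
```

Choosing a different error distribution gives a different member of the family, which is why the model does not depend on the proportional hazards assumption holding.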

14.
15.
High-throughput DNA microarray provides an effective approach to the monitoring of expression levels of thousands of genes in a sample simultaneously. One promising application of this technology is the molecular diagnostics of cancer, e.g. to distinguish normal tissue from tumor or to classify tumors into different types or subtypes. One problem arising from the use of microarray data is how to analyze the high-dimensional gene expression data, typically with thousands of variables (genes) and much fewer observations (samples). There is a need to develop reliable classification methods to make full use of microarray data and to evaluate accurately the predictive ability and reliability of such derived models. In this paper, discriminant partial least squares was used to classify the different types of human tumors using four microarray datasets and showed good prediction performance. Four different cross-validation procedures (leave-one-out versus leave-half-out; incomplete versus full) were used to evaluate the classification model. Our results indicate that discriminant partial least squares using leave-half-out cross-validation provides a more realistic estimate of the predictive ability of a classification model, which may be overestimated by some of the cross-validation procedures, and the information obtained from different cross-validation procedures can be used to evaluate the reliability of the classification model.
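
The difference between leave-one-out and leave-half-out cross-validation lies only in how the samples are split, which a short sketch makes concrete. The function names, the number of repeats and the `fit_predict` callback are hypothetical; the classifier itself (discriminant PLS in the abstract) is left to the caller.

```python
import random


def leave_one_out(n):
    """Yield (train, test) index lists; each sample is the test set once."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


def leave_half_out(n, n_repeats=10, seed=0):
    """Yield (train, test) index lists; a random half is held out each time."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n))
        rng.shuffle(order)
        half = n // 2
        yield sorted(order[half:]), sorted(order[:half])


def cv_error(splits, fit_predict, X, y):
    """Fraction of held-out samples that are misclassified.

    fit_predict(train_X, train_y, test_X) must return one label per test row.
    """
    wrong = total = 0
    for train, test in splits:
        predicted = fit_predict([X[i] for i in train],
                                [y[i] for i in train],
                                [X[i] for i in test])
        wrong += sum(p != y[i] for p, i in zip(predicted, test))
        total += len(test)
    return wrong / total
```

With leave-half-out, each model is trained on only half of the samples and tested on the other half, so it is a harsher test than leave-one-out. This fits the abstract's finding that it gives a more realistic estimate of predictive ability.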

16.
Motivation: Microarrays have allowed the expression level of thousands of genes or proteins to be measured simultaneously. Data sets generated by these arrays consist of a small number of observations (e.g., 20-100 samples) on a very large number of variables (e.g., 10,000 genes or proteins). The observations in these data sets often have other attributes associated with them such as a class label denoting the pathology of the subject. Finding the genes or proteins that are correlated to these attributes is often a difficult task since most of the variables do not contain information about the pathology and as such can mask the identity of the relevant features. We describe a genetic algorithm (GA) that employs both supervised and unsupervised learning to mine gene expression and proteomic data. The pattern recognition GA selects features that increase clustering, while simultaneously searching for features that optimize the separation of the classes in a plot of the two or three largest principal components of the data. Because the largest principal components capture the bulk of the variance in the data, the features chosen by the GA contain information primarily about differences between classes in the data set. The principal component analysis routine embedded in the fitness function of the GA acts as an information filter, significantly reducing the size of the search space since it restricts the search to feature sets whose principal component plots show clustering on the basis of class. The algorithm integrates aspects of artificial intelligence and evolutionary computations to yield a smart one-pass procedure for feature selection, clustering, classification, and prediction.

17.
A promising pathway to improving the sensitivity of protein microarrays is to immobilize the capture antibodies in a three-dimensional hydrogel matrix. We describe a simple method based on printing an aqueous protein solution containing a photosensitive polymer and the capture antibody onto a plastic chip surface. During a short UV exposure, photocrosslinking occurs, which leads to the formation of a hydrogel that is simultaneously bound to the substrate surface. In the same reaction the antibody becomes covalently attached to the forming hydrogel. As the capture antibodies are immobilized in the three-dimensional hydrogel microstructures, high fluorescence intensities can be obtained. The chip system is designed such that non-specific protein adsorption is strongly suppressed. Thus, the background fluorescence is strongly reduced and very high signal-to-background ratios are obtained (SBR > 6 for c_BSA = 1 pM; SBR > 100 for c_BSA > 100 pM). The kinetics of antigen binding to the arrayed antibodies can be used to determine the concentration of a specific protein (for example the tumor marker β2-microglobulin) in solution for a broad range of analyte concentrations. By varying the size and composition of the protein-filled hydrogel microstructures, as well as adjusting the extent of labeling, it is possible to easily adapt the surface concentration of the probe molecules such that the fluorescence signal intensity is tuned to the prevalence of the protein in the analyte. As a consequence, the signal tuning allows the analysis of solutions that contain proteins at both high (here: upper mg mL⁻¹ range) and very low concentrations (here: lower μg mL⁻¹ range). In this way, quantitative analysis with an exceptionally large dynamic range can be performed.

18.
Cell-based biosensors utilize functional changes in cellular response to identify biological threats in a physiologically relevant manner. Cell-based sensors have been used for a wide array of applications including toxicological assessment and drug screening. In this paper, we utilize DNA arrays to identify differential gene expression events induced by toxin exposure for the purpose of developing a reporter gene assay system compatible with insertion into a cell-based sensor platform. HT29, an intestinal epithelial cell line, was used as a cell model to study cholera toxin (CT)-induced host cell modulation using DNA array analysis. A false positive model was generated from analysis of housekeeping genes in untreated control experiments to characterize our system and to minimize the number of false positives in the data. A threshold probability score (−3.72), which gives <0.02% false positives for up/down regulation from the false positive model, was used to identify 73 and 25 known genes/expressed sequence tags (ESTs) that were up- and down-regulated, respectively, in cells exposed to 23 nM of CT. Using a quantitative multiplex PCR assay, the gene expression levels for several genes shown to be modulated according to the microarray experiments, such as apolipoprotein D (Apol D), E-cadherin, and cyclin A2, were confirmed. The differential expression of genes encoding cytochrome P450, glutathione transferase (GST), and MGAT2 was noteworthy and consistent with previous studies. Our study provides an approach to analyzing cDNA microarray data with defined false positive rates. The utility of cDNA microarray information for the design of cell-based sensors using a reporter gene approach is discussed.

19.
A modified SIMPLe-to-use Interactive Self-modeling Mixture Analysis (SIMPLISMA) algorithm, referred to as real-time (RT) SIMPLISMA, has been combined with two-dimensional (2D) wavelet compression (WC2). This tool was evaluated with datasets of drugs and bacteria that were acquired from two different ion mobility spectrometers and with published reference data that comprised Raman, FTIR microscopy, near-infrared (NIR) and mass spectral data. RTSIMPLISMA is amenable to real-time modeling and is able to determine the number of components automatically. The 2D wavelet compression, which compresses both the acquisition and drift time dimensions of measurement, was applied to the datasets prior to RTSIMPLISMA modeling. RTSIMPLISMA models obtained from the compressed data were wavelet transformed back to the uncompressed representation. The effects of wavelet filter types and compression levels were investigated. The relative root-mean-square errors (RRMSE) of reconstruction, which measure the relative difference between the models extracted with and without 2D compression, were used to evaluate the effects of compression on self-modeling. The results showed that satisfactory models could be obtained when a dataset was compressed to 1/256 of its size.
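
A compression to 1/256 of the original size corresponds to four levels of 2D wavelet decomposition in which only the approximation coefficients are kept, since each level halves both dimensions (4^4 = 256). The sketch below illustrates this with the Haar filter only. The abstract compares several filter types without naming them here, so Haar, the function names and the simple `rrmse` helper are assumptions chosen for brevity.

```python
import numpy as np


def haar_approx_2d(data, levels):
    """Keep only the 2-D Haar approximation coefficients after `levels` steps.

    Each step replaces every 2x2 block by its orthonormal Haar average,
    i.e. the block sum divided by 2, so the array shrinks by a factor of 4.
    """
    a = np.asarray(data, dtype=float)
    for _ in range(levels):
        if a.shape[0] % 2 or a.shape[1] % 2:
            raise ValueError("both dimensions must be divisible by 2**levels")
        a = (a[0::2, 0::2] + a[0::2, 1::2] + a[1::2, 0::2] + a[1::2, 1::2]) / 2.0
    return a


def haar_reconstruct_2d(approx, levels):
    """Invert haar_approx_2d, assuming all detail coefficients are zero."""
    a = np.asarray(approx, dtype=float)
    for _ in range(levels):
        a = np.kron(a, np.ones((2, 2))) / 2.0
    return a


def rrmse(reference, estimate):
    """Relative root-mean-square error between two arrays."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return np.sqrt(((reference - estimate) ** 2).sum() / (reference ** 2).sum())
```

In the workflow the abstract describes, the self-modeling step would run on the small array and the resulting models would then be expanded back to full resolution. The helpers above cover only the compression and expansion.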

20.
A DNA microarray can track the expression levels of thousands of genes simultaneously. Previous research has demonstrated that this technology can be useful in the classification of cancers. Cancer microarray data normally contains a small number of samples, each with a large number of gene expression levels as features. Selecting the relevant genes involved in different types of cancer remains a challenge. In order to extract useful gene information from cancer microarray data and reduce dimensionality, feature selection algorithms were systematically investigated in this study. Using a correlation-based feature selector combined with machine learning algorithms such as decision trees, naive Bayes and support vector machines, we show that classification performance at least as good as published results can be obtained on acute leukemia and diffuse large B-cell lymphoma microarray data sets. We also demonstrate that a combined use of different classification and feature selection approaches makes it possible to select relevant genes with high confidence. This is also the first paper which discusses both computational and biological evidence for the involvement of zyxin in leukaemogenesis.
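
A correlation-based feature selector scores a subset of features by how strongly they correlate with the class and how weakly they correlate with each other. The sketch below uses the usual merit formula, k·r_cf / sqrt(k + k(k−1)·r_ff), with a greedy forward search. It is an illustration rather than the study's implementation: absolute Pearson correlation is an assumed stand-in for whatever correlation measure the authors used, and the function names are hypothetical.

```python
import numpy as np


def cfs_merit(X, y, subset):
    """Merit of a feature subset: high class correlation, low redundancy."""
    k = len(subset)
    # Mean absolute feature-class correlation.
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    # Mean absolute feature-feature correlation over all pairs.
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(X, y, max_features=10):
    """Greedy forward selection: add the feature that most improves the merit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        merit, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if merit <= best:
            break  # no candidate improves the subset
        selected.append(j)
        remaining.remove(j)
        best = merit
    return selected, best
```

The selected columns would then be passed to a classifier such as a decision tree, naive Bayes or a support vector machine, as in the abstract.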
