Similar Articles (20 results)
1.
Generalized canonical correlation analysis is a versatile technique that allows the joint analysis of several sets of data matrices. The generalized canonical correlation analysis solution can be obtained through an eigenequation, and distributional assumptions are not required. When dealing with multiple-set data, it frequently happens that some values are missing. In this paper, two new methods for dealing with missing values in generalized canonical correlation analysis are introduced. The first approach, which does not require iterations, is a generalization of the Test Equating method available for principal component analysis. In the second approach, missing values are imputed in such a way that the generalized canonical correlation analysis objective function does not increase in subsequent steps. Convergence is achieved when the value of the objective function remains constant. By means of a simulation study, we assess the performance of the new methods. We compare the results with those of two available methods: the missing-data passive method, introduced in Gifi's homogeneity analysis framework, and the GENCOM algorithm developed by Green and Carroll. An application to World Bank data illustrates the proposed methods.
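The second, iterative approach can be sketched in miniature: impute, refit a low-rank model, re-impute from the fit, and stop once the objective no longer decreases. The sketch below illustrates the idea on an ordinary PCA-style least-squares objective with NumPy; this is a hypothetical simplification, not the paper's actual algorithm, which operates on the generalized canonical correlation objective across several data sets.

```python
import numpy as np

def iterative_impute(X, rank=1, tol=1e-12, max_iter=500):
    """Fill missing cells, fit a low-rank (PCA-like) model, re-fill the
    missing cells from the fit, and repeat until the least-squares
    objective on the observed cells stops decreasing."""
    X = X.astype(float).copy()
    mask = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[mask] = col_means[np.where(mask)[1]]        # start from column means
    prev_obj = np.inf
    for _ in range(max_iter):
        mu = X.mean(axis=0)
        U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mu
        obj = np.sum((X[~mask] - approx[~mask]) ** 2)  # fit on observed cells
        X[mask] = approx[mask]                          # refresh the imputations
        if prev_obj - obj < tol:
            break
        prev_obj = obj
    return X
```

On noiseless low-rank data this fixed-point iteration recovers the missing entries; the observed cells are never altered.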

2.
An approach to dealing with missing data, both during the design and the normal operation of a neuro-fuzzy classifier, is presented in this paper. Missing values are processed within a general fuzzy min–max neural network architecture utilising hyperbox fuzzy sets as input data cluster prototypes. An emphasis is put on ways of quantifying the uncertainty which missing data might have caused. This takes the form of a classification procedure whose primary objective is the reduction of the number of viable alternatives rather than attempting to produce one winning class without supporting evidence. If required, ways of selecting the most probable class among the viable alternatives found during the primary classification step, based on utilising data frequency information, are also proposed. The reliability of the classification and the completeness of information are communicated by producing upper and lower classification membership values, similar in essence to the plausibility and belief measures found in the theory of evidence, or the possibility and necessity values found in fuzzy set theory. Similarities and differences between the proposed method and various fuzzy, neuro-fuzzy and probabilistic algorithms are also discussed. A number of simulation results for well-known data sets are provided in order to illustrate the properties and performance of the proposed approach.
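The upper/lower membership idea can be sketched for a single hyperbox: a missing feature contributes degree 1 in the optimistic (upper) bound and degree 0 in the pessimistic (lower) bound. This is a generic sketch of a fuzzy min–max membership with missing inputs, assuming a simple linear-decay membership function, not the paper's exact formulation.

```python
import numpy as np

def hyperbox_membership(x, v, w, gamma=1.0):
    """Per-dimension fuzzy min-max membership of x in hyperbox [v, w],
    treating NaN features as unknown: the upper bound assumes a missing
    value falls inside the box (degree 1), the lower bound assumes the
    worst case (degree 0). Returns (lower, upper) membership."""
    per_dim = 1.0 - gamma * np.maximum(0.0, np.maximum(x - w, v - x))
    per_dim = np.clip(per_dim, 0.0, 1.0)   # NaN dims stay NaN here
    missing = np.isnan(x)
    upper = np.where(missing, 1.0, per_dim).min()
    lower = np.where(missing, 0.0, per_dim).min()
    return lower, upper
```

A fully observed point inside the box gets lower = upper = 1; the gap between the two values grows with the number of missing features.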

3.
Exploring incomplete data using visualization techniques
Visualization of incomplete data makes it possible to explore the data and the structure of the missing values simultaneously. This is helpful for learning about the distribution of the incomplete information in the data, and for identifying possible structures of the missing values and their relation to the available information. The main goal of this contribution is to stress the importance of exploring missing values using visualization methods and to present a collection of such visualization techniques for incomplete data, all of which are implemented in the R package VIM. Because this functionality is provided for a widely used statistical environment, visualization of missing values, imputation and data analysis can all be done from within R without the need for additional software.
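A first step such tools take is tabulating the row-wise missingness patterns (which combinations of variables are jointly missing, and how often). A minimal Python sketch of that tabulation, in the spirit of VIM's aggregation plot but not using the R package itself:

```python
import numpy as np
from collections import Counter

def missing_patterns(X):
    """Tabulate the unique row-wise missingness patterns (True = missing)
    and their frequencies, most frequent first. The resulting table is
    what a VIM-style aggregation plot visualizes."""
    M = np.isnan(np.asarray(X, dtype=float))
    patterns = Counter(tuple(row) for row in M)
    return sorted(patterns.items(), key=lambda kv: -kv[1])
```

The boolean matrix `M` itself can be rendered with any heatmap function to get the familiar missingness-matrix view.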

4.
The available methods to handle missing values in principal component analysis only provide point estimates of the parameters (axes and components) and estimates of the missing values. To take into account the variability due to missing values, a multiple imputation method is proposed. First, a method to generate multiple imputed data sets from a principal component analysis model is defined. Then, two ways to visualize the uncertainty due to missing values on the principal component analysis results are described. The first consists in projecting the imputed data sets onto a reference configuration as supplementary elements to assess the stability of the individuals (respectively, of the variables). The second consists in performing a principal component analysis on each imputed data set and fitting each obtained configuration onto the reference one with a Procrustes rotation. The latter strategy makes it possible to assess the variability of the principal component analysis parameters induced by the missing values. The methodology is then evaluated on a real data set.
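The second strategy hinges on orthogonal Procrustes rotation: each imputed data set's PCA configuration is rotated to best match the reference configuration before the spread of the configurations is inspected. A minimal sketch of that rotation step (the standard SVD solution, not code from the paper):

```python
import numpy as np

def procrustes_rotation(ref, conf):
    """Orthogonal Procrustes: find the rotation R minimizing
    ||conf @ R - ref|| in the least-squares sense and return the
    rotated configuration, as used to overlay each imputed data set's
    PCA configuration onto the reference one."""
    U, _, Vt = np.linalg.svd(ref.T @ conf)
    R = Vt.T @ U.T
    return conf @ R
```

After this alignment, the per-point scatter across imputed configurations directly visualizes the uncertainty due to the missing values.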

5.
Cluster analysis, the determination of natural subgroups in a data set, is an important statistical methodology used in many contexts. A major problem with the hierarchical clustering methods used today is the tendency for classification errors to occur when the empirical data depart from the ideal conditions of compact, isolated clusters. Many empirical data sets have structural imperfections that confound the identification of clusters. We use a Self-Organizing Map (SOM) neural network clustering methodology and demonstrate that it is superior to the hierarchical clustering methods. The performance of the neural network and seven hierarchical clustering methods is tested on 252 data sets with various levels of imperfection, including data dispersion, outliers, irrelevant variables, and nonuniform cluster densities. The superior accuracy and robustness of the neural network can improve the effectiveness of decisions and research based on clustering messy empirical data.
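The SOM idea in miniature: each training step moves the best-matching unit (and, early in training, its grid neighbours) toward a randomly drawn sample, with the learning rate and neighbourhood radius shrinking over time. The sketch below is a generic 1-D SOM for illustration, not the study's configuration.

```python
import numpy as np

def train_som(data, n_units=4, n_iter=500, lr=0.5, seed=0):
    """Minimal 1-D self-organizing map trained by stochastic updates:
    the best-matching unit and its grid neighbours move toward each
    sample, with decaying learning rate and neighbourhood radius."""
    rng = np.random.default_rng(seed)
    W = data[rng.integers(0, len(data), n_units)].astype(float)
    for t in range(n_iter):
        x = data[rng.integers(len(data))]
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))   # best matching unit
        sigma = max(1.0 * (1 - t / n_iter), 1e-3)     # shrinking radius
        dist = np.abs(np.arange(n_units) - bmu)       # distance on the grid
        h = np.exp(-(dist ** 2) / (2 * sigma ** 2))   # neighbourhood weights
        W += (lr * (1 - t / n_iter)) * h[:, None] * (x - W)
    return W

def assign_clusters(data, W):
    """Label each point by its nearest SOM unit."""
    return np.argmin(((data[:, None, :] - W[None]) ** 2).sum(-1), axis=1)
```

On well-separated groups, the trained units settle near the group centres, so nearest-unit assignment recovers the clusters.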

6.
Since real-world data are often skewed, a mode regression model for skew-normal data is constructed. Because missing data also occur frequently, imputation methods are used to handle the incomplete data set; to compare imputation performance, statistical inference is studied for the case where the response variable is missing at random. Maximum likelihood estimates of the mode regression model parameters are obtained via Gauss-Newton iteration, and the imputation performance of the model is compared under three schemes: mean imputation, regression imputation, and mode imputation. Simulation studies and a real-data analysis …
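Two of the three imputation schemes compared above can be sketched generically: fill missing responses with the observed mean or with the observed mode (the third, regression imputation, additionally needs covariates and a fitted model). A plain NumPy sketch, not the paper's skew-normal machinery:

```python
import numpy as np

def impute_missing(y, how="mean"):
    """Single imputation of missing (NaN) responses by the observed
    sample mean or the observed sample mode (most frequent value)."""
    y = np.asarray(y, dtype=float)
    obs = y[~np.isnan(y)]
    if how == "mean":
        fill = obs.mean()
    else:  # mode
        vals, counts = np.unique(obs, return_counts=True)
        fill = vals[np.argmax(counts)]
    out = y.copy()
    out[np.isnan(out)] = fill
    return out
```

For skewed data the mean and the mode can differ substantially, which is exactly why the choice of fill value matters for the downstream regression.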

7.
A GMDH-type neural network was used to predict liquid-phase equilibrium data for the (water + ethanol + trans-decalin) ternary system in the temperature range 300.2–315.2 K. For modeling, the data set was divided into two parts: 70% were used for training and 30% as a test set. The predicted values were compared with the experimental values in order to evaluate the performance of the GMDH neural network method. The results obtained with the GMDH-type neural network are in excellent agreement with the experimental results.

8.
The 2004 Basel II Accord has pointed out the benefits of credit risk management through internal models using internal data to estimate risk components: probability of default (PD), loss given default, exposure at default and maturity. Internal data are the primary data source for PD estimates; banks are permitted to use statistical default prediction models to estimate the borrowers' PD, subject to some requirements concerning accuracy, completeness and appropriateness of data. However, in practice, internal records are usually incomplete or do not contain adequate history to estimate the PD. Missing data are especially critical for low-default portfolios, characterised by inadequate default records, making it difficult to design statistically significant prediction models. Several methods might be used to deal with missing data, such as list-wise deletion, application-specific list-wise deletion, substitution techniques or imputation models (simple and multiple variants). List-wise deletion is an easy-to-use method widely applied by social scientists, but it loses substantial data and reduces the diversity of information, resulting in a bias in the model's parameters, results and inferences. The choice of the best method to solve the missing data problem largely depends on the nature of the missing values (MCAR, MAR and MNAR processes), but there is a lack of empirical analysis of their effect on credit risk, which limits the validity of the resulting models. In this paper, we analyse the nature and effects of missing data in credit risk modelling (MCAR, MAR and MNAR processes), using a scarce data set on consumer borrowers that includes different percentages and distributions of missing data.
The findings are used to analyse the performance of several methods for dealing with missing data, such as list-wise deletion, simple imputation methods, MLE models and advanced multiple imputation (MI) alternatives based on Markov chain Monte Carlo and re-sampling methods. Results are evaluated and compared across models in terms of robustness, accuracy and complexity. In particular, MI models are found to provide very valuable solutions with regard to missing data in credit risk.
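The two simplest options compared above, list-wise deletion and single mean imputation, can be sketched in a few lines; the contrast makes the deletion method's information loss concrete. A generic NumPy sketch, not the paper's evaluation code:

```python
import numpy as np

def listwise_delete(X, y):
    """Drop every row with at least one missing predictor -- the easy
    but wasteful option that shrinks the sample."""
    keep = ~np.isnan(X).any(axis=1)
    return X[keep], y[keep]

def mean_impute(X):
    """Replace each missing cell by its column mean -- a simple single
    imputation; MI schemes would instead draw several plausible values
    and propagate the between-imputation variability."""
    X = X.copy()
    means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = means[cols]
    return X
```

Deletion keeps only complete rows; imputation keeps the full sample size at the cost of understating uncertainty.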

9.
The paper describes a methodology for selecting the most relevant clinical features and for generating decision rules based on selected attributes from a medical data set with missing values. These rules will help emergency room (ER) medical personnel in triage (initial assessment) of children with abdominal pain. The presented approach is based on rough set theory, extended with the ability to handle missing values and with fuzzy measures allowing estimation of the value of the information brought by particular attributes. The proposed methodology was applied to a data set containing records of patients with abdominal pain, collected in the emergency room of the cooperating hospital. The generated rules will be embedded in a computer decision support system to be used in the emergency room. A system based on the results of the presented approach should improve triage accuracy by the emergency room staff and reduce management costs.

10.
11.
A novel interval set approach is proposed in this paper to induce classification rules from an incomplete information table, in which an interval-set-based model to represent uncertain concepts is presented. The extensions of the concepts in an incomplete information table are represented by interval sets, which regulate the upper and lower bounds of the uncertain concepts. Interval set operations are discussed, and the connectives of concepts are represented by operations on interval sets. Certain inclusion, possible inclusion, and weak inclusion relations between interval sets are presented and used to induce strong rules and weak rules from an incomplete information table. The related properties of the inclusion relations are proved. It is concluded that the strong rules are always true whatever the missing values may be, while the weak rules may be true when missing values are replaced by certain known values. Moreover, a confidence function is defined to evaluate the weak rules. The proposed approach presents a new view on rule induction from incomplete data based on interval sets.
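The core construction can be sketched: for a concept "attribute = value" in a table with missing entries, the lower bound of its interval-set extension holds the objects that certainly match, and the upper bound adds those whose missing entry might match. A generic sketch of that extension (the strong/weak rule induction built on top of it is the paper's contribution, not shown here):

```python
def interval_extension(table, attribute, value, missing="*"):
    """Interval-set extension of the concept `attribute = value` in an
    incomplete information table: `lower` is the set of objects that
    certainly match; `upper` additionally includes objects whose entry
    is missing and so might match under some completion."""
    lower = {obj for obj, row in table.items() if row[attribute] == value}
    upper = lower | {obj for obj, row in table.items()
                     if row[attribute] == missing}
    return lower, upper
```

Inclusions that hold for the upper bound hold under every completion of the missing values, which is the sense in which strong rules are "always true".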

12.
A. Celler  J. Qranfal  M.R. Trummer 《PAMM》2007,7(1):2020111-2020112
This paper presents an algorithm for reconstructing a dynamic SPECT image. Projections of a time-dependent activity are acquired in discrete time frames. This leads to a highly underdetermined set of equations for the images corresponding to each time frame. We reconstruct these images using a Kalman filter algorithm, which helps in processing the time-varying information. To obtain meaningful results, a positivity constraint must be enforced. Due to the ill-posedness of the problem, regularization is required. We compared Tikhonov and total bounded variation regularization schemes, and found the latter to be more effective, producing superior results. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
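The positivity-constrained Kalman step can be sketched as a standard measurement update followed by projection onto the nonnegative orthant, since activity values cannot be negative. This is a textbook sketch of the idea, not the paper's full regularized reconstruction algorithm:

```python
import numpy as np

def kalman_update_positive(x, P, H, R, z):
    """One Kalman measurement update (state x, covariance P, observation
    model H, measurement noise R, measurement z) followed by clipping
    the state to the positive orthant, a simple way to enforce the
    positivity required of activity images."""
    S = H @ P @ H.T + R               # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)    # Kalman gain
    x = x + K @ (z - H @ x)           # state update
    P = (np.eye(len(x)) - K @ H) @ P  # covariance update
    return np.maximum(x, 0.0), P      # enforce nonnegativity
```

Clipping is the crudest projection; more careful constrained filters reproject with respect to the covariance, but the clipped update already prevents negative activities.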

13.
Microbiome big data plays an important role in research on the ecological environment, human health, and disease. Extracting useful information from high-dimensional, complex data through mathematical and statistical data-mining methods is a key problem in the modeling and analysis of microbiome big data. This paper analyzes the characteristics of microbiome big data, discusses current hot topics and difficulties in data analysis and computational research, and reviews the state of research on pattern mining and on network reconstruction and analysis for microbiome big data.

14.
Microbiome big data plays an important role in research on the ecological environment, human health, and disease. Extracting useful information from high-dimensional, complex data through mathematical and statistical data-mining methods is a key problem in the modeling and analysis of microbiome big data. This paper analyzes the characteristics of microbiome big data, discusses current hot topics and difficulties in data analysis and computational research, and reviews the state of research on pattern mining and on network reconstruction and analysis for microbiome big data.

15.
A hierarchical model is developed for the joint mortality analysis of pension scheme datasets. The proposed model allows for a rigorous statistical treatment of missing data. While our approach works for any missing-data pattern, we are particularly interested in a scenario where some covariates are observed for members of one pension scheme but not the other. Therefore, our approach allows for the joint modelling of datasets which contain different information about individual lives. The proposed model generalizes the specification of parametric models when accounting for covariates. We consider parameter uncertainty using Bayesian techniques. Model parametrization is analysed in order to obtain an efficient MCMC sampler, and model selection is addressed. The inferential framework described here accommodates any missing-data pattern, and turns out to be useful for analysing statistical relationships among covariates. Finally, we assess the financial impact of using the covariates, and of the optimal use of the whole available sample when combining data from different mortality experiences.

16.
This paper investigates the use of neural network combining methods to improve the time series forecasting performance of the traditional single keep-the-best (KTB) model. The ensemble methods are applied to the difficult problem of exchange rate forecasting. Two general approaches to combining neural networks are proposed and examined in predicting the exchange rate between the British pound and the US dollar. Specifically, we propose to use systematic and serial partitioning methods to build neural network ensembles for time series forecasting. It is found that the basic ensemble approach, created with non-varying network architectures trained using different initial random weights, is not effective in improving the accuracy of prediction, while ensemble models consisting of different neural network structures can consistently outperform predictions of the single 'best' network. Results also show that neural ensembles based on different partitions of the data are more effective than those developed with the full training data in out-of-sample forecasting. Moreover, reducing the correlation among forecasts made by the ensemble members by utilizing data partitioning techniques is the key to the success of the neural ensemble models. Although our ensemble methods show considerable advantages over the traditional KTB approach, they do not show significant improvement over the widely used random walk model in exchange rate forecasting.
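Serial partitioning can be sketched generically: split the training series into contiguous chunks, fit one model per chunk, and average the members' one-step-ahead forecasts. The sketch below uses linear autoregressive members as hypothetical stand-ins for the neural networks, purely to show the partition-then-average structure:

```python
import numpy as np

def partition_ensemble_forecast(series, window=3, n_members=3):
    """Serial-partitioning ensemble sketch: build lagged (window -> next
    value) training pairs, split them into contiguous chunks, fit one
    linear AR model per chunk, and average the members' one-step-ahead
    forecasts from the last observed window."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = np.array(series[window:])
    chunks = np.array_split(np.arange(len(y)), n_members)
    coefs = [np.linalg.lstsq(X[c], y[c], rcond=None)[0] for c in chunks]
    last = np.array(series[-window:])
    return float(np.mean([last @ w for w in coefs]))
```

Training each member on a different chunk is what decorrelates the members' errors, which the paper identifies as the key to the ensemble's gain.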

17.
We present methods for predicting the solution of time-dependent partial differential equations when that solution is so complex that it cannot be properly resolved numerically, but when prior statistical information can be found. The sparse numerical data are viewed as constraints on the solution, and the gist of our proposal is a set of methods for advancing the constraints in time so that regression methods can be used to reconstruct the mean future. For linear equations we offer general recipes for advancing the constraints; the methods are generalized to certain classes of nonlinear problems, and the conditions under which strongly nonlinear problems and partial statistical information can be handled are briefly discussed. Our methods are related to certain data acquisition schemes in oceanography and meteorology. © John Wiley & Sons, Inc.

18.
Supplier selection and evaluation is a complicated and disputed issue in supply chain network management, by virtue of the variety of intellectual property of the suppliers, the several variables involved in the supply-demand relationship, the complex interactions, and the inadequate information about suppliers. The recent literature confirms that neural networks achieve better performance than conventional methods in this area. Hence, in this paper, an effective artificial intelligence (AI) approach is presented to improve decision making for a supply chain, successfully utilized for long-term prediction of performance data in the cosmetics industry. A computationally efficient model known as locally linear neuro-fuzzy (LLNF) is introduced to predict the performance rating of suppliers. The proposed model is trained by a locally linear model tree (LOLIMOT) learning algorithm. To demonstrate the performance of the proposed model, three intelligent techniques are considered: a multi-layer perceptron (MLP) neural network, a radial basis function (RBF) neural network and a least-squares support vector machine (LS-SVM). Their results are compared using an available dataset from the cosmetics industry. The computational results show that the presented model performs better than the three foregoing techniques.

19.
An expert system was desired for a group decision-making process. A highly variable data set from previous groups' decisions was available to simulate past group decisions. This data set has much missing information and contains many possible errors. Classification and regression trees (CART) was selected for rule induction, and compared with multiple linear regression and discriminant analysis. We conclude that CART's decision rules can be used for rule induction. CART uses all available information and can predict observations with missing data. Errors in results from CART compare well with those from multiple linear regression and discriminant analysis. CART results are easier to understand.
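One simple way a tree can still score records with missing data, as highlighted above, is to send an observation with a missing split feature down both branches and blend the branch predictions by the branch proportions. A toy one-split sketch of that idea (CART proper uses surrogate splits, which this simplification does not implement):

```python
def stump_predict(x, split):
    """Predict with a one-split tree. `split` is (feature, threshold,
    left_value, right_value, p_left); an observation whose split
    feature is absent gets the p_left-weighted blend of both branch
    predictions instead of failing."""
    feat, thr, left, right, p_left = split
    v = x.get(feat)
    if v is None:
        return p_left * left + (1 - p_left) * right
    return left if v <= thr else right
```

The blended value degrades gracefully toward the overall branch average as information goes missing, rather than discarding the observation.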

20.
Statistical methods of discrimination and classification are used for the prediction of protein structure from amino acid sequence data. This provides information for the establishment of new paradigms of carcinogenesis modeling on the basis of gene expression. Feed-forward neural networks and standard statistical classification procedures are used to classify proteins into fold classes. Logistic regression, additive models, and projection pursuit regression from the family of methods based on posterior probabilities; linear, quadratic, and flexible discriminant analysis from the class of methods based on class-conditional probabilities; and the nearest-neighbor classification rule are applied to a data set of 268 sequences. From analyzing the prediction error obtained with a test sample (n = 125) and with a cross-validation procedure, we conclude that standard linear discriminant analysis and nearest-neighbor methods are at once statistically feasible and potent competitors to the more flexible tools of feed-forward neural networks. Further research is needed to explore the gain obtainable from statistical methods by application to larger sets of protein sequence data, and to compare the results with those from biophysical approaches.
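The nearest-neighbor rule found competitive above is also the simplest to state: classify each sequence by a majority vote among its k closest training examples in feature space. A generic sketch on toy numeric features, not the study's protein encodings:

```python
import numpy as np

def nearest_neighbor_classify(train_X, train_y, x, k=1):
    """Plain k-nearest-neighbor rule: return the majority label among
    the k training points closest to x in Euclidean distance."""
    d = np.linalg.norm(train_X - x, axis=1)
    idx = np.argsort(d)[:k]
    labels, counts = np.unique(train_y[idx], return_counts=True)
    return labels[np.argmax(counts)]
```

Its appeal in this setting is exactly what the abstract notes: no training phase and no distributional assumptions, yet accuracy competitive with the neural networks.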
