Similar literature
Found 20 similar documents; search took 31 ms
1.
We briefly review the problem of learning probabilities from data using imprecise probability models that express very weak prior beliefs. We then comment on the new contributions to this question made in the paper by Masegosa and Moral, and offer some insights into the performance of their models in classification experiments in data mining.

2.
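Similarity computation for high-dimensional big data is a research focus in data mining. By analyzing the difficulties of computing similarity in high-dimensional big data, this paper proposes using Extenics methods to resolve the contradictory problems involved. Building on a basic-element representation of high-dimensional big data, similarity between data is computed with the help of data transformation, data screening, weight determination, and data preprocessing, and the algorithm is validated on routine water-pollution analysis data. Studying big-data similarity through Extenics ideas not only contributes theoretically to data mining research but also opens a new application area for Extenics.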

3.
With the broad development of the World Wide Web, various kinds of heterogeneous data (including multimedia data) are now available for decision support tasks. A data warehousing approach is often adopted to prepare data for the relevant analysis: data integration and dimensional modeling allow the creation of appropriate analysis contexts. However, existing data warehousing tools are suited to classical, numerical data and cannot handle complex data. In our approach, we adapt the three main phases of the data warehousing process to complex data. In this paper, we focus on two main steps in complex data warehousing. The first step is data integration. We define a generic UML model that helps represent a wide range of complex data, including their possible semantic properties. Complex data are then stored in XML documents generated by a piece of software we designed. The second phase we address is the preparation of data for dimensional modeling. We propose an approach that exploits data mining techniques to assist users in building relevant dimensional models.

4.
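As a new mathematical tool for intelligent data analysis and data mining, rough set theory has the main advantage that it requires no prior or additional knowledge about the data being processed. This paper proposes an intelligent data analysis model based on rough set theory: starting from the target data set, it performs data preprocessing, data classification, and rule extraction to analyze the original data set intelligently. Tests on examples verify the model's effectiveness.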

5.
赵蕾  程国胜 《大学数学》2008,24(2):100-103
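Data analysis plays an important role in computer data processing, and concept lattice theory is a powerful tool for it. Using concept lattices, this paper discusses the compatibility problem caused by data extension. Treating data as objects in a concept lattice, and given a basic data set with fixed data features, it considers the compatibility of data extension, solves the problem of deciding compatibility, and gives the corresponding decision theorems. The aim is to maximize the set of data objects while the features remain fixed.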

6.
Temporal data are information measured in the context of time. This contextual structure provides components that need to be explored to understand the data and that can form the basis of interactions applied to the plots. In multivariate time series, we expect to see temporal dependence, long-term and seasonal trends, and cross-correlations. In longitudinal data, we also expect within-subject and between-subject dependence. Time series and longitudinal data, although analyzed differently, are often plotted using similar displays. We provide a taxonomy of interactions on plots that enable exploration of the temporal components of these data types, and describe how to build these interactions using data transformations. Because temporal data are often accompanied by other types of data, we also describe how to link the temporal plots with other displays of data. The ideas are conceptualized into a data pipeline for temporal data and implemented in the R package cranvas. This package provides many different types of interactive graphics that can be used together to explore data or diagnose a model fit.

7.
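High-quality decisions increasingly depend on high-quality data mining and analysis, which in turn depend on high-quality data. In surveys of large-scale instrument utilization, subjective and objective factors always cause some data to be anomalous, which degrades data quality, so suitable methods are needed to detect and handle the anomalous data. Different types of data often call for different outlier detection methods. This paper analyzes the overall characteristics of, and general methods for, large-instrument utilization survey data. Taking as its main thread the instrument usage hours and shared hours from the 2009 "Survey of the Current Status of Large-Scale Instrument Resources in China", led by the platform center of the Ministry of Science and Technology, it compares the suitability of regression methods, depth-based methods, and box-plot methods for detecting outliers in different types of data. Examining the data from different perspectives, and testing and applying the suitable methods to find suspicious outliers, helps the subsequent analysis and handling of anomalous utilization data, improves data quality, lays a foundation for the comprehensive evaluation of large-instrument utilization, and offers a useful reference for outlier detection when preprocessing science and technology resource survey data.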
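
Of the three detection methods this abstract compares, the box-plot method is the simplest to illustrate. The sketch below is not the paper's procedure: it assumes the common Tukey convention (whiskers at 1.5 times the interquartile range), and the function name and the usage-hour figures are hypothetical.

```python
from statistics import quantiles


def boxplot_outliers(values, k=1.5):
    """Return the values that fall outside the box-plot whiskers.

    Assumption: whiskers sit at Q1 - k*IQR and Q3 + k*IQR with k = 1.5
    (Tukey's convention); the surveyed paper may use a different rule.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical annual usage hours reported for eight instruments.
hours = [1200, 1350, 1280, 1100, 1420, 1310, 9800, 1250]
print(boxplot_outliers(hours))
```

A rule like this only flags suspicious values; whether a flagged value is a reporting error or a genuinely heavily used instrument still has to be decided case by case.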

8.
Statistical software systems include modules for manipulating data sets, model fitting, and graphics. Because plots display data, and models are fit to data, both the model-fitting and graphics modules depend on the data. Today's statistical environments allow the analyst to choose or even build a suitable data structure for storing the data and to implement new kinds of plots. The multiplicity problem caused by many plot varieties and many data representations is avoided by constructing a plot-data interface. The interface is a convention by which plots communicate with data sets, allowing plots to be independent of the actual data representation. This article describes the components of such a plot-data interface. The same strategy may be used to deal with the dependence of model-fitting procedures on data.
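
The idea of a plot-data interface can be sketched in a few lines. Everything below is an illustrative assumption rather than the article's design: the `PlotData` protocol, its two methods, the two storage classes, and the sample values are all hypothetical.

```python
from typing import Protocol, Sequence


class PlotData(Protocol):
    """Hypothetical interface: the only things a plot may ask of a data set."""

    def variables(self) -> Sequence[str]: ...

    def values(self, name: str) -> Sequence[float]: ...


class ColumnStore:
    """Data kept as a dict mapping variable name to a column of values."""

    def __init__(self, columns):
        self._columns = columns

    def variables(self):
        return list(self._columns)

    def values(self, name):
        return list(self._columns[name])


class RowStore:
    """The same kind of data kept as a list of row dicts."""

    def __init__(self, rows):
        self._rows = rows

    def variables(self):
        return list(self._rows[0]) if self._rows else []

    def values(self, name):
        return [row[name] for row in self._rows]


def scatter_axis_limits(data: PlotData, x: str, y: str):
    """Compute scatter-plot axis limits using only the interface."""
    xs, ys = data.values(x), data.values(y)
    return (min(xs), max(xs)), (min(ys), max(ys))


cols = ColumnStore({"dose": [1.0, 2.0, 4.0], "response": [0.3, 0.9, 1.4]})
rows = RowStore([
    {"dose": 1.0, "response": 0.3},
    {"dose": 2.0, "response": 0.9},
    {"dose": 4.0, "response": 1.4},
])
print(scatter_axis_limits(cols, "dose", "response"))
print(scatter_axis_limits(rows, "dose", "response"))
```

Because the plotting function never touches the storage layout, adding a new representation means writing one adapter class, not one plot per representation.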

9.
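Microbiome big data plays an important role in research on the ecological environment, human health, and disease. Extracting useful information from high-dimensional, complex data with mathematical, statistical, and other data mining methods is the key problem in modeling and analyzing microbiome big data. This paper analyzes the characteristics of microbiome big data, discusses the current hot topics and difficulties in data analysis and computation, and reviews current research on pattern mining and on network reconstruction and analysis for microbiome big data.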

10.
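Microbiome big data plays an important role in research on the ecological environment, human health, and disease. Extracting useful information from high-dimensional, complex data with mathematical, statistical, and other data mining methods is the key problem in modeling and analyzing microbiome big data. This paper analyzes the characteristics of microbiome big data, discusses the current hot topics and difficulties in data analysis and computation, and reviews current research on pattern mining and on network reconstruction and analysis for microbiome big data.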

11.
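When an SVM is used for binary classification, training is very slow on large-scale data; data extraction can reduce the number of training samples and speed up training. This paper proposes a new data extraction method based on the Mahalanobis distance and the "aσ-method". The Mahalanobis distance from a sample point to the training set determines the point's position relative to the sample set, and only the points that contribute to constructing the hyperplane are extracted, which avoids the randomness of earlier data extraction methods. The method also takes into account the proportion of extracted data relative to the original sample set: adjusting the value of a controls the amount of data extracted, so that the reduced training set is neither too large nor too small, thereby speeding up SVM training.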

12.
We consider the implications of streaming data for data analysis and data mining. Streaming data are becoming widely available from a variety of sources; here we consider the implications arising from Internet traffic data. Streaming data are unlikely to be time homogeneous, so standard statistical and data mining procedures do not necessarily apply. Because it is essentially impossible to store streaming data, we consider recursive algorithms, algorithms that are adaptive and discount the past, and algorithms that create finite pseudo-samples. We also suggest some evolutionary graphics procedures that are suitable for streaming data. We begin with a discussion of Internet traffic to give the reader some sense of the time and data scale and the visual resolution needed for such problems.
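
A recursive algorithm that discounts the past can be as small as an exponentially weighted mean and variance, updated one observation at a time without storing the stream. This is a generic sketch, not the authors' algorithm: the class name, the fixed forgetting factor `alpha`, and the simulated stream are assumptions.

```python
class DiscountedStats:
    """Recursive mean and variance that exponentially discount the past.

    Assumption: a fixed forgetting factor alpha in (0, 1]; a larger alpha
    forgets old observations faster.
    """

    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.mean = None
        self.var = 0.0

    def update(self, x):
        """Fold one new observation into the running statistics."""
        if self.mean is None:
            # The first observation initializes the mean; variance stays 0.
            self.mean = float(x)
            return self.mean
        delta = x - self.mean
        self.mean += self.alpha * delta
        self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)
        return self.mean


stats = DiscountedStats(alpha=0.1)
# Hypothetical traffic rate with a level shift partway through the stream.
for x in [0.0] * 100 + [10.0] * 200:
    stats.update(x)
print(round(stats.mean, 3), round(stats.var, 3))
```

The state is two numbers regardless of stream length, and after the level shift the estimate moves to the new level instead of averaging over the whole history, which is the behavior wanted when the stream is not time homogeneous.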

13.
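The concept and connotation of data-driven decision support systems   Total citations: 1, self-citations: 0, citations by others: 1
From the perspective of data, this paper discusses the concept and connotation of data-driven decision support systems, and also discusses, to some extent, tools such as data warehousing, online analytical processing (OLAP), and data mining. It further analyzes DSS data and day-to-day operational data, and presents the basic architecture of a data-driven decision support system.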

14.
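Research on data preprocessing for neural-network-based futures forecasting   Total citations: 1, self-citations: 0, citations by others: 1
Futures forecasting research suffers from problems in price-data preprocessing and forecasting methods, such as feeding raw data directly into the model and mismatches between the price forecasting model and the raw-data model, and these need to be resolved. In this study, the time series of copper futures prices is preprocessed by inflation-index adjustment, averaging of the periodic term, and filtering. The futures price data before and after preprocessing are then each fed into a neural network forecasting model, and the two sets of forecasts are compared to verify that preprocessing the raw futures time series is necessary.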

15.
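Fuzzy relations are the foundation of fuzzy rough analysis, and generating fuzzy relations from attribute data is an important problem in practical applications of fuzzy rough sets. For the characterization of fuzzy attributes, this paper gives methods for generating several kinds of T-similarity relations. First, a T-similarity relation is generated for each attribute. These T-similarity relations are then combined by an aggregation operator to obtain an overall T-similarity relation.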
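
The two-step construction described here (one relation per attribute, then aggregation) can be sketched as follows. The specific choices are assumptions made for illustration and are not taken from the paper: the per-attribute relation is the normalized-distance relation, which is transitive for the Lukasiewicz t-norm, the aggregation operator is the elementwise minimum, and the attribute values are hypothetical.

```python
def attribute_similarity(values):
    """Fuzzy relation R(x, y) = 1 - |a(x) - a(y)| / range for one attribute.

    Assumption: numeric attribute values. This relation is reflexive,
    symmetric, and transitive with respect to the Lukasiewicz t-norm
    T_L(a, b) = max(0, a + b - 1).
    """
    spread = (max(values) - min(values)) or 1.0  # avoid dividing by zero
    n = len(values)
    return [[1.0 - abs(values[i] - values[j]) / spread for j in range(n)]
            for i in range(n)]


def aggregate_min(relations):
    """Combine per-attribute relations by the elementwise minimum."""
    n = len(relations[0])
    return [[min(r[i][j] for r in relations) for j in range(n)]
            for i in range(n)]


def is_t_l_transitive(rel, tol=1e-12):
    """Check T_L(R(i, k), R(k, j)) <= R(i, j) for all i, j, k."""
    n = len(rel)
    return all(max(0.0, rel[i][k] + rel[k][j] - 1.0) <= rel[i][j] + tol
               for i in range(n) for j in range(n) for k in range(n))


# Hypothetical decision table: three objects described by two attributes.
temperature = [36.5, 37.0, 39.0]
pulse = [60, 90, 80]
combined = aggregate_min([attribute_similarity(temperature),
                          attribute_similarity(pulse)])
print(is_t_l_transitive(combined))
```

The minimum is used because it preserves T-transitivity for any t-norm T; other aggregation operators need to be checked individually before the combined relation can be called a T-similarity relation.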

16.
For deblurring images corrupted by random-valued noise, two-phase methods first select likely reliables (data that are not corrupted by random-valued noise) and then deblur images using only the selected data. Two-phase methods, however, often cause defective data artifacts, which are a mixture of missing data artifacts caused by the lack of data and noisy data artifacts caused mainly by falsely selected outliers (data that are corrupted by random-valued noise). In this paper, to suppress these defective data artifacts, we propose a blurring-model-based reliable-selection technique that selects as many reliables as possible, so that all to-be-recovered pixel values contribute to the selected data, while excluding outliers as accurately as possible. We also propose a normalization technique to compensate for non-uniform rates in recovering pixel values. We conducted simulation studies on Gaussian and diagonal deblurring to evaluate the performance of the proposed techniques. Simulation results showed that the proposed techniques improved the performance of two-phase methods by suppressing defective data artifacts effectively.

17.
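Cloud computing and big data have become research hotspots in IT, and applying cloud computing's strengths in data storage and data processing to big data is of significant practical value. The open-source cloud platform OpenStack makes it convenient to build a private cloud from the hardware-management side, and its storage module Swift supports PB-scale big data storage. The open-source cloud platform Hadoop is very strong at data processing but falls short in supporting very large data storage. By comparing Swift, the storage module of OpenStack, with HDFS, the file-handling module of Hadoop, this paper proposes combining Swift with Hadoop's MapReduce to build a private cloud computing system for enterprise big data processing. The analysis shows that the scheme is feasible and that such a heterogeneous private cloud can combine the strengths of different cloud platforms to process big data efficiently.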

18.
In this paper, a multi-space data association algorithm based on the wavelet transform is proposed. In addition to carrying out the traditional hard-logic data association in measurement space, the new algorithm updates the state of the target in the pattern space. This significantly reduces the misassociation effects of complicated environments on data association. Simulation results show that, in complicated clutter environments, multi-space data association performs much better than existing data association algorithms such as the nearest-neighbor standard filter (NNSF), probabilistic data association (PDA), and joint probabilistic data association (JPDA). The computational cost of multi-space data association is much lower than that of these existing algorithms, and the new algorithm does not need any prior information about the environment. In complicated clutter environments, the proposed data association is very robust, reliable, and stable compared with the other algorithms.

19.
Diverse reduct subspaces based co-training for partially labeled data   Total citations: 1, self-citations: 0, citations by others: 1
Rough set theory is an effective supervised learning model for labeled data. However, practical problems often involve both labeled and unlabeled data, which is outside the realm of traditional rough set theory. In this paper, the problem of attribute reduction for partially labeled data is studied first. With a new definition of the discernibility matrix, a Markov blanket based heuristic algorithm is put forward to compute the optimal reduct of partially labeled data. A novel rough co-training model is then proposed, which can capitalize on the unlabeled data to improve the performance of a rough classifier learned from only a few labeled data. The model employs two diverse reducts of the partially labeled data to train its base classifiers on the labeled data, and then makes the base classifiers learn from each other on the unlabeled data iteratively. The classifiers constructed in different reduct subspaces can benefit from their diversity on the unlabeled data and significantly improve the performance of the rough co-training model. Finally, the rough co-training model is analyzed theoretically, and an upper bound on its performance improvement is given. The experimental results show that the proposed model outperforms other representative models in terms of accuracy and even compares favorably with a rough classifier trained with all training data labeled.

20.
With advances in computer storage capacity and online observation techniques, more and more data are collected as curves and images. The two most important features of curve and image data are high dimensionality and high correlation between adjacent data points. Functional data analysis is better suited to dealing with such data, which cannot be handled by traditional multivariate statistical methods. Recently, a variety of functional data methods have been developed, including curve alignment, principal component analysis, regression, classification, and clustering. In this paper, we mainly introduce the origins, development, and recent progress of functional data analysis. Specifically, we first introduce the notion of functional data. Second, we present functional principal component analysis. We then introduce estimation, variable selection, and hypothesis testing for functional regression models. Finally, the paper concludes with a brief discussion of future directions.
