Similar Literature
19 similar documents found.
1.
Genome-wide association analysis, which identifies disease-associated loci or genes, is of great significance for preventing and treating genetic disorders. First, a mixed linear model is built with fixed effects (SNP loci) and random effects (population structure and kinship within the sample), and the Benjamini-Hochberg (BH) procedure, based on the false discovery rate (FDR) criterion, corrects the p-values from multiple testing to identify the most likely causal loci. Second, Fisher's p-value combination method pools all the SNPs a gene contains to identify the genes most likely associated with the disease; since a genetic disease may be associated with only a subset of those loci, the existing ARTP model is taken as a starting point and improved. Finally, a multi-phenotype joint model, MultiPhen, is built to identify loci associated with ten traits.
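The two screening steps in this abstract, BH adjustment of multiple-testing p-values and Fisher's combination of per-SNP p-values within a gene, can be sketched in plain Python (an illustrative sketch only; `bh_adjust` and `fisher_combine` are our names, not code from the paper):

```python
import math

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (controls the FDR)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adj = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):      # walk from largest p to smallest
        i = order[rank]
        running_min = min(running_min, pvals[i] * m / (rank + 1))
        adj[i] = running_min               # enforce monotonicity
    return adj

def fisher_combine(pvals):
    """Fisher's method: X = -2 * sum(ln p) ~ chi-square with 2k df.
    For even df the chi-square survival function has a closed form:
    sf(x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!"""
    k = len(pvals)
    half = -sum(math.log(p) for p in pvals)   # x/2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

For a gene's SNPs, `fisher_combine` pools their p-values into one gene-level p-value; `bh_adjust` then controls the FDR across all loci or genes tested.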

2.
With advances in information technology, modern biology increasingly applies these techniques to the collection, analysis, and mining of large-scale biological data, and many computational methods, statistical ones in particular, are used to analyze complex diseases. Numerous studies show that differences in many human phenotypic traits, as well as susceptibility to drugs and diseases, may be associated with particular loci, or with genes containing multiple loci. Locating trait- or disease-associated loci on chromosomes or within genes therefore helps researchers understand the genetic mechanisms of traits and diseases, and makes it possible to intervene at causal loci to prevent some genetic disorders. Big-data methods such as random forests, bootstrap resampling, and logistic regression are applied to subproblems of locus association analysis, including single causal-locus identification, multi-locus interaction, and multi-trait locus association.
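Of the methods listed, bootstrap resampling is the simplest to illustrate; a minimal percentile-bootstrap confidence interval in plain Python (a sketch under our own naming, not the paper's code):

```python
import random

def bootstrap_ci(data, stat, n_boot=2000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for statistic `stat`:
    resample the data with replacement, recompute the statistic,
    and take the empirical alpha/2 and 1-alpha/2 quantiles."""
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(stat([data[rng.randrange(n)] for _ in range(n)])
                  for _ in range(n_boot))
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

mean = lambda xs: sum(xs) / len(xs)
```

In association studies the same resampling idea is used, for example, to assess the stability of a selected locus set across bootstrap replicates.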

3.
Association tests for candidate genes typically use multiple SNP markers within the candidate gene and test SNP haplotypes. Multi-marker haplotype methods are well known to convey more information than single-marker methods, but the number of haplotypes grows rapidly with the number of SNPs marked, which greatly inflates the degrees of freedom of the test statistic. Principal component analysis is used to reduce the dimension of the haplotype space, and an association test is constructed for the association between a quantitative trait and multiple haplotypes. Simulation results show that the test is reasonable.

4.
To facilitate data analysis, locus information is first converted from letter coding to numeric coding. Based on the coded loci and disease status, logistic regression is used to identify the one or several loci most likely to cause a given disease, and significance tests are applied to validate the fitted model. In addition, principal component analysis extracts 225 of the original 300 principal components to retain as much of the information in the original gene variables as possible, and principal-component logistic regression then identifies the one or several genes most likely related to the disease. Finally, canonical correlation analysis identifies the loci associated with related traits.
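The first step described, converting letter-coded genotypes to numeric codes, is commonly done with additive (0/1/2) coding by minor-allele count; a minimal sketch (the helper name `encode_genotypes` is ours):

```python
from collections import Counter

def encode_genotypes(genos):
    """Additive coding of genotype strings at one SNP: count copies of
    the minor (least frequent) allele, e.g. 'AA' -> 0, 'AT' -> 1,
    'TT' -> 2 when T is the minor allele."""
    alleles = Counter(a for g in genos for a in g)
    # least frequent allele; alphabetical tie-break for determinism
    minor = min(alleles, key=lambda a: (alleles[a], a))
    return [g.count(minor) for g in genos]
```

The resulting integer column can then be used directly as a covariate in a logistic regression of disease status on genotype.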

5.
By studying the Kullback-Leibler distance, the Kullback-Leibler distances between genotype distributions at each locus are computed, and the one or several loci most likely to cause a given disease are identified from these distance values. This enables rapid identification of causal-locus positions and provides a reference for methods of uncovering the genetic mechanisms of hereditary diseases and traits.
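A minimal sketch of the locus score this abstract describes: the symmetrized Kullback-Leibler distance between genotype frequency distributions, e.g. in cases versus controls (function names are ours; the small `eps` guards against zero frequencies):

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL divergence D(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q))

def kl_distance(p, q):
    """Symmetrized KL 'distance' D(p||q) + D(q||p) as a locus score:
    large values mean the genotype frequencies differ strongly
    between the two groups."""
    return kl_divergence(p, q) + kl_divergence(q, p)
```

Loci are then ranked by this score; the largest distances mark the most likely causal loci.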

6.
In association tests for candidate genes, multi-marker haplotype methods usually convey more information than single-marker methods, but the number of haplotypes grows rapidly with the number of SNPs marked, which greatly inflates the degrees of freedom of the test statistic. Principal component analysis is used to reduce the dimension of the haplotype space when testing the association between a quantitative trait and multiple haplotypes, and the approach is compared with traditional methods. Simulation results show that the test has a good type I error rate and good power.
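The dimension-reduction step can be sketched with NumPy: project a haplotype indicator matrix onto its first few principal components and test the trait against those scores instead of all haplotype columns (an illustrative sketch assuming NumPy, not the paper's code):

```python
import numpy as np

def pca_scores(X, k):
    """First k principal-component scores of the row-observations in X
    (column-centered, computed via SVD)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

# toy haplotype indicator matrix: rows = individuals, cols = haplotypes
H = np.array([[1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0],
              [0, 1, 1, 0],
              [1, 0, 1, 0]], dtype=float)
Z = pca_scores(H, 2)   # 2 component scores replace 4 haplotype columns
```

A quantitative-trait association test then regresses the trait on the columns of `Z`, with only `k` degrees of freedom instead of one per haplotype.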

7.
The Collaborative Cross recombinant inbred intercross (CC-RIX) mouse population has many advantages, particularly high power for mapping quantitative trait loci (QTL) of complex diseases. Building on existing work, this paper considers a mixed linear mapping model containing only main gene effects, transforms the problem with a group-LASSO penalty, and solves the transformed problem by iteratively reweighted least squares, thereby overcoming computational difficulties such as a near-singular design matrix and slow computation. Simulations show that the proposed model and method can identify QTLs in CC-RIX populations quickly and accurately, with a high true-positive rate and a low false-positive rate.

8.
Gene structure prediction is a highly active research area, and the recognition of splice sites is a key step in it. This paper applies the authors' average likelihood ratio method, together with currently effective models such as the weight matrix model (WMM), maximal dependence decomposition (MDD), and dynamic programming, to extract the statistical features of the signal regions around splice sites. Discriminant analysis is then used for classification, achieving good discrimination results, and the approach is compared with other methods.

9.
Bayesian classification of gene expression data from two populations
In disease diagnosis, accurate disease classification is crucial for improving diagnostic accuracy and cure rates, and DNA microarray technology provides, at the micro level, gene-function information closely related to disease classification and diagnosis. However, the gene expression data obtained from DNA microarrays have many variables and small samples, which makes classification highly unstable. We therefore first screen out genes whose expression patterns change significantly as a feature gene set, reducing the number of variables, and then build a classifier on this feature set. This paper uses a likelihood-ratio test to screen feature genes, builds a statistical classification model based on Bayesian methods, and computes posterior classification probabilities by Markov chain Monte Carlo (MCMC) sampling. Finally, the model is applied to two real DNA microarray datasets, where the samples are classified successfully.

10.
Cointegration testing is the first step in regression analysis and the main way to avoid spurious regression. Most cointegration tests, however, are built on the non-robust ordinary least squares framework, which can invalidate the tests for time series that, as is common, have heavy tails and sharp peaks. To address this, this paper proposes a quantile-regression cointegration test with a linear time trend. Unlike traditional static cointegration analysis, we construct a quantile cumulative sum of residuals (QCS) statistic to test the dynamic cointegration relationship between variables at different quantiles. Using quantile regression and functional limit theory, we derive the asymptotic distribution of the statistic and propose a modified QCS statistic, extending its application to models with serial correlation and long-run endogeneity. Simulations provide critical values for the statistic and show that the proposed test has good finite-sample properties. Finally, the method is applied to the dynamic cointegration between disposable income and actual consumption, where the cointegration relationship is found to strengthen as the quantile increases.

11.
Support vector machines and their application in predicting enhanced oil recovery potential
Potential prediction for enhanced oil recovery (EOR) methods is the basis of EOR potential analysis. From a statistical learning perspective, building an EOR potential prediction model is essentially a function approximation problem. This paper is the first to introduce statistical learning theory and the support vector machine into EOR potential prediction. According to Vapnik's structural risk minimization principle, the generalization ability of the learning machine should be maximized, that is, a small error on an effective training set should guarantee a small error on an independent test set. Under the small-sample conditions of this study, the support vector machine balances generality and generalization, and shows good application prospects. The theoretical sample set used in the study was generated by orthogonal experimental design, reservoir numerical simulation, and economic evaluation.

12.
The intraclass correlation model is well known in the literature of multivariate analysis and it is mainly used in studying familial data. This model is considered in this paper and the interest is focused on the estimation of the intraclass correlation on the basis of familial data from families which are randomly selected from two or more independent populations. The size of the families is considered unequal and the variances of the populations are considered unequal, too. In this statistical framework some preliminary test estimators are presented in a unified way and their asymptotic distribution is obtained. A decision-theoretic approach is developed to compare the estimators by using the asymptotic distributional quadratic risk under the null hypothesis of equality of the intraclass correlations and under contiguous alternative hypotheses, as well. Some interesting relationships are obtained between the estimators considered.

13.
梁爽  刁节文  肖邦 《运筹与管理》2021,30(1):170-176
With the rise of big data and machine learning, these techniques are increasingly applied to bankruptcy and risk prediction. This paper uses web crawlers to collect 16,815 records from 885 online lending platforms and, through factor analysis and model validation, extracts several factors that assess P2P platform risk well. Using the selected indicator system, single models including logistic regression, support vector machines, BP neural networks, and LightGBM, as well as a fused model, are trained; the fused model achieves the highest accuracy on the test set, showing that it assesses online lending platform risk well. The paper also draws a decision tree and ranks feature importance, selecting ten features that are important for identifying problem platforms. This offers useful guidance for investors choosing safe platforms and for regulators targeting key platforms for supervision.
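As a toy illustration of the fusion idea (not the paper's model), a soft-voting fusion that averages the base models' predicted probabilities of a platform being problematic; all numbers below are made up:

```python
def soft_vote(prob_lists, weights=None):
    """Fuse per-model predicted probabilities of the positive class
    by (weighted) averaging, then threshold at 0.5."""
    n_models = len(prob_lists)
    weights = weights or [1.0 / n_models] * n_models
    fused = [sum(w * p[i] for w, p in zip(weights, prob_lists))
             for i in range(len(prob_lists[0]))]
    return [int(p >= 0.5) for p in fused], fused

# three hypothetical base models scoring four platforms
labels, fused = soft_vote([[0.9, 0.2, 0.7, 0.4],
                           [0.8, 0.1, 0.6, 0.6],
                           [0.7, 0.3, 0.8, 0.8]])
```

In practice the base probabilities would come from the trained single models (logistic regression, SVM, BP network, LightGBM), and the weights could be tuned on a validation set.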

14.
This paper deals with sales forecasting of a given commodity in a retail store of large distribution. For many years statistical methods such as ARIMA and Exponential Smoothing have been used to this aim. However the statistical methods could fail if high irregularity of sales is present, as happens for instance in case of promotions, because they are not well suited to model the nonlinear behaviors of the sales process. In recent years new methods based on machine learning are being employed for forecasting applications. A preliminary investigation indicates that methods based on the support vector machine (SVM) are more promising than other machine learning methods for the case considered. The paper assesses the application of SVM to sales forecasting under promotion impacts, compares SVM with other statistical methods, and tackles two real case studies.

15.
The development of credit risk assessment models is often considered within a classification context. Recent studies on the development of classification models have shown that a combination of methods often provides improved classification results compared to a single-method approach. Within this context, this study explores the combination of different classification methods in developing efficient models for credit risk assessment. A variety of methods are considered in the combination, including machine learning approaches and statistical techniques. The results illustrate that combined models can outperform individual models for credit risk analysis. The analysis also covers important issues such as the impact of using different parameters for the combined models, the effect of attribute selection, as well as the effects of combining strong or weak models.

16.
Structure-enforced matrix factorization (SeMF) represents a large class of mathematical models appearing in various forms of principal component analysis, sparse coding, dictionary learning and other machine learning techniques useful in many applications including neuroscience and signal processing. In this paper, we present a unified algorithm framework, based on the classic alternating direction method of multipliers (ADMM), for solving a wide range of SeMF problems whose constraint sets permit low-complexity projections. We propose a strategy to adaptively adjust the penalty parameters which is the key to achieving good performance for ADMM. We conduct extensive numerical experiments to compare the proposed algorithm with a number of state-of-the-art special-purpose algorithms on test problems including dictionary learning for sparse representation and sparse nonnegative matrix factorization. Results show that our unified SeMF algorithm can solve different types of factorization problems as reliably and as efficiently as special-purpose algorithms. In particular, our SeMF algorithm provides the ability to explicitly enforce various combinatorial sparsity patterns that, to our knowledge, have not been considered in existing approaches.

17.
Stability is a major requirement to draw reliable conclusions when interpreting results from supervised statistical learning. In this article, we present a general framework for assessing and comparing the stability of results, which can be used in real-world statistical learning applications as well as in simulation and benchmark studies. We use the framework to show that stability is a property of both the algorithm and the data-generating process. In particular, we demonstrate that unstable algorithms (such as recursive partitioning) can produce stable results when the functional form of the relationship between the predictors and the response matches the algorithm. Typical uses of the framework in practical data analysis would be to compare the stability of results generated by different candidate algorithms for a dataset at hand or to assess the stability of algorithms in a benchmark study. Code to perform the stability analyses is provided in the form of an R package. Supplementary material for this article is available online.

18.
A novel machine learning aided structural reliability analysis for functionally graded frame structures against static loading is proposed. The uncertain system parameters, which include the material properties, dimensions of structural members, applied loads, as well as the degree of gradation of the functionally graded material (FGM), can be incorporated within a unified structural reliability analysis framework. A 3D finite element method (FEM) for static analysis of bar-type engineering structures involving FGM is presented. By extending the traditional support vector regression (SVR) method, a new kernel-based machine learning technique, namely the extended support vector regression (X-SVR), is proposed for modelling the underpinned relationship between the structural behaviours and the uncertain system inputs. The proposed structural reliability analysis inherits the advantages of the traditional sampling method (i.e., Monte Carlo simulation) in providing information on the statistical characteristics (mean, standard deviation, probability density and cumulative distribution functions, etc.) of any concerned structural outputs, but with significantly reduced computational effort. Five numerical examples are investigated to illustrate the accuracy, applicability, and computational efficiency of the proposed computational scheme.
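The Monte Carlo side of such a reliability analysis can be sketched in a few lines; the role of a surrogate like X-SVR is to replace the expensive FEM evaluation inside this sampling loop with a cheap approximation. A toy limit state g = R - S with normally distributed resistance and load (our example, not the paper's):

```python
import random

def failure_probability(limit_state, sample, n=100_000, seed=1):
    """Crude Monte Carlo estimate of P(g(X) <= 0), where `limit_state`
    is the limit-state function g and `sample` draws one random
    input vector from the uncertain parameters."""
    rng = random.Random(seed)
    fails = sum(limit_state(sample(rng)) <= 0 for _ in range(n))
    return fails / n

# toy limit state: resistance R ~ N(10, 1) versus load S ~ N(7, 1)
pf = failure_probability(
    lambda x: x[0] - x[1],                          # g = R - S
    lambda rng: (rng.gauss(10, 1), rng.gauss(7, 1)),
)
```

For this toy case the exact failure probability is Phi(-3/sqrt(2)), about 0.017, which the estimate approaches as n grows.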

19.
In this paper, we propose a general framework for Extreme Learning Machine via free sparse transfer representation, which is referred to as transfer free sparse representation based on extreme learning machine (TFSR-ELM). This framework is suitable for different assumptions related to the divergence measures of the data distributions, such as a maximum mean discrepancy and K-L divergence. We propose an effective sparse regularization for the proposed free transfer representation learning framework, which can decrease the time and space cost. Different solutions to the problems based on the different distribution distance estimation criteria and convergence analysis are given. Comprehensive experiments show that TFSR-based algorithms outperform the existing transfer learning methods and are robust to different sizes of training data.
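The extreme learning machine underlying TFSR-ELM is itself simple: a random hidden layer followed by a least-squares solve for the output weights. A minimal NumPy sketch, without the transfer or sparse-regularization parts (illustrative only; function names are ours):

```python
import numpy as np

def elm_train(X, y, n_hidden=50, seed=0):
    """Minimal extreme learning machine: fix a random hidden layer,
    then solve least squares for the output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                         # random feature map
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)   # output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# fit a 1-D nonlinear function as a smoke test
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(X[:, 0])
params = elm_train(X, y)
err = np.max(np.abs(elm_predict(X, *params) - y))
```

Because only the output weights are learned, training reduces to one linear solve, which is what makes ELM-based frameworks fast to retrain under different regularizations.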
