Similar Documents
 A total of 20 similar documents were retrieved.
1.
Differential replication is a method to adapt existing machine learning solutions to the demands of highly regulated environments by reusing knowledge from one generation to the next. Copying is a technique that allows differential replication by projecting a given classifier onto a new hypothesis space, in circumstances where access to both the original solution and its training data is limited. The resulting model replicates the original decision behavior while displaying new features and characteristics. In this paper, we apply this approach to a use case in the context of credit scoring. We use a private residential mortgage default dataset. We show that differential replication through copying can be exploited to adapt a given solution to the changing demands of a constrained environment such as that of the financial market. In particular, we show how copying can be used to replicate the decision behavior not only of a model, but also of a full pipeline. As a result, we can ensure the decomposability of the attributes used to provide explanations for credit scoring models and reduce the time-to-market delivery of these solutions.
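A minimal sketch of the copying step described above, on synthetic stand-ins: a gradient-boosting model plays the role of the original classifier (whose training data is inaccessible), random samples are labeled by it, and a decision tree is fitted in the new hypothesis space. The dataset, the two models, and the uniform sampling scheme are illustrative assumptions, not the authors' credit-scoring pipeline.

```python
# Copying: label synthetic samples with the original model, then fit a
# new model (the "copy") in a different hypothesis space.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
original = GradientBoostingClassifier().fit(X, y)  # stand-in for the deployed scorer

# Sample synthetic points covering the input space (no access to training data).
rng = np.random.default_rng(0)
X_synth = rng.uniform(X.min(axis=0), X.max(axis=0), size=(20000, X.shape[1]))
y_synth = original.predict(X_synth)  # the original model acts as an oracle

copy = DecisionTreeClassifier(max_depth=6).fit(X_synth, y_synth)  # new hypothesis space
agreement = (copy.predict(X_synth) == y_synth).mean()
print(f"copy/original agreement on synthetic data: {agreement:.3f}")
```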

2.
Background: Electronic fetal monitoring (EFM) is the universal method for the surveillance of fetal well-being in the intrapartum period. Our objective was to predict acidemia from fetal heart signal features using machine learning algorithms. Methods: A 1:2 case–control study was carried out comprising 378 infants born in the Miguel Servet University Hospital, Spain. Neonatal acidemia was defined as pH < 7.10. Using the EFM recordings, logistic regression, random forest, and neural network models were built to predict acidemia. The models were validated by means of discrimination, calibration, and clinical utility. Results: The best performance was attained by a random forest model built with 100 trees. Its discrimination ability was good, with an area under the receiver operating characteristic curve (AUC) of 0.865. The calibration showed a slight overestimation of acidemia occurrence for probabilities above 0.4. The clinical utility analysis showed that, at a 33% cutoff point, 46% of unnecessary cesarean sections could be prevented while missing 5% of acidotic cases. Logistic regression and neural networks showed similar discrimination ability but worse calibration and clinical utility. Conclusions: The combination of variables extracted from the EFM recordings provided a predictive model of acidemia with good accuracy, offering a practical tool to prevent unnecessary cesarean sections.
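A hedged sketch of the modeling step on synthetic stand-in data (not the hospital cohort): a 100-tree random forest scored by AUC, with a 33% probability cutoff mirroring the clinical-utility analysis.

```python
# Case-control random forest with AUC evaluation and a probability cutoff.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# ~1:2 cases:controls, mimicking the study design on placeholder features
X, y = make_classification(n_samples=378, weights=[2/3, 1/3], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

rf = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_tr, y_tr)
p = rf.predict_proba(X_te)[:, 1]
print("AUC:", round(roc_auc_score(y_te, p), 3))

cutoff = 0.33  # cutoff analogous to the paper's 33% clinical-utility point
print("fraction flagged as at-risk:", (p >= cutoff).mean())
```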

3.
Nonalcoholic fatty liver disease (NAFLD) is the hepatic manifestation of metabolic syndrome and the most common cause of chronic liver disease in developed countries. Certain conditions, including mild inflammation biomarkers, dyslipidemia, and insulin resistance, can trigger progression to nonalcoholic steatohepatitis (NASH), a condition characterized by inflammation and liver cell damage. We demonstrate the usefulness of machine learning with a case study analyzing the most important features in random forest (RF) models for predicting patients at risk of developing NASH. We collected data from patients who attended the Cardiovascular Risk Unit of Mostoles University Hospital (Madrid, Spain) from 2005 to 2021. We reviewed electronic health records to assess the presence of NASH, which was used as the outcome. We chose RF as the algorithm to develop six models using different preprocessing strategies. The performance metrics were evaluated to choose an optimized model. Finally, several interpretability techniques, such as feature importance, the contribution of each feature to predictions, and partial dependence plots, were used to understand and explain the model and obtain a better understanding of machine-learning-based predictions. In total, 1525 patients met the inclusion criteria. The mean age was 57.3 years, and 507 patients had NASH (a prevalence of 33.2%). Filter methods (the chi-square and Mann–Whitney–Wilcoxon tests) did not produce additional insight in terms of interactions, contributions, or relationships among variables and their outcomes. The random forest models classified patients with NASH with an accuracy of 0.87 in the best model and 0.79 in the worst one. Four features were the most relevant: insulin resistance, ferritin, serum insulin levels, and triglycerides. The contribution of each feature was assessed via partial dependence plots. Random-forest-based modeling demonstrated that machine learning can be used to improve interpretability, produce an understanding of the modeled behavior, and show how much certain features contribute to predictions.
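A sketch of the interpretability workflow on synthetic data: fit a random forest, rank features by permutation importance, and draw partial dependence plots for the top features. The data and feature count are placeholders, not the hospital records.

```python
# Feature importance and partial dependence for a fitted random forest.
# PartialDependenceDisplay requires matplotlib to be installed.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import PartialDependenceDisplay, permutation_importance

X, y = make_classification(n_samples=1525, n_features=20, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

imp = permutation_importance(rf, X, y, n_repeats=5, random_state=0)
top = imp.importances_mean.argsort()[::-1][:4]
print("four most relevant feature indices:", top)

# Partial dependence of the prediction on the two most relevant features
PartialDependenceDisplay.from_estimator(rf, X, features=top[:2].tolist())
```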

4.
The adoption of deep learning models within safety-critical systems cannot rely on good prediction performance alone; the models also need to provide interpretable and robust explanations for their decisions. When modeling complex sequences, attention mechanisms are regarded as the established approach to endowing deep neural networks with intrinsic interpretability. This paper focuses on the emerging trend of specifically designing diagnostic datasets for understanding the inner workings of attention-based deep learning models for multivariate forecasting tasks. We design a novel benchmark of synthetic datasets with a transparent underlying generating process of multiple interacting time series of increasing complexity. The benchmark enables empirical evaluation of attention-based deep neural networks in three different aspects: (i) prediction performance, (ii) interpretability correctness, and (iii) sensitivity analysis. Our analysis shows that although most models have satisfying and stable prediction performance, they often fail to give correct interpretations. The only model with both a satisfying performance score and correct interpretability is IMV-LSTM, which captures both autocorrelations and cross-correlations between multiple time series. Interestingly, when evaluating IMV-LSTM on simulated data from statistical and mechanistic models, the correctness of its interpretability increases with more complex datasets.

5.
Magnetization switching is one of the most fundamental topics in the field of magnetism. Machine learning (ML) models based on random forest (RF), support vector machine (SVM), and deep neural network (DNN) methods are built and trained to classify magnetization reversal and non-reversal cases of a single-domain particle, and the classification performance is evaluated by comparison with micromagnetic simulations. The results show that the ML models achieve high accuracy: the DNN model reaches the best area under the curve (AUC) of 0.997, even with a small training dataset, while the RF and SVM models have lower AUCs of 0.964 and 0.836, respectively. This work validates the potential of ML applications in studies of magnetization switching and provides a benchmark for further ML studies of magnetization switching.

6.
Adversarial examples are one of the most intriguing topics in modern deep learning: imperceptible perturbations to the input can fool robust models. In relation to this problem, attack and defense methods are being developed almost on a daily basis. In parallel, efforts are being made to simply point out when an input image is an adversarial example. This can help prevent potential issues, as the failure cases are easily recognizable by humans. The proposal in this work is to study how methods from chaos theory can help distinguish adversarial examples from regular images. Our work is based on the assumption that deep networks behave as chaotic systems and that adversarial examples are the main manifestation of this (in the sense that a slight input variation produces a totally different output). In our experiments, we show that the Lyapunov exponents (an established measure of chaoticity), which were recently proposed for the classification of adversarial examples, are not robust to image processing transformations that alter image entropy. Furthermore, we show that entropy can complement Lyapunov exponents in such a way that the discriminating power is significantly enhanced. The proposed method achieves 65% to 100% accuracy in detecting adversarial examples across a wide range of attacks (for example, CW, PGD, Spatial, and HopSkip) on the MNIST dataset, with similar results when entropy-changing image processing methods (such as equalization, speckle, and Gaussian noise) are applied. This is also corroborated on two other datasets, Fashion-MNIST and CIFAR-10. These results indicate that classifiers can enhance their robustness against the adversarial phenomenon and be applied in a wide variety of conditions that potentially match real-world cases as well as other threatening scenarios.
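A sketch of the entropy half of the feature pair, under the assumption that entropy is computed from the grayscale intensity histogram; the images, the perturbation, and the Lyapunov-exponent feature (omitted here) are placeholders rather than the paper's setup.

```python
# Shannon entropy of the pixel-intensity histogram as a per-image feature.
import numpy as np

def image_entropy(img, bins=256):
    """Shannon entropy (bits) of the intensity histogram of an image in [0, 1]."""
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

rng = np.random.default_rng(0)
clean = rng.beta(2, 5, size=(100, 28, 28))                     # stand-in "regular" images
adv = np.clip(clean + rng.normal(0, 0.1, clean.shape), 0, 1)   # stand-in perturbed images

H_clean = np.array([image_entropy(x) for x in clean])
H_adv = np.array([image_entropy(x) for x in adv])
print("mean entropy clean vs. perturbed:", H_clean.mean(), H_adv.mean())
```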

7.
In this article, we consider a version of the challenging problem of learning from datasets whose size is too limited to allow generalisation beyond the training set. To address the challenge, we propose a transfer learning approach whereby the model is first trained on a synthetic dataset replicating features of the original objects. In this study, the objects were smartphone photographs of near-complete Roman terra sigillata pottery vessels from the collection of the Museum of London. Taking the replicated features from published profile drawings of pottery forms allowed the integration of expert knowledge into the process through our synthetic data generator. After this initial training, the model was fine-tuned with data from photographs of real vessels. We show, through exhaustive experiments across several popular deep learning architectures, different test priors, and considering the impact of the photograph viewpoint and excessive damage to the vessels, that the proposed hybrid approach enables the creation of classifiers with appropriate generalisation performance. This performance is significantly better than that of classifiers trained exclusively on the original data, which shows the promise of the approach for alleviating the fundamental issue of learning from small datasets.
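A hedged sketch of the two-stage regime in PyTorch: pretrain on a synthetic dataset, then fine-tune on a smaller "real" one at a lower learning rate. Random tensors stand in for the rendered profiles and the museum photographs, and ResNet-18 is just one plausible choice among the popular architectures the study compares.

```python
# Two-stage training: synthetic pretraining followed by fine-tuning on real data.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18

def run_epochs(model, loader, epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()

n_classes = 5  # placeholder number of pottery forms
model = resnet18(num_classes=n_classes)

synthetic = DataLoader(TensorDataset(torch.randn(64, 3, 224, 224),
                                     torch.randint(0, n_classes, (64,))), batch_size=16)
real = DataLoader(TensorDataset(torch.randn(16, 3, 224, 224),
                                torch.randint(0, n_classes, (16,))), batch_size=8)

run_epochs(model, synthetic, epochs=1, lr=1e-3)  # stage 1: synthetic pretraining
run_epochs(model, real, epochs=1, lr=1e-4)       # stage 2: fine-tune at a lower rate
```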

8.
To improve the accuracy of shrimp freshness discrimination, a shrimp freshness detection method based on the Broad Learning System (BLS) is proposed. First, multiplicative scatter correction (MSC), standard normal variate (SNV) transformation, and direct orthogonal signal correction (DOSC) were applied to preprocess the raw hyperspectra of shrimp refrigerated for different numbers of days; t-distributed stochastic neighbor embedding (t-SNE) was then used to visualize the preprocessed data, and the visualization showed that DOSC produced the best clustering. Next, random forest (RF), principal component analysis (PCA), and two-dimensional correlation spectroscopy (2D-COS) were used to select features from the DOSC-preprocessed spectral data. Finally, shrimp freshness was modeled on the selected characteristic wavelengths. BLS was applied to shrimp freshness modeling for the first time and compared with classical discriminant models such as partial least squares discriminant analysis (PLS-DA) and the extreme learning machine (ELM). The results show that RF removed redundant spectral information most effectively, and that BLS achieved higher accuracy and shorter discrimination time than the linear PLS-DA and the nonlinear ELM, so the combined RF-BLS model achieved the best freshness discrimination. This demonstrates that hyperspectral imaging combined with broad learning is feasible for identifying shrimp freshness and can provide a theoretical basis for developing online shrimp freshness detection systems.
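As one concrete step, here is a minimal sketch of multiplicative scatter correction (MSC), the first of the three preprocessing methods mentioned above, applied to a matrix of spectra (rows = samples, columns = wavelengths); random data stands in for the shrimp hyperspectra.

```python
# Multiplicative scatter correction: regress each spectrum on a reference
# (here the mean spectrum) and remove the fitted offset and scaling.
import numpy as np

def msc(spectra, reference=None):
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra)
    for i, s in enumerate(spectra):
        slope, intercept = np.polyfit(ref, s, deg=1)  # s ~ slope * ref + intercept
        corrected[i] = (s - intercept) / slope
    return corrected

spectra = np.random.default_rng(0).normal(1.0, 0.2, size=(30, 200))
print(msc(spectra).shape)  # (30, 200)
```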

9.
Estimating the Organic Matter Content of Black Soil Based on an RF-GABPSO Hybrid Selection Algorithm
To address the problems of excessive variable dimensionality and characteristic band selection in hyperspectral estimation of soil organic matter content, a hybrid feature selection method combining random forest with an adaptive search algorithm is proposed. An initial optimized variable set is first obtained according to the random forest variable importance principle, and is then further refined adaptively by a genetic-algorithm binary particle swarm optimization (GA-BPSO) wrapper. For the estimation model itself, random forest was chosen for its robustness and its ability to handle high-dimensional variables effectively. Soil samples collected from a typical black soil region served as the study object, with visible/near-infrared spectra acquired by an ASD spectrometer and organic matter contents determined by chemical analysis as the data sources. After spectral transformation and resampling of the raw spectra, the RF-GABPSO hybrid selection method was applied to extract characteristic spectral intervals, and a random forest model estimating organic matter content was built. Its prediction accuracy was compared with that of random forest models built on the full spectrum, on bands screened by random forest alone, and on bands screened by the adaptive search algorithm alone. The results show that the model built on the variables selected by the RF-GABPSO hybrid algorithm achieved a coefficient of determination of 0.838, a root mean square error of 0.54%, and a relative percent deviation of 2.534. This scheme attained the highest prediction accuracy with the fewest variables, estimating black soil organic matter content efficiently, and it can also serve as a reference for variable selection and modeling in organic matter estimation for other soil types.
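A sketch of the first stage only (random forest variable importance producing the initial optimized set) on synthetic regression data; the GA-BPSO wrapper refinement is omitted, and the band count and subset size are illustrative.

```python
# Rank spectral bands by random forest importance; keep the top subset.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=120, n_features=400, n_informative=15, random_state=0)
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

order = np.argsort(rf.feature_importances_)[::-1]
initial_set = order[:40]  # top-ranked bands form the initial optimized set
print("selected band indices:", np.sort(initial_set)[:10], "...")
```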

10.
It is desirable to combine the expressive power of deep learning with Gaussian processes (GPs) in one expressive Bayesian learning model. Deep kernel learning has shown success by using a deep network for feature extraction with a GP as the function model. Recently, it was suggested that, despite training with the marginal likelihood, the deterministic nature of the feature extractor might lead to overfitting, and that replacing it with a Bayesian network appears to cure this. Here, we propose the conditional deep Gaussian process (DGP), in which the intermediate GPs in the hierarchical composition are supported by hyperdata and the exposed GP remains zero mean. Motivated by the inducing points in sparse GPs, the hyperdata also play the role of function supports, but are hyperparameters rather than random variables. Following our previous moment matching approach, the marginal prior of the conditional DGP is approximated by a GP carrying an effective kernel. Thus, as in empirical Bayes, the hyperdata are learned by optimizing the approximate marginal likelihood, which depends on the hyperdata implicitly via the kernel. We show the equivalence with deep kernel learning in the limit of dense hyperdata in latent space; however, the conditional DGP and the corresponding approximate inference enjoy the benefit of being more Bayesian than deep kernel learning. Preliminary extrapolation results demonstrate the expressive power gained from the depth of the hierarchy by exploiting the exact covariance and hyperdata learning, in comparison with GP kernel composition, DGP variational inference, and deep kernel learning. We also address the non-Gaussian aspect of our model as well as a way of upgrading to full Bayesian inference.

11.
Locating the lung lobes in CT images is important for accurately localizing lung disease and for qualitative and quantitative analysis. To improve the accuracy of automatic lobe segmentation, a lung lobe segmentation algorithm is proposed that combines traditional anatomical features, such as the airways and vessels, with deep learning. The original image is preprocessed to obtain the lung parenchyma, the airways, the vessels, and a fissure segmentation produced by a deep learning network; information from these multiple anatomical structures is integrated to generate the cost image required for watershed segmentation; lobe marker regions are obtained from a coarse deep-learning-based lobe segmentation; and finally, marker-based watershed segmentation is performed to segment the lobes automatically. The method was validated on CT images of 20 patients with lung disease from Shanghai Pulmonary Hospital, achieving a final Jaccard similarity coefficient of 92.4%. The experimental results show that the method achieves high lobe segmentation accuracy and strong robustness.
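A minimal sketch of the final marker-based watershed step using scikit-image; the cost image and lobe markers are synthetic arrays standing in for the fused anatomical cues and the coarse deep-learning segmentation.

```python
# Marker-based watershed on a toy 2D cost image with two seed regions.
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed

cost = np.random.default_rng(0).random((128, 128))
cost = ndi.gaussian_filter(cost, sigma=4)  # smooth stand-in cost image

markers = np.zeros_like(cost, dtype=int)   # one seed per "lobe"
markers[32, 32], markers[96, 96] = 1, 2

labels = watershed(cost, markers)          # flood the cost image from the markers
print(np.unique(labels))                   # [1 2]
```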

12.
Multivariate time series anomaly detection is a widespread problem in the field of failure prevention: fast prevention means lower repair costs and losses. The number of sensors in modern industrial systems makes the anomaly detection process quite difficult for humans, so algorithms that automate the detection of anomalies are crucial in modern failure prevention systems, and many machine learning models have been designed to address this problem. Mostly, they are autoencoder-based architectures with some generative adversarial elements. This work presents a framework that incorporates neuroevolution methods to boost the anomaly detection scores of new and already known models. The presented approach adapts evolution strategies to evolve an ensemble model in which every single model works on a subgroup of data sensors. A further goal of the neuroevolution is to optimize the architecture and hyperparameters, such as the window size, the number of layers, and the layer depths. The proposed framework shows that it is possible to boost most anomaly detection deep learning models in a reasonable time and in a fully automated mode. We ran tests on the SWAT and WADI datasets. To the best of our knowledge, this is the first approach in which an ensemble deep learning anomaly detection model is built in a fully automatic way using a neuroevolution strategy.

13.
Studying the complex quantum dynamics of interacting many-body systems is one of the most challenging areas in modern physics. Here, we use machine learning (ML) models to identify the symmetrized base states of interacting Rydberg atoms of various atom numbers (up to six) and geometric configurations. To obtain the data set for training the ML classifiers, we generate Rydberg excitation probability profiles that simulate experimental data by utilizing Lindblad equations that incorporate laser intensities and phase noise. Then, we classify the data sets using support vector machines (SVMs) and random forest classifiers (RFCs). With these ML models, we achieve high accuracy of up to 100% for data sets containing only a few hundred samples, especially for the closed atom configurations such as the pentagonal (five atoms) and hexagonal (six atoms) systems. The results demonstrate that computationally cost-effective ML models can be used in the identification of Rydberg atom configurations.

14.
Using chest X-ray images is one of the least expensive and easiest ways to diagnose patients who suffer from lung diseases such as pneumonia and bronchitis. Inspired by existing work, a deep learning model is proposed to classify chest X-ray images into 14 lung-related pathological conditions. However, small datasets are not sufficient to train such a deep learning model, and two methods were used to tackle this: (1) transfer learning based on two pretrained neural networks, DenseNet and ResNet, was employed; (2) the data were preprocessed, including checking for data leakage, handling class imbalance, and performing data augmentation, before being fed to the neural network. The proposed model was evaluated according to classification accuracy and receiver operating characteristic (ROC) curves, and visualized with class activation maps. DenseNet121 and ResNet50 were used in the simulations, and the results showed that the model trained with DenseNet121 had better accuracy than that trained with ResNet50.
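A hedged sketch of the transfer-learning head: torchvision's DenseNet121 with its classifier replaced by a 14-output multi-label layer trained with a per-class sigmoid loss. The data and targets are dummies, and the paper's preprocessing (leakage checks, class balancing, augmentation) is not reproduced here.

```python
# DenseNet121 adapted to 14 independent (multi-label) pathology outputs.
import torch
from torch import nn
from torchvision.models import densenet121

model = densenet121()  # pass weights="DEFAULT" to start from ImageNet weights
model.classifier = nn.Linear(model.classifier.in_features, 14)

x = torch.randn(2, 3, 224, 224)          # dummy batch of chest X-rays
logits = model(x)
probs = torch.sigmoid(logits)            # independent probability per condition
loss = nn.BCEWithLogitsLoss()(logits, torch.zeros_like(logits))
print(probs.shape, float(loss))
```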

15.
Deep learning is currently the best available pattern recognition tool and is expected to help nuclear physicists find the features most relevant to particular physics within large volumes of complex data. This paper reviews the taxonomy of deep learning techniques, the optimal neural network architectures for different data structures, the interpretability of black-box models, and the uncertainty of their predictions. Applications of deep learning to the nuclear matter equation of state, nuclear structure, nuclear masses, decay, and fission are introduced, and we show how to train a neural network to predict nuclear masses. We find that a neural network model trained on experimental data has good predictive power on experimental data not used in training. When extrapolating beyond the existing experimental data, the neural network's mass predictions for light neutron-rich nuclei deviate considerably from the macroscopic-microscopic liquid-drop model. New physics not captured by the macroscopic-microscopic liquid-drop model may exist in this region, which requires verification by further experimental data.

16.
There is increasing interest in machine learning (ML) algorithms for predicting patient outcomes, as these methods are designed to automatically discover complex data patterns. For example, the random forest (RF) algorithm is designed to identify relevant predictor variables out of a large set of candidates. In addition, researchers may use external information for variable selection to improve model interpretability and variable selection accuracy, and thereby prediction quality. However, it is unclear to what extent, if at all, RF and other ML methods may benefit from such external information. In this paper, we examine the usefulness of external information from prior variable selection studies that used traditional statistical modeling approaches such as the lasso, or suboptimal methods such as univariate selection. We conducted a plasmode simulation study based on subsampling a dataset from a pharmacoepidemiologic study with nearly 200,000 individuals, two binary outcomes, and 1152 candidate predictor (mainly sparse binary) variables. When the scope of candidate predictors was reduced based on external knowledge, RF models achieved better calibration, that is, better agreement between predictions and observed outcome rates. However, prediction quality measured by cross-entropy, AUROC, or the Brier score did not improve. We recommend appraising the methodological quality of studies that serve as an external information source for future prediction model development.

17.
The identification of voice disorders plays a fundamental role in modern healthcare, as many of these diseases must be diagnosed at early stages before they lead to a critical condition. Acoustic analysis can be used to identify voice disorders as a complementary technique to traditional invasive methods such as laryngoscopy. In this article, we conducted an extensive study of the diagnosis of voice disorders using statistical pattern recognition techniques, and we propose a combined scheme of feature reduction methods followed by pattern recognition methods to classify voice disorders. Six classifiers are used to evaluate feature vectors obtained by principal component analysis (PCA) or linear discriminant analysis (LDA) as feature reduction methods. Furthermore, individual, forward, backward, and branch-and-bound methods are examined as feature selection methods. The performance of each combined scheme is evaluated in terms of accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). The experimental results show that LDA combined with a support vector machine (SVM) has the best performance, with a recognition rate of 94.26% and an AUC of 97.94%, while also having the lowest complexity among the compared architectures. Among the feature selection methods, individual feature selection followed by an SVM classifier shows the best recognition rate, 91.55%, with an AUC of 95.80%.
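A sketch of the best-performing combination reported above (LDA feature reduction followed by an SVM), assembled as a scikit-learn pipeline on synthetic stand-in acoustic features.

```python
# LDA -> SVM pipeline scored by accuracy and AUC.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = make_pipeline(LinearDiscriminantAnalysis(), SVC(probability=True))
clf.fit(X_tr, y_tr)
p = clf.predict_proba(X_te)[:, 1]
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
print("AUC:", roc_auc_score(y_te, p))
```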

18.
Fourier transform infrared (FTIR) spectra typically contain a large number of wavelength variables, so their qualitative analysis requires robust, interpretable classification models. Sparse linear discriminant analysis (SLDA) is a relatively novel and effective machine learning algorithm, commonly used for variable screening and discriminant analysis of high-dimensional, small-sample data. By introducing a regularization term into linear discriminant analysis, SLDA performs classifier training and variable selection simultaneously, and the sparsity of the loading coefficients along each discriminant direction improves model interpretability. Ninety-four Gentiana (Qinjiao) samples were collected from different production areas in Gansu: 30 of Gentiana straminea Maxim, 28 of Gentiana officinalis, and 36 of Gentiana macrophylla Pall, and FTIR spectra were acquired for all samples. Seventy samples formed the training set and the remaining 24 the test set. An SLDA model was built on the training set, and a grid search over the number of nonzero loading coefficients in the two discriminant directions yielded the optimal parameter space. The SLDA model classified the test samples with 100% accuracy, enabling fast and accurate identification of the three Gentiana species. The results show that, compared with PLS-DA, the SLDA model offers advantages in classification accuracy, sparsity, and interpretability, making it a novel and effective method for qualitative spectral analysis.

19.
In this paper, a deep learning and expert knowledge based receiver is proposed for underwater acoustic (UWA) orthogonal frequency division multiplexing (OFDM). Unlike existing deep learning based UWA OFDM receivers, the proposed receiver combines deep learning with the classical expert knowledge of block-based signal processing in UWA OFDM to improve system performance and interpretability. It performs joint channel estimation and signal detection with a skip-connection (SC) convolutional neural network (CNN) cascaded with an attention-mechanism (AM) enhanced bi-directional long short-term memory (BiLSTM) network, abbreviated as the SC-CNN-AM-BiLSTM network (SCABNet). Specifically, the channel estimation subnet is designed with the SC-CNN, using the idea of image super-resolution to reconstruct the entire channel frequency response across all subcarriers. The signal detection subnet is designed with the AM-BiLSTM to extract the correlations in the received sequential data. With the attention mechanism in particular, the signal detection subnet can focus on the effective information in the received distorted signal and train optimal network weights, improving the accuracy of data recovery. The proposed SCABNet is evaluated on experimental data, and the results demonstrate that it achieves the lowest BER and robust performance compared with a traditional linear algorithm, a deep-learning-based black-box receiver, and the ComNet receiver. The proposed SCABNet is also effective and robust when multiple nonideal factors coexist.

20.
Deforestation is the depletion of forest cover and the degradation of forest quality, mainly through repeated fires, over-exploitation, and disease. In a forest ecosystem, the occurrence of wildfires is a natural phenomenon, but global warming and man-made interventions have made wildfires increasingly extreme and widespread. Though extremely challenging due to the rapidly changing climate, accurate prediction of these fire events can significantly improve forest protection worldwide. In this paper, we address this issue by proposing a deep learning (DL) framework based on a long short-term memory (LSTM) model. The proposed mechanism forecasts weekly fire alerts and the associated burnt area (ha) using historical fire data provided by Global Forest Watch. Pakistan is taken as a case study, since its deforestation rate is among the highest in the world while its forest cover is among the lowest. The number of epochs, dense layers, hidden layers, and hidden layer units are varied to optimize the model for high estimation accuracy and low root mean square error (RMSE). Simulation results show that, with suitable hyperparameter tuning, the proposed method can predict forest fire occurrences with 95% accuracy.
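A hedged Keras sketch of the forecasting model: an LSTM mapping a window of past weekly observations to the next value, with RMSE as the metric. The data are random placeholders for the Global Forest Watch series, and the layer sizes are illustrative, not the tuned hyperparameters from the paper.

```python
# LSTM regressor: window of past weekly (alerts, burnt area) -> next week's value.
import numpy as np
from tensorflow import keras

window, n_features = 8, 2  # 8 past weeks of (fire alerts, burnt area)
X = np.random.rand(200, window, n_features).astype("float32")
y = np.random.rand(200, 1).astype("float32")

model = keras.Sequential([
    keras.layers.Input(shape=(window, n_features)),
    keras.layers.LSTM(32),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse",
              metrics=[keras.metrics.RootMeanSquaredError()])
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
print("RMSE:", model.evaluate(X, y, verbose=0)[1])
```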
