Similar literature
1.
    
Raman spectroscopy's capability to provide meaningful composition predictions is heavily reliant on a preprocessing step to remove insignificant spectral variation. This is crucial in biofluid analysis. Widespread adoption of diagnostics using Raman requires a robust model that can withstand routine spectral discrepancies due to unavoidable variations such as age, diet, and medical background. A wealth of preprocessing methods is available, and it is often up to trial and error or user experience to select the method that gives the best results. This process can be incredibly time consuming and inconsistent across multiple operators. In this study, we detail a method to analyze the statistical variability within a set of training spectra and determine its suitability to form a robust model. This allows us to selectively qualify or exclude a preprocessing method, predetermine robustness, and simultaneously identify the number of components that will form the best predictive model. We demonstrate the ability of this technique to improve predictive models of two artificial biological fluids. Raman spectroscopy is ideal for noninvasive, nondestructive analysis. Routine health monitoring that maximizes comfort is increasingly crucial, particularly in epidemic-level diabetes diagnoses. High variability in spectra of biological samples can hinder Raman's adoption for these methods. Our technique allows the optimal pretreatment method to be determined for the operator; model performance is no longer a function of user experience. We foresee this statistical technique being instrumental in widening the adoption of Raman as a monitoring tool in the field of biofluid analysis.

2.
    
Optical spectroscopy and imaging techniques play important roles in many fields such as disease diagnosis, biological study, information technology, optical science, and materials science. Over the past decade, machine learning (ML) has proved promising in decoding complex data, enabling rapid and accurate analysis of optical spectra and images. This review aims to shed light on various ML algorithms for optical data analysis with a focus on their applications in a wide range of fields. The goal of this work is to sketch the validity of ML-based optical data decoding. The review concludes with an outlook on unaddressed problems and opportunities in this emerging subject that interfaces optics, data science, and ML.

3.
4.
    
Payment data is one of the most valuable assets that retail banks can leverage as a major competitive advantage over new entrants such as Fintech companies or giant internet companies. In marketing, the value behind data relates to the power of encoding customer preferences: the better you know your customer, the better your marketing strategy. In this paper, we present a B2B2C lead generation application based on payment transaction data within the online banking system. In this approach, the bank is an intermediary between its private customers and merchants. The bank uses its competence in Machine Learning driven marketing to build a lead generation application that helps merchants run data driven campaigns through the banking channels to reach retail customers. The bank’s retail customers trade the utility hidden in their payment transaction data for special offers and discounts offered by merchants. During the entire process, the bank protects the privacy of the retail customer.

5.
Li Xing, Spectroscopy Letters, 2013, 46(1): 47-53
Data preprocessing and multivariate regression methods are two key factors influencing the model prediction ability of near-infrared (NIR) spectroscopy. The present paper evaluated the application of the combined stationary wavelet transform–support vector machine method for developing juice NIR models. The performance of this method was compared with that of other methods, such as standard normal variate–partial least squares, stationary wavelet transform–partial least squares, and standard normal variate–stationary wavelet transform methods. The results showed that, compared with the other methods, the stationary wavelet transform–support vector machine method can provide good quantitative analysis of saccharose concentration in juice.
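
The standard normal variate (SNV) step named in this abstract can be sketched in a few lines. This is only an illustration of the usual textbook definition (each spectrum is centred on its own mean and divided by its own standard deviation); the abstract does not give the authors' implementation, and the function name `snv` and the sample values are hypothetical.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: scale one spectrum by its own mean and spread.

    Assumption: the sample standard deviation (n - 1) is used, which is the
    common convention; the abstract does not say which one the authors used.
    """
    mu = mean(spectrum)
    sigma = stdev(spectrum)
    if sigma == 0:
        raise ValueError("flat spectrum: SNV is undefined")
    return [(x - mu) / sigma for x in spectrum]


# Hypothetical absorbance values for one spectrum.
raw = [0.52, 0.61, 0.75, 0.68, 0.55]
corrected = snv(raw)

# A multiplicative and additive distortion of the same spectrum gives the
# same SNV result, which is why SNV is used against scatter effects.
distorted = [2.0 * x + 3.0 for x in raw]
corrected_distorted = snv(distorted)
```

After the transform every spectrum has zero mean and unit standard deviation, so a regression model such as partial least squares sees shape differences rather than baseline or scaling differences.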

6.
    
Laser-induced breakdown spectroscopy (LIBS) has attracted much attention in terms of both scientific research and industrial application. An important branch of LIBS research in Asia, the development of data processing methods for LIBS, is reviewed. First, the basic principle of LIBS and the characteristics of spectral data are briefly introduced. Next, two aspects of research on and problems with data processing methods are described: i) the basic principles of data preprocessing methods are elaborated in detail on the basis of the characteristics of spectral data; ii) the performance of data analysis methods in qualitative and quantitative analysis of LIBS is described. Finally, a direction for future development of data processing methods for LIBS is also proposed.

7.
    
Data stream mining techniques have recently received increasing research interest, especially in medical data classification. An unbalanced representation of the classification’s targets in these data is a common challenge because classification techniques are biased toward the major class. Many methods have attempted to address this problem but have ended up excessively biased toward the minor class. In this work, we propose a method for balancing the presence of the minor class within the current window of the data stream while preserving the data’s original majority as much as possible. The proposed method uses similarity analysis to select specific instances from the previous window. This group of minor-class instances is then added to the current window’s instances. Implementing the proposed method using the Siena dataset showed promising results compared to the Skew ensemble method and some other research methods.

8.
    
The accurate prediction of gross box-office markets is of great benefit for investment and management in the movie industry. In this work, we propose a machine learning-based method for predicting the movie box-office revenue of a country based on empirical comparisons of eight methods with diverse combinations of economic factors. Specifically, we achieved a relative root mean squared error of 0.056 in the US and 0.183 in China for the two case studies of movie markets in time-series forecasting experiments from 2013 to 2016. We concluded that the support-vector-machine-based method using gross domestic product reached the best prediction performance while requiring only readily available economic information. The computational experiments and comparison studies provided evidence for the effectiveness and advantages of our proposed prediction strategy. In the validation process of the predicted total box-office markets in 2017, the error rates were 0.044 in the US and 0.066 in China. In the consecutive predictions of nationwide box-office markets in 2018 and 2019, the mean relative absolute percentage errors achieved were 0.041 and 0.035 in the US and China, respectively. The precise predictions, both in the training and validation data, demonstrate the efficiency and versatility of our proposed method.

9.
Xiao Dengjie, Qiao Yusi, Chu Zhongming, High Power Laser and Particle Beams, 2021, 33(5): 054004-1-054004-7
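Orbit correction is one of the most basic steps in accelerator beam tuning and a problem faced by every accelerator laboratory. In traditional methods, linear algebra tools are applied to various types of response matrices to deal with problems such as the singularity of the response matrix. This paper proposes a machine learning based orbit correction method for accelerators. It avoids having to handle the response matrix: by reading BPM data and corrector magnet strengths directly, it builds a machine learning model in real time and corrects the orbit quickly. The machine learning orbit correction method is introduced and discussed in terms of its mathematical formulation, its algorithmic model, and tests on simulated and real data. The results show that, within the error range, the method can effectively correct the accelerator beam orbit.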
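
For context, the traditional response-matrix approach that this abstract contrasts with can be written down compactly. The following is a sketch of the standard SVD-based formulation, not the formulation in the paper; the symbols (orbit deviation \(\Delta\mathbf{x}\) at the BPMs, corrector kicks \(\Delta\boldsymbol{\theta}\), response matrix \(R\), cutoff \(\epsilon\)) are assumptions introduced here for illustration.

```latex
% Linear orbit response: BPM readings change linearly with corrector kicks.
\Delta\mathbf{x} = R\,\Delta\boldsymbol{\theta},
\qquad R = U\,\Sigma\,V^{\mathsf{T}}

% Correction from the pseudo-inverse of R. Singular values below a cutoff
% \epsilon are discarded, which is how a (near-)singular R is handled.
\Delta\boldsymbol{\theta}_{\mathrm{corr}}
  = -\,V\,\Sigma^{+}\,U^{\mathsf{T}}\,\Delta\mathbf{x}_{\mathrm{meas}},
\qquad
\Sigma^{+}_{ii} =
\begin{cases}
  1/\sigma_i, & \sigma_i > \epsilon \\
  0,          & \text{otherwise}
\end{cases}
```

The machine learning method described in the abstract replaces this explicit inversion with a model fitted to pairs of BPM readings and corrector strengths.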

10.
    
Applying machine learning algorithms for assessing the transmission quality in optical networks is associated with substantial challenges. Datasets that could provide training instances tend to be small and heavily imbalanced. This requires applying imbalance compensation techniques when using binary classification algorithms, but it also makes one-class classification, learning only from instances of the majority class, a noteworthy alternative. This work examines the utility of both these approaches using a real dataset from a Dense Wavelength Division Multiplexing network operator, gathered through the network control plane. The dataset is indeed of a very small size and contains very few examples of “bad” paths that do not deliver the required level of transmission quality. Two binary classification algorithms, random forest and extreme gradient boosting, are used in combination with two imbalance handling methods, instance weighting and synthetic minority class instance generation. Their predictive performance is compared with that of four one-class classification algorithms: one-class SVM, one-class naive Bayes classifier, isolation forest, and maximum entropy modeling. The one-class approach turns out to be clearly superior, particularly with respect to the level of classification precision, making it possible to obtain more practically useful models.
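
One of the two imbalance handling methods mentioned here, instance weighting, can be illustrated with a short sketch. It assumes the common "balanced" heuristic, where each class gets weight n / (k * n_c) for n instances, k classes, and n_c instances in class c; the abstract does not state which weighting scheme the authors used, and the label counts below are made up.

```python
from collections import Counter


def balanced_instance_weights(labels):
    """Return one weight per instance, inversely proportional to class size.

    Assumption: weight for class c is n / (k * n_c). This is a widely used
    heuristic, not necessarily the scheme used in the paper.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    class_weight = {c: n / (k * m) for c, m in counts.items()}
    return [class_weight[y] for y in labels]


# Hypothetical tiny dataset: 18 "good" paths and 2 "bad" paths.
labels = ["good"] * 18 + ["bad"] * 2
weights = balanced_instance_weights(labels)

# Total weight carried by each class after weighting.
good_total = sum(w for w, y in zip(weights, labels) if y == "good")
bad_total = sum(w for w, y in zip(weights, labels) if y == "bad")
```

With these weights both classes contribute the same total weight to the training loss, so a binary classifier is no longer rewarded for ignoring the rare "bad" paths.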

11.
    
Trend anomaly detection is the practice of comparing and analyzing current and historical data trends to detect real-time abnormalities in online industrial data-streams. It has the advantages of tracking a concept drift automatically and predicting trend changes in the shortest time, making it important both for algorithmic research and industry. However, industrial data streams contain considerable noise that interferes with detecting weak anomalies. In this paper, the fastest detection algorithm “sliding nesting” is adopted. It is based on calculating the data weight in each window by applying variable weights, while maintaining the method of trend-effective integration accumulation. The new algorithm changes the traditional calculation method of the trend anomaly detection score, which calculates the score in a short window. This algorithm, SNWFD–DS, can detect weak trend abnormalities in the presence of noise interference. Compared with other methods, it has significant advantages. An on-site oil drilling data test shows that this method can significantly reduce delays compared with other methods and can improve the detection accuracy of weak trend anomalies under noise interference.

12.
Background: Machine learning (ML) techniques have been implemented in numerous applications, including health-care, security, entertainment, and sports. In this article, we present how ML can be used for building a professional football team and planning player transfers. Methods: In this research, we defined numerous parameters for player assessment, and three definitions of a successful transfer. We used the Random Forest, Naive Bayes, and AdaBoost algorithms to predict player transfer success. We used realistic, publicly available data to train and test the classifiers. Results: In the article, we present numerous experiments; they differ in the weights of parameters, the successful transfer definitions, and other factors. We report promising results (accuracy = 0.82, precision = 0.84, recall = 0.82, and F1-score = 0.83). Conclusion: The presented research shows that machine learning can be helpful in professional football team building. The proposed algorithm will be developed further in the future and may be implemented as a professional tool for football talent scouts.
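
The reported metrics can be cross-checked against each other, since F1 is the harmonic mean of precision and recall. The sketch below assumes the three figures refer to the same evaluation run and the same averaging scheme, which the abstract does not state; `f1_score` is a hypothetical helper written for this check, not a library call.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract.
f1 = f1_score(0.84, 0.82)
f1_rounded = round(f1, 2)
```

Here `f1_rounded` comes out as 0.83, which agrees with the F1-score quoted in the abstract.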

13.
    
All features of any data type are universally equipped with categorical nature revealed through histograms. A contingency table framed by two histograms affords directional and mutual associations based on rescaled conditional Shannon entropies for any feature-pair. The heatmap of the mutual association matrix of all features becomes a roadmap showing which features are highly associative with which features. We develop our data analysis paradigm called categorical exploratory data analysis (CEDA) with this heatmap as a foundation. CEDA is demonstrated to provide new resolutions for two topics: multiclass classification (MCC) with one single categorical response variable and response manifold analytics (RMA) with multiple response variables. We compute visible and explainable information contents with multiscale and heterogeneous deterministic and stochastic structures in both topics. MCC involves all feature-group specific mixing geometries of labeled high-dimensional point-clouds. Upon each identified feature-group, we devise an indirect distance measure, a robust label embedding tree (LET), and a series of tree-based binary competitions to discover and present asymmetric mixing geometries. Then, a chain of complementary feature-groups offers a collection of mixing geometric pattern-categories with multiple perspective views. RMA studies a system’s regulating principles via multiple dimensional manifolds jointly constituted by targeted multiple response features and selected major covariate features. This manifold is marked with categorical localities reflecting major effects. Diverse minor effects are checked and identified across all localities for heterogeneity. Both MCC and RMA information contents are computed for data’s information content with predictive inferences as by-products. We illustrate CEDA developments via Iris data and demonstrate its applications on data taken from the PITCHf/x database.
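
The directional association described here, a conditional Shannon entropy computed from a contingency table and then rescaled, can be sketched as follows. The rescaling used below, H(Y|X) / H(Y), is an assumption chosen for illustration; the abstract does not spell out the exact normalisation, and the function names and tables are hypothetical.

```python
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a list of non-negative counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def rescaled_conditional_entropy(table):
    """Return H(Y|X) / H(Y) for a contingency table.

    table[i][j] is the number of observations with X in bin i and Y in bin j.
    A value near 1 means X tells us little about Y; near 0 means X almost
    determines Y. The ratio form is an assumption, not the paper's formula.
    """
    total = sum(sum(row) for row in table)
    h_y = entropy([sum(col) for col in zip(*table)])
    if h_y == 0:
        raise ValueError("Y takes a single value: the ratio is undefined")
    h_y_given_x = sum(
        sum(row) / total * entropy(row) for row in table if sum(row) > 0
    )
    return h_y_given_x / h_y


# X and Y independent: knowing X leaves the uncertainty about Y unchanged.
independent = rescaled_conditional_entropy([[10, 10], [10, 10]])

# Y fully determined by X: no uncertainty about Y remains.
determined = rescaled_conditional_entropy([[20, 0], [0, 20]])
```

Swapping the roles of rows and columns gives the association in the other direction, which is why the measure is directional; a symmetric combination of the two would give a mutual association of the kind shown in the heatmap.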

14.
    
Quantum Machine Learning (QML) has not yet demonstrated extensively and clearly its advantages compared to the classical machine learning approach. So far, there are only specific cases where some quantum-inspired techniques have achieved small incremental advantages, and a few experimental cases in hybrid quantum computing are promising, considering a mid-term future (not taking into account the achievements purely associated with optimization using quantum-classical algorithms). The current quantum computers are noisy and have few qubits to test, making it difficult to demonstrate the current and potential quantum advantage of QML methods. This study shows that we can achieve better classical encoding and performance of quantum classifiers by using Linear Discriminant Analysis (LDA) during the data preprocessing step. As a result, the Variational Quantum Algorithm (VQA) shows a gain of performance in balanced accuracy with the LDA technique and outperforms baseline classical classifiers.

15.
    
Data from smart grids are challenging to analyze due to their very large size, high dimensionality, skewness, sparsity, and number of seasonal fluctuations, including daily and weekly effects. With the data arriving in sequential form, the underlying distribution is subject to change over time. Time series data streams have their own specifics in terms of data processing and analysis: usually it is not possible to process the whole data in memory, because large data volumes are generated quickly, so the processing and analysis should be done incrementally using sliding windows. Although many clustering techniques have been proposed for grouping the observations of a single data stream, only a few of them focus on splitting whole data streams into clusters. In this article we aim to explore individual characteristics of electricity usage and recommend the most suitable tariff to the customer so they can benefit from lower prices. This work investigates various algorithms (and their improvements) that allow us to form the clusters, in real time, based on smart meter data.

16.
    
Machine learning algorithms learn a desired input-output relation from examples in order to interpret new inputs. This is important for tasks such as image and speech recognition or strategy optimisation, with growing applications in the IT industry. In the last couple of years, researchers have investigated whether quantum computing can help to improve classical machine learning algorithms. Ideas range from running computationally costly algorithms or their subroutines efficiently on a quantum computer to the translation of stochastic methods into the language of quantum theory. This contribution gives a systematic overview of the emerging field of quantum machine learning. It presents the approaches as well as technical details in an accessible way, and discusses the potential of a future theory of quantum learning.

17.
Song Yu, Jiao Pu, Li Gang, Applied Acoustics, 2015, 23(12): 79-79
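As industry and society become more information-driven, the degree of automation in every field keeps rising, and large-scale and ultra-large-scale massive data have emerged, exhibiting the characteristics of big data. While these massive data drive the development of industries, they also bring a highly challenging problem: data usability. To screen out useless information from massive data and mine valuable information that benefits the relevant fields, data analysis is required. Data preprocessing techniques can greatly reduce the processing load during data analysis and improve its efficiency, and attribute reduction is an important step in data preprocessing. Based on an analysis of the attribute characteristics of big data, this paper systematically analyzes several current mainstream attribute reduction algorithms, examines the performance of each class of algorithm, and outlines directions for future research on big data preprocessing.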

18.
    
Cybercriminals use malicious URLs as distribution channels to propagate malware over the web. Attackers exploit vulnerabilities in browsers to install malware and gain remote access to the victim’s computer. The purpose of most malware is to gain access to a network, exfiltrate sensitive information, and secretly monitor targeted computer systems. In this paper, a data mining approach known as classification based on association (CBA) is presented for detecting malicious URLs using URL and webpage content features. The CBA algorithm uses a training dataset of URLs as historical data to discover association rules and build an accurate classifier. The experimental results show that CBA gives comparable performance against benchmark classification algorithms, achieving 95.8% accuracy with low false positive and negative rates.

19.
    
Privacy-preserving techniques allow private information to be used without compromising privacy. Most encryption algorithms, such as the Advanced Encryption Standard (AES) algorithm, cannot perform computational operations on encrypted data without first applying the decryption process. Homomorphic encryption algorithms provide innovative solutions to support computations on encrypted data while preserving the content of private information. However, these algorithms have some limitations, such as computational cost as well as the need for modifications for each case study. In this paper, we present a comprehensive overview of various homomorphic encryption tools for Big Data analysis and their applications. We also discuss a security framework for Big Data analysis while preserving privacy using homomorphic encryption algorithms. We highlight the fundamental features and tradeoffs that should be considered when choosing the right approach for Big Data applications in practice. We then present a comparison of popular current homomorphic encryption tools with respect to these identified characteristics. We examine the implementation results of various homomorphic encryption toolkits and compare their performances. Finally, we highlight some important issues and research opportunities. We aim to anticipate how homomorphic encryption technology will be useful for secure Big Data processing, especially to improve the utility and performance of privacy-preserving machine learning.
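
What "computation on encrypted data" means can be shown with a toy example. The sketch below uses the Paillier scheme, which is additively homomorphic; it is chosen here purely for illustration and is not one of the toolkits compared in the paper. The primes are tiny and hard-coded, so this is insecure by construction, and all function names are hypothetical.

```python
from math import gcd
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key pair. Assumption: tiny fixed primes, for illustration only."""
    n = p * q
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid because the generator is n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n with generator n + 1."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


n, private_key = keygen()

# Multiplying two ciphertexts adds the underlying plaintexts.
c_sum = (encrypt(n, 1234) * encrypt(n, 4321)) % (n * n)
total = decrypt(n, private_key, c_sum)
```

The party that multiplies the ciphertexts never sees 1234 or 4321, yet the key holder decrypts their sum. Fully homomorphic schemes extend this idea to both addition and multiplication, at the computational cost the abstract refers to.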

20.
    
Medical data includes clinical trials and clinical data such as patient-generated health data, laboratory results, medical imaging, and different signals coming from continuous health monitoring. Some commonly used data analysis techniques are text mining, big data analytics, and data mining. These techniques can be used for classification, clustering, and machine learning tasks. Machine learning could be described as an automatic learning process derived from concepts and knowledge without deliberate system coding. However, finding a suitable machine learning architecture for a specific task is still an open problem. In this work, we propose a machine learning model for the multi-class classification of medical data. This model is comprised of two components: a restricted Boltzmann machine and a classifier system. It uses a discriminant pruning method to select the most salient neurons in the hidden layer of the neural network, which implicitly leads to a selection of features for the input patterns that feed the classifier system. This study aims to investigate whether information-entropy measures may provide evidence for guiding discriminative pruning in a neural network for medical data processing, particularly cancer research, by using three cancer databases: Breast Cancer, Cervical Cancer, and Primary Tumour. Our proposal aimed to investigate the post-training neuronal pruning methodology using dissimilarity measures inspired by the information-entropy theory; the results obtained after pruning the neural network were favourable. Specifically, for the Breast Cancer dataset, the reported results indicate a 10.68% error rate, while our error rates range from 10% to 15%; for the Cervical Cancer dataset, the reported best error rate is 31%, while our proposal error rates are in the range of 4% to 6%; lastly, for the Primary Tumour dataset, the reported error rate is 20.35%, and our best error rate is 31%.
