Similar Documents
20 similar documents found (search time: 15 ms)
1.
Text classification is a fundamental research direction that aims to assign tags to text units. Recently, graph neural networks (GNNs) have shown excellent properties in textual information processing, and pre-trained language models have also achieved promising results in many tasks. However, many text processing methods either cannot model the structure of a single text unit or ignore its semantic features. To address these problems and make full use of both the structural and the semantic information of text, we propose a Bert-Enhanced text Graph Neural Network model (BEGNN). For each text, we construct a separate text graph from the co-occurrence relationships of its words and use a GNN to extract text features; in addition, we employ BERT to extract semantic features. The former captures structural information, while the latter focuses on modeling semantic information. Finally, the two features, which differ in granularity, interact and are aggregated to obtain a more effective representation. Experiments on standard datasets demonstrate the effectiveness of BEGNN.
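The abstract does not state the window size or edge weighting used for BEGNN's per-document graph. As a minimal sketch under that assumption, the Python below builds one graph per text: nodes are the unique words, and an undirected edge links two different words that co-occur inside a sliding window (assumed here to be 3 tokens), weighted by how many windows they share. The function name `build_cooccurrence_graph` is hypothetical.

```python
from collections import Counter


def build_cooccurrence_graph(tokens, window=3):
    """Return (nodes, edge_weights) for a single document.

    Nodes are the sorted unique tokens. Edge keys are sorted word pairs;
    each value counts the sliding windows in which the two words co-occur.
    """
    nodes = sorted(set(tokens))
    edges = Counter()
    # If the text is shorter than the window, use a single window over all of it.
    for start in range(max(1, len(tokens) - window + 1)):
        span = tokens[start:start + window]
        for i in range(len(span)):
            for j in range(i + 1, len(span)):
                if span[i] != span[j]:  # no self-loops
                    edges[tuple(sorted((span[i], span[j])))] += 1
    return nodes, dict(edges)


nodes, edges = build_cooccurrence_graph("graph neural networks model text graph".split())
print(nodes)
print(edges)
```

Because each document gets its own graph, a GNN can then read this small graph (for example as an adjacency matrix over `nodes`) without any corpus-level vocabulary graph.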

2.
This study constructs a comprehensive index to determine the optimal number of topics in the LDA topic model. Based on the requirements for selecting the number of topics, a comprehensive judgment index combining perplexity, isolation, stability, and coincidence is constructed. This method offers four advantages in selecting the optimal number of topics: (1) good predictive ability, (2) high isolation between topics, (3) no duplicate topics, and (4) repeatability. First, we compare the proposed method with existing methods on three general datasets; the results show that it selects the number of topics better. Then, we collect the patent policies of the provinces and cities of China (excluding Hong Kong, Macao, and Taiwan) as a dataset and show that, with the proposed topic number selection method, these patent policies can be classified well.
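Only one component of the combined index, perplexity, has a standard definition; the abstract does not give formulas for isolation, stability, or coincidence, so they are not sketched here. Assuming fitted LDA parameters `theta[d][k]` (topic proportions of document d) and `phi[k][w]` (word distribution of topic k), perplexity on a held-out corpus is exp(-(sum of log p(w|d)) / N) with p(w|d) = sum_k theta[d][k] * phi[k][w]. The function name is hypothetical.

```python
import math


def lda_perplexity(docs, theta, phi):
    """Held-out perplexity of an LDA model.

    docs:  list of documents, each a list of integer word ids
    theta: theta[d][k], topic proportions for document d
    phi:   phi[k][w], probability of word w under topic k
    """
    log_lik = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # Mixture probability of word w in document d.
            p_w = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += math.log(p_w)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)
```

As a sanity check, a model that assigns uniform probability 1/V to every word has perplexity exactly V; lower values mean the model predicts the held-out words better.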

3.
Text Segmentation in Color Images with Complex Backgrounds   Total citations: 6, self-citations: 0, citations by others: 6
Hu Xiaofeng, Zhou Yong, Ye Qingtai. Optical Technique (《光学技术》), 2006, 32(1): 141-143
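A color text segmentation method that combines texture and connected-component features is proposed. Candidate text regions are first coarsely segmented using the edge texture features of text. A color histogram is computed within each region, and two-class color clustering is performed. The geometric features of the text connected components are analyzed to filter out non-text components. The vertical projection of the pixels in the text connected components is computed to estimate character width and inter-character spacing and to judge whether the components are arranged as text; this step verifies the coarsely segmented regions and determines the bounding boxes of the text regions. Text segmentation experiments on 100 images captured in natural scenes demonstrate the effectiveness of the proposed text localization and segmentation method.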
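The verification step can be illustrated with a small sketch. Assuming the text region has already been binarized (1 = text pixel), the column sums form the vertical projection; runs of non-empty columns give character widths and runs of empty columns give inter-character gaps. The uniformity test and its 50% tolerance are hypothetical stand-ins, since the abstract does not say how the paper decides that components "are arranged as text".

```python
def estimate_width_and_gap(binary):
    """Estimate character widths and inter-character gaps (in pixels) from the
    vertical projection of a binary region given as a list of rows."""
    profile = [sum(col) for col in zip(*binary)]
    ink_cols = [i for i, v in enumerate(profile) if v > 0]
    if not ink_cols:
        return [], []
    # Drop empty margins so that they are not counted as gaps.
    profile = profile[ink_cols[0]:ink_cols[-1] + 1]
    widths, gaps = [], []
    run_is_ink, run_len = True, 0
    for v in profile:
        is_ink = v > 0
        if is_ink == run_is_ink:
            run_len += 1
        else:
            (widths if run_is_ink else gaps).append(run_len)
            run_is_ink, run_len = is_ink, 1
    widths.append(run_len)  # the trimmed profile always ends with an ink run
    return widths, gaps


def looks_like_text_line(widths, gaps, tol=0.5):
    """Hypothetical check: at least two characters, with widths and gaps
    each within +/- tol of their mean."""
    if len(widths) < 2:
        return False

    def uniform(xs):
        mean = sum(xs) / len(xs)
        return all(abs(x - mean) <= tol * mean for x in xs)

    return uniform(widths) and (not gaps or uniform(gaps))
```

A single wide blob yields one width and no gaps, so it is rejected, whereas several similar-width strokes separated by similar gaps are accepted as a text row.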

4.
Xu Yanfang, Liu Haoxue, Huang Min, Song Yuehong. Acta Optica Sinica (《光学学报》), 2012, 32(12): 1233001
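The correlation between the objective line attributes defined in the ISO 13660 standard for digital hard-copy imaging and the perceived sharpness of the text formed by those lines was studied. Subjective assessments of perceived sharpness, together with measurements of the corresponding single-line objective quality attributes, were carried out on simple-stroke Hei (黑体) Chinese characters and Times New Roman Latin characters at 6.5 pt and 10.5 pt, and on equal-width light/dark stripe patterns made of lines of equivalent width; a correlation model between the two was then established. The results show that line contrast, raggedness, relative blurriness, and line width are linearly correlated with the perceived sharpness of text. Perceived sharpness correlates positively with line contrast and negatively with raggedness and relative blurriness. For characters with few strokes, or with many strokes but large stroke spacing, perceived sharpness correlates positively with line width; otherwise it correlates negatively. In addition, the weights of the individual line attributes on perceived sharpness differ from one another by at least a factor of two.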

5.
At present, short text classification is a hot topic in natural language processing. Because short texts are sparse and irregular, classifying them still faces great challenges. In this paper, we propose a new classification model that addresses short text representation, global feature extraction, and local feature extraction. We use convolutional networks to extract shallow features from the vectorized short text and introduce a multi-level semantic extraction framework that uses a BiLSTM as the encoding layer and an attention mechanism with normalization as the interaction layer. We then concatenate the convolutional feature vector with the output of the semantic framework; after several rounds of feature integration, the framework improves the quality of the feature representation. Combined with a capsule network, we obtain high-level local information through dynamic routing and then squash it. In addition, we explore the optimal depth of semantic feature extraction for short text within the multi-level semantic framework. Experiments on four benchmark datasets show that our model provides comparable results, with accuracies of 93.8%, 91.94%, 82.81%, and 98.43% on SUBJ, TREC, MR, and ProcCons, respectively, which indicates that the model substantially improves classification accuracy and robustness.
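The capsule stage ("dynamic routing and then squash") can be sketched with the commonly used routing-by-agreement formulation. This is not the authors' code: the input `u_hat[i][j]` (prediction vector from input capsule i for output capsule j), the three routing iterations, and the toy data are all illustrative assumptions. Squashing rescales a vector s to length |s|^2 / (1 + |s|^2) while keeping its direction, so every output capsule has length below 1.

```python
import math


def squash(s):
    """Capsule squashing: v = (|s|^2 / (1 + |s|^2)) * s / |s|."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0 for _ in s]
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def dynamic_routing(u_hat, iterations=3):
    """Routing-by-agreement over prediction vectors u_hat[i][j] (lists of floats)."""
    n_in, n_out = len(u_hat), len(u_hat[0])
    dim = len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits
    v = []
    for _ in range(iterations):
        # Coupling coefficients: softmax of each input capsule's logits over outputs.
        c = []
        for i in range(n_in):
            m = max(b[i])
            exps = [math.exp(x - m) for x in b[i]]
            total = sum(exps)
            c.append([e / total for e in exps])
        # Weighted sum of predictions for each output capsule, then squash.
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s_j))
        # Increase logits where a prediction agrees with the output capsule.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v
```

The agreement update raises the coupling between an input capsule and the outputs its predictions agree with, which is what lets the capsule layer group local features without pooling.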

6.
The Variational AutoEncoder (VAE) has made significant progress in text generation, but it has focused on short texts (typically a single sentence). Long texts consist of multiple sentences that are related to one another, particularly through the latent variables that control the generation of each sentence. The relationships between these latent variables help in generating coherent, logically connected long texts, yet very few studies have addressed them. We propose a method that combines a Transformer-based Hierarchical Variational AutoEncoder with a Hidden Markov Model (HT-HVAE) to learn multiple hierarchical latent variables and their relationships, thereby improving long text generation. We use a hierarchical Transformer encoder to encode long texts in order to obtain better hierarchical information. The generation network of HT-HVAE uses an HMM to learn the relationships between the latent variables. We also propose a method for calculating perplexity for the multiple hierarchical latent variable structure. The experimental results show that our model is more effective on datasets with strong logical structure, alleviates the notorious posterior collapse problem, and generates more coherent and logically connected long text.

7.
Online social media provide massive open-ended platforms on which users of a wide variety of backgrounds, interests, and beliefs interact and debate, facilitating countless discussions across a myriad of subjects. With numerous unique voices being added to the ever-growing information stream, it is essential to consider how the conversations that result from a social media post represent the post itself. We hypothesize that users' biases and predispositions cause them to react to different topics in different ways, not necessarily as the sender intended. In this paper, we introduce a set of unique features that capture patterns of discourse, allowing us to empirically explore the relationship between a topic and the conversations it induces. Using “microscopic” trends to describe “macroscopic” phenomena, we establish a paradigm for analyzing information dissemination through the user reactions that arise from a topic, eliminating the need to analyze the text of the discussions themselves. Using a Reddit dataset, we find that our features not only enable classifiers to accurately distinguish between content genres, but can also identify subtler semantic differences in content under a single topic and isolate outliers whose subject matter differs substantially from the norm.

8.
Software security is a very important concern for software development organizations that wish to provide high-quality, dependable software to their consumers. A crucial part of software security is the early detection of software vulnerabilities. Vulnerability prediction is a mechanism that facilitates the identification (and, in turn, the mitigation) of vulnerabilities early in the software development cycle. The scientific community has recently focused considerable attention on developing deep learning models that use text mining techniques to predict the existence of vulnerabilities in software components. However, other studies examine whether statically extracted software metrics can lead to adequate Vulnerability Prediction Models. In this paper, both software metrics-based and text mining-based Vulnerability Prediction Models are constructed and compared. A combination of software metrics and text tokens in deep learning models is also examined, to investigate whether a combined model can lead to more accurate vulnerability prediction. For this study, a vulnerability dataset containing vulnerabilities from real-world software products is utilized and extended. The results indicate that text mining-based models outperform software metrics-based models with respect to F2-score, whereas enriching the text mining-based models with software metrics was not found to add any value to their predictive performance.
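The comparison above uses the F2-score, the F-beta measure with beta = 2, which weights recall more heavily than precision. This suits vulnerability prediction, where a missed vulnerable component usually costs more than a false alarm. A minimal sketch from confusion-matrix counts, using F_beta = (1 + beta^2) TP / ((1 + beta^2) TP + beta^2 FN + FP). The function name and the counts in the example are illustrative.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from true positives, false positives and false negatives."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Precision 8/12 and recall 8/10: F2 rewards the higher recall more than F1 does.
print(f_beta(8, 4, 2))            # F2 = 40/52
print(f_beta(8, 4, 2, beta=1.0))  # F1 = 16/22
```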

9.
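Fourier transform infrared spectroscopy (FTIR) combined with principal component analysis (PCA) was used to analyze the spectral characteristics of metastatic lymph nodes in rectal cancer, and linear discriminant analysis was used to distinguish metastatic from non-metastatic lymph nodes. FTIR spectra of 80 metastatic and 80 non-metastatic lymph nodes were acquired, peak intensities were calculated, and PCA was performed. In the 4000-1700 cm-1 range, principal component 1 (PC1) corresponded to 3260 cm-1 and PC2 to 1740 cm-1; in the 1700-1000 cm-1 range, PC1 corresponded to 1640 cm-1 and PC2 to 1080 cm-1. t-tests on the relative peak intensity ratios (I/I1460) at 3260, 1740, 1640, and 1080 cm-1 and on the wavenumbers of the 1080 and 1300 cm-1 bands showed statistically significant differences between benign and malignant nodes (p<0.05). This indicates increased protein content, protein formation, and amino acids in metastatic lymph nodes; the marked decrease in lipid content is associated with reduced lipid content resulting from anaerobic glycolysis in cancer tissue. PCA cluster analysis of the relative peak intensity ratios (I1080/I1460, I1640/I1460, I3260/I1460, I1740/I1460; n=160) distinguished benign from malignant lymph nodes: benign nodes clustered in the first and fourth quadrants and malignant nodes in the second and third quadrants. Linear discriminant analysis (LDA) was then applied to the relative peak intensity ratios and the 1080 and 1300 cm-1 features; with 25 lymph nodes as the validation set, the PCA/LDA model achieved a sensitivity of 87.5% and a specificity of 88.5%. The results indicate that FTIR spectroscopy could become a simple method for rapid, intraoperative, in situ, in vivo diagnosis of lymph node metastasis in rectal cancer.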
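Two quantities in this abstract are simple to compute. The first is the relative peak intensity ratio, which normalizes each band by the 1460 cm-1 band. The second is the pair of validation metrics: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch below assumes peak intensities are already extracted into a dict keyed by wavenumber. The function names and the toy labels (1 = metastatic) are hypothetical and are not the study's data.

```python
def relative_peak_ratios(intensities, reference=1460):
    """Divide every peak intensity by the reference peak (wavenumber in cm-1)."""
    ref = intensities[reference]
    return {wn: value / ref for wn, value in intensities.items() if wn != reference}


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (TP / (TP + FN), TN / (TN + FP)) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)
```

Normalizing by a reference band removes overall intensity differences between samples, such as thickness or focus, so the ratios reflect composition rather than signal strength.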

10.
Modeling and Model Reference Adaptive Control of a Fast Steering Mirror   Total citations: 1, self-citations: 0, citations by others: 1
Liu Min. Optical Technique (《光学技术》), 2008, 34(1): 108-112
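A fourth-order model of the fast steering mirror was established by experimental modeling and reduced to a second-order model using the Routh reduction method; numerical simulation verified that the simplified model captures the main behavior of the original system well. The proposed modeling method is simple and effective for accurately modeling fast steering mirrors and provides a mathematical basis for applying modern control theory. Based on the second-order model, a model reference adaptive controller based on Lyapunov stability theory was designed, and computer simulations of the complete system were carried out. The simulation results show that the proposed Lyapunov-based model reference adaptive controller for the fast steering mirror reduces the steady-state error of the system and increases its response speed. The results offer a new approach to the fast control of steering mirrors.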
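To illustrate the Lyapunov-based MRAC idea, the sketch below uses a first-order plant for brevity. This is a simplification, not the paper's reduced second-order mirror model, and all parameter values are hypothetical. The plant y' = -a*y + b*u has parameters unknown to the controller, apart from the sign of b. The reference model is ym' = -am*ym + bm*r, and the control law is u = th_r*r - th_y*y. With tracking error e = y - ym and Lyapunov function V = e^2/2 + |b|/(2*gamma) * (squared parameter errors), the adaptation laws th_r' = -gamma*sign(b)*e*r and th_y' = gamma*sign(b)*e*y give V' = -am*e^2 <= 0.

```python
import math


def simulate_mrac(a=1.0, b=2.0, am=4.0, bm=4.0, gamma=2.0, dt=1e-3, t_end=60.0):
    """Forward-Euler simulation of Lyapunov-based MRAC on a first-order plant.

    Returns (list of |e| per step, final th_r, final th_y).
    """
    sgn_b = 1.0 if b > 0 else -1.0
    y = ym = th_r = th_y = 0.0
    errors = []
    for k in range(int(round(t_end / dt))):
        t = k * dt
        r = 1.0 if math.sin(0.5 * t) >= 0.0 else -1.0  # square-wave reference
        u = th_r * r - th_y * y                          # adaptive control law
        e = y - ym                                       # tracking error
        errors.append(abs(e))
        dy = -a * y + b * u                              # plant
        dym = -am * ym + bm * r                          # reference model
        dth_r = -gamma * sgn_b * e * r                   # Lyapunov adaptation laws
        dth_y = gamma * sgn_b * e * y
        y += dt * dy
        ym += dt * dym
        th_r += dt * dth_r
        th_y += dt * dth_y
    return errors, th_r, th_y


errors, th_r, th_y = simulate_mrac()
# With these values the ideal gains are th_r* = bm/b = 2 and th_y* = (am - a)/b = 1.5.
```

The square-wave reference is chosen because it keeps exciting the system, which helps the adapted gains approach their ideal values rather than merely driving the error down.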

11.
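Mine water inrush is one of the major factors affecting safe production in mines. When an inrush occurs, quickly and accurately identifying the type of water source is a key step in controlling the hazard and ensuring safe production, so a model that can rapidly identify the source of mine inrush water is of great importance. Hydrochemical analysis is the most widely used of the traditional identification methods: parameters such as pH, ion concentrations, and electrical conductivity are measured and used to build a model that identifies the type of inrush water source. Because this traditional approach is time-consuming and has low identification accuracy, and because laser-induced fluorescence (LIF) offers fast analysis and high sensitivity, a new method is proposed that identifies mine inrush water sources from LIF spectra using the adaptive boosting (AdaBoost) algorithm with linear discriminant analysis (LDA) as the weak classifier. Nine types of water samples (50 samples each) were used in the experiments: goaf water and limestone water from a mine in the Huainan area, and seven mixtures of goaf water and limestone water in different proportions. A 405 nm laser was directed into the water under test and the fluorescence spectra were collected. Of the 450 spectra acquired, 360 (40 per sample type) were used as the training set and the remaining 90 as the test set. Three algorithms were used to model the classification of the LIF spectra, and their results were compared. First, a decision tree was used to classify the spectra; it performed best on the test set with 8 nodes, reaching an accuracy of 91.11%. Next, to address the shortcomings of the decision tree, AdaBoost with a decision tree as the weak classifier was applied; with 9-node decision trees as weak classifiers, the classification accuracy on the test set was 97.78%. Finally, to address the limited generalization of decision-tree-based AdaBoost and to obtain better classification, AdaBoost with LDA as the weak classifier was proposed; with 150 iterations, it classified the water sample spectra with 100% accuracy. The results show that the ensemble learning algorithms classify the water sample spectra better than the conventional classifier: compared with a 9-node decision tree, AdaBoost with 9-node decision trees as weak learners raised the test-set accuracy from 88.89% to 97.78% and the training-set accuracy from 99.72% to 100%. Compared with AdaBoost using decision trees as weak classifiers, AdaBoost using LDA as the weak classifier raised the test-set accuracy from 97.78% to 100% and reached 100% on the training set, giving better identification and better generalization. The experimental results demonstrate that the AdaBoost-LDA algorithm is feasible and effective for pattern classification of LIF spectra for identifying mine inrush water sources and for early warning.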
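The abstract names AdaBoost with LDA as the weak classifier, but it does not say which multi-class AdaBoost variant is used or how sample weights enter LDA. The NumPy sketch below is therefore one plausible reading, not the paper's implementation. It uses SAMME (multi-class AdaBoost, with classifier weight log((1 - err)/err) + log(K - 1)) with reweighting. Each weak learner is an LDA fitted with weighted class means, weighted priors and a weighted pooled covariance (plus a small ridge term). All function names are hypothetical.

```python
import numpy as np


def fit_weighted_lda(X, y, w, classes, reg=1e-6):
    """LDA with sample weights: returns (coef, intercept) of linear discriminants."""
    d = X.shape[1]
    means, priors = [], []
    cov = np.zeros((d, d))
    for k in classes:
        mask = y == k
        wk = w[mask]
        mu = (wk[:, None] * X[mask]).sum(axis=0) / wk.sum()
        diff = X[mask] - mu
        cov += (wk[:, None] * diff).T @ diff  # weighted within-class scatter
        means.append(mu)
        priors.append(wk.sum())
    cov = cov / w.sum() + reg * np.eye(d)
    inv = np.linalg.inv(cov)
    means = np.array(means)
    priors = np.array(priors) / w.sum()
    coef = means @ inv  # row k: mu_k^T Sigma^-1
    intercept = -0.5 * np.sum(coef * means, axis=1) + np.log(priors)
    return coef, intercept


def predict_lda(model, X, classes):
    coef, intercept = model
    return classes[np.argmax(X @ coef.T + intercept, axis=1)]


def adaboost_lda(X, y, n_rounds=150):
    """SAMME boosting with weighted-LDA weak learners."""
    classes = np.unique(y)
    K, n = len(classes), len(y)
    w = np.full(n, 1.0 / n)
    learners, alphas = [], []
    for _ in range(n_rounds):
        model = fit_weighted_lda(X, y, w, classes)
        miss = predict_lda(model, X, classes) != y
        err = np.sum(w[miss]) / np.sum(w)
        if err == 0.0:  # perfect weak learner: keep it and stop
            learners.append(model)
            alphas.append(1.0)
            break
        if err >= 1.0 - 1.0 / K:  # no better than chance: stop
            break
        alpha = np.log((1.0 - err) / err) + np.log(K - 1.0)
        learners.append(model)
        alphas.append(alpha)
        w = w * np.exp(alpha * miss)  # up-weight misclassified samples
        w /= w.sum()
    return classes, learners, alphas


def predict_adaboost(ensemble, X):
    classes, learners, alphas = ensemble
    votes = np.zeros((X.shape[0], len(classes)))
    for model, alpha in zip(learners, alphas):
        idx = np.argmax(X @ model[0].T + model[1], axis=1)
        votes[np.arange(X.shape[0]), idx] += alpha
    return classes[np.argmax(votes, axis=1)]
```

For the setting in the abstract, X would hold the 360 training spectra (one row per spectrum) and y the nine water-source labels, with n_rounds=150. In practice, a dimensionality reduction step before LDA may be needed if the spectra have more channels than training samples.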

12.
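Research on Modeling and Testing Methods for the Electro-Optical System of a Certain Type of Laser Seeker   Total citations: 3, self-citations: 0, citations by others: 3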
Liu Min. Optical Technique (《光学技术》), 2008, 34(2): 189-193
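A mathematical model of the tracking loop of a laser seeker was established, and a simulation environment was designed according to actual conditions. The paper proposes a reliable, easy-to-implement simulation system scheme for static and dynamic testing of laser seekers, together with a simulation scheme for verifying the accuracy of the seeker's mathematical model under full-trajectory simulation conditions. The system has been successfully applied to the hardware-in-the-loop simulation of a certain type of laser-guided weapon.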

13.
Modern megavoltage x-ray radiotherapy with high spatial and temporal dose gradients puts high demands on the entire delivery system, including not just the linear accelerator and the multi-leaf collimator, but also the algorithms used for optimization and dose calculation and the detectors used for quality assurance and dose verification. In this context, traceable in-phantom dosimetry using a well-characterized point detector is often an important supplement to 2D quality assurance methods based on radiochromic film or detector arrays. In this study, an in-house-developed dosimetry system based on fiber-coupled plastic scintillator detectors was evaluated and compared with a Farmer-type ionization chamber and a small-volume ionization chamber. An important feature of scintillator detectors is that the sensitive volume of the detector can easily be scaled; five scintillator detectors with different scintillator lengths were therefore employed to quantify volume averaging effects by direct measurement. The dosimetric evaluation comprised several complex-shaped static fields as well as simplified dynamic deliveries using RapidArc, a volumetric-modulated arc therapy modality often used at the participating clinic. The static field experiments showed that the smallest scintillator detectors agreed best with the dose calculations while needing the smallest volume averaging corrections. For the total dose measured during RapidArc, all detectors agreed with the dose calculations within 1.1 ± 0.7% when positioned in regions of high, homogeneous dose. Larger differences were observed at high dose gradient and organ-at-risk locations, where differences between measured and calculated dose were as large as 8.0 ± 5.5%. The smallest differences were generally seen for the small-volume ionization chamber and the smallest scintillators.
The time-resolved RapidArc dose profiles revealed volume-dependent discrepancies between scintillator and ionization chamber responses, which confirmed that correction factors for ionization chambers in high temporal and spatial dose gradients are dominated by the volume averaging effect. The unique scalability of the scintillator volumes showed how such time-dependent volume averaging corrections could be quantified. The time-resolved measurements further supported the claim that small-volume, water-equivalent detectors are the most likely to accurately detect changes in dose delivery, although exact positioning of the detectors remains critical.
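The volume averaging effect can be illustrated with a simplified 1D model, which is not the study's method. Assume a Gaussian lateral dose profile and a detector with uniform response along its length. The correction factor is then the point dose at the detector center divided by the dose averaged over the detector length, and it grows as the detector gets longer. The profile width and detector lengths below are hypothetical.

```python
import math


def volume_averaging_factor(profile, length_mm, n=1001):
    """Point dose at the detector centre divided by the dose averaged over a
    detector of the given length (1D, uniform response assumed)."""
    xs = [-length_mm / 2 + length_mm * i / (n - 1) for i in range(n)]
    mean = sum(profile(x) for x in xs) / n
    return profile(0.0) / mean


def gaussian_profile(sigma_mm):
    """Hypothetical dose profile, normalized to 1 at x = 0."""
    return lambda x: math.exp(-0.5 * (x / sigma_mm) ** 2)


profile = gaussian_profile(3.0)
for length in (1.0, 2.0, 5.0, 10.0):
    print(length, round(volume_averaging_factor(profile, length), 4))
```

The same ratio, evaluated at each instant of a dynamic delivery, is one way to picture the time-dependent corrections mentioned above: as the gradient moves across the detector, the factor changes with it.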

14.
Trend anomaly detection is the practice of comparing and analyzing current and historical data trends to detect real-time abnormalities in online industrial data streams. It can track concept drift automatically and predict trend changes in the shortest time, which makes it important for both algorithmic research and industry. However, industrial data streams contain considerable noise that interferes with the detection of weak anomalies. In this paper, the fastest detection algorithm, “sliding nesting,” is adopted. It calculates the data weight in each window by applying variable weights, while retaining the method of trend-effective integration accumulation. The new algorithm changes the traditional method of calculating the trend anomaly detection score, which calculates the score in a short window. This algorithm, SNWFD–DS, can detect weak trend abnormalities in the presence of noise interference and has significant advantages over other methods. A test on on-site oil drilling data shows that the method significantly reduces detection delay compared with other methods and improves the detection accuracy of weak trend anomalies under noise interference.

15.
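The design criteria for low-speed blade profiles in low-speed simulation are introduced, and it is shown that the throat width of a blade profile is linearly related to its maximum thickness and to the position of maximum thickness. A maximum-thickness model is established to estimate the maximum thickness of low-speed blade profiles. To verify the applicability of this model, low-speed simulation designs were carried out for a set of typical CDA blade profiles. Numerical results show that the characteristics and the dimensionless blade-surface velocity distributions of the low-speed simulated profiles are in fairly good agreement, and that the suction-surface peak velocity used to judge the applicability of the maximum-thickness model...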

16.
Design of a Fuzzy PID Adaptive Controller for a Fast Steering Mirror   Total citations: 1, self-citations: 0, citations by others: 1
Liu Min. Optical Technique (《光学技术》), 2008, 34(2): 227-229
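A fast steering mirror system is introduced. Because of nonlinear and uncertain factors such as atmospheric disturbance, temperature variation, and environmental interference, it is difficult to meet the design specifications of such a variable-parameter system with classical control methods. Building on an in-depth study of fuzzy theory, an adaptive controller design that combines fuzzy inference with classical PID is proposed. The control parameters can be adjusted online, which addresses the problems of time-varying parameters and modeling uncertainty. Simulation results show that the fuzzy PID adaptive controller handles environmental disturbances such as atmospheric disturbance and temperature variation well and improves the dynamic response of the fast steering mirror. The results offer a new approach to the fast control of steering mirrors.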

17.
Electrostatic probes for measuring the boundary plasma in tokamaks are reviewed and presented. Several types of electrostatic probes are used to measure transport properties in JFT-2, the ion temperature and the magnetic surface in JFT-2M, and floating-potential fluctuations during strong additional heating in JT-60. The Langmuir probe, including the double probe, is applied to measure the spatial profile of the boundary plasma in JFT-2. The ion sensitive probe, the rotating cylindrical double probe, the asymmetric double probe, and the differential double probe are applied to measure the ion temperature and the magnetic surface in JFT-2M. The reciprocating Langmuir probe applied to JFT-2M observes potential and density fluctuations, and, as a development of this probe, a new type of probe is proposed for the quick diagnosis of hot core plasmas. The fluctuation observed in JT-60 is identified as the ion cyclotron instability of the hot plasma caused by the strong anisotropy of the ion distribution function. (© 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

18.
Electromigration reliability remains a major threat to microelectronic circuits. The microstructure of the thin-film conductors used in integrated circuits significantly affects electromigration lifetime. A wealth of knowledge on thin-film microstructure and electromigration in metallic interconnects has been acquired from relevant studies over the past few decades. However, the various techniques for measuring microstructure-related attributes of thin-film metallization are usually not presented in the context of electromigration, because these measurement techniques are important in their own right. On the other hand, aggressive scaling of interconnect line width down to the nanoscale regime poses new challenges to microstructure characterization techniques. This article connects these two aspects of electromigration study, i.e., the characterization of microstructure and the measurement techniques for the influential microstructural attributes, especially for Cu-based interconnects. The microstructure-related parameters and attributes and their impact on electromigration lifetime are discussed. Sample preparation and the various techniques for measuring microstructural attributes are presented in detail. This article describes the current state of the art in studying microstructure-dependent electromigration reliability.

19.
Raman spectroscopy is a vibrational spectroscopic technique that can be used to monitor the therapeutic efficacy of anticancer drugs during carcinogenesis in a non-invasive and label-free manner. The present study investigates, in the fingerprint region of 1800–500 cm−1 using an HE-785 Raman spectrometer, the biochemical changes induced by treatment with free silibinin (SIL) and its nanoparticulate form (SILNPs) against 7,12-dimethylbenz[a]anthracene (DMBA)-induced oral carcinogenesis. Raman spectra differed significantly between control and tumor tissues: tumor tissues showed increased intensities of the vibrational bands assigned to nucleic acids, phenylalanine, and tryptophan and a lower percentage of lipids compared with control tissues. Oral administration of free SIL and SILNPs significantly increased lipids and decreased the levels of tryptophan, phenylalanine, and nucleic acids. Overall, nanoparticulate SIL was found to have a more potent antitumor effect than free SIL in preventing tumor formation, and it also restored several Raman bands to the normal range in the buccal mucosa of hamsters during DMBA-induced oral carcinogenesis. In addition, the detailed secondary structure of proteins in the control and experimental groups is presented. Diagnostic algorithms based on principal component linear discriminant analysis (PC-LDA) achieved an overall sensitivity of 94–100% and a specificity of 76–100%. These results further demonstrate that Raman spectroscopy combined with PC-LDA diagnostic algorithms could be a valuable tool for rapid and sensitive detection of specific biomolecular changes in response to anticancer drugs. Copyright © 2015 John Wiley & Sons, Ltd.

20.
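Pathogen detection is essential for ensuring the safety of drinking water and food and for responding to public health emergencies. Current detection standards and methods are time-consuming, labor-intensive, and costly, and they cannot meet modern society's demand for timeliness, so there is an urgent need to develop simple, low-cost techniques for rapid pathogen detection. In recent years, with the rapid development of laser technology and photoelectric detection technology, laser spectroscopy, which can rapidly acquire microbial fingerprint information, has attracted wide attention from researchers; among these techniques, surface-enhanced Raman spectro...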


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417