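Improved Convolutional Neural Network Method for Measuring Source Code Similarity
Cite this article: XIE Chunli, LIN Jiangxu, LIU Xiaoyang, ZHANG Wenbin, HUANG Junwei. Improved convolutional neural network method for measuring source code similarity[J]. Applied Mathematics and Mechanics, 2019, 40(11): 1235-1245. DOI: 10.21656/1000-0887.400221
Authors: XIE Chunli, LIN Jiangxu, LIU Xiaoyang, ZHANG Wenbin, HUANG Junwei
Affiliation: School of Computer Science and Technology, Jiangsu Normal University, Xuzhou, Jiangsu 221116, China
Funding: National Natural Science Foundation of China (61773185; 61877030; 61502212); Qing Lan Project of Jiangsu Province Higher Education Institutions
Abstract: Source code similarity is the degree to which different code fragments are functionally similar, and measuring it is an important research problem in software engineering. Existing methods compute similarity from statistical features of code, drawn mainly from its text and its structure; their main drawback is that they cannot capture the semantic features of code. To address this, a source code similarity detection method is proposed that uses a convolutional neural network fusing statistical information (statistics information for code embedding convolutional neural networks, SICE-CNN). The method first represents source code with word embeddings to obtain word-embedding vectors; second, it builds a CNN training model that learns embedded representations of source code documents; finally, it computes the cosine similarity of each source code pair. Experiments show that, compared with ordinary word-embedding methods, the method improves performance to some extent and detects the semantic similarity of source code well.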

Keywords: deep learning; convolutional neural network; code similarity; word embedding
Received: 2019-07-22
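The three steps in the abstract (word embedding, CNN encoding of a code document, cosine similarity of a pair) can be sketched as follows. This is a minimal, untrained illustration under stated assumptions, not the authors' SICE-CNN: token vectors are deterministic pseudo-random stand-ins for trained word embeddings, the single convolution layer with tanh and max-over-time pooling is a generic text-CNN design assumed for illustration, the statistical-information fusion is omitted, and the names `tokenize`, `embed`, `TinyCodeCNN`, `cosine_similarity` and all hyperparameters are hypothetical.

```python
import math
import random
import re

# Hypothetical hyperparameters: the abstract does not report the real ones.
EMBED_DIM = 8    # length of each token (word) embedding vector
NUM_FILTERS = 4  # number of convolution filters = length of the document vector
WINDOW = 3       # each filter spans this many consecutive tokens


def tokenize(code: str) -> list[str]:
    """Split source code into identifiers/keywords, integer literals and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def embed(token: str) -> list[float]:
    """Step 1 (stand-in): map a token to a fixed pseudo-random vector.

    A real system would use trained word embeddings; seeding random.Random
    with the token string only keeps this demo deterministic.
    """
    rng = random.Random("embed:" + token)
    return [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]


class TinyCodeCNN:
    """Step 2 (stand-in): one convolution layer + max-over-time pooling.

    Weights are random and untrained (assumption for illustration); the
    statistical-information fusion of SICE-CNN is not modelled here.
    """

    def __init__(self, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.filters = [
            [rng.uniform(-0.5, 0.5) for _ in range(WINDOW * EMBED_DIM)]
            for _ in range(NUM_FILTERS)
        ]
        self.biases = [rng.uniform(-0.1, 0.1) for _ in range(NUM_FILTERS)]

    def encode(self, code: str) -> list[float]:
        vectors = [embed(t) for t in tokenize(code)]
        # Zero-pad very short inputs so that at least one window fits.
        while len(vectors) < WINDOW:
            vectors.append([0.0] * EMBED_DIM)
        doc_vector = []
        for weights, bias in zip(self.filters, self.biases):
            activations = []
            for i in range(len(vectors) - WINDOW + 1):
                # Concatenate WINDOW consecutive token vectors into one flat window.
                window = [x for vec in vectors[i:i + WINDOW] for x in vec]
                z = sum(w * x for w, x in zip(weights, window)) + bias
                activations.append(math.tanh(z))
            doc_vector.append(max(activations))  # max-over-time pooling
        return doc_vector


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Step 3: cosine of the angle between two document vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    model = TinyCodeCNN(seed=42)
    a = "int sum = 0; for (int i = 0; i < n; i++) sum += a[i];"
    b = "int total = 0; for (int k = 0; k < n; k++) total += a[k];"
    print(round(cosine_similarity(model.encode(a), model.encode(b)), 3))
```

Because the filters are random and untrained, the printed score is only a mechanical demonstration; whether a renamed-variable pair like `a` and `b` scores high depends entirely on training, which this sketch omits.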

A Source Code Similarity Approach Based on Improved Convolutional Neural Networks
Affiliation: School of Computer Science & Technology, Jiangsu Normal University, Xuzhou, Jiangsu 221116, P. R. China
Abstract: Source code similarity refers to the degree of functional similarity between different code fragments and is an important research problem in software engineering. Existing methods mainly extract textual and structural features from source code manually and calculate similarity from this statistical information, ignoring the semantic characteristics of code. To solve this problem, a source code similarity detection method based on a CNN that fuses statistical information (SICE-CNN) was proposed. First, the source code was represented through word embedding to obtain word-embedding vectors. Second, a CNN training model was constructed to learn embedded representations of source code documents. Finally, the cosine similarity of each source code pair was calculated. Experiments show that, compared with general word-embedding methods, the proposed method improves performance to some extent and detects the semantic similarity of source code well.
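Assuming the CNN follows the common single-layer text-CNN design (the abstract does not specify the architecture), the encoding and comparison steps can be written as below. The notation is introduced here for illustration and is not taken from the paper: $\mathbf{x}_i$ is the embedding of the $i$-th of $n$ tokens, $h$ the filter window, $\mathbf{w}_j, b_j$ the weights and bias of filter $j$ (of $m$), $f$ a nonlinearity such as $\tanh$, and $\oplus$ concatenation.

```latex
\begin{aligned}
&\mathbf{x}_i \in \mathbb{R}^{d}, \qquad
 \mathbf{x}_{i:i+h-1} = \mathbf{x}_i \oplus \mathbf{x}_{i+1} \oplus \cdots \oplus \mathbf{x}_{i+h-1} \in \mathbb{R}^{hd} \\
&c_i^{(j)} = f\!\left(\mathbf{w}_j \cdot \mathbf{x}_{i:i+h-1} + b_j\right), \qquad i = 1, \dots, n-h+1 \\
&\hat{c}_j = \max_{1 \le i \le n-h+1} c_i^{(j)}, \qquad
 \mathbf{v} = \left(\hat{c}_1, \dots, \hat{c}_m\right) \\
&\operatorname{sim}(A, B) = \frac{\mathbf{v}_A \cdot \mathbf{v}_B}{\lVert \mathbf{v}_A \rVert \, \lVert \mathbf{v}_B \rVert} \in [-1, 1]
\end{aligned}
```

A score close to 1 means the two code fragments are mapped to document vectors pointing in nearly the same direction, which the method interprets as high semantic similarity.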
This article is indexed in CNKI and other databases.