Similar Articles
20 similar articles found (search time: 140 ms)
1.
Traditional analyses of text data usually represent a text by term-frequency or TF-IDF (term frequency-inverse document frequency) statistics. Such features capture only part of the information and ignore the intrinsic semantics of the text. This paper studies a probabilistic language model of Chinese word cohesion, whose basic idea is to model the order in which words appear in a text; for short-text data mining, the model supports quantitative analysis of textual semantics. Two problems are addressed: first, how to map Chinese words to numeric vectors in a reasonable way while ensuring that near-synonyms remain close in the vector space; second, how to construct an appropriate vector space that preserves the semantic and structural information of Chinese text. Finally, using real message-board texts from a municipal housing-management department, the probabilistic language model is solved with two algorithms, a BP neural network and an RNN. Comparison with traditional text-processing methods shows that the proposed model has certain advantages for short-text semantic mining.
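The idea of modeling the order in which words appear can be illustrated with a minimal bigram sketch (the toy corpus and the estimator below are illustrative only; the paper itself solves its model with BP and RNN networks):

```python
from collections import Counter

def bigram_probs(corpus):
    """Estimate P(w2 | w1) from a tokenized corpus (a list of sentences)."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        for w1, w2 in zip(sent, sent[1:]):
            unigrams[w1] += 1          # count of w1 as a history word
            bigrams[(w1, w2)] += 1     # count of the ordered pair (w1, w2)
    return lambda w1, w2: bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

# a made-up two-sentence corpus of segmented Chinese words
corpus = [["房屋", "维修", "问题"], ["房屋", "维修", "申请"]]
p = bigram_probs(corpus)
print(p("房屋", "维修"))  # 1.0
print(p("维修", "问题"))  # 0.5
```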

2.
Evaluating the semantic similarity of two sentences is an important component of natural language processing tasks such as text retrieval and text summarization. Deep neural networks have been applied to this task, but they rely on context-independent word vectors and therefore perform poorly. To mitigate this problem, the pretrained model BERT is used in place of traditional word vectors, and a cross self-attention mechanism is proposed to enhance the semantics of the two sentences and is combined with BERT. To implement the cross self-attention operation, a vector alignment method is designed. Finally, the BERT output is fed into a bidirectional recurrent neural network to stabilize performance and counteract the variance introduced by BERT itself. The proposed hybrid model is evaluated on three public datasets: DBMI2019, CDD-ref and CDD-ful. The results show that, thanks to the contextual word vectors produced by BERT, the model consistently outperforms existing methods; the cross self-attention mechanism lets the two sentences interact semantically, making similar pairs semantically closer and dissimilar pairs more distinct, which improves similarity estimation. The model achieves Pearson correlation coefficients of 0.846, 0.849 and 0.845 on DBMI2019, CDD-ref and CDD-ful respectively, surpassing methods that score similarity directly from the [CLS] output vector.
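The Pearson correlation coefficient used to score such models can be computed directly; a minimal sketch (the predicted and gold scores below are made-up values, not the paper's data):

```python
import math

def pearson(x, y):
    """Pearson correlation between predicted and gold similarity scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

preds = [0.9, 0.1, 0.7, 0.3]   # hypothetical model outputs
gold  = [1.0, 0.0, 0.8, 0.2]   # hypothetical human-annotated scores
print(round(pearson(preds, gold), 3))  # 0.997
```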

3.
For group decision-making problems with linguistic evaluation matrices of different granularities and unknown attribute weights, a method based on 2-tuple linguistic representation and the TOPSIS algorithm is given. First, the evaluation matrices of different granularities are unified into 2-tuple linguistic information expressed on a basic linguistic term set. TOPSIS is then introduced and, combined with the 2-tuple computational rules, determines the unknown objective attribute weights; 2-tuple aggregation operators yield each decision maker's evaluation of the alternatives. The T-OWA operator then aggregates the evaluation information of all decision makers and selects the best alternative. A numerical example is given.
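The standard 2-tuple machinery rests on the conversion Δ(β) = (s_i, α) with i = round(β) and α = β − i. A minimal sketch of the conversion and a weighted aggregation (the term indices, weights and 7-term scale below are illustrative assumptions, not the paper's example):

```python
def delta(beta):
    """Convert a numeric value beta in [0, g] to a 2-tuple (term index, alpha)."""
    i = round(beta)
    return i, beta - i          # alpha in [-0.5, 0.5)

def delta_inv(i, alpha):
    """Inverse conversion: a 2-tuple back to its numeric value."""
    return i + alpha

# aggregate three 2-tuple evaluations on a 7-term scale (indices 0..6)
vals = [delta_inv(4, 0.2), delta_inv(5, -0.3), delta_inv(3, 0.1)]
weights = [0.5, 0.3, 0.2]
beta = sum(w * v for w, v in zip(weights, vals))
i, a = delta(beta)
print(i, round(a, 2))  # 4 0.13
```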

4.
In group information aggregation, the evaluations provided by experts may be expressed in a variety of preference formats. To aggregate heterogeneous preference information, this paper proposes an aggregation method based on matrix similarity. First, transformation functions convert the different heterogeneous preferences into fuzzy preference relation matrices. An improved power weighted average operator based on matrix similarity is then proposed and some of its properties are given. During aggregation, the operator accounts not only for the similarity between preferences but also for the degree to which different preferences support one another. Finally, a new aggregation method for heterogeneous group preference information is built on the improved power weighted average operator, and a ranking method based on complementary judgment matrices is used to select the best alternative. Two numerical examples illustrate the effectiveness and feasibility of the proposed method.

5.
To effectively accumulate and reuse the valuable knowledge embedded in helicopter accident cases, and thereby improve general aviation safety, an ontology-based method for modeling and retrieving helicopter accident cases is proposed. First, a system framework for case modeling and retrieval is established, targeting operational decision support and safety information application. The characteristics of helicopter accident cases are then analyzed, and an ontology-based case representation is designed. On this basis, different retrieval indices are designed for different application needs, and similarity computation methods for semantic and numeric attributes are proposed, enabling multi-scale case retrieval. The prototype system developed and its application show that the modeling and retrieval of helicopter accident cases enables the accumulation and application of case knowledge and provides a systematic solution for general aviation safety information processing.

6.
The description and elimination of ambiguity is a bottleneck constraining the development of computational linguistics. This paper introduces cross entropy into disambiguation in computational linguistics. The true semantics of a sentence are taken as the prior information of the training set, and the machine-translated semantics as the posterior information of the test set; the cross entropy between the two is computed and used to guide the recognition and elimination of ambiguity. Examples show that the method is simple and effective, and easy to implement adaptively on a computer; cross entropy proves to be a rather effective tool for disambiguation in computational linguistics.
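The guiding quantity is ordinary cross entropy between two distributions; a minimal sketch (the two candidate sense distributions are made-up values for illustration):

```python
import math

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_x p(x) log q(x); lower means q matches the reference p better."""
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))

# reference (true-sense) distribution vs. two candidate machine readings
p_true    = [0.8, 0.2]
reading_a = [0.7, 0.3]
reading_b = [0.2, 0.8]
# the reading with lower cross entropy against the reference is preferred
print(cross_entropy(p_true, reading_a) < cross_entropy(p_true, reading_b))  # True
```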

7.
Missing values in anticancer drug sensitivity data strongly affect downstream cancer data analysis. High-throughput sequencing technology makes it possible to build computational models that predict drug sensitivity effectively. Starting from two established and reasonable assumptions, namely that similar cell lines respond similarly to a given drug and that similar drugs elicit similar responses from a given cell line, this paper combines the gene-expression and gene-mutation features of cell lines to give a new definition of cell-line similarity and, together with a drug-similarity measure, proposes a "cell line-drug K-nearest-neighbor" computational model. Applied to the Cancer Cell Line Encyclopedia (CCLE), the model yields drug-sensitivity predictions clearly better than those of existing classical models.
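The K-nearest-neighbor prediction step can be sketched as a similarity-weighted mean over the most similar neighbors (the similarity values and measured responses below are hypothetical, and the paper's actual similarity definitions over expression and mutation features are not reproduced):

```python
def knn_predict(sims, known, k=3):
    """Predict a sensitivity value as the similarity-weighted mean of the
    k most similar neighbors whose responses are known."""
    ranked = sorted(known, key=lambda c: sims[c], reverse=True)[:k]
    total = sum(sims[c] for c in ranked)
    return sum(sims[c] * known[c] for c in ranked) / total

# hypothetical similarities of a target cell line to lines with measured responses
sims  = {"A": 0.9, "B": 0.8, "C": 0.2, "D": 0.1}
known = {"A": 1.0, "B": 2.0, "C": 5.0, "D": 9.0}
print(round(knn_predict(sims, known, k=2), 3))  # 1.471
```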

8.
An attribute matching algorithm based on fuzzy clustering is proposed. The algorithm uses a fuzzy similarity relation that jointly reflects the name similarity and the semantic similarity of attributes, improving matching accuracy. Similar attributes are then clustered with the transitive (equivalence) closure method, yielding a multi-level attribute classification that reflects the inherent fuzziness of attribute matching more objectively and faithfully. Moreover, no matching parameters need to be set during matching, avoiding errors introduced by manual tuning.
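The transitive-closure step can be sketched with max-min composition of the fuzzy similarity matrix (the matrix below is an illustrative example, not the paper's data; the closure is then thresholded at various levels to get the multi-level clusters):

```python
def max_min_compose(R):
    """Max-min composition R ∘ R of a fuzzy relation given as a square matrix."""
    n = len(R)
    return [[max(min(R[i][k], R[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]

def transitive_closure(R):
    """Iterate R -> R ∘ R until stable: the fuzzy equivalence (transitive) closure."""
    while True:
        S = max_min_compose(R)
        if S == R:
            return R
        R = S

R = [[1.0, 0.8, 0.3],
     [0.8, 1.0, 0.5],
     [0.3, 0.5, 1.0]]
T = transitive_closure(R)
print(T[0][2])  # 0.5: attributes 0 and 2 connect through attribute 1
```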

9.
Because information is often incomplete, the data describing the features of objects are frequently interval values. Fuzzy clustering analysis based on interval values can retain the available information to a greater extent. This paper is the first to construct similarity coefficients for interval values by a max-min method; it gives the computational steps for clustering multi-indicator interval-number information and an interval-based clustering method, and finally illustrates the practical application of the proposed clustering method with a numerical example.
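One plausible form of a max-min similarity coefficient on interval-valued samples pools the lower and upper endpoints of each indicator (a sketch under that assumption; the paper's exact formula may differ):

```python
def interval_similarity(x, y):
    """Max-min similarity coefficient between two interval-valued samples:
    r = sum of elementwise minima / sum of elementwise maxima,
    taken over both endpoints of every indicator interval."""
    num = sum(min(a, c) + min(b, d) for (a, b), (c, d) in zip(x, y))
    den = sum(max(a, c) + max(b, d) for (a, b), (c, d) in zip(x, y))
    return num / den

# two samples, each with two interval-valued indicators [lower, upper]
x = [(1.0, 2.0), (3.0, 4.0)]
y = [(1.5, 2.5), (2.5, 3.5)]
print(round(interval_similarity(x, y), 3))  # 0.818
```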

10.
Sparse-reconstruction-based image inpainting depends on exploiting the global self-similarity of the image and on the choice of the sparse decomposition dictionary. To this end, an inpainting approach based on a global sparse representation model with class-specific learned dictionaries is proposed. The algorithm first clusters the undamaged image content into several sub-regions of similar geometric structure and applies K-SVD dictionary learning to each sub-region to obtain a dictionary adapted to its structural characteristics. Based on the image's self-similarity, a global sparse expectation-maximization representation model is then built that describes the spatial organization of image patches; the model is applied iteratively, alternately updating the patch organization and the estimate of the damaged image until the restoration stabilizes. Experimental results show that the method restores both texture details and structural information well.

11.
Steganography is concerned with communicating hidden messages in such a way that no one apart from the sender and the intended recipient can detect the very existence of the message. We study the syndrome coding method (sometimes also called the “matrix embedding method”), which uses a linear code as an ingredient. Among all codes of a fixed block length and fixed dimension (and thus of a fixed information rate), an optimal code is one that makes it most difficult for an eavesdropper to detect the presence of the hidden message. We show that the average distance to code is the appropriate concept that replaces the covering radius for this particular application. We completely classify the optimal codes in the cases when the linear code used in the syndrome coding method is a one- or two-dimensional code over . In the steganography application this translates to cases when the code carries a high payload (has a high information rate).
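Syndrome coding can be illustrated with the binary [7,4] Hamming code: any 3-bit message can be embedded in 7 cover bits by changing at most one bit so that the stego word's syndrome equals the message (a sketch of matrix embedding in general, not of the optimal codes classified in the paper):

```python
# Parity-check matrix of the [7,4] Hamming code: column j is j+1 in binary.
# Stored here as a list of 7 columns, each a 3-bit list.
H = [[int(c) for c in f"{j:03b}"] for j in range(1, 8)]

def syndrome(x):
    """Syndrome H·x over GF(2) for a length-7 bit vector x."""
    s = [0, 0, 0]
    for j, bit in enumerate(x):
        if bit:
            s = [a ^ b for a, b in zip(s, H[j])]
    return s

def embed(cover, msg):
    """Change at most one cover bit so that syndrome(stego) == msg."""
    diff = [a ^ b for a, b in zip(syndrome(cover), msg)]
    stego = cover[:]
    if diff != [0, 0, 0]:
        stego[H.index(diff)] ^= 1   # the column equal to diff marks the bit to flip
    return stego

cover = [0, 1, 1, 0, 1, 0, 0]
msg = [1, 0, 1]
stego = embed(cover, msg)
print(syndrome(stego) == msg)                            # True
print(sum(a != b for a, b in zip(cover, stego)) <= 1)    # True
```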

12.
For job-shop scheduling problems with process-route flexibility, a description method for route flexibility based on OR-subgraphs and sub-routes is proposed; it is simple in form and allows OR-subgraphs to be nested at multiple levels. On this basis, a genetic-algorithm-based flexible scheduling algorithm is designed, using a three-dimensional chromosome encoding composed of a process-route encoding, a machine encoding, and a job-scheduling encoding. The process-route and machine encodings are generated randomly from the maximal number of sub-routes and the maximal number of machines respectively, with the advantage that every chromosome represents a feasible solution, and that simple crossover and mutation operators can perform the genetic operations while keeping the offspring feasible as well. Finally, experiments demonstrate the optimization capability of the algorithm.

13.
Two partial orders that play an important role in the combinatorics of words, can be defined in a natural way on the free monoid X * generated by the finite alphabet X: the infix and the embedding orders. A set C of nonempty words is called an infix code (hypercode) over X if C is an antichain with respect to the infix (embedding) order. A set of words is said to be e-convex if it is convex with respect to the embedding order. Two characterizations of the e-convex infix codes are given as well as a sufficient condition for such codes to be finite. It is shown that the family EIC(X) of the e-convex infix codes with the empty word forms, under the operation of concatenation, a free submonoid of the free monoid B(X) of the biprefix codes and that the generating alphabet of EIC(X) is a sub-alphabet of the generating alphabet of B(X).This research was supported by Grant A7877 of the Natural Sciences and Engineering Council of Canada.
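The defining antichain condition for infix codes is easy to check directly, since u is below v in the infix order exactly when u is a contiguous factor of v (a sketch over a two-letter alphabet):

```python
def is_infix_code(words):
    """A set of nonempty words is an infix code iff no word is a proper
    contiguous factor (infix) of another word in the set, i.e. the set
    is an antichain in the infix order."""
    return all(u == v or u not in v for u in words for v in words)

print(is_infix_code({"ab", "ba", "aa"}))  # True: no word is a factor of another
print(is_infix_code({"ab", "aba"}))       # False: "ab" is an infix of "aba"
```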

14.
The genetic code is the interface between the genetic information stored in DNA molecules and the proteins. Considering the hypothesis that the genetic code evolved to its current structure, some researchers use optimization algorithms to find hypothetical codes to be compared to the canonical genetic code. For this purpose, a function with only one objective is employed to evaluate the codes, generally a function based on the robustness of the code against mutations. Very few random codes are better than the canonical genetic code when the evaluation function based on robustness is considered. However, most codons are associated with only a few amino acids in the best hypothetical codes when robustness alone is employed to evaluate them, which makes it hard to believe that the genetic code evolved based on only one objective, i.e., robustness against mutations. We therefore propose to use entropy as a second objective for the evaluation of the codes, and a Pareto approach to deal with both objectives. The results indicate that the Pareto approach generates codes closer to the canonical genetic code than the codes generated by the single-objective approach employed in the literature.
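The Pareto machinery can be sketched as a dominance test plus front extraction over two maximized objectives (the (robustness, entropy) scores below are made-up values for illustration, not results from the paper):

```python
def dominates(a, b):
    """a Pareto-dominates b if a is at least as good in every objective
    (both maximized here) and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(pop):
    """Keep the solutions that no other solution dominates."""
    return [p for p in pop if not any(dominates(q, p) for q in pop)]

# hypothetical (robustness, entropy) scores for candidate codes
codes = [(0.9, 0.2), (0.7, 0.8), (0.6, 0.6), (0.9, 0.8)]
print(pareto_front(codes))  # [(0.9, 0.8)]
```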

15.
Reed-Solomon codes are widely used to establish a reliable channel for transmitting information in digital communication; they have strong error-correction capability and a variety of efficient decoding algorithms. The maximum likelihood decoding (MLD) algorithm is usually used in the decoding of Reed-Solomon codes, and it relies on determining the error distance of the received word. Dür, Guruswami, Wan, Li, Hong, Wu, Yue, Zhu and others obtained results on the error distance. For the Reed-Solomon code C, the received word u is called an ordinary word of C if the error distance d(u, C) = n - deg u(x), where u(x) is the Lagrange interpolation polynomial of u. We introduce a new method of studying the ordinary words. In fact, we make use of the result obtained by Y.C. Xu and S.F. Hong on the decomposition of certain polynomials over the finite field to determine all the ordinary words of the standard Reed-Solomon codes over the finite field of q elements. This completely answers an open problem raised by Li and Wan in [On the subset sum problem over finite fields, Finite Fields Appl. 14 (2008) 911-929].

16.
This paper presents explicit constructions of fingerprinting codes. The proposed constructions use a class of codes called almost secure frameproof codes. An almost secure frameproof code is a relaxed version of a secure frameproof code, which in turn is the same as a separating code. This relaxed version is the object of our interest because it gives rise to fingerprinting codes of higher rate than fingerprinting codes derived from separating codes. The construction of almost secure frameproof codes discussed here is based on weakly biased arrays, a class of combinatorial objects tightly related to weakly dependent random variables.

17.
A fuzzy linguistic approximation method based on an artificial language system
A fuzzy linguistic approximation method based on an artificial language system is proposed, including linguistic approximation algorithms for the semantics of single words and of compound words, together with the related theorems and proofs. The method realizes the conversion from representations as semantic possibility distributions to words and sentences in natural language.

18.
We continue the investigation of locally testable codes, i.e., error‐correcting codes for which membership of a given word in the code can be tested probabilistically by examining it in very few locations. We give two general results on local testability: First, motivated by the recently proposed notion of robust probabilistically checkable proofs, we introduce the notion of robust local testability of codes. We relate this notion to a product of codes introduced by Tanner and show a very simple composition lemma for this notion. Next, we show that codes built by tensor products can be tested robustly and somewhat locally by applying a variant of a test and proof technique introduced by Raz and Safra in the context of testing low‐degree multivariate polynomials (which are a special case of tensor codes). Combining these two results gives us a generic construction of codes of inverse polynomial rate that are testable with poly‐logarithmically many queries. We note that these locally testable tensor codes can be obtained from any linear error correcting code with good distance. Previous results on local testability, albeit much stronger quantitatively, rely heavily on algebraic properties of the underlying codes. © 2006 Wiley Periodicals, Inc. Random Struct. Alg., 2006

19.
In this article a method is given for embedding a finitely generated free monoid as a dense subset of the unit interval. This gives an order topology for the monoid such that the submonoids generated by an important class of maximal codes occur as “thick” subsets. As an ordered topological space, the notion of thickness in a free monoid can be interpreted in a number of ways. One such notion is that of density. In particular, subsets of a free monoid that fail to meet all two sided ideals (the thin sets, of which recognizable codes are an example) are shown (corollary 4.2) to be nowhere dense. Furthermore, it is shown (corollary 5.1) that a thin code is maximal if and only if the submonoid that it generates is dense on some interval. Thus thin codes that are maximal are precisely those that generate thick submonoids. Another notion of thickness is that of category. The embedding allows the free monoid to be viewed as a subspace of the unit interval. In theorem 5.6 it is shown that a thin code is maximal just in case the closure of the submonoid that it generates is second category in the unit interval. A mild connection with Lebesgue measure is then made. In what follows, all free monoids are assumed to be generated by a finite set of at least two elements. If A is such a set, then A* denotes the free monoid generated by A. The set A is called an alphabet, the elements of A* are called words, and e denotes the empty word in A*. Topological terminology and notation follows that of Kelley [2].

20.
We consider upper bounds on two fundamental parameters of a code: minimum distance and covering radius. New upper bounds on the covering radius of non-binary linear codes are derived by generalizing a method due to S. Litsyn and A. Tietäväinen and combining it with a new upper bound on the asymptotic information rate of non-binary codes. The upper bound on the information rate is an application of a shortening method of a code and is an analogue of the Shannon-Gallager-Berlekamp straight line bound on error probability. These results improve on the best presently known asymptotic upper bounds on minimum distance and covering radius of non-binary codes in certain intervals.
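The covering radius itself, the maximum distance from any word to the nearest codeword, can be computed by brute force for tiny codes (a sketch for intuition only, not the asymptotic bounding technique of the paper):

```python
from itertools import product

def covering_radius(code, n, q=2):
    """Covering radius of a q-ary code of length n: the maximum over all
    q^n words of the Hamming distance to the nearest codeword."""
    def dist(u, v):
        return sum(a != b for a, b in zip(u, v))
    return max(min(dist(w, c) for c in code) for w in product(range(q), repeat=n))

# the binary repetition code of length 3: {000, 111}
code = [(0, 0, 0), (1, 1, 1)]
print(covering_radius(code, 3))  # 1: every 3-bit word is within distance 1
```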


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.) · 京ICP备09084417号