1.
Xiang Sun Lu Liu Ayodeji Ayorinde John Panneerselvam 《Digital Communications & Networks》2021,7(4):559-569
Online social media networks are gaining attention worldwide, with an increasing number of people relying on them to connect, communicate and share their daily pertinent event-related information. Event detection is now increasingly leveraging online social networks for highlighting events happening around the world via the Internet of People. In this paper, a novel Event Detection model based on Scoring and Word Embedding (ED-SWE) is proposed for discovering key events from a large volume of data streams of tweets and for generating an event summary using keywords and top-k tweets. The proposed ED-SWE model can distill high-quality tweets, reduce the negative impact of spam, and identify latent events in the data streams automatically. Moreover, a word embedding algorithm is used to learn a real-valued vector representation for a predefined fixed-sized vocabulary from a corpus of Twitter data. In order to further improve the performance of the Expectation-Maximization (EM) iteration algorithm, a novel initialization method based on the authority values of the tweets is also proposed in this paper to detect live events efficiently and precisely. Finally, a novel automatic identification method based on the cosine measure is used to automatically evaluate whether a given topic can form a live event. Experiments conducted on a real-world dataset demonstrate that the ED-SWE model exhibits better efficiency and accuracy than several state-of-the-art event detection models.
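The cosine measure mentioned in the abstract above can be illustrated with a minimal Python sketch. The abstract does not say which vectors are compared or what cutoff is applied, so `forms_event`, `reference_vec` and the `threshold` default are hypothetical names and values chosen only for illustration; this is not the ED-SWE implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def forms_event(topic_vec, reference_vec, threshold=0.5):
    """Hypothetical decision rule (an assumption, not from the paper):
    accept the topic when its similarity to a reference vector
    reaches the threshold."""
    return cosine_similarity(topic_vec, reference_vec) >= threshold
```

In practice the two vectors might be averaged word embeddings of a topic's keywords and of its top-k tweets, but that pairing is also an assumption.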
2.
The task of topic tracking is to monitor a stream of stories and find all subsequent stories that discuss the same topic. Using Bayesian belief networks, we present three topic tracking models: a static topic model, BSTM, and two dynamic topic models, BDTM-I and BDTM-II. BDTM-II combines the advantages of BSTM and BDTM-I, tracks better than either, and effectively alleviates topic drift. Updating BDTM-I and BDTM-II with unrelated incoming stories filters out noise in the topics. Experiments on TDT corpora show that BSTM reduces (Cdet)norm by 5.5% compared with VSM, that BDTM-II reduces (Cdet)norm by 6.3% and 6.0% compared with BSTM and BDTM-I respectively, and that using unrelated stories improves tracking performance.
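For readers unfamiliar with (Cdet)norm, the sketch below computes the normalized detection cost as it is usually defined in TDT evaluations: a weighted sum of miss and false-alarm probabilities, divided by the cost of the better trivial system. The default constants (`c_miss=1.0`, `c_fa=0.1`, `p_target=0.02`) are values commonly quoted for TDT and are assumptions here; the abstract does not state which settings the authors used.

```python
def normalized_detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Normalized detection cost; lower is better.

    p_miss and p_fa are the miss and false-alarm probabilities of the
    tracker. The cost constants and target prior are assumed defaults,
    not values taken from the abstract.
    """
    cost = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    # Cost of the cheaper trivial system: reject everything or accept everything.
    baseline = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return cost / baseline
```

With these defaults, a value of 1.0 corresponds to a system that rejects every story.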
3.
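A Web Page Ranking Algorithm Based on Topic Relevance  Total citations: 1, self-citations: 0, citations by others: 1
To address the shortcomings of the existing link-structure-based PageRank algorithm, an improved PageRank algorithm based on web page topic relevance is proposed. Page content is analyzed to extract the links in each page and their corresponding anchor text, and a page link database is built. The vector space model (VSM) is used to compute the relevance between link anchor text and page content, and on this basis the improved PageRank algorithm is computed offline. Theoretical analysis and simulation experiments show that the improved PageRank algorithm makes it easier for users to find the pages they need and improves the efficiency of web page queries.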
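One plausible way to let topic relevance influence PageRank is to split each page's rank among its outgoing links in proportion to a relevance weight instead of evenly. The Python sketch below shows that variant under stated assumptions: the abstract gives no formula, so `weighted_pagerank`, the `links` structure and the handling of pages without outgoing links are illustrative choices, not the authors' algorithm. The relevance weights are assumed to be computed beforehand, for example as VSM similarities between anchor text and page content.

```python
def weighted_pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank with relevance-weighted links.

    links maps each source page to {target page: relevance weight}.
    A page's rank is shared among its targets in proportion to the
    weights (an assumed weighting scheme, not the paper's formula).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for src in pages:
            out = links.get(src, {})
            total = sum(out.values())
            if total == 0:
                # No usable outgoing links: spread this page's rank evenly.
                share = damping * rank[src] / n
                for p in pages:
                    new_rank[p] += share
            else:
                for dst, weight in out.items():
                    new_rank[dst] += damping * rank[src] * weight / total
        rank = new_rank
    return rank
```

For example, with `links = {"a": {"b": 0.9, "c": 0.1}, "b": {"a": 1.0}, "c": {"a": 1.0}}`, page `b` ends up ranked above page `c` because the link from `a` to `b` carries more relevance.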
4.
The problem of "rich topics get richer" (RTGR) is common in topic models and leads to a wrong topic distribution if the distribution process is left without intervention. In the standard LDA (Latent Dirichlet Allocation) model, every word in all the documents has the same statistical weight. In fact, words have different impacts on different topics. Guided by this idea, we extend ILDA (Infinite LDA) by taking into account the biased role that words play in dividing the topics. We propose a self-adaptive topic model specifically to overcome the RTGR problem. The proposed model addresses three points: (1) the number of topics changes with the document collection, which suits dynamic data; (2) words have discriminating attributes with respect to topic distribution; (3) a self-adaptive method is used to realize automatic re-sampling. To verify our model, we design a topic evolution analysis system that provides the following functions: topic classification in each cycle, topic correlation between adjacent cycles, and strength calculation of the subtopics in order. Experiments on both the NIPS corpus and our self-built news collections show that the system meets the given requirements and that the results are feasible.
5.
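This paper discusses three problems in Chinese text mining: word segmentation, keyword extraction, and text classification. For word segmentation, it introduces the ICTCLAS method, which is based on a cascaded hidden Markov model, and the WDM method, which treats the boundaries between words as missing data and solves for them with the EM algorithm. For keyword extraction, it proposes a Bayes factor method and introduces the CCS method, which uses sparse regression. For text classification, it introduces a method that builds a classifier from keyword frequencies and a method that first builds a topic model and then builds a classifier from the topic probabilities. The paper compares these methods on two text datasets and gives recommendations for their use.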
6.
7.
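Semi-supervised Sentiment Classification of Online Text Based on a Topic Model  Total citations: 1, self-citations: 0, citations by others: 1
To address the imbalance, lack of labels, and irregularity of data in sentiment classification of online review text, a topic-based semi-supervised learning model with threshold adjustment is proposed. Topic features are extracted from unstructured text, a classifier is trained on a small amount of sentiment-labeled text, and the threshold is adjusted by optimizing a metric, so as to identify the sentiment orientation of user reviews. Simulation studies show that the threshold-adjusted semi-supervised model adapts well to imbalanced and unlabeled data. In the empirical study, a text sentiment classifier built on hotel review text data shows that the model can effectively predict the sentiment polarity of minority-class review samples, confirming the applicability and feasibility of the topic-model-based, threshold-adjusted semi-supervised sentiment classification model for online review text in practical problems.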
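Threshold adjustment for imbalanced classes can be sketched as a search over candidate cutoffs on a small labeled set. The abstract does not name the metric being optimized, so the use of F1 on the minority (positive) class is an assumption, and `best_threshold` is a hypothetical helper written for illustration, not the authors' procedure.

```python
def f1_score(labels, predictions):
    """F1 of the positive class (label 1) for binary labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels):
    """Return (threshold, f1) for the cutoff with the highest F1.

    scores are classifier outputs for the positive class; every distinct
    score is tried as a cutoff, predicting 1 when score >= cutoff.
    Ties keep the lowest cutoff.
    """
    best = (None, -1.0)
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        f1 = f1_score(labels, preds)
        if f1 > best[1]:
            best = (t, f1)
    return best
```

On imbalanced data the selected cutoff is often below the conventional 0.5, which lets more minority-class samples be recovered.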
8.
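To ensure that the extracted information is comprehensive, topic segmentation has become indispensable. With the help of the Tongyici Cilin synonym thesaurus (同义词词林), the similarity between the paragraphs of a text is computed from the semantics of their words, and a paragraph relation graph is built. Based on this graph, the construction of the weight matrix in the normalized cut criterion is adjusted so that it better reflects the similarity between paragraphs, and the criterion is then used to segment the text by topic. The results show that the method is effective for topic segmentation both when a topic is expressed in consecutive paragraphs and when it is expressed across non-adjacent paragraphs.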
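The normalized cut criterion referred to above scores a two-way partition of a weighted graph: the total weight crossing the partition is divided by each side's total connection weight, and the two ratios are added. The Python sketch below evaluates that standard criterion for a given split of paragraphs. How the paper adjusts the weight matrix is not described in the abstract, so the matrix here is simply assumed to hold precomputed paragraph similarities, and `normalized_cut` is an illustrative helper.

```python
def normalized_cut(weights, group_a):
    """Normalized cut value of splitting nodes into group_a and the rest.

    weights is a symmetric matrix (list of lists) of pairwise paragraph
    similarities; group_a is an iterable of node indices. Lower values
    indicate a better split.
    """
    n = len(weights)
    a = set(group_a)
    b = set(range(n)) - a
    if not a or not b:
        raise ValueError("both groups must be non-empty")
    cut = sum(weights[i][j] for i in a for j in b)
    assoc_a = sum(weights[i][j] for i in a for j in range(n))
    assoc_b = sum(weights[i][j] for i in b for j in range(n))
    if assoc_a == 0 or assoc_b == 0:
        raise ValueError("each group needs a non-zero total connection weight")
    return cut / assoc_a + cut / assoc_b
```

Because `group_a` can be any subset of indices, the same function scores splits in which one topic spans non-adjacent paragraphs.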
9.
Miriam Louise Carnot Jorge Bernardino Nuno Laranjeiro Hugo Gonçalo Oliveira 《Entropy (Basel, Switzerland)》2020,22(11)
The dependability of systems and networks has been the target of research for many years now. In the 1970s, what is now known as the top conference on dependability, the IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), emerged, gathering international researchers and sparking the interest of the scientific community. Although it started in niche systems, nowadays dependability is viewed as highly important in most computer systems. The goal of this work is to analyze the research published in the proceedings of well-established dependability conferences (i.e., DSN, International Symposium on Software Reliability Engineering (ISSRE), International Symposium on Reliable Distributed Systems (SRDS), European Dependable Computing Conference (EDCC), Latin-American Symposium on Dependable Computing (LADC), Pacific Rim International Symposium on Dependable Computing (PRDC)), using Natural Language Processing (NLP), and in particular the Latent Dirichlet Allocation (LDA) algorithm, to identify active, collapsing, ephemeral, and new lines of research in the dependability field. Results show a strong emphasis on terms like ‘security’, despite the general focus of the conferences on dependability, and new trends related to ‘machine learning’ and ‘blockchain’. We used the PRDC conference as a use case, which showed similarity with the overall set of conferences, although we also found specific terms, like ‘cyber-physical’, being popular at PRDC and not in the overall dataset.
10.
Meizi Li Yang Xiang Bo Zhang Fazhuan Wei Qianqian Song 《International Journal of Communication Systems》2018,31(1)
Social network sites (SNS) presently face the task of grouping users into small subsets within themselves. In this study, an organizing scheme for single‐topic user groups is proposed for facilitating user sharing and communicating under common interests. The main rationales of the proposed scheme are (1) only an influential single topic is selected through its impact evaluation to attract users; (2) only the users having a high degree of interest, explicit or implicit, in the topic should be grouped; and (3) trustworthy relationships among users are taken into consideration to enlarge the scale of the user group. The proposed organizing scheme comprises 3 features: topic impact evaluation, interest degree measurement, and trust chain‐based organizing. The main structure of our proposed scheme is (1) an overview of the proposed scheme and its formal related definitions; (2) a topic impact evaluation method, i.e., an importance evaluation and a popularity calculation; (3) a user interest degree measurement method, i.e., explicit and implicit interest evaluation with dynamic factors included; (4) a trust chain calculation method based on the topology features of the trust chain; (5) an organizing algorithm for single‐topic user groups; and finally, some experimental results and discussions to illustrate the effectiveness and feasibility of our scheme.