
Research on the Information Retrieval Process of Plain Text, HTML and XML

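Cite this article:	CHEN Gui-hong. Research on the Information Retrieval Process of Plain Text, HTML and XML[J]. Sci-Tech Information Development & Economy, 2009, 19(11)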

Author:	CHEN Gui-hong

Affiliation:	Department of Information Management, Sun Yat-sen University, Guangzhou, Guangdong 510275, China

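Abstract:	By analyzing the structure of plain text, HTML, and XML files, and taking the classical vector space model (VSM) as an example, this paper examines the different processing techniques applied to the three file types during information retrieval. In view of the shortcomings of the traditional VSM and the structural characteristics of HTML and XML files, it also discusses how the N-Level VSM improves on the classical VSM.
Keywords:	plain text files; XML files; HTML files; information retrieval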
Research on the Information Retrieval Process of Plain Text, HTML and XML

CHEN Gui-hong. Research on the Information Retrieval Process of Plain Text, HTML and XML[J]. Sci-Tech Information Development & Economy, 2009, 19(11)

Authors:	CHEN Gui-hong

Abstract:	By analyzing the file structures of plain text, HTML, and XML, and taking the classical VSM as an example, this paper examines the different processing techniques applied to the three file types in the information retrieval process. Based on the shortcomings of the traditional VSM and the structural features of HTML and XML, it also discusses how the N-Level VSM improves on the classical VSM.

Keywords:	VSM; N-Level VSM
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.