Research on the Text Extraction from PDF Files (PDF文件文本内容提取研究)

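Cite this article:	ZHANG Xiu-xiu, ZHANG Li-feng. Research on the Text Extraction from PDF Files[J]. Sci-Tech Information Development & Economy, 2008, 18(36): 118-120.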

Authors:	ZHANG Xiu-xiu, ZHANG Li-feng

Institution:	Lanzhou Branch of the National Science Library, Chinese Academy of Sciences, Lanzhou, Gansu 730000, China

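Funding:	One of the research outputs of the Chinese Academy of Sciences Knowledge Innovation Program frontier project for young talent, "Applied Research and Development of Automatic Metadata Extraction Tools in the Construction of Digital Knowledge Repositories"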

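Abstract:	This paper introduces the file structure of PDF and, on that basis, presents the workflow for parsing a PDF file and a method for extracting text from the parsed content streams.
Keywords:	PDF; file parsing; text extraction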
Research on the Text Extraction from PDF Files

ZHANG Xiu-xiu, ZHANG Li-feng. Research on the Text Extraction from PDF Files[J]. Sci-Tech Information Development & Economy, 2008, 18(36): 118-120.

Authors:	ZHANG Xiu-xiu, ZHANG Li-feng

Institution:	Lanzhou Branch of the National Science Library, Chinese Academy of Sciences, Lanzhou, Gansu 730000, China

Abstract:	This paper introduces the structure of PDF files and presents the procedures for parsing them and for extracting text from the parsed content streams.

Keywords:	PDF; file parsing; text extraction
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.