
Construction of a High-Performance Chemical Structure Search Engine and Database Based on CouchDB and Elastic Search
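Cite this article:李任之, 李博杰, 张国桢, 江俊, 罗毅. Construction of a High-Performance Chemical Structure Search Engine and Database Based on CouchDB and Elastic Search[J]. 化学物理学报 (Chinese Journal of Chemical Physics), 2018, 31(3): 341-349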
Authors:李任之 (Ren-zhi Li)  李博杰 (Bo-jie Li)  张国桢 (Guo-zhen Zhang)  江俊 (Jun Jiang)  罗毅 (Yi Luo)
Affiliation (all five authors):School of Chemistry and Materials Science, University of Science and Technology of China; Hefei National Research Center for Physical Sciences at the Microscale, Hefei 230026
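Abstract:Computer-assisted chemical structure search plays a very important role in cheminformatics. This paper presents a high-performance search system for chemical structures and chemical data, called DCAIKU. DCAIKU is built on the schema-less database CouchDB and the ElasticSearch infrastructure. By transforming structural similarity search into text search, it provides a high-performance and highly flexible retrieval engine: while keeping the storage of chemical information highly flexible, it still achieves low latency and high accuracy, and it scales well, supporting large-scale parallelization and clustering.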

Keywords:Search engine  Cheminformatics  Structure retrieval  Schema-less database
Received:2017-11-06
Revised:2017-12-25

A High-Performance and Flexible Chemical Structure & Data Search Engine Built on CouchDB & ElasticSearch
Ren-zhi Li,Bo-jie Li,Guo-zhen Zhang,Jun Jiang and Yi Luo. A High-Performance and Flexible Chemical Structure & Data Search Engine Built on CouchDB & ElasticSearch[J]. Chinese Journal of Chemical Physics, 2018, 31(3): 341-349
Authors:Ren-zhi Li  Bo-jie Li  Guo-zhen Zhang  Jun Jiang  Yi Luo
Affiliation (all five authors):Hefei National Laboratory for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
Abstract:Computer-assisted chemical structure searching plays a critical role in efficient structure screening in cheminformatics. We designed a high-performance chemical structure & data search engine called DCAIKU, built on the CouchDB and ElasticSearch engines. DCAIKU converts the chemical structure similarity search problem into a general text search problem so that it can use off-the-shelf full-text search engines. DCAIKU also supports flexible document structures and heterogeneous datasets with the help of a schema-less document database. Our evaluations show that DCAIKU can handle both keyword search and structural search against millions of records with both high accuracy and low latency. We expect that DCAIKU will lay the foundation for large-scale and cost-effective structural search in materials science and chemistry research.
Keywords:Search engine  Cheminformatics  Structural search  Schema-less databases
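
The abstract states that structural similarity search is converted into a general text search problem, but it does not say how structures are encoded. The sketch below illustrates the general idea only and is not DCAIKU's implementation. It assumes a toy encoding (character 3-grams of a SMILES string standing in for real chemical fingerprint features), a plain in-memory inverted index in place of ElasticSearch, and Tanimoto similarity for ranking. The names `tokens`, `build_index`, and `search` are hypothetical.

```python
from collections import defaultdict


def tokens(smiles, n=3):
    """Toy stand-in for a chemical fingerprint (an assumption, not the
    paper's encoding): the set of character n-grams of a SMILES string."""
    padded = smiles if len(smiles) >= n else smiles.ljust(n, "_")
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_index(records):
    """Inverted index (token -> record ids), the core structure a
    full-text search engine maintains."""
    index = defaultdict(set)
    for rec_id, smiles in records.items():
        for tok in tokens(smiles):
            index[tok].add(rec_id)
    return index


def search(query, records, index, top_k=3):
    """Collect records sharing at least one token with the query,
    then rank them by Tanimoto similarity of their token sets."""
    q = tokens(query)
    candidates = set()
    for tok in q:
        candidates |= index.get(tok, set())
    scored = []
    for rec_id in candidates:
        t = tokens(records[rec_id])
        scored.append((len(q & t) / len(q | t), rec_id))
    # Highest similarity first; ties broken by record id for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return scored[:top_k]


# Hypothetical records: name -> SMILES string.
records = {
    "ethanol": "CCO",
    "propanol": "CCCO",
    "butanol": "CCCCO",
    "benzene": "c1ccccc1",
}
index = build_index(records)
results = search("CCCO", records, index)
for score, rec_id in results:
    print(f"{rec_id}: {score:.2f}")
```

In this toy encoding, propanol and butanol produce the same token set, so they tie for the query `CCCO`. A real system would need chemically meaningful features to tell them apart, and would delegate indexing and scoring to the search engine rather than to Python sets.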