SinaPlot: An Enhanced Chart for Simple and Truthful Representation of Single Observations Over Multiple Classes
Authors:Nikos Sidiropoulos  Sina Hadi Sohi  Thomas Lin Pedersen  Bo Torben Porse  Ole Winther  Nicolas Rapin
Affiliation:1. The Finsen Laboratory, Rigshospitalet, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark;2. Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Copenhagen, Denmark;3. The Bioinformatics Centre, Department of Biology, Faculty of Natural Sciences, University of Copenhagen, Denmark;4. The Bioinformatics Centre, Department of Biology, Faculty of Natural Sciences, University of Copenhagen, Denmark;5. DTU Compute, Technical University of Denmark, Lyngby, Denmark;6. Gubra ApS, Hørsholm, Denmark;7. Novo Nordisk Foundation Center for Stem Cell Biology, DanStem, University of Copenhagen, Copenhagen, Denmark;8. Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
Abstract:Recent developments in data-driven science have led researchers to integrate data from several sources, diverse experimental procedures, or databases. This alone poses a major challenge in truthfully visualizing data, especially when the number of data points varies between classes. To aid the representation of datasets with differing sample sizes, we have developed a new type of plot that overcomes limitations of current standard visualization charts. SinaPlot is inspired by the strip chart and the violin plot and operates by letting the normalized density of points restrict the jitter along the x-axis. The plot displays the same contour as a violin plot but resembles a simple strip chart for a small number of data points. By normalizing jitter over all classes, the plot provides a fair representation for comparison between classes with a varying number of samples. In this way, the plot conveys the number of data points, the density distribution, outliers, and data spread in a very simple, comprehensible, and condensed format. The package for producing the plots is available for R through CRAN, using the base graphics package, and as a geom for ggplot2 through ggforce. We also provide access to a web server that accepts Excel sheets to produce the plots (http://servers.binf.ku.dk:8890/sinaplot/).
Keywords:Big data  Bioinformatics  Visualization
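
The jitter rule described in the abstract (the normalized density of points restricts the jitter along the x-axis, with normalization shared across classes) can be illustrated with a minimal Python sketch. This is not the implementation of the R package: the function names, the Gaussian kernel, the Silverman rule-of-thumb bandwidth, the `max_width` parameter, and the choice to normalize by the single highest density over all classes are assumptions made here for illustration only.

```python
import math
import random
import statistics


def silverman_bandwidth(sample):
    """Rule-of-thumb bandwidth (an assumption of this sketch)."""
    if len(sample) < 2:
        return 1.0
    sd = statistics.stdev(sample)
    if sd == 0:
        return 1.0
    return 1.06 * sd * len(sample) ** (-1 / 5)


def gaussian_kde(sample, bandwidth):
    """Return a function that estimates the density of `sample` at a point y."""
    norm = len(sample) * bandwidth * math.sqrt(2 * math.pi)

    def density(y):
        total = sum(math.exp(-0.5 * ((y - s) / bandwidth) ** 2) for s in sample)
        return total / norm

    return density


def sina_positions(groups, max_width=0.4, seed=0):
    """Map each class name to a list of (x, y) plotting positions.

    `groups` maps class names to non-empty lists of y values. Classes are
    centered at x = 1, 2, 3, ... in insertion order.
    """
    rng = random.Random(seed)

    # Estimate the density at every observed point, class by class.
    densities = {}
    for name, ys in groups.items():
        kde = gaussian_kde(ys, silverman_bandwidth(ys))
        densities[name] = [kde(y) for y in ys]

    # Normalize over ALL classes so that jitter widths are comparable.
    global_max = max(d for ds in densities.values() for d in ds)

    positions = {}
    for center, (name, ys) in enumerate(groups.items(), start=1):
        points = []
        for y, d in zip(ys, densities[name]):
            # Jitter is uniform within a band proportional to the local density.
            half_width = max_width * d / global_max
            points.append((center + rng.uniform(-half_width, half_width), y))
        positions[name] = points
    return positions


if __name__ == "__main__":
    data_rng = random.Random(1)
    example = {
        "large": [data_rng.gauss(0.0, 1.0) for _ in range(200)],
        "small": [1.0, 2.0, 3.0],
    }
    for name, points in sina_positions(example).items():
        print(name, points[:3])
```

Points in dense regions receive a wide jitter band and points in sparse regions stay close to the class center, which is how a large class traces a violin-like contour. How closely a small class resembles a strip chart depends on the scaling rule, which this sketch does not take from the package.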