Abstract: | The traffic on an Internet link is a packet stream: packets of varying sizes arriving for transmission on the link. Each packet has an arrival time, and contained within the packet are headers that carry many critical variables. Packet traces, which consist of captured headers and measurements of the arrival times, convey substantial information about the Internet—security, usage, network performance, and the performance of engineering protocols. This article discusses strategies for the analysis of very large databases of packet traces, and the architecture of a software system that facilitates the use of these strategies. The system has a pipeline: (1) raw packet traces; (2) a database with objects tailored to ensuing analyses; and (3) an environment with tools for data analysis: statistical methods, model fitting, and visualization. The pipeline addresses the full set of tasks in the study of packet streams, from the initial processing of raw packet traces to the final output, often a visual display. S-Net—an extensible, open-source software implementation of this architecture—is based on the R implementation of the S language for graphics and data analysis, and has been developed on Linux. |