Overview

The data set consists of 50,000 packet-arrival records covering roughly a two-minute period of traffic from Digital Equipment Corporation (DEC) servers on March 8th, 1995. The times between packet arrivals can be computed from these records.

Details

Messages that flow from a source to a destination across the Internet are known as traffic, and they travel according to network protocols. Both the traffic and the network conditions are highly random. The most common protocol for this traffic is TCP (Transmission Control Protocol), which acknowledges the receipt of packets. One way to capture this traffic is through the logs that servers keep of all incoming traffic, or through packet captures made with tcpdump. The data set dec-pkt-1 comes from one such log. It lets us measure both the number of packets per unit of time and the interarrival time between packets. These two quantities are key to assessing traffic congestion and network performance.
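The two quantities mentioned above can both be derived from arrival timestamps. The sketch below assumes timestamps are in seconds, which is how the counting code later in this document treats them. It uses a small, made-up vector of arrival times rather than the real trace. Interarrival times are the gaps between consecutive sorted arrivals. Packets per second come from binning each arrival by the whole second it falls in.

```r
# Hypothetical arrival times in seconds (illustrative values, not from dec-pkt-1)
timestamp <- c(0.12, 0.45, 0.47, 1.03, 1.60, 1.61, 1.99, 3.20)

# Interarrival times: gaps between consecutive arrivals (sorted first, to be safe)
interarrival <- diff(sort(timestamp))

# Packets per second: bin each arrival by its whole second; listing the levels
# explicitly keeps seconds with no arrivals (second 2 here) as zero counts
per_second <- table(factor(floor(timestamp), levels = 0:3))

print(interarrival)
print(per_second)
```

With this toy input, seconds 0 through 3 receive 3, 4, 0 and 1 packets. The seven interarrival times sum to the span between the first and last arrival.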

The data is a set of 50,000 observations (1.3 MB) covering roughly 2 minutes of traffic from a one-hour period. The data came from the logs of Digital Equipment Corporation servers and was archived in the Internet Traffic Archive, http://ita.ee.lbl.gov/html/traces.html, after being sanitized, i.e., all private content was removed. The trace used here from that public archive is called dec-pkt-1 and corresponds to DEC-WRL-1 in the article “Wide-Area Traffic: The Failure of Poisson Modeling” by V. Paxson and S. Floyd (IEEE/ACM Transactions on Networking, 3(3), pp. 226-244, 1995). The original dec-pkt-1.tcp file summarizes one hour's worth of TCP packet traffic between Digital Equipment Corporation and the rest of the world on March 8th, 1995. It has one row per packet, 2,153,462 packets in total. The data is a processed version of the raw traffic logs that the server keeps.

Data Description

Variable      Description
timestamp     Time of packet arrival, in seconds since the start of the trace
source        Source host of the packet, recoded for confidentiality
destination   Destination host, recoded for confidentiality
sourceport    Source TCP port
destport      Destination TCP port
databytes     Number of data bytes in the packet, or 0 if none*

* A packet carries 0 data bytes when it only acknowledges data sent by the other side. These zero-byte packets are usually removed before analysis.
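Removing zero-byte (acknowledgment-only) packets is a one-line filter on the databytes column. The sketch below builds a few hypothetical rows in the six-column layout described above. The host codes, ports and byte counts are made up for illustration. It then keeps only the packets that carry data.

```r
# A few hypothetical rows in the same six-column layout (values are made up)
trace <- read.table(text = "
timestamp source destination sourceport destport databytes
0.010 1 2 1023 80 512
0.015 2 1 80 1023 0
0.020 1 2 1023 80 1460
0.031 2 1 80 1023 0
", header = TRUE)

# Keep only packets that carry data; pure acknowledgments have databytes == 0
data_packets <- subset(trace, databytes > 0)
nrow(data_packets)
```

The same `subset()` call would apply to the full data frame read below, if the zero-byte packets are to be dropped.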

# read the packet trace (one row per packet; the first line holds the column names)
traffic_data = read.table("data/packetdata.dat", header = TRUE)
head(traffic_data)


Objectives

We consider the number of packets that arrive per second. This count can be computed from the timestamps in the data set (see the R code below). The goal is to find a distribution that fits these per-second packet counts.

pac_per_second = numeric(3600) # one slot per second: 3600 seconds in one hour
for (i in 1:3600) {
  # count the packets whose timestamp falls in second i, i.e. in [i, i + 1)
  pac_per_second[i] = sum(floor(traffic_data$timestamp) == i)
}
pac_per_second = pac_per_second[pac_per_second > 0]  # remove seconds with no packets
head(pac_per_second)
## [1] 264 322 197 345 488 620
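As a first step toward the goal of finding a distribution for these counts, one could compare a Poisson model with an overdispersed alternative. The sketch below uses only the six counts printed above as toy input, which is far too few to draw conclusions. On the real data, `pac_per_second` would take the place of `counts`. The Poisson rate is estimated by its maximum-likelihood value, the sample mean. The negative binomial is a swapped-in alternative, not a choice made in the original text, fitted here by the method of moments. The index of dispersion (variance divided by mean) is about 1 for Poisson data and much larger when the counts are overdispersed.

```r
# Toy counts: the first six packets-per-second values printed above
counts <- c(264, 322, 197, 345, 488, 620)

# Poisson: the maximum-likelihood estimate of lambda is the sample mean
lambda_hat <- mean(counts)

# Index of dispersion: near 1 for Poisson data, much larger if overdispersed
dispersion <- var(counts) / mean(counts)

# Negative binomial by the method of moments (an assumed alternative model):
# var = mu + mu^2 / size  =>  size = mu^2 / (var - mu), valid only when var > mu
size_hat <- lambda_hat^2 / (var(counts) - lambda_hat)

# Log-likelihood of each fitted model on the same counts
ll_pois <- sum(dpois(counts, lambda = lambda_hat, log = TRUE))
ll_nb   <- sum(dnbinom(counts, size = size_hat, mu = lambda_hat, log = TRUE))

c(lambda = lambda_hat, dispersion = dispersion, size = size_hat,
  loglik_poisson = ll_pois, loglik_negbin = ll_nb)
```

For these six values the variance is many times the mean, and the negative binomial attains a higher log-likelihood than the Poisson fit. Whether that holds for the full set of per-second counts would need to be checked on the actual data, for example with a goodness-of-fit test.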