Overview

This data set consists of 50,000 observations of the lengths of visits to the MSNBC website on September 28, 1999.

Details

Once a user enters a website, how many pages or links within the site does that user visit? The answer may suggest ways to improve the site. If the number of pages visited per user follows a similar distribution at different websites, it may be possible to establish general laws that hold across sites. Research in this area aims to find such laws, as a small part of the current effort to understand human behavior on the web.

The data set is a random sample of the lengths of visits by users entering the msnbc.com website on September 28, 1999. The length of a visit is an estimate of the total number of clicks or pages seen by the user. It is based on web server logs, so it counts only pages recorded by the server; pages served from the user’s browser cache or from a caching proxy server are not counted. The data were extracted from the clickstream data set in the UCI KDD Archive, which itself comes from Internet Information Server (IIS) logs for msnbc.com and news-related portions of msn.com, processed by Heckerman (2003).

Data Description

The data set contains a single variable: the length of each user visit to the msnbc.com website.

# Read the single-column file of visit lengths and name the column "length"
msnbc_visits <- read.table("data/msnbclength.dat", col.names = "length")
# Show the first few observations
head(msnbc_visits)
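
Because the data consist of a single count variable, a natural first step is to tabulate how often each visit length occurs. The following is a minimal sketch of one way to do that, not a prescribed method. The helper name visit_length_freq and the toy input vector are assumptions made for illustration; the toy values are not taken from the MSNBC data.

```r
# Hypothetical helper (not part of the original material):
# tabulate the count and proportion of each distinct visit length.
visit_length_freq <- function(lengths) {
  counts <- table(lengths)
  data.frame(
    length = as.integer(names(counts)),
    count = as.integer(counts),
    proportion = as.numeric(counts) / length(lengths)
  )
}

# Toy input for illustration only; these are NOT values from the data set.
toy_lengths <- c(1, 1, 1, 2, 2, 3, 5)
toy_freq <- visit_length_freq(toy_lengths)
print(toy_freq)

# With the real data loaded into msnbc_visits, one could instead run:
# freq <- visit_length_freq(msnbc_visits$length)
# plot(freq$length, freq$proportion, log = "xy")
```

Plotting the proportions on log-log axes, as in the commented lines, is one common way to get a first impression of the shape of a heavy-tailed count distribution before trying candidate models.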

Data Files

Objectives

The goal is to find a distribution that fits the length of the visits to the msnbc.com website. In doing this, please address the following: