Hands-On Exploratory Data Analysis with R

About the dataset

The dataset that we will be focusing on throughout this chapter is the Auto.MPG dataset, which is widely used with the R language. It contains fuel economy data for 38 popular car models for the years 1999 and 2008. The dataset also ships with the ggplot2 package, which we will cover in the coming chapters.

For now, we will focus on importing the dataset from the CSV file, which you can download from the following link: 

https://github.com/PacktPublishing/Hands-On-Exploratory-Data-Analysis-with-R/tree/master/ch03

For more details pertaining to the dataset, you can refer to the following link:

https://archive.ics.uci.edu/ml/datasets/auto+mpg

Once the download is complete, we can import the CSV file into a data frame. This makes the dataset available in the R workspace:

> mpg <- read.csv("highway_mpg.csv", stringsAsFactors = FALSE)
> View(mpg)

From this, we get the following output:

As shown in the preceding screenshot, the Auto.MPG dataset is represented in tabular format and includes the following attributes: manufacturer, model, displ, year, cyl, trans, drv, cty, hwy, fl, and class.
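To see what the stringsAsFactors argument changes during the import, the following minimal sketch may help. It does not use the downloaded file: it writes a four-row sample to a temporary CSV file and reads it back twice. The sample values are the first entries that str reports for each column later in this section, and the names sample_rows, csv_path, as_text, and as_factors are hypothetical:

```r
# Four sample rows; the values are the first entries reported for each column.
sample_rows <- data.frame(
  manufacturer = c("audi", "audi", "audi", "audi"),
  model        = c("a4", "a4", "a4", "a4"),
  displ        = c(1.8, 1.8, 2.0, 2.0),
  year         = c(1999L, 1999L, 2008L, 2008L),
  trans        = c("auto(l5)", "manual(m5)", "manual(m6)", "auto(av)"),
  stringsAsFactors = FALSE
)

# Temporary file used as a stand-in for highway_mpg.csv (an assumption).
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_rows, csv_path, row.names = FALSE)

# Read the same file with and without conversion of text columns to factors.
as_text    <- read.csv(csv_path, stringsAsFactors = FALSE)
as_factors <- read.csv(csv_path, stringsAsFactors = TRUE)

class(as_text$trans)      # "character"
class(as_factors$trans)   # "factor"
dim(as_text)              # 4 rows and 5 columns
```

Keeping text columns as character vectors, as the import above does, is usually the safer starting point for exploration, because a column can still be converted to a factor later where a categorical variable is needed.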

A description of the dataset, including the data type of each attribute, can be obtained with the following command:

> str(mpg)
'data.frame':  234 obs. of  11 variables:
 $ manufacturer: chr  "audi" "audi" "audi" "audi" ...
 $ model       : chr  "a4" "a4" "a4" "a4" ...
 $ displ       : num  1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
 $ year        : int  1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
 $ cyl         : int  4 4 4 4 6 6 6 4 4 4 ...
 $ trans       : chr  "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
 $ drv         : chr  "f" "f" "f" "f" ...
 $ cty         : int  18 21 20 21 16 18 18 18 16 20 ...
 $ hwy         : int  29 29 31 30 26 26 27 26 25 28 ...
 $ fl          : chr  "p" "p" "p" "p" ...
 $ class       : chr  "compact" "compact" "compact" "compact" ...

The str function serves as an alternative to the summary function. It displays the internal structure of an R object in a compact manner, with one line per column that shows the column's type and its first few values.
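The difference between the two functions is easiest to see side by side. The following is a minimal sketch on a small hand-built data frame rather than the full dataset. The values are the first four entries of the model, displ, cty, and hwy columns shown above, and the names cars_sample and structure_lines are hypothetical:

```r
# Small sample built from the first four values of selected columns.
cars_sample <- data.frame(
  model = c("a4", "a4", "a4", "a4"),
  displ = c(1.8, 1.8, 2.0, 2.0),
  cty   = c(18L, 21L, 20L, 21L),
  hwy   = c(29L, 29L, 31L, 30L),
  stringsAsFactors = FALSE
)

# str(): one line per column, giving its type and leading values.
str(cars_sample)

# summary(): per-column statistics, such as minimum, quartiles,
# mean, and maximum for the numeric columns.
summary(cars_sample)

# The structure listing can also be captured as text, for example for a log.
structure_lines <- capture.output(str(cars_sample))
length(structure_lines)   # one header line plus one line per column
```

In practice, str is a quick first check that every column was imported with the expected type, while summary is more useful once the types are known and the distribution of values is of interest.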