
Examining, cleaning, and filtering data
After importing the data, the next step is to examine it and check for missing or erroneous values. We then clean the data and apply filters and selections. Different kinds of datasets call for different approaches to these steps. R has powerful packages for this work; some of them are as follows:
- dplyr: dplyr provides a consistent set of verbs, such as filter(), select(), mutate(), arrange(), and summarise(), that make examining, cleaning, and filtering data fast and easy.
- tidyr: The tidyr package helps turn messy data into tidy data (one variable per column, one observation per row) and handles missing values, which makes later analysis easier.
- stringr: The stringr package provides consistent functions for working with string data, such as detecting, extracting, and replacing patterns or trimming whitespace.
- forcats: Factors (R's type for categorical variables) are widely used in data analysis in R. The forcats package makes it easy to work with them, for example to reorder, recode, or collapse their levels.
- lubridate: lubridate makes wrangling date-time data quick and easy.
- hms: The hms package provides a class for storing time-of-day values, such as 09:30:00, which makes it well suited to datasets that include them.
- blob: Not all data comes stored as plain ASCII text; sometimes you have to deal with binary data formats. The blob package makes this easy by providing a vector class for binary objects.
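The examine, clean, and filter steps described above can be sketched as a single pipeline. This is a minimal sketch, assuming dplyr, tidyr, stringr, forcats, lubridate, and hms are installed; the data frame raw and its columns (city, visited, arrival, spend) are hypothetical, invented purely for illustration. The hms functions are called with the hms:: prefix because lubridate also exports an unrelated hms() function.

```r
# Minimal sketch; assumes dplyr, tidyr, stringr, forcats, lubridate and hms are installed.
library(dplyr)
library(tidyr)
library(stringr)
library(forcats)
library(lubridate)
# hms is used via hms:: because lubridate also exports a different hms() function.

# Hypothetical messy data, invented for illustration only
raw <- tibble(
  id      = 1:6,
  city    = c(" London", "paris ", "London", NA, "Berlin", "PARIS"),
  visited = c("2021-03-01", "2021-03-15", "2021-04-02",
              "2021-04-10", "not recorded", "2021-05-20"),
  arrival = c("09:30:00", "14:05:00", "08:45:00",
              "10:00:00", "16:20:00", "11:15:00"),
  spend   = c(120, 85, NA, 40, 60, 200)
)

# 1. Examine: column types, a preview of the values, and missing values per column
glimpse(raw)
print(colSums(is.na(raw)))

# 2. Clean: normalise text, parse dates and times, then drop incomplete rows
clean <- raw %>%
  mutate(
    city    = str_to_title(str_trim(city)),  # stringr: " London", "PARIS" -> "London", "Paris"
    visited = ymd(visited, quiet = TRUE),    # lubridate: unparseable "not recorded" becomes NA
    arrival = hms::parse_hms(arrival)        # hms: store time-of-day values
  ) %>%
  drop_na(city, visited) %>%                 # tidyr: drop rows missing a city or a valid date
  mutate(city = fct_infreq(factor(city)))    # forcats: order factor levels by frequency

# 3. Filter and select: spend above 50 and arrival before noon, largest spend first.
#    filter() also drops rows where the condition is NA, e.g. a missing spend.
filtered <- clean %>%
  filter(spend > 50, arrival < hms::hms(hours = 12)) %>%
  select(id, city, visited, spend) %>%
  arrange(desc(spend))

print(filtered)
```

Note that colSums(is.na(raw)) reports only the missing city and spend values: the malformed date "not recorded" is an erroneous value rather than a missing one, and it only surfaces as NA once ymd() tries to parse it. This is why it pays to check for missing values again after each cleaning step.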