Bioinformatics with Python Cookbook

How to do it...

This recipe is an aggressive simplification of the previous one, and it illustrates the conciseness and elegance of R magics:

  1. The first thing you need to do is load R magics and ggplot2:
import rpy2.robjects as robjects
import rpy2.robjects.lib.ggplot2 as ggplot2
%load_ext rpy2.ipython

Note that the % prefix starts an IPython-specific directive (a magic). As a simple example, you can write %R print(c(1, 2)) in a Jupyter cell.

Notice how easy it is to execute R code without calling the robjects package directly. rpy2 is still being used under the hood.

  2. Let's read the sequence.index file that was downloaded in the previous recipe:
%%R
seq.data <- read.delim('sequence.index', header=TRUE, stringsAsFactors=FALSE)
seq.data$READ_COUNT <- as.integer(seq.data$READ_COUNT)
seq.data$BASE_COUNT <- as.integer(seq.data$BASE_COUNT)

You can specify that the whole cell should be interpreted as R code by using %%R (note the double %).

  3. We can now transfer the variable to the Python namespace:
seq_data = %R seq.data
print(type(seq_data)) # pandas dataframe!

The returned object is not a built-in Python type but a pandas DataFrame. This is a departure from previous versions of the R magic interface.

  4. As we have a pandas DataFrame, we can operate on it quite easily using pandas' interface:
my_col = list(seq_data.columns).index("CENTER_NAME")
seq_data['CENTER_NAME'] = seq_data['CENTER_NAME'].apply(lambda x: x.upper())
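
As a minimal sketch of the same normalization outside the notebook, the snippet below uppercases CENTER_NAME on a small stand-in DataFrame. The rows are invented for illustration and are not taken from sequence.index. It uses the vectorized .str accessor, which passes missing values through, whereas apply with x.upper() would fail on them.

```python
import pandas as pd

# Hypothetical stand-in for seq_data; these rows are made up for illustration.
toy = pd.DataFrame({
    "CENTER_NAME": ["bi", "BI", "Sanger", None],
    "READ_COUNT": [10, 20, 30, 40],
})

# Vectorized counterpart of .apply(lambda x: x.upper());
# missing values stay missing instead of raising an AttributeError.
toy["CENTER_NAME"] = toy["CENTER_NAME"].str.upper()

print(toy)
```

Whether this matters for the real file depends on whether CENTER_NAME contains missing entries, which is an assumption here rather than something the recipe states.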

  5. Let's put this DataFrame back in the R namespace, as follows:
%R -i seq_data
%R print(colnames(seq_data))

The -i argument tells the magic system that the Python variable that follows is to be copied into the R namespace. The second line just shows that the DataFrame is indeed available in R. Note that the name is different from the original: it's seq_data instead of seq.data.
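
One likely reason for the different name is that a dot is legal inside an R variable name but not inside a Python identifier. The following standard-library sketch illustrates the check. The helper to_python_name is hypothetical and is not part of rpy2 or IPython.

```python
def to_python_name(r_name: str) -> str:
    """Hypothetical helper: turn an R-style dotted name into a Python identifier."""
    candidate = r_name.replace(".", "_")
    if not candidate.isidentifier():
        raise ValueError(f"cannot derive a Python identifier from {r_name!r}")
    return candidate


# A dotted R name is not a valid Python variable name...
print("seq.data".isidentifier())
# ...but the underscore version is.
print(to_python_name("seq.data"))
```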

  6. Let's do some final cleanup (for details, see the previous recipe) and print the same bar chart as before:
%%R
bar <- ggplot(seq_data) + aes(factor(CENTER_NAME)) + geom_bar() + theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(bar)

The R magic system also lets you write less code, as it changes how R interacts with IPython. For example, in the ggplot2 code of the previous recipe, you no longer need the png and dev.off R functions, as the magic system takes care of this for you. When you tell R to print a chart, it will magically appear in your Notebook or graphical console.
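
To see what the bar chart is counting, and why the center names were uppercased first, here is a rough standard-library sketch. It assumes geom_bar() is used with its default count statistic, as in the code above, so each bar's height is the number of rows per CENTER_NAME value. The center names below are invented for illustration.

```python
from collections import Counter

# Hypothetical center names, invented for illustration (not from sequence.index).
center_names = ["bi", "BI", "Sanger", "SANGER", "sanger"]

# Without normalization, each distinct spelling would get its own bar.
raw_counts = Counter(center_names)

# After uppercasing, the spellings collapse into one bar per center.
normalized_counts = Counter(name.upper() for name in center_names)

print(len(raw_counts), "bars before,", len(normalized_counts), "bars after")
```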