R for all data visualisation in biology

About me

My name is Dorien Pastoors. I'm a PhD student at Erasmus MC in Rotterdam, and on this website I share tips and tutorials on data visualisation in R. During my PhD I'm working on gene regulation in leukemia.

ggplot for all biology data visualisation

Many people know they can use R and ggplot to visualise large datasets. However, they are often also a much better fit for analysing practically any data you generate in a lab!

Advantages of using ggplot

  • Data grouping is a lot easier in R than in Prism, which saves time and makes mistakes much less likely

  • Beautiful figures that are more customizable
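
To give an idea of what this grouping looks like in practice, here is a minimal sketch. The data frame, its column names and the values are made up for illustration, and I'm assuming the dplyr and ggplot2 packages are installed.

```r
library(dplyr)
library(ggplot2)

# Made-up measurements: 2 cell lines x 2 treatments x 3 replicates
measurements <- data.frame(
  cell_line = rep(c("line_1", "line_2"), each = 6),
  treatment = rep(rep(c("control", "drug"), each = 3), times = 2),
  value     = c(1.0, 1.1, 0.9, 0.5, 0.6, 0.4, 2.0, 2.2, 1.8, 1.9, 2.1, 2.0)
)

# One grouped summary instead of copying columns around by hand
summary_table <- measurements %>%
  group_by(cell_line, treatment) %>%
  summarise(mean_value = mean(value), sd_value = sd(value), n = n(),
            .groups = "drop")

# Individual replicates as points, group means as horizontal bars
p <- ggplot(measurements, aes(x = treatment, y = value)) +
  geom_jitter(width = 0.1, height = 0) +
  stat_summary(fun = mean, fun.min = mean, fun.max = mean,
               geom = "crossbar", width = 0.4) +
  facet_wrap(~ cell_line) +
  theme_classic()

print(summary_table)
print(p)
```

Adding another grouping variable (a time point, for example) only means adding one column and one name in `group_by()`, which is where most of the time saving comes from.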

It can be quite hard to work out which packages to use when you get started, so on this page I've collected some tutorials I've made, based on analyses I used to do in Prism/Excel/FlowJo/...

All files you need to do the tutorials yourself are posted on my GitHub. If you're unfamiliar with GitHub, here are some instructions to help you get started.

Gating of flow cytometry data in FlowJo and visualisation with ggplot/R

While FlowJo is great for making and inspecting manual gates, its layout editor can be frustrating at times. However, doing the gating in FlowJo and the rest with ggplot is not difficult at all, as I hope to show you here!

To conveniently plot flow cytometry data with R, we would like to use gating created in external software, such as FlowJo workspaces, and combine this with all the plotting options from ggplot.

Using this, you can:

  • Make multi-graph overlays as shown on the right

  • Easily arrange and annotate your dotplots and show gates
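
As a rough sketch of the idea (not necessarily how the tutorial itself does it), the Bioconductor packages CytoML, flowWorkspace and ggcyto can read a FlowJo workspace and plot a gate with ggplot2-style graphics. The function name `plot_flowjo_gate`, its arguments, and the file and gate names in the usage comment are hypothetical, and I'm assuming all three packages are installed.

```r
library(CytoML)         # parses FlowJo workspaces
library(flowWorkspace)  # GatingSet objects holding events and gates
library(ggcyto)         # ggplot2-based plotting for cytometry data

# wsp_file, fcs_dir and gate are placeholders for your own files and gate names
plot_flowjo_gate <- function(wsp_file, fcs_dir, gate) {
  ws <- open_flowjo_xml(wsp_file)
  # name = 1 picks the first sample group in the workspace (an assumption;
  # change it to the group that holds your samples)
  gs <- flowjo_to_gatingset(ws, name = 1, path = fcs_dir)
  # List the gates as R sees them, handy for finding the right gate name
  print(gs_get_pop_paths(gs, path = "auto"))
  # Plot the gate on top of its parent population, one panel per sample
  autoplot(gs, gate, bins = 64)
}

# Hypothetical usage:
# plot_flowjo_gate("my_experiment.wsp", "fcs_files", "Live cells")
```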


The drawing that could have been a graph

When you use a drawing instead of a graph, you will almost always lose information that is contained in the underlying data (simplification is often the point of using a drawing in the first place). This tutorial is an example of this dilemma: I first wanted to make an illustration showing the similarity between two proteins, but then decided this could, and in fact should, be a graph.

While there are probably a thousand ways to visualise protein alignments already out there, I could not find one that was simple enough for an introductory slide, so I decided to make my own. I wanted to see in one view where two proteins diverge in sequence and how this relates to the location of their functional domains. You can of course show a domainogram of two proteins and just say “they are most conserved within their functional domains” or “domain X is not conserved but domain Y is”, but you can also actually visualise this! For me, this also made it much clearer to myself.
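
To make the idea concrete, here is a minimal sketch of one way to draw such a graph. It is not the code from the tutorial: the two sequences, the positions of the differences, the window size and the domain coordinates are all made up, and the sequences are assumed to be already aligned and of equal length.

```r
library(ggplot2)

# Made-up protein A; protein B is a copy with some invented differences
aa_a <- strsplit("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQ", "")[[1]]
aa_b <- aa_a
aa_b[21:30]    <- strsplit("PPSTNNKDWY", "")[[1]]  # a diverged stretch
aa_b[c(4, 40)] <- c("S", "T")                      # two point differences

# 1 where the residues are identical, 0 where they differ
is_identical <- as.numeric(aa_a == aa_b)

# Smooth with a centred sliding window (NA at the ends where the window does not fit)
window_size <- 5
conservation <- data.frame(
  position = seq_along(is_identical),
  identity = as.numeric(stats::filter(is_identical,
                                      rep(1 / window_size, window_size),
                                      sides = 2))
)

# Hypothetical functional domains, in alignment coordinates
domains <- data.frame(
  name  = c("domain X", "domain Y"),
  start = c(2, 35),
  end   = c(18, 50)
)

# Domains as shaded bands, conservation as a line on top
p <- ggplot() +
  geom_rect(data = domains,
            aes(xmin = start, xmax = end, ymin = 0, ymax = 1, fill = name),
            alpha = 0.3) +
  geom_line(data = conservation, aes(x = position, y = identity),
            na.rm = TRUE) +
  labs(x = "Position in alignment", y = "Fraction identical (sliding window)",
       fill = NULL) +
  theme_classic()

print(p)
```

With real proteins you would first align the sequences with a tool of your choice and take the domain coordinates from an annotation database, then feed both into the same plot.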


+ data import from 96-well plate readers

Dose-response data is frequently collected in Excel files from colorimetric plate readers. Getting it into GraphPad requires quite a lot of paste-transpose! This takes a lot of time, and the more often you have to do it, the more room there is for error.

Using this script, you can:

  • Import data directly from an Excel file in 96-well format, together with an Excel plate layout

  • Easily group your data, normalise, make heatmaps and identify outliers using dplyr

  • Calculate the IC50 using the drc package.
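
The steps above can be sketched roughly as follows. This is not the tutorial script itself: to keep it self-contained, the plate and its layout are simulated as two 8 x 12 matrices (with real data you would read them from the Excel file, for example with `readxl::read_excel()` and a cell range). The doses, the simulated curve and the helper `plate_to_long` are assumptions for illustration, and I'm assuming dplyr, tidyr, ggplot2 and drc are installed.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(drc)

set.seed(1)

# Made-up layout: one dose per plate column, 8 replicate wells (rows A-H)
doses <- c(0, 0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30, 100, 300, 1000)
layout <- matrix(rep(doses, each = 8), nrow = 8,
                 dimnames = list(LETTERS[1:8], as.character(1:12)))

# Simulated absorbance: sigmoid curve with a true IC50 of 5, plus noise
plate <- 0.1 + 1.1 / (1 + layout / 5) + rnorm(96, sd = 0.03)

# Turn a plate-shaped matrix into long format: one row per well
plate_to_long <- function(m, value_name) {
  as.data.frame(m) %>%
    mutate(plate_row = rownames(m)) %>%
    pivot_longer(-plate_row, names_to = "plate_col", values_to = value_name) %>%
    mutate(plate_col = as.integer(plate_col))
}

# Combine measurements with the layout, then normalise to the untreated wells
wells <- left_join(plate_to_long(plate, "absorbance"),
                   plate_to_long(layout, "dose"),
                   by = c("plate_row", "plate_col")) %>%
  mutate(viability = absorbance / mean(absorbance[dose == 0]))

# Heatmap in plate orientation, useful for spotting odd wells
plate_heatmap <- ggplot(wells, aes(x = factor(plate_col), y = plate_row,
                                   fill = absorbance)) +
  geom_tile() +
  scale_y_discrete(limits = rev) +
  labs(x = "Column", y = "Row")

# Four-parameter log-logistic fit; ED(..., 50) gives the estimated IC50
fit <- drm(viability ~ dose, data = as.data.frame(wells),
           fct = LL.4(names = c("slope", "lower", "upper", "IC50")))
ic50 <- ED(fit, 50, interval = "delta", display = FALSE)

print(plate_heatmap)
print(ic50)
```

Note that the value reported here is the dose halfway between the fitted lower and upper plateau, which is not the same as the dose giving 50% of the untreated signal when the curve does not go down to zero.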


How to get started in visualising your own data in R?

Obviously there are many different types of data you might collect and visualise using ggplot that are not covered anywhere else. This page gives some tips and tricks on how to get started. The most important, and often most difficult, first step is importing your data. For most lab-based experiments, data is collected in ways that are not immediately compatible with import into R. For example, plate reader data often comes as a plate layout, where each well corresponds to a specific sample, while flow cytometry data has one file per sample. For most datasets, I think the following 5 tips are very important.

  • Learn from packages + google everything [ probably someone already tried to do the same ]

  • Use a design table to keep an overview of which file is coming from where and facilitate data import

  • Collect your data in long format, if possible

  • Check your input data

  • Write functions to import multiple files
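
Several of these tips fit together in a few lines of code. Below is a minimal sketch, assuming one CSV file per experiment: the file names, column names, values and the helper `import_one` are all made up, and the two example files are written to a temporary folder only so the sketch can run on its own.

```r
library(dplyr)

# Create two tiny example files (made-up values) in a temporary folder
data_dir <- tempfile("example_data_")
dir.create(data_dir)
write.csv(data.frame(well = c("A1", "A2"), signal = c(0.91, 0.45)),
          file.path(data_dir, "exp1.csv"), row.names = FALSE)
write.csv(data.frame(well = c("A1", "A2"), signal = c(0.88, 0.51)),
          file.path(data_dir, "exp2.csv"), row.names = FALSE)

# Design table: one row per file, with everything you know about that file
design <- data.frame(
  file      = c("exp1.csv", "exp2.csv"),
  cell_line = c("line_1", "line_2"),
  date      = c("2020-01-10", "2020-01-17"),
  stringsAsFactors = FALSE
)

# Check your input: every file in the design table should exist
stopifnot(all(file.exists(file.path(data_dir, design$file))))

# Import one file and attach its annotation from the design table
import_one <- function(file, cell_line, date) {
  d <- read.csv(file.path(data_dir, file))
  d$source_file <- file
  d$cell_line   <- cell_line
  d$date        <- as.Date(date)
  d
}

# Apply the function to every row of the design table and stack the results
# into one long table: one row per measurement
all_data <- bind_rows(Map(import_one, design$file, design$cell_line, design$date))

print(all_data)
```

Because every measurement carries its own annotation, the resulting long table can go straight into `group_by()` and `ggplot()` without further rearranging.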