Yesterday (on the opening day of the new Batman movie) I searched the Internet for the Batman formula and its implementations in R. I found several links that used the ggplot2 library; however, only one worked with the latest version of the package. With minor changes, this is the result that I got for the Batman curve.
To distinguish the code for the individual curves, you can run this script in R, which will result in the following image.
In this post I show how groupScatterPlot(), a function of the rnatoolbox R package, can be used for plotting the individual values in several groups together with their mean (or other statistics). I think this is a useful function for plotting grouped data when some groups (or all groups) have few data points! You may be wondering why such a function is included in the rnatoolbox package. Well, I happen to use it quite a bit for plotting expression values of different groups of genes/transcripts in a sample, or expression levels of a specific gene/transcript in several sample groups. These expression values are either FPKM, TPM, LCPM, or PSI values (maybe I should go through these different normalizations later in a different post 😐!). But of course its application is not restricted to gene expression or RNAseq data analysis.
In this post I show how classifySex(), a function of the rnatoolbox R package, can be used for inferring the sex of the studied subjects from their binary alignment (BAM) files. Sex can be a source of unwanted variation within the data, for which you may want to adjust your differential gene expression or splicing analysis. However, complete metadata are unfortunately not always available. Furthermore, details within the metadata are sometimes incorrect or have been misplaced due to manual error. Therefore, it is good practice to quickly double-check some details within the data, either to complete the missing metadata or to make sure that the prior stages have been performed without any accidental mix-ups. For muscle tissues, this proved useful on our ribo-depleted RNAseq data.
NOTE! Earlier, the function referred to in this post had a different name (i.e. getGender). Since version 0.2.1, classifySex() is used.
Recently I have started to organize my commonly used functions for quality assessment and analysis of RNAseq data into an R package. It is called rnatoolbox and it is available here. In this post I introduce getMappedReadsCount(), a function that can be used for checking the number of aligned/mapped fragments in several BAM files and detecting the outliers. The outliers are the BAM files with an oddly high (i.e. exceeding 1.5 times the interquartile range) or oddly low (i.e. lower than 1.5 times the interquartile range) number of mapped fragments.
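To make the outlier criterion concrete, here is a minimal base R sketch of the usual 1.5 × IQR rule (Tukey's fences). It does not call rnatoolbox, the counts are made up for illustration, and whether getMappedReadsCount() applies exactly this rule is an assumption.

```r
# Hypothetical mapped-fragment counts for six BAM files (made-up numbers).
mapped <- c(s1 = 21.5e6, s2 = 23.1e6, s3 = 22.4e6,
            s4 = 20.9e6, s5 = 2.3e6, s6 = 22.8e6)

# Quartiles and interquartile range of the counts.
q1 <- unname(quantile(mapped, 0.25))
q3 <- unname(quantile(mapped, 0.75))
iqr <- IQR(mapped)

# Tukey's fences: values more than 1.5 * IQR beyond the quartiles are flagged.
too_low <- mapped < q1 - 1.5 * iqr
too_high <- mapped > q3 + 1.5 * iqr

outliers <- names(mapped)[too_low | too_high]
print(outliers)
```

With these made-up counts only s5, the file with far fewer mapped fragments than the rest, falls outside the fences.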
Many times in our projects we need to compare different measured factors in our samples to one another and study whether they are linearly dependent. This information can also help us detect covariates and factors that affect our studies but whose effects we would like to adjust for or remove (more on this some time later). Here, I mention several functions that can be used to perform correlation tests. All of these functions support both the Pearson and the rank-based (Spearman) methods. Note that at the end of this post I will focus on these two methods (i.e. Pearson vs Spearman) and show how they differ in application.
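As a quick illustration of the difference between the two methods, the sketch below uses cor.test() from base R's stats package on made-up data where y increases monotonically but not linearly with x. The variables are hypothetical and only serve the example.

```r
# Made-up data: y grows monotonically, but far from linearly, with x.
x <- 1:10
y <- exp(x)

# Pearson measures linear association between the raw values.
pearson <- cor.test(x, y, method = "pearson")

# Spearman measures monotonic association using the ranks of the values.
spearman <- cor.test(x, y, method = "spearman")

print(pearson$estimate)
print(spearman$estimate)
```

Because the ranks of x and y agree perfectly, the Spearman coefficient is 1, while the Pearson coefficient is lower because the relationship is not a straight line.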
Occasionally, when indexing data frames, the format is converted, leading to confusing consequences. For instance, when indexing to select a single column, the result is a 'numeric' or 'integer' vector rather than a data frame. The following demonstrates this:
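A minimal sketch of this behaviour in base R, using a small made-up data frame (this is an illustration, not the original example from the post):

```r
# A small made-up data frame with an integer and a numeric column.
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))

# Matrix-style indexing of a single column drops the data frame structure.
class(df[, "a"])                # "integer"

# drop = FALSE keeps the result as a one-column data frame.
class(df[, "a", drop = FALSE])  # "data.frame"

# List-style indexing with single brackets also returns a data frame.
class(df["a"])                  # "data.frame"
```

Passing drop = FALSE is the safer choice in code that selects columns programmatically, since the result stays a data frame regardless of how many columns are selected.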
An example of a FASTQ file for read 1 (in paired-end sequencing) is as follows:
@SRR3117565.1.1 1 length=100
NCAAAACAGCTCTCCCTCCTTTGATCTGATGGTCTGCAGAGGTCCTCAAATCCACACACTGCCACTCTTCAAGACCAACCACTGGGCCTTCTTAATCTCA
+SRR3117565.1.1 1 length=100
#1=BDDDDHFHHAHDE?GFEEDHG@HFHEECDCGHE:FDFHD*?DHFDEHHF>;;B<;A;>=A=??@CCCC>5>>AC
@SRR3117565.2.1 2 length=100
NTCCTGACTCACACGCCACAACCATGACTGGCTCAGCTCCCTTAATTCCAGCTTCCCTTACATGACGCAATTCCTTCTCAGATTCGGGTTTTCAGCTGAG
+SRR3117565.2.1 2 length=100
#4BDFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIHHFEFEEEEEEDDEDDDDDDDDBDDDDDDDDDDD
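Each record in the example above spans four lines: a header starting with @, the sequence, a separator starting with +, and one quality character per base. The base R sketch below splits such lines into records and decodes the quality characters. It uses a shortened record (the first ten bases and quality characters of the first read, with a hypothetical header) and assumes the common Phred+33 encoding.

```r
# A shortened FASTQ record held in a character vector for illustration;
# a real file could be read with readLines(). The header name is hypothetical.
fastq_lines <- c("@read1", "NCAAAACAGC", "+read1", "#1=BDDDDHF")

# Arrange the lines into one row per record, four fields per row.
records <- matrix(fastq_lines, ncol = 4, byrow = TRUE,
                  dimnames = list(NULL,
                                  c("header", "sequence", "separator", "quality")))

# Assuming Phred+33 encoding: quality score = ASCII code of the character - 33.
phred <- utf8ToInt(records[1, "quality"]) - 33L
print(phred)
```

The leading # corresponds to a quality score of 2, which matches the uncalled base N at the start of the read.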