Painstaking text modifications, especially on large data files, can often be handled with a few lines of script. Below I give an R function that appends a text to the end of the sequence names in a single- or multi-sequence fasta file. In a fasta file, the lines that hold the sequence names start with the '>' sign. You can also download the script from here.

addTextToFastaName<-function(fileIn, fileOut, addText){

    # Read the fasta file line by line into a character vector.
    # readLines() is used instead of scan(), because scan() splits on any
    # whitespace and would break sequence names that contain spaces.
    tmp=readLines(fileIn)

    # Append addText to each element that starts with '>', i.e. the sequence names
    res= sub('(^>.*)', paste('\\1',addText,sep=""), tmp)

    # Write the result line by line to the output file
    cat(res, file=fileOut, sep='\n')
}
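To see the idea in action, here is a minimal, self-contained sketch that can be pasted into an R session. It is not the downloadable script: the function name addSuffixToFastaNames, the two-record test file and the suffix "_sample1" are all made up for illustration, and the variant assumes that every sequence name sits on a single line starting with '>'.

```r
# Hypothetical line-based variant of the function, for illustration only
addSuffixToFastaNames <- function(fileIn, fileOut, addText) {
    # readLines() returns one element per line, so names with spaces stay intact
    lines <- readLines(fileIn)

    # Logical index of the lines that hold sequence names
    isHeader <- startsWith(lines, ">")

    # Append the text to the sequence names only; sequence lines are untouched
    lines[isHeader] <- paste0(lines[isHeader], addText)

    writeLines(lines, fileOut)
}

# Made-up two-record fasta file, written to a temporary location
fileIn <- tempfile(fileext = ".fa")
fileOut <- tempfile(fileext = ".fa")
writeLines(c(">seq1 first record", "ACGTACGT", ">seq2", "GGCCTTAA"), fileIn)

addSuffixToFastaNames(fileIn, fileOut, "_sample1")

# Read the output back to check which lines were changed
result <- readLines(fileOut)
print(result)
```

Indexing the name lines with startsWith() avoids regular expressions altogether, so an addText that contains a backslash is appended literally rather than being interpreted as part of a replacement pattern.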


About Me
I am a Postdoc researcher at the Neuromuscular Disorders Research lab and Genetic Determinants of Osteoporosis Research lab, in University of Helsinki and Folkhälsan RC. I specialize in Bioinformatics. I am interested in Machine learning and multi-omics data analysis. My go-to programming language is R.