Tedious text modifications, especially when you are dealing with large data, can often be solved with a few lines of code in a programming language. Below I present an R function that adds a piece of text to the end of every sequence name in a single- or multi-sequence FASTA file. In a FASTA file, the lines that hold the sequence names start with the '>' sign. You can also download the script from here.
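Before looking at the function, here is the core idea on a small in-memory example: mark the lines that start with '>' and append the text to those lines only. The sequences and the suffix "_sampleA" below are made up for illustration; this is a minimal sketch, not a replacement for the function.

```r
# Hypothetical FASTA content held in memory (made-up sequences for illustration)
fasta_lines <- c(
  ">seq1 sample description",
  "ACGTACGTAC",
  "GGTTAACC",
  ">seq2",
  "TTGGCCAA"
)
suffix <- "_sampleA"  # hypothetical text to append

# Header (sequence name) lines are exactly those that start with '>'
is_header <- startsWith(fasta_lines, ">")

# Append the suffix to the header lines only; sequence lines stay untouched
renamed <- fasta_lines
renamed[is_header] <- paste0(renamed[is_header], suffix)

print(renamed)
```

Because the header lines are selected with a plain prefix test and joined with `paste0()`, the appended text is taken literally, so characters such as backslashes in the suffix need no escaping.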

addTextToFastaName <- function(fileIn, fileOut, addText){

    # Read the FASTA file line by line into a character vector.
    # readLines() keeps each line whole, so sequence names containing spaces are not split.
    lines <- readLines(fileIn)

    # The sequence names are the lines that start with '>'; append addText to those lines only
    isHeader <- startsWith(lines, ">")
    lines[isHeader] <- paste0(lines[isHeader], addText)

    # Write the result line by line to the output file
    writeLines(lines, fileOut)
}
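Reading the file with `scan(fileIn, what='character')` deserves a caveat: with its default settings `scan()` splits its input on any whitespace, not just on line breaks. A FASTA header such as ">seq1 example description" is therefore broken into several elements, and only the ">seq1" piece would receive the appended text, while the remaining words would be written back as separate lines. The sketch below, using a made-up temporary file, compares it with `readLines()`, which returns one element per line.

```r
# Write a tiny FASTA file whose header contains spaces (made-up content)
fasta_path <- tempfile(fileext = ".fasta")
writeLines(c(">seq1 example description", "ACGTACGT"), fasta_path)

# scan() with a character 'what' splits on whitespace by default,
# so the single header line becomes several elements
tokens <- scan(fasta_path, what = "character", quiet = TRUE)
print(tokens)  # ">seq1" "example" "description" "ACGTACGT"

# readLines() returns one element per line, keeping the header intact
lines <- readLines(fasta_path)
print(lines)   # ">seq1 example description" "ACGTACGT"

# Remove the temporary file
unlink(fasta_path)
```

If you prefer to keep `scan()`, passing `sep = "\n"` makes it split on line breaks only, but `readLines()` expresses the intent more directly.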

About Me
I am a postdoctoral researcher at the Neuromuscular Disorders Research lab and the Genetic Determinants of Osteoporosis Research lab at the University of Helsinki and Folkhälsan RC. I specialize in bioinformatics, and I am interested in machine learning and multi-omics data analysis. My go-to programming language is R.