An example of a read 1 fastq file (from paired-end sequencing) is as follows:

@SRR3117565.1.1 1 length=100
NCAAAACAGCTCTCCCTCCTTTGATCTGATGGTCTGCAGAGGTCCTCAAATCCACACACTGCCACTCTTCAAGACCAACCACTGGGCCTTCTTAATCTCA
+SRR3117565.1.1 1 length=100
#1=BDDDDHFHHAHDE?GFEEDHG@HFHEECDCGHE:FDFHD*?DHFD<GEDEGIIIIIA=AFFGACHDH>EHHF>;;B<;A;>=A=??@CCCC>5>>AC
@SRR3117565.2.1 2 length=100
NTCCTGACTCACACGCCACAACCATGACTGGCTCAGCTCCCTTAATTCCAGCTTCCCTTACATGACGCAATTCCTTCTCAGATTCGGGTTTTCAGCTGAG
+SRR3117565.2.1 2 length=100
#4BDFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIHHFEFEEEEEEDDEDDDDDDDDBDDDDDDDDDDD


An example of the corresponding read 2 fastq file is as follows:

@SRR3117565.1.2 1 length=100
TGGTTTTTTTTTTTGTCCCTCAAATTTTTGGACTCCGTAACATCAACCAGTTTGGAGTGGGATGACAGAGAGAATGCCCAATTTTGTGAGGCCCATGATT
+SRR3117565.1.2 1 length=100
59(2(3(2=9/>;?/=)))))().8<8))'-).6..8:(',)..)(((((-(53/(,,(((((,(((+(+(+2((((+((+((+23++(+2+++2++(((
@SRR3117565.2.2 2 length=100
CTGGCTTGTTATAACGCAAAGCTTGGTTGTTTATGCAACTCTATCTTAAGAACTGCCCAGCCTCAGCTGAAAACCCGAATCTGAGAAGGAATTGCGTCAT
+SRR3117565.2.2 2 length=100
CCCFFFFFHHHHHJJJJJJJJJJJJJIJJIJJJJJJJJJJJJJJJJJJJJJJJJIJJJJHIJJJHHHHHFFFFFDDDDDDDDEDDDDDDDDDDDDDDDDB

The following script fixes the read names so that both fastq files (corresponding to the paired sequencing reads) of sample SRR3117565 have matching read names. It removes the trailing ".1" or ".2" mate suffix, and everything after it, from the "@" and "+" lines:

nohup sed 's/^\([@+]SRR.*\)\.1.*/\1/' ./SRR3117565_1.fastq > ./SRR3117565_1.correctId.fastq &
nohup sed 's/^\([@+]SRR.*\)\.2.*/\1/' ./SRR3117565_2.fastq > ./SRR3117565_2.correctId.fastq &
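
To confirm that the renaming worked, the header lines of the two corrected files can be compared. The following is only a minimal sketch on a toy pair of files built from the records above, with sequences and qualities shortened to four bases. The file names toy_1.fastq and toy_2.fastq and the temporary directory are made up for illustration.

```shell
# Toy input in a temporary directory (hypothetical file names, shortened reads).
tmp=$(mktemp -d)
printf '%s\n' '@SRR3117565.1.1 1 length=4' 'NCAA' '+SRR3117565.1.1 1 length=4' '#1=B' \
              '@SRR3117565.2.1 2 length=4' 'NTCC' '+SRR3117565.2.1 2 length=4' '#4BD' > "$tmp/toy_1.fastq"
printf '%s\n' '@SRR3117565.1.2 1 length=4' 'TGGT' '+SRR3117565.1.2 1 length=4' '59(2' \
              '@SRR3117565.2.2 2 length=4' 'CTGG' '+SRR3117565.2.2 2 length=4' 'CCCF' > "$tmp/toy_2.fastq"

# Drop the mate suffix (".1" or ".2") and everything after it on "@" and "+" lines.
sed 's/^\([@+]SRR.*\)\.1.*/\1/' "$tmp/toy_1.fastq" > "$tmp/toy_1.correctId.fastq"
sed 's/^\([@+]SRR.*\)\.2.*/\1/' "$tmp/toy_2.fastq" > "$tmp/toy_2.correctId.fastq"

# Header lines are lines 1, 5, 9, ... of a four-line-per-record fastq file.
names_1=$(awk 'NR % 4 == 1' "$tmp/toy_1.correctId.fastq")
names_2=$(awk 'NR % 4 == 1' "$tmp/toy_2.correctId.fastq")

if [ "$names_1" = "$names_2" ]; then
    echo "read names match"
else
    echo "read names differ"
fi
```

On real data the same awk filter can be pointed at the two correctId files. The comparison assumes that both files list the reads in the same order.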

Modify read names in bam files in Linux

In this example I show how to remove the ":1" and ":2" suffixes at the end of the query/read names, which mark the first and the second read of a pair. If the read/query names start with "SRR", the following commands can be used:

samtools view -h ./file.bam | perl -pe 's/^(SRR[^\t]*):[12]\t/$1\t/' > ./file.sam
cat ./file.sam | samtools view -bS - > ./file_modified.bam

Alternatively, both steps can be run as a single pipeline:

samtools view -h ./file.bam | perl -pe 's/^(SRR[^\t]*):[12]\t/$1\t/' | samtools view -bS - > ./file_modified.bam

A more general pattern that is not tied to the "SRR" prefix may mix up the header of the bam file if header lines pass through the substitution. As a solution, the header can be separated and then reattached as follows. The commands are run one after another, because the last step needs both intermediate files to be complete:

samtools view -H /netapp/seqRawData/eugeneMouse/DIV0.bam > /netapp/seqRawData/eugeneMouse/DIV0_head.sam
samtools view /netapp/seqRawData/eugeneMouse/DIV0.bam | perl -pe 's/^([^\t]*):[12]\t/$1\t/' > /netapp/seqRawData/eugeneMouse/DIV0M.sam
cat /netapp/seqRawData/eugeneMouse/DIV0_head.sam /netapp/seqRawData/eugeneMouse/DIV0M.sam | samtools view -bS - > /netapp/seqRawData/eugeneMouse/DIV0M.bam

About Me
I am a Postdoc researcher at the Neuromuscular Disorders Research lab and Genetic Determinants of Osteoporosis Research lab, in University of Helsinki and Folkhälsan RC. I specialize in Bioinformatics. I am interested in Machine learning and multi-omics data analysis. My go-to programming language is R.