
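    In paired-end sequencing, the two mates of a pair are stored in separate fastq files, and their read names may differ by a trailing mate number. For example, in the read 1 fastq file of sample SRR3117565 each read name ends in ".1":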

    @SRR3117565.1.1 1 length=100
    NCAAAACAGCTCTCCCTCCTTTGATCTGATGGTCTGCAGAGGTCCTCAAATCCACACACTGCCACTCTTCAAGACCAACCACTGGGCCTTCTTAATCTCA
    +SRR3117565.1.1 1 length=100
    #1=BDDDDHFHHAHDE?GFEEDHG@HFHEECDCGHE:FDFHD*?DHFD<GEDEGIIIIIA=AFFGACHDH>EHHF>;;B<;A;>=A=??@CCCC>5>>AC
    @SRR3117565.2.1 2 length=100
    NTCCTGACTCACACGCCACAACCATGACTGGCTCAGCTCCCTTAATTCCAGCTTCCCTTACATGACGCAATTCCTTCTCAGATTCGGGTTTTCAGCTGAG
    +SRR3117565.2.1 2 length=100
    #4BDFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIHHFEFEEEEEEDDEDDDDDDDDBDDDDDDDDDDD


    The corresponding read 2 fastq file looks like this (each read name ends in ".2"):

    @SRR3117565.1.2 1 length=100
    TGGTTTTTTTTTTTGTCCCTCAAATTTTTGGACTCCGTAACATCAACCAGTTTGGAGTGGGATGACAGAGAGAATGCCCAATTTTGTGAGGCCCATGATT
    +SRR3117565.1.2 1 length=100
    59(2(3(2=9/>;?/=)))))().8<8))'-).6..8:(',)..)(((((-(53/(,,(((((,(((+(+(+2((((+((+((+23++(+2+++2++(((
    @SRR3117565.2.2 2 length=100
    CTGGCTTGTTATAACGCAAAGCTTGGTTGTTTATGCAACTCTATCTTAAGAACTGCCCAGCCTCAGCTGAAAACCCGAATCTGAGAAGGAATTGCGTCAT
    +SRR3117565.2.2 2 length=100
    CCCFFFFFHHHHHJJJJJJJJJJJJJIJJIJJJJJJJJJJJJJJJJJJJJJJJJIJJJJHIJJJHHHHHFFFFFDDDDDDDDEDDDDDDDDDDDDDDDDB
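
    The following commands fix this so that both fastq files of sample SRR3117565 (the two mates of each pair) have identical read names. On each "@" header line and "+" separator line that starts with "SRR", they remove the trailing ".1" or ".2" together with the rest of the line, so "@SRR3117565.1.1 1 length=100" becomes "@SRR3117565.1":
    nohup sed 's/^\([@+]SRR.*\)\.1.*/\1/' ./SRR3117565_1.fastq > ./SRR3117565_1.correctId.fastq &
    nohup sed 's/^\([@+]SRR.*\)\.2.*/\1/' ./SRR3117565_2.fastq > ./SRR3117565_2.correctId.fastq &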
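
    A minimal sketch of the same renaming done by line position instead of by pattern, followed by a check that the mates line up afterwards. It assumes 4-line fastq records (no wrapped sequence lines) and read names of the form accession.spot.mate, as in the example above. It uses awk instead of sed, and the function names fix_fastq_names and count_name_mismatches are hypothetical:

```shell
#!/bin/sh
# Sketch only: fix_fastq_names and count_name_mismatches are hypothetical
# helper names. Assumes 4-line fastq records and read names of the form
# <accession>.<spot>.<mate>, e.g. @SRR3117565.1.1

# fix_fastq_names IN.fastq
# Prints IN with the mate suffix (".1" or ".2") and the rest of the line
# removed from line 1 (@ header) and line 3 (+ separator) of each record.
# A bare "+" separator is left as it is.
fix_fastq_names() {
    awk '
        NR % 4 == 1 || NR % 4 == 3 {
            # $1 is the read name up to the first whitespace
            if ($1 ~ /^[@+][^.]+\.[0-9]+\.[12]$/) {
                print substr($1, 1, length($1) - 2)
                next
            }
        }
        { print }
    ' "$1"
}

# count_name_mismatches R1.fastq R2.fastq
# Prints how many records have different read names in the two files
# (0 means the mates line up). Assumes both files have the same number
# of records.
count_name_mismatches() {
    awk '
        NR == FNR { if (FNR % 4 == 1) name[FNR] = $1; next }
        FNR % 4 == 1 && name[FNR] != $1 { n++ }
        END { print n + 0 }
    ' "$1" "$2"
}
```

    For example, fix_fastq_names ./SRR3117565_1.fastq > ./SRR3117565_1.correctId.fastq (and likewise for the _2 file), after which count_name_mismatches ./SRR3117565_1.correctId.fastq ./SRR3117565_2.correctId.fastq should print 0. Because the same function handles both mates, one command works for both files.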

    Modify read names in bam files in Linux

    In this example I show how to remove the ":1" and ":2" at the end of the query (read) names, which mark the first and the second read of a pair. If the read names start with "SRR", the following commands can be used:

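    samtools view -h ./file.bam | perl -pe 's/^(SRR[^\t]*):[12]\t/$1\t/' > ./file.sam
    samtools view -bS ./file.sam > ./file_modified.bam

    Here -h keeps the header in the SAM output (samtools needs its @SQ lines to convert the alignments back to BAM), and [^\t]* keeps the match inside the read-name column, so a read name without the suffix cannot cause a later tag such as NH:i:1 to be edited.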

    Or, all together as a single pipeline:

    samtools view -h ./file.bam | perl -pe 's/^(SRR[^\t]*):[12]\t/$1\t/' | samtools view -bS - > ./file_modified.bam
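
    To confirm that no read name still carries the suffix, the alignments can be streamed back out and counted. A minimal sketch; the function name count_suffixed_names is hypothetical, and it expects SAM text on standard input:

```shell
#!/bin/sh
# Sketch only: count_suffixed_names is a hypothetical helper name.
# Reads SAM text on stdin, skips header lines (starting with @), and prints
# how many alignment records still have a read name (column 1) ending in
# ":1" or ":2".
count_suffixed_names() {
    awk -F '\t' '!/^@/ && $1 ~ /:[12]$/ { n++ } END { print n + 0 }'
}
```

    For example, samtools view ./file_modified.bam | count_suffixed_names should print 0 if every suffix was removed.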

    Note that samtools view without -h drops the header, and a substitution applied to header lines could also corrupt them. As a solution, the header can be written out separately and then reattached as follows:
    samtools view -H /netapp/seqRawData/eugeneMouse/DIV0.bam > /netapp/seqRawData/eugeneMouse/DIV0_head.sam
    samtools view /netapp/seqRawData/eugeneMouse/DIV0.bam | perl -pe 's/^([^\t]*):[12]\t/$1\t/' > /netapp/seqRawData/eugeneMouse/DIV0M.sam
    cat /netapp/seqRawData/eugeneMouse/DIV0_head.sam /netapp/seqRawData/eugeneMouse/DIV0M.sam | samtools view -bS - > /netapp/seqRawData/eugeneMouse/DIV0M.bam
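
    Alternatively, the header does not have to be split off if the edit is restricted to the read-name column. The following sketch swaps perl for awk (a substitute technique, not the commands above): it strips a trailing ":1" or ":2" from column 1 only and passes header lines, which start with "@", through unchanged. The function name strip_mate_suffix is hypothetical:

```shell
#!/bin/sh
# Sketch only: strip_mate_suffix is a hypothetical helper name.
# Reads SAM text on stdin and writes SAM text on stdout.
# Header lines are printed unchanged; for alignment lines, only column 1
# (the read name) is edited, and the line is rebuilt with tab separators.
strip_mate_suffix() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^@/ { print; next }              # header line: pass through unchanged
         { sub(/:[12]$/, "", $1); print }  # edit column 1 only
    '
}
```

    For example, assuming samtools is on the PATH: samtools view -h DIV0.bam | strip_mate_suffix | samtools view -b -o DIV0M.bam -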
