Trimmomatic SE (single-end): low-quality trimming for SE data

This analysis module wraps Trimmomatic, a read-trimming tool for Illumina high-throughput sequencing data that supports both paired-end and single-end reads.


Trimmomatic provides the following functions:

•  ILLUMINACLIP: Cut adapter and other Illumina-specific sequences from the read (removes adapter contamination).

•  SLIDINGWINDOW: Perform sliding-window trimming, cutting once the average quality within a fixed-size window falls below a threshold (removes low-quality regions).

•  MINLEN: Drop the read if it is shorter than a specified length.

•  LEADING: Cut bases off the start of a read if they are below a threshold quality.

•  TRAILING: Cut bases off the end of a read if they are below a threshold quality.

•  CROP: Cut the read to a specified length by removing bases from its end.

•  HEADCROP: Cut a specified number of bases from the start of the read.
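The trimming steps listed above can be illustrated with a minimal Python sketch that operates on a list of per-base quality scores. This is a simplified illustration, not Trimmomatic's actual implementation: all function names are hypothetical, and the sliding-window rule here is assumed to cut at the start of the first window whose mean quality falls below the threshold.

```python
from typing import List, Optional


def leading(quals: List[int], threshold: int) -> List[int]:
    """LEADING: drop bases from the start while their quality is below threshold."""
    i = 0
    while i < len(quals) and quals[i] < threshold:
        i += 1
    return quals[i:]


def trailing(quals: List[int], threshold: int) -> List[int]:
    """TRAILING: drop bases from the end while their quality is below threshold."""
    j = len(quals)
    while j > 0 and quals[j - 1] < threshold:
        j -= 1
    return quals[:j]


def crop(quals: List[int], length: int) -> List[int]:
    """CROP: keep at most `length` bases, counted from the start."""
    return quals[:length]


def headcrop(quals: List[int], count: int) -> List[int]:
    """HEADCROP: remove the first `count` bases."""
    return quals[count:]


def sliding_window(quals: List[int], window: int, threshold: float) -> List[int]:
    """SLIDINGWINDOW (simplified): cut at the first window whose mean is too low.

    Assumption: reads shorter than the window are returned unchanged here;
    the real tool may treat them differently.
    """
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < threshold:
            return quals[:start]
    return quals


def minlen(quals: List[int], min_length: int) -> Optional[List[int]]:
    """MINLEN: return None (read dropped) if the read is too short."""
    return quals if len(quals) >= min_length else None
```

For example, `sliding_window([30, 30, 30, 30, 10, 10, 10, 10], 4, 20)` keeps only the first three bases, because the window starting at the fourth base is the first whose mean quality is below 20.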


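Input:

For single-end data, provide a single FASTQ file.

For paired-end data, provide two FASTQ files (R1 and R2).

Set the quality encoding parameter: "Illumina 1.3-1.7 Phred+64" corresponds to early Illumina platforms, and "Illumina 1.8+ Phred+33" corresponds to current Illumina platforms. The default is Illumina 1.8+ Phred+33.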
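The difference between the two encodings is only the ASCII offset used to store each quality score. A minimal Python sketch of the decoding follows; the function name `decode_qualities` is hypothetical and is not part of Trimmomatic.

```python
from typing import List


def decode_qualities(quality_string: str, offset: int = 33) -> List[int]:
    """Convert a FASTQ quality string to integer Phred scores.

    offset=33 corresponds to Illumina 1.8+ (Phred+33);
    offset=64 corresponds to Illumina 1.3-1.7 (Phred+64).
    """
    if offset not in (33, 64):
        raise ValueError("offset must be 33 or 64")
    return [ord(ch) - offset for ch in quality_string]
```

The same score is written with different characters in each encoding: a Phred score of 40 is `I` in Phred+33 and `h` in Phred+64, which is why selecting the wrong encoding shifts every quality value by 31.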


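Output:

For single-end data, the output is the trimmed and filtered clean data as a single FASTQ file.

For paired-end data, four files are output:

Two FASTQ files (R1-paired and R2-paired), containing the reads for which both ends of the pair (R1 and R2) passed quality control.

Two additional FASTQ files (R1-unpaired and R2-unpaired), containing the reads for which only one end of the pair (R1 or R2) passed quality control while the other end failed, so that only the surviving end is kept.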
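The routing of a read pair into the four paired-end output files can be summarized with a short Python sketch. The function name `route_pair` and the dictionary keys are illustrative assumptions that mirror the file labels used above.

```python
from typing import Dict


def route_pair(r1_survived: bool, r2_survived: bool) -> Dict[str, bool]:
    """Return which of the four outputs receives a read from this pair."""
    both = r1_survived and r2_survived
    return {
        "R1-paired": both,                               # both ends passed
        "R2-paired": both,
        "R1-unpaired": r1_survived and not r2_survived,  # only R1 passed
        "R2-unpaired": r2_survived and not r1_survived,  # only R2 passed
    }
```

If neither end survives, all four values are `False` and the pair is discarded entirely.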


Appendix:

For routine RNA/DNA sequencing on the HiSeq 4000 or HiSeq X Ten platform with PE100 or PE150 reads, the following parameter settings are recommended:

Perform initial ILLUMINACLIP step: Yes

Maximum mismatch count which will still allow a full match to be performed: 2

How accurate the match between the two 'adapter ligated' reads must be for PE palindrome read alignment: 30

How accurate the match between any adapter etc. sequence must be against a read: 10

Perform Sliding window trimming (SLIDINGWINDOW): Yes

Number of bases to average across: 20

Average quality required: 20

Drop reads below a specified length (MINLEN): Yes

Minimum length of reads to be kept: 35

Cut bases off the end of a read, if below a threshold quality (TRAILING): Yes

Minimum quality required to keep a base: 20

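In other words: remove adapter contamination, allowing at most 2 mismatches in the alignment, with a match threshold of 30 in palindrome mode and 10 in simple mode. Trim bases with quality below 20 from the end of each read; slide a 20 bp window along the read and, if the average quality within the window falls below 20, cut the read from the start of that window onward; finally, discard reads shorter than 35 bp after trimming.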
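For readers who run Trimmomatic directly, these settings can be written as a command line. The Python sketch below only assembles the argument list and does not run anything. The function name, the file paths, the jar location, and the adapter FASTA file are assumptions, because the module's adapter file is not stated here. The step order follows the summary above (TRAILING, then SLIDINGWINDOW, then MINLEN); Trimmomatic applies steps in the order given, so a different order can give different results.

```python
from typing import List


def build_trimmomatic_se_command(
    input_fastq: str,
    output_fastq: str,
    adapter_fasta: str,                       # assumption: adapter file is user-supplied
    jar_path: str = "trimmomatic-0.32.jar",   # assumption: jar in the working directory
    phred: str = "phred33",
) -> List[str]:
    """Assemble a single-end Trimmomatic command with the recommended settings."""
    steps = [
        # seed mismatches : palindrome clip threshold : simple clip threshold
        f"ILLUMINACLIP:{adapter_fasta}:2:30:10",
        "TRAILING:20",          # trim read ends below quality 20
        "SLIDINGWINDOW:20:20",  # window size 20, required average quality 20
        "MINLEN:35",            # drop reads shorter than 35 bp
    ]
    return ["java", "-jar", jar_path, "SE", f"-{phred}",
            input_fastq, output_fastq] + steps


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_trimmomatic_se_command("reads.fastq", "clean.fastq", "adapters.fa")
    print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` once the paths point to real files.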

This analysis module uses Trimmomatic v0.32 ( http://www.usadellab.org/cms/index.php?page=trimmomatic ).


Reference:

Bolger, A.M., Lohse, M., & Usadel, B. (2014). Trimmomatic: A flexible trimmer for Illumina Sequence Data. Bioinformatics, btu170.