Software version: v.0.4.1-alpha
Error message:

    [0a/15eb38] NOTE: Process egapx:rnaseq_long_plane:rename_fasta_ids (1) terminated with an error exit status (1) -- Execution is retried (1)

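Roughly, the error means that the step converting FASTQ files to FASTA failed while processing third-generation (long-read) transcriptome data.
The original script logic in egapx/nf/subworkflows/ncbi/setup/main.nf: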

    process rename_fasta_ids {
        input:
            tuple val(sampleID), path(fastx, stageAs: "reads/*")
            val srr_id
        output:
            tuple val(sampleID), path ('output/*') , emit: 'fasta_pair_list'
        script:
            file_name = fastx.getBaseName() + '.fasta'
        """
        #!/usr/bin/env python3
        import os
        os.makedirs('output', exist_ok=True)
        with open('${fastx}', 'r') as infile, open('output/${file_name}', 'w') as outfile:
            rec_cnt = 1
            skip_next = False
            for line in infile:
                line = line.lstrip()
                if not line:
                    continue
                if line[0] in {'>', '@', '+'}:
                    new_id = f"gnl|SRA|SRR{${srr_id}:08d}.{rec_cnt}.1"
                    if line[0] in {'>', '@'}:
                        outfile.write(f">{new_id}{os.linesep}")
                    if line[0] in {'>', '+'}:
                        rec_cnt += 1
                    if line[0] == '+':
                        skip_next = True
                elif skip_next:
                    skip_next = False
                else:
                    outfile.write(line)
        """
        stub:
            file_name = fastx.getBaseName() + '.fasta'
        """
        mkdir -p output
        echo $srr_id > output/$file_name
        """
    }

This script can only handle uncompressed files, but my input was gzip-compressed.
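To illustrate why a plain open() is a problem for compressed input, here is a minimal sketch. It is not part of EGAPx: the demo file name and the helper try_plain_text_read are made up for illustration, and it assumes the text is decoded as UTF-8. Under a different locale encoding the plain read might not raise and would instead return garbage, so treat this as a likely explanation of the failure rather than a confirmed one.

```python
import gzip
import os
import tempfile


def try_plain_text_read(path):
    """Read the first line with plain open(); return (ok, detail)."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return True, handle.readline()
    except UnicodeDecodeError as exc:
        # gzip data starts with bytes 0x1f 0x8b, and 0x8b is not valid UTF-8 here
        return False, type(exc).__name__


# Hypothetical demo file, written to a temporary directory.
workdir = tempfile.mkdtemp()
gz_path = os.path.join(workdir, "demo.fastq.gz")
with gzip.open(gz_path, "wt") as handle:
    handle.write("@read1\nACGT\n+\nIIII\n")

# Plain text read of the compressed bytes fails to decode.
plain_ok, plain_detail = try_plain_text_read(gz_path)

# gzip.open in 'rt' mode decompresses first, then decodes as text.
with gzip.open(gz_path, "rt") as handle:
    first_line = handle.readline()

print(plain_ok, plain_detail)
print(repr(first_line))
```

The same file reads cleanly through gzip.open in 'rt' mode, which is the idea behind the fix that follows.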
Fix: replace the code above with the code below, which automatically detects whether the input file is compressed or uncompressed.

    process rename_fasta_ids {
        input:
            tuple val(sampleID), path(fastx, stageAs: "reads/*")
            val srr_id
        output:
            tuple val(sampleID), path ('output/*') , emit: 'fasta_pair_list'
        script:
            file_name = fastx.getBaseName() + '.fasta'
        """
        #!/usr/bin/env python3
        import os
        import gzip  # <--- [added] import the gzip module

        os.makedirs('output', exist_ok=True)

        input_path = '${fastx}'
        # <--- [added] check the file extension and choose how to open the file
        if input_path.endswith('.gz'):
            open_func = gzip.open
            mode = 'rt'  # read text mode
        else:
            open_func = open
            mode = 'r'

        # <--- [changed] use open_func instead of open
        with open_func(input_path, mode) as infile, open('output/${file_name}', 'w') as outfile:
            rec_cnt = 1
            skip_next = False
            for line in infile:
                line = line.lstrip()
                if not line:
                    continue
                if line[0] in {'>', '@', '+'}:
                    # Note: ${srr_id} here is Nextflow variable interpolation; keep it as-is
                    new_id = f"gnl|SRA|SRR{${srr_id}:08d}.{rec_cnt}.1"
                    if line[0] in {'>', '@'}:
                        outfile.write(f">{new_id}{os.linesep}")
                    if line[0] in {'>', '+'}:
                        rec_cnt += 1
                    if line[0] == '+':
                        skip_next = True
                elif skip_next:
                    skip_next = False
                else:
                    outfile.write(line)
        """
        stub:
            file_name = fastx.getBaseName() + '.fasta'
        """
        mkdir -p output
        echo $srr_id > output/$file_name
        """
    }

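To check the renaming logic outside of Nextflow, the embedded Python can be pulled into a standalone function. The sketch below is only for local testing: the names open_maybe_gzip and rename_fastx_ids and the demo file are hypothetical, and the Nextflow interpolations are replaced by ordinary function arguments. It also swaps the '.gz' suffix check for a check of the gzip magic bytes (0x1f 0x8b), an alternative that still works when a compressed file has no '.gz' extension.

```python
import gzip
import os
import tempfile


def open_maybe_gzip(path):
    """Open a text file, handling gzip transparently (detected by magic bytes)."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rt") if is_gzip else open(path, "r")


def rename_fastx_ids(in_path, out_path, srr_id):
    """Same line-by-line logic as the process script; returns the record count."""
    rec_cnt = 1
    skip_next = False
    with open_maybe_gzip(in_path) as infile, open(out_path, "w") as outfile:
        for line in infile:
            line = line.lstrip()
            if not line:
                continue
            if line[0] in {">", "@", "+"}:
                if line[0] in {">", "@"}:
                    # New header; text mode takes care of newline translation
                    outfile.write(f">gnl|SRA|SRR{srr_id:08d}.{rec_cnt}.1\n")
                if line[0] in {">", "+"}:
                    rec_cnt += 1
                if line[0] == "+":
                    skip_next = True  # the next line is the quality string
            elif skip_next:
                skip_next = False
            else:
                outfile.write(line)
    return rec_cnt - 1


# Hypothetical two-read gzip-compressed FASTQ for a quick check.
workdir = tempfile.mkdtemp()
gz_in = os.path.join(workdir, "demo.fastq.gz")
with gzip.open(gz_in, "wt") as handle:
    handle.write("@read1\nACGT\n+\nIIII\n@read2\nTTGG\n+\nIIII\n")

fasta_out = os.path.join(workdir, "demo.fasta")
n_records = rename_fastx_ids(gz_in, fasta_out, 123)
with open(fasta_out) as handle:
    fasta_text = handle.read()
print(fasta_text)
```

Running it on a small sample is a quick way to inspect the headers before launching the full pipeline.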
The subsequent minimap2 step still fails:

    NOTE: Process `egapx:rnaseq_long_plane:minimap2:minimap2_wnode (50)` terminated with an error exit status (3) -- Execution is retried (3)

I found a solution: https://github.com/ncbi/egapx/issues/166

    process rename_fasta_ids {
        input:
            tuple val(sampleID), path(fastx, stageAs: "reads/*")
            val srr_id
        output:
            tuple val(sampleID), path ('output/*') , emit: 'fasta_pair_list'
        script:
            file_name = fastx.getBaseName() + '.fasta'
            def srrFmt = String.format('SRR%08d', (srr_id as int))
        """
        mkdir -p output
        seqkit fq2fa -j 32 '${fastx}' | seqkit replace -j 32 -w 0 -p '.*' -r 'gnl|SRA|${srrFmt}.{nr}.1' > output/${file_name}
        """
        stub:
            file_name = fastx.getBaseName() + '.fasta'
        """
        mkdir -p output
        echo $srr_id > output/$file_name
        """
    }

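For readers who want to see what this conversion amounts to without seqkit, here is a rough Python sketch of the same intent: convert FASTQ to FASTA and replace every header with gnl|SRA|SRRxxxxxxxx.N.1, numbering from 1 as in the original script. It is not how seqkit is implemented. The function name fastq_to_renamed_fasta is hypothetical, and the sketch assumes strict four-line FASTQ records with no wrapped sequences. Reading whole records means that a quality line which happens to start with '+', '@' or '>' is never mistaken for a header, something the line-by-line script can do. Whether that was the cause of the minimap2 failure here is not established.

```python
import io


def fastq_to_renamed_fasta(fastq_handle, fasta_handle, srr_id):
    """Convert 4-line FASTQ records to FASTA with gnl|SRA|SRRxxxxxxxx.N.1 IDs."""
    # Same zero padding as String.format('SRR%08d', ...) in the process above
    srr_fmt = f"SRR{int(srr_id):08d}"
    count = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of input
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = fastq_handle.readline()
        plus = fastq_handle.readline()
        fastq_handle.readline()  # quality line: read and discard, never inspected
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ near record {count + 1}")
        count += 1
        fasta_handle.write(f">gnl|SRA|{srr_fmt}.{count}.1\n{seq.rstrip()}\n")
    return count


# Demo input whose quality lines start with '+' and '@' on purpose.
records = "@read1\nACGT\n+\n+III\n@read2\nTTGG\n+\n@III\n"
out = io.StringIO()
n = fastq_to_renamed_fasta(io.StringIO(records), out, "123")
result = out.getvalue()
print(result)
```

Both demo reads keep their sequences even though their quality strings begin with marker characters.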