Converting a FASTA file into a one-sequence-per-line file with Python

Processing very large text files such as FASTA in R is painfully slow. Python handles the job far better!

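import os
import time

start = time.time()

# Work from the directory that holds the FASTA file.
os.chdir('C:/Users/Administrator/Desktop/')
print(os.getcwd())

# Maps each header ID (first token of the '>' line) to its full sequence.
res_dict = {}

with open('ylg.protein.pep', 'r') as pep:
    for line in pep:
        if line.startswith('>'):
            # Keep only the ID; drop any description after the first whitespace.
            name = line.strip().split()[0]
            res_dict[name] = ''
        else:
            # Join the wrapped sequence lines into a single string.
            res_dict[name] += line.strip()

print(len(res_dict))

# Open the output once; 'w' avoids appending duplicate rows when the script is rerun.
with open('pep.seq.txt', 'w') as file:
    for cds_id, sequence in res_dict.items():
        file.write(cds_id.replace('>', '') + "\t" + sequence + "\n")

end = time.time()
print(end - start)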
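
The dictionary approach above keeps every sequence in memory before anything is written. For larger files, a streaming variant can write each record as soon as the next header appears. The following is only a minimal sketch, not the method used in this post: the function names `iter_fasta` and `fasta_to_tsv` are hypothetical, the demo records are made up, and it assumes the same ID convention (the first whitespace-delimited token of the header, without `>`). Unlike the dictionary version, it writes duplicate IDs as separate rows instead of overwriting them.

```python
import io


def iter_fasta(handle):
    """Yield (seq_id, sequence) pairs from an open FASTA handle."""
    seq_id = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            if seq_id is not None:
                # A new header closes the previous record.
                yield seq_id, ''.join(chunks)
            # ID = first token of the header, without the leading '>'
            fields = line[1:].split()
            seq_id = fields[0] if fields else ''
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        # Emit the last record in the file.
        yield seq_id, ''.join(chunks)


def fasta_to_tsv(in_handle, out_handle):
    """Write one 'id<TAB>sequence' line per record; return the record count."""
    count = 0
    for seq_id, sequence in iter_fasta(in_handle):
        out_handle.write(seq_id + "\t" + sequence + "\n")
        count += 1
    return count


# Small in-memory demo (made-up records, not taken from ylg.protein.pep)
demo = io.StringIO(">p1 some description\nMKV\nLLA\n>p2\nMST\n")
out = io.StringIO()
n_records = fasta_to_tsv(demo, out)
print(n_records)
print(out.getvalue(), end='')
```

With real files, the same function could be called as `fasta_to_tsv(pep, out)` inside `with open('ylg.protein.pep') as pep, open('pep.seq.txt', 'w') as out:`.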

More than 40,000 genes spread over more than 320,000 lines took 5.12 s. Compared with R, that is very fast.
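
A quick way to confirm that a conversion like this lost nothing is to compare the number of `>` header lines in the input with the number of rows in the output. The sketch below is an assumption-laden illustration: `count_headers` and `count_lines` are hypothetical helper names, and the demo runs on throwaway files with made-up records rather than on `ylg.protein.pep`. If the counts differ, one likely cause is repeated IDs, since a dictionary keeps only one entry per key.

```python
import os
import tempfile


def count_headers(fasta_path):
    """Number of lines starting with '>' in a FASTA file."""
    with open(fasta_path, 'r') as handle:
        return sum(1 for line in handle if line.startswith('>'))


def count_lines(path):
    """Number of lines in a text file."""
    with open(path, 'r') as handle:
        return sum(1 for _ in handle)


# Demo on throwaway files with made-up records
tmp_dir = tempfile.mkdtemp()
fasta_path = os.path.join(tmp_dir, 'demo.pep')
tsv_path = os.path.join(tmp_dir, 'demo.seq.txt')

with open(fasta_path, 'w') as handle:
    handle.write(">p1\nMKV\nLLA\n>p2\nMST\n")
with open(tsv_path, 'w') as handle:
    handle.write("p1\tMKVLLA\np2\tMST\n")

n_headers = count_headers(fasta_path)
n_rows = count_lines(tsv_path)
print(n_headers, n_rows, n_headers == n_rows)
```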

💌lixiang117423@foxmail.com
💌lixiang117423@gmail.com


https://lixiang117423.github.io/article/7136a0b7/
Author: 小蓝哥
Published: April 7, 2021