The Description field in NCBI BioProject

import requests
from bs4 import BeautifulSoup

file_path = "D:/OneDrive/NAS/科研相关/PhData/data/生信挖掘/水稻多效基因/data/NCBI.BioProject.Rice.txt"
out_path = "D:/OneDrive/NAS/科研相关/PhData/data/生信挖掘/水稻多效基因/data/NCBI.BioProject.Rice.description.txt"

# Read one BioProject accession per line and write "accession<TAB>description".
# Opening both files in one `with` ensures the output is flushed and closed.
with open(file_path, "r") as f, open(out_path, "w", encoding="utf-8") as file_out:
    for line in f:
        accession = line.strip()
        if not accession:
            continue  # skip blank lines in the accession list

        # URL of the BioProject page
        url = "https://www.ncbi.nlm.nih.gov/bioproject/" + accession

        # Send a GET request to the webpage
        response = requests.get(url)

        if response.status_code == 200:
            # Parse the page content
            soup = BeautifulSoup(response.content, "html.parser")

            # The description sits in <div id="DescrAll">. soup.find() returns
            # None when the div is missing, which raises AttributeError below.
            try:
                description = soup.find("div", id="DescrAll").get_text(strip=True).replace("\n", " ")
            except AttributeError:
                description = "None"
            except UnicodeEncodeError:
                continue

            file_out.write(accession + "\t" + description + "\n")

            print(accession + "\t" + description + "\n")

            print("================================================")
        else:
            print(f"BioProject {accession}: Failed to retrieve the webpage. Status code: {response.status_code}")
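The script writes each record as one tab-separated line, so a tab or carriage return inside a description would break the output layout. The sketch below shows one possible way to clean the accession and flatten the description before writing. The helper names `clean_accession` and `to_tsv_row` are hypothetical and not part of the original script, and `PRJNA12345` in the usage note is only an illustrative accession.

```python
def clean_accession(line):
    """Return the accession from one input line, or None for a blank line."""
    accession = line.strip()  # drops "\n", "\r\n" and surrounding spaces
    return accession or None


def to_tsv_row(accession, description):
    """Build one tab-separated output row with a single-line description."""
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, newlines), so joining with " " flattens the text.
    flat = " ".join(description.split())
    # Fall back to the same "None" placeholder the script uses
    # when a page has no description.
    return f"{accession}\t{flat or 'None'}\n"
```

In the loop this might be used as `file_out.write(to_tsv_row(accession, description))`, with `accession = clean_accession(line)` and a `continue` when it returns `None`.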

https://lixiang117423.github.io/article/ncbi.bioproject/
Author
李详 (Xiang LI)
Published
June 16, 2024