结构变异可视化工具SVbyEye

1 Introduction

GitHub地址
文章链接

使用手册

大佬Evan E. Eichler团队基于Miropeats开发的用于基因组结构变异可视化的工具SVbyEye,使用ggplot2绘图语法实现。
Pasted image 20250412155600

2 PAF格式输入文件

需要输入的文件是PAF格式的,官方使用的比对软件是minimap2 (version >=2.24),但是任何sequence-to-sequence比对生成的结果只要可以转换为PAF格式都可以。如果使用minimap2进行比对的话不能设置-a这个参数,这个参数生成的结果是SAM格式;但是要设置-c这个参数将 CIGAR strings导出为PAF文件中的cg标签;建议设置–eqx参数将CIGAR string标记为=/X,这个是用来表示match/mismatch的。
Pasted image 20250412161314

2.1 assembly-to-reference

1
minimap2 -x asm20 -c –eqx –secondary=no {reference.fasta} {query.fasta} > {output.alignment}

2.2 sequence-to-sequence

1
minimap2 -x asm20 -c –eqx –secondary=no {target.fasta} {query.fasta} > {output.alignment}

2.3 all-versus-all

1
minimap2 -x asm20 -c –eqx -D -P –dual=no {input.multi.fasta} {input.multi.fasta} > {output.ava.alignment}

3 读取和过滤PAF文件

3.1 读取

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
## 获取PAF文件的路径

paf.file <- system.file("extdata", "test1.paf",package = "SVbyEye")

## 读取PAF文件
paf.table <- readPaf(

paf.file = paf.file,

include.paf.tags = TRUE, restrict.paf.tags = "cg"

)

## 查看读入的PAF文件
paf.table

3.2 过滤

1
2
## 基于比对的大小进行筛选
filterPaf(paf.table = paf.table, min.align.len = 100000)

3.3 转换

1
2
## 强制调换PAF比对结果中的链特征
flipPaf(paf.table = paf.table, force = TRUE)

4 sequence-to-sequence比对结果可视化

可视化的主要函数是plotMiro(),可以可视化一个序列比对到参考基因组的结果,也可以可视化两个及以上序列相互比对的结果。这个函数输出的图像是水平的,最上面的是target sequence,下面的是query sequence.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
library(SVbyEye)

## Get PAF to plot
paf.file <- system.file("extdata", "test1.paf",
package = "SVbyEye"
)

## Read in PAF
paf.table <- readPaf(
paf.file = paf.file,
include.paf.tags = TRUE, restrict.paf.tags = "cg"
)

paf.table

## Make a plot colored by alignment directionality
plotMiro(paf.table = paf.table, color.by = "direction")

Pasted image 20250412165205
1
2
# 设置颜色
plotMiro(paf.table = paf.table, color.palette = c("+" = "azure3", "-" = "yellow3"))

Pasted image 20250412165241
1
2
# 使用相似度上色,而不是方向
plotMiro(paf.table = paf.table, color.by = "identity")

Pasted image 20250412165313
1
2
# 设置划窗的大小
plotMiro(paf.table = paf.table, binsize = 1000)

Pasted image 20250412165121

4.1 添加注释信息

1
2
3
4
5
6
7
8
9
## Make a plot
plt <- plotMiro(paf.table = paf.table)
## Load target annotation file
target.annot <- system.file("extdata", "test1_target_annot.txt", package = "SVbyEye")
target.annot.df <- read.table(target.annot, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
target.annot.gr <- GenomicRanges::makeGRangesFromDataFrame(target.annot.df)

## Add target annotation as arrowhead
addAnnotation(ggplot.obj = plt, annot.gr = target.annot.gr, coordinate.space = "target")

Pasted image 20250412165739

1
2
3
4
5
6
7
## Load query annotation file
query.annot <- system.file("extdata", "test1_query_annot.txt", package = "SVbyEye")
query.annot.df <- read.table(query.annot, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
query.annot.gr <- GenomicRanges::makeGRangesFromDataFrame(query.annot.df)

## Add query annotation as rectangle
addAnnotation(ggplot.obj = plt, annot.gr = query.annot.gr, shape = "rectangle", coordinate.space = "query")

Pasted image 20250412165852

4.2 添加基因注释信息

1
2
3
4
5
6
7
8
9
## Create gene-like annotation
test.gr <- GenomicRanges::GRanges(
seqnames = 'target.region',
ranges = IRanges::IRanges(start = c(19000000,19030000,19070000),
end = c(19010000,19050000,19090000)),
ID = 'gene1')

## Add single gene annotation
addAnnotation(ggplot.obj = plt, annot.gr = test.gr, coordinate.space = "target",shape = 'rectangle', annotation.group = 'ID', fill.by = 'ID')

Pasted image 20250412170137

4.3 强调某个部分

1
2
3
4
5
6
7
## Make a plot
plt1 <- plotMiro(paf.table = paf.table, add.alignment.arrows = FALSE)

## Highlight alignment as a filled polygon
addAlignments(ggplot.obj = plt1, paf.table = paf.table[3,], fill.by = 'strand',
fill.palette = c('+' = 'red'))
addAlignments(ggplot.obj = plt1, paf.table = paf.table[4,], linetype = 'dashed')

4.4 插入和缺失检测与可视化

1
2
plotMiro(paf.table = paf.table, min.deletion.size = 50, highlight.sv = "fill") -> p
ggsave(p, file = "D:/OneDrive/桌面/4.pdf", width = 8, height = 4, dpi = 500, bg = "white")

Pasted image 20250412170634

4.5 热图展示GC含量

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
## Get PAF to plot
paf.file <- system.file("extdata", "test_getFASTA.paf",
package = "SVbyEye"
)

## Read in PAF
paf.table <- readPaf(
paf.file = paf.file,
include.paf.tags = TRUE, restrict.paf.tags = "cg"
)

## Make a plot colored by alignment directionality
plt <- plotMiro(paf.table = paf.table, color.by = "direction")

## Get query sequence composition
fa.file <- paf.file <- system.file("extdata", "test_getFASTA_query.fasta",
package = "SVbyEye"
)

## Calculate GC content per 5kbp bin
gc.content <- fasta2nucleotideContent(fasta.file = fa.file,
binsize = 5000, nucleotide.content = 'GC')

## Add GC content as annotation heatmap
addAnnotation(ggplot.obj = plt, annot.gr = gc.content, shape = 'rectangle',
fill.by = 'GC_nuc.freq', coordinate.space = 'query', annotation.label = 'GCcont')

Pasted image 20250412170915

4.6 多序列比对结果

1
2
3
4
5
6
7
## Get PAF to plot
paf.file <- system.file("extdata", "test_ava.paf", package = "SVbyEye")

## Read in PAF
paf.table <- readPaf(paf.file = paf.file, include.paf.tags = TRUE, restrict.paf.tags = "cg")

plotAVA(paf.table = paf.table, binsize = 20000)

Pasted image 20250412171154

5 全基因组比对可视化

1
2
3
4
5
6
7
8
9
## Get PAF to plot ##
paf.file <- system.file("extdata", "PTR_test.paf", package = "SVbyEye")

## Read in PAF
paf.table <- readPaf(paf.file = paf.file)

## Color by alignment directionality

plotGenome(paf.table = paf.table, chromosomes = paste0('chr', c(4, 5, 9, 12)), chromosome.bar.width = grid::unit(2, 'mm'), min.query.aligned.bp = 5000000)

Pasted image 20250412171533


结构变异可视化工具SVbyEye
https://lixiang117423.github.io/article/svbyeye/
作者
李详【Xiang LI】
发布于
2025年5月9日
许可协议