结构变异可视化工具SVbyEye

1 Introduction

使用手册

大佬Evan E. Eichler团队基于Miropeats开发的用于基因组结构变异可视化的工具SVbyEye，使用ggplot2绘图语法实现。
Pasted image 20250412155600

2 PAF格式输入文件

需要输入的文件是PAF格式的，官方使用的比对软件是minimap2 (version >=2.24)，但是任何sequence-to-sequence比对生成的结果只要可以转换为PAF格式都可以。如果使用minimap2进行比对的话不能设置-a这个参数，这个参数生成的结果是SAM格式；但是要设置-c这个参数将 CIGAR strings导出为PAF文件中的cg标签；建议设置–eqx参数将CIGAR string标记为=/X，这个是用来表示match/mismatch的。
Pasted image 20250412161314

2.1 assembly-to-reference

1	`minimap2 -x asm20 -c –eqx –secondary=no {reference.fasta} {query.fasta} > {output.alignment}`

2.2 sequence-to-sequence

1	`minimap2 -x asm20 -c –eqx –secondary=no {target.fasta} {query.fasta} > {output.alignment}`

2.3 all-versus-all

1	`minimap2 -x asm20 -c –eqx -D -P –dual=no {input.multi.fasta} {input.multi.fasta} > {output.ava.alignment}`

3 读取和过滤PAF文件

3.1 读取

## 获取PAF文件的路径

paf.file <- system.file("extdata", "test1.paf",package = "SVbyEye")

## 读取PAF文件
paf.table <- readPaf(

    paf.file = paf.file,

    include.paf.tags = TRUE, restrict.paf.tags = "cg"

)

## 查看读入的PAF文件
paf.table

3.2 过滤

1 2	`## 基于比对的大小进行筛选 filterPaf(paf.table = paf.table, min.align.len = 100000)`

3.3 转换

1 2	`## 强制调换PAF比对结果中的链特征 flipPaf(paf.table = paf.table, force = TRUE)`

4 sequence-to-sequence比对结果可视化

可视化的主要函数是plotMiro()，可以可视化一个序列比对到参考基因组的结果，也可以可视化两个及以上序列相互比对的结果。这个函数输出的图像是水平的，最上面的是target sequence，下面的是query sequence.

library(SVbyEye)

## Get PAF to plot
paf.file <- system.file("extdata", "test1.paf",
    package = "SVbyEye"
)

## Read in PAF
paf.table <- readPaf(
    paf.file = paf.file,
    include.paf.tags = TRUE, restrict.paf.tags = "cg"
)

paf.table

## Make a plot colored by alignment directionality
plotMiro(paf.table = paf.table, color.by = "direction")

1 2	`# 设置颜色 plotMiro(paf.table = paf.table, color.palette = c("+" = "azure3", "-" = "yellow3"))`

1 2	`# 使用相似度上色，而不是方向 plotMiro(paf.table = paf.table, color.by = "identity")`

1 2	`# 设置划窗的大小 plotMiro(paf.table = paf.table, binsize = 1000)`

4.1 添加注释信息

## Make a plot
plt <- plotMiro(paf.table = paf.table)
## Load target annotation file
target.annot <- system.file("extdata", "test1_target_annot.txt", package = "SVbyEye")
target.annot.df <- read.table(target.annot, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
target.annot.gr <- GenomicRanges::makeGRangesFromDataFrame(target.annot.df)

## Add target annotation as arrowhead
addAnnotation(ggplot.obj = plt, annot.gr = target.annot.gr, coordinate.space = "target")

Pasted image 20250412165739

## Load query annotation file
query.annot <- system.file("extdata", "test1_query_annot.txt", package = "SVbyEye")
query.annot.df <- read.table(query.annot, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
query.annot.gr <- GenomicRanges::makeGRangesFromDataFrame(query.annot.df)

## Add query annotation as rectangle
addAnnotation(ggplot.obj = plt, annot.gr = query.annot.gr, shape = "rectangle", coordinate.space = "query")

4.2 添加基因注释信息

## Create gene-like annotation
test.gr <- GenomicRanges::GRanges(
              seqnames = 'target.region',
              ranges = IRanges::IRanges(start = c(19000000,19030000,19070000),
                                        end = c(19010000,19050000,19090000)),
                                        ID = 'gene1')

## Add single gene annotation
addAnnotation(ggplot.obj = plt, annot.gr = test.gr, coordinate.space = "target",shape = 'rectangle', annotation.group = 'ID', fill.by = 'ID')

Pasted image 20250412170137

4.3 强调某个部分

## Make a plot
plt1 <- plotMiro(paf.table = paf.table, add.alignment.arrows = FALSE)

## Highlight alignment as a filled polygon
addAlignments(ggplot.obj = plt1, paf.table = paf.table[3,], fill.by = 'strand',
              fill.palette = c('+' = 'red'))
addAlignments(ggplot.obj = plt1, paf.table = paf.table[4,], linetype = 'dashed')

4.4 插入和缺失检测与可视化

1 2	`plotMiro(paf.table = paf.table, min.deletion.size = 50, highlight.sv = "fill") -> p ggsave(p, file = "D:/OneDrive/桌面/4.pdf", width = 8, height = 4, dpi = 500, bg = "white")`

Pasted image 20250412170634

4.5 热图展示GC含量

## Get PAF to plot
paf.file <- system.file("extdata", "test_getFASTA.paf",
    package = "SVbyEye"
)

## Read in PAF
paf.table <- readPaf(
    paf.file = paf.file,
    include.paf.tags = TRUE, restrict.paf.tags = "cg"
)

## Make a plot colored by alignment directionality
plt <- plotMiro(paf.table = paf.table, color.by = "direction")

## Get query sequence composition
fa.file <- paf.file <- system.file("extdata", "test_getFASTA_query.fasta",
    package = "SVbyEye"
)

## Calculate GC content per 5kbp bin
gc.content <- fasta2nucleotideContent(fasta.file = fa.file,
                                      binsize = 5000, nucleotide.content = 'GC')

## Add GC content as annotation heatmap
addAnnotation(ggplot.obj = plt, annot.gr = gc.content, shape = 'rectangle',
              fill.by = 'GC_nuc.freq', coordinate.space = 'query', annotation.label = 'GCcont')

Pasted image 20250412170915

4.6 多序列比对结果

## Get PAF to plot
paf.file <- system.file("extdata", "test_ava.paf", package = "SVbyEye")

## Read in PAF
paf.table <- readPaf(paf.file = paf.file, include.paf.tags = TRUE, restrict.paf.tags = "cg")

plotAVA(paf.table = paf.table, binsize = 20000)

Pasted image 20250412171154

5 全基因组比对可视化

## Get PAF to plot ##
paf.file <- system.file("extdata", "PTR_test.paf", package = "SVbyEye")

## Read in PAF
paf.table <- readPaf(paf.file = paf.file)

## Color by alignment directionality

plotGenome(paf.table = paf.table, chromosomes = paste0('chr', c(4, 5, 9, 12)), chromosome.bar.width = grid::unit(2, 'mm'), min.query.aligned.bp = 5000000)

Pasted image 20250412171533

生物信息学

#生物信息学

结构变异可视化工具SVbyEye

https://lixiang117423.github.io/article/svbyeye/

作者

李详【Xiang LI】

发布于

2025年5月9日

许可协议

基因注释软件BRAKER学习笔记上一篇

泛基因组工具vg 下一篇