Image/video background replacement -- BackgroundMattingV2 notes

I. Building the model

base -- the base class. It supports different input channel counts, output channel counts, and backbones, all passed in as parameters.

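1. backbone -- resnet101

Building the ResNet:

Input channels: 6 (the source frame and the background image, 3 channels each, are concatenated).

Replace the first conv layer, because the stock layer expects 3 input channels and here input channels = 6.

Remove the original fully-connected layer.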

Note:

Variant: a special data type that can hold any kind of data except fixed-length string data.

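2. ASPP

Taken from DeepLabv3.

Atrous spatial pyramid pooling; it enlarges the spatial receptive field.

input channel = 2048

output channel = 256

ASPPConv inherits from nn.Sequential. super() first looks up ASPPConv's parent class, nn.Sequential, and then calls the parent's initializer to set up ASPPConv. Sequential is a container: the modules passed to it are applied in order.

torch.cat(x, dim) concatenates the outputs of the different atrous convolutions.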
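
A minimal ASPP sketch in the spirit of the DeepLabv3 design described above. The dilation rates (3, 6, 9) and the exact branch layout are assumptions for illustration; only the 2048-in / 256-out channel counts come from these notes.

```python
import torch
from torch import nn
from torch.nn import functional as F


class ASPPConv(nn.Sequential):
    """3x3 atrous conv -> BN -> ReLU, registered through nn.Sequential's initializer."""

    def __init__(self, in_channels, out_channels, dilation):
        super().__init__(
            nn.Conv2d(in_channels, out_channels, 3, padding=dilation,
                      dilation=dilation, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )


class ASPPPooling(nn.Sequential):
    """Global-average-pooling branch, upsampled back to the input size."""

    def __init__(self, in_channels, out_channels):
        super().__init__(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_channels, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        size = x.shape[-2:]
        for module in self:
            x = module(x)
        return F.interpolate(x, size=size, mode="bilinear", align_corners=False)


class ASPP(nn.Module):
    def __init__(self, in_channels=2048, out_channels=256, rates=(3, 6, 9)):
        super().__init__()
        branches = [nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )]
        branches += [ASPPConv(in_channels, out_channels, r) for r in rates]
        branches.append(ASPPPooling(in_channels, out_channels))
        self.convs = nn.ModuleList(branches)
        # Fuse the concatenated branches back down to out_channels.
        self.project = nn.Sequential(
            nn.Conv2d(len(branches) * out_channels, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # Concatenate all branch outputs along the channel dimension.
        return self.project(torch.cat([conv(x) for conv in self.convs], dim=1))
```

Because every branch pads by its own dilation, all outputs keep the input's spatial size, which is what lets torch.cat join them on dim=1.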

3. decoder

Output channels: 1 + 3 + 1 + 32 = 37

The channel count shrinks from layer to layer: 512 --> 256 --> 64 --> 6

There are skip connections.
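
One decoder stage with a skip connection, plus the 1 + 3 + 1 + 32 output split, might look like the sketch below. The names DecoderStep and split_outputs are hypothetical, and reading the four groups as alpha, foreground residual, error map, and hidden features is an assumption based on the BackgroundMattingV2 paper rather than something these notes spell out.

```python
import torch
from torch import nn
from torch.nn import functional as F


class DecoderStep(nn.Module):
    """Upsample, concatenate an encoder feature (skip connection), then convolve."""

    def __init__(self, in_channels, skip_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels + skip_channels, out_channels, 3,
                              padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x, skip):
        # Match the spatial size of the skip feature before concatenating.
        x = F.interpolate(x, size=skip.shape[2:], mode="bilinear",
                          align_corners=False)
        x = torch.cat([x, skip], dim=1)
        return F.relu(self.bn(self.conv(x)))


def split_outputs(x):
    """Split a 37-channel decoder output into its 1 + 3 + 1 + 32 groups."""
    pha, fgr_residual, err, hid = torch.split(x, [1, 3, 1, 32], dim=1)
    return pha, fgr_residual, err, hid
```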

4. Refiner

There is a skip connection from the original image.

------------------------------------------------------------------------------------------------------------------------------

II. Loading model parameters and data

model.load_state_dict(torch.load(model_checkpoint))
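
A slightly fuller sketch of the loading step. The helper name load_weights and the device argument are illustrative assumptions; map_location is used so that a checkpoint saved on a GPU can still be loaded on a CPU-only machine.

```python
import torch
from torch import nn


def load_weights(model: nn.Module, checkpoint_path: str, device: str = "cpu") -> nn.Module:
    # Remap stored tensors onto the requested device while loading.
    state_dict = torch.load(checkpoint_path, map_location=device)
    # strict=True fails loudly if the keys do not match the model definition.
    model.load_state_dict(state_dict, strict=True)
    # eval() switches BatchNorm/Dropout to inference behaviour.
    return model.to(device).eval()
```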

cv2.VideoCapture(videopath)

Get the width, height, frame rate, frame count, and other information.

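Use a homography to align the background with the source image, with features detected by cv2.ORB_create. For regions that fall outside the warped background, copy the corresponding source pixels over.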

VideoWriter(output_dir, frame_rate, width, height) saves the output using the same parameters as the input video.

-------------------------------------------------------------------------------------------------------------------------------------------------

III. Running inference

with torch.no_grad():
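
A minimal sketch of an inference step wrapped in torch.no_grad(). StandInMatting is a hypothetical placeholder network, not the real model, and the compositing formula alpha * foreground + (1 - alpha) * new background is the standard matting equation assumed here for the background replacement.

```python
import torch
from torch import nn


class StandInMatting(nn.Module):
    """Hypothetical stand-in: maps (src, bgr) to an alpha matte and a foreground."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(6, 4, 3, padding=1)

    def forward(self, src, bgr):
        # Source and background are concatenated into a 6-channel input.
        x = self.conv(torch.cat([src, bgr], dim=1))
        pha = torch.sigmoid(x[:, 0:1])   # alpha matte in (0, 1)
        fgr = torch.sigmoid(x[:, 1:4])   # foreground colour in (0, 1)
        return pha, fgr


def replace_background(model, src, bgr, new_bgr):
    model.eval()
    # no_grad() skips building the autograd graph, saving memory and time.
    with torch.no_grad():
        pha, fgr = model(src, bgr)
        # Composite the predicted foreground over the new background.
        return pha * fgr + (1 - pha) * new_bgr
```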