iTPNHiViT
- class mmpretrain.models.selfsup.iTPNHiViT(arch='base', img_size=224, patch_size=16, inner_patches=4, stem_mlp_ratio=3.0, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'eps': 1e-06, 'type': 'LN'}, ape=True, rpe=False, layer_scale_init_value=0.0, mask_ratio=0.75, reconstruction_type='pixel', **kwargs)
HiViT for iTPN pre-training.
- Parameters:
inner_patches (int) – Inner patch. Defaults to 4.
stem_mlp_ratio (float) – Ratio of MLP hidden dim to embedding dim in the first two stages. Defaults to 3.0.
mlp_ratio (float) – Ratio of MLP hidden dim to embedding dim in the last stage. Defaults to 4.0.
qkv_bias (bool) – Enable bias for qkv projections if True.
qk_scale (float, optional) – Scale factor applied to the q @ k attention scores. Defaults to None.
drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.
attn_drop_rate (float) – The drop out rate for attention output weights. Defaults to 0.
drop_path_rate (float) – Stochastic depth rate. Defaults to 0.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN', eps=1e-6).
ape (bool) – If True, add absolute position embedding to the patch embedding. Defaults to True.
rpe (bool) – If True, add relative position embedding to the patch embedding. Defaults to False.
layer_scale_init_value (float) – Layer-scale init values. Defaults to 0.
mask_ratio (float) – Fraction of the total number of patches to be masked. Defaults to 0.75.
reconstruction_type (str) – The reconstruction target type used for self-supervised learning. Defaults to 'pixel'.
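As a rough illustration of how these arguments fit together, the backbone could be described with an OpenMMLab-style config dict. This is a minimal sketch: the keys mirror the constructor arguments listed above and the values are the documented defaults, but the variable names `backbone` and `num_patches` are illustrative, and the patch-count formula assumes masking operates on the `patch_size` grid.

```python
# Hypothetical backbone config in the dict style used by OpenMMLab configs.
# Keys mirror the documented constructor arguments; values are the
# documented defaults, spelled out explicitly.
backbone = dict(
    type='iTPNHiViT',
    arch='base',
    img_size=224,
    patch_size=16,
    ape=True,
    rpe=False,
    mask_ratio=0.75,
    reconstruction_type='pixel',
)

# Derived quantity (assumption: one token per patch_size x patch_size patch).
num_patches = (backbone['img_size'] // backbone['patch_size']) ** 2
print(num_patches)  # 196
```

With the defaults, a 224 x 224 image yields a 14 x 14 grid, which is the sequence length L referred to in the method documentation.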
- forward(x, mask=True)
Generate features for masked images.
The function supports two kinds of forward behavior. If mask is True, the function generates a mask that randomly masks some patches and returns the hidden features of the visible patches, i.e. it runs as masked image modeling pre-training. If mask is None or False, the function calls super().forward(), which extracts features from the images without masking.
- Parameters:
x (torch.Tensor) – Input images, which is of shape B x C x H x W.
mask (bool, optional) – Whether the forward function generates a mask. Defaults to True.
- Returns:
Hidden features, mask and the ids to restore the original image.
x (torch.Tensor): hidden features of the visible patches, of shape B x (L * (1 - mask_ratio)) x C.
mask (torch.Tensor): mask used to mask the image.
ids_restore (torch.Tensor): ids to restore the original image.
- Return type:
Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
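The relationship between the returned mask and ids_restore can be sketched with MAE-style random masking. The block below is a standard-library illustration for a single image, not the library's implementation: the helper name `random_masking_ids` is hypothetical, and it assumes the number of visible patches is `int(L * (1 - mask_ratio))` and that the mask uses 0 for kept and 1 for masked patches.

```python
import random


def random_masking_ids(num_patches, mask_ratio, seed=None):
    """Return (ids_keep, ids_restore, mask) for one image.

    Hypothetical helper: patches are shuffled by sorting random noise,
    and the first `len_keep` patches in shuffled order stay visible.
    """
    rng = random.Random(seed)
    len_keep = int(num_patches * (1 - mask_ratio))
    noise = [rng.random() for _ in range(num_patches)]

    # ids_shuffle[rank] = index of the patch with the rank-th smallest noise.
    ids_shuffle = sorted(range(num_patches), key=noise.__getitem__)

    # ids_restore is the inverse permutation: ids_restore[patch] = rank.
    ids_restore = [0] * num_patches
    for rank, patch in enumerate(ids_shuffle):
        ids_restore[patch] = rank

    ids_keep = ids_shuffle[:len_keep]

    # Mask in original patch order: 0 = visible, 1 = masked.
    mask = [0 if ids_restore[p] < len_keep else 1 for p in range(num_patches)]
    return ids_keep, ids_restore, mask


ids_keep, ids_restore, mask = random_masking_ids(196, 0.75, seed=0)
print(len(ids_keep), sum(mask))  # 49 147
```

With L = 196 and mask_ratio = 0.75, 49 patches stay visible and 147 are masked, whatever the seed.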
- forward_clip(x, mask=True)
Generate features for masked images.
The function supports two kinds of forward behavior. If mask is True, the function generates a mask that randomly masks some patches and returns the hidden features of the visible patches, i.e. it runs as masked image modeling pre-training. If mask is None or False, the function calls super().forward(), which extracts features from the images without masking.
- Parameters:
x (torch.Tensor) – Input images, which is of shape B x C x H x W.
mask (bool, optional) – Whether the forward function generates a mask. Defaults to True.
- Returns:
Hidden features, mask and the ids to restore the original image.
x (torch.Tensor): hidden features of the visible patches, of shape B x (L * (1 - mask_ratio)) x C.
mask (torch.Tensor): mask used to mask the image.
ids_restore (torch.Tensor): ids to restore the original image.
- Return type:
Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
- forward_pixel(x, mask=True)
Generate features for masked images.
The function supports two kinds of forward behavior. If mask is True, the function generates a mask that randomly masks some patches and returns the hidden features of the visible patches, i.e. it runs as masked image modeling pre-training. If mask is None or False, the function calls super().forward(), which extracts features from the images without masking.
- Parameters:
x (torch.Tensor) – Input images, which is of shape B x C x H x W.
mask (bool, optional) – Whether the forward function generates a mask. Defaults to True.
- Returns:
Hidden features, mask and the ids to restore the original image.
x (torch.Tensor): hidden features of the visible patches, of shape B x (L * (1 - mask_ratio)) x C.
mask (torch.Tensor): mask used to mask the image.
ids_restore (torch.Tensor): ids to restore the original image.
- Return type:
Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
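To make "ids to restore original image" concrete, the toy example below shows how a downstream decoder might use ids_restore to put visible tokens back at their original positions and fill the rest with a mask token. It is a sketch under assumptions: `restore_sequence` is a hypothetical helper, tokens are plain strings rather than tensors, and ids_restore is taken to be the inverse of the shuffling permutation, as in MAE-style pipelines.

```python
def restore_sequence(visible_tokens, ids_restore, mask_token):
    """Rebuild the full-length sequence in original patch order.

    Hypothetical helper: `visible_tokens` are in shuffled order
    (ranks 0..len_keep-1); all remaining ranks hold `mask_token`.
    """
    num_masked = len(ids_restore) - len(visible_tokens)
    shuffled = list(visible_tokens) + [mask_token] * num_masked
    # ids_restore[patch] is the rank of that patch in the shuffled order.
    return [shuffled[rank] for rank in ids_restore]


# Toy case: 4 patches shuffled into the order [2, 0, 3, 1]; the first 2 are kept.
ids_restore = [1, 3, 0, 2]   # inverse permutation of [2, 0, 3, 1]
visible = ['p2', 'p0']       # features of patches 2 and 0, in shuffled order
full = restore_sequence(visible, ids_restore, '[M]')
print(full)  # ['p0', '[M]', 'p2', '[M]']
```

In a tensor implementation the same step is typically a concatenation of mask tokens followed by a gather along the sequence dimension.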