MAEHiViT
- class mmpretrain.models.selfsup.MAEHiViT(arch='b', img_size=224, patch_size=16, inner_patches=4, out_indices=[23], drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'eps': 1e-06, 'type': 'LN'}, ape=True, rpe=False, layer_scale_init_value=0.0, mask_ratio=0.75, init_cfg=None)
HiViT for MAE pre-training.
A PyTorch implementation of: HiViT: A Simple and More Efficient Design of Hierarchical Vision Transformer. This module implements the patch masking used in MAE and initializes the position embedding with a sine-cosine position embedding.
- Parameters:
arch (str | dict) – Vision Transformer architecture. Defaults to 'b'.
patch_size (int | tuple) – The patch size. Defaults to 16.
inner_patches (int) – The inner patches within a token. Defaults to 4.
out_indices (Sequence | int) – Output from which stages. Defaults to [23]. An index of -1 means the last stage.
drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.
drop_path_rate (float) – Stochastic depth rate. Defaults to 0.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN', eps=1e-6).
ape (bool) – Whether to use the absolute position embedding. Defaults to True.
rpe (bool) – Whether to use the relative position embedding. Defaults to False.
layer_scale_init_value (float) – The layer scale init value. Defaults to 0.0.
mask_ratio (float) – The ratio of total number of patches to be masked. Defaults to 0.75.
init_cfg (Union[List[dict], dict], optional) – Initialization config dict. Defaults to None.
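As an illustration of how these parameters might be supplied, the fragment below is a minimal config-style sketch. It assumes the class is registered under the name 'MAEHiViT' and is built from a dict in the usual OpenMMLab config style. The variable name `backbone` is only illustrative, and every value shown is simply the default from the signature above.

```python
# Hypothetical backbone config fragment; the values mirror the defaults
# listed in the class signature and are not tuned recommendations.
backbone = dict(
    type='MAEHiViT',      # assumed registry name of this class
    arch='b',
    img_size=224,
    patch_size=16,
    inner_patches=4,
    ape=True,             # use the absolute position embedding
    rpe=False,            # no relative position embedding
    mask_ratio=0.75,      # fraction of patches to mask
)
```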
- forward(x, mask=True)
Generate features for masked images.
The function supports two kinds of forward behavior. If mask is True, the function generates a mask to randomly mask some patches and returns the hidden features of the visible patches, which means it runs as masked image modeling pre-training. If mask is None or False, the function calls super().forward(), which extracts features from images without masking.
- Parameters:
x (torch.Tensor) – Input images of shape B x C x H x W.
mask (bool, optional) – Whether the forward function generates a mask or not. Defaults to True.
- Returns:
Hidden features, the mask, and the ids to restore the original image.
x (torch.Tensor): Hidden features of the visible patches, of shape B x (L * (1 - mask_ratio)) x C.
mask (torch.Tensor): The mask used to mask the image.
ids_restore (torch.Tensor): The ids to restore the original image.
- Return type:
Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
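To make the token dimension of the returned features concrete, here is a small arithmetic sketch. It assumes each patch_size x patch_size patch yields one token, so L = (img_size / patch_size)^2, and that the number of visible tokens is int(L * (1 - mask_ratio)). The helper name `visible_token_count` is hypothetical and not part of the library.

```python
def visible_token_count(img_size=224, patch_size=16, mask_ratio=0.75):
    """Return (L, len_keep) under the assumptions stated in the lead-in."""
    tokens_per_side = img_size // patch_size
    num_tokens = tokens_per_side ** 2               # L, total tokens per image
    len_keep = int(num_tokens * (1 - mask_ratio))   # tokens left visible
    return num_tokens, len_keep


num_tokens, len_keep = visible_token_count()
print(num_tokens, len_keep)  # 196 49
```

With the default arguments, 196 tokens are produced per image and 49 of them stay visible after masking.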
- masking_id(batch_size, mask_ratio)
Generate the mask for MAE Pre-training.
- Parameters:
batch_size – The batch size of input data
mask_ratio – The mask ratio of total patches. Defaults to 0.75.
- Returns:
The ids of the retained tokens, the ids to restore the original image, and the mask.
- Return type:
Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
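The three return values can be illustrated with a dependency-free sketch of MAE-style random masking. This is plain Python rather than the library's tensor code. The function name `masking_id_sketch`, the explicit `num_tokens` and `seed` arguments, and the mask convention (0 = kept, 1 = masked) are assumptions made for illustration only.

```python
import random


def masking_id_sketch(batch_size, num_tokens, mask_ratio=0.75, seed=None):
    """Return (ids_keep, ids_restore, mask) as nested lists, one row per sample."""
    rng = random.Random(seed)
    len_keep = int(num_tokens * (1 - mask_ratio))
    ids_keep, ids_restore, mask = [], [], []
    for _ in range(batch_size):
        noise = [rng.random() for _ in range(num_tokens)]
        # Argsort of the noise gives a random permutation of token ids.
        ids_shuffle = sorted(range(num_tokens), key=noise.__getitem__)
        # Inverse permutation: where each original token sits in the shuffle.
        restore = [0] * num_tokens
        for pos, tok in enumerate(ids_shuffle):
            restore[tok] = pos
        ids_keep.append(ids_shuffle[:len_keep])
        ids_restore.append(restore)
        # In original token order: 0 for kept tokens, 1 for masked tokens.
        mask.append([0 if restore[t] < len_keep else 1
                     for t in range(num_tokens)])
    return ids_keep, ids_restore, mask


ids_keep, ids_restore, mask = masking_id_sketch(2, 196, mask_ratio=0.75, seed=0)
```

Sorting random noise is one common way to draw a uniformly random subset of fixed size per sample. The inverse permutation is what lets a decoder put the visible and masked tokens back into their original order.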