MixMIMPretrainTransformer
- class mmpretrain.models.selfsup.MixMIMPretrainTransformer(arch='base', mlp_ratio=4, img_size=224, patch_size=4, in_channels=3, window_size=[14, 14, 14, 7], qkv_bias=True, patch_cfg={}, norm_cfg={'type': 'LN'}, drop_rate=0.0, drop_path_rate=0.0, attn_drop_rate=0.0, use_checkpoint=False, mask_ratio=0.5, range_mask_ratio=0.0, init_cfg=None)
- MixMIM backbone for MixMIM pre-training.
- A PyTorch implementation of "MixMIM: Mixed and Masked Image Modeling for Efficient Visual Representation Learning" (https://arxiv.org/abs/2205.13137).
- Parameters:
- arch (str | dict) – MixMIM architecture. If a string, choose from 'base', 'large' and 'huge'. If a dict, it should have the following keys:
- embed_dims (int): The embedding dimension.
- depths (int): The number of transformer encoder layers.
- num_heads (int): The number of heads in attention modules.
 - Defaults to 'base'.
- mlp_ratio (int) – The mlp ratio in FFN. Defaults to 4. 
- img_size (int | tuple) – The expected input image shape. Because dynamic input shapes are supported, set this argument to the most common input image shape. Defaults to 224.
- patch_size (int | tuple) – The patch size in patch embedding. Defaults to 4.
- in_channels (int) – The num of input channels. Defaults to 3. 
- window_size (list) – The height and width of the window. Defaults to [14, 14, 14, 7].
- qkv_bias (bool) – Whether to add bias for qkv in attention modules. Defaults to True. 
- patch_cfg (dict) – Extra config dict for patch embedding. Defaults to an empty dict. 
- norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').
- drop_rate (float) – Probability of an element to be zeroed. Defaults to 0. 
- drop_path_rate (float) – Stochastic depth rate. Defaults to 0. 
- attn_drop_rate (float) – Attention drop rate. Defaults to 0. 
- use_checkpoint (bool) – Whether to use checkpointing to reduce GPU memory cost. Defaults to False.
- mask_ratio (float) – The base ratio of the total number of patches to be masked. Defaults to 0.5.
- range_mask_ratio (float) – The range of mask ratio. Defaults to 0. 
- init_cfg (dict, optional) – Initialization config dict. Defaults to None. 
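
The parameters above can be collected into a config dict. The following is a minimal sketch that only mirrors the documented defaults; the `type` key follows the same convention as `norm_cfg={'type': 'LN'}`, and the registry name `'MixMIMPretrainTransformer'` as well as the variable name `backbone` are assumptions, not something this page confirms.

```python
# Hypothetical config fragment: every value below is a documented default.
backbone = dict(
    type='MixMIMPretrainTransformer',  # assumed registry name
    arch='base',
    img_size=224,
    patch_size=4,
    window_size=[14, 14, 14, 7],
    mask_ratio=0.5,
    range_mask_ratio=0.0,
)

# The dict can be inspected like any other Python mapping.
print(backbone['window_size'])
```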
 
 - forward(x, mask=True)
- Generate features for masked images.
- This function generates a mask, randomly masks some patches, and returns the hidden features for the visible patches.
- Parameters:
- x (torch.Tensor) – Input images, which is of shape B x C x H x W. 
- mask (bool, optional) – Whether the forward pass applies masking. Defaults to True.
 
- Returns:
- x (torch.Tensor): hidden features, which is of shape B x L x C. 
- mask_s4 (torch.Tensor): the mask tensor for the last layer. 
 
- Return type:
- Tuple[torch.Tensor, torch.Tensor] 
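
To get a feel for the sequence length L in the returned B x L x C tensor, the sketch below computes the token-grid side length per stage. It assumes four stages with 2x downsampling between consecutive stages, which is consistent with the four mask strides listed for random_masking but is not stated on this page; `stage_resolutions` is a hypothetical helper, not part of the library.

```python
def stage_resolutions(img_size=224, patch_size=4, num_stages=4):
    """Token-grid side length at each stage, assuming 2x downsampling per stage."""
    side = img_size // patch_size
    sides = []
    for _ in range(num_stages):
        sides.append(side)
        side //= 2
    return sides


sides = stage_resolutions()
print(sides)           # [56, 28, 14, 7]
print(sides[-1] ** 2)  # 49 tokens at the last stage under these assumptions
```

Under these assumptions, the default window_size entries [14, 14, 14, 7] never exceed the grid side of the corresponding stage.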
 
 - random_masking(x, mask_ratio=0.5)
- Generate the mask for MixMIM pre-training.
- Parameters:
- x (torch.Tensor) – Image with data augmentation applied, which is of shape B x C x H x W. 
- mask_ratio (float) – The mask ratio of total patches. Defaults to 0.5. 
 
- Returns:
- mask_s1 (torch.Tensor): mask with stride of self.encoder_stride // 8. 
- mask_s2 (torch.Tensor): mask with stride of self.encoder_stride // 4. 
- mask_s3 (torch.Tensor): mask with stride of self.encoder_stride // 2. 
- mask (torch.Tensor): mask with stride of self.encoder_stride. 
 
- Return type:
- Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]
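
The relationship between the four returned masks can be illustrated in plain Python. This is a sketch of one way to obtain masks at the four strides, not the library's implementation: it assumes `encoder_stride = 32`, draws a random mask on the coarsest grid, and upsamples it by nearest-neighbour repetition. All function names here are hypothetical.

```python
import random


def coarse_mask(grid_h, grid_w, mask_ratio=0.5, seed=None):
    """Return a grid_h x grid_w grid of 0/1 flags with int(N * mask_ratio) ones."""
    rng = random.Random(seed)
    num_cells = grid_h * grid_w
    num_masked = int(num_cells * mask_ratio)
    masked = set(rng.sample(range(num_cells), num_masked))
    return [[1 if r * grid_w + c in masked else 0 for c in range(grid_w)]
            for r in range(grid_h)]


def upsample_nearest(mask, factor):
    """Repeat every cell factor x factor times (nearest-neighbour upsampling)."""
    out = []
    for row in mask:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def multi_scale_masks(img_size=224, encoder_stride=32, mask_ratio=0.5, seed=None):
    """Build masks at strides encoder_stride // 8, // 4, // 2 and encoder_stride."""
    grid = img_size // encoder_stride      # assumed: 224 // 32 = 7
    mask = coarse_mask(grid, grid, mask_ratio, seed)
    mask_s3 = upsample_nearest(mask, 2)    # stride encoder_stride // 2
    mask_s2 = upsample_nearest(mask, 4)    # stride encoder_stride // 4
    mask_s1 = upsample_nearest(mask, 8)    # stride encoder_stride // 8
    return mask_s1, mask_s2, mask_s3, mask


mask_s1, mask_s2, mask_s3, mask = multi_scale_masks(seed=0)
print(len(mask_s1), len(mask_s2), len(mask_s3), len(mask))  # 56 28 14 7
```

Because the finer masks are derived from the coarsest one, a masked cell at the last stage covers the same image region at every earlier stage.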