SwinTransformerV2
- class mmpretrain.models.backbones.SwinTransformerV2(arch='tiny', img_size=256, patch_size=4, in_channels=3, window_size=8, drop_rate=0.0, drop_path_rate=0.1, out_indices=(3,), use_abs_pos_embed=False, interpolate_mode='bicubic', with_cp=False, frozen_stages=-1, norm_eval=False, pad_small_map=False, norm_cfg={'type': 'LN'}, stage_cfgs={}, patch_cfg={}, pretrained_window_sizes=[0, 0, 0, 0], init_cfg=None)
- Swin Transformer V2.
- A PyTorch implementation of “Swin Transformer V2: Scaling Up Capacity and Resolution”.
- Inspired by https://github.com/microsoft/Swin-Transformer
- Parameters:
- arch (str | dict) – Swin Transformer architecture. If a string, choose from ‘tiny’, ‘small’, ‘base’ and ‘large’. If a dict, it should have the following keys:
  - embed_dims (int): The embedding dimension.
  - depths (List[int]): The number of blocks in each stage.
  - num_heads (List[int]): The number of attention heads in each stage.
  - extra_norm_every_n_blocks (int): Add an extra norm at the end of the main branch every n blocks.
  Defaults to ‘tiny’.
- img_size (int | tuple) – The expected input image shape. Because dynamic input shapes are supported, set this to the most common input image shape. Defaults to 256.
- patch_size (int | tuple) – The patch size in patch embedding. Defaults to 4. 
- in_channels (int) – The number of input channels. Defaults to 3.
- window_size (int | Sequence) – The height and width of the window. Defaults to 8.
- drop_rate (float) – Dropout rate after embedding. Defaults to 0. 
- drop_path_rate (float) – Stochastic depth rate. Defaults to 0.1. 
- use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Defaults to False. 
- interpolate_mode (str) – The interpolation mode used to resize the absolute position embedding. Defaults to “bicubic”.
- with_cp (bool) – Whether to use checkpointing. Checkpointing saves some memory at the cost of slower training. Defaults to False.
- frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1. 
- norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False. 
- pad_small_map (bool) – If True, pad the small feature map to the window size, which is commonly used in detection and segmentation. If False, avoid shifting the window and shrink the window size to the size of the feature map, which is commonly used in classification. Defaults to False.
- norm_cfg (dict) – Config dict for the normalization layer applied to all output features. Defaults to dict(type='LN').
- stage_cfgs (Sequence[dict] | dict) – Extra config dict for each stage. Defaults to an empty dict. 
- patch_cfg (dict) – Extra config dict for patch embedding. Defaults to an empty dict. 
- pretrained_window_sizes (tuple(int)) – Pretrained window sizes of each layer. Defaults to [0, 0, 0, 0].
- init_cfg (dict, optional) – The Config for initialization. Defaults to None. 
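
To make the interplay between img_size, patch_size, window_size and pad_small_map concrete, here is a small sketch in plain Python (it does not import mmpretrain). It assumes the usual Swin layout, which the parameter list above does not spell out: square inputs, four stages, a 2x spatial reduction and channel doubling between consecutive stages, and embed_dims=96 for the ‘tiny’ arch. The helper names stage_shapes and resolve_window are hypothetical, and resolve_window only mirrors the pad_small_map description rather than the library's actual code path.

```python
def stage_shapes(img_size=256, patch_size=4, embed_dims=96, num_stages=4):
    """Return (channels, height, width) for each stage's feature map.

    Assumptions (not taken from the library source): square inputs,
    2x downsampling and channel doubling between consecutive stages.
    """
    side = img_size // patch_size  # tokens per side after patch embedding
    channels = embed_dims
    shapes = []
    for stage in range(num_stages):
        if stage > 0:
            side //= 2
            channels *= 2
        shapes.append((channels, side, side))
    return shapes


def resolve_window(map_side, window_size=8, pad_small_map=False):
    """Return (map side, window side) used for one stage.

    Mirrors the documented pad_small_map behaviour for maps smaller
    than the window; larger maps are passed through unchanged.
    """
    if map_side >= window_size:
        return map_side, window_size
    if pad_small_map:
        return window_size, window_size  # pad the map up to the window
    return map_side, map_side  # shrink the window to the map


for img_size in (256, 224):
    last_side = stage_shapes(img_size)[-1][1]
    print(img_size, stage_shapes(img_size), resolve_window(last_side))
```

Under these assumptions a 256 input ends with an 8x8 map in the last stage, which matches the default window exactly, while a 224 input ends with a 7x7 map, which is the situation pad_small_map addresses.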
 
 - Examples
>>> from mmpretrain.models import SwinTransformerV2
>>> import torch
>>> extra_config = dict(
...     arch='tiny',
...     stage_cfgs=dict(downsample_cfg={'kernel_size': 3,
...                                     'padding': 'same'}))
>>> self = SwinTransformerV2(**extra_config)
>>> inputs = torch.rand(1, 3, 224, 224)
>>> output = self.forward(inputs)
>>> print(output.shape)
(1, 2592, 4)
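
The constructor arguments can also be collected in a config dict. The sketch below is an illustration, not a shipped config: it assumes the registry-style convention of selecting the class through a type key, and the custom arch values are chosen to resemble a Swin-T layout rather than copied from the library. Only keys documented in the parameter list are used.

```python
# Hypothetical backbone config; all values are illustrative assumptions.
custom_arch = dict(
    embed_dims=96,                # embedding dimension
    depths=[2, 2, 6, 2],          # number of blocks in each stage
    num_heads=[3, 6, 12, 24],     # attention heads in each stage
    extra_norm_every_n_blocks=0,  # assumed here to mean "no extra norm"
)

backbone_cfg = dict(
    type='SwinTransformerV2',     # assumes registry lookup by class name
    arch=custom_arch,
    img_size=256,
    window_size=8,
    drop_path_rate=0.1,
    out_indices=(3,),
    pad_small_map=False,
)

# Sanity check: one depth and one head count per stage.
assert len(custom_arch['depths']) == len(custom_arch['num_heads'])
```

Passing a dict for arch replaces the named presets, so all four keys listed under arch need to be supplied together.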