MobileNet Explained | Depthwise Separable Convolution | CNN Architectures

Topic: MobileNet — Motivation, Architecture, and Efficiency
Core motivation: Earlier CNNs (AlexNet, VGG-16, ResNet, InceptionNet) are too computationally expensive for mobile CPUs. MobileNet (2017) was designed specifically to run on mobile and embedded devices, such as Android phones, at low computational cost.

Key concept — Depthwise Separable Convolution:
Standard convolution uses filters that span all input channels simultaneously. For a 6×6×3 input with 5 filters of size 3×3×3 (no padding, stride 1), the output is 4×4×5 and the cost is 27 × 16 × 5 = 2,160 multiplications.
Depthwise separable convolution splits this into two steps:

Depthwise conv: one 3×3 filter per input channel, applied independently → cost: 9 × 16 × 3 = 432
Pointwise conv: five 1×1×3 filters that mix the channels → cost: 3 × 16 × 5 = 240
Total: 672, roughly 3.2× cheaper in this example (2,160 / 672). With 512 output filters the saving grows to about 8 to 9×.
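
The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch that counts multiplications only and assumes no padding and stride 1; the function and parameter names are illustrative, not taken from the video or the paper.

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Spatial size of the output for an n x n input and an f x f filter."""
    return (n + 2 * padding - f) // stride + 1


def standard_conv_cost(n, n_c, f, n_filters):
    """Multiplications for a standard convolution (no padding, stride 1)."""
    n_out = conv_output_size(n, f)
    # Each output value needs f*f*n_c multiplications.
    return (f * f * n_c) * (n_out * n_out) * n_filters


def depthwise_separable_cost(n, n_c, f, n_filters):
    """Return (depthwise, pointwise, total) multiplication counts."""
    n_out = conv_output_size(n, f)
    # Depthwise step: one f x f filter per input channel.
    depthwise = (f * f) * (n_out * n_out) * n_c
    # Pointwise step: n_filters filters of size 1 x 1 x n_c.
    pointwise = n_c * (n_out * n_out) * n_filters
    return depthwise, pointwise, depthwise + pointwise


if __name__ == "__main__":
    standard = standard_conv_cost(n=6, n_c=3, f=3, n_filters=5)
    depthwise, pointwise, total = depthwise_separable_cost(n=6, n_c=3, f=3, n_filters=5)
    print(standard, depthwise, pointwise, total)  # 2160 432 240 672
    print(round(standard / total, 2))             # 3.21
```

Changing `n_filters` to 512 shows how the saving approaches the 1/F² limit as the number of output channels grows.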

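The general formula: Cost ratio = 1/N'c + 1/F², where N'c is the number of output channels (filters) and F is the filter width. The MobileNet paper states the same result as 1/N + 1/D_K². For the example above: 1/5 + 1/9 ≈ 0.31, which matches 672 / 2,160.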
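
For readers who want to see where the ratio comes from, here is a short derivation. It assumes the same setting as the worked example (multiplications only, identical output size for both variants); the symbols N_c (input channels) and N_out (output width) are introduced here for illustration.

```latex
\begin{align*}
\text{standard}  &= F^2 \, N_c \cdot N_{out}^2 \cdot N'_c \\
\text{depthwise} &= F^2 \cdot N_{out}^2 \cdot N_c \\
\text{pointwise} &= N_c \cdot N_{out}^2 \cdot N'_c \\[4pt]
\frac{\text{depthwise} + \text{pointwise}}{\text{standard}}
  &= \frac{F^2 N_{out}^2 N_c + N_c N_{out}^2 N'_c}{F^2 N_c N_{out}^2 N'_c}
   = \frac{1}{N'_c} + \frac{1}{F^2}
\end{align*}
```

With N'_c = 5 and F = 3 this gives 14/45 ≈ 0.31. For large N'_c the first term vanishes and the ratio approaches 1/9 for 3×3 filters.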

MobileNet V1 (Howard et al., 2017):
Stacks the depthwise separable block 13 times → Pooling → Fully Connected → Softmax. No skip connections.
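
To make the V1 layout concrete, the sketch below traces tensor shapes through the network. The channel widths, strides, the initial standard 3×3 convolution, and the 224×224 input are not given in the text above; they follow the configuration commonly reproduced from the V1 paper and should be treated as assumptions.

```python
# (output_channels, stride) for each of the 13 depthwise separable blocks.
# Assumed values: the summary above does not list them.
V1_BLOCKS = [
    (64, 1), (128, 2), (128, 1), (256, 2), (256, 1), (512, 2),
    (512, 1), (512, 1), (512, 1), (512, 1), (512, 1),
    (1024, 2), (1024, 1),
]


def trace_mobilenet_v1(input_size=224, stem_channels=32, num_classes=1000):
    """Return a list of (layer_name, spatial_size, channels) tuples."""
    # Stem: a standard 3x3 convolution with stride 2 (assumed).
    size = -(-input_size // 2)
    channels = stem_channels
    trace = [("stem conv", size, channels)]
    for i, (out_channels, stride) in enumerate(V1_BLOCKS, start=1):
        # With "same" padding the spatial size becomes ceil(size / stride).
        size = -(-size // stride)
        channels = out_channels
        trace.append((f"dw-separable block {i}", size, channels))
    trace.append(("global average pool", 1, channels))
    trace.append(("fully connected + softmax", 1, num_classes))
    return trace


if __name__ == "__main__":
    for name, size, channels in trace_mobilenet_v1():
        print(f"{name:28s} {size:>3d} x {size:<3d} x {channels}")
```

The trace is a straight chain: every block feeds only the next one, which is what "no skip connections" means here.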

MobileNet V2 (Sandler et al., 2018):
Introduces the inverted residual bottleneck block, repeated 17 times. Each block consists of:

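Pointwise conv: expand channels (e.g. 3 → 18, an expansion factor of 6)
Depthwise conv: spatial filtering on the expanded channels
Pointwise conv: project back (e.g. 18 → 3)
Skip connection: the block's input is added to its output when their shapes match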

This expand-then-compress design lets each block compute a richer function in the expanded space, while the final projection keeps the activations passed between blocks small. That combination suits memory-constrained mobile hardware.
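
The memory argument can be illustrated by counting activation values and multiplications for a single bottleneck block. This is a rough sketch, assuming stride 1 and "same" padding so the spatial size is preserved; the 6×6 input size and the function name are hypothetical choices for illustration.

```python
def inverted_residual_stats(n, n_c, expansion=6, f=3):
    """Activation sizes and multiplication counts for one bottleneck block.

    Assumes an n x n x n_c input, stride 1, and "same" padding, so the
    spatial size stays n x n through all three convolutions.
    """
    expanded = n_c * expansion
    positions = n * n
    return {
        "activations": {
            "input": positions * n_c,         # passed in from the previous block
            "expanded": positions * expanded,  # exists only inside the block
            "output": positions * n_c,         # passed on to the next block
        },
        "multiplications": {
            "expand": positions * n_c * expanded,      # 1x1 pointwise conv
            "depthwise": positions * f * f * expanded,  # f x f depthwise conv
            "project": positions * expanded * n_c,      # 1x1 pointwise conv
        },
    }


if __name__ == "__main__":
    stats = inverted_residual_stats(n=6, n_c=3)
    print(stats["activations"])
    print(stats["multiplications"])
```

Only the small input and output tensors cross block boundaries, and because they have the same shape the skip connection can add them directly.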

Video: MobileNet Explained | Depthwise Separable Convolution | CNN Architectures, from the AKAdemy channel