In src/brevitas_examples/imagenet_classification/models/vgg.py, the classifier is defined as:
self.avgpool = TruncAvgPool2d(kernel_size=(7, 7), stride=1, bit_width=bit_width)
self.classifier = nn.Sequential(
    QuantLinear(
        512 * 7 * 7,
        ...
TruncAvgPool2d with kernel_size=(7,7) and stride=1 on a 7×7 spatial input
(produced by VGG's 5 MaxPool2d layers from 224×224) outputs a 1×1 spatial map.
After torch.flatten(x, 1), the feature vector has size 512, not 512 * 7 * 7 = 25088.
The 512 * 7 * 7 value appears to be inherited from the original torchvision VGG,
which uses AdaptiveAvgPool2d((7,7)) and preserves the 7×7 spatial dimensions.
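The spatial arithmetic can be verified independently of the quantization layers. A minimal sketch, using torch.nn.AvgPool2d as a stand-in for TruncAvgPool2d (same kernel/stride geometry, quantization aside):

import torch
import torch.nn as nn

# Stand-in for TruncAvgPool2d; the spatial arithmetic is identical.
avgpool = nn.AvgPool2d(kernel_size=(7, 7), stride=1)

# VGG feature-extractor output for a 224x224 input:
# five MaxPool2d halvings take 224 -> 112 -> 56 -> 28 -> 14 -> 7.
x = torch.randn(1, 512, 7, 7)

y = avgpool(x)
print(y.shape)                    # torch.Size([1, 512, 1, 1])
print(torch.flatten(y, 1).shape)  # torch.Size([1, 512]), not [1, 25088]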
As written, the 512 * 7 * 7 in_features causes a shape mismatch at runtime with any standard 224×224 input.
Proposed fix:
QuantLinear(512, 4096, ...)
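A quick end-to-end shape check of the proposed head, with plain nn.Linear standing in for QuantLinear (hypothetical stand-ins; QuantLinear mirrors nn.Linear's in_features/out_features arguments):

import torch
import torch.nn as nn

avgpool = nn.AvgPool2d(kernel_size=(7, 7), stride=1)  # stand-in for TruncAvgPool2d
fc = nn.Linear(512, 4096)  # stand-in for QuantLinear(512, 4096, ...)

x = torch.randn(1, 512, 7, 7)  # feature-extractor output for a 224x224 input
out = fc(torch.flatten(avgpool(x), 1))
print(out.shape)  # torch.Size([1, 4096]) -- no shape mismatch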