Image created by author.

The Clever Trick Behind Google’s Inception: The 1×1 Convolution

What does a 1×1 conv even do?

6 min read · Aug 23, 2020


Google’s Inception architecture has had a great deal of success in image classification, and much of it is owed to a clever trick central to the model’s design: the 1×1 convolution.

One notices immediately that the 1×1 convolution is an essential part of the Inception module. It precedes each of the larger convolutions (3×3 and 5×5), and it is used four times in a single module, more than any other element.

Inception module. Source: Inception paper.
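One way to see why a 1×1 convolution placed in front of a 5×5 convolution can pay off is to count multiplications. The 1×1 layer shrinks the number of channels the expensive 5×5 filter has to read. The sketch below is a back-of-the-envelope estimate, not a figure from the paper. The layer sizes are hypothetical and chosen only for illustration, `conv_mults` is a hypothetical helper, and the count assumes stride 1 with "same" padding while ignoring biases and activations.

```python
def conv_mults(h, w, c_in, c_out, k):
    """Multiplications for a k×k convolution that produces an h×w×c_out
    output from c_in input channels (stride 1, 'same' padding, no bias)."""
    # Each of the h*w*c_out output values needs k*k*c_in multiplications.
    return h * w * c_out * (k * k * c_in)

# Hypothetical layer sizes, for illustration only.
H, W = 28, 28       # spatial size of the feature map
C_IN = 192          # channels entering the module
C_OUT = 32          # channels produced by the 5×5 branch
C_REDUCE = 16       # channels after the 1×1 "reduction" layer

# Option 1: apply the 5×5 convolution directly to all 192 channels.
direct = conv_mults(H, W, C_IN, C_OUT, 5)

# Option 2: 1×1 convolution down to 16 channels, then the 5×5 convolution.
reduced = conv_mults(H, W, C_IN, C_REDUCE, 1) + conv_mults(H, W, C_REDUCE, C_OUT, 5)

print(f"5x5 directly:     {direct:,}")
print(f"1x1 then 5x5:     {reduced:,}")
print(f"reduction factor: {direct / reduced:.1f}x")
```

With these made-up sizes, the direct 5×5 layer needs about 120 million multiplications. The 1×1-then-5×5 path needs about 12 million, roughly a tenth of that. The exact ratio depends entirely on the channel counts chosen.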

What a 1×1 convolution even does, however, can be confusing. If convolutions are meant to apply a sliding window over several consecutive pixels, exactly what is a sliding window the same size as a pixel doing? To answer this question, let’s begin with a definition of what a convolution is.
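A 1×1 filter still slides across the image, but it is not really "one pixel" wide. It covers a single spatial position across all of that position's channels. Each output value is therefore a weighted sum of one pixel's channel values, which amounts to a small fully connected layer applied independently at every location. The sketch below shows this with NumPy. It assumes a channels-last (H, W, C) layout, and `conv1x1` is a hypothetical helper name rather than a library function.

```python
import numpy as np

def conv1x1(x, weights, bias):
    """1×1 convolution (stride 1) on x of shape (H, W, C_in).

    weights: (C_in, C_out), bias: (C_out,). Every output pixel depends
    only on the input pixel at the same (h, w) location: it is a weighted
    sum over that pixel's channels.
    """
    # Contract the channel axis c with the weight matrix at every (h, w).
    return np.einsum("hwc,cd->hwd", x, weights) + bias

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4, 3))   # a 4×4 "image" with 3 channels
w = rng.standard_normal((3, 2))      # mix 3 input channels into 2 outputs
b = np.zeros(2)

y = conv1x1(x, w, b)
print(y.shape)  # (4, 4, 2): spatial size unchanged, channel count changed

# Identical to treating the pixel at (1, 2) as a 3-vector and multiplying by w.
print(np.allclose(y[1, 2], x[1, 2] @ w + b))  # True
```

The height and width stay the same, while the number of channels becomes whatever the weight matrix dictates. This is why a 1×1 convolution can shrink (or grow) the channel dimension cheaply.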

An a×a convolution is named for the size of its ‘filter’, or sliding window, which multiplies the input values it covers by its weights and sums the products to form one value of the convolved output. The ‘stride’ of the convolution is how many pixels the filter moves before calculating the next value. The number of pixels in the output convolved feature is a simple function of both the size of the filter and the stride. In…
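A minimal single-channel sketch of the sliding-window definition follows, assuming no padding. Under that assumption, an input side of n pixels, an a×a filter, and stride s give ⌊(n − a) / s⌋ + 1 output pixels per side. Both helpers are hypothetical names for illustration. As in deep learning libraries, the kernel is not flipped, so strictly speaking this is cross-correlation.

```python
import numpy as np

def output_size(n, a, stride):
    """Output pixels along one side: n-pixel input, a×a filter, no padding."""
    return (n - a) // stride + 1

def conv2d_single(image, kernel, stride=1):
    """Naive 'valid' convolution of one channel: slide the kernel by
    `stride` pixels, multiply element-wise with the window, and sum."""
    n_h, n_w = image.shape
    a_h, a_w = kernel.shape
    out_h = output_size(n_h, a_h, stride)
    out_w = output_size(n_w, a_w, stride)
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride           # top-left corner of window
            window = image[r:r + a_h, c:c + a_w]
            out[i, j] = np.sum(window * kernel)     # multiply and sum
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5×5 input
kernel = np.ones((3, 3))                           # toy 3×3 filter

print(conv2d_single(image, kernel, stride=1).shape)      # (3, 3)
print(conv2d_single(image, kernel, stride=2).shape)      # (2, 2)
print(conv2d_single(image, np.ones((1, 1))).shape)       # (5, 5)
```

The last line shows that a 1×1 filter with stride 1 leaves the spatial size untouched, since (n − 1) / 1 + 1 = n.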
