
Compress Your Deep Learning Models with No Code, No Hassle

Nota’s (Free) NetsPresso Compression Toolkit

Andre Ye
5 min read · Jan 20, 2022


You’ve worked hard to build a deep learning model that performs well. Now it’s time to take it out of the massive GPU centers and into the standard everyday devices it’ll be used on. Let’s see how it performs!

[System crashed]

Uh-oh! Your model is too big. “Dammit,” you groan. “This model worked so well, but now I have to start from scratch with a new, smaller architecture that doesn’t crash my system. I’m not even sure it will obtain decent performance, let alone match the first model.”

Never fear! Compression-Man is here to help. Armed with his arsenal of weapons — filter decomposition in his right hand and pruning in the left — he’s ready to take down any unruly model and squeeze it down to size.

(Okay, that might have been a little bit exaggerated. But the struggle is real.)

In the context of deep learning, model compression refers to the techniques and processes by which a smaller representation of a model can be derived with no or negligible decrease in performance.

There are two particularly important model compression technique families: Pruning and Filter Decomposition.

Structured Pruning is the removal of whole groups of parameters from a network, such as neurons/nodes, channels, or even entire layers. To decide which groups to remove, you must specify a criterion that ranks the importance of each group. Generally, a higher aggregated magnitude of the weights associated with a group (e.g. the sum of a node’s incoming and outgoing weights) indicates that the group is more important and should be kept. Different aggregation methods, such as the L2 norm, the nuclear norm, and the geometric median, can be used to score the weights (see the sketch after the figure below).

Left: the unpruned neural network with the potentially unnecessary neuron marked in red. Right: the equivalent pruned neural network, with 25% lower computational complexity.
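To make the idea concrete, here is a minimal sketch of L2-norm structured pruning applied to a single convolutional layer in PyTorch. The function name and the 50% keep-ratio are illustrative assumptions, not part of NetsPresso’s API; a real pipeline would also adjust downstream layers and fine-tune afterwards.

```python
# A minimal sketch of L2-norm structured pruning on one Conv2d layer.
# Hypothetical helper, not NetsPresso's API.
import torch
import torch.nn as nn

def prune_conv_by_l2(conv: nn.Conv2d, keep_ratio: float = 0.5) -> nn.Conv2d:
    """Keep only the output filters with the largest L2 norm."""
    weight = conv.weight.data                    # shape: (out_ch, in_ch, kH, kW)
    # Importance score: L2 norm of each filter's weights.
    scores = weight.flatten(1).norm(p=2, dim=1)  # shape: (out_ch,)
    n_keep = max(1, int(keep_ratio * weight.size(0)))
    keep_idx = scores.argsort(descending=True)[:n_keep]

    # Build a smaller layer containing only the surviving filters.
    pruned = nn.Conv2d(
        in_channels=conv.in_channels,
        out_channels=n_keep,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        bias=conv.bias is not None,
    )
    pruned.weight.data = weight[keep_idx].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep_idx].clone()
    return pruned

# Usage: halve the filters of a 64-filter convolution.
conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)
smaller = prune_conv_by_l2(conv, keep_ratio=0.5)
print(conv.weight.shape, "->", smaller.weight.shape)  # (64,3,3,3) -> (32,3,3,3)
```

Note that pruning a layer’s output filters also shrinks the input expected by the next layer, so every downstream layer has to be resized to match — one of the bookkeeping chores a no-code toolkit handles for you.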

