Hello Chadrick, the interesting finding of the article is that adding nonlinear activations makes the network's solutions simpler, which is closely related to, though not the same as, the linearity of a network's learning patterns. The authors argue that adding linear layers more directly reduces the rank of the solutions.
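
If it helps to see that linear-layer/rank point concretely, here is a minimal sketch of my own (not from the article; all data, layer counts, and hyperparameters are assumptions): it trains a single linear map and a product of three linear layers on the same low-rank regression task, then compares the singular-value spectra of the end-to-end maps. The deeper product typically concentrates its spectrum more, i.e. behaves lower-rank, though the exact numbers depend on initialization and step count.

```python
import torch

torch.manual_seed(0)
d, n = 20, 100
X = torch.randn(n, d)
# Low-rank "teacher" map plus noise, so a low-rank solution exists.
W_true = torch.randn(d, 3) @ torch.randn(3, d)
Y = X @ W_true + 0.1 * torch.randn(n, d)

def train(model, steps=2000, lr=1e-2):
    # Plain gradient descent on mean-squared error.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((model(X) - Y) ** 2).mean()
        loss.backward()
        opt.step()
    return model

# One linear layer vs. a product of three linear layers (no nonlinearities).
shallow = train(torch.nn.Linear(d, d, bias=False))
deep = train(torch.nn.Sequential(*[torch.nn.Linear(d, d, bias=False) for _ in range(3)]))

def end_to_end(model):
    # Feed the identity through the model to recover the overall linear map.
    with torch.no_grad():
        return model(torch.eye(d))

for name, m in [("shallow", shallow), ("deep", deep)]:
    s = torch.linalg.svdvals(end_to_end(m))
    # Crude effective-rank proxy: nuclear norm / spectral norm.
    eff_rank = (s.sum() / s.max()).item()
    print(name, "effective rank ~", round(eff_rank, 2))
```

Again, just a toy setup to make the "more linear layers, lower-rank solution" intuition tangible, not a reproduction of the authors' experiments.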