Dimensional Analysis on Loss Functions

A half-baked idea I had while checking my equations


One of my favourite tricks for minimizing errors in high school physics was to use dimensional analysis as a sanity check on every equation. Gradient descent, as usually written, is not dimensionally sound: the gradient of the loss with respect to a parameter has units of loss per unit parameter, so a dimensionless learning rate times the gradient does not, in general, have the units of the parameter it is subtracted from. So I have been wondering whether we can rethink concepts like loss functions using dimensional analysis of gradient descent. For example, if we’re trying to minimize the loss on a physical quantity like length, what dimensions would the gradients have, and what dimensions would the learning rate need to be? What happens to dimensions inside a neural network? This is something for me to think about and update later. Watch this space!
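
To make the bookkeeping concrete, here is a minimal worked example, assuming a parameter $\theta$ measured in metres and a squared-error loss $L$ measured in square metres (the symbols $\theta$, $L$, and $\eta$ are just my notation for this sketch):

$$
\theta_{t+1} = \theta_t - \eta \, \frac{\partial L}{\partial \theta},
\qquad
\left[\frac{\partial L}{\partial \theta}\right] = \frac{[L]}{[\theta]} = \frac{\mathrm{m}^2}{\mathrm{m}} = \mathrm{m},
\qquad
[\eta] = \frac{[\theta]}{\left[\partial L / \partial \theta\right]} = \frac{[\theta]^2}{[L]}.
$$

So the update is only consistent if the learning rate carries units of $[\theta]^2 / [L]$. It happens to be dimensionless in this squared-error case ($\mathrm{m}^2 / \mathrm{m}^2$), but for any other choice of loss units it would not be.

The same check can be run mechanically with a units library. This is a toy sketch, assuming the third-party `pint` package is installed; the names `grad`, `theta`, `target`, and `eta` are all mine:

```python
# Toy dimensional check of gradient descent using pint (pip install pint).
import pint

ureg = pint.UnitRegistry()

theta = 3.0 * ureg.meter    # parameter with physical units
target = 1.0 * ureg.meter   # value we want theta to reach

def grad(theta):
    # d/d(theta) of the squared-error loss (theta - target)^2; units: metres
    return 2 * (theta - target)

eta = 0.1  # dimensionless here, since [theta]^2 / [loss] = m^2 / m^2 = 1

for _ in range(20):
    # pint raises DimensionalityError if the units in this update were inconsistent
    theta = theta - eta * grad(theta)

print(theta)  # converges towards 1 metre
```

If `eta` were given units, say `0.1 * ureg.second`, the subtraction in the update would raise a `DimensionalityError`, which is exactly the kind of sanity check I used to rely on in physics.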