Why Machine Learning Is A Metaphor For Life – Adit Deshpande – Engineering at Forward | UCLA CS '19
Why Machine Learning Is A Metaphor For Life
Seriously. Hear me out on this.
The more I learn about ML, the more I see the number of similarities there are between life and machine learning concepts.
Specifically, let’s think about neural networks.
Let's think of a neural net that has a bunch of input nodes and one output node. The input features are encapsulated in some input vector x, and we’d like to come up with a single scalar prediction ŷ. The way you compute ŷ is by passing x through a series of linear operations with weight matrices mixed in with a couple activation functions. Simple enough. But in order to get accurate ŷ predictions, we need to first train our network.
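To make that forward pass concrete, here is a minimal sketch in plain Python. The layer sizes, the weight values, and the choice of ReLU as the activation are all assumptions made for illustration; nothing above fixes them.

```python
def relu(values):
    """Element-wise ReLU activation: negative entries become 0."""
    return [max(0.0, v) for v in values]


def matvec(weights, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def forward(x, w1, w2):
    """Pass x through one hidden layer and return a single scalar prediction."""
    hidden = relu(matvec(w1, x))   # linear operation followed by an activation
    return matvec(w2, hidden)[0]   # final linear operation, one output node


# Made-up weights: 3 input nodes -> 2 hidden nodes -> 1 output node.
w1 = [[0.5, -1.0, 0.25],
      [1.0, 0.5, -0.5]]
w2 = [[2.0, -1.0]]

x = [2.0, 1.0, 4.0]
y_hat = forward(x, w1, w2)
print(y_hat)  # 1.5
```

With untrained weights like these, the prediction is essentially arbitrary, which is why training comes next.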
When training neural networks, the goal is to minimize a loss function.
That’s the name of the game. We want to be able to minimize the difference between the actual labels in our training set and our predictions, and adjust the weights of our network in hope of training it well enough so that it can generalize to test examples.
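Here is a rough sketch of what "adjusting the weights to minimize the loss" can look like. To keep it short, it assumes a one-weight linear model instead of a full network, mean squared error as the loss, and plain gradient descent with a hand-picked learning rate; the tiny dataset is made up.

```python
def mse(w, b, xs, ys):
    """Mean squared error between predictions w*x + b and the labels."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, lr=0.05, steps=2000):
    """Plain gradient descent on the MSE loss, starting from w = b = 0."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the loss with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy training set generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

initial_loss = mse(0.0, 0.0, xs, ys)
w, b = train(xs, ys)
final_loss = mse(w, b, xs, ys)
print(initial_loss, final_loss, w, b)
```

The loss starts at 21.0 and shrinks toward 0 as w and b approach 2 and 1, the values used to generate the labels.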
I realized that this idea of minimizing a loss function kind of relates to our daily lives. Think about it. When you drove to work/school this morning, did you take the fastest route possible or did you take a couple of random highways and a few side streets, eventually ending up at your destination? When you have an important project due, do you waste time on Facebook or do you buckle down and put away your phone? (Hopefully there's only one right answer here.) When you're a coder who's working on a couple of difficult assignments, do you calmly split up the jobs and work on each one at a time or do you panic and work for 36 hours straight to get them done? (Yes, there is a masochistic answer, but we're not going with that.)
The idea here is that we’re all, consciously or subconsciously, solving optimization problems every single day. We’re always trying to minimize the amount of stress, the number of distractions, and the time it takes to do something.
The first time you drove to work/school, it's very possible that you could've taken an inefficient and long route. However, over time, you start to recognize traffic and shortcuts, and eventually start to minimize the time spent driving.
What about those times where you can't seem to break the habit of checking your News Feed every hour, or the times where you feel like there's no way to decrease the amount of soul-crushing stress of school or work? Well, we can think of those as just local minima or saddle points. In ML, these are points in parameter space where the gradient is close to 0, which makes it difficult for gradient-based updates to minimize f(x) any further. In life, these are moments where we feel like there's nothing we can really do, no changes we can make to alleviate the current situation. Fighting these local minima and saddle points often calls for second-order optimization and some other complex techniques. From a high level, these techniques look for ways to escape the unfavorable points by computing additional information, such as Hessians and other curvature information. So, in life, when you feel like you're in one of those unpleasant local minima, just remember to always look around, because there could always be useful information to help you out of your current situation and further minimize your f(x).
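A toy illustration of getting stuck and then using curvature to get out. This is only a sketch, not a real second-order optimizer: the function f(x, y) = x² − y² is a textbook saddle chosen for the example, its Hessian is written out by hand, and the size of the nudge is arbitrary.

```python
def f(x, y):
    """Textbook saddle: curves upward along x and downward along y."""
    return x ** 2 - y ** 2


def grad(x, y):
    """Gradient of f, derived by hand."""
    return 2 * x, -2 * y


def descend(x, y, lr=0.1, steps=200):
    """Plain gradient descent from (x, y)."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y


# The Hessian of f is constant and diagonal: [[2, 0], [0, -2]].
hessian_diag = (2.0, -2.0)

# Starting exactly on the x-axis, gradient descent slides into the saddle
# at (0, 0) and stops making progress because the gradient vanishes there.
x, y = descend(1.0, 0.0)
stuck_value = f(x, y)

# The negative Hessian entry says f still curves downward along y, so a
# small nudge in that direction lets gradient descent move again.
has_escape_direction = min(hessian_diag) < 0
if has_escape_direction:
    x, y = descend(x, y + 0.01, steps=20)
escaped_value = f(x, y)

print(stuck_value, escaped_value)
```

The gradient alone reports "nothing left to do" at the saddle, while the curvature reveals a direction that still goes downhill. This particular f is unbounded below, so the sketch only shows the escape, not convergence to a minimum.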
Alright, let's look at a different part of the ML pipeline now.
One of the other main components in your ML model is your training dataset. You have some set X where each xi ∈ X is a particular collection of features (in vector form) describing the input example, and you have some set Y where each yi ∈ Y is a label that describes the category for classification or a real valued number for regression. These examples are there to help us train our network to output accurate ŷ values.
If our lives have so many optimization problems, what are the xi’s and yi’s? In ML problems, xi can be a matrix of pixel values, a word vector, or the spectrogram of an audio clip, while yi is often a category or a real valued number.
Let’s take the example of driving to work. In real life, we can think of that input vector as describing the set of circumstances surrounding that day. Was there an accident on the major freeway? Is there bad weather right now? Am I leaving during rush hour? All of this information is encapsulated in some mental representation in our brains, and is processed to determine a driving route R. Based on that route, we record the actual time it took to reach the destination, which we'll denote f(x, R). We train our mental model so that f(x, R) gets as close as possible to yi, the optimal amount of time it takes to get to the destination.
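One very loose way to picture that mental training loop in code. Everything here is hypothetical: the route names, the travel times, the fixed rush-hour conditions, and the simple "move my estimate halfway toward what I observed" update rule are invented for the sketch, and a real commute would be noisy.

```python
# Hypothetical true travel times (minutes) under fixed rush-hour conditions x.
TRUE_TIME = {"freeway": 40.0, "side_streets": 30.0, "boulevard": 25.0}


def observe(route):
    """Stand-in for f(x, R): how long the drive actually took on route R."""
    return TRUE_TIME[route]


# Optimistic starting guesses, so every route gets tried at least once.
estimate = {route: 0.0 for route in TRUE_TIME}
history = []

for day in range(30):
    # Pick the route currently believed to be fastest.
    route = min(estimate, key=estimate.get)
    actual = observe(route)
    # Move the estimate for that route halfway toward the observed time.
    estimate[route] += 0.5 * (actual - estimate[route])
    history.append(route)

print(history[:10])
print(history[-1], estimate[history[-1]])
```

In the first week or so the driver bounces between routes, but once the estimates settle, every remaining day goes to the boulevard, the route with the lowest true time in this made-up setup.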
You can think of every single day as one example pair (xi, yi) in our training set (X,Y). Each xi represents the conditions and each yi represents the optimal time...