Softmax: Why do neural networks need non-linearity? Life isn't straight-line simple

Softmax Activation Function

What is an Activation Function?

An activation function (also known as a transfer function in the context of neural networks) is a mathematical function applied to a neuron's weighted sum of inputs plus bias to produce the neuron's output. It decides whether, and how strongly, a neuron should be activated ("fired"). This helps the neural network use important information and suppress less useful data points.

Activation functions add non-linearity to a neural network so that it can tackle complex problems. Real-world problems, such as recognizing cats vs. dogs, are non-linear. Without activation functions, each neuron computes f(z) = z, as in a linear regression model, and multiple linear layers collapse into one big linear equation, which is useless for non-linear problems.

What are linear and non-linear problems?

A linear pattern is like a straight-line rule of thumb: if you study twice as long, you score twice as high in your exams. It is a simple analogy, neat and predictable.

A non-linear (complex) pattern is more like real life. Studying a little before exams could help a lot at first, then extra hours give smaller dopamine boosts, and after a point you may burn out and your exam does not go well, so your scores end up average. The scenario bends, twists, and changes depending on real-life events; it is not just a straight line.

That's why neural networks need non-linearity: life isn't straight-line simple.

Softmax Function

Softmax is a non-linear function and an extension of sigmoid to more than two classes. It converts a vector of raw scores (logits), which can be any real numbers, positive or negative, into a probability distribution. It is used in the last layer (output layer) of a neural network for multi-class classification problems. Each output lies in the range (0, 1), and the outputs are normalized so that they sum to 1. It is especially suited to selecting one class out of many.
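The properties listed above can be checked with a minimal NumPy sketch. The `softmax` helper and the logit values here are illustrative assumptions, not code from the article; the sketch only shows that negative and positive logits all map into (0, 1), that the outputs sum to 1, and that the largest logit keeps the largest probability.

```python
import numpy as np

def softmax(z):
    """Map a 1-D array of logits to a probability distribution."""
    # Shifting by the max does not change the result; it only avoids overflow.
    exps = np.exp(z - np.max(z))
    return exps / exps.sum()

# Illustrative logits (assumed values), mixing positive and negative scores.
logits = np.array([2.0, -1.0, 0.0, -3.5])
probs = softmax(logits)

print(probs)                  # every entry lies strictly between 0 and 1
print(probs.sum())            # the entries sum to 1 (up to rounding)
print(int(np.argmax(probs)))  # index of the most likely class
```

The ordering of the logits is preserved, so picking the class with the highest probability is the same as picking the class with the highest raw score.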
Softmax outputs a vector of probabilities. The class with the highest probability is chosen as the prediction, and that probability is often read as the model's confidence.

Softmax - Mathematical Derivation

Softmax can be seen as a generalization of the sigmoid (logistic) function to multiple classes. It calculates the relative probability of each class. The numerator exponentiates the input, and the denominator makes all outputs sum to 1.
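One way to see the link to sigmoid is the two-class case: softmax over the pair of scores [x, 0] gives the same probability for the first class as sigmoid(x). The sketch below checks this numerically; the helper names and the value of `x` are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    """Logistic function: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    """Exponentiate each score (numerator), divide by the total (denominator)."""
    exps = np.exp(z - np.max(z))
    return exps / exps.sum()

x = 1.7  # arbitrary illustrative score
two_class = softmax(np.array([x, 0.0]))

# First softmax output vs. sigmoid of the same score
print(two_class[0], sigmoid(x))
# Second softmax output vs. the complementary probability
print(two_class[1], 1.0 - sigmoid(x))
```

Algebraically, exp(x) / (exp(x) + exp(0)) = 1 / (1 + exp(-x)), which is why the two printed pairs agree.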

Given logits z = (z_1, ..., z_K), the softmax function for class i is:

$$\mathrm{softmax}(z)_i = \frac{\exp(z_i)}{\sum_{j=1}^{K} \exp(z_j)}$$

where z_i is the logit (raw score) at output neuron i, exp(z_i) is the exponential of z_i, and ∑_j exp(z_j) is the sum of exp(z_j) over all K output neurons (classes).

How to apply Softmax?

Assume 3 classes, i.e. 3 neurons in the output layer. Suppose the output from the neurons is [3.2, 1.2, 0.5].

Input (logits): [3.2, 1.2, 0.5]

Step 1: Subtract the max from all values. The max value is 3.2, so [3.2 - 3.2, 1.2 - 3.2, 0.5 - 3.2] = [0, -2.0, -2.7]. This shift does not change the result, because the common factor cancels between numerator and denominator; it only keeps the exponentials from overflowing.

Step 2: Exponentiate. e^0 = 1.0, e^-2 = 0.1353, e^-2.7 = 0.0672

Step 3: Sum the exponentials. 1.0 + 0.1353 + 0.0672 = 1.2025

Step 4: Divide each exponential by the sum.
p1 = 1.0 / 1.2025 = 0.8316
p2 = 0.1353 / 1.2025 = 0.1125
p3 = 0.0672 / 1.2025 = 0.0559

For example, to classify an image into one of three classes [bird, fruit, flower], suppose softmax(output) = [0.8316, 0.1125, 0.0559]:
Class 1 (bird): 83.16% probability
Class 2 (fruit): 11.25% probability
Class 3 (flower): 5.59% probability

Show the algorithm an image of a bird. It assigns roughly 83% probability to bird, 11% to fruit and 6% to flower, so it predicts bird.

Now, let's try this example in Python code with NumPy, PyTorch and TensorFlow.

How to implement Softmax Function in Python?

We will write simple code implementing the Softmax activation function in three popular libraries: NumPy, PyTorch and TensorFlow. All code samples can be run in Google Colab.

Softmax in NumPy

import numpy as np

def softmax(x):
    # Subtract the max for numerical stability; the result is unchanged
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

logits = np.array([3.2, 1.2, 0.5])
probabilities = softmax(logits)

print(probabilities)

Softmax in PyTorch

import torch
import torch.nn.functional as F

logits = torch.tensor([3.2, 1.2, 0.5])
# dim=0 normalizes over the only axis of this 1-D tensor
probabilities = F.softmax(logits, dim=0)
print(probabilities)

Softmax in TensorFlow

import tensorflow as tf

logits = tf.constant([3.2, 1.2, 0.5])
probabilities = tf.nn.softmax(logits)
print(probabilities)

Applications of Softmax

Multi-class classification problems
NLP - next word prediction
Reinforcement Learning (e.g. training a robot)
Distillation - teaching smaller models
Sentiment analysis (positive, negative, neutral)

A primary example use case for Softmax is the MNIST dataset: 70k grayscale images of handwritten digits (0-9).

[Figures: MNIST Dataset Sample; 3-4-2 Neural Network; Softmax Neural Network]

Advancements in Softmax Function

Adaptive Softmax
Faster and more memory efficient for a large number of classes. For example, instead of treating all words equally, it treats frequent and rare words differently.

Candidate Sampling
Samples a few positive and negative examples (called candidates) during training, so that softmax is calculated only for a small or random set of candidate classes.

Sparsemax
Produces sparse outputs. Cuts off small values to...
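As a rough illustration of what "sparse outputs" can mean, here is a minimal NumPy sketch of sparsemax. It assumes the commonly used closed-form formulation (a Euclidean projection of the logits onto the probability simplex); the text above is cut off, so this may not match exactly what the article goes on to describe. The function name `sparsemax` and the second set of logits are hypothetical.

```python
import numpy as np

def sparsemax(z):
    """Sketch of sparsemax for a 1-D array of logits (assumed standard formulation)."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]               # logits in descending order
    cumsum = np.cumsum(z_sorted)              # running sums of the sorted logits
    k = np.arange(1, z.size + 1)
    support = 1.0 + k * z_sorted > cumsum     # which sorted positions stay non-zero
    k_max = k[support][-1]                    # size of the support
    tau = (cumsum[k_max - 1] - 1.0) / k_max   # threshold subtracted from every logit
    return np.maximum(z - tau, 0.0)           # scores below the threshold become exactly 0

print(sparsemax([3.2, 1.2, 0.5]))  # the logits from the worked example
print(sparsemax([1.0, 0.8, 0.1]))  # closer scores (assumed values)
```

Unlike softmax, which gives every class a strictly positive probability, this sketch assigns exactly zero to low-scoring classes while the remaining outputs still sum to 1. For the worked-example logits the gap between the top score and the rest is large enough that all of the mass goes to the first class.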
