The infamous slope/derivative — Intuition
It seems like you have finally decided that you will understand this term today at any cost. You must have seen it multiple times in backward propagation and wondered why we use the derivative in the weight updates. Good job at choosing this topic. It’s always good to get your basics right.
First things first, remember anything from school about derivatives? Your teacher must have said that the derivative of y is the change in y divided by the small change in x that caused it. Does that ring a bell?
So, what does the above sentence mean?
Let’s take a simple equation: y = x.
When x = 1, y = 1. Now we need to see what happens when there is a small change in x. When x = 1.1, y = 1.1. So when x shifts by 0.1 (1.1 - 1), y changes by 0.1 (1.1 - 1). Therefore the change in y for a small change in x is given by (change in y) / (change in x) = 0.1 / 0.1 = 1.
For the above equation, the derivative is 1. Notice that y = 1*x.
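Let us take another example: y = x^2.
When x = 1, y = 1; when x = 1.01, y = 1.0201. The change in y divided by the change in x is 0.0201 / 0.01 = 2.01, which is roughly 2.
When x = 3, y = 9; when x = 3.01, y = 9.0601. The change in y divided by the change in x is 0.0601 / 0.01 = 6.01, which is roughly 6.
By the definition of derivative/slope we gave, the slope is about 2 at x = 1 and about 6 at x = 3, and the smaller the change in x, the closer these ratios get to exactly 2 and 6. We can generalize that the derivative of this equation is 2x.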
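The same check can be sketched in a few lines of Python. This is only an illustration of the "change in y for a small change in x" idea for y = x * x; the helper names slope and square and the step size dx are assumptions made for this sketch, not something from the article.

```python
def slope(f, x, dx=0.01):
    """Approximate the slope of f at x: change in y divided by a small change in x."""
    return (f(x + dx) - f(x)) / dx


def square(x):
    """The second example equation, y = x * x."""
    return x * x


# Compare the measured slope with the 2x rule at the two points used above.
for x in (1.0, 3.0):
    print("x =", x, "measured slope =", slope(square, x), "2x =", 2 * x)
```

The measured slopes come out close to 2 and 6, and passing a smaller dx brings them closer still.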
We have now got a basic understanding of the slope.
However, why do we need it?
Consider the second equation, y = x^2.
The minimum value y takes is 0, and it is obtained at x = 0.
Note: in the paragraphs below, "minimum" refers to the value of x at which y is minimum.
Recall that the slope of this equation is 2x. If the x value is greater than the minimum (0), the slope is positive, and if the x value is less than the minimum (0), the slope is negative. Also, the farther the x value is from the minimum, the larger the absolute value of the slope.
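To make this pattern concrete, here is a small sketch that lists y and the slope 2x for y = x * x at a handful of x values. The sample points and the name slope_of_square are chosen just for this illustration.

```python
def slope_of_square(x):
    """Slope of y = x * x, using the 2x rule derived earlier."""
    return 2 * x


# A few sample points on both sides of the minimum at x = 0.
rows = [(x, x * x, slope_of_square(x)) for x in (-3, -2, -1, 0, 1, 2, 3)]

for x, y, s in rows:
    print(f"x = {x:>2}   y = {y}   slope = {s:>2}")
```

The slope is negative to the left of 0, positive to the right of 0, and its size grows as x moves away from 0.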
If x > minimum, the slope is positive, so subtracting a value proportional to the slope makes x smaller and therefore closer to the minimum.
If x < minimum, the slope is negative, so subtracting a value proportional to the slope (which amounts to adding a positive amount) makes x larger and therefore closer to the minimum. In both cases the same rule works: subtract a value proportional to the slope.
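The proportionality factor in this update is usually called alpha (the learning rate): new x = x - alpha * slope. Choosing the right alpha is crucial to this process. If alpha is too small, x creeps toward the minimum very slowly; if it is too large, each update can overshoot the minimum.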
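Here is a minimal sketch of the update process for y = x * x, assuming the update subtracts alpha times the slope from x at every step. The function name descend, the starting point 3.0, the step count, and the two alpha values are all hypothetical choices for illustration.

```python
def descend(x, alpha, steps):
    """Repeatedly move x against the slope of y = x * x."""
    for _ in range(steps):
        slope = 2 * x            # slope of y = x * x at the current x
        x = x - alpha * slope    # step proportional to the slope, scaled by alpha
    return x


# A small alpha moves x steadily toward the minimum at 0.
print(descend(3.0, alpha=0.1, steps=50))

# An alpha that is too large overshoots on every step and moves away from 0.
print(descend(3.0, alpha=1.1, steps=50))
```

With the small alpha, x ends up very close to 0. With the large alpha, each update jumps past the minimum to the other side and lands farther away than it started, which is why the choice of alpha matters so much.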
I hope this article has given you a basic understanding of slope/derivative and it will help you understand why we use it in weight updates in backward propagation.