3.1.7. clumsygrad.grad
This module contains backward functions for various tensor operations in a computational graph.
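All functions share a generic signature: they take the result tensor of the forward operation and the gradient flowing back into it (\(\text{grad} = \partial L / \partial z\), where \(L\) is the final loss), and return a tuple containing one gradient per parent tensor.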
- param tensor:
Result tensor from the forward operation
- param grad:
Gradient flowing back from the next layer
- returns:
Tuple of gradients for each parent tensor
- clumsygrad.grad.GradientTuple
A tuple of gradients for each parent tensor.
alias of Tuple[ndarray, ...]
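The contract above is a vector-Jacobian product: given \(\text{grad} = \partial L / \partial z\), each backward function returns \(\partial L\) with respect to each parent. The following minimal NumPy sketch illustrates that contract. It does not use clumsygrad's `Tensor`; `square_backward` and `numerical_vjp` are hypothetical helpers that take the parent's values explicitly, and the central-difference check is one common way to validate any of the formulas on this page.

```python
from typing import Callable, Tuple

import numpy as np

# Hypothetical alias mirroring clumsygrad.grad.GradientTuple.
GradientTuple = Tuple[np.ndarray, ...]


def numerical_vjp(f: Callable[[np.ndarray], np.ndarray],
                  x: np.ndarray,
                  grad: np.ndarray,
                  eps: float = 1e-6) -> np.ndarray:
    """Estimate dL/dx for L = sum(grad * f(x)) with central differences."""
    x = x.astype(np.float64)
    out = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        x_plus = x.copy()
        x_minus = x.copy()
        x_plus[idx] += eps
        x_minus[idx] -= eps
        out[idx] = np.sum(grad * (f(x_plus) - f(x_minus))) / (2 * eps)
    return out


def square_backward(x: np.ndarray, grad: np.ndarray) -> GradientTuple:
    # Hypothetical backward for z = x**2: one parent, so a one-element tuple.
    return (grad * 2 * x,)


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3))
grad = rng.normal(size=(2, 3))
(analytic,) = square_backward(x, grad)
numeric = numerical_vjp(lambda v: v ** 2, x, grad)
print(np.allclose(analytic, numeric, atol=1e-6))  # True
```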
- clumsygrad.grad.transpose_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for transpose operation.
For \(z = x^T\):
\[\frac{\partial L}{\partial x} = \text{grad}^T\]
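A minimal NumPy sketch of the transpose rule, assuming a plain array is passed in place of clumsygrad's `Tensor`; `transpose_grad` is a hypothetical name. Because the upstream gradient has the transposed shape, transposing it once more restores the shape of \(x\).

```python
import numpy as np


def transpose_grad(grad: np.ndarray) -> tuple:
    # z = x.T, so grad has z's (transposed) shape; .T maps it back to x's shape.
    return (grad.T,)


x = np.arange(6.0).reshape(2, 3)
z = x.T                                   # shape (3, 2)
grad = np.ones_like(z) * np.arange(2.0)   # upstream dL/dz, shape (3, 2)
(dx,) = transpose_grad(grad)
print(dx.shape)  # (2, 3)
```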
- clumsygrad.grad.add_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for addition operation.
For \(z = x + y\):
\[\frac{\partial z}{\partial x} = 1, \quad \frac{\partial z}{\partial y} = 1\]
- clumsygrad.grad.add_scalar_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for tensor + scalar operation.
For \(z = x + c\) where \(c\) is scalar:
\[\frac{\partial z}{\partial x} = 1\]
- clumsygrad.grad.sub_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for subtraction operation.
For \(z = x - y\):
\[\frac{\partial z}{\partial x} = 1, \quad \frac{\partial z}{\partial y} = -1\]
- clumsygrad.grad.sub_scalar_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for tensor - scalar operation.
For \(z = x - c\) where \(c\) is scalar:
\[\frac{\partial z}{\partial x} = 1\]
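The addition and subtraction rules reduce to passing the upstream gradient through, possibly with a sign flip. A sketch with hypothetical helper names, assuming same-shape operands and that a scalar operand is not tracked as a parent (so the scalar variants return a one-element tuple):

```python
import numpy as np


def add_grads(grad):
    # z = x + y: both parents receive the upstream gradient unchanged.
    return grad, grad


def sub_grads(grad):
    # z = x - y: x receives grad, y receives -grad.
    return grad, -grad


def scalar_op_grads(grad):
    # z = x + c or z = x - c: only the tensor parent gets a gradient.
    return (grad,)


grad = np.array([[1.0, -2.0], [0.5, 3.0]])
dx, dy = sub_grads(grad)
print(dy.tolist())  # [[-1.0, 2.0], [-0.5, -3.0]]
```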
- clumsygrad.grad.mul_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for element-wise multiplication.
For \(z = x \odot y\):
\[\frac{\partial z}{\partial x} = y, \quad \frac{\partial z}{\partial y} = x\]
- clumsygrad.grad.mul_scalar_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for tensor * scalar operation.
For \(z = x \cdot c\) where \(c\) is scalar:
\[\frac{\partial z}{\partial x} = c\]
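For element-wise multiplication each parent needs the other parent's value, so the sketch below takes both operands explicitly. `mul_grads` and `mul_scalar_grads` are hypothetical helpers, not clumsygrad functions, and assume same-shape operands.

```python
import numpy as np


def mul_grads(x, y, grad):
    # z = x * y (element-wise): each parent gets grad times the other operand.
    return grad * y, grad * x


def mul_scalar_grads(c, grad):
    # z = x * c: the scalar simply scales the upstream gradient.
    return (grad * c,)


x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
grad = np.ones(3)
dx, dy = mul_grads(x, y, grad)
print(dx.tolist(), dy.tolist())  # [4.0, 5.0, 6.0] [1.0, 2.0, 3.0]
```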
- clumsygrad.grad.matmul_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for matrix multiplication.
For \(Z = XY\):
\[\frac{\partial L}{\partial X} = \text{grad} \cdot Y^T, \quad \frac{\partial L}{\partial Y} = X^T \cdot \text{grad}\]
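A sketch of the matrix-multiplication rule with the shape of each product annotated and a finite-difference check of one entry. `matmul_grads` is a hypothetical name and assumes 2-D operands; batched matmul would apply the same rule over leading dimensions.

```python
import numpy as np


def matmul_grads(X, Y, grad):
    # Z = X @ Y with X: (m, k), Y: (k, n), grad = dL/dZ: (m, n).
    dX = grad @ Y.T   # (m, n) @ (n, k) -> (m, k)
    dY = X.T @ grad   # (k, m) @ (m, n) -> (k, n)
    return dX, dY


rng = np.random.default_rng(1)
X = rng.normal(size=(2, 3))
Y = rng.normal(size=(3, 4))
grad = rng.normal(size=(2, 4))
dX, dY = matmul_grads(X, Y, grad)

# Central difference on X[0, 1] for L = sum(grad * (X @ Y)).
eps = 1e-6
Xp, Xm = X.copy(), X.copy()
Xp[0, 1] += eps
Xm[0, 1] -= eps
numeric = (np.sum(grad * (Xp @ Y)) - np.sum(grad * (Xm @ Y))) / (2 * eps)
print(np.isclose(dX[0, 1], numeric))  # True
```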
- clumsygrad.grad.power_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for power operation.
For \(z = x^n\):
\[\frac{\partial z}{\partial x} = n \cdot x^{n-1}\]
- clumsygrad.grad.negate_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for negation operation.
For \(z = -x\):
\[\frac{\partial z}{\partial x} = -1\]
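A sketch of the power and negation rules, assuming a constant scalar exponent \(n\) as in the formula above; the helper names are hypothetical.

```python
import numpy as np


def power_grads(x, n, grad):
    # z = x**n with constant n: dL/dx = grad * n * x**(n - 1).
    return (grad * n * x ** (n - 1),)


def negate_grads(grad):
    # z = -x: the upstream gradient flips sign.
    return (-grad,)


x = np.array([1.0, 2.0, 3.0])
(dx,) = power_grads(x, 3, np.ones_like(x))
print(dx.tolist())  # [3.0, 12.0, 27.0]
```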
- clumsygrad.grad.abs_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for absolute value operation.
For \(z = |x|\):
\[\begin{split}\frac{\partial z}{\partial x} = \text{sign}(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x < 0 \\ 0 & \text{if } x = 0 \end{cases}\end{split}\]
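`np.sign` already implements the three-case definition above, so a sketch of the absolute-value rule is a single multiplication. `abs_grads` is a hypothetical name; returning 0 at \(x = 0\) is one valid subgradient choice.

```python
import numpy as np


def abs_grads(x, grad):
    # np.sign gives 1, -1 or 0, matching the three cases of the formula.
    return (grad * np.sign(x),)


x = np.array([-2.0, 0.0, 3.0])
(dx,) = abs_grads(x, np.array([10.0, 10.0, 10.0]))
print(dx.tolist())  # [-10.0, 0.0, 10.0]
```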
- clumsygrad.grad.reshape_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for reshape operation.
For \(z = \text{reshape}(x, \text{new\_shape})\):
\[\frac{\partial L}{\partial x} = \text{reshape}(\text{grad}, \text{original\_shape})\]
Since reshape only changes how the data is laid out, without changing any values, the gradient is simply reshaped back to the original shape.
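A sketch of the reshape rule, assuming the backward pass knows the input's original shape (passed here as `original_shape`; `reshape_grads` is a hypothetical name):

```python
import numpy as np


def reshape_grads(original_shape, grad):
    # Same values in a different layout: just lay grad out in x's shape again.
    return (grad.reshape(original_shape),)


x = np.arange(6.0).reshape(2, 3)
z = x.reshape(3, 2)
grad = np.arange(6.0).reshape(3, 2) * 10   # upstream dL/dz
(dx,) = reshape_grads(x.shape, grad)
print(dx.shape)  # (2, 3)
```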
- clumsygrad.grad.sum_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for sum operation.
For \(z = \sum_{i \in \text{axis}} x_i\):
\[\frac{\partial z}{\partial x_i} = 1 \quad \text{for every } x_i \text{ summed into } z\]
Each input element therefore receives the gradient of the output element it was summed into. Because the sum reduces dimensions, the function broadcasts the gradient back to the original shape of the input.
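The following sketch shows the broadcasting step for a sum reduction. It assumes the forward pass records the reduced `axis` and a NumPy-style `keepdims` flag; `sum_grads` and its parameters are hypothetical, not clumsygrad's API.

```python
import numpy as np


def sum_grads(input_shape, axis, keepdims, grad):
    """Hypothetical sum backward: dL/dx for z = x.sum(axis, keepdims=keepdims)."""
    grad = np.asarray(grad, dtype=np.float64)
    if axis is not None and not keepdims:
        # Put the reduced axes back as size-1 dimensions so grad broadcasts.
        grad = np.expand_dims(grad, axis)
    # Every element summed into an output value receives that value's gradient.
    return (np.broadcast_to(grad, input_shape).copy(),)


x = np.arange(6.0).reshape(2, 3)
grad = np.array([1.0, 2.0])   # upstream dL/dz for z = x.sum(axis=1)
(dx,) = sum_grads(x.shape, 1, False, grad)
print(dx.tolist())  # [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
```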
- clumsygrad.grad.mean_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for mean operation.
For \(z = \frac{1}{n}\sum_{i \in \text{axis}} x_i\):
\[\frac{\partial z}{\partial x_i} = \frac{1}{n} \quad \text{for every } x_i \text{ averaged into } z\]
where \(n\) is the number of elements being averaged. The scaled gradient is broadcast back to the original shape of the input.
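The mean rule is the sum rule scaled by \(1/n\). A sketch under the same kind of assumptions (hypothetical `mean_grads`, NumPy-style `axis` and `keepdims`):

```python
import numpy as np


def mean_grads(input_shape, axis, keepdims, grad):
    """Hypothetical mean backward: dL/dx for z = x.mean(axis, keepdims=keepdims)."""
    grad = np.asarray(grad, dtype=np.float64)
    if axis is None:
        n = int(np.prod(input_shape))
    else:
        axes = (axis,) if isinstance(axis, int) else tuple(axis)
        # n = number of elements averaged into each output value.
        n = int(np.prod([input_shape[a] for a in axes]))
        if not keepdims:
            grad = np.expand_dims(grad, axes)
    return (np.broadcast_to(grad / n, input_shape).copy(),)


x = np.arange(6.0).reshape(2, 3)
grad = np.array([3.0, 6.0])   # upstream dL/dz for z = x.mean(axis=1)
(dx,) = mean_grads(x.shape, 1, False, grad)
print(dx.tolist())  # [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
```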
- clumsygrad.grad.exp_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for exponential operation.
For \(z = e^x\):
\[\frac{\partial z}{\partial x} = e^x\]
- clumsygrad.grad.log_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for natural logarithm operation.
For \(z = \ln(x)\):
\[\frac{\partial z}{\partial x} = \frac{1}{x}\]
- clumsygrad.grad.sqrt_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for square root operation.
For \(z = \sqrt{x}\):
\[\frac{\partial z}{\partial x} = \frac{1}{2\sqrt{x}}\]
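For \(e^x\) and \(\sqrt{x}\) the derivative can be written in terms of the forward output \(z\), so a backward pass can reuse it instead of recomputing. The sketch below does that; whether clumsygrad reuses outputs this way is an assumption, and the helper names are hypothetical.

```python
import numpy as np


def exp_grads(z, grad):
    # d/dx e^x = e^x, which equals the forward output z.
    return (grad * z,)


def log_grads(x, grad):
    # d/dx ln(x) = 1/x, so this rule needs the input x.
    return (grad / x,)


def sqrt_grads(z, grad):
    # 1 / (2 sqrt(x)) = 1 / (2 z), with z the forward output.
    return (grad / (2 * z),)


x = np.array([1.0, 4.0, 16.0])
ones = np.ones_like(x)
print(sqrt_grads(np.sqrt(x), ones)[0].tolist())  # [0.5, 0.25, 0.125]
print(log_grads(x, ones)[0].tolist())             # [1.0, 0.25, 0.0625]
```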
- clumsygrad.grad.sin_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for sine operation.
For \(z = \sin(x)\):
\[\frac{\partial z}{\partial x} = \cos(x)\]
- clumsygrad.grad.cos_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for cosine operation.
For \(z = \cos(x)\):
\[\frac{\partial z}{\partial x} = -\sin(x)\]
- clumsygrad.grad.tan_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for tangent operation.
For \(z = \tan(x)\):
\[\frac{\partial z}{\partial x} = \sec^2(x)\]
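A sketch of the trigonometric rules with a finite-difference check on the tangent; the helper names are hypothetical and the inputs are plain NumPy arrays. Since NumPy has no `sec` function, \(\sec^2(x)\) is computed as \(1/\cos^2(x)\).

```python
import numpy as np


def sin_grads(x, grad):
    return (grad * np.cos(x),)


def cos_grads(x, grad):
    return (-grad * np.sin(x),)


def tan_grads(x, grad):
    # sec^2(x) = 1 / cos^2(x), equivalently 1 + tan^2(x).
    return (grad / np.cos(x) ** 2,)


x = np.array([0.0, 0.3, 1.0])
g = np.ones_like(x)
eps = 1e-6
numeric_tan = (np.tan(x + eps) - np.tan(x - eps)) / (2 * eps)
print(np.allclose(tan_grads(x, g)[0], numeric_tan))  # True
```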
- clumsygrad.grad.relu_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for ReLU activation.
For \(z = \text{ReLU}(x) = \max(0, x)\):
\[\begin{split}\frac{\partial z}{\partial x} = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases}\end{split}\]
- clumsygrad.grad.sigmoid_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for Sigmoid activation.
For \(z = \sigma(x) = \frac{1}{1 + e^{-x}}\):
\[\frac{\partial z}{\partial x} = \sigma(x) \cdot (1 - \sigma(x))\]
- clumsygrad.grad.tanh_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for hyperbolic tangent activation.
For \(z = \tanh(x)\):
\[\frac{\partial z}{\partial x} = 1 - \tanh^2(x)\]
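A sketch of the activation rules. Sigmoid and tanh are written in terms of the forward output \(z\), which matches the formulas above since \(z = \sigma(x)\) or \(z = \tanh(x)\); the helper names are hypothetical.

```python
import numpy as np


def relu_grads(x, grad):
    # Gradient passes where x > 0 and is zeroed elsewhere (including x == 0).
    return (grad * (x > 0),)


def sigmoid_grads(z, grad):
    # z is the forward output sigma(x).
    return (grad * z * (1 - z),)


def tanh_grads(z, grad):
    # z is the forward output tanh(x).
    return (grad * (1 - z ** 2),)


x = np.array([-1.0, 0.0, 2.0])
print(relu_grads(x, np.ones_like(x))[0].tolist())  # [0.0, 0.0, 1.0]
z = 1 / (1 + np.exp(-x))
print(sigmoid_grads(z, np.ones_like(x))[0][1])     # 0.25
```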
- clumsygrad.grad.softmax_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for softmax activation.
For \(z_i = \frac{e^{x_i}}{\sum_{j} e^{x_j}}\):
\[\begin{split}\frac{\partial z_i}{\partial x_j} = \begin{cases} z_i(1 - z_i) & \text{if } i = j \\ -z_i z_j & \text{if } i \neq j \end{cases}\end{split}\]
Multiplying by the upstream gradient, the vector-Jacobian product simplifies to:
\[\frac{\partial L}{\partial x} = z \odot \left(\text{grad} - \sum(z \odot \text{grad})\right)\]
where the sum is taken along the softmax axis, so the full Jacobian does not need to be formed.
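A sketch that checks the simplified softmax formula against the full Jacobian from the case formula, for a single 1-D input. `softmax` and `softmax_grads` are hypothetical helpers that take plain arrays; subtracting the max in the forward pass is a standard numerical-stability trick.

```python
import numpy as np


def softmax(x, axis=-1):
    shifted = x - x.max(axis=axis, keepdims=True)  # avoids overflow in exp
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def softmax_grads(z, grad, axis=-1):
    # dL/dx = z * (grad - sum(z * grad)), sum along the softmax axis.
    dot = np.sum(z * grad, axis=axis, keepdims=True)
    return (z * (grad - dot),)


x = np.array([1.0, 2.0, 0.5])
z = softmax(x)
grad = np.array([0.2, -1.0, 0.7])
jacobian = np.diag(z) - np.outer(z, z)   # J[i, j] = dz_i / dx_j
(dx,) = softmax_grads(z, grad)
print(np.allclose(dx, jacobian.T @ grad))  # True
```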
- clumsygrad.grad.mse_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for Mean Squared Error loss.
For \(L = (\text{pred} - \text{target})^2\):
\[ \begin{align}\begin{aligned}\frac{\partial L}{\partial \text{pred}} = 2(\text{pred} - \text{target})\\\frac{\partial L}{\partial \text{target}} = -2(\text{pred} - \text{target})\end{aligned}\end{align} \]
- clumsygrad.grad.mae_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for Mean Absolute Error loss.
For \(L = \frac{1}{n}\sum_{i} |\text{pred}_i - \text{target}_i|\):
\[ \begin{align}\begin{aligned}\frac{\partial L}{\partial \text{pred}} = \frac{1}{n} \cdot \text{sign}(\text{pred} - \text{target})\\\frac{\partial L}{\partial \text{target}} = -\frac{1}{n} \cdot \text{sign}(\text{pred} - \text{target})\end{aligned}\end{align} \]
where \(n\) is the number of elements.
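A sketch of both loss rules with hypothetical helper names, assuming `grad` is the upstream gradient of the loss (1.0 when the loss is the final output). It follows the formulas on this page as written: element-wise squared error for MSE and a mean over \(n\) elements for MAE.

```python
import numpy as np


def mse_grads(pred, target, grad):
    # Element-wise squared error, as in the formula above.
    # A mean-reduced loss would additionally divide by n = pred.size.
    diff = pred - target
    return grad * 2 * diff, -grad * 2 * diff


def mae_grads(pred, target, grad):
    # Mean absolute error: each element contributes sign(diff) / n.
    n = pred.size
    s = np.sign(pred - target) / n
    return grad * s, -grad * s


pred = np.array([1.0, 3.0, 2.0, 5.0])
target = np.array([2.0, 3.0, 0.0, 4.0])
d_pred, d_target = mae_grads(pred, target, 1.0)
print(d_pred.tolist())  # [-0.25, 0.0, 0.25, 0.25]
```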
- clumsygrad.grad.add_broadcast_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for broadcasted addition operation.
For \(z = x + y\) where \(x\) and \(y\) have different but broadcastable shapes:
\[ \begin{align}\begin{aligned}\frac{\partial L}{\partial x} = \text{reduce}(\text{grad}, \text{left\_shape})\\\frac{\partial L}{\partial y} = \text{reduce}(\text{grad}, \text{right\_shape})\end{aligned}\end{align} \]
The gradients are summed along the broadcast dimensions and reshaped to match the original tensor shapes.
- clumsygrad.grad.sub_broadcast_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for broadcasted subtraction operation.
For \(z = x - y\) where \(x\) and \(y\) have different but broadcastable shapes:
\[ \begin{align}\begin{aligned}\frac{\partial L}{\partial x} = \text{reduce}(\text{grad}, \text{left\_shape})\\\frac{\partial L}{\partial y} = \text{reduce}(-\text{grad}, \text{right\_shape})\end{aligned}\end{align} \]
The first operand receives the upstream gradient unchanged and the second receives its negation; both are then reduced to match the original tensor shapes.
- clumsygrad.grad.mul_broadcast_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]
Backward function for broadcasted element-wise multiplication.
For \(z = x \odot y\) where \(x\) and \(y\) have different but broadcastable shapes:
\[ \begin{align}\begin{aligned}\frac{\partial L}{\partial x} = \text{reduce}(\text{grad} \odot \text{broadcast}(y), \text{left\_shape})\\\frac{\partial L}{\partial y} = \text{reduce}(\text{grad} \odot \text{broadcast}(x), \text{right\_shape})\end{aligned}\end{align} \]
Each operand's gradient is the upstream gradient multiplied element-wise by the other operand (broadcast to the output shape), then reduced to match that operand's original shape.
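A sketch of the three broadcast rules. `reduce_to_shape` is a hypothetical helper standing in for \(\text{reduce}\), and `x_shape`/`y_shape` play the roles of \(\text{left\_shape}\)/\(\text{right\_shape}\). It follows NumPy's broadcasting rules: sum away leading axes that broadcasting prepended, then sum (keeping dimensions) over axes whose original size was 1.

```python
import numpy as np


def reduce_to_shape(grad, shape):
    # Sum away leading axes that broadcasting prepended.
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)
    # Sum, keeping dims, over axes where the original size was 1.
    for axis, size in enumerate(shape):
        if size == 1 and grad.shape[axis] != 1:
            grad = grad.sum(axis=axis, keepdims=True)
    return grad


def add_broadcast_grads(x_shape, y_shape, grad):
    return reduce_to_shape(grad, x_shape), reduce_to_shape(grad, y_shape)


def sub_broadcast_grads(x_shape, y_shape, grad):
    return reduce_to_shape(grad, x_shape), reduce_to_shape(-grad, y_shape)


def mul_broadcast_grads(x, y, grad):
    # np.broadcast_to(y, grad.shape) plays the role of broadcast(y).
    dx = reduce_to_shape(grad * np.broadcast_to(y, grad.shape), x.shape)
    dy = reduce_to_shape(grad * np.broadcast_to(x, grad.shape), y.shape)
    return dx, dy


x = np.ones((2, 3))
b = np.array([1.0, 2.0, 3.0])        # shape (3,), broadcast over rows
grad = np.arange(6.0).reshape(2, 3)  # upstream dL/dz
dx, db = add_broadcast_grads(x.shape, b.shape, grad)
print(db.tolist())  # [3.0, 5.0, 7.0]
```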