3.1.7. clumsygrad.grad

This module contains backward functions for various tensor operations in a computational graph.

All functions share a generic signature: each takes the result tensor of the forward operation and that tensor’s gradient as inputs, and returns a tuple containing one gradient per parent tensor.

param tensor: Result tensor from the forward operation

param grad: Gradient flowing back from the next layer

returns: Tuple of gradients for each parent tensor
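To illustrate the shared contract described above, here is a minimal sketch of a backward function with the same `(tensor, grad)` shape. `ToyTensor`, its `parents` field, and `toy_negate_backward` are hypothetical stand-ins invented for this example; they are not clumsygrad's `Tensor` class or its actual implementation.

```python
from dataclasses import dataclass
from typing import Tuple

import numpy as np

GradientTuple = Tuple[np.ndarray, ...]


@dataclass
class ToyTensor:
    """Hypothetical stand-in for a graph node; NOT clumsygrad's Tensor."""
    data: np.ndarray
    parents: tuple = ()


def toy_negate_backward(tensor: ToyTensor, grad: np.ndarray) -> GradientTuple:
    # z = -x has local derivative -1, so the single parent receives -grad.
    return (-grad,)


x = ToyTensor(np.array([1.0, -2.0, 3.0]))
z = ToyTensor(-x.data, parents=(x,))

# One gradient comes back per parent of the result tensor.
grads = toy_negate_backward(z, np.ones_like(z.data))
```

The returned tuple has exactly one entry per parent, which is what lets a graph traversal hand each entry to the matching parent.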

clumsygrad.grad.GradientTuple

A tuple of gradients for each parent tensor.

alias of Tuple[ndarray, …]

clumsygrad.grad.transpose_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for transpose operation.

For \(z = x^T\), with \(\text{grad} = \partial L / \partial z\):

\[\frac{\partial L}{\partial x} = \text{grad}^T\]
clumsygrad.grad.add_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for addition operation.

For \(z = x + y\):

\[\frac{\partial z}{\partial x} = 1, \quad \frac{\partial z}{\partial y} = 1\]
clumsygrad.grad.add_scalar_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for tensor + scalar operation.

For \(z = x + c\) where \(c\) is scalar:

\[\frac{\partial z}{\partial x} = 1\]
clumsygrad.grad.sub_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for subtraction operation.

For \(z = x - y\):

\[\frac{\partial z}{\partial x} = 1, \quad \frac{\partial z}{\partial y} = -1\]
clumsygrad.grad.sub_scalar_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for tensor - scalar operation.

For \(z = x - c\) where \(c\) is scalar:

\[\frac{\partial z}{\partial x} = 1\]
clumsygrad.grad.mul_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for element-wise multiplication.

For \(z = x \odot y\):

\[\frac{\partial z}{\partial x} = y, \quad \frac{\partial z}{\partial y} = x\]
clumsygrad.grad.mul_scalar_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for tensor * scalar operation.

For \(z = x \cdot c\) where \(c\) is scalar:

\[\frac{\partial z}{\partial x} = c\]
clumsygrad.grad.matmul_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for matrix multiplication.

For \(Z = X \cdot Y\), with \(\text{grad} = \partial L / \partial Z\):

\[\frac{\partial L}{\partial X} = \text{grad} \cdot Y^T, \quad \frac{\partial L}{\partial Y} = X^T \cdot \text{grad}\]
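The matrix-multiplication formulas can be checked numerically. The following is a NumPy-only sketch (it does not call clumsygrad) that compares the analytic gradient for `X` against central finite differences; the loss `sum(grad * (X @ Y))` is an assumption chosen so that its derivative with respect to `Z` equals `grad`.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 3))
Y = rng.normal(size=(3, 4))
grad = rng.normal(size=(2, 4))  # upstream gradient, shaped like Z = X @ Y

# Analytic gradients from the formulas above.
dX = grad @ Y.T
dY = X.T @ grad


def loss(X_, Y_):
    # Scalar whose derivative with respect to Z is exactly `grad`.
    return np.sum(grad * (X_ @ Y_))


# Central finite differences for each entry of X.
eps = 1e-6
num_dX = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        step = np.zeros_like(X)
        step[i, j] = eps
        num_dX[i, j] = (loss(X + step, Y) - loss(X - step, Y)) / (2 * eps)
```

Note that each gradient has the shape of the operand it belongs to: `dX` matches `X` and `dY` matches `Y`.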
clumsygrad.grad.power_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for power operation.

For \(z = x^n\):

\[\frac{\partial z}{\partial x} = n \cdot x^{n-1}\]
clumsygrad.grad.negate_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for negation operation.

For \(z = -x\):

\[\frac{\partial z}{\partial x} = -1\]
clumsygrad.grad.abs_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for absolute value operation.

For \(z = |x|\):

\[\begin{split}\frac{\partial z}{\partial x} = \text{sign}(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x < 0 \\ 0 & \text{if } x = 0 \end{cases}\end{split}\]
clumsygrad.grad.reshape_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for reshape operation.

For \(z = \text{reshape}(x, \text{new\_shape})\), with \(\text{grad} = \partial L / \partial z\):

\[\frac{\partial L}{\partial x} = \text{reshape}(\text{grad}, \text{original\_shape})\]

Since reshape only changes the view of the data without changing values, the gradient is simply reshaped back to the original shape.
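A small NumPy sketch of this rule follows (plain arrays only; clumsygrad is not used). Each element of the upstream gradient lands at the position of the input element it came from.

```python
import numpy as np

x = np.arange(6.0).reshape(2, 3)
z = x.reshape(3, 2)                          # forward reshape

# Upstream gradient, shaped like z.
grad = np.arange(6.0).reshape(3, 2) * 10.0

# Backward: reshape the gradient to the input's original shape.
dx = grad.reshape(x.shape)
```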

clumsygrad.grad.sum_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for sum operation.

For \(z = \sum_{i \in \text{axis}} x_i\):

\[\begin{split}\frac{\partial z}{\partial x_i} = \begin{cases} 1 & \text{if } i \in \text{axis} \\ 0 & \text{otherwise} \end{cases}\end{split}\]

Because the sum reduces dimensions, this function broadcasts the incoming gradient back to the original input shape.
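One way to picture the broadcast back is the NumPy sketch below. It assumes a sum over a single axis without `keepdims`; how clumsygrad records the axis is not shown here.

```python
import numpy as np

x = np.arange(6.0).reshape(2, 3)
axis = 1
z = x.sum(axis=axis)               # shape (2,)

# Upstream gradient, shaped like z.
grad = np.array([10.0, 20.0])

# Reinsert the reduced axis, then broadcast to the input shape.
# Every input element that fed an output receives that output's gradient.
dx = np.broadcast_to(np.expand_dims(grad, axis), x.shape).copy()
```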

clumsygrad.grad.mean_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for mean operation.

For \(z = \frac{1}{n}\sum_{i \in \text{axis}} x_i\):

\[\begin{split}\frac{\partial z}{\partial x_i} = \begin{cases} \frac{1}{n} & \text{if } i \in \text{axis} \\ 0 & \text{otherwise} \end{cases}\end{split}\]

where \(n\) is the number of elements being averaged.
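The mean rule is the sum rule scaled by \(1/n\). A NumPy sketch follows, assuming a mean over a single axis without `keepdims`.

```python
import numpy as np

x = np.arange(6.0).reshape(2, 3)
axis = 0
n = x.shape[axis]                  # 2 elements are averaged per output

# Upstream gradient, shaped like x.mean(axis=0).
grad = np.array([3.0, 6.0, 9.0])

# Broadcast back to the input shape and scale by 1/n.
dx = np.broadcast_to(np.expand_dims(grad, axis), x.shape) / n
```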

clumsygrad.grad.exp_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for exponential operation.

For \(z = e^x\):

\[\frac{\partial z}{\partial x} = e^x\]
clumsygrad.grad.log_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for natural logarithm operation.

For \(z = \ln(x)\):

\[\frac{\partial z}{\partial x} = \frac{1}{x}\]
clumsygrad.grad.sqrt_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for square root operation.

For \(z = \sqrt{x}\):

\[\frac{\partial z}{\partial x} = \frac{1}{2\sqrt{x}}\]
clumsygrad.grad.sin_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for sine operation.

For \(z = \sin(x)\):

\[\frac{\partial z}{\partial x} = \cos(x)\]
clumsygrad.grad.cos_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for cosine operation.

For \(z = \cos(x)\):

\[\frac{\partial z}{\partial x} = -\sin(x)\]
clumsygrad.grad.tan_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for tangent operation.

For \(z = \tan(x)\):

\[\frac{\partial z}{\partial x} = \sec^2(x)\]
clumsygrad.grad.relu_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for ReLU activation.

For \(z = \text{ReLU}(x) = \max(0, x)\):

\[\begin{split}\frac{\partial z}{\partial x} = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases}\end{split}\]
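In NumPy terms, the ReLU rule amounts to masking the upstream gradient. This is a sketch on plain arrays, not clumsygrad's implementation.

```python
import numpy as np

x = np.array([-2.0, 0.0, 3.0])
grad = np.array([5.0, 5.0, 5.0])   # upstream gradient

# The gradient passes through only where the input was strictly positive.
dx = grad * (x > 0)
```

With the case split above, the entry at `x == 0` receives zero gradient.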
clumsygrad.grad.sigmoid_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for Sigmoid activation.

For \(z = \sigma(x) = \frac{1}{1 + e^{-x}}\):

\[\frac{\partial z}{\partial x} = \sigma(x) \cdot (1 - \sigma(x))\]
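The sigmoid derivative can be confirmed against central finite differences. This is a NumPy-only sketch; the local `sigmoid` helper is defined here for the check and is not a clumsygrad function.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


x = np.array([-1.0, 0.0, 2.0])
grad = np.ones_like(x)             # upstream gradient

# Analytic rule: the forward output s is all that is needed.
s = sigmoid(x)
dx = grad * s * (1.0 - s)

# Central finite-difference estimate of the same derivative.
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
```

Because the derivative depends only on \(\sigma(x)\), a backward pass can reuse the stored forward result rather than recomputing the exponential.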
clumsygrad.grad.tanh_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for hyperbolic tangent activation.

For \(z = \tanh(x)\):

\[\frac{\partial z}{\partial x} = 1 - \tanh^2(x)\]
clumsygrad.grad.softmax_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for softmax activation.

For \(z_i = \frac{e^{x_i}}{\sum_{j} e^{x_j}}\):

\[\begin{split}\frac{\partial z_i}{\partial x_j} = \begin{cases} z_i(1 - z_i) & \text{if } i = j \\ -z_i z_j & \text{if } i \neq j \end{cases}\end{split}\]

The gradient computation simplifies to:

\[\frac{\partial L}{\partial x} = z \odot \left(\text{grad} - \sum(z \odot \text{grad})\right)\]
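To see why the simplification holds, the sketch below builds the full Jacobian for a one-dimensional input and compares the Jacobian-vector product with the shortcut formula. It uses NumPy only and assumes the softmax is taken over the whole vector.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
grad = np.array([0.1, -0.2, 0.3])  # upstream gradient dL/dz

# Numerically stable softmax.
e = np.exp(x - x.max())
z = e / e.sum()

# Full Jacobian: J[i, j] = dz_i/dx_j = z_i * (delta_ij - z_j).
J = np.diag(z) - np.outer(z, z)
dx_full = J.T @ grad

# Simplified form from the formula above.
dx_short = z * (grad - np.sum(z * grad))
```

The shortcut avoids materialising the \(n \times n\) Jacobian, which matters once the softmax is applied to large vectors or batches.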
clumsygrad.grad.mse_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for Mean Squared Error loss.

For \(L = (\text{pred} - \text{target})^2\):

\[ \begin{align}\begin{aligned}\frac{\partial L}{\partial \text{pred}} = 2(\text{pred} - \text{target})\\\frac{\partial L}{\partial \text{target}} = -2(\text{pred} - \text{target})\end{aligned}\end{align} \]
clumsygrad.grad.mae_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for Mean Absolute Error loss.

For \(L = \frac{1}{n}\sum_{i} |\text{pred}_i - \text{target}_i|\):

\[ \begin{align}\begin{aligned}\frac{\partial L}{\partial \text{pred}} = \frac{1}{n} \cdot \text{sign}(\text{pred} - \text{target})\\\frac{\partial L}{\partial \text{target}} = -\frac{1}{n} \cdot \text{sign}(\text{pred} - \text{target})\end{aligned}\end{align} \]

where \(n\) is the number of elements.
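A NumPy sketch of these two gradients follows, assuming an upstream gradient of 1 (the loss is the final node). It does not call clumsygrad.

```python
import numpy as np

pred = np.array([1.0, 2.0, 5.0, 4.0])
target = np.array([2.0, 2.0, 3.0, 6.0])
n = pred.size

# sign() is 0 where pred equals target, so that entry gets no gradient.
d_pred = np.sign(pred - target) / n
d_target = -d_pred
```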

clumsygrad.grad.add_broadcast_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for broadcasted addition operation.

For \(z = x + y\) where \(x\) and \(y\) have different but broadcastable shapes, with \(\text{grad} = \partial L / \partial z\):

\[ \begin{align}\begin{aligned}\frac{\partial L}{\partial x} = \text{reduce}(\text{grad}, \text{left\_shape})\\\frac{\partial L}{\partial y} = \text{reduce}(\text{grad}, \text{right\_shape})\end{aligned}\end{align} \]

The gradients are reduced (summed) along broadcasted dimensions and reshaped to match the original tensor shapes.
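The reduce step can be sketched in NumPy as follows. `reduce_to_shape` is a hypothetical helper written for this illustration; it is not a clumsygrad function, and the library's own reduction may be organised differently.

```python
import numpy as np


def reduce_to_shape(grad, shape):
    """Hypothetical helper: sum grad over broadcast axes so it has `shape`."""
    # Sum away leading axes that broadcasting prepended.
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)
    # Sum (keeping dims) over axes that were stretched from size 1.
    for axis, size in enumerate(shape):
        if size == 1 and grad.shape[axis] != 1:
            grad = grad.sum(axis=axis, keepdims=True)
    return grad


x = np.ones((2, 3))
y = np.ones((3,))                          # broadcast across the rows of x
grad = np.arange(6.0).reshape(2, 3)        # upstream gradient, shaped like x + y

dx = reduce_to_shape(grad, x.shape)        # unchanged: x was not broadcast
dy = reduce_to_shape(grad, y.shape)        # summed over the prepended row axis
```

Summation is the right reduction because a broadcast element contributes to several outputs, and its gradient is the total of those contributions.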

clumsygrad.grad.sub_broadcast_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for broadcasted subtraction operation.

For \(z = x - y\) where \(x\) and \(y\) have different but broadcastable shapes, with \(\text{grad} = \partial L / \partial z\):

\[ \begin{align}\begin{aligned}\frac{\partial L}{\partial x} = \text{reduce}(\text{grad}, \text{left\_shape})\\\frac{\partial L}{\partial y} = \text{reduce}(-\text{grad}, \text{right\_shape})\end{aligned}\end{align} \]

The first operand receives the gradient unchanged and the second receives its negation. Both are then reduced to match the original tensor shapes.
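A brief NumPy sketch of the subtraction case follows, assuming shapes `(2, 3)` and `(1, 3)` so that only the second operand is broadcast. The reduction is written inline rather than through any clumsygrad helper.

```python
import numpy as np

x = np.zeros((2, 3))
y = np.zeros((1, 3))                       # stretched along axis 0
grad = np.arange(6.0).reshape(2, 3)        # upstream gradient, shaped like x - y

dx = grad                                  # x was not broadcast
dy = (-grad).sum(axis=0, keepdims=True)    # negate, then sum the stretched axis
```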

clumsygrad.grad.mul_broadcast_backward(tensor: Tensor, grad: ndarray) → Tuple[ndarray, ...]

Backward function for broadcasted element-wise multiplication.

For \(z = x \odot y\) where \(x\) and \(y\) have different but broadcastable shapes, with \(\text{grad} = \partial L / \partial z\):

\[ \begin{align}\begin{aligned}\frac{\partial L}{\partial x} = \text{reduce}(\text{grad} \odot \text{broadcast}(y), \text{left\_shape})\\\frac{\partial L}{\partial y} = \text{reduce}(\text{grad} \odot \text{broadcast}(x), \text{right\_shape})\end{aligned}\end{align} \]

Each tensor’s gradient is the upstream gradient multiplied element-wise by the other tensor’s broadcast values, then reduced (summed over the broadcast dimensions) to match that tensor’s original shape.
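The multiply-then-reduce order can be sketched in NumPy as below, assuming shapes `(2, 3)` and `(2, 1)`. The reduction is written inline for this one case and is not taken from clumsygrad.

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])   # shape (2, 3)
y = np.array([[10.0], [100.0]])                     # shape (2, 1), stretched along axis 1
grad = np.ones((2, 3))                              # upstream gradient, shaped like x * y

# Multiply by the other operand first, then reduce to the operand's own shape.
dx = grad * np.broadcast_to(y, grad.shape)          # already shaped like x
dy = (grad * x).sum(axis=1, keepdims=True)          # sum over the stretched axis
```

The multiplication has to happen before the reduction: summing `grad` first and then multiplying by `x` would not even have a compatible shape for `dy`.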