3.1.6. clumsygrad.optimizer
This module provides common optimizers for updating the parameters of a computational graph.
- class clumsygrad.optimizer.Optimizer(parameters: List[Tensor])
Bases: ABC
Abstract base class for all optimizers.
- class clumsygrad.optimizer.SGD(parameters: List[Tensor], lr: float = 0.01)
Bases: Optimizer
Stochastic Gradient Descent (SGD) optimizer.
This optimizer updates parameters using the formula: param -= lr * grad, where param is a parameter tensor, lr is the learning rate, and grad is the gradient of the parameter.
Reference: Robbins, H., & Monro, S. (1951). A stochastic approximation method.
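The update rule above can be illustrated with a minimal sketch. This is not clumsygrad's implementation: it works on plain Python floats rather than Tensor objects, and the helper name `sgd_step` is hypothetical, introduced only for this example.

```python
from typing import List


def sgd_step(params: List[float], grads: List[float], lr: float = 0.01) -> List[float]:
    """Apply one SGD update (param -= lr * grad) and return the new values.

    Hypothetical helper for illustration; it is not part of clumsygrad.
    """
    return [p - lr * g for p, g in zip(params, grads)]


# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x = [0.0]
for _ in range(500):
    grad = [2.0 * (x[0] - 3.0)]
    x = sgd_step(x, grad, lr=0.1)
```

With this learning rate, each step shrinks the distance to the minimum at x = 3 by a constant factor of 0.8, so the iterate approaches 3 geometrically.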
- class clumsygrad.optimizer.Adam(parameters: List[Tensor], lr: float = 0.001, beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-08)
Bases: Optimizer
Adam (Adaptive Moment Estimation) optimizer.
This optimizer combines the advantages of AdaGrad and RMSprop by computing adaptive learning rates for each parameter using estimates of first and second moments of the gradients.
Reference: Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization.
- __init__(parameters: List[Tensor], lr: float = 0.001, beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-08)
Initialize the Adam optimizer.
- Parameters:
  - parameters (List[Tensor]) – List of parameter tensors to optimize.
  - lr (float) – Learning rate. Default is 0.001.
  - beta1 (float) – Exponential decay rate for first moment estimates. Default is 0.9.
  - beta2 (float) – Exponential decay rate for second moment estimates. Default is 0.999.
  - eps (float) – Small constant for numerical stability. Default is 1e-8.
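To show how lr, beta1, beta2, and eps interact, here is a minimal sketch of the Adam update as described by Kingma and Ba, including bias correction. It is an assumption-laden illustration, not clumsygrad's code: the class name `AdamSketch` and its `step` method are hypothetical, and it operates on plain Python floats rather than Tensor objects.

```python
import math
from typing import List


class AdamSketch:
    """Illustrative Adam update on plain floats (hypothetical, not clumsygrad's class)."""

    def __init__(self, n_params: int, lr: float = 0.001, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8) -> None:
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # first moment estimates (running mean of gradients)
        self.v = [0.0] * n_params  # second moment estimates (running mean of squared gradients)
        self.t = 0                 # number of steps taken so far

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        """Return the parameters after one Adam update."""
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g
            # Bias correction compensates for the zero initialization of m and v.
            m_hat = self.m[i] / (1.0 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1.0 - self.beta2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


# Usage: a few steps on f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
opt = AdamSketch(n_params=1, lr=0.1)
x = [0.0]
for _ in range(200):
    x = opt.step(x, [2.0 * (x[0] - 3.0)])
```

On the very first step the bias-corrected ratio reduces to g / (|g| + eps), so the parameter moves by roughly lr in the direction opposite the gradient, regardless of the gradient's magnitude.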