Module light_labyrinth.hyperparams.optimization

The light_labyrinth.hyperparams.optimization module includes Optimizer classes with predefined optimization algorithms that can be used for training Light Labyrinth models.

Source code
"""
The `light_labyrinth.hyperparams.optimization` module includes `Optimizer` classes
with predefined optimization algorithms that can be used for training Light Labyrinth models. 
"""

from ._optimization import Adam, GradientDescent, Nadam, RMSprop

__all__ = ["Adam", "GradientDescent", "Nadam", "RMSprop"]
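
As a quick orientation (mirroring the per-class examples below), the snippet sketches how each predefined optimizer can be constructed and passed to a Light Labyrinth model; the model shape (3, 3) is purely illustrative.

>>> from light_labyrinth.hyperparams.optimization import Adam, GradientDescent, Nadam, RMSprop
>>> from light_labyrinth.dim2 import LightLabyrinthClassifier
>>> sgd = GradientDescent(learning_rate=0.01, momentum=0.9)
>>> rmsprop = RMSprop(learning_rate=0.001, rho=0.9)
>>> adam = Adam(learning_rate=0.001)
>>> nadam = Nadam(learning_rate=0.001)
>>> model = LightLabyrinthClassifier(3, 3, optimizer=adam)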

Classes

class Adam (learning_rate=0.01, beta1=0.9, beta2=0.999, epsilon=1e-06)

Adam (Adaptive Moment Estimation) optimizer class for learning Light Labyrinth models.

In each iteration \(k\) of the learning process, the loss function's gradient \(\nabla\xi(W_{k}, X, y)\) is computed, and the model's weights \(W_k\) are updated. First, the first moment \(m_k\) and the second moment \(v_k\) of the gradient are computed:

\[m_k = \beta_1 m_{k-1} + (1-\beta_1)\nabla\xi(W_{k}, X, y)\]
\[v_k = \beta_2 v_{k-1} + (1-\beta_2)(\nabla\xi(W_{k}, X, y))^2\]

Then the weights \(W_{k}\) are updated following the formulas:

\[\hat{m_k} = \frac{m_k}{1-\beta_1^k}\]
\[\hat{v_k} = \frac{v_k}{1-\beta_2^k}\]
\[W_{k+1} = W_k - \frac{\alpha}{\sqrt{\hat{v_k}}+\epsilon}\hat{m_k}\]

where \(\alpha > 0\) is the learning rate and \(\beta_1, \beta_2 \in [0,1)\) are decaying factors. The \(\epsilon > 0\) term ensures numerical stability and should not be too big.
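
To make the update rule concrete, below is a minimal NumPy sketch of a single Adam step following the formulas above; it is illustrative only, not the library's internal implementation, and the helper name adam_step is made up.

import numpy as np

def adam_step(w, grad, m, v, k, learning_rate=0.01, beta1=0.9, beta2=0.999, epsilon=1e-6):
    """Illustrative Adam step; k is the 1-based iteration index."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate m_k
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate v_k
    m_hat = m / (1 - beta1 ** k)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** k)              # bias-corrected second moment
    w = w - learning_rate / (np.sqrt(v_hat) + epsilon) * m_hat
    return w, m, v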

Parameters


learning_rate : float, default=0.01
The learning rate \(\alpha\), a positive constant (usually not greater than 1.0) that controls the magnitude of steps taken in each iteration and, effectively, the learning speed. Note that a learning rate that is too high may lead to overshooting.
beta1 : float, default=0.9
The decaying factor \(\beta_1\) for the first-order momentum.
beta2 : float, default=0.999
The decaying factor \(\beta_2\) for the second-order momentum.
epsilon : float, default=1e-6
A smoothing term that avoids division by zero.

Attributes


learning_rate : float
The learning rate.
beta1 : float
The decaying factor for the first-order momentum.
beta2 : float
The decaying factor for the second-order momentum.
epsilon : float
A smoothing term.

References


Sebastian Ruder "An overview of gradient descent optimization algorithms", CoRR (2016) http://arxiv.org/abs/1609.04747

See Also

RMSprop
RMSprop optimization algorithm.
Nadam
Nadam optimization algorithm.

Examples

>>> from light_labyrinth.hyperparams.optimization import Adam
>>> from light_labyrinth.dim2 import LightLabyrinthClassifier
>>> model = LightLabyrinthClassifier(3, 3,
...                             optimizer=Adam(0.001))
Source code
class Adam(_OptimizerBase):
    """
    Adam (Adaptive Moment Estimation) optimizer class for learning Light Labyrinth models.

    In each iteration \\(k\\) of the learning process, the loss function's gradient \\(\\nabla\\xi(W_{k}, X, y)\\) is computed, and the model's weights \\(W_k\\) are updated.
    First, the first moment \\(m_k\\) and the second moment \\(v_k\\) of the gradient are computed
    \\[m_k = \\beta_1 m_{k-1} + (1-\\beta_1)\\nabla\\xi(W_{k}, X, y)\\]
    \\[v_k = \\beta_2 v_{k-1} + (1-\\beta_2)(\\nabla\\xi(W_{k}, X, y))^2,\\]
    and then the weights \\(W_{k}\\) are updated following the formulas:
    \\[\\hat{m_k} = \\frac{m_k}{1-\\beta_1^k}\\]
    \\[\\hat{v_k} = \\frac{v_k}{1-\\beta_2^k}\\]
    \\[W_{k+1} = W_k - \\frac{\\alpha}{\\sqrt{\\hat{v_k}}+\\epsilon}\\hat{m_k},\\]
    where \\(\\alpha > 0\\) is the learning rate and \\(\\beta_1, \\beta_2 \\in [0,1)\\) are decaying factors.
    The \\(\\epsilon>0\\) term ensures numerical stability and should not be too big. 

    Parameters
    ----------
    learning_rate : float, default=0.01
        The learning rate \\(\\alpha\\) -- a positive constant (usually not greater than 1.0) that controls the magnitude of steps taken in each iteration,
        and effectively the learning speed. Note that a learning rate that is too high may lead to overshooting.

    beta1 : float, default=0.9
        The decaying factor \\(\\beta_1\\) for the first-order momentum.

    beta2 : float, default=0.999
        The decaying factor \\(\\beta_2\\) for the second-order momentum.

    epsilon : float, default=1e-6
        A smoothing term that avoids division by zero.

    Attributes
    ----------
    learning_rate : float
        The learning rate.

    beta1 : float
        The decaying factor for the first-order momentum.

    beta2 : float
        The decaying factor for the second-order momentum.

    epsilon : float
        A smoothing term.

    References
    ----------
    Sebastian Ruder
        "An overview of gradient descent optimization algorithms", CoRR (2016)
        <http://arxiv.org/abs/1609.04747>

    See Also
    --------
    light_labyrinth.hyperparams.optimization.RMSprop : RMSprop optimization algorithm.
    light_labyrinth.hyperparams.optimization.Nadam : Nadam optimization algorithm.

    Examples
    --------
    >>> from light_labyrinth.hyperparams.optimization import Adam
    >>> from light_labyrinth.dim2 import LightLabyrinthClassifier
    >>> model = LightLabyrinthClassifier(3, 3,
    ...                             optimizer=Adam(0.001))
    """

    def __init__(self, learning_rate=0.01, beta1=0.9, beta2=0.999, epsilon=1e-6):
        super().__init__("Adam", [learning_rate, beta1, beta2, epsilon])
        self._learning_rate = learning_rate
        self._beta1 = beta1
        self._beta2 = beta2
        self._epsilon = epsilon

    @property
    def learning_rate(self):
        return self._learning_rate

    @property
    def beta1(self):
        return self._beta1

    @property
    def beta2(self):
        return self._beta2

    @property
    def epsilon(self):
        return self._epsilon

Ancestors

  • light_labyrinth.hyperparams.optimization._optimization._OptimizerBase

Instance variables

var beta1
The decaying factor \(\beta_1\) for the first-order momentum.
var beta2
The decaying factor \(\beta_2\) for the second-order momentum.
var epsilon
A smoothing term that avoids division by zero.
var learning_rate
The learning rate \(\alpha\).
class GradientDescent (learning_rate=0.01, momentum=0.0)

Gradient Descent optimizer class for learning Light Labyrinth models.

In each iteration \(k\) of the learning process, the loss function's gradient \(\nabla\xi(W_{k}, X, y)\) is computed, and the model's weights \(W_k\) are updated following the formulas:

\[\Delta W_{k} = \gamma \Delta W_{k-1} + \alpha \nabla\xi(W_{k}, X, y)\]
\[W_{k+1} = W_{k} - \Delta W_{k}\]

where \(\alpha\) is a positive constant called the learning rate and \(\gamma \in [0,1)\) is a momentum coefficient.
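
For illustration, here is a minimal sketch of a single gradient-descent step with momentum matching the formulas above; it is not the library's internal implementation, and gradient_descent_step is a made-up helper name.

def gradient_descent_step(w, grad, delta_prev, learning_rate=0.01, momentum=0.0):
    """Illustrative gradient-descent step with momentum (works on NumPy arrays or scalars)."""
    delta = momentum * delta_prev + learning_rate * grad  # Delta W_k
    w = w - delta                                         # W_{k+1} = W_k - Delta W_k
    return w, delta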

Parameters


learning_rate : float, default=0.01
The learning rate \(\alpha\), a positive constant (usually not greater than 1.0) that controls the magnitude of steps taken in each iteration and, effectively, the learning speed. Note that a learning rate that is too high may lead to overshooting.
momentum : float, default=0.0
The momentum coefficient \(\gamma \in [0,1)\).

Attributes


learning_rate : float
The learning rate.
momentum : float
The momentum coefficient.

References

Sebastian Ruder "An overview of gradient descent optimization algorithms", CoRR (2016) http://arxiv.org/abs/1609.04747

See Also

Adam
Adam optimization algorithm.
RMSprop
RMSprop optimization algorithm.

Examples

>>> from light_labyrinth.hyperparams.optimization import GradientDescent
>>> from light_labyrinth.dim2 import LightLabyrinthClassifier
>>> model = LightLabyrinthClassifier(3, 3,
...                             optimizer=GradientDescent(0.001, 0.9))
Source code
class GradientDescent(_OptimizerBase):
    """
    Gradient Descent optimizer class for learning Light Labyrinth models.

    In each iteration \\(k\\) of the learning process, the loss function's gradient \\(\\nabla\\xi(W_{k}, X, y)\\) is computed, and the model's weights \\(W_k\\) are updated following the formulas:
    \\[\\Delta W_{k} = \\gamma \\Delta W_{k-1} + \\alpha \\nabla\\xi(W_{k}, X, y)\\\\
    W_{k+1} = W_{k} - \\Delta W_{k},\\]
    where \\(\\alpha\\) is a positive constant called the learning rate and \\(\\gamma \\in [0,1)\\) is a momentum coefficient.


    Parameters
    ----------
    learning_rate : float, default=0.01
        The learning rate \\(\\alpha\\) -- a positive constant (usually not greater than 1.0) that controls the magnitude of steps taken in each iteration,
        and effectively the learning speed. Note that a learning rate that is too high may lead to overshooting.

    momentum : float, default=0.0
        The momentum coefficient \\(\\gamma \\in [0,1)\\).

    Attributes
    ----------
    learning_rate : float
        The learning rate.

    momentum : float
        The momentum coefficient.

    References
    ----------
    Sebastian Ruder
        "An overview of gradient descent optimization algorithms", CoRR (2016)
        <http://arxiv.org/abs/1609.04747>

    See Also
    --------
    light_labyrinth.hyperparams.optimization.Adam : Adam optimization algorithm.
    light_labyrinth.hyperparams.optimization.RMSprop : RMSprop optimization algorithm.

    Examples
    --------
    >>> from light_labyrinth.hyperparams.optimization import GradientDescent
    >>> from light_labyrinth.dim2 import LightLabyrinthClassifier
    >>> model = LightLabyrinthClassifier(3, 3,
    ...                             optimizer=GradientDescent(0.001, 0.9))
    """

    def __init__(self, learning_rate=0.01, momentum=0.0):
        super().__init__("Gradient_Descent", [learning_rate, momentum])
        self._learning_rate = learning_rate
        self._momentum = momentum

    @property
    def learning_rate(self):
        return self._learning_rate

    @property
    def momentum(self):
        return self._momentum

Ancestors

  • light_labyrinth.hyperparams.optimization._optimization._OptimizerBase

Instance variables

var learning_rate
The learning rate \(\alpha\).
var momentum
The momentum coefficient \(\gamma\).
class Nadam (learning_rate=0.01, beta1=0.9, beta2=0.999, epsilon=1e-06)

Nadam (Nesterov-accelerated Adaptive Moment Estimation) optimizer class for learning Light Labyrinth models.

A modified version of the Adam optimizer. At each iteration \(k\) the model's weights \(W_k\) are updated following the formula:

\[W_{k+1} = W_k - \frac{\alpha}{\sqrt{\hat{v_k}}+\epsilon}\Big(\beta_1\hat{m_k} + \frac{1-\beta_1}{1-\beta_1^k}\nabla\xi(W_{k}, X, y)\Big)\]

For further details see the Adam optimizer.
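
As an illustration, the sketch below applies the Nadam correction on top of Adam's moment estimates, following the formula above; it is not the library's internal implementation, and nadam_step is a made-up helper name.

import numpy as np

def nadam_step(w, grad, m, v, k, learning_rate=0.01, beta1=0.9, beta2=0.999, epsilon=1e-6):
    """Illustrative Nadam step; k is the 1-based iteration index."""
    m = beta1 * m + (1 - beta1) * grad        # Adam's first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # Adam's second-moment estimate
    m_hat = m / (1 - beta1 ** k)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** k)              # bias-corrected second moment
    nesterov_term = beta1 * m_hat + (1 - beta1) / (1 - beta1 ** k) * grad
    w = w - learning_rate / (np.sqrt(v_hat) + epsilon) * nesterov_term
    return w, m, v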

Parameters


learning_rate : float, default=0.01
The learning rate \(\alpha\), a positive constant (usually not greater than 1.0) that controls the magnitude of steps taken in each iteration and, effectively, the learning speed. Note that a learning rate that is too high may lead to overshooting.
beta1 : float, default=0.9
The decaying factor \(\beta_1\) for the first-order momentum.
beta2 : float, default=0.999
The decaying factor \(\beta_2\) for the second-order momentum.
epsilon : float, default=1e-6
A smoothing term that avoids division by zero.

Attributes


learning_rate : float
The learning rate.
beta1 : float
The decaying factor for the first-order momentum.
beta2 : float
The decaying factor for the second-order momentum.
epsilon : float
A smoothing term.

References


Sebastian Ruder "An overview of gradient descent optimization algorithms", CoRR (2016) http://arxiv.org/abs/1609.04747

See Also

Adam
Adam optimization algorithm.
RMSprop
RMSprop optimization algorithm.

Examples

>>> from light_labyrinth.hyperparams.optimization import Nadam
>>> from light_labyrinth.dim2 import LightLabyrinthClassifier
>>> model = LightLabyrinthClassifier(3, 3,
...                             optimizer=Nadam(0.001))
Source code
class Nadam(_OptimizerBase):
    """
    Nadam (Nesterov-accelerated Adaptive Moment Estimation) optimizer class for learning Light Labyrinth models.

    A modified version of the Adam optimizer. At each iteration \\(k\\) the model's weights \\(W_k\\) are updated following the formula:
    \\[W_{k+1} = W_k - \\frac{\\alpha}{\\sqrt{\\hat{v_k}}+\\epsilon}\\Big(\\beta_1\\hat{m_k} + \\frac{1-\\beta_1}{1-\\beta_1^k}\\nabla\\xi(W_{k}, X, y)\\Big)\\]
    For further details see `light_labyrinth.hyperparams.optimization.Adam` optimizer.

    Parameters
    ----------
    learning_rate : float, default=0.01
        The learning rate \\(\\alpha\\) -- a positive constant (usually not greater than 1.0) that controls the magnitude of steps taken in each iteration,
        and effectively the learning speed. Note that a learning rate that is too high may lead to overshooting.

    beta1 : float, default=0.9
        The decaying factor \\(\\beta_1\\) for the first-order momentum.

    beta2 : float, default=0.999
        The decaying factor \\(\\beta_2\\) for the second-order momentum.

    epsilon : float, default=1e-6
        A smoothing term that avoids division by zero.

    Attributes
    ----------
    learning_rate : float
        The learning rate.

    beta1 : float
        The decaying factor for the first-order momentum.

    beta2 : float
        The decaying factor for the second-order momentum.

    epsilon : float
        A smoothing term.

    References
    ----------
    Sebastian Ruder
        "An overview of gradient descent optimization algorithms", CoRR (2016)
        <http://arxiv.org/abs/1609.04747>

    See Also
    --------
    light_labyrinth.hyperparams.optimization.Adam : Adam optimization algorithm.
    light_labyrinth.hyperparams.optimization.RMSprop : RMSprop optimization algorithm.

    Examples
    --------
    >>> from light_labyrinth.hyperparams.optimization import Nadam
    >>> from light_labyrinth.dim2 import LightLabyrinthClassifier
    >>> model = LightLabyrinthClassifier(3, 3,
    ...                             optimizer=Nadam(0.001))
    """

    def __init__(self, learning_rate=0.01, beta1=0.9, beta2=0.999, epsilon=1e-6):
        super().__init__("Nadam", [learning_rate, beta1, beta2, epsilon])
        self._learning_rate = learning_rate
        self._beta1 = beta1
        self._beta2 = beta2
        self._epsilon = epsilon

    @property
    def learning_rate(self):
        return self._learning_rate

    @property
    def beta1(self):
        return self._beta1

    @property
    def beta2(self):
        return self._beta2

    @property
    def epsilon(self):
        return self._epsilon

Ancestors

  • light_labyrinth.hyperparams.optimization._optimization._OptimizerBase

Instance variables

var beta1
The decaying factor \(\beta_1\) for the first-order momentum.
var beta2
The decaying factor \(\beta_2\) for the second-order momentum.
var epsilon
A smoothing term that avoids division by zero.
var learning_rate
The learning rate \(\alpha\).
class RMSprop (learning_rate=0.01, rho=0.9, momentum=0.0, epsilon=1e-06)

RMSprop optimizer class for learning Light Labyrinth models.

In each iteration \(k\) of the learning process, the loss function's gradient \(\nabla\xi(W_{k}, X, y)\) is computed, and the model's weights \(W_k\) are updated following the formulas:

\[v_{k} = \rho v_{k-1} + (1 - \rho)(\nabla\xi(W_{k}, X, y))^2\]
\[\Delta W_{k} = \gamma \Delta W_{k-1} + \frac{\alpha}{\sqrt{v_{k} + \epsilon}}\nabla\xi(W_{k}, X, y)\]
\[W_{k+1} = W_{k} - \Delta W_{k}\]

where \(\alpha > 0\) is the learning rate, \(\rho \in (0,1)\) is the forgetting factor, and \(\gamma \in [0,1)\) is a momentum coefficient. The \(\epsilon > 0\) term ensures numerical stability and should not be too big.
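
For illustration, the sketch below performs one RMSprop step with momentum following the update above; it is not the library's internal implementation, and rmsprop_step is a made-up helper name.

import numpy as np

def rmsprop_step(w, grad, v, delta_prev, learning_rate=0.01, rho=0.9, momentum=0.0, epsilon=1e-6):
    """Illustrative RMSprop step with momentum."""
    v = rho * v + (1 - rho) * grad ** 2                               # running average of squared gradients v_k
    delta = momentum * delta_prev + learning_rate / np.sqrt(v + epsilon) * grad
    w = w - delta                                                     # W_{k+1} = W_k - Delta W_k
    return w, v, delta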

Parameters


learning_rate : float, default=0.01
The learning rate \(\alpha\), a positive constant (usually not greater than 1.0) that controls the magnitude of steps taken in each iteration and, effectively, the learning speed. Note that a learning rate that is too high may lead to overshooting.
rho : float, default=0.9
The forgetting factor \(\rho \in (0,1)\).
momentum : float, default=0.0
The momentum coefficient \(\gamma \in [0,1)\).
epsilon : float, default=1e-6
A smoothing term that avoids division by zero.

Attributes


learning_rate : float
The learning rate.
rho : float
The forgetting factor.
momentum : float
The momentum coefficient.
epsilon : float
A smoothing term.

References


Sebastian Ruder "An overview of gradient descent optimization algorithms", CoRR (2016) http://arxiv.org/abs/1609.04747

See Also

Adam
Adam optimization algorithm.
GradientDescent
GradientDescent optimization algorithm.

Examples

>>> from light_labyrinth.hyperparams.optimization import RMSprop
>>> from light_labyrinth.dim2 import LightLabyrinthClassifier
>>> model = LightLabyrinthClassifier(3, 3,
...                             optimizer=RMSprop(0.001))
Source code
class RMSprop(_OptimizerBase):
    """
    RMSprop optimizer class for learning Light Labyrinth models.

    In each iteration \\(k\\) of the learning process, the loss function's gradient \\(\\nabla\\xi(W_{k}, X, y)\\) is computed, and the model's weights \\(W_k\\) are updated following the formulas:
    \\[v_{k} = \\rho v_{k-1} + (1 - \\rho) (\\nabla\\xi(W_{k}, X, y))^2\\]
    \\[\\Delta W_{k} = \\gamma \\Delta W_{k-1} + \\frac{\\alpha}{\\sqrt{v_{k} + \\epsilon}}\\nabla\\xi(W_{k}, X, y)\\]
    \\[W_{k+1} = W_{k} - \\Delta W_{k}\\]
    where \\(\\alpha>0\\) is the learning rate, \\(\\rho \\in (0,1)\\) is the forgetting factor, and \\(\\gamma \\in [0,1)\\) is a momentum coefficient.
    The \\(\\epsilon>0\\) term ensures numerical stability and should not be too big. 


    Parameters
    ----------
    learning_rate : float, default=0.01
        The learning rate \\(\\alpha\\) -- a positive constant (usually not greater than 1.0) that controls the magnitude of steps taken in each iteration,
        and effectively the learning speed. Note that a learning rate that is too high may lead to overshooting.

    rho : float, default=0.9
        The forgetting factor \\(\\rho \\in (0,1)\\).

    momentum : float, default=0.0
        The momentum coefficient \\(\\gamma \\in [0,1)\\).

    epsilon : float, default=1e-6
        A smoothing term that avoids division by zero.

    Attributes
    ----------
    learning_rate : float
        The learning rate.

    rho : float
        The forgetting factor.

    momentum : float
        The momentum coefficient.

    epsilon : float
        A smoothing term.

    References
    ----------
    Sebastian Ruder
        "An overview of gradient descent optimization algorithms", CoRR (2016)
        <http://arxiv.org/abs/1609.04747>

    See Also
    --------
    light_labyrinth.hyperparams.optimization.Adam : Adam optimization algorithm.
    light_labyrinth.hyperparams.optimization.GradientDescent : GradientDescent optimization algorithm.

    Examples
    --------
    >>> from light_labyrinth.hyperparams.optimization import RMSprop
    >>> from light_labyrinth.dim2 import LightLabyrinthClassifier
    >>> model = LightLabyrinthClassifier(3, 3,
    ...                             optimizer=RMSprop(0.001))
    """

    def __init__(self, learning_rate=0.01, rho=0.9, momentum=0.0, epsilon=1e-6):
        super().__init__("RMSprop", [learning_rate, rho, momentum, epsilon])
        self._learning_rate = learning_rate
        self._rho = rho
        self._momentum = momentum
        self._epsilon = epsilon

    @property
    def learning_rate(self):
        return self._learning_rate

    @property
    def rho(self):
        return self._rho

    @property
    def momentum(self):
        return self._momentum

    @property
    def epsilon(self):
        return self._epsilon

Ancestors

  • light_labyrinth.hyperparams.optimization._optimization._OptimizerBase

Instance variables

var epsilon
A smoothing term that avoids division by zero.
var learning_rate
The learning rate \(\alpha\).
var momentum
The momentum coefficient \(\gamma\).
var rho
The forgetting factor \(\rho\).