Module light_labyrinth.hyperparams.optimization

The light_labyrinth.hyperparams.optimization module includes Optimizer classes with predefined optimization algorithms that can be used for training Light Labyrinth models.

Source code
"""
The `light_labyrinth.hyperparams.optimization` module includes `Optimizer` classes
with predefined optimization algorithms that can be used for training Light Labyrinth models. 
"""

from ._optimization import Adam, GradientDescent, Nadam, RMSprop

__all__ = ["Adam", "GradientDescent", "Nadam", "RMSprop"]
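
As a quick orientation (mirroring the per-class examples below), the snippet sketches how each predefined optimizer can be constructed and passed to a Light Labyrinth model; the model shape (3, 3) is purely illustrative.

>>> from light_labyrinth.hyperparams.optimization import Adam, GradientDescent, Nadam, RMSprop
>>> from light_labyrinth.dim2 import LightLabyrinthClassifier
>>> sgd = GradientDescent(learning_rate=0.01, momentum=0.9)
>>> rmsprop = RMSprop(learning_rate=0.001, rho=0.9)
>>> adam = Adam(learning_rate=0.001)
>>> nadam = Nadam(learning_rate=0.001)
>>> model = LightLabyrinthClassifier(3, 3, optimizer=adam)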

Classes

class Adam (learning_rate=0.01, beta1=0.9, beta2=0.999, epsilon=1e-06)

Adam (Adaptive Moment Estimation) optimizer class for learning Light Labyrinth models.

In each iteration \(k\) of the learning process, the loss function's gradient \(\nabla\xi(W_{k}, X, y)\) is computed, and the model's weights \(W_k\) are updated. First, the first moment \(m_k\) and the second moment \(v_k\) of the gradient are computed:

\[m_k = \beta_1 m_{k-1} + (1-\beta_1)\nabla\xi(W_{k}, X, y)\]
\[v_k = \beta_2 v_{k-1} + (1-\beta_2)(\nabla\xi(W_{k}, X, y))^2\]

Then the weights \(W_{k}\) are updated following the formulas:

\[\hat{m_k} = \frac{m_k}{1-\beta_1^k}\]
\[\hat{v_k} = \frac{v_k}{1-\beta_2^k}\]
\[W_{k+1} = W_k - \frac{\alpha}{\sqrt{\hat{v_k}}+\epsilon}\hat{m_k}\]

where \(\alpha > 0\) is the learning rate and \(\beta_1, \beta_2 \in [0,1)\) are decaying factors. The \(\epsilon > 0\) term ensures numerical stability and should not be too big.
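
To make the update rule concrete, below is a minimal NumPy sketch of a single Adam step following the formulas above; it is illustrative only, not the library's internal implementation, and the helper name adam_step is made up.

import numpy as np

def adam_step(w, grad, m, v, k, learning_rate=0.01, beta1=0.9, beta2=0.999, epsilon=1e-6):
    """Illustrative Adam step; k is the 1-based iteration index."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate m_k
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate v_k
    m_hat = m / (1 - beta1 ** k)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** k)              # bias-corrected second moment
    w = w - learning_rate / (np.sqrt(v_hat) + epsilon) * m_hat
    return w, m, v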

Parameters


learning_rate : float, default=0.01
The learning rate \(\alpha\), a positive constant (usually not greater than 1.0) that controls the magnitude of steps taken in each iteration and, effectively, the learning speed. Note that a learning rate that is too high may lead to overshooting.
beta1 : float, default=0.9
The decaying factor \(\beta_1\) for the first-order momentum.
beta2 : float, default=0.999
The decaying factor \(\beta_2\) for the second-order momentum.
epsilon : float, default=1e-6
A smoothing term that avoids division by zero.

Attributes


learning_rate : float
The learning rate.
beta1 : float
The decaying factor for the first-order momentum.
beta2 : float
The decaying factor for the second-order momentum.
epsilon : float
A smoothing term.

References


Sebastian Ruder "An overview of gradient descent optimization algorithms", CoRR (2016) http://arxiv.org/abs/1609.04747

See Also

RMSprop
RMSprop optimization algorithm.
Nadam
Nadam optimization algorithm.

Examples

>>> from light_labyrinth.hyperparams.optimization import Adam
>>> from light_labyrinth.dim2 import LightLabyrinthClassifier
>>> model = LightLabyrinthClassifier(3, 3,
...                             optimizer=Adam(0.001))
Source code
class Adam(_OptimizerBase):
    """
    Adam (Adaptive Moment Estimation) optimizer class for learning Light Labyrinth models.

    In each iteration \\(k\\) of the learning process, the loss function's gradient \\(\\nabla\\xi(W_{k}, X, y)\\) is computed, and the model's weights \\(W_k\\) are updated.
    First, the first moment \\(m_k\\) and the second moment \\(v_k\\) of the gradient are computed
    \\[m_k = \\beta_1 m_{k-1} + (1-\\beta_1)\\nabla\\xi(W_{k}, X, y)\\]
    \\[v_k = \\beta_2 v_{k-1} + (1-\\beta_2)(\\nabla\\xi(W_{k}, X, y))^2,\\]
    and then the weights \\(W_{k}\\) are updated following the formulas:
    \\[\\hat{m_k} = \\frac{m_k}{1-\\beta_1^k}\\]
    \\[\\hat{v_k} = \\frac{v_k}{1-\\beta_2^k}\\]
    \\[W_{k+1} = W_k - \\frac{\\alpha}{\\sqrt{\\hat{v_k}}+\\epsilon}\\hat{m_k},\\]
    where \\(\\alpha > 0\\) is the learning rate and \\(\\beta_1, \\beta_2 \\in [0,1)\\) are decaying factors.
    The \\(\\epsilon>0\\) term ensures numerical stability and should not be too big. 

    Parameters
    ----------
    learning_rate : float, default=0.01
        The learning rate \\(\\alpha\\) -- a positive constant (usually not greater than 1.0) that controls the magnitude of steps taken in each iteration,
        and effectively the learning speed. Note that a learning rate that is too high may lead to overshooting.

    beta1 : float, default=0.9
        The decaying factor \\(\\beta_1\\) for the first-order momentum.

    beta2 : float, default=0.999
        The decaying factor \\(\\beta_2\\) for the second-order momentum.

    epsilon : float, default=1e-6
        A smoothing term that avoids division by zero.

    Attributes
    ----------
    learning_rate : float
        The learning rate.

    beta1 : float
        The decaying factor for the first-order momentum.

    beta2 : float
        The decaying factor for the second-order momentum.

    epsilon : float
        A smoothing term.

    References
    ----------
    Sebastian Ruder
        "An overview of gradient descent optimization algorithms", CoRR (2016)
        <http://arxiv.org/abs/1609.04747>

    See Also
    --------
    light_labyrinth.hyperparams.optimization.RMSprop : RMSprop optimization algorithm.
    light_labyrinth.hyperparams.optimization.Nadam : Nadam optimization algorithm.

    Examples
    --------
    >>> from light_labyrinth.hyperparams.optimization import Adam
    >>> from light_labyrinth.dim2 import LightLabyrinthClassifier
    >>> model = LightLabyrinthClassifier(3, 3,
    ...                             optimizer=Adam(0.001))
    """

    def __init__(self, learning_rate=0.01, beta1=0.9, beta2=0.999, epsilon=1e-6):
        super().__init__("Adam", [learning_rate, beta1, beta2, epsilon])
        self._learning_rate = learning_rate
        self._beta1 = beta1
        self._beta2 = beta2
        self._epsilon = epsilon

    @property
    def learning_rate(self):
        return self._learning_rate

    @property
    def beta1(self):
        return self._beta1

    @property
    def beta2(self):
        return self._beta2

    @property
    def epsilon(self):
        return self._epsilon

Ancestors

  • light_labyrinth.hyperparams.optimization._optimization._OptimizerBase

Instance variables

var beta1
The decaying factor \(\beta_1\) for the first-order momentum.
var beta2
The decaying factor \(\beta_2\) for the second-order momentum.
var epsilon
A smoothing term that avoids division by zero.
var learning_rate
The learning rate \(\alpha\).
class GradientDescent (learning_rate=0.01, momentum=0.0)

Gradient Descent optimizer class for learning Light Labyrinth models.

In each iteration \(k\) of the learning process, the loss function's gradient \(\nabla\xi(W_{k}, X, y)\) is computed, and the model's weights \(W_k\) are updated following the formulas:

\[\Delta W_{k} = \gamma \Delta W_{k-1} + \alpha \nabla\xi(W_{k}, X, y)\]
\[W_{k+1} = W_{k} - \Delta W_{k}\]

where \(\alpha\) is a positive constant called the learning rate and \(\gamma \in [0,1)\) is a momentum coefficient.
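
For illustration, here is a minimal sketch of a single gradient-descent step with momentum matching the formulas above; it is not the library's internal implementation, and gradient_descent_step is a made-up helper name.

def gradient_descent_step(w, grad, delta_prev, learning_rate=0.01, momentum=0.0):
    """Illustrative gradient-descent step with momentum (works on NumPy arrays or scalars)."""
    delta = momentum * delta_prev + learning_rate * grad  # Delta W_k
    w = w - delta                                         # W_{k+1} = W_k - Delta W_k
    return w, delta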

Parameters


learning_rate : float, default=0.01
The learning rate \(\alpha\), a positive constant (usually not greater than 1.0) that controls the magnitude of steps taken in each iteration and, effectively, the learning speed. Note that a learning rate that is too high may lead to overshooting.
momentum : float, default=0.0
The momentum coefficient \(\gamma \in [0,1)\).

Attributes


learning_rate : float
The learning rate.
momentum : float
The momentum coefficient.

References

Sebastian Ruder "An overview of gradient descent optimization algorithms", CoRR (2016) http://arxiv.org/abs/1609.04747

See Also

Adam
Adam optimization algorithm.
RMSprop
RMSprop optimization algorithm.

Examples

>>> from light_labyrinth.hyperparams.optimization import GradientDescent
>>> from light_labyrinth.dim2 import LightLabyrinthClassifier
>>> model = LightLabyrinthClassifier(3, 3,
...                             optimizer=GradientDescent(0.001, 0.9))
Source code
class GradientDescent(_OptimizerBase):
    """
    Gradient Descent optimizer class for learning Light Labyrinth models.

    In each iteration \\(k\\) of the learning process, the loss function's gradient \\(\\nabla\\xi(W_{k}, X, y)\\) is computed, and the model's weights \\(W_k\\) are updated following the formulas:
    \\[\\Delta W_{k} = \\gamma \\Delta W_{k-1} + \\alpha \\nabla\\xi(W_{k}, X, y)\\\\
    W_{k+1} = W_{k} - \\Delta W_{k},\\]
    where \\(\\alpha\\) is a positive constant called the learning rate and \\(\\gamma \\in [0,1)\\) is a momentum coefficient.


    Parameters
    ----------
    learning_rate : float, default=0.01
        The learning rate \\(\\alpha\\) -- a positive constant (usually not greater than 1.0) that controls the magnitude of steps taken in each iteration,
        and effectively the learning speed. Note that a learning rate that is too high may lead to overshooting.

    momentum : float, default=0.0
        The momentum coefficient \\(\\gamma \\in [0,1)\\).

    Attributes
    ----------
    learning_rate : float
        The learning rate.

    momentum : float
        The momentum coefficient.

    References
    ----------
    Sebastian Ruder
        "An overview of gradient descent optimization algorithms", CoRR (2016)
        <http://arxiv.org/abs/1609.04747>

    See Also
    --------
    light_labyrinth.hyperparams.optimization.Adam : Adam optimization algorithm.
    light_labyrinth.hyperparams.optimization.RMSprop : RMSprop optimization algorithm.

    Examples
    --------
    >>> from light_labyrinth.hyperparams.optimization import GradientDescent
    >>> from light_labyrinth.dim2 import LightLabyrinthClassifier
    >>> model = LightLabyrinthClassifier(3, 3,
    ...                             optimizer=GradientDescent(0.001, 0.9))
    """

    def __init__(self, learning_rate=0.01, momentum=0.0):
        super().__init__("Gradient_Descent", [learning_rate, momentum])
        self._learning_rate = learning_rate
        self._momentum = momentum

    @property
    def learning_rate(self):
        return self._learning_rate

    @property
    def momentum(self):
        return self._momentum

Ancestors

  • light_labyrinth.hyperparams.optimization._optimization._OptimizerBase

Instance variables

var learning_rate
The learning rate \(\alpha\).
var momentum
The momentum coefficient \(\gamma\).
class Nadam (learning_rate=0.01, beta1=0.9, beta2=0.999, epsilon=1e-06)

Nadam (Nesterov-accelerated Adaptive Moment Estimation) optimizer class for learning Light Labyrinth models.

A modified version of the Adam optimizer. At each iteration \(k\) the model's weights \(W_k\) are updated following the formula:

\[W_{k+1} = W_k - \frac{\alpha}{\sqrt{\hat{v_k}}+\epsilon}\Big(\beta_1\hat{m_k} + \frac{1-\beta_1}{1-\beta_1^k}\nabla\xi(W_{k}, X, y)\Big)\]

For further details see the Adam optimizer.
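
As an illustration, the sketch below applies the Nadam correction on top of Adam's moment estimates, following the formula above; it is not the library's internal implementation, and nadam_step is a made-up helper name.

import numpy as np

def nadam_step(w, grad, m, v, k, learning_rate=0.01, beta1=0.9, beta2=0.999, epsilon=1e-6):
    """Illustrative Nadam step; k is the 1-based iteration index."""
    m = beta1 * m + (1 - beta1) * grad        # Adam's first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # Adam's second-moment estimate
    m_hat = m / (1 - beta1 ** k)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** k)              # bias-corrected second moment
    nesterov_term = beta1 * m_hat + (1 - beta1) / (1 - beta1 ** k) * grad
    w = w - learning_rate / (np.sqrt(v_hat) + epsilon) * nesterov_term
    return w, m, v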

Parameters


learning_rate : float, default=0.01
The learning rate \(\alpha\), a positive constant (usually not greater than 1.0) that controls the magnitude of steps taken in each iteration and, effectively, the learning speed. Note that a learning rate that is too high may lead to overshooting.
beta1 : float, default=0.9
The decaying factor \(\beta_1\) for the first-order momentum.
beta2 : float, default=0.999
The decaying factor \(\beta_2\) for the second-order momentum.
epsilon : float, default=1e-6
A smoothing term that avoids division by zero.

Attributes


learning_rate : float
The learning rate.
beta1 : float
The decaying factor for the first-order momentum.
beta2 : float
The decaying factor for the second-order momentum.
epsilon : float
A smoothing term.

References


Sebastian Ruder "An overview of gradient descent optimization algorithms", CoRR (2016) http://arxiv.org/abs/1609.04747

See Also

Adam
Adam optimization algorithm.
RMSprop
RMSprop optimization algorithm.

Examples

>>> from light_labyrinth.hyperparams.optimization import Nadam
>>> from light_labyrinth.dim2 import LightLabyrinthClassifier
>>> model = LightLabyrinthClassifier(3, 3,
...                             optimizer=Nadam(0.001))
Source code
class Nadam(_OptimizerBase):
    """
    Nadam (Nesterov-accelerated Adaptive Moment Estimation) optimizer class for learning Light Labyrinth models.

    A modified version of the Adam optimizer. At each iteration \\(k\\) the model's weights \\(W_k\\) are updated following the formula:
    \\[W_{k+1} = W_k - \\frac{\\alpha}{\\sqrt{\\hat{v_k}}+\\epsilon}\\Big(\\beta_1\\hat{m_k} + \\frac{1-\\beta_1}{1-\\beta_1^k}\\nabla\\xi(W_{k}, X, y)\\Big)\\]
    For further details see `light_labyrinth.hyperparams.optimization.Adam` optimizer.

    Parameters
    ----------
    learning_rate : float, default=0.01
        The learning rate \\(\\alpha\\) -- a positive constant (usually not greater than 1.0) that controls the magnitude of steps taken in each iteration,
        and effectively the learning speed. Note that a learning rate that is too high may lead to overshooting.

    beta1 : float, default=0.9
        The decaying factor \\(\\beta_1\\) for the first-order momentum.

    beta2 : float, default=0.999
        The decaying factor \\(\\beta_2\\) for the second-order momentum.

    epsilon : float, default=1e-6
        A smoothing term that avoids division by zero.

    Attributes
    ----------
    learning_rate : float
        The learning rate.

    beta1 : float
        The decaying factor for the first-order momentum.

    beta2 : float
        The decaying factor for the second-order momentum.

    epsilon : float
        A smoothing term.

    References
    ----------
    Sebastian Ruder
        "An overview of gradient descent optimization algorithms", CoRR (2016)
        <http://arxiv.org/abs/1609.04747>

    See Also
    --------
    light_labyrinth.hyperparams.optimization.Adam : Adam optimization algorithm.
    light_labyrinth.hyperparams.optimization.RMSprop : RMSprop optimization algorithm.

    Examples
    --------
    >>> from light_labyrinth.hyperparams.optimization import Nadam
    >>> from light_labyrinth.dim2 import LightLabyrinthClassifier
    >>> model = LightLabyrinthClassifier(3, 3,
    ...                             optimizer=Nadam(0.001))
    """

    def __init__(self, learning_rate=0.01, beta1=0.9, beta2=0.999, epsilon=1e-6):
        super().__init__("Nadam", [learning_rate, beta1, beta2, epsilon])
        self._learning_rate = learning_rate
        self._beta1 = beta1
        self._beta2 = beta2
        self._epsilon = epsilon

    @property
    def learning_rate(self):
        return self._learning_rate

    @property
    def beta1(self):
        return self._beta1

    @property
    def beta2(self):
        return self._beta2

    @property
    def epsilon(self):
        return self._epsilon

Ancestors

  • light_labyrinth.hyperparams.optimization._optimization._OptimizerBase

Instance variables

var beta1
The decaying factor \(\beta_1\) for the first-order momentum.
var beta2
The decaying factor \(\beta_2\) for the second-order momentum.
var epsilon
A smoothing term that avoids division by zero.
var learning_rate
The learning rate \(\alpha\).
class RMSprop (learning_rate=0.01, rho=0.9, momentum=0.0, epsilon=1e-06)

RMSprop optimizer class for learning Light Labyrinth models.

In each iteration \(k\) of the learning process, the loss function's gradient \(\nabla\xi(W_{k}, X, y)\) is computed, and the model's weights \(W_k\) are updated following the formulas:

\[v_{k} = \rho v_{k-1} + (1 - \rho)(\nabla\xi(W_{k}, X, y))^2\]
\[\Delta W_{k} = \gamma \Delta W_{k-1} + \frac{\alpha}{\sqrt{v_{k} + \epsilon}}\nabla\xi(W_{k}, X, y)\]
\[W_{k+1} = W_{k} - \Delta W_{k}\]

where \(\alpha > 0\) is the learning rate, \(\rho \in (0,1)\) is the forgetting factor, and \(\gamma \in [0,1)\) is a momentum coefficient. The \(\epsilon > 0\) term ensures numerical stability and should not be too big.
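
For illustration, the sketch below performs one RMSprop step with momentum following the update above; it is not the library's internal implementation, and rmsprop_step is a made-up helper name.

import numpy as np

def rmsprop_step(w, grad, v, delta_prev, learning_rate=0.01, rho=0.9, momentum=0.0, epsilon=1e-6):
    """Illustrative RMSprop step with momentum."""
    v = rho * v + (1 - rho) * grad ** 2                               # running average of squared gradients v_k
    delta = momentum * delta_prev + learning_rate / np.sqrt(v + epsilon) * grad
    w = w - delta                                                     # W_{k+1} = W_k - Delta W_k
    return w, v, delta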

Parameters


learning_rate : float, default=0.01
The learning rate \(\alpha\), a positive constant (usually not greater than 1.0) that controls the magnitude of steps taken in each iteration and, effectively, the learning speed. Note that a learning rate that is too high may lead to overshooting.
rho : float, default=0.9
The forgetting factor \(\rho \in (0,1)\).
momentum : float, default=0.0
The momentum coefficient \(\gamma \in [0,1)\).
epsilon : float, default=1e-6
A smoothing term that avoids division by zero.

Attributes


learning_rate : float
The learning rate.
rho : float
The forgetting factor.
momentum : float
The momentum coefficient.
epsilon : float
A smoothing term.

References


Sebastian Ruder "An overview of gradient descent optimization algorithms", CoRR (2016) http://arxiv.org/abs/1609.04747

See Also

Adam
Adam optimization algorithm.
GradientDescent
GradientDescent optimization algorithm.

Examples

>>> from light_labyrinth.hyperparams.optimization import RMSprop
>>> from light_labyrinth.dim2 import LightLabyrinthClassifier
>>> model = LightLabyrinthClassifier(3, 3,
...                             optimizer=RMSprop(0.001))
Source code
class RMSprop(_OptimizerBase):
    """
    RMSprop optimizer class for learning Light Labyrinth models.

    In each iteration \\(k\\) of the learning process, the loss function's gradient \\(\\nabla\\xi(W_{k}, X, y)\\) is computed, and the model's weights \\(W_k\\) are updated following the formulas:
    \\[v_{k} = \\rho v_{k-1} + (1 - \\rho) (\\nabla\\xi(W_{k}, X, y))^2\\]
    \\[\\Delta W_{k} = \\gamma \\Delta W_{k-1} + \\frac{\\alpha}{\\sqrt{v_{k} + \\epsilon}}\\nabla\\xi(W_{k}, X, y)\\]
    \\[W_{k+1} = W_{k} - \\Delta W_{k}\\]
    where \\(\\alpha>0\\) is the learning rate, \\(\\rho \\in (0,1)\\) is the forgetting factor, and \\(\\gamma \\in [0,1)\\) is a momentum coefficient.
    The \\(\\epsilon>0\\) term ensures numerical stability and should not be too big. 


    Parameters
    ----------
    learning_rate : float, default=0.01
        The learning rate \\(\\alpha\\) -- a positive constant (usually not greater than 1.0) that controls the magnitude of steps taken in each iteration,
        and effectively the learning speed. Note that a learning rate that is too high may lead to overshooting.

    rho : float, default=0.9
        The forgetting factor \\(\\rho \\in (0,1)\\).

    momentum : float, default=0.0
        The momentum coefficient \\(\\gamma \\in [0,1)\\).

    epsilon : float, default=1e-6
        A smoothing term that avoids division by zero.

    Attributes
    ----------
    learning_rate : float
        The learning rate.

    rho : float
        The forgetting factor.

    momentum : float
        The momentum coefficient.

    epsilon : float
        A smoothing term.

    References
    ----------
    Sebastian Ruder
        "An overview of gradient descent optimization algorithms", CoRR (2016)
        <http://arxiv.org/abs/1609.04747>

    See Also
    --------
    light_labyrinth.hyperparams.optimization.Adam : Adam optimization algorithm.
    light_labyrinth.hyperparams.optimization.GradientDescent : GradientDescent optimization algorithm.

    Examples
    --------
    >>> from light_labyrinth.hyperparams.optimization import RMSprop
    >>> from light_labyrinth.dim2 import LightLabyrinthClassifier
    >>> model = LightLabyrinthClassifier(3, 3,
    ...                             optimizer=RMSprop(0.001))
    """

    def __init__(self, learning_rate=0.01, rho=0.9, momentum=0.0, epsilon=1e-6):
        super().__init__("RMSprop", [learning_rate, rho, momentum, epsilon])
        self._learning_rate = learning_rate
        self._rho = rho
        self._momentum = momentum
        self._epsilon = epsilon

    @property
    def learning_rate(self):
        return self._learning_rate

    @property
    def rho(self):
        return self._rho

    @property
    def momentum(self):
        return self._momentum

    @property
    def epsilon(self):
        return self._epsilon

Ancestors

  • light_labyrinth.hyperparams.optimization._optimization._OptimizerBase

Instance variables

var epsilon
A smoothing term that avoids division by zero.
var learning_rate
The learning rate \(\alpha\).
var momentum
The momentum coefficient \(\gamma\).
var rho
The forgetting factor \(\rho\).