== Learning Rate Scheduling ==

'''Learning rate scheduling''' or '''learning rate decay''' is a strategy for adjusting the learning rate during training. The idea is to start with a relatively high learning rate to benefit from fast learning, then reduce it as training progresses so the model can converge.

There are several common strategies for learning rate scheduling, sketched in the example below:

* '''Step Decay''': The learning rate is reduced by a fixed factor after a fixed number of epochs.
* '''Exponential Decay''': The learning rate is reduced exponentially, often after each batch or epoch.
* '''1/t Decay''': The learning rate is reduced in proportion to the inverse of the epoch (or iteration) number.
* '''Warm-up and Cool-down periods''': The learning rate is first increased for a certain number of epochs (warm-up), then gradually decreased towards the end of training (cool-down).

The choice of learning rate and schedule can greatly impact the quality of the final model and is often determined through experimentation and [[Hyperparameter tuning]].
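As a minimal illustration of the schedules listed above, the following Python sketch expresses each one as a function of the epoch number. The function names and default constants (such as <code>drop</code>, <code>k</code>, and <code>warmup_epochs</code>) are illustrative assumptions and not part of any particular library; the cool-down here is shown as a cosine decay, which is one common choice.

<syntaxhighlight lang="python">
import math

def step_decay(lr0, epoch, drop=0.5, epochs_per_drop=10):
    # Reduce the learning rate by a fixed factor every `epochs_per_drop` epochs.
    return lr0 * (drop ** (epoch // epochs_per_drop))

def exponential_decay(lr0, epoch, k=0.05):
    # Multiply the learning rate by exp(-k) each epoch.
    return lr0 * math.exp(-k * epoch)

def inverse_time_decay(lr0, epoch, k=0.1):
    # 1/t decay: the learning rate shrinks as the inverse of the epoch number.
    return lr0 / (1.0 + k * epoch)

def warmup_cosine(lr0, epoch, total_epochs, warmup_epochs=5):
    # Linear warm-up for the first epochs, then a cosine cool-down towards zero.
    if epoch < warmup_epochs:
        return lr0 * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return lr0 * 0.5 * (1.0 + math.cos(math.pi * progress))
</syntaxhighlight>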