Hyperparameter Tuning
Overview
Tuning a model often requires exploring the impact of changes to many hyperparameters. The best way to approach this is generally not by changing the source code of the training script as we did above, but instead by defining flags for key parameters then training over the combinations of those flags to determine which combination of flags yields the best model.
Training Flags
Here’s a declaration of 2 flags that control dropout rate within a model:
flag_numeric("dropout1", 0.4), flag_numeric("dropout2", 0.3)These flags are then used in the definition of the model here:
model %>% layer_dense(units = 128, activation = 'relu', input_shape = c(784)) %>% layer_dropout(rate = FLAGS$dropout1) %>% layer_dense(units = 128, activation = 'relu') %>% layer_dropout(rate = FLAGS$dropout2) %>%Once we’ve defined flags, we can pass alternate flag values to training_run()
as follows:
You aren’t required to specify all of the flags (any flags excluded will simply use their default value).
Flags make it very straightforward to systematically explore the impact of changes to hyperparameters on model performance, for example:
Flag values are automatically included in run data with a “flag_” prefix (e.g. flag_dropout1
, flag_dropout2
).
See the article on training flags for additional documentation on using flags.
Tuning Runs
Above we demonstrated writing a loop to call training_run()
with various different flag values. A better way to accomplish this is the tuning_run()
function, which allows you to specify multiple values for each flag, and executes training runs for all combinations of the specified flags. For example:
Data frame: 9 x 28
run_dir eval_loss eval_acc metric_loss metric_acc metric_val_loss metric_val_acc
9 runs/2018-01-26T13-21-03Z 0.1002 0.9817 0.0346 0.9900 0.1086 0.9794
6 runs/2018-01-26T13-23-26Z 0.1133 0.9799 0.0409 0.9880 0.1236 0.9778
5 runs/2018-01-26T13-24-11Z 0.1056 0.9796 0.0613 0.9826 0.1119 0.9777
4 runs/2018-01-26T13-24-57Z 0.1098 0.9788 0.0868 0.9770 0.1071 0.9771
2 runs/2018-01-26T13-26-28Z 0.1185 0.9783 0.0688 0.9819 0.1150 0.9783
3 runs/2018-01-26T13-25-43Z 0.1238 0.9782 0.0431 0.9883 0.1246 0.9779
8 runs/2018-01-26T13-21-53Z 0.1064 0.9781 0.0539 0.9843 0.1086 0.9795
7 runs/2018-01-26T13-22-40Z 0.1043 0.9778 0.0796 0.9772 0.1094 0.9777
1 runs/2018-01-26T13-27-14Z 0.1330 0.9769 0.0957 0.9744 0.1304 0.9751
# ... with 21 more columns:
# flag_batch_size, flag_dropout1, flag_dropout2, samples, validation_samples, batch_size,
# epochs, epochs_completed, metrics, model, loss_function, optimizer, learning_rate, script,
# start, end, completed, output, source_code, context, type
Note that the tuning_run()
function returns a data frame containing a summary of all of the executed training runs.
Experiment Scopes
By default all runs go into the “runs” sub-directory of the current working directory. For various types of ad-hoc experimentation this works well, but in some cases for a tuning run you may want to create a separate directory scope.
You can do this by specifying the runs_dir
argument:
Data frame: 9 x 28
run_dir eval_acc eval_loss metric_loss metric_acc metric_val_loss metric_val_acc
9 dropout_tuning/2018-01-26T13-38-02Z 0.9803 0.0980 0.0324 0.9902 0.1096 0.9789
6 dropout_tuning/2018-01-26T13-40-40Z 0.9795 0.1243 0.0396 0.9885 0.1341 0.9784
2 dropout_tuning/2018-01-26T13-43-55Z 0.9791 0.1138 0.0725 0.9813 0.1205 0.9773
7 dropout_tuning/2018-01-26T13-39-49Z 0.9786 0.1027 0.0796 0.9778 0.1053 0.9761
3 dropout_tuning/2018-01-26T13-43-08Z 0.9784 0.1206 0.0479 0.9871 0.1246 0.9775
4 dropout_tuning/2018-01-26T13-42-21Z 0.9784 0.1026 0.0869 0.9766 0.1108 0.9769
5 dropout_tuning/2018-01-26T13-41-31Z 0.9783 0.1086 0.0589 0.9832 0.1216 0.9764
8 dropout_tuning/2018-01-26T13-38-57Z 0.9780 0.1007 0.0511 0.9855 0.1100 0.9771
1 dropout_tuning/2018-01-26T13-44-41Z 0.9770 0.1178 0.1017 0.9734 0.1244 0.9757
# ... with 21 more columns:
# flag_batch_size, flag_dropout1, flag_dropout2, samples, validation_samples, batch_size, epochs,
# epochs_completed, metrics, model, loss_function, optimizer, learning_rate, script, start, end,
# completed, output, source_code, context, type
Sampling Flag Combinations
If the number of flag combinations is very large, you can also specify that only a random sample of combinations should be tried using the sample
parmaeter. For example: