Details
Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Description
madlib_keras_fit_multiple can crash when the loss for a model becomes NaN (fit is probably affected as well, but I haven't tested it). I hit this with the following compile params:
$$loss='categorical_crossentropy',optimizer='SGD(lr=0.05, momentum=1.1)',metrics=['accuracy']$$
Clearly, this was not a great choice for the momentum hyperparameter, but Keras accepts it and trains all the way through with no errors or exceptions. The problem is that the loss stops being finite at some point (infinite, or undefined?) and is reported as nan. All 8 models trained for 10 hours and printed out their results, and then madlib_keras_fit_multiple crashed while trying to write out the final info table:
Training set after iteration 1:
mst_key=7: metric=0.446168005466, loss=2.39643478394
mst_key=12: metric=0.00999999977648, loss=nan
mst_key=11: metric=0.165068000555, loss=4.0407166481
...
Validation set after iteration 1:
mst_key=7: metric=0.359100013971, loss=2.89618015289
mst_key=12: metric=0.00999999977648, loss=nan
mst_key=11: metric=0.151299998164, loss=4.0829615593
...
CONTEXT: PL/Python function "madlib_keras_fit_multiple_model"
psql:run_fit_mult100.sql:14:
ERROR: spiexceptions.UndefinedColumn: column "nan" does not exist
LINE 4: training_loss_final = nan,
^
QUERY:
UPDATE places100_mult_model_444_july7_info SET
training_metrics_final = 0.00999999977648,
training_loss_final = nan,
metrics_elapsed_time = ARRAY[33260.02720808983],
training_metrics = ARRAY[0.009999999776482582],
training_loss = ARRAY[nan]
WHERE mst_key = 12
CONTEXT: Traceback (most recent call last):
PL/Python function "madlib_keras_fit_multiple_model", line 23, in <module>
fit_obj = madlib_keras_fit_multiple_model.FitMultipleModel(**globals())
PL/Python function "madlib_keras_fit_multiple_model", line 42, in wrapper
PL/Python function "madlib_keras_fit_multiple_model", line 195, in __init__
PL/Python function "madlib_keras_fit_multiple_model", line 543, in insert_info_table
PL/Python function "madlib_keras_fit_multiple_model", line 539, in update_info_table
PL/Python function "madlib_keras_fit_multiple_model"
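The failing UPDATE above suggests that the float was interpolated directly into the SQL text, so Python's string form of NaN (`nan`) reached PostgreSQL as a bare identifier and was parsed as a column name. PostgreSQL accepts NaN and the infinities for double precision only as quoted literals. The following is a minimal sketch of one way to render such values safely. It is not the fix that was actually committed, and `sql_float` and `sql_float_array` are hypothetical helper names introduced here for illustration.

```python
import math


def sql_float(value):
    """Render a Python float as a PostgreSQL double precision literal.

    Hypothetical helper, not part of MADlib. NaN and the infinities
    must be quoted, otherwise PostgreSQL parses them as identifiers.
    """
    if math.isnan(value):
        return "'NaN'::double precision"
    if math.isinf(value):
        if value > 0:
            return "'Infinity'::double precision"
        return "'-Infinity'::double precision"
    return repr(float(value))


def sql_float_array(values):
    """Render a list of floats as a PostgreSQL double precision[] literal."""
    items = ", ".join(sql_float(v) for v in values)
    return "ARRAY[{}]::double precision[]".format(items)


loss = float("nan")

# Naive interpolation reproduces the text seen in the failing query.
naive = "training_loss_final = {}".format(loss)

# Quoted rendering that PostgreSQL can parse as a float value.
safe = "training_loss_final = {}".format(sql_float(loss))
safe_array = "training_loss = {}".format(sql_float_array([loss]))

print(naive)
print(safe)
print(safe_array)
```

An alternative that avoids string formatting altogether is to pass the values as parameters through `plpy.prepare` and `plpy.execute`, letting PL/Python convert the floats.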