Description
After https://reviews.apache.org/r/38288 we are unable to upgrade the scheduler in one of our clusters due to the following failure on restart:
### Cause: org.h2.jdbc.JdbcSQLException: Numeric value out of range: "3174400000031744"; SQL statement: INSERT INTO task_configs ( job_key_id, creator_user, service, num_cpus, ram_mb, disk_mb, priority, max_task_failures, production, contact_email, executor_name, executor_data, tier ) VALUES ( ( SELECT ID FROM job_keys WHERE role = ? AND environment = ? AND name = ? ), ?, ?, ?, ?, ?, ?, ?, ?, ?,
This appears to be due to a type mismatch between TaskConfig.diskMb (i64) and task_configs.disk_mb (INT).
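To illustrate the mismatch, here is a minimal sketch, assuming the value in the error message is the diskMb of the offending TaskConfig. The class and method names are hypothetical and not part of Aurora. It only shows that the reported value is representable as a 64-bit long but not as a 32-bit signed INT, which is the range an H2 INT column accepts.

```java
final class DiskMbRangeCheck {
  // Value copied from the "Numeric value out of range" error above.
  static final long REPORTED_DISK_MB = 3174400000031744L;

  /** Returns true if a 64-bit value fits into a 32-bit signed SQL INT column. */
  static boolean fitsInSqlInt(long value) {
    return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
  }

  public static void main(String[] args) {
    // The reported value is far beyond Integer.MAX_VALUE (2147483647).
    System.out.println(fitsInSqlInt(REPORTED_DISK_MB)); // prints false
    System.out.println(fitsInSqlInt(1024L));            // prints true

    try {
      // A checked narrowing conversion fails instead of silently truncating.
      Math.toIntExact(REPORTED_DISK_MB);
    } catch (ArithmeticException e) {
      System.out.println("narrowing failed: " + e.getMessage());
    }
  }
}
```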
A possible real-life scenario:
- a user creates a job with an oversized resource requirement and the job fails to schedule
- the user realizes the mistake and attempts to correct it by running aurora update start
- the scheduler creates an instance of JobUpdate with the oversized TaskConfig as its initial state and persists it in the log
- the scheduler restarts on a new version (with the patch above) and attempts to reload job updates from the log; instead of storing TaskConfigs as binary blobs, it now attempts to insert them into the task_configs table, where the resource columns have a narrower type.
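One way to spot affected entries before an upgrade would be to scan the configs recovered from the log for values that cannot be narrowed. The following is a rough sketch only: `ReplayPrecheck`, `StoredConfig`, and `findOversized` are hypothetical names used for illustration and do not correspond to Aurora classes or to how the scheduler actually replays its log.

```java
import java.util.ArrayList;
import java.util.List;

final class ReplayPrecheck {
  /** Hypothetical stand-in for a persisted TaskConfig; only diskMb (i64) is modeled. */
  static final class StoredConfig {
    final String jobKey;
    final long diskMb;

    StoredConfig(String jobKey, long diskMb) {
      this.jobKey = jobKey;
      this.diskMb = diskMb;
    }
  }

  /** Returns the job keys whose diskMb would not fit into an INT column. */
  static List<String> findOversized(List<StoredConfig> configs) {
    List<String> offenders = new ArrayList<>();
    for (StoredConfig config : configs) {
      if (config.diskMb > Integer.MAX_VALUE || config.diskMb < Integer.MIN_VALUE) {
        offenders.add(config.jobKey);
      }
    }
    return offenders;
  }
}
```

Any job key returned by such a scan would correspond to a stored config that triggers the insert failure described above once the relational schema is in use.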
Issue Links
- blocks AURORA-1495 Consider changing TaskConfig resource types to reflect reasonable values (Resolved)