HIVE-19155

Daylight saving time causes Druid inserts to fail with org.apache.hive.druid.io.druid.java.util.common.UOE: Cannot add overlapping segments

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1.0, 3.0.0
    • Component/s: Druid integration
    • Labels: None

    Description

      If you insert data with timestamps around a daylight saving time transition, the query fails with the following exception:

      2018-04-10T11:24:58,836 ERROR [065fdaa2-85f9-4e49-adaf-3dc14d51be90 main] exec.DDLTask: Failed
      org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hive.druid.io.druid.java.util.common.UOE: Cannot add overlapping segments [2015-03-08T05:00:00.000Z/2015-03-09T05:00:00.000Z and 2015-03-09T04:00:00.000Z/2015-03-10T04:00:00.000Z] with the same version [2018-04-10T11:24:48.388-07:00]
              at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:914) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:919) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4831) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:394) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2443) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2114) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1797) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1538) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1532) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:204) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
              at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239) [hive-cli-3.1.0-SNAPSHOT.jar:?]
              at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188) [hive-cli-3.1.0-SNAPSHOT.jar:?]
              at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402) [hive-cli-3.1.0-SNAPSHOT.jar:?]
              at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335) [hive-cli-3.1.0-SNAPSHOT.jar:?]
              at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1455) [hive-it-util-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1429) [hive-it-util-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
              at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:177) [hive-it-util-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
              at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104) [hive-it-util-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
              at org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver(TestMiniDruidCliDriver.java:59) [test-classes/:?]
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_92]
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_92]
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_92]
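
The two intervals in the exception overlap by one hour (2015-03-09T04:00Z to 2015-03-09T05:00Z). The sketch below shows one way such intervals could arise: each DAY bucket starts at local midnight and is given a fixed 24-hour length. The zone America/New_York is an assumption, chosen only because its offsets (UTC-5 before and UTC-4 after the 2015-03-08 change) match the boundaries in the message; the class and method names are hypothetical and are not taken from Hive or Druid.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

public class DstOverlapDemo {
    // Assumed zone; the issue does not say which zone the failing run used.
    static final ZoneId ZONE = ZoneId.of("America/New_York");

    /** Start of the DAY bucket: local midnight converted to a UTC instant. */
    static Instant bucketStart(LocalDate day) {
        return day.atStartOfDay(ZONE).toInstant();
    }

    /** End of the bucket when a fixed 24-hour duration is added to the start. */
    static Instant bucketEnd(LocalDate day) {
        return bucketStart(day).plus(Duration.ofHours(24));
    }

    /** Half-open intervals [s1, e1) and [s2, e2) overlap if each starts before the other ends. */
    static boolean overlaps(Instant s1, Instant e1, Instant s2, Instant e2) {
        return s1.isBefore(e2) && s2.isBefore(e1);
    }

    /** Renders a bucket as start/end, similar to the interval notation in the exception. */
    static String describe(LocalDate day) {
        return bucketStart(day) + "/" + bucketEnd(day);
    }

    public static void main(String[] args) {
        LocalDate d1 = LocalDate.of(2015, 3, 8); // the local day on which clocks move forward
        LocalDate d2 = LocalDate.of(2015, 3, 9);
        System.out.println(describe(d1));
        System.out.println(describe(d2));
        System.out.println("overlap: "
            + overlaps(bucketStart(d1), bucketEnd(d1), bucketStart(d2), bucketEnd(d2)));
    }
}
```

The local day of 2015-03-08 is only 23 hours long in this zone, so a 24-hour bucket that starts at its midnight runs one hour into the bucket of 2015-03-09.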
      

      You can reproduce this with the following statements:

      create database druid_test;
      use druid_test;
      
      create table test_table(`timecolumn` timestamp, `userid` string, `num_l` float);
      
      insert into test_table values ('2015-03-08 00:00:00', 'i1-start', 4);
      insert into test_table values ('2015-03-08 23:59:59', 'i1-end', 1);
      
      insert into test_table values ('2015-03-09 00:00:00', 'i2-start', 4);
      insert into test_table values ('2015-03-09 23:59:59', 'i2-end', 1);
      
      insert into test_table values ('2015-03-10 00:00:00', 'i3-start', 2);
      insert into test_table values ('2015-03-10 23:59:59', 'i3-end', 2);
      
      CREATE TABLE druid_table
      STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
      TBLPROPERTIES ("druid.segment.granularity" = "DAY")
      AS
      select cast(`timecolumn` as timestamp with local time zone) as `__time`, `userid`, `num_l` FROM test_table;
      

      The fix is to always adjust the Druid segment identifiers to UTC.
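
As a rough illustration of why UTC alignment avoids the problem, the sketch below buckets instants into UTC days. UTC has no daylight saving transitions, so every day bucket is exactly 24 hours long and adjacent buckets share a boundary without overlapping. This is not the content of HIVE-19155.patch; the class and method names are hypothetical and the sample instants are illustrative.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class UtcDayBuckets {
    /** Start of the UTC day that contains the given instant. */
    static Instant start(Instant t) {
        return t.truncatedTo(ChronoUnit.DAYS);
    }

    /** End (exclusive) of that UTC day: exactly 24 hours after the start. */
    static Instant end(Instant t) {
        return start(t).plus(1, ChronoUnit.DAYS);
    }

    /** Renders the bucket as start/end. */
    static String interval(Instant t) {
        return start(t) + "/" + end(t);
    }

    public static void main(String[] args) {
        // Illustrative instants on either side of the 2015-03-08 transition.
        Instant a = Instant.parse("2015-03-08T05:00:00Z");
        Instant b = Instant.parse("2015-03-09T04:00:00Z");
        System.out.println(interval(a));
        System.out.println(interval(b));
        // The first bucket ends exactly where the second begins.
        System.out.println("contiguous: " + end(a).equals(start(b)));
    }
}
```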

      Attachments

        1. HIVE-19155.patch
          35 kB
          Slim Bouguerra

          People

            Assignee: Slim Bouguerra (bslim)
            Reporter: Slim Bouguerra (bslim)