Hive
  1. Hive
  2. HIVE-5700

enforce single date format for partition column storage

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      "inspired" by HIVE-5286.
      Partition column for dates should be stored as either integer, or as fixed representation e.g. yyyy-mm-dd. External representation can remain varied as is.

      1. HIVE-5700.patch
        13 kB
        Sergey Shelukhin
      2. HIVE-5700.02.patch
        18 kB
        Sergey Shelukhin
      3. HIVE-5700.01.patch
        15 kB
        Sergey Shelukhin

        Issue Links

          Activity

          Hide
          Sergey Shelukhin added a comment -

          preliminary patch. There's one unresolved comment for myself, I haven't fully groked the insert processing yet, need to avoid normalizing if partition with "wrong" date format already exists for insert overwrite. But insert code path is rather convoluted. Ashutosh Chauhan can you take a look? This approach may allow us to use string compares for dates and not depend on data stores converting arbitrary strings accepted by java.
          Otherwise I don't think date pushdown into SQL is really viable/robust enough.

          Show
          Sergey Shelukhin added a comment - preliminary patch. There's one unresolved comment for myself, I haven't fully groked the insert processing yet, need to avoid normalizing if partition with "wrong" date format already exists for insert overwrite. But insert code path is rather convoluted. Ashutosh Chauhan can you take a look? This approach may allow us to use string compares for dates and not depend on data stores converting arbitrary strings accepted by java. Otherwise I don't think date pushdown into SQL is really viable/robust enough.
          Hide
          Sergey Shelukhin added a comment -

          Changed the normalization to always happen, since path should be separate from values in tables, so it should be ok for existing partitions. Added upgrade scripts. Still need to test Oracle script somehow, and maybe write a test

          Show
          Sergey Shelukhin added a comment - Changed the normalization to always happen, since path should be separate from values in tables, so it should be ok for existing partitions. Added upgrade scripts. Still need to test Oracle script somehow, and maybe write a test
          Hide
          Hive QA added a comment -

          Overall: +1 all checks pass

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12612776/HIVE-5700.01.patch

          SUCCESS: +1 4594 tests passed

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/205/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/205/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          

          This message is automatically generated.

          ATTACHMENT ID: 12612776

          Show
          Hive QA added a comment - Overall : +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12612776/HIVE-5700.01.patch SUCCESS: +1 4594 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/205/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/205/console Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase This message is automatically generated. ATTACHMENT ID: 12612776
          Hide
          Ashutosh Chauhan added a comment -

          Can you create a RB entry for this ?

          Show
          Ashutosh Chauhan added a comment - Can you create a RB entry for this ?
          Show
          Sergey Shelukhin added a comment - https://reviews.apache.org/r/15359/
          Hide
          Ashutosh Chauhan added a comment -

          You also need to add script for derby. Also, if you can test your Oracle script, that will be good. Also, a -ve test case which rejects date like 2013-1-1 as partitioning column will be good to include.

          Show
          Ashutosh Chauhan added a comment - You also need to add script for derby. Also, if you can test your Oracle script, that will be good. Also, a -ve test case which rejects date like 2013-1-1 as partitioning column will be good to include.
          Hide
          Sergey Shelukhin added a comment -

          I was hoping to avoid writing Derby upgrade script... it;s not DB structure - do people really need to upgrade derby? Hmm.

          As for negative tests, the problem is that date validation on JDK6 is going to kick in first and reject the date literal before this code executes... the only way to allow it is to run JDK7, for example. Let me think about more isolated test

          Show
          Sergey Shelukhin added a comment - I was hoping to avoid writing Derby upgrade script... it;s not DB structure - do people really need to upgrade derby? Hmm. As for negative tests, the problem is that date validation on JDK6 is going to kick in first and reject the date literal before this code executes... the only way to allow it is to run JDK7, for example. Let me think about more isolated test
          Hide
          Sergey Shelukhin added a comment -

          fixed Oracle script, added simple unit test

          Show
          Sergey Shelukhin added a comment - fixed Oracle script, added simple unit test
          Hide
          Hive QA added a comment -

          Overall: +1 all checks pass

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12612963/HIVE-5700.02.patch

          SUCCESS: +1 4599 tests passed

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/227/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/227/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          

          This message is automatically generated.

          ATTACHMENT ID: 12612963

          Show
          Hive QA added a comment - Overall : +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12612963/HIVE-5700.02.patch SUCCESS: +1 4599 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/227/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/227/console Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase This message is automatically generated. ATTACHMENT ID: 12612963
          Hide
          Ashutosh Chauhan added a comment -

          +1

          Show
          Ashutosh Chauhan added a comment - +1
          Hide
          Ashutosh Chauhan added a comment -

          Committed to trunk. Thanks, Sergey!

          Show
          Ashutosh Chauhan added a comment - Committed to trunk. Thanks, Sergey!
          Hide
          Brock Noland added a comment -

          There is a similar problem to HIVE-8730 on postgres. Sergey Shelukhin - I noticed that this change does not update PARTITION.PART_NAME. It seems that should be done as well?

          Show
          Brock Noland added a comment - There is a similar problem to HIVE-8730 on postgres. Sergey Shelukhin - I noticed that this change does not update PARTITION.PART_NAME . It seems that should be done as well?
          Hide
          Brock Noland added a comment -

          Sergey Shelukhin - in my setup, I have a pre-upgrade table with date partitions in the form of 20141218. Sure enough after upgrading PARTITION.PART_NAME is not updated and as such I cannot do operations like modify the partitions location. If I manually fix PARTITION.PART_NAME I am able to use the partition.

          Furthermore we get even more interesting results on postgres. I am setting up an environment to reproduce this and will share additional details tomorrow.

          Show
          Brock Noland added a comment - Sergey Shelukhin - in my setup, I have a pre-upgrade table with date partitions in the form of 20141218 . Sure enough after upgrading PARTITION.PART_NAME is not updated and as such I cannot do operations like modify the partitions location. If I manually fix PARTITION.PART_NAME I am able to use the partition. Furthermore we get even more interesting results on postgres. I am setting up an environment to reproduce this and will share additional details tomorrow.
          Hide
          Brock Noland added a comment -

          Sergey Shelukhin Ashutosh Chauhan - please see HIVE-9445 where I plan to revert this commit.

          Show
          Brock Noland added a comment - Sergey Shelukhin Ashutosh Chauhan - please see HIVE-9445 where I plan to revert this commit.

            People

            • Assignee:
              Sergey Shelukhin
              Reporter:
              Sergey Shelukhin
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development