Hive / HIVE-8197

Tez and Vectorization Insert into ORC Table with timestamp column erroneously repeats the last row's column value


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Cannot Reproduce
    • Environment: Tez and Vectorization.

    Description

      While diagnosing why only a Tez and vectorized query with min and max aggregates was always returning the column value of the last row read, I discovered that the problem was in creating the test table:

      CREATE TABLE alltypesorc_string STORED AS ORC AS SELECT
        ctinyint as ctinyint,
        to_utc_timestamp(ctimestamp1, 'America/Los_Angeles') as ctimestamp1,
        CAST(to_utc_timestamp(ctimestamp1, 'America/Los_Angeles') AS STRING) as stimestamp1
      FROM alltypesorc WHERE ctinyint > 0
      LIMIT 40;
      

      I think it is related to what Prasanth mentioned as a possibility: a Timestamp saved as a Writable object that later gets overwritten. One suspect is the Writable[] records array in the ProcessOp method of VectorFileSinkOperator. Or perhaps it is in VectorReduceSinkOperator.
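
To make the suspected failure mode concrete, here is a minimal sketch of the general "reused mutable object" hazard. It is not Hive code: MutableTimestamp is a hypothetical stand-in for a mutable Writable such as TimestampWritable, and WritableReuseSketch and its method names are invented for illustration. It only shows how buffering a reference to an object that is overwritten on every row makes each buffered entry report the last row's value, and how copying avoids that. Whether this is what actually happens in VectorFileSinkOperator or VectorReduceSinkOperator is not established here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a mutable Hadoop Writable (e.g. TimestampWritable).
// This is NOT the Hive class; it only models "one object, overwritten per row".
final class MutableTimestamp {
    private long millis;

    void set(long millis) { this.millis = millis; }

    long get() { return millis; }

    // Returns an independent object holding the current value.
    MutableTimestamp copy() {
        MutableTimestamp c = new MutableTimestamp();
        c.set(millis);
        return c;
    }
}

class WritableReuseSketch {

    // Suspected buggy pattern: one object is reused for every row and the
    // buffer stores the reference, so all entries alias the same object.
    static long[] bufferByReference(long[] rows) {
        MutableTimestamp reused = new MutableTimestamp();
        List<MutableTimestamp> buffered = new ArrayList<>();
        for (long row : rows) {
            reused.set(row);      // overwrites what earlier entries point to
            buffered.add(reused); // same reference added each time
        }
        return toMillis(buffered);
    }

    // Safe pattern: snapshot the value before the object is overwritten.
    static long[] bufferByCopy(long[] rows) {
        MutableTimestamp reused = new MutableTimestamp();
        List<MutableTimestamp> buffered = new ArrayList<>();
        for (long row : rows) {
            reused.set(row);
            buffered.add(reused.copy()); // independent object per row
        }
        return toMillis(buffered);
    }

    private static long[] toMillis(List<MutableTimestamp> values) {
        long[] out = new long[values.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = values.get(i).get();
        }
        return out;
    }

    public static void main(String[] args) {
        long[] rows = {1000L, 2000L, 3000L};
        System.out.println(Arrays.toString(bufferByReference(rows))); // [3000, 3000, 3000]
        System.out.println(Arrays.toString(bufferByCopy(rows)));      // [1000, 2000, 3000]
    }
}
```

The first call reproduces the symptom described above (every row shows the last row's value); the second shows that a per-row copy preserves each value.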

            People

              Assignee: Matt McCline (mmccline)
              Reporter: Matt McCline (mmccline)
              Votes: 0
              Watchers: 2
