  1. Hive
  2. HIVE-4160 Vectorized Query Execution in Hive
  3. HIVE-4758

NULLs and record separators broken with vectorization branch intermediate outputs

    Details

    • Release Note:
      Fix the NULL serialization and record separator insertion for VectorizedRowBatch.toString()

      Description

      Queries on timestamp columns of partitioned tables return NULL for all rows of a timestamp column if the first row in that column is NULL.

      This was tracked down to the timestamp columns failing to parse the map output properly, because the format of that output differs from what the unvectorized code writes.

      The output file written by the vectorized code contains

      (null)^A
      2013-02-12 21:05:29^A
      

      whereas the unvectorized code outputs

      \N
      2013-02-12 21:05:29
      

      The vectorized code passes the "(null)" string on to the LazyTimestamp parser, which fails to parse it and returns NULL, but the IllegalArgumentException raised along the way slows the query down massively.
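As a rough illustration of why the "(null)" string is costly, the sketch below uses java.sql.Timestamp.valueOf as a stand-in parser. Whether LazyTimestamp delegates to that call is an assumption here, and the class and method names are hypothetical. It contrasts the exception path taken for "(null)" with an up-front check for the \N marker that the unvectorized code writes.

```java
import java.sql.Timestamp;

// Hypothetical sketch, not Hive code: java.sql.Timestamp.valueOf stands in for the real parser.
final class NullMarkerParseSketch {

    // Parses a text field, mapping unparseable input to null.
    static Timestamp parseOrNull(String field) {
        try {
            return Timestamp.valueOf(field);
        } catch (IllegalArgumentException e) {
            // Reached for "(null)": an exception is created and unwound for every such value.
            return null;
        }
    }

    // Recognises the \N null marker before attempting to parse, avoiding the exception path.
    static Timestamp parseWithNullCheck(String field) {
        if ("\\N".equals(field)) {
            return null;
        }
        return parseOrNull(field);
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull("(null)"));                      // null, via the exception path
        System.out.println(parseWithNullCheck("\\N"));                  // null, no exception thrown
        System.out.println(parseWithNullCheck("2013-02-12 21:05:29"));  // a valid timestamp
    }
}
```

Both paths yield a NULL value for the row, so the difference only shows up as per-row overhead, which is consistent with the slowdown described above.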

      The extraneous trailing ^A also prevents the actual timestamp values from being parsed as valid timestamps.
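The expected text format can be sketched as follows. This is a minimal, hypothetical serializer and not the code from the attached patches. It assumes ^A is the byte 0x01 (Hive's default field delimiter) and that NULL is written as \N, as in the unvectorized output shown above. The separator is emitted only between fields, never after the last one.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the row text format; names are illustrative only.
final class RowTextSketch {
    static final char FIELD_SEPARATOR = '\001'; // ^A
    static final String NULL_MARKER = "\\N";    // the two characters \ and N

    // Joins the fields of one row: separator between fields only, \N for null values.
    static String serializeRow(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(FIELD_SEPARATOR);
            }
            String value = fields.get(i);
            sb.append(value == null ? NULL_MARKER : value);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Single-column rows, as in the timestamp example: no trailing ^A is written.
        System.out.println(serializeRow(Collections.singletonList((String) null)));
        System.out.println(serializeRow(Arrays.asList("2013-02-12 21:05:29")));
    }
}
```

With a single timestamp column, the two rows come out as \N and 2013-02-12 21:05:29, matching the unvectorized output.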

        Attachments

        1. HIVE-4758-002.patch
          3 kB
          Gopal Vijayaraghavan
        2. HIVE-4758-001.patch
          3 kB
          Gopal Vijayaraghavan

            People

            • Assignee:
              gopalv Gopal Vijayaraghavan
              Reporter:
              gopalv Gopal Vijayaraghavan