Hive / HIVE-21328

Call To Hadoop Text getBytes() Without Call to getLength()


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 4.0.0, 3.2.0
    • Fix Version/s: None
    • Component/s: Query Planning
    • Labels:
      None

      Description

      I'm not sure if there is actually a bug, but this looks highly suspect:

        public Object set(final Object o, final Text text) {
          return new BytesWritable(text == null ? null : text.getBytes());
        }
      

      https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetStringInspector.java#L104-L106

      A Text object has two independent components: an internal byte array and a length. Only the first getLength() bytes of the array returned by getBytes() are valid; anything beyond that may be left over from earlier values. For example, a quick "reset" via Text.clear() simply sets the internal length counter to zero without touching the array. Because this code passes the whole array from getBytes() to BytesWritable without considering getLength(), it can expose obsolete data that it should not be seeing.
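      To make the concern concrete, here is a self-contained Java sketch that runs without Hadoop on the classpath. ReusableText is a hypothetical stand-in, not org.apache.hadoop.io.Text: it assumes the same layout (a backing array reused across set() calls, plus a separate length, with clear() resetting only the length). convertIgnoringLength() mirrors the suspect pattern, while convertRespectingLength() copies only the valid prefix.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Illustrates the getBytes()/getLength() pitfall.
 * ReusableText is a HYPOTHETICAL stand-in for Hadoop's Text: it keeps a
 * backing array plus a separate length, and clear() only resets the length.
 */
public class TextLengthDemo {

    static final class ReusableText {
        private byte[] bytes = new byte[0];
        private int length = 0;

        /** Copies the UTF-8 encoding of s, growing the backing array only when needed. */
        void set(String s) {
            byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
            if (bytes.length < encoded.length) {
                bytes = new byte[encoded.length];
            }
            System.arraycopy(encoded, 0, bytes, 0, encoded.length);
            length = encoded.length;
        }

        /** Resets the length only; the old bytes stay in the backing array. */
        void clear() {
            length = 0;
        }

        /** Returns the raw backing array; only the first getLength() bytes are valid. */
        byte[] getBytes() {
            return bytes;
        }

        int getLength() {
            return length;
        }
    }

    /** Mirrors the suspect pattern: hands out the whole backing array. */
    static byte[] convertIgnoringLength(ReusableText text) {
        return text.getBytes();
    }

    /** Copies only the valid prefix, as getLength() prescribes. */
    static byte[] convertRespectingLength(ReusableText text) {
        return Arrays.copyOf(text.getBytes(), text.getLength());
    }

    public static void main(String[] args) {
        ReusableText text = new ReusableText();
        text.set("hello world");
        text.set("bye"); // reuses the 11-byte array, which now holds "byelo world"

        System.out.println(new String(convertIgnoringLength(text), StandardCharsets.UTF_8));   // byelo world
        System.out.println(new String(convertRespectingLength(text), StandardCharsets.UTF_8)); // bye

        text.clear(); // length is 0, but the array still holds 11 bytes
        System.out.println(convertIgnoringLength(text).length);   // 11
        System.out.println(convertRespectingLength(text).length); // 0
    }
}
```

      The stale suffix "lo world" leaks through the first conversion. With the real Hadoop API, Text.copyBytes() returns a copy that is exactly getLength() bytes long, so new BytesWritable(text.copyBytes()) would be one candidate way to respect the length; that suggestion is not tested against Hive here.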


            People

            • Assignee:
              Unassigned
              Reporter:
              belugabehr David Mollitor
            • Votes:
              0
              Watchers:
              1
