Parquet / PARQUET-357

Parquet-thrift generates wrong schema for Thrift binary fields

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.0, 1.6.0, 1.7.0, 1.8.0
    • Fix Version/s: 1.10.0
    • Component/s: parquet-mr
    • Labels: None

    Description

    Thrift doesn't have a true BINARY type; BINARY is actually just an unencoded STRING. Quoting the Types section of the official Thrift documentation:

      binary: a sequence of unencoded bytes

      N.B.: This is currently a specialized form of the string type above, added to provide better interoperability with Java. The current plan-of-record is to elevate this to a base type at some point.

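    As a consequence, Thrift BINARY and STRING fields are both passed to parquet-thrift as STRING. Both are therefore always converted to Parquet BINARY with the UTF8 annotation, even though binary fields hold arbitrary bytes rather than UTF-8 text.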
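
    The following is a minimal sketch of how the type information is lost. It models a converter that only sees the Thrift wire type, where IDL `string` and `binary` share the value of `TType.STRING` (11 in the Java library). `LossyStringMapping`, `FieldDesc`, `wireTypeOf` and `toParquetField` are hypothetical names, not parquet-thrift APIs. The sketch also assumes the fields are declared `required`.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the lossy string/binary path. All names are hypothetical;
// this is not the parquet-thrift API.
public class LossyStringMapping {

    // Same value as org.apache.thrift.protocol.TType.STRING in the Java library.
    static final byte TTYPE_STRING = 11;

    // Hypothetical descriptor carrying only what Thrift exposes: name and wire type.
    static final class FieldDesc {
        final String name;
        final byte wireType;

        FieldDesc(String name, byte wireType) {
            this.name = name;
            this.wireType = wireType;
        }
    }

    // Both IDL types collapse to the same wire type, so the distinction is lost here.
    static byte wireTypeOf(String idlType) {
        switch (idlType) {
            case "string":
            case "binary":
                return TTYPE_STRING;
            default:
                throw new IllegalArgumentException("not covered by this sketch: " + idlType);
        }
    }

    // Naive conversion (assumes required fields): every STRING becomes BINARY (UTF8).
    static String toParquetField(FieldDesc f) {
        if (f.wireType == TTYPE_STRING) {
            return "required binary " + f.name + " (UTF8);";
        }
        throw new IllegalArgumentException("not covered by this sketch");
    }

    public static void main(String[] args) {
        List<FieldDesc> fields = new ArrayList<>();
        fields.add(new FieldDesc("name", wireTypeOf("string")));
        fields.add(new FieldDesc("payload", wireTypeOf("binary")));
        for (FieldDesc f : fields) {
            System.out.println(toParquetField(f));
        }
    }
}
```

    Running `main` prints `required binary name (UTF8);` followed by `required binary payload (UTF8);`. The binary field gets the same UTF8 annotation as the string field because nothing in the wire type tells them apart.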

    This is really a problem on the Thrift side. One possible workaround is to inspect the fields of the generated Java classes. A field whose declared type is ByteBuffer corresponds to a Thrift binary field, whereas a Thrift string field is generated as a String.
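
    The reflection-based workaround can be sketched with plain `java.lang.reflect`, with no parquet-thrift or libthrift dependency. The sketch assumes the generated class declares `binary` fields as `java.nio.ByteBuffer` and `string` fields as `String`, as Thrift's Java generator does. `BinaryFieldDetector`, `SampleRecord` and `parquetStringFields` are hypothetical names. The sketch only illustrates the idea and is not necessarily how the 1.10.0 fix works.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class BinaryFieldDetector {

    // Hypothetical stand-in for the Java class Thrift would generate from
    //   struct SampleRecord { 1: required string name, 2: required binary payload }
    // Only the field types matter here: string -> String, binary -> ByteBuffer.
    static class SampleRecord {
        public static final String STRUCT_NAME = "SampleRecord"; // static metadata, skipped
        public String name;
        public ByteBuffer payload;
    }

    // Derives a Parquet field declaration for each String/ByteBuffer instance field
    // (assumes required fields). ByteBuffer means raw bytes, so no UTF8 annotation.
    static List<String> parquetStringFields(Class<?> thriftClass) {
        List<String> decls = new ArrayList<>();
        for (Field f : thriftClass.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue; // skip generated constants/metadata and compiler-added fields
            }
            if (f.getType() == ByteBuffer.class) {
                decls.add("required binary " + f.getName() + ";");
            } else if (f.getType() == String.class) {
                decls.add("required binary " + f.getName() + " (UTF8);");
            }
        }
        return decls; // getDeclaredFields() guarantees no particular order
    }

    public static void main(String[] args) {
        for (String decl : parquetStringFields(SampleRecord.class)) {
            System.out.println(decl);
        }
    }
}
```

    For `SampleRecord` this yields `required binary name (UTF8);` and `required binary payload;`, possibly in either order. The check keys on the declared Java type because that is the only place the string/binary distinction survives once the wire types are identical.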


    People

    • Assignee: Nandor Kollar (nkollar)
    • Reporter: Cheng Lian (lian cheng)
    • Votes: 0
    • Watchers: 3
