Spark / SPARK-34816

Support for Parquet unsigned LogicalTypes

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.2.0
    • Component/s: SQL
    • Labels: None

    Description

      Parquet supports several unsigned integer types. The relevant definitions in parquet.thrift are:

      /**
       * Common types used by frameworks(e.g. hive, pig) using parquet.  This helps map
       * between types in those frameworks to the base types in parquet.  This is only
       * metadata and not needed to read or write the data.
       */
      
        /**
         * An unsigned integer value.
         *
         * The number describes the maximum number of meaningful data bits in
         * the stored value. 8, 16 and 32 bit values are stored using the
         * INT32 physical type.  64 bit values are stored using the INT64
         * physical type.
         *
         */
        UINT_8 = 11;
        UINT_16 = 12;
        UINT_32 = 13;
        UINT_64 = 14;
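
The storage rule in the comment above can be illustrated with a minimal sketch. It assumes the unsigned value keeps its bit pattern inside the signed physical type (two's complement), so a reader that ignores the annotation sees a negative number for large values. The function names are hypothetical and are not part of Parquet or Spark.

```python
import struct


def store_uint32_as_int32(value: int) -> int:
    """Return the signed INT32 that shares its bit pattern with an unsigned 32-bit value."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value out of range for UINT_32")
    # Pack as unsigned, then reinterpret the same 4 bytes as signed.
    return struct.unpack("<i", struct.pack("<I", value))[0]


def store_uint64_as_int64(value: int) -> int:
    """Return the signed INT64 that shares its bit pattern with an unsigned 64-bit value."""
    if not 0 <= value <= 0xFFFFFFFFFFFFFFFF:
        raise ValueError("value out of range for UINT_64")
    # Pack as unsigned, then reinterpret the same 8 bytes as signed.
    return struct.unpack("<q", struct.pack("<Q", value))[0]


if __name__ == "__main__":
    # Small values look identical whether read as signed or unsigned.
    print(store_uint32_as_int32(255))
    # Values above the signed maximum wrap to negative numbers.
    print(store_uint32_as_int32(4294967295))
    print(store_uint64_as_int64(18446744073709551615))
```

A reader that treats the physical INT32 or INT64 as signed would therefore misread the upper half of the unsigned range, which is why the annotation matters on the read path.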
      

      Spark does not support unsigned data types. As of SPARK-10113, Spark throws an exception with a clear message when it encounters them.

      UInt8-[0:255]
      UInt16-[0:65535]
      UInt32-[0:4294967295]
      UInt64-[0:18446744073709551615]
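
The ranges listed above follow directly from the bit widths. The sketch below derives them and checks which unsigned maxima exceed the signed maximum of the physical type that holds them. The table `PHYSICAL_BITS` and the helper names are illustrative assumptions based on the parquet.thrift comment, not an existing API.

```python
# Logical bit width -> physical bit width, per the parquet.thrift comment:
# 8, 16 and 32 bit values use INT32; 64 bit values use INT64.
PHYSICAL_BITS = {8: 32, 16: 32, 32: 32, 64: 64}


def unsigned_max(bits: int) -> int:
    """Largest value representable in an unsigned integer of the given width."""
    return (1 << bits) - 1


def signed_max(bits: int) -> int:
    """Largest value representable in a two's complement signed integer of the given width."""
    return (1 << (bits - 1)) - 1


def exceeds_signed_physical_range(bits: int) -> bool:
    """True if some unsigned values do not fit the signed range of the physical type."""
    return unsigned_max(bits) > signed_max(PHYSICAL_BITS[bits])


if __name__ == "__main__":
    for bits in (8, 16, 32, 64):
        print(f"UInt{bits}-[0:{unsigned_max(bits)}]", exceeds_signed_physical_range(bits))
```

Under these assumptions only the 32-bit and 64-bit unsigned types can hold values beyond the signed range of their physical type.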

      Unsigned types may be used to produce smaller in-memory representations of the data. If the stored value is larger than the maximum allowed by int32 or int64, then the behavior is undefined.

      In this ticket, we try to read them as higher-precision signed types.
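
One way to picture reading an unsigned value into a wider signed type is to mask the raw signed value back to its unsigned width. This is only a sketch of the idea, not Spark's reader code. The target type labels in `WIDER_SIGNED` are assumptions chosen for illustration (a 64-bit unsigned maximum has 20 decimal digits, so a 20-digit decimal is wide enough), and `read_unsigned` is a hypothetical helper.

```python
from decimal import Decimal

# Assumed, illustrative choice of a wider signed target for each unsigned width.
# These labels are not taken from the ticket text.
WIDER_SIGNED = {8: "int16", 16: "int32", 32: "int64", 64: "decimal(20, 0)"}


def read_unsigned(raw: int, bits: int):
    """Recover the unsigned value from a raw signed integer of the physical type.

    Masking with 2**bits - 1 undoes the two's complement wrap-around, so a raw
    INT32 of -1 annotated as UINT_32 becomes 4294967295.
    """
    if bits not in WIDER_SIGNED:
        raise ValueError(f"unsupported unsigned width: {bits}")
    value = raw & ((1 << bits) - 1)
    # No 64-bit signed type can hold the full UINT_64 range, so use a decimal.
    return Decimal(value) if bits == 64 else value


if __name__ == "__main__":
    print(read_unsigned(-1, 32), "as", WIDER_SIGNED[32])
    print(read_unsigned(-1, 64), "as", WIDER_SIGNED[64])
```

The design point is that each target type must cover the whole unsigned range of its source, which is what "higher precision" buys on the read path.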

            People

              Assignee: Kent Yao 2 (Qin Yao)
              Reporter: Kent Yao 2 (Qin Yao)