  1. Hive
  2. HIVE-19256

UDF which shapes the input data according to the specified schema

Details

    • Type: New Feature
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • 3.1.0
    • None
    • Hive
    • None

    Description

      We use this UDF a lot in our org. It takes an object and a Hive schema and makes sure the output object matches the schema completely. In some respects it is similar to the {{named_struct}} UDF, which can be used to select columns from a struct, but it is more general: it works not only on structs but on all Hive data types (except union). The schema can also request certain valid type conversions (int -> double, etc.).

      One scenario where this is quite useful is making sure that a Hive view created with a specific schema always exposes columns that match that schema. In Hive today, when a view is created, new nested columns from the underlying table can leak out through the view, even though the user never wanted this behavior. Note that this leaking happens only for nested columns and not for top-level columns, so in that regard Hive's behavior is inconsistent.

      Sample usage of the UDF

      generic_project(col, "struct<a:array<struct<c:int,d:string>>>") // Returns data that matches the specified schema; extra columns that are not part of the schema are removed.
      
      generic_project(col, "struct<a:double>") // If the input column is a struct whose field 'a' is an int, 'a' is cast to double.
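
      As a rough illustration of the behavior described for these two calls, the plain-Java sketch below models projection of a value onto a schema. It is not the code from the attached patch and does not use Hive's classes: the class GenericProjectSketch, the Type / StructType / ArrayType / Primitive model, and the row helper are all hypothetical. Returning NULL for a schema field that is missing from the input is also an assumption, since the description does not say what happens in that case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of "shape the input according to a schema"; not Hive code.
class GenericProjectSketch {
    interface Type {}
    enum Primitive implements Type { INT, DOUBLE, STRING }

    static final class ArrayType implements Type {
        final Type element;
        ArrayType(Type element) { this.element = element; }
    }

    static final class StructType implements Type {
        final Map<String, Type> fields = new LinkedHashMap<>();
        StructType field(String name, Type type) { fields.put(name, type); return this; }
    }

    // Recursively rebuilds `value` so that it contains exactly what `schema` asks for.
    static Object project(Object value, Type schema) {
        if (value == null) return null;
        if (schema instanceof StructType) {
            Map<?, ?> in = (Map<?, ?>) value;
            Map<String, Object> out = new LinkedHashMap<>();
            // Only fields named in the schema are copied; extra input fields are dropped.
            // Assumption: a field missing from the input becomes null.
            for (Map.Entry<String, Type> f : ((StructType) schema).fields.entrySet()) {
                out.put(f.getKey(), project(in.get(f.getKey()), f.getValue()));
            }
            return out;
        }
        if (schema instanceof ArrayType) {
            List<Object> out = new ArrayList<>();
            for (Object item : (List<?>) value) {
                out.add(project(item, ((ArrayType) schema).element));
            }
            return out;
        }
        Primitive p = (Primitive) schema;
        // Widening conversion, e.g. int -> double.
        if (p == Primitive.DOUBLE && value instanceof Number) return ((Number) value).doubleValue();
        if (p == Primitive.INT && value instanceof Integer) return value;
        if (p == Primitive.STRING && value instanceof String) return value;
        throw new IllegalArgumentException("Cannot convert " + value + " to " + p);
    }

    // Helper for building struct-like rows: row("k1", v1, "k2", v2, ...).
    static Map<String, Object> row(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) m.put((String) kv[i], kv[i + 1]);
        return m;
    }

    public static void main(String[] args) {
        // Models the schema struct<a:array<struct<c:int,d:string>>>
        Type nested = new StructType().field("a",
                new ArrayType(new StructType().field("c", Primitive.INT).field("d", Primitive.STRING)));
        Object col = row("a", Arrays.asList(row("c", 1, "d", "x", "e", true)), "b", "extra");
        System.out.println(project(col, nested)); // {a=[{c=1, d=x}]}

        // Models the schema struct<a:double>
        Type widened = new StructType().field("a", Primitive.DOUBLE);
        System.out.println(project(row("a", 7), widened)); // {a=7.0}
    }
}
```

      In the first call the nested field e and the top-level field b are dropped because the schema does not name them; in the second, the int 7 is widened to the double 7.0.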
      

      Attachments

        1. HIVE-19256_1.patch
          24 kB
          Ratandeep Ratti
        2. HIVE-19256.patch
          22 kB
          Ratandeep Ratti

          People

            Assignee: rdsr Ratandeep Ratti
            Reporter: rdsr Ratandeep Ratti