IMPALA-635

Default value in Avro schema must match type of first union type


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: Impala 1.0.1, Impala 2.3.0
    • Fix Version/s: None
    • Component/s: Backend

    Description

      SUMMARY
      If a default value is provided for a union-type Avro field (i.e. a union of "null" and some other type, since other unions are not supported by Impala), the default value must match the first type in the union. Otherwise Impala will return the following error when trying to query the table:

      Failed to parse table schema: Invalid JSON integer in json_t_to_avro_value_helper
      

      For example, the following field definition will produce this error:

      {"name": "i", "type": ["int", "null"], "default": null}
      

      This is technically not a bug since this is what the Avro spec dictates. However, it isn't very user-friendly.
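
The rule above can be checked before a table is created. The following is a minimal sketch, not part of Impala: the helper name `find_bad_union_defaults` and the type table are assumptions made for illustration. It inspects only the top-level fields of a record schema and only unions whose first branch is a primitive type name.

```python
import json

def _is_number(v):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, (int, float)) and not isinstance(v, bool)

# JSON value shapes accepted as a default for each Avro primitive type.
_PRIMITIVE_DEFAULTS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "long": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "float": _is_number,
    "double": _is_number,
    "string": lambda v: isinstance(v, str),
    "bytes": lambda v: isinstance(v, str),
}

_MISSING = object()  # distinguishes "no default" from "default": null

def find_bad_union_defaults(schema_json):
    """Return names of fields whose default does not match the first union branch."""
    schema = json.loads(schema_json)
    bad = []
    for field in schema.get("fields", []):
        ftype = field.get("type")
        default = field.get("default", _MISSING)
        # Only union-typed fields that declare a default are of interest.
        if not isinstance(ftype, list) or not ftype or default is _MISSING:
            continue
        first = ftype[0]
        check = _PRIMITIVE_DEFAULTS.get(first) if isinstance(first, str) else None
        if check is not None and not check(default):
            bad.append(field["name"])
    return bad
```

Run against a schema containing the field definition above, the helper reports the field `i`, because `null` is not a valid default for a union whose first branch is `int`.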

      WORKAROUND
      Switch the order of the types in the union before writing the files. If you have existing files written with a problematic schema, you may need to rewrite those files with the fixed schema because Avro embeds the schema in the file.

      For example, the following field definition can be queried successfully:

      {"name": "i", "type": ["null", "int"], "default": null}
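
The reordering described in the workaround can be applied to a schema programmatically. This is a sketch under stated assumptions: the function name `move_null_first` is hypothetical, the schema is already parsed into a Python dict, and only top-level fields with a `null` default are touched. It produces a corrected schema for writing new files. It does not repair existing files, which still embed the old schema.

```python
import copy

def move_null_first(schema):
    """Return a copy of a record schema with "null" moved to the front of
    every union whose declared default is null."""
    fixed = copy.deepcopy(schema)  # leave the caller's schema untouched
    for field in fixed.get("fields", []):
        ftype = field.get("type")
        if (
            isinstance(ftype, list)
            and "null" in ftype
            and ftype[0] != "null"
            and "default" in field
            and field["default"] is None
        ):
            ftype.remove("null")
            ftype.insert(0, "null")
    return fixed
```

Fields without a default, or with a non-null default, are left unchanged, since their union order may already match the default.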
      

      Original description
      I have an Avro-backed table. Hive and the Avro tools jar can read the files, and Impala can describe the table. However, selecting from the table in Impala causes several daemons to crash.

      I1021 11:01:18.022570 8623 status.cc:44] Failed to parse file schema: Invalid JSON float in json_t_to_avro_value_helper
      @ 0x83af7d (unknown)
      @ 0x922a00 (unknown)
      @ 0x92309b (unknown)
      @ 0x95e44d (unknown)
      @ 0x910a8f (unknown)
      @ 0x90a680 (unknown)
      @ 0x9a36c4 (unknown)
      @ 0x3681c07851 (unknown)
      @ 0x36818e811d (unknown)
      I1021 11:01:18.030833 5229 progress-updater.cc:56] Query 9c4f2e4eebf1c7a9:811b8dc272d75e8a: 6% Complete (1951 out of 29457)

      My schema is

      {
        "type" : "record",
        "name" : "points",
        "fields" : [
          { "name" : "c1", "type" : [ "double", "null" ], "default" : null },
          { "name" : "c2", "type" : [ "string", "null" ], "default" : null },
          { "name" : "c3", "type" : [ "string", "null" ], "default" : null },
          { "name" : "c4", "type" : [ "string", "null" ], "default" : null },
          { "name" : "c5", "type" : [ "double", "null" ], "default" : null },
          { "name" : "c6", "type" : [ "double", "null" ], "default" : null },
          { "name" : "c7", "type" : [ "string", "null" ], "default" : null },
          { "name" : "c8", "type" : [ "string", "null" ], "default" : null },
          { "name" : "c9", "type" : [ "double", "null" ], "default" : null },
          { "name" : "c10", "type" : [ "double", "null" ], "default" : null },
          { "name" : "c11", "type" : [ "double", "null" ], "default" : null },
          { "name" : "c12", "type" : [ "double", "null" ], "default" : null },
          { "name" : "c13", "type" : [ "double", "null" ], "default" : null },
          { "name" : "c14", "type" : [ "double", "null" ], "default" : null },
          { "name" : "c15", "type" : [ "double", "null" ], "default" : null },
          { "name" : "c16", "type" : [ "double", "null" ], "default" : null },
          { "name" : "c17", "type" : [ "double", "null" ], "default" : null },
          { "name" : "c18", "type" : [ "double", "null" ], "default" : null },
          { "name" : "id1", "type" : "int" },
          { "name" : "id2", "type" : "int" },
          { "name" : "root_id", "type" : "string" }
        ]
      }

      Describing the table in Impala works. The table is partitioned by columns that are not in the Avro files (Flume creates the directories).

      Query: describe points
      Query finished, fetching results ...
      ---------------------------------------------------
      name         type    comment
      ---------------------------------------------------
      c1           double  from deserializer
      c2           string  from deserializer
      c3           string  from deserializer
      c4           string  from deserializer
      c5           double  from deserializer
      c6           double  from deserializer
      c7           string  from deserializer
      c8           string  from deserializer
      c9           double  from deserializer
      c10          double  from deserializer
      c11          double  from deserializer
      c12          double  from deserializer
      c13          double  from deserializer
      c14          double  from deserializer
      c15          double  from deserializer
      c16          double  from deserializer
      c17          double  from deserializer
      c18          double  from deserializer
      id1          int     from deserializer
      id2          int     from deserializer
      root_id      string  from deserializer
      deployment   string
      date_id      int
      hour         int
      q_strategy   string
      q_fund       string
      q_expiry     string
      ---------------------------------------------------
      Returned 27 row(s) in 29.33s

          People

            Assignee: Unassigned
            Reporter: Nong Li (nong_impala_60e1)