Spark / SPARK-13572

HiveContext reads Avro Hive tables incorrectly

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 1.5.2, 1.6.0, 1.6.1
    • Fix Version/s: None
    • Component/s: PySpark
    • Environment: Hive 0.13.1, Spark 1.5.2

    Description

      I am using PySpark to read Avro-backed tables from Hive. The tables can be read, but some of the columns are read incorrectly: they show the value None instead of the actual value.

      >>> results_df = sqlContext.sql("""SELECT * FROM trmdw_prod.opsconsole_ingest where year=2016 and month=2 and day=29 limit 3""")
      >>> results_df.take(3)
      [Row(kafkaoffsetgeneration=None, kafkapartition=None, kafkaoffset=None, uuid=None, mid=None, iid=None, product=None, utctime=None, statcode=None, statvalue=None, displayname=None, category=None, source_filename=u'ops-20160228_23_35_01.gz', year=2016, month=2, day=29),
       Row(kafkaoffsetgeneration=None, kafkapartition=None, kafkaoffset=None, uuid=None, mid=None, iid=None, product=None, utctime=None, statcode=None, statvalue=None, displayname=None, category=None, source_filename=u'ops-20160228_23_35_01.gz', year=2016, month=2, day=29),
       Row(kafkaoffsetgeneration=None, kafkapartition=None, kafkaoffset=None, uuid=None, mid=None, iid=None, product=None, utctime=None, statcode=None, statvalue=None, displayname=None, category=None, source_filename=u'ops-20160228_23_35_01.gz', year=2016, month=2, day=29)]
      

      Note the None values in most of the fields. Surprisingly, not every field is affected: source_filename, year, month and day come back with their real values. The table definition does not show anything specific about the affected columns.
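      To make the pattern explicit, the sketch below lists the columns that are None in every returned row. The helper `always_null_columns` is hypothetical (not part of Spark) and works on plain dicts transcribed from the first row of the PySpark output above; with real PySpark `Row` objects, `row.asDict()` should give the same shape.

```python
# Hypothetical diagnostic helper (not a Spark API): find columns that are
# None in every row. Rows are plain dicts; a PySpark Row can be converted
# with row.asDict().

def always_null_columns(rows):
    """Return column names whose value is None in every row, in first-row order."""
    if not rows:
        return []
    columns = list(rows[0].keys())
    return [c for c in columns if all(r.get(c) is None for r in rows)]


# First row of the PySpark output in this report, transcribed as a dict.
spark_row = {
    "kafkaoffsetgeneration": None, "kafkapartition": None, "kafkaoffset": None,
    "uuid": None, "mid": None, "iid": None, "product": None, "utctime": None,
    "statcode": None, "statvalue": None, "displayname": None, "category": None,
    "source_filename": "ops-20160228_23_35_01.gz",
    "year": 2016, "month": 2, "day": 29,
}

# The three rows returned by take(3) are identical, so reuse the same dict.
null_cols = always_null_columns([spark_row, spark_row, spark_row])
print(f"{len(null_cols)} of {len(spark_row)} columns are always None:")
print(", ".join(null_cols))
```

      On these rows it reports 12 of the 16 columns, kafkaoffsetgeneration through category; only source_filename, year, month and day survive.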

      Running the same query in Hive returns the actual values:

      c:hive2://xyz.com:100> SELECT * FROM trmdw_prod.opsconsole_ingest where year=2016 and month=2 and day=29 limit 3;
      +------------------------------------------+-----------------------------------+--------------------------------+-----------------------------------+---------------------------------------+---------------------------------------+----------------------------+----------------------------+-----------------------------+------------------------------+--------------------------------+-----------------------------+------------------------------------+-------------------------+--------------------------+------------------------+--+
      | opsconsole_ingest.kafkaoffsetgeneration  | opsconsole_ingest.kafkapartition  | opsconsole_ingest.kafkaoffset  |      opsconsole_ingest.uuid       |         opsconsole_ingest.mid         |         opsconsole_ingest.iid         | opsconsole_ingest.product  | opsconsole_ingest.utctime  | opsconsole_ingest.statcode  | opsconsole_ingest.statvalue  | opsconsole_ingest.displayname  | opsconsole_ingest.category  | opsconsole_ingest.source_filename  | opsconsole_ingest.year  | opsconsole_ingest.month  | opsconsole_ingest.day  |
      +------------------------------------------+-----------------------------------+--------------------------------+-----------------------------------+---------------------------------------+---------------------------------------+----------------------------+----------------------------+-----------------------------+------------------------------+--------------------------------+-----------------------------+------------------------------------+-------------------------+--------------------------+------------------------+--+
      | 11.0                                     | 0.0                               | 3.83399394E8                   | EF0D03C409681B98646F316CA1088973  | 174f53fb-ca9b-d3f9-64e1-7631bf906817  | 00000000-0000-0000-0000-000000000000  | est                        | 2016-01-13T06:58:19        | 8                           | 3.0 SP11 (8.110.7601.18923)  | MSXML 3.0 Version              | PC Information              | ops-20160228_23_35_01.gz           | 2016                    | 2                        | 29                     |
      | 11.0                                     | 0.0                               | 3.83399395E8                   | EF0D03C409681B98646F316CA1088973  | 174f53fb-ca9b-d3f9-64e1-7631bf906817  | 00000000-0000-0000-0000-000000000000  | est                        | 2016-01-13T06:58:19        | 2                           | GenuineIntel                 | CPU Vendor                     | PC Information              | ops-20160228_23_35_01.gz           | 2016                    | 2                        | 29                     |
      | 11.0                                     | 0.0                               | 3.83399396E8                   | EF0D03C409681B98646F316CA1088973  | 174f53fb-ca9b-d3f9-64e1-7631bf906817  | 00000000-0000-0000-0000-000000000000  | est                        | 2016-01-13T06:58:19        | 141                         | 4                            | Screens                        | PC Information              | ops-20160228_23_35_01.gz           | 2016                    | 2                        | 29                     |
      +------------------------------------------+-----------------------------------+--------------------------------+-----------------------------------+---------------------------------------+---------------------------------------+----------------------------+----------------------------+-----------------------------+------------------------------+--------------------------------+-----------------------------+------------------------------------+-------------------------+--------------------------+------------------------+--+
      3 rows selected (1.252 seconds)
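      Putting the two outputs side by side makes the discrepancy concrete. The sketch below is a hypothetical comparison helper over the first row of each result, with values transcribed from this report. Hive's text output is kept as strings, so it only checks value versus None, not type equality; treating "NULL" as a null marker is an assumption about Beeline's display.

```python
# Hypothetical helper: list columns that Hive returns with a value but
# PySpark returns as None. Values are transcribed from the first row of
# each output in this report.

COLUMNS = [
    "kafkaoffsetgeneration", "kafkapartition", "kafkaoffset", "uuid", "mid",
    "iid", "product", "utctime", "statcode", "statvalue", "displayname",
    "category", "source_filename", "year", "month", "day",
]

# Hive (Beeline) prints every value as text.
hive_row = [
    "11.0", "0.0", "3.83399394E8", "EF0D03C409681B98646F316CA1088973",
    "174f53fb-ca9b-d3f9-64e1-7631bf906817", "00000000-0000-0000-0000-000000000000",
    "est", "2016-01-13T06:58:19", "8", "3.0 SP11 (8.110.7601.18923)",
    "MSXML 3.0 Version", "PC Information", "ops-20160228_23_35_01.gz",
    "2016", "2", "29",
]

# PySpark: the first 12 columns came back as None.
spark_row = [None] * 12 + ["ops-20160228_23_35_01.gz", 2016, 2, 29]


def lost_in_spark(columns, hive_values, spark_values):
    """Columns that Hive populates but Spark returns as None."""
    return [
        col
        for col, h, s in zip(columns, hive_values, spark_values)
        if h not in (None, "", "NULL") and s is None
    ]


missing = lost_in_spark(COLUMNS, hive_row, spark_row)
for col in missing:
    print(f"{col}: Hive={hive_row[COLUMNS.index(col)]!r}, Spark=None")
```

      For this row, every one of the 12 affected columns has a non-empty value in Hive, which is consistent with the data being intact in storage and lost during Spark's read.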
      

      The attached logs show that Spark generates no errors or warnings.
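      One way to double-check a log like the attached one is to filter it by level. This sketch assumes Spark's default log4j console layout (`%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n`, as in `conf/log4j.properties.template`); the attached file may use a different pattern, in which case the regex needs adjusting. The function name `warnings_and_errors` is hypothetical.

```python
import re

# Assumption: lines follow Spark's default log4j console pattern
# "%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n", i.e. a date, a time, then the level.
LOG_LINE = re.compile(r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} (?P<level>[A-Z]+) ")


def warnings_and_errors(lines):
    """Yield lines whose log4j level is WARN or ERROR, without trailing newlines."""
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("level") in ("WARN", "ERROR"):
            yield line.rstrip("\n")
```

      For example, `list(warnings_and_errors(open("logs")))` over the attached file should come back empty if the file uses that layout and contains no warnings or errors.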

      The table definition is also attached.

      Attachments

        1. logs (12 kB, uploaded by Zoltan Fedor)
        2. table_definition (5 kB, uploaded by Zoltan Fedor)

    People

      Assignee: Unassigned
      Reporter: Zoltan Fedor (zoltan.fedor)
      Votes: 6
      Watchers: 7
