Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- Affects Version/s: 1.5.2, 1.6.0, 1.6.1
- Fix Version/s: None
- Environment: Hive 0.13.1, Spark 1.5.2
Description
I am using PySpark to read Avro-based tables from Hive. The tables can be read, but some of the columns are read incorrectly, showing None instead of the actual value.
>>> results_df = sqlContext.sql("""SELECT * FROM trmdw_prod.opsconsole_ingest where year=2016 and month=2 and day=29 limit 3""")
>>> results_df.take(3)
[Row(kafkaoffsetgeneration=None, kafkapartition=None, kafkaoffset=None, uuid=None, mid=None, iid=None, product=None, utctime=None, statcode=None, statvalue=None, displayname=None, category=None, source_filename=u'ops-20160228_23_35_01.gz', year=2016, month=2, day=29),
 Row(kafkaoffsetgeneration=None, kafkapartition=None, kafkaoffset=None, uuid=None, mid=None, iid=None, product=None, utctime=None, statcode=None, statvalue=None, displayname=None, category=None, source_filename=u'ops-20160228_23_35_01.gz', year=2016, month=2, day=29),
 Row(kafkaoffsetgeneration=None, kafkapartition=None, kafkaoffset=None, uuid=None, mid=None, iid=None, product=None, utctime=None, statcode=None, statvalue=None, displayname=None, category=None, source_filename=u'ops-20160228_23_35_01.gz', year=2016, month=2, day=29)]
Note the None values in most of the fields. Not every field is affected: source_filename, year, month, and day are read correctly, while the other twelve columns show None instead of the real values. The table definition does not show anything special about the affected columns.
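One way to list the affected columns without reading the output by eye is sketched below. This is only a minimal illustration, not part of the original report: the helper name null_only_columns is hypothetical, and the sample rows are trimmed to four of the columns shown above. It works on plain dicts, so it runs without a Spark installation.

```python
def null_only_columns(rows):
    """Return the sorted names of columns that are None in every row.

    `rows` is a list of dicts mapping column name -> value.
    (Hypothetical helper for illustration; not a Spark API.)
    """
    if not rows:
        return []
    columns = rows[0].keys()
    return sorted(c for c in columns if all(r.get(c) is None for r in rows))


# Sample shaped like the PySpark output above, trimmed to four columns.
sample = [
    {"kafkaoffset": None, "uuid": None,
     "source_filename": "ops-20160228_23_35_01.gz", "year": 2016},
    {"kafkaoffset": None, "uuid": None,
     "source_filename": "ops-20160228_23_35_01.gz", "year": 2016},
]

print(null_only_columns(sample))
```

Against the real DataFrame, the same helper could be fed with [row.asDict() for row in results_df.take(3)], since PySpark Row objects provide asDict().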
Running the same query in Hive:
c:hive2://xyz.com:100> SELECT * FROM trmdw_prod.opsconsole_ingest where year=2016 and month=2 and day=29 limit 3;
+------------------------------------------+-----------------------------------+--------------------------------+-----------------------------------+---------------------------------------+---------------------------------------+----------------------------+----------------------------+-----------------------------+------------------------------+--------------------------------+-----------------------------+------------------------------------+-------------------------+--------------------------+------------------------+--+
| opsconsole_ingest.kafkaoffsetgeneration | opsconsole_ingest.kafkapartition | opsconsole_ingest.kafkaoffset | opsconsole_ingest.uuid | opsconsole_ingest.mid | opsconsole_ingest.iid | opsconsole_ingest.product | opsconsole_ingest.utctime | opsconsole_ingest.statcode | opsconsole_ingest.statvalue | opsconsole_ingest.displayname | opsconsole_ingest.category | opsconsole_ingest.source_filename | opsconsole_ingest.year | opsconsole_ingest.month | opsconsole_ingest.day |
+------------------------------------------+-----------------------------------+--------------------------------+-----------------------------------+---------------------------------------+---------------------------------------+----------------------------+----------------------------+-----------------------------+------------------------------+--------------------------------+-----------------------------+------------------------------------+-------------------------+--------------------------+------------------------+--+
| 11.0 | 0.0 | 3.83399394E8 | EF0D03C409681B98646F316CA1088973 | 174f53fb-ca9b-d3f9-64e1-7631bf906817 | 00000000-0000-0000-0000-000000000000 | est | 2016-01-13T06:58:19 | 8 | 3.0 SP11 (8.110.7601.18923) | MSXML 3.0 Version | PC Information | ops-20160228_23_35_01.gz | 2016 | 2 | 29 |
| 11.0 | 0.0 | 3.83399395E8 | EF0D03C409681B98646F316CA1088973 | 174f53fb-ca9b-d3f9-64e1-7631bf906817 | 00000000-0000-0000-0000-000000000000 | est | 2016-01-13T06:58:19 | 2 | GenuineIntel | CPU Vendor | PC Information | ops-20160228_23_35_01.gz | 2016 | 2 | 29 |
| 11.0 | 0.0 | 3.83399396E8 | EF0D03C409681B98646F316CA1088973 | 174f53fb-ca9b-d3f9-64e1-7631bf906817 | 00000000-0000-0000-0000-000000000000 | est | 2016-01-13T06:58:19 | 141 | 4 | Screens | PC Information | ops-20160228_23_35_01.gz | 2016 | 2 | 29 |
+------------------------------------------+-----------------------------------+--------------------------------+-----------------------------------+---------------------------------------+---------------------------------------+----------------------------+----------------------------+-----------------------------+------------------------------+--------------------------------+-----------------------------+------------------------------------+-------------------------+--------------------------+------------------------+--+
3 rows selected (1.252 seconds)
The attachment shows that Spark generates no error or warning logs.
The table definition is also attached.
Attachments
Issue Links
- Blocked: SPARK-13709 Spark unable to decode Avro when partitioned (Resolved)