HIVE-3308

Mixing avro and snappy gives null values

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.11.0
    • Component/s: Query Processor
    • Labels:
      None

      Description

      By default, Hive uses LazySimpleSerDe for output.
      When I enable compression and run "select count(*) from avrotable", the output is a file with the .avro extension. Reading it displays null values, because the file is not really an Avro file: it was written by LazySimpleSerDe with compression, so it should be a .snappy file.
      This causes any job to show null values (the exception is "select * from avrotable", since that is not truly a job).
      If you use any SerDe other than Avro, you can temporarily fix this with "set hive.output.file.extension=.snappy" and it will work correctly again. This won't work with Avro, since the Avro SerDe overwrites hive.output.file.extension during initialization.

      When you dump the query result into a table with "create table bla as", you can rename the .avro file to .snappy and "select * from bla" will also magically work again.

      Input and output SerDes don't always match, so when I use Avro as an input format it should not set hive.output.file.extension.
      Once it's set, all queries on that connection will use it and fail, which makes the connection useless for reuse.
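
The mechanism described above can be modelled with a small toy script. This is only a sketch under stated assumptions, not Hive code: the function names and the `session_conf` dict are hypothetical, gzip stands in for Snappy because it ships with the Python standard library, and the reader is assumed to choose its decoder from the file extension. It shows how an input SerDe that overwrites a session-wide output extension produces a mislabelled file that reads back as NULL, and why renaming the file makes it readable again.

```python
import gzip

# Toy model only: every name here is hypothetical and none of it is Hive's API.
# gzip stands in for Snappy because it is available in the standard library.

session_conf = {}  # stands in for the per-connection configuration


def init_avro_input_serde(conf):
    """Mimic the reported behaviour: initializing the Avro SerDe for the
    INPUT table also overwrites the session-wide OUTPUT file extension."""
    conf["hive.output.file.extension"] = ".avro"


def write_query_result(conf, rows):
    """Write rows as compressed delimited text (the LazySimpleSerDe role),
    but name the file with whatever extension the session currently holds."""
    ext = conf.get("hive.output.file.extension", ".gz")
    payload = gzip.compress("\n".join(rows).encode("utf-8"))
    return "000000_0" + ext, payload


def read_query_result(name, payload):
    """Pick a decoder from the file extension (an assumption of this model)."""
    if name.endswith(".gz"):
        return gzip.decompress(payload).decode("utf-8").split("\n")
    # Unrecognised extension: the compressed bytes cannot be parsed as a row,
    # which this model reports as a single NULL (None).
    return [None]


# A query that reads from an Avro table initializes the Avro SerDe first.
init_avro_input_serde(session_conf)
name, data = write_query_result(session_conf, ["500"])
print(name, read_query_result(name, data))      # mislabelled file reads as NULL

# The rename workaround from the description, applied to the toy file name.
renamed = name[: -len(".avro")] + ".gz"
print(renamed, read_query_result(renamed, data))
```

Because the extension lives in the shared session configuration in this model, every later query on the same connection inherits it, which matches the complaint that the connection becomes useless for reuse.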

      1. HIVE-3308.patch2.txt
        5 kB
        Bennie Schut
      2. HIVE-3308.patch1.txt
        1 kB
        Bennie Schut

          Activity

          Transition                   Time In Source Status  Execution Times  Last Executer  Last Execution Date
          Open -> Patch Available      182d 1h 17m            1                Bennie Schut   25/Jan/13 15:10
          Patch Available -> Resolved  73d 9h 34m             1                Navis          09/Apr/13 01:45
          Resolved -> Closed           37d 20h 25m            1                Owen O'Malley  16/May/13 22:10
          Owen O'Malley made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Owen O'Malley made changes -
          Link This issue is duplicated by HIVE-4195 [ HIVE-4195 ]
          Hudson added a comment -

          Integrated in Hive-trunk-hadoop2 #148 (See https://builds.apache.org/job/Hive-trunk-hadoop2/148/)
          HIVE-3308 Mixing avro and snappy gives null values (Bennie Schut via Navis) (Revision 1465849)

          Result = FAILURE
          navis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465849
          Files :

          • /hive/trunk/ql/src/test/queries/clientpositive/avro_compression_enabled.q
          • /hive/trunk/ql/src/test/results/clientpositive/avro_compression_enabled.q.out
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java
          Hudson added a comment -

          Integrated in Hive-trunk-h0.21 #2053 (See https://builds.apache.org/job/Hive-trunk-h0.21/2053/)
          HIVE-3308 Mixing avro and snappy gives null values (Bennie Schut via Navis) (Revision 1465849)

          Result = FAILURE
          navis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465849
          Files :

          • /hive/trunk/ql/src/test/queries/clientpositive/avro_compression_enabled.q
          • /hive/trunk/ql/src/test/results/clientpositive/avro_compression_enabled.q.out
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java
          Navis made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Fix Version/s 0.11.0 [ 12323587 ]
          Resolution Fixed [ 1 ]
          Navis added a comment -

          Committed to trunk. Thanks, Bennie.

          Bennie Schut added a comment -

          I would really appreciate someone committing this. It has tests showing the issue with correct results after the patch. It makes the serde more consistent with other serdes. Basically anyone using compression combined with avro will hit this bug like we see with HIVE-4195.

          Jakob Homan added a comment -

          Will do.

          Ashutosh Chauhan made changes -
          Assignee Bennie Schut [ bennies ]
          Ashutosh Chauhan added a comment -

          Jakob Homan, would you like to review this patch?

          Bennie Schut made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Bennie Schut made changes -
          Attachment HIVE-3308.patch2.txt [ 12566516 ]
          Bennie Schut made changes -
          Field Original Value New Value
          Attachment HIVE-3308.patch1.txt [ 12538157 ]
          Bennie Schut added a comment -

          Added a test to show the problem.
          Result of the test will show:

                #### A masked pattern was here ####
                POSTHOOK: query: select count(*) from src
                POSTHOOK: type: QUERY
                POSTHOOK: Input: default@src
                #### A masked pattern was here ####
                NULL

          But should show something like:

                #### A masked pattern was here ####
                POSTHOOK: query: select count(*) from src
                POSTHOOK: type: QUERY
                POSTHOOK: Input: default@src
                #### A masked pattern was here ####
                500
          Bennie Schut created issue -

            People

            • Assignee: Bennie Schut
            • Reporter: Bennie Schut
            • Votes: 1
            • Watchers: 8
