  1. Hive
  2. HIVE-22936

NPE in SymbolicInputFormat

    Details

    • Flags:
      Patch

      Description

      Symptom

      I was running Hive over an AWS S3 Inventory report, which uses SymlinkTextInputFormat. The symlink file contains the fully qualified S3 URL of each data file, for example:

      s3://inventory-bucket/bucket1/2020-02-04-11-23-00-AEFEDDCE
      s3://inventory-bucket/bucket1/2020-02-05-11-23-00-BCEDCCDD

      When I have the following setting:

      set hive.rework.mapredwork=true;  
      

      The job fails with a NullPointerException and no stack trace.

      Cause

      A symlink file may contain arbitrary fully qualified filesystem paths, but SymbolicInputFormat uses the default FS instance to get the status of the data files. That lookup fails (and returns null) when the scheme of a data file's URI differs from that of Hive's default FS.

      Code point:
      https://github.com/apache/hive/blob/cfc12f05f0c034f9aad149960e58d40902e0dcfe/ql/src/java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java#L78

                    // "fileSystem" may not be able to list status for given file uri.
                    FileStatus[] matches = fileSystem.globStatus(new Path(line));
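
      To make the mismatch concrete, here is a minimal JDK-only sketch that flags symlink entries whose scheme or authority differs from the default FS. The class name, the method name, and the hdfs://namenode:8020 default FS are assumptions made for illustration; none of them come from Hive or from the attached patch.

```java
import java.net.URI;
import java.util.List;

// Hypothetical helper, not part of Hive: illustrates when the default
// FileSystem instance is the wrong one to ask about a symlink entry.
class SymlinkSchemeCheck {

    /**
     * Returns true when the symlink entry names a file system (scheme plus
     * authority) other than the default one. Assumes the line is a valid URI.
     */
    static boolean usesDifferentFileSystem(String line, URI defaultFs) {
        URI target = URI.create(line.trim());
        if (target.getScheme() == null) {
            // A scheme-less path is resolved against the default file system.
            return false;
        }
        if (!target.getScheme().equalsIgnoreCase(defaultFs.getScheme())) {
            return true;
        }
        // Same scheme: a missing authority falls back to the default one.
        String authority = target.getAuthority();
        return authority != null && !authority.equals(defaultFs.getAuthority());
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://namenode:8020"); // assumed default FS
        List<String> symlinkLines = List.of(
            "s3://inventory-bucket/bucket1/2020-02-04-11-23-00-AEFEDDCE",
            "hdfs://namenode:8020/warehouse/t/part-0",
            "/warehouse/t/part-0");
        for (String line : symlinkLines) {
            System.out.println(line + " -> different FS: "
                + usesDifferentFileSystem(line, defaultFs));
        }
    }
}
```

      In Hadoop terms, new Path(line).getFileSystem(conf) returns the FileSystem that matches the path's own scheme and authority, and globStatus can return null, so a fix would presumably resolve the file system per path and null-check the result. Whether the attached patch takes this approach is an assumption.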

      Fix

      Please see the attached npe-symbolic-inputformat.patch.

              People

              • Assignee:
                Unassigned
                Reporter:
                redisliu Redis Liu
              • Votes:
                0
                Watchers:
                1

                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0h
                  Logged:
                  0.5h