Pig / PIG-5360

Pig setting the working directory of input file systems causes an exception to be thrown

Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.17.0
    • Fix Version/s: 0.18.0
    • Component/s: impl
    • Patch

    Description

      In the getSplits() method of PigInputFormat, Pig tries to set the working directory of the input file system to jobContext.getWorkingDirectory(). This is always the default working directory of the default file system (e.g. hdfs://host:port/user/userId in the case of HDFS) unless "mapreduce.job.working.dir" is explicitly set to a non-default value. If the input path uses a non-default file system, the call fails because it tries to set the working directory of the non-default file system to an HDFS path.
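      The failure mode can be illustrated with a minimal, self-contained sketch. It does not use Hadoop classes: SketchFileSystem, the host name, and the s3a bucket are hypothetical stand-ins, and the exception type and message are illustrative only. The assumption is simply that a file system rejects a working directory whose scheme or authority differs from its own.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical stand-in for a file system; this is NOT Hadoop's FileSystem class.
class SketchFileSystem {
    private final URI root;
    private URI workingDir;

    SketchFileSystem(String rootUri) {
        this.root = URI.create(rootUri);
        this.workingDir = this.root;
    }

    // Reject any path whose scheme or authority differs from this file system's.
    void setWorkingDirectory(String dir) {
        URI candidate = URI.create(dir);
        boolean sameScheme = Objects.equals(root.getScheme(), candidate.getScheme());
        boolean sameAuthority = Objects.equals(root.getAuthority(), candidate.getAuthority());
        if (!sameScheme || !sameAuthority) {
            throw new IllegalArgumentException("Wrong FS: " + dir + ", expected: " + root);
        }
        workingDir = candidate;
    }

    URI getWorkingDirectory() {
        return workingDir;
    }
}

class WorkingDirMismatchSketch {
    // Returns true if setting the job working directory on the input file system fails.
    static boolean failsToSet(String inputFsRoot, String jobWorkingDir) {
        try {
            new SketchFileSystem(inputFsRoot).setWorkingDirectory(jobWorkingDir);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Assumed default working directory on the default (HDFS) file system.
        String jobWorkingDir = "hdfs://namenode:8020/user/alice";
        // Input on the default file system: the working directory is accepted.
        System.out.println(failsToSet("hdfs://namenode:8020/", jobWorkingDir));
        // Input on a non-default file system: the HDFS path is rejected.
        System.out.println(failsToSet("s3a://bucket/", jobWorkingDir));
    }
}
```

      In this sketch the first call succeeds and the second fails, which mirrors the reported case where the input path lives on a non-default file system.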

      The proposed change is to remove the logic that sets the working directory entirely. There are several reasons for doing so.

      Firstly, getSplits() is only supposed to return a list of input splits. It should not have side effects, especially since this one can change the output path. Having an InputFormat affect the OutputFormat does not make much sense here.

      Secondly, the working directories of the input and output file systems are handled inconsistently. If "mapreduce.job.working.dir" is set to a non-default value, it affects only the output path (and only if that path is relative), because the input path has already been fully qualified before this logic runs.

      Thirdly, there is already a "CD" functionality that allows customers to change the working directory. However, this logic overrides the effect of "CD" if the input and output paths both use the default file system.

      Lastly, if a customer has a sequence of jobs, changing the working directory may change the input paths of downstream jobs if those input paths are specified as relative paths.
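      The effect on relative paths described in the reasons above can be sketched with plain java.net.URI resolution. This is only an illustration under stated assumptions: the host name and directories are hypothetical, and the helper resolve() is not part of Pig or Hadoop. It shows that a relative path ends up in a different location once the working directory changes, while a fully qualified path is unaffected.

```java
import java.net.URI;

class RelativePathSketch {
    // Resolve a possibly relative path against a working directory URI.
    static String resolve(String workingDir, String path) {
        // A trailing slash makes URI.resolve treat the working directory as a directory.
        String base = workingDir.endsWith("/") ? workingDir : workingDir + "/";
        return URI.create(base).resolve(path).toString();
    }

    public static void main(String[] args) {
        // Hypothetical working directories before and after the change.
        String defaultDir = "hdfs://namenode:8020/user/alice";
        String changedDir = "hdfs://namenode:8020/tmp/work";

        // The same relative path points at two different locations.
        System.out.println(resolve(defaultDir, "output/part1"));
        System.out.println(resolve(changedDir, "output/part1"));

        // A fully qualified path does not depend on the working directory.
        System.out.println(resolve(changedDir, "hdfs://namenode:8020/data/in"));
    }
}
```

      A downstream job that reads "output/part1" as a relative path would therefore look in a different directory after the working directory has been changed.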

      Attachments

        1. PIG-5360.diff
          13 kB
          Xuzhou Yin

          People

            Assignee: Unassigned
            Reporter: Xuzhou Yin (xuzhoyin)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated: 504h
                Remaining: 504h
                Logged: Not Specified