Pig / PIG-2856

AvroStorage doesn't load files in the directories when a glob pattern matches both files and directories.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11
    • Fix Version/s: 0.11
    • Component/s: piggybank
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      This is a regression from PIG-2492.

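      When a glob pattern such as '*' matches directories as well as files, AvroStorage does not load the files inside the matched directories. The bug is in getAllSubDirs(): for each matched directory, it calls fs.listStatus() on the original glob path instead of on the matched directory itself. It can be fixed as follows: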

      static boolean getAllSubDirs(Path path, Job job, Set<Path> paths)
      ...
      FileStatus[] matchedFiles = fs.globStatus(path, PATH_FILTER);
      ...
      for (FileStatus file : matchedFiles) {
          if (file.isDir()) {
      -        for (FileStatus sub : fs.listStatus(path)) {
      +        for (FileStatus sub : fs.listStatus(file.getPath())) {
                  getAllSubDirs(sub.getPath(), job, paths);
              }
          }
      }
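
      The effect of the one-line change can be illustrated outside Hadoop. The following is a minimal sketch, not the AvroStorage code: it assumes java.nio.file as a stand-in for Hadoop's FileSystem (Files.newDirectoryStream with a glob in place of globStatus), and the class name GlobSubDirsSketch, the method collect, and the file names are hypothetical. The point it shows is that the recursion must list the matched directory, not the path that carried the glob.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;

public class GlobSubDirsSketch {

    // Adds to 'out' every regular file reachable from the entries of
    // 'parent' whose names match 'glob'. Hypothetical helper; it only
    // mirrors the shape of getAllSubDirs(), not its real signature.
    static void collect(Path parent, String glob, Set<Path> out) throws IOException {
        try (DirectoryStream<Path> matched = Files.newDirectoryStream(parent, glob)) {
            for (Path match : matched) {
                if (Files.isDirectory(match)) {
                    // The analogue of the fix: descend into the matched
                    // directory itself (file.getPath() in the patch),
                    // not into the path the glob was applied to.
                    collect(match, "*", out);
                } else {
                    out.add(match);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Layout: one file next to one directory that holds another file.
        Path root = Files.createTempDirectory("pig2856");
        Files.createFile(root.resolve("a.avro"));
        Path sub = Files.createDirectory(root.resolve("sub"));
        Files.createFile(sub.resolve("b.avro"));

        Set<Path> found = new TreeSet<>();
        collect(root, "*", found);
        for (Path p : found) {
            System.out.println(root.relativize(p));
        }
    }
}
```

      With the layout built in main, the glob '*' matches both a.avro and the directory sub, and the file under sub is picked up because the recursive call receives the matched directory.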
      

        Attachments

        1. PIG-2856.patch
          2 kB
          Cheolsoo Park
        2. PIG-2856-2.patch
          3 kB
          Cheolsoo Park

            People

            • Assignee:
              cheolsoo Cheolsoo Park
              Reporter:
              cheolsoo Cheolsoo Park