Pig
  1. Pig
  2. PIG-3617

problem with temp file deletion in MAPREDUCE operator

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      Hi all,
      When I run a native MR job with the MAPREDUCE keyword and store the intermediate data in HBase with:
      stored = MAPREDUCE 'my.jar'
      STORE x INTO 'hbase://temp_table'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('hbase_schema')
      .... and the rest ....;

      Pig tries to delete the temp files, which in this case has an HBase path, and fails with the exception:

      Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:hbase:/temp_table
      at org.apache.hadoop.fs.Path.initialize(Path.java:148)
      at org.apache.hadoop.fs.Path.<init>(Path.java:126)
      at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:197)
      at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:128)
      at org.apache.pig.impl.io.FileLocalizer.delete(FileLocalizer.java:415)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:419)
      at org.apache.pig.PigServer.launchPlan(PigServer.java:1322)

      1. PIG-3617.patch
        1 kB
        Nezih Yigitbasi
      2. PIG-3617-2.patch
        1.0 kB
        Cheolsoo Park

        Activity

        Hide
        Nezih Yigitbasi added a comment -

        Guys, I plan to implement a solution that checks for an hbase prefix ("hbase://") in the fileSpec variable (see FileLocalizer.delete), but this method currently uses the DataStorage interfaces (either local or distributed) kept in PigContext to access the backend storage and as far as I can see there is no HBase specific implementation for the DataStorage interface. So what do you guys think is the right way to fix this issue? Is it implementing a DataStorage interface for HBase or simply use HBaseAdmin.deleteTable to delete the temp table?

        Show
        Nezih Yigitbasi added a comment - Guys, I plan to implement a solution that checks for an hbase prefix ("hbase://") in the fileSpec variable (see FileLocalizer.delete), but this method currently uses the DataStorage interfaces (either local or distributed) kept in PigContext to access the backend storage and as far as I can see there is no HBase specific implementation for the DataStorage interface. So what do you guys think is the right way to fix this issue? Is it implementing a DataStorage interface for HBase or simply use HBaseAdmin.deleteTable to delete the temp table?
        Hide
        Cheolsoo Park added a comment -

        Nezih Yigitbasi, sorry for the late reply.

        I see why you propose implementing a DataStorage interface for HBase, but that sounds like an overkill for this problem. Do you think that will be useful in the future for other things?

        In addition, how about handling an hbase prefix in MapReduceLauncher rather than in FileLocalizer? PIG-3592 fixes a similar issue, and it does in MapReducerLauncher. I think it is better to do storage-specific things in Launcher than in FileLocalizer because the latter currently assumes file system.

        Feel free to disagree with me. Thanks!

        Show
        Cheolsoo Park added a comment - Nezih Yigitbasi , sorry for the late reply. I see why you propose implementing a DataStorage interface for HBase, but that sounds like an overkill for this problem. Do you think that will be useful in the future for other things? In addition, how about handling an hbase prefix in MapReduceLauncher rather than in FileLocalizer? PIG-3592 fixes a similar issue, and it does in MapReducerLauncher. I think it is better to do storage-specific things in Launcher than in FileLocalizer because the latter currently assumes file system. Feel free to disagree with me. Thanks!
        Hide
        Nezih Yigitbasi added a comment -

        Cheolsoo, thanks for the reply. I also think that implementing a DataStorage interface for HBase is an overkill. So I implemented a simple fix to skip HBase paths during temporary file deletions in MapReduceLauncher like you proposed. Please review.

        Show
        Nezih Yigitbasi added a comment - Cheolsoo, thanks for the reply. I also think that implementing a DataStorage interface for HBase is an overkill. So I implemented a simple fix to skip HBase paths during temporary file deletions in MapReduceLauncher like you proposed. Please review.
        Hide
        Cheolsoo Park added a comment -

        Nezih Yigitbasi, thank you very much for the patch. It looks good, but can I make a minor suggestion?

        How about using Utils.hasFileSystemImpl() instead of checking whether a path starts with "hbase"? I think this approach is better because then other non-filesystem storages such as accumulo will also be protected from the same problem.

        PIG-3617-2.patch implements what I describe here. I can commit it if you agree. Thanks!

        Show
        Cheolsoo Park added a comment - Nezih Yigitbasi , thank you very much for the patch. It looks good, but can I make a minor suggestion? How about using Utils.hasFileSystemImpl() instead of checking whether a path starts with "hbase"? I think this approach is better because then other non-filesystem storages such as accumulo will also be protected from the same problem. PIG-3617 -2.patch implements what I describe here. I can commit it if you agree. Thanks!
        Hide
        Nezih Yigitbasi added a comment -

        Yes, that's even better! Thanks.

        Show
        Nezih Yigitbasi added a comment - Yes, that's even better! Thanks.
        Hide
        Cheolsoo Park added a comment -

        Committed to trunk. Thank you Nezih!

        Show
        Cheolsoo Park added a comment - Committed to trunk. Thank you Nezih!
        Hide
        Rohini Palaniswamy added a comment -

        Just found out that with HADOOP-7549 in hadoop-2 branch, Utils.hasFileSystemImpl() is broken. Will open a jira to fix it.

        Show
        Rohini Palaniswamy added a comment - Just found out that with HADOOP-7549 in hadoop-2 branch, Utils.hasFileSystemImpl() is broken. Will open a jira to fix it.
        Hide
        Nezih Yigitbasi added a comment -

        Aha, good catch.

        Show
        Nezih Yigitbasi added a comment - Aha, good catch.
        Hide
        Rohini Palaniswamy added a comment -

        Will fix that as part of PIG-3672

        Show
        Rohini Palaniswamy added a comment - Will fix that as part of PIG-3672

          People

          • Assignee:
            Nezih Yigitbasi
            Reporter:
            Nezih Yigitbasi
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development