Accumulo
ACCUMULO-507

Large number of ranges can prevent job from kicking off

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 1.3.5
    • Fix Version/s: None
    • Component/s: client
    • Labels:

      Description

      We use the various ranges a user provides to create splits. Those ranges are read on the client side when the job is submitted: they are used to compute all of the splits, and then the job is started. If the resulting configuration is too large, the job will fail to submit (the size limit is configurable, but that's beside the point). We should look into clearing the ranges out of the jobconf if it's large, to prevent this error, since at that point the ranges are no longer needed in the configuration.
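      For context, a rough sketch of how the ranges end up in the job configuration and why a large list of them inflates job.xml. The property key and encoding below are illustrative assumptions, not Accumulo's exact internals:

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.apache.accumulo.core.data.Range;
import org.apache.commons.codec.binary.Base64;
import org.apache.hadoop.conf.Configuration;

// Illustrative only: every Range is serialized and stored as a string property,
// so tens of thousands of ranges can push the submitted job.xml past the
// configured size limit. The property name is a placeholder.
public class RangeConfigSketch {
  public static void setRanges(Configuration conf, Collection<Range> ranges) throws IOException {
    List<String> encoded = new ArrayList<String>();
    for (Range r : ranges) {
      ByteArrayOutputStream baos = new ByteArrayOutputStream();
      r.write(new DataOutputStream(baos));               // Range is a Hadoop Writable
      encoded.add(new String(Base64.encodeBase64(baos.toByteArray())));
    }
    conf.setStrings("accumulo.input.ranges", encoded.toArray(new String[0]));
  }
}
{code}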

      Attachments

      • ACCUMULO-507.patch (5 kB, Billie Rinaldi)

        Activity

        Billie Rinaldi added a comment -

        This should be fixed by r1382004. I made the set ranges method write all the ranges to a file instead of adding them directly to the configuration, which is the same thing we're doing with the password. We should test that it actually solves the problem, though.
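        A minimal sketch of that approach, assuming illustrative property and path names rather than the ones actually used in r1382004:

{code:java}
import java.io.IOException;
import java.util.Collection;

import org.apache.accumulo.core.data.Range;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: write the ranges to a file and record only the file's path in the
// configuration, so the job configuration stays small no matter how many
// ranges the user sets.
public class RangeFileSketch {
  public static void setRanges(Configuration conf, Collection<Range> ranges) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    Path rangesFile = new Path("/tmp/accumulo-ranges-" + System.currentTimeMillis());
    try (FSDataOutputStream out = fs.create(rangesFile)) {
      out.writeInt(ranges.size());
      for (Range r : ranges)
        r.write(out);                       // Range is a Hadoop Writable
    }
    conf.set("accumulo.input.ranges.file", rangesFile.toString());
  }
}
{code}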

        Keith Turner added a comment -

        This ticket is marked as fix for 1.4.2, but the only commit is to trunk.

        Billie Rinaldi added a comment -

        Good point. Do you think we should make this change in the 1.4 branch?

        Keith Turner added a comment -

        I do not have an opinion about it being in 1.4.2; I just want to ensure the release notes will be correct.

        Chris Waring added a comment -

        My preference would be to include it in 1.4.2.

        Eric Newton added a comment -

        Fixed with r1397702 and r1397700, which added a quick test to read and write a large number of ranges.

        Keith Turner added a comment - - edited

        See ACCUMULO-826. Rolled back the changes that created a file (1.4: r1400976, 1.5: r1401004). What should we do for 1.5?

        Hudson added a comment -

        Integrated in Accumulo-1.4.x #244 (See https://builds.apache.org/job/Accumulo-1.4.x/244/)
        ACCUMULO-826 ACCUMULO-507 reverted revisions 1397700,1382923,1339308,1339223,1336322. These changes caused map reduce jobs to fail if the process that started the job exited. (Revision 1400976)

        Result = SUCCESS
        kturner :
        Files :

        • /accumulo/branches/1.4/src/core/src/main/java/org/apache/accumulo/core/client/mapreduce/AccumuloOutputFormat.java
        • /accumulo/branches/1.4/src/core/src/main/java/org/apache/accumulo/core/client/mapreduce/InputFormatBase.java
        • /accumulo/branches/1.4/src/core/src/test/java/org/apache/accumulo/core/client/mapreduce/AccumuloInputFormatTest.java
        Hudson added a comment -

        Integrated in Accumulo-Trunk #533 (See https://builds.apache.org/job/Accumulo-Trunk/533/)
        ACCUMULO-826 ACCUMULO-507 reverted changes that caused map reduce jobs to fail if the process that started the job exited (Revision 1401004)

        Result = SUCCESS
        kturner :
        Files :

        • /accumulo/trunk
        • /accumulo/trunk/assemble
        • /accumulo/trunk/core
        • /accumulo/trunk/core/src/main/java/org/apache/accumulo/core/client/mapreduce/AccumuloOutputFormat.java
        • /accumulo/trunk/core/src/main/java/org/apache/accumulo/core/client/mapreduce/InputFormatBase.java
        • /accumulo/trunk/core/src/test/java/org/apache/accumulo/core/client/mapreduce/AccumuloInputFormatTest.java
        • /accumulo/trunk/fate/src/main/java/org/apache/accumulo/fate/ZooStore.java
        • /accumulo/trunk/fate/src/main/java/org/apache/accumulo/fate/zookeeper/ZooSession.java
        • /accumulo/trunk/server
        • /accumulo/trunk/src
        Billie Rinaldi added a comment -

        TODO: figure out how to reintroduce the changes using DistributedCache correctly, without deleting the file on client exit or leaving the file sitting around when the job finishes.

        Billie Rinaldi added a comment -

        I'm tempted not to fix this since it seems impossible to satisfy both our requirements: not leaving large files around and not killing the job when the process exits. Is it too onerous to adjust the mapred.user.jobconf.limit property to allow for jobs with large numbers of ranges?
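        For anyone looking for that workaround, it would look roughly like the following in mapred-site.xml on the JobTracker; the value is only an example, and (as noted later in this thread) the JobTracker must be restarted for it to take effect:

{code:xml}
<!-- Hadoop 1.x: raise the maximum allowed size of a submitted job configuration.
     10 MB is shown purely as an example value. -->
<property>
  <name>mapred.user.jobconf.limit</name>
  <value>10485760</value>
</property>
{code}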

        John Vines added a comment -

        I think that is acceptable, as long as we document mapred.user.jobconf.limit as a potential issue/fix for the input format.

        Mike Drob added a comment -

        Can you clear the ranges from the conf after the splits are generated? Not sure how it's done now, but individual tasks should be able to reconstruct what they need from the split metadata, and then there's no reason to pass around all the ranges to each mapper.

        Billie Rinaldi added a comment -

        I looked into it. What I saw in the code led me to believe that the size restriction is on the entire job submit directory, which contains a file where the ranges are written after they are obtained from getSplits. However, the code was not very transparent, so it's possible that I drew incorrect conclusions. Anyone else feel like verifying it one way or the other?

        Billie Rinaldi added a comment -

        I think this fix was reverted for the wrong reason. ACCUMULO-826 found that a MapReduce job would fail if the process that started the job was killed. This was an issue because we were writing the user's password to a file that was being deleted on exit. Whenever a new map task is kicked off it needs to read the password, so it was trying to read a nonexistent file. But the ranges don't need to be read by each map task, they only need to be accessed once when getSplits is called, which happens before the job is actually submitted. Thus it shouldn't matter if the file containing the ranges is deleted in the middle of a job – if the process exits before the job is actually submitted, the job will fail, but that seems OK to me.

        The other issue pointed out in ACCUMULO-826 is valid, that the file was being written to the file system, added to the distributed cache, then read directly from the file system. The ranges file shouldn't have been added to the distributed cache at all, since it's not needed by the slave nodes.

        However, there may be little point in re-applying this fix if the mapred.user.jobconf.limit applies to the whole job submit directory. Using the ranges file method might effectively halve the size of the job submit directory, but you could still hit the limit if you had enough ranges. I guess I'll try to verify this is true. Does anyone have opinions about this issue?

        Keith Turner added a comment -

        Does anyone have opinions about this issue?

        I do not have a problem with reverting the revert if the job keeps running. What about random files being left around? Was the file placed somewhere that M/R will eventually clean up if the process that created it does not?

        Billie Rinaldi added a comment -

        Happily, I have verified that my initial impression was incorrect: the size limit is on the job configuration file only.

        What about random files being left around?

        It looks like the files are created in the user's home directory. I'm not sure how FileSystem.deleteOnExit works, but since the ACCUMULO-826 bug was about the files being deleted too soon, I'm inclined to think it's effective. This is what the javadocs say: "Mark a path to be deleted when FileSystem is closed. When the JVM shuts down, all FileSystem objects will be closed automatically. Then, the marked path will be deleted as a result of closing the FileSystem."

        I'll look at recreating a patch. For 1.5.0?
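        For reference, a minimal sketch of the FileSystem.deleteOnExit mechanism being discussed; the class and path shown are illustrative:

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Mark the ranges file for deletion when the client's FileSystem is closed
// (normally at JVM shutdown), rather than deleting it immediately.
public class RangesFileCleanupSketch {
  public static void markForCleanup(Configuration conf, Path rangesFile) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    fs.deleteOnExit(rangesFile);
  }
}
{code}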

        John Vines added a comment -

        I could go either way on it. There is a functional workaround, so it's not pressing.

        Billie Rinaldi added a comment -

        The workaround does require a restart of the jobtracker, so I will go ahead with the patch; but since there's a workaround, I'll only check it into trunk.

        Keith Turner added a comment -

        deleteOnExit works if the process is killed w/ SIGTERM. It will not work if the node dies or the process is killed w/ SIGKILL. A job submit directory was mentioned; I was wondering if the file would be placed there and if it would be periodically cleaned up by the job tracker. But the file was placed in the user's home dir, so it would be left behind in the case of an unclean shutdown. Could the file be placed somewhere that the jobtracker does clean up?

        Billie Rinaldi added a comment -

        I was wondering about that. The job submit directory is an example of such a temporary directory, but that wouldn't be created until after the user has called setRanges and submitted the job.

        Maybe I should revisit deleting the ranges from the job configuration when getSplits is called, after the ranges are read. Is there ever a case where a job would be submitted more than once? If so, this change would cause a failure on the second submit.

        Billie Rinaldi added a comment -

        I think the attached patch is working. It adds an optional boolean parameter that clears the ranges after they are read.
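        As a rough guess at the shape of that change (the patch itself isn't reproduced here, so the class, method, and property names below are placeholders):

{code:java}
import java.io.IOException;
import java.util.List;

import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;

// Hypothetical outline of the described behavior: an optional flag that makes
// getSplits blank the serialized ranges out of the configuration once they
// have been turned into splits.
public class ClearableRangesInputFormat extends AccumuloInputFormat {

  private static final String CLEAR_RANGES = "accumulo.ranges.clear";  // placeholder key
  private static final String RANGES = "accumulo.input.ranges";        // placeholder key

  public static void setClearRangesAfterSplits(Configuration conf, boolean clear) {
    conf.setBoolean(CLEAR_RANGES, clear);
  }

  @Override
  public List<InputSplit> getSplits(JobContext context) throws IOException {
    List<InputSplit> splits = super.getSplits(context);
    Configuration conf = context.getConfiguration();
    if (conf.getBoolean(CLEAR_RANGES, false))
      conf.set(RANGES, "");  // overwrite; Configuration.unset() isn't available in older Hadoop
    return splits;
  }
}
{code}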

        John Vines added a comment -

        Billie Rinaldi, is this patch good to commit (after updating), or do we need more discussion here?

        Billie Rinaldi added a comment -

        I'm a little uncomfortable with not knowing whether this has a chance of causing your job to fail, if it's the case that under some conditions a job is submitted more than once. If I figure it out today I'll commit, and otherwise push it off. It doesn't seem like people are clamoring for this change given that there's a workaround.

        John Vines added a comment -

        Sounds good

        Billie Rinaldi added a comment -

        I'm going to close this. In Hadoop 2, the property limiting the conf size no longer exists (which doesn't necessarily mean that there is no limit, just that I haven't found it). If someone finds the Hadoop 1 workaround to be insufficient, we can revisit this then.


          People

          • Assignee: Billie Rinaldi
          • Reporter: John Vines
          • Votes: 1
          • Watchers: 8
