Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-6765

Log INodeID instead of full path in edit log when closing a file

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      When closing a file, currently the edit log records the full path instead of the inode id. The performance is sub-optimal, and it prevents further clean ups of the code (e.g., HDFS-6757)

      This jira proposes to record inode id instead of the full path in edit logs when closing a file.

        Activity

        Hide
        Daryn Sharp added a comment -

        Other than the inconsistency with other edit ops, is this going to break compatibility for the edits viewer?

        I'm a bit leery due to recent edit log corruptions. Replaying the edit based on inode instead of path isn't replaying the edit as originally issued which may mask edit bugs.

        I'll look at HDFS-6757 but could you elaborate on what impact this has on performance (other than re-building the path?) and what cleanup it blocks?

        Show
        Daryn Sharp added a comment - Other than the inconsistency with other edit ops, is this going to break compatibility for the edits viewer? I'm a bit leery due to recent edit log corruptions. Replaying the edit based on inode instead of path isn't replaying the edit as originally issued which may mask edit bugs. I'll look at HDFS-6757 but could you elaborate on what impact this has on performance (other than re-building the path?) and what cleanup it blocks?
        Hide
        Colin Patrick McCabe added a comment -

        I'm a bit leery due to recent edit log corruptions. Replaying the edit based on inode instead of path isn't replaying the edit as originally issued which may mask edit bugs.

        I hope I'm not misinterpreting what you're asking here, but the RPC "originally issued" by the DFSClient contains an inode ID, which we use in preference to the path if it's present.

        You guys already know this, but just to be 100% clear, the problem with closing a file by the path is that it doesn't work in the presence of renames. The old way of working around this problem was for the DFSClient to maintain a map of open files, and whenever it did a rename, it would modify that map. Of course, this fails horribly if some _other_ client renamed the file or ancestor.

        So the new way of doing things (introduced in HDFS-6294) is to basically ignore that path provided in the completeFile RPC (we would have removed it from the RPC, but we didn't want to break wire compatibility). I think we use that path in log messages in some cases (like if the inode isn't found), but that's about it. For everything important, we use the inode ID since that always works, even in the presence of renames. The paths we use come from that inode ID, not from the path in the completeFile RPC, which we assume is a lie.

        So in some sense, translating that inode ID provided in completeFile back to a path for the edit log is an extra step.

        Other than the inconsistency with other edit ops, is this going to break compatibility for the edits viewer?

        So, the compatibility guarantee with edit logs is that a new client can read old edits, but not vice versa. There is nothing like PB's "optional" fields in the current edit log format... any new stuff whatsoever causes a hard failure in old clients. This is enforced by code bailing out if the edit log version is newer than what it knows it can support. It won't even try.

        This isn't really a problem because anyone using oev is an expert doing some kind of advanced debugging. And the error message is pretty clear... get yourself a version of oev that matches the edit log before you attempt the debugging.

        We could handle this JIRA by looking at the edit log version number (provided in all edit logs), and either looking for a string or an inode in OP_CLOSE depending on that version. Alternately, we could create a new edit log op for close-with-inode-id. I kind of like the "version number" strategy... that's what the version number is for, after all, and we have too many legacy edit types anyway...

        I'll look at HDFS-6757 but could you elaborate on what impact this has on performance (other than re-building the path?) and what cleanup it blocks?

        Putting full paths into the edit log is kind of unfortunate, since they may be very long (especially if people have long pathnames). An inode ID is much smaller.

        The biggest reason why I think this is a good change isn't performance at all, though, but correctness and simplicity. If you have to repair a damaged edit log, and you miss a rename, you'll be a lot happier if the subsequent close operations are done by inode ID than if they're done by path. Our inode IDs are globally unique, which makes things nice and simple.

        On the other hand, handling paths can get hairy really fast. We have places in FSNamesystem comparing paths with String#equals, and relying on the fact that all paths are properly canonicalized. That's brittle. And of course, any time you're dealing with paths, you're also dealing with namesystem lookups, which will generate garbage and burn CPU.

        Show
        Colin Patrick McCabe added a comment - I'm a bit leery due to recent edit log corruptions. Replaying the edit based on inode instead of path isn't replaying the edit as originally issued which may mask edit bugs. I hope I'm not misinterpreting what you're asking here, but the RPC "originally issued" by the DFSClient contains an inode ID, which we use in preference to the path if it's present. You guys already know this, but just to be 100% clear, the problem with closing a file by the path is that it doesn't work in the presence of renames. The old way of working around this problem was for the DFSClient to maintain a map of open files, and whenever it did a rename, it would modify that map. Of course, this fails horribly if some _ other _ client renamed the file or ancestor. So the new way of doing things (introduced in HDFS-6294 ) is to basically ignore that path provided in the completeFile RPC (we would have removed it from the RPC, but we didn't want to break wire compatibility). I think we use that path in log messages in some cases (like if the inode isn't found), but that's about it. For everything important, we use the inode ID since that always works, even in the presence of renames. The paths we use come from that inode ID, not from the path in the completeFile RPC, which we assume is a lie. So in some sense, translating that inode ID provided in completeFile back to a path for the edit log is an extra step. Other than the inconsistency with other edit ops, is this going to break compatibility for the edits viewer? So, the compatibility guarantee with edit logs is that a new client can read old edits, but not vice versa. There is nothing like PB's "optional" fields in the current edit log format... any new stuff whatsoever causes a hard failure in old clients. This is enforced by code bailing out if the edit log version is newer than what it knows it can support. It won't even try. This isn't really a problem because anyone using oev is an expert doing some kind of advanced debugging. And the error message is pretty clear... get yourself a version of oev that matches the edit log before you attempt the debugging. We could handle this JIRA by looking at the edit log version number (provided in all edit logs), and either looking for a string or an inode in OP_CLOSE depending on that version. Alternately, we could create a new edit log op for close-with-inode-id. I kind of like the "version number" strategy... that's what the version number is for, after all, and we have too many legacy edit types anyway... I'll look at HDFS-6757 but could you elaborate on what impact this has on performance (other than re-building the path?) and what cleanup it blocks? Putting full paths into the edit log is kind of unfortunate, since they may be very long (especially if people have long pathnames). An inode ID is much smaller. The biggest reason why I think this is a good change isn't performance at all, though, but correctness and simplicity. If you have to repair a damaged edit log, and you miss a rename, you'll be a lot happier if the subsequent close operations are done by inode ID than if they're done by path. Our inode IDs are globally unique, which makes things nice and simple. On the other hand, handling paths can get hairy really fast. We have places in FSNamesystem comparing paths with String#equals, and relying on the fact that all paths are properly canonicalized. That's brittle. And of course, any time you're dealing with paths, you're also dealing with namesystem lookups, which will generate garbage and burn CPU.
        Hide
        Daryn Sharp added a comment -

        My bad, I forgot 2.x was more FD-ish. I actually filed jiras long ago about the rename/close/orphaned lease problem so I'm glad that's fixed.

        I was vague regarding oev, I was referring to backwards-compatibility wrt to usability. Any tool that depends on oev records containing a path will be broken. As a human, I'd rather have the path in plain view versus rummaging through previous edit log segments or ultimately an image.

        AddCloseOp already contains an inode field that isn't being set. Isn't the simplest & compatible solution to set the inode field and use the id with preference over the path? It won't require another layout change and will enable rolling back to an earlier NN if there's a nasty bug in a new NN.

        Show
        Daryn Sharp added a comment - My bad, I forgot 2.x was more FD-ish. I actually filed jiras long ago about the rename/close/orphaned lease problem so I'm glad that's fixed. I was vague regarding oev, I was referring to backwards-compatibility wrt to usability. Any tool that depends on oev records containing a path will be broken. As a human, I'd rather have the path in plain view versus rummaging through previous edit log segments or ultimately an image. AddCloseOp already contains an inode field that isn't being set. Isn't the simplest & compatible solution to set the inode field and use the id with preference over the path? It won't require another layout change and will enable rolling back to an earlier NN if there's a nasty bug in a new NN.

          People

          • Assignee:
            Unassigned
            Reporter:
            Haohui Mai
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:

              Development