Hadoop Common
  1. Hadoop Common
  2. HADOOP-8761

Help for FsShell's Stat incorrectly mentions "file size in blocks" (should be bytes)

    Details

    • Type: Bug Bug
    • Status: Patch Available
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: None
    • Component/s: fs
    • Labels:
      None

      Description

      Trivial patch attached corrects the usage information. Stat.java calls FileStatus.getLen(), which is most definitely the file size in bytes.

        Activity

        Hide
        Philip Zeyliger added a comment -

        Trivial patch attached.

        Show
        Philip Zeyliger added a comment - Trivial patch attached.
        Hide
        Philip Zeyliger added a comment -

        Submitting patch for Hudson.

        No tests were added.

        Show
        Philip Zeyliger added a comment - Submitting patch for Hudson. No tests were added.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12543672/HADOOP-8761.patch.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common:

        org.apache.hadoop.ha.TestZKFailoverController
        org.apache.hadoop.cli.TestCLI

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1398//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1398//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12543672/HADOOP-8761.patch.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.ha.TestZKFailoverController org.apache.hadoop.cli.TestCLI +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1398//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1398//console This message is automatically generated.
        Hide
        Daryn Sharp added a comment -

        I think the correct behavior should be to match the usage. The unix stat uses %b for blocks, and %s for size in bytes.

        Show
        Daryn Sharp added a comment - I think the correct behavior should be to match the usage. The unix stat uses %b for blocks, and %s for size in bytes.
        Hide
        Andy Isaacson added a comment -

        I think the correct behavior should be to match the usage. The unix stat uses %b for blocks, and %s for size in bytes.

        I agree that adding %s for size in bytes is a good idea, but changing the semantics of %b is an incompatible change, which seems risky. OTOH, I think it would be potentially very useful to expose "number of blocks in this file".

        Tangentially, the documentation also fails to describe %F.

        Show
        Andy Isaacson added a comment - I think the correct behavior should be to match the usage. The unix stat uses %b for blocks, and %s for size in bytes. I agree that adding %s for size in bytes is a good idea, but changing the semantics of %b is an incompatible change, which seems risky. OTOH, I think it would be potentially very useful to expose "number of blocks in this file". Tangentially, the documentation also fails to describe %F.
        Hide
        Aaron T. Myers added a comment -

        Targeting for 2.2.0.

        Philip, it looks like the TestCLI failure is related, probably because it's expecting some specific help text.

        Show
        Aaron T. Myers added a comment - Targeting for 2.2.0. Philip, it looks like the TestCLI failure is related, probably because it's expecting some specific help text.
        Hide
        Daryn Sharp added a comment -

        True, it's incompatible yet correct behavior. I guess our options are codify the bad behavior forever, or fix it now. I lean toward the latter since there's enough churn in 2.x that this will be a minor inconvenience for people to adjust. What do the other watchers/lurkers think?

        Show
        Daryn Sharp added a comment - True, it's incompatible yet correct behavior. I guess our options are codify the bad behavior forever, or fix it now. I lean toward the latter since there's enough churn in 2.x that this will be a minor inconvenience for people to adjust. What do the other watchers/lurkers think?
        Hide
        Andy Isaacson added a comment -

        True, it's incompatible yet correct behavior. I guess our options are codify the bad behavior forever, or fix it now.

        Since this is client-side code, we could do something like

        1. in 2.1.0, add %s and leave %b giving bytes, with a stderr message that %b will change in a future release.
        2. in 2.2.0, change %b to give blocks and remove the stderr message.

        True, it's incompatible yet correct behavior.

        I'm a little skeptical that "correct" with respect to POSIX stat(2) is significant here. The "blocks" used in struct stat are very different from the "blocks" in HDFS; POSIX blocks are fixed size, atomically written [1], and the block size is a legacy feature (hardcoded to 512 bytes which is not the underlying size of anything anymore). By contrast HDFS blocks are variable size, nonatomic, and are much larger than POSIX blocks.

        I agree that exposing the blocksize (and the blockcount) is a pretty valuable feature, but there's a lot of caveats. Just off the top of my head: a single file can have multiple blocksizes.

        [1] blocks aren't guaranteed to be atomic by the POSIX spec AFAIK, but as a practical matter modern implementations are atomic at some blocksize between 512 and 4096 bytes.

        Show
        Andy Isaacson added a comment - True, it's incompatible yet correct behavior. I guess our options are codify the bad behavior forever, or fix it now. Since this is client-side code, we could do something like in 2.1.0, add %s and leave %b giving bytes, with a stderr message that %b will change in a future release. in 2.2.0, change %b to give blocks and remove the stderr message. True, it's incompatible yet correct behavior. I'm a little skeptical that "correct" with respect to POSIX stat(2) is significant here. The "blocks" used in struct stat are very different from the "blocks" in HDFS; POSIX blocks are fixed size, atomically written [1] , and the block size is a legacy feature (hardcoded to 512 bytes which is not the underlying size of anything anymore). By contrast HDFS blocks are variable size, nonatomic, and are much larger than POSIX blocks. I agree that exposing the blocksize (and the blockcount) is a pretty valuable feature, but there's a lot of caveats. Just off the top of my head: a single file can have multiple blocksizes. [1] blocks aren't guaranteed to be atomic by the POSIX spec AFAIK, but as a practical matter modern implementations are atomic at some blocksize between 512 and 4096 bytes.

          People

          • Assignee:
            Philip Zeyliger
            Reporter:
            Philip Zeyliger
          • Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

            • Created:
              Updated:

              Development