
Key: HADOOP-2845
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Martin Traverso
Reporter: Martin Traverso
Votes: 0
Watchers: 0
Hadoop Common

dfsadmin disk utilization report on Solaris is wrong

Created: 16/Feb/08 12:22 AM   Updated: 21/May/08 08:05 PM
Component/s: fs
Affects Version/s: 0.16.0
Fix Version/s: 0.17.0

Time Tracking:
Not Specified

File Attachments:
HADOOP-2845-1.patch (7 kB), 2008-02-21 05:54 AM, Martin Traverso, licensed for inclusion in ASF works
HADOOP-2845-2.patch (7 kB), 2008-02-27 07:15 PM, Martin Traverso, licensed for inclusion in ASF works
HADOOP-2845.patch (0.6 kB), 2008-02-16 01:10 AM, Martin Traverso, licensed for inclusion in ASF works

Resolution Date: 29/Feb/08 04:48 PM


Description
dfsadmin reports 2x disk utilization on some platforms (Solaris, MacOS). The reason is that org.apache.hadoop.fs.DU relies on du's default block size when reporting sizes and assumes the blocks are 1024 bytes. This works on Linux, but du on Solaris and MacOS reports disk usage in 512-byte blocks.

DU should use "du -sk" instead of "du -s" to force the command to report sizes based on 1024 byte blocks.
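
To make the proposed change concrete, here is a minimal sketch of building a "du -sk" command and converting its output to bytes. It is not the code in org.apache.hadoop.fs.DU or in the attached patches; the class DuCommandSketch, its method names, and the sample path are hypothetical, and the sketch assumes du prints the size as the first whitespace-delimited token of its output line.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only (not Hadoop's DU class).
final class DuCommandSketch {

    // Build the du argument list. The -k flag asks du to report in
    // 1024-byte units instead of the platform's default block size.
    static List<String> duCommand(String path) {
        return Arrays.asList("du", "-sk", path);
    }

    // Assumes the first whitespace-delimited token is the size in kilobytes.
    static long parseUsedBytes(String duOutputLine) {
        String[] tokens = duOutputLine.trim().split("\\s+");
        return Long.parseLong(tokens[0]) * 1024L;
    }

    public static void main(String[] args) {
        System.out.println(duCommand("/data/dfs"));
        System.out.println(parseUsedBytes("2048\t/data/dfs"));
    }
}
```

With the unit pinned by -k, the multiplication by 1024 no longer depends on which platform ran the command.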



Comments
Martin Traverso added a comment - 16/Feb/08 01:12 AM
Use "du -sk" to force the command to report sizes based on 1024 byte blocks.

Hadoop QA added a comment - 16/Feb/08 02:54 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12375738/HADOOP-2845.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included -1. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests -1. The patch failed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1810/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1810/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1810/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1810/console

This message is automatically generated.


dhruba borthakur added a comment - 19/Feb/08 05:46 AM
This patch is a candidate for the next release, 0.17.0


Martin Traverso added a comment - 21/Feb/08 05:16 AM - edited
I've been able to reproduce the failure on Solaris with ZFS. It turns out that metadata updates on ZFS are asynchronous, so DU does not see the size change reflected immediately.

It used to work without the "-k" flag because by the time du runs, it can see 2 & 3 blocks, respectively, and the oldSize < newSize assertion holds true. With -k, those numbers are divided by 2 (integer math), so you get 1 < 1, which fails.
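
The arithmetic in the paragraph above can be sketched as follows. This only illustrates the truncating integer division described in the comment; the class and method names are hypothetical, and it is not taken from the test or from DU.

```java
// Hypothetical illustration of the rounding described above.
final class BlockRoundingSketch {

    // Convert a count of 512-byte blocks to whole kilobytes,
    // truncating as integer division does.
    static long toKilobytes(long blocks512) {
        return blocks512 / 2;
    }

    // The shape of the comparison the test relies on: oldSize < newSize.
    static boolean grew(long oldSize, long newSize) {
        return oldSize < newSize;
    }

    public static void main(String[] args) {
        long oldBlocks = 2;
        long newBlocks = 3;
        // In 512-byte blocks the growth is visible.
        System.out.println(grew(oldBlocks, newBlocks));
        // In kilobytes both values truncate to the same number.
        System.out.println(grew(toKilobytes(oldBlocks), toKilobytes(newBlocks)));
    }
}
```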

According to the comments on the test, its intention is to ensure that DU does not get called multiple times if interval is > 0. This is actually a function of the Shell class (which DU extends), so my recommendation is to create a separate test to ensure that condition holds.
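
One way such a separate test could be structured is sketched below: a small value holder that re-runs a command only after the interval has elapsed, with an injected clock so the check needs no real waiting. This is an assumption about the design, not Hadoop's Shell class or the test in the patch; IntervalCachedValue and all of its members are hypothetical names.

```java
import java.util.function.LongSupplier;

// Hypothetical stand-in for an interval-limited command runner.
final class IntervalCachedValue {
    private final long intervalMillis;
    private final LongSupplier clock;    // injected so tests control time
    private final LongSupplier command;  // the expensive call, e.g. running du
    private long lastRun;
    private boolean hasRun;
    private long cached;
    private int invocations;

    IntervalCachedValue(long intervalMillis, LongSupplier clock, LongSupplier command) {
        this.intervalMillis = intervalMillis;
        this.clock = clock;
        this.command = command;
    }

    // Re-run the command only on first use or once the interval has elapsed.
    long get() {
        long now = clock.getAsLong();
        if (!hasRun || now - lastRun >= intervalMillis) {
            cached = command.getAsLong();
            lastRun = now;
            hasRun = true;
            invocations++;
        }
        return cached;
    }

    int invocations() {
        return invocations;
    }

    public static void main(String[] args) {
        long[] now = {0L};
        IntervalCachedValue v = new IntervalCachedValue(1000L, () -> now[0], () -> 42L);
        v.get();
        v.get();                 // within the interval: served from cache
        now[0] = 1000L;
        v.get();                 // interval elapsed: command runs again
        System.out.println(v.invocations());
    }
}
```

Injecting the clock keeps the check deterministic, which avoids depending on file system timing in this particular test.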

I'm working on a patch.


Konstantin Shvachko added a comment - 22/Feb/08 08:41 PM
  1. Do you really need to wait(5000). Would it help if we flush() and then sync() rather than just sync()?
  2. du -sk for a 1-byte file prints out 0 for nfs mounted on my linux box. So you will be getting 0-size blocks in this case.

Martin Traverso added a comment - 22/Feb/08 10:02 PM
> 1. Do you really need to wait(5000). Would it help if we flush() and then sync() rather than just sync()?

Doesn't work on Solaris w/ ZFS. Du doesn't see the size increase until after a few seconds have elapsed, hence the wait. I know it's not ideal, but it's the best I could come up with that would work. Even a 3s wait causes the test to fail, for example.

> 2. du -sk for a 1-byte file prints out 0 for nfs mounted on my linux box. So you will be getting 0-size blocks in this case.

Do you get that consistently? Or does it show > 0 after a while? Are you mounting NFS with attribute caching, and if so, what is the timeout?


Konstantin Shvachko added a comment - 23/Feb/08 03:17 AM
> wait(5000)

This is too bad, we are trying to avoid using waits in tests, mainly because it is not deterministic.
Can't believe ZFS doesn't have meta-data synchronization, it's posix right?

Yes, 0 is stable. I du files that's been created last year and get the same result.
I don't know about attribute caching, but I guess timeouts are in seconds, not in hours or years.


Martin Traverso added a comment - 23/Feb/08 04:47 AM
> Can't believe ZFS doesn't have meta-data synchronization

It does, but apparently space usage calculation doesn't qualify. From zfs man page:

"Committing a change to a disk using fsync(3c) or
O_SYNC does not necessarily guarantee that the space
usage information is updated immediately."

> Yes, 0 is stable. I du files that's been created last year and get the same result.

I was able to reproduce that. Not sure why, but files up to 64 bytes show 0 utilization on NFS according to DU. I guess this can be fixed by writing a bigger file.


Konstantin Shvachko added a comment - 25/Feb/08 07:28 PM
> files up to 64 bytes show 0 utilization on NFS according to DU. I guess this can be fixed by writing a bigger file.

+1 writing 128 bytes instead of 1 fixes the problem.


Hadoop QA added a comment - 27/Feb/08 03:46 AM
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12376095/HADOOP-2845-1.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 6 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1836/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1836/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1836/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1836/console

This message is automatically generated.


Martin Traverso added a comment - 27/Feb/08 07:15 PM
Write 128 bytes to get around the NFS issue.

Hadoop QA added a comment - 28/Feb/08 05:58 AM
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12376661/HADOOP-2845-2.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 6 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1857/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1857/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1857/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1857/console

This message is automatically generated.


Tom White added a comment - 29/Feb/08 04:48 PM
I've just committed this. Thanks Martin!

Hudson added a comment - 01/Mar/08 12:15 PM