Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.2
    • Fix Version/s: 0.23.0
    • Component/s: fuse-dfs
    • Labels:
      None
    • Environment:

      Fedora Core 10, x86_64, kernel 2.6.27.7-134.fc10.x86_64 #1 SMP (AMD 64), gcc 4.3.2, Java 1.6.0 (IcedTea6 1.4 (fedora-7.b12.fc10-x86_64) Runtime Environment (build 1.6.0_0-b12), OpenJDK 64-Bit Server VM (build 10.0-b19, mixed mode))

    • Hadoop Flags:
      Reviewed

      Description

      Fuse-dfs should cache fs handles on a per-user basis. Caching significantly improves performance and, as a side effect, fixes the fs handle leak in the current code.
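
      Per-user caching of fs handles can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patches: fs_handle, connect_as_user, get_fs_for_user and free_fs_cache are hypothetical names, and the stub connector only stands in for a real connection call. A mutex guards the cache because FUSE may serve requests from several threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical stand-in for a real filesystem connection handle. */
typedef struct fs_handle {
    uid_t owner;
} fs_handle;

/* Counts real "connects"; present only to make handle reuse visible. */
static int connect_calls = 0;

/* Hypothetical connector: a real one would connect to the DFS as this user. */
static fs_handle *connect_as_user(uid_t uid) {
    fs_handle *fs = malloc(sizeof *fs);
    if (fs != NULL) {
        fs->owner = uid;
        connect_calls++;
    }
    return fs;
}

/* One cache entry per user, kept in a simple linked list. */
typedef struct cache_entry {
    uid_t uid;
    fs_handle *fs;
    struct cache_entry *next;
} cache_entry;

static cache_entry *cache_head = NULL;
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* Return the cached handle for uid, connecting only on the first request. */
fs_handle *get_fs_for_user(uid_t uid) {
    fs_handle *fs = NULL;
    pthread_mutex_lock(&cache_lock);
    for (cache_entry *e = cache_head; e != NULL; e = e->next) {
        if (e->uid == uid) {
            fs = e->fs;
            break;
        }
    }
    if (fs == NULL) {
        cache_entry *e = malloc(sizeof *e);
        if (e != NULL) {
            e->fs = connect_as_user(uid);
            if (e->fs == NULL) {
                free(e); /* connect failed: cache nothing, return NULL */
            } else {
                e->uid = uid;
                e->next = cache_head;
                cache_head = e;
                fs = e->fs;
            }
        }
    }
    pthread_mutex_unlock(&cache_lock);
    assert(fs == NULL || fs->owner == uid);
    return fs;
}

/* Release every cached handle in one place, e.g. at unmount. */
void free_fs_cache(void) {
    pthread_mutex_lock(&cache_lock);
    while (cache_head != NULL) {
        cache_entry *e = cache_head;
        cache_head = e->next;
        free(e->fs);
        free(e);
    }
    pthread_mutex_unlock(&cache_lock);
}
```

      In a FUSE handler the uid would typically come from fuse_get_context()->uid. Because each user gets a single long-lived handle that is released in one place, individual operations no longer open a connection of their own, which is the kind of leak the caching is meant to avoid.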

      The original bug description follows:

      I run the following test:

      1. Run Hadoop DFS in single-node mode
      2. Start up fuse_dfs
      3. Copy my source tree, about 250 MB, into the DFS:
      cp -av * /mnt/hdfs/

      in /var/log/messages I keep seeing:

      Dec 22 09:02:08 bodum fuse_dfs: ERROR: hdfs trying to utime /bar/backend-trunk2/src/machinery/hadoop/output/2008/11/19 to 1229385138/1229963739

      and then eventually

      Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
      Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
      Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
      Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
      Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
      Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
      Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1209
      Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
      Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
      Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
      Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
      Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
      Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
      Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
      Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
      Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
      Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1209
      Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
      Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
      Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
      Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
      Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1209
      Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037

      and the file system hangs. Hadoop is still running and I don't see any errors in its logs. I have to unmount the DFS and restart fuse_dfs, and then everything is fine again. At some point I see the following messages in /var/log/messages:

      ERROR: dfs problem - could not close file_handle(139677114350528) for /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8339-93825052368848-1229278807.log fuse_dfs.c:1464
      Dec 22 09:04:49 bodum fuse_dfs: ERROR: dfs problem - could not close file_handle(139676770220176) for /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8140-93825025883216-1229278759.log fuse_dfs.c:1464
      Dec 22 09:05:13 bodum fuse_dfs: ERROR: dfs problem - could not close file_handle(139677114812832) for /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8138-93825070138960-1229251587.log fuse_dfs.c:1464

      Is this a known issue? Am I just flooding the system too much? All of this is being performed on a single dual-core machine.

      Thanks!
      ttyl
      Dima

      1. hdfs-420-3.patch
        42 kB
        Eli Collins
      2. hdfs-420-2.patch
        40 kB
        Eli Collins
      3. hdfs-420-1.patch
        41 kB
        Eli Collins
      4. fuse_dfs_020_memleaks_v8.patch
        24 kB
        Brian Bockelman
      5. fuse_dfs_020_memleaks_v3.patch
        20 kB
        Brian Bockelman
      6. fuse_dfs_020_memleaks.patch
        25 kB
        Brian Bockelman

          Activity

          Dima Brodsky created issue -
          Owen O'Malley made changes -
          Field Original Value New Value
          Project Hadoop Common [ 12310240 ] HDFS [ 12310942 ]
          Key HADOOP-4932 HDFS-420
          Affects Version/s 0.19.0 [ 12313211 ]
          Component/s contrib/fuse-dfs [ 12312913 ]
          Component/s contrib/fuse-dfs [ 12312376 ]
          Brian Bockelman made changes -
          Attachment HDFS-420.patch [ 12429045 ]
          Brian Bockelman made changes -
          Link This issue is blocked by HDFS-422 [ HDFS-422 ]
          Brian Bockelman made changes -
          Attachment fuse_dfs_020_memleaks.patch [ 12471672 ]
          Brian Bockelman made changes -
          Link This issue is blocked by HDFS-422 [ HDFS-422 ]
          Brian Bockelman made changes -
          Link This issue blocks HDFS-422 [ HDFS-422 ]
          Brian Bockelman made changes -
          Attachment HDFS-420.patch [ 12429045 ]
          Brian Bockelman made changes -
          Attachment fuse_dfs_020_memleaks.patch [ 12471672 ]
          Brian Bockelman made changes -
          Attachment fuse_dfs_020_memleaks.patch [ 12472384 ]
          Brian Bockelman made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Affects Version/s 0.20.2 [ 12314204 ]
          Assignee Brian Bockelman [ bockelman ]
          Fix Version/s 0.20.3 [ 12314814 ]
          Brian Bockelman made changes -
          Attachment fuse_dfs_020_memleaks_v3.patch [ 12473592 ]
          Brian Bockelman made changes -
          Attachment fuse_dfs_020_memleaks_v8.patch [ 12479821 ]
          Eli Collins made changes -
          Link This issue is duplicated by HDFS-422 [ HDFS-422 ]
          Eli Collins made changes -
          Link This issue blocks HDFS-422 [ HDFS-422 ]
          Eli Collins made changes -
          Summary fuse_dfs is unable to connect to the dfs after copying a large number of files into the dfs over fuse Fuse-dfs should cache fs handles
          Issue Type Bug [ 1 ] Improvement [ 4 ]
          Fix Version/s 0.23.0 [ 12315571 ]
          Fix Version/s 0.20.3 [ 12314814 ]
          Eli Collins made changes -
          Attachment hdfs-420-1.patch [ 12480494 ]
          Eli Collins made changes -
          Description Original bug report (full text under Description above) Same report, prefixed with the note that fuse-dfs should cache fs handles on a per-user basis
          Eli Collins made changes -
          Attachment hdfs-420-2.patch [ 12480753 ]
          Eli Collins made changes -
          Attachment hdfs-420-2.patch [ 12480753 ]
          Eli Collins made changes -
          Attachment hdfs-420-2.patch [ 12480755 ]
          Eli Collins made changes -
          Attachment hdfs-420-3.patch [ 12483012 ]
          Eli Collins made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Resolution Fixed [ 1 ]
          Jeff Hammerbacher made changes -
          Link This issue relates to HDFS-3270 [ HDFS-3270 ]

            People

            • Assignee:
              Brian Bockelman
              Reporter:
              Dima Brodsky
            • Votes:
              2
              Watchers:
              9
