
    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      In testing HDFS-347 against HBase (thanks Jean-Daniel Cryans) we ran into the following case:

      • a workload is running which puts a bunch of local sockets in the PeerCache
      • the workload abates for a while, causing the sockets to go "stale" (i.e., the DN side disconnects after the keepalive timeout)
      • the workload starts again

      In this case, the local socket retrieved from the cache failed the newBlockReader call, and the client incorrectly disabled local sockets for that host. This is similar to the earlier bug HDFS-3376, but not quite the same.
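      The distinction that matters for the first problem is who failed: a peer pulled from the cache, which may simply be stale, or a freshly opened socket. The following is a minimal Python model of that rule, not the HDFS client code. The names Peer, get_domain_reader and disabled_hosts are hypothetical, and a stale socket is modelled as raising ConnectionResetError. Cached peers that fail are discarded without penalty. Only a failure on a brand-new domain socket marks the host as disabled.

```python
class Peer:
    """Hypothetical stand-in for a connection to a DataNode."""

    def __init__(self, kind, alive=True):
        self.kind = kind    # "domain" or "tcp"
        self.alive = alive  # False once the DataNode has closed its end

    def open_block_reader(self):
        if not self.alive:
            # Models a stale socket whose remote end hit the keepalive timeout.
            raise ConnectionResetError(f"{self.kind} peer closed by DataNode")
        return f"reader via {self.kind}"


def get_domain_reader(cached_peers, new_domain_peer, disabled_hosts, host):
    """Return a reader over a domain socket, or None to fall back to TCP."""
    if host in disabled_hosts:
        return None
    # A failure on a cached peer may only mean it sat idle too long, so
    # drop it and try the next one without disabling the host.
    while cached_peers:
        peer = cached_peers.pop()
        try:
            return peer.open_block_reader()
        except OSError:
            continue
    # Only a failure on a freshly created socket is treated as evidence
    # that domain sockets are actually broken for this host.
    try:
        return new_domain_peer().open_block_reader()
    except OSError:
        disabled_hosts.add(host)
        return None


if __name__ == "__main__":
    disabled = set()
    stale = [Peer("domain", alive=False), Peer("domain", alive=False)]
    print(get_domain_reader(stale, lambda: Peer("domain"), disabled, "dn1"))
    # reader via domain
    print(disabled)  # set()
```

      With two stale cached peers, both are evicted and a new domain socket is used, and "dn1" is not added to the disabled set.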

      The next issue we ran into was that, once this happened, the client never tried local sockets again, because the cache held many TCP sockets. Since it could always get a cached socket to the local node, it never bothered to try a local read again.
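      One way to stop cached TCP peers from shadowing the local path is to key the cache by both host and socket type, and to ask explicitly for a domain-socket peer first. The sketch below assumes that design. TwoKeyPeerCache and choose_peer are hypothetical names, and this is an illustration of the idea, not the code in the attached patches.

```python
from collections import defaultdict, deque


class TwoKeyPeerCache:
    """Hypothetical peer cache keyed by (host, is_domain)."""

    def __init__(self):
        self._buckets = defaultdict(deque)

    def put(self, host, is_domain, peer):
        self._buckets[(host, is_domain)].append(peer)

    def get(self, host, is_domain):
        # .get() avoids creating empty buckets on lookup.
        bucket = self._buckets.get((host, is_domain))
        return bucket.pop() if bucket else None


def choose_peer(cache, host, domain_enabled, new_domain_peer, new_tcp_peer):
    # While domain sockets are enabled for this host, only the domain
    # bucket is consulted, so a pile of cached TCP peers can never hide
    # the local path.
    if domain_enabled:
        peer = cache.get(host, True)
        return peer if peer is not None else new_domain_peer()
    peer = cache.get(host, False)
    return peer if peer is not None else new_tcp_peer()
```

      With three cached TCP peers to "dn1" and domain sockets enabled, choose_peer still opens a new domain socket instead of reusing a TCP peer.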

        Attachments

        1. fail.patch
          4 kB
          Colin McCabe
        2. HDFS-4417.004.patch
          26 kB
          Colin McCabe
        3. HDFS-4417.003.patch
          24 kB
          Colin McCabe
        4. HDFS-4417.002.patch
          33 kB
          Colin McCabe
        5. hdfs-4417.txt
          9 kB
          Todd Lipcon


            People

            • Assignee:
              cmccabe Colin McCabe
              Reporter:
              tlipcon Todd Lipcon
            • Votes:
              0
              Watchers:
              4
