  1. Hadoop HDFS
  2. HDFS-14276

[SBN read] Reduce tailing overhead


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 2.10.0, 3.3.0, 3.2.1, 3.1.3
    • Component/s: ha, namenode
    • Labels: None
    • Hardware: 4-node cluster; each node has 4 cores, Xeon 2.5 GHz, 25 GB memory.
      Software: CentOS 7.4, CDH 6.0 + Consistent Reads from Standby, Kerberos, SSL, RPC encryption + Data Transfer Encryption.

    • Hadoop Flags: Reviewed

    Description

      When an Observer sets dfs.ha.tail-edits.period = 0ms, it tails the edit log continuously in order to fetch the latest edits, but doing so carries a lot of overhead.

      Critically, the edit log tailer should not update the NameDirSize metric on every tailing iteration. The metric has nothing to do with fetching edits, and updating it involves a lot of directory space calculation.

      The profiler suggests that a non-trivial chunk of time is spent on this unnecessary work.
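One way to picture the problem is to decouple the expensive metric refresh from the tailing loop so that it runs at most once per interval instead of on every iteration. The following is a minimal sketch of that idea only; it is not the change made by the attached patches. The names `ThrottledRefresh`, `TailerSketch`, `maybeRun`, the 100 ms interval, and the fake clock are all hypothetical and exist purely for illustration.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper (not from the HDFS patch): runs a costly refresh at most once per interval. */
final class ThrottledRefresh {
    private final long intervalMs;
    private final LongSupplier clockMs;   // injected so the sketch can use a fake clock
    private final Runnable refresh;       // stands in for the directory-size calculation
    private long lastRunMs;
    private boolean hasRun;

    ThrottledRefresh(long intervalMs, LongSupplier clockMs, Runnable refresh) {
        this.intervalMs = intervalMs;
        this.clockMs = clockMs;
        this.refresh = refresh;
    }

    /** Runs the refresh if it has never run or the interval has elapsed; returns whether it ran. */
    boolean maybeRun() {
        long now = clockMs.getAsLong();
        if (hasRun && now - lastRunMs < intervalMs) {
            return false;
        }
        refresh.run();
        lastRunMs = now;
        hasRun = true;
        return true;
    }
}

final class TailerSketch {
    public static void main(String[] args) {
        long[] fakeNowMs = {0};
        int[] refreshes = {0};
        ThrottledRefresh dirSize =
            new ThrottledRefresh(100, () -> fakeNowMs[0], () -> refreshes[0]++);

        // Simulate 1000 back-to-back tailing iterations, 1 ms apart (assumed numbers).
        for (int i = 0; i < 1000; i++) {
            fakeNowMs[0] = i;
            // Fetching and applying edits would happen here in a real tailer.
            dirSize.maybeRun();
        }
        System.out.println("iterations=1000 refreshes=" + refreshes[0]);
    }
}
```

With these assumed numbers the refresh runs 10 times across 1000 iterations, so the cost of the metric no longer scales with how aggressively the log is tailed.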

      Beyond that, the biggest overhead is in communication with the JournalNodes (JNs): serializing and deserializing messages to and from them. I am looking for ways to reduce this cost, because it is burning 30% of my CPU time even when the cluster is idle.

      Attachments

        1. HDFS-14276.000.patch
          2 kB
          Wei-Chiu Chuang
        2. HDFS-14276-01.patch
          3 kB
          Ayush Saxena
        3. HDFS-14276-branch-2-01.patch
          3 kB
          Ayush Saxena
        4. Screen Shot 2019-02-12 at 10.51.41 PM.png
          421 kB
          Wei-Chiu Chuang
        5. Screen Shot 2019-02-14 at 11.50.37 AM.png
          399 kB
          Wei-Chiu Chuang

            People

              Assignee: ayushtkn Ayush Saxena
              Reporter: weichiu Wei-Chiu Chuang
              Votes: 0
              Watchers: 11
