Hadoop HDFS / HDFS-11900

Hedged reads thread pool creation not synchronized

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 3.2.0, 3.1.1
    • Component/s: hdfs-client
    • Labels: None

      Description

      The non-static synchronized method initThreadsNumForHedgedReads cannot synchronize access to the static class variable HEDGED_READ_THREAD_POOL.

        private static ThreadPoolExecutor HEDGED_READ_THREAD_POOL;
        ...
        private synchronized void initThreadsNumForHedgedReads(int num) {

      Two DFS clients may update the same static variable in a race, because each one acquires the lock on its own DFSClient instance rather than on the shared DFSClient class object.
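A minimal, self-contained sketch of the monitor semantics at issue (class and method names are hypothetical, not the DFSClient code): a non-static synchronized method holds the lock on `this`, not on the class object that would have to guard a static field.

```java
// Sketch only: HedgedClient stands in for DFSClient, sharedPool for
// HEDGED_READ_THREAD_POOL. Thread.holdsLock reports which monitor is held.
class HedgedClient {
    static Object sharedPool; // shared by all instances

    synchronized boolean initLocksClass() {
        // Inside an instance synchronized method we hold this instance's
        // monitor, NOT the class monitor that all instances share.
        return Thread.holdsLock(HedgedClient.class);
    }

    static synchronized boolean staticInitLocksClass() {
        // A *static* synchronized method locks HedgedClient.class instead,
        // which is what a static field actually needs.
        return Thread.holdsLock(HedgedClient.class);
    }
}

public class LockDemo {
    public static void main(String[] args) {
        System.out.println(new HedgedClient().initLocksClass());  // false
        System.out.println(HedgedClient.staticInitLocksClass());  // true
    }
}
```

Because each DFSClient instance locks a different monitor, two clients can both pass the `HEDGED_READ_THREAD_POOL == null` check and initialize the pool concurrently.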

      There are 2 possible fixes:
      1. "Global thread pool": Change initThreadsNumForHedgedReads to static
      2. "Per-client thread pool": Change HEDGED_READ_THREAD_POOL to non-static

      From the description for property dfs.client.hedged.read.threadpool.size:

      to a positive number. The threadpool size is how many threads to dedicate
      to the running of these 'hedged', concurrent reads in your client.

      this seems to indicate that the thread pool is meant to be per DFS client.

      Let's assume we go with #1 "Global thread pool". If one DFS client has the property set to 10 in its config while another client has it set to 5, what is the size of the global thread pool supposed to be? 5? 10? Or 15?
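To make the sizing ambiguity concrete, here is a hypothetical sketch of fix #1 with a static synchronized initializer (names and the first-caller-wins policy are illustrative, not an actual patch). The second client's configured size is silently ignored:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch of fix #1: the method is static synchronized, so the lock is on
// the class object, matching the static field it guards.
class GlobalHedgedReads {
    private static ThreadPoolExecutor pool;

    static synchronized void initThreadsNumForHedgedReads(int num) {
        if (pool == null && num > 0) {
            // First caller wins; later callers' sizes are ignored.
            pool = new ThreadPoolExecutor(1, num, 60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>());
        }
    }

    static int poolMaximum() {
        return pool == null ? 0 : pool.getMaximumPoolSize();
    }
}

public class GlobalDemo {
    public static void main(String[] args) {
        GlobalHedgedReads.initThreadsNumForHedgedReads(10); // client A
        GlobalHedgedReads.initThreadsNumForHedgedReads(5);  // client B, no-op
        System.out.println(GlobalHedgedReads.poolMaximum()); // 10
    }
}
```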

      The 2nd fix seems more reasonable to me.
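For comparison, a hypothetical sketch of fix #2 (again with illustrative names, not the actual patch): once the field is non-static, there is no shared state, so the existing instance-level synchronization is sufficient and each client honors its own configured size.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch of fix #2: the pool is per client instance, so the instance
// monitor taken by the synchronized method correctly guards it.
class PerClientHedgedReads {
    private ThreadPoolExecutor hedgedReadThreadPool; // no longer static

    synchronized void initThreadsNumForHedgedReads(int num) {
        if (hedgedReadThreadPool == null && num > 0) {
            hedgedReadThreadPool = new ThreadPoolExecutor(1, num,
                60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        }
    }

    int poolMaximum() {
        return hedgedReadThreadPool == null ? 0
            : hedgedReadThreadPool.getMaximumPoolSize();
    }
}

public class PerClientDemo {
    public static void main(String[] args) {
        // Two clients with different configured sizes each get their own pool.
        PerClientHedgedReads a = new PerClientHedgedReads();
        PerClientHedgedReads b = new PerClientHedgedReads();
        a.initThreadsNumForHedgedReads(10);
        b.initThreadsNumForHedgedReads(5);
        System.out.println(a.poolMaximum() + " " + b.poolMaximum()); // 10 5
    }
}
```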

      People

      • Assignee: John Zhuge (jzhuge)
      • Reporter: John Zhuge (jzhuge)
      • Votes: 0
      • Watchers: 5
