Uploaded image for project: 'Accumulo'
  1. Accumulo
  2. ACCUMULO-4458

lock contention around configuration settings impacts tablet server performance

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.7.1, 1.8.0
    • Fix Version/s: 1.7.3, 1.8.1, 2.0.0
    • Component/s: tserver
    • Labels:
      None

      Description

      While investigating a pretty severe performance regression comparing YCSB against 1.6 and 1.7, I found a fair bit of lock contention around getting configuration values. This was improved by ACCUMULO-4388, but various threads eventually all started contending on the configuration values expected from the site xml files.

        Issue Links

          Activity

          Hide
          elserj Josh Elser added a comment -

          Maybe related to ACCUMULO-3033 too?

          Show
          elserj Josh Elser added a comment - Maybe related to ACCUMULO-3033 too?
          Hide
          busbey Sean Busbey added a comment -

          Patch I intend to commit once tests finish running. Happy for any review feedback in the between time.

          Show
          busbey Sean Busbey added a comment - Patch I intend to commit once tests finish running. Happy for any review feedback in the between time.
          Hide
          busbey Sean Busbey added a comment -

          Interesting. I purposefully stayed away from trying to optimize for hte key-prefix-fetching case, since I didn't see those pop up in the big lock contention for the tablet server, but I also haven't configured any iterators.

          Show
          busbey Sean Busbey added a comment - Interesting. I purposefully stayed away from trying to optimize for hte key-prefix-fetching case, since I didn't see those pop up in the big lock contention for the tablet server, but I also haven't configured any iterators.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          -1 pre-patch 0m 0s JAVA_HOME is not defined.



          Subsystem Report/Notes
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12828547/ACCUMULO-4458-1.7.v1.patch
          JIRA Issue ACCUMULO-4458
          Optional Tests asflicense javac javadoc unit findbugs checkstyle compile
          uname Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /home/jenkins/jenkins-slave/workspace/PreCommit-ACCUMULO-Build/test_framework/yetus-0.3.0/lib/precommit/personality/accumulo.sh
          git revision 1.7 / 4136bfa
          Console output https://builds.apache.org/job/PreCommit-ACCUMULO-Build/40/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment -1 pre-patch 0m 0s JAVA_HOME is not defined. Subsystem Report/Notes JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12828547/ACCUMULO-4458-1.7.v1.patch JIRA Issue ACCUMULO-4458 Optional Tests asflicense javac javadoc unit findbugs checkstyle compile uname Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /home/jenkins/jenkins-slave/workspace/PreCommit-ACCUMULO-Build/test_framework/yetus-0.3.0/lib/precommit/personality/accumulo.sh git revision 1.7 / 4136bfa Console output https://builds.apache.org/job/PreCommit-ACCUMULO-Build/40/console Powered by Apache Yetus 0.3.0 http://yetus.apache.org This message was automatically generated.
          Hide
          elserj Josh Elser added a comment - - edited
          @@ -63,20 +64,19 @@ import com.google.common.base.Joiner;
            */
           public class HdfsZooInstance implements Instance {
           
          +  private final AccumuloConfiguration site = SiteConfiguration.getInstance();
          +
             private HdfsZooInstance() {
          -    AccumuloConfiguration acuConf = SiteConfiguration.getInstance();
          -    zooCache = new ZooCacheFactory().getZooCache(acuConf.get(Property.INSTANCE_ZK_HOST), (int) acuConf.getTimeInMillis(Property.INSTANCE_ZK_TIMEOUT));
          +    zooCache = new ZooCacheFactory().getZooCache(site.get(Property.INSTANCE_ZK_HOST), (int) site.getTimeInMillis(Property.INSTANCE_ZK_TIMEOUT));
             }
           
          -  private static HdfsZooInstance cachedHdfsZooInstance = null;
          +  private static final HdfsZooInstance cachedHdfsZooInstance = new HdfsZooInstance();
           
          -  public static synchronized Instance getInstance() {
          -    if (cachedHdfsZooInstance == null)
          -      cachedHdfsZooInstance = new HdfsZooInstance();
          +  public static Instance getInstance() {
               return cachedHdfsZooInstance;
             }
           
          -  private static ZooCache zooCache;
          +  private final ZooCache zooCache;
             private static String instanceId = null;
             private static final Logger log = Logger.getLogger(HdfsZooInstance.class);
           

          Is making ZooCache not static intentional? That seems like it would have serious implications (number of watchers would spike).

          Show
          elserj Josh Elser added a comment - - edited @@ -63,20 +64,19 @@ import com.google.common.base.Joiner; */ public class HdfsZooInstance implements Instance { + private final AccumuloConfiguration site = SiteConfiguration.getInstance(); + private HdfsZooInstance() { - AccumuloConfiguration acuConf = SiteConfiguration.getInstance(); - zooCache = new ZooCacheFactory().getZooCache(acuConf.get(Property.INSTANCE_ZK_HOST), ( int ) acuConf.getTimeInMillis(Property.INSTANCE_ZK_TIMEOUT)); + zooCache = new ZooCacheFactory().getZooCache(site.get(Property.INSTANCE_ZK_HOST), ( int ) site.getTimeInMillis(Property.INSTANCE_ZK_TIMEOUT)); } - private static HdfsZooInstance cachedHdfsZooInstance = null ; + private static final HdfsZooInstance cachedHdfsZooInstance = new HdfsZooInstance(); - public static synchronized Instance getInstance() { - if (cachedHdfsZooInstance == null ) - cachedHdfsZooInstance = new HdfsZooInstance(); + public static Instance getInstance() { return cachedHdfsZooInstance; } - private static ZooCache zooCache; + private final ZooCache zooCache; private static String instanceId = null ; private static final Logger log = Logger.getLogger(HdfsZooInstance.class); Is making ZooCache not static intentional? That seems like it would have serious implications (number of watchers would spike).
          Hide
          elserj Josh Elser added a comment -

          Oh, n/m. I see you said HdfsZooInstance is a singleton.

          Show
          elserj Josh Elser added a comment - Oh, n/m. I see you said HdfsZooInstance is a singleton.
          Hide
          elserj Josh Elser added a comment -
          +  /**
          +   * Properties we check the value of within the TabletServer request handling or maintenance processing loops.
          +   */
          +  public static final EnumSet<Property> HOT_PATH_PROPERTIES = EnumSet.of(Property.TSERV_CLIENT_TIMEOUT, Property.TSERV_TOTAL_MUTATION_QUEUE_MAX,
          +      Property.TSERV_ARCHIVE_WALOGS, Property.GC_TRASH_IGNORE, Property.TSERV_MAJC_DELAY, Property.TABLE_MINC_LOGS_MAX, Property.TSERV_MAJC_MAXCONCURRENT,
          +      Property.REPLICATION_WORKER_THREADS, Property.TABLE_DURABILITY, Property.INSTANCE_ZK_TIMEOUT, Property.TABLE_CLASSPATH);
          +
          

          It seems like it would be more simple if we just extracted all Accumulo properties (instead of this reduced list). Is there a big impact on memory for this? What about if we remove any that are still the default value (and let them be represented by the DefaultConfiguration)? My initial concern would be that we miss some, or add new "hot" properties, and we sign up to having to maintain this list.

          Show
          elserj Josh Elser added a comment - + /** + * Properties we check the value of within the TabletServer request handling or maintenance processing loops. + */ + public static final EnumSet<Property> HOT_PATH_PROPERTIES = EnumSet.of(Property.TSERV_CLIENT_TIMEOUT, Property.TSERV_TOTAL_MUTATION_QUEUE_MAX, + Property.TSERV_ARCHIVE_WALOGS, Property.GC_TRASH_IGNORE, Property.TSERV_MAJC_DELAY, Property.TABLE_MINC_LOGS_MAX, Property.TSERV_MAJC_MAXCONCURRENT, + Property.REPLICATION_WORKER_THREADS, Property.TABLE_DURABILITY, Property.INSTANCE_ZK_TIMEOUT, Property.TABLE_CLASSPATH); + It seems like it would be more simple if we just extracted all Accumulo properties (instead of this reduced list). Is there a big impact on memory for this? What about if we remove any that are still the default value (and let them be represented by the DefaultConfiguration)? My initial concern would be that we miss some, or add new "hot" properties, and we sign up to having to maintain this list.
          Hide
          busbey Sean Busbey added a comment -

          yeah, I was confused for awhile about what was going on with zooCache, since it was static but was initialized in the constructor of the singleton. Eventually decided it just must have been a mistake.

          Show
          busbey Sean Busbey added a comment - yeah, I was confused for awhile about what was going on with zooCache, since it was static but was initialized in the constructor of the singleton. Eventually decided it just must have been a mistake.
          Hide
          busbey Sean Busbey added a comment -

          Right now I do extract all of the properties that are set. This is just the list where we say to always default ot the parent if they weren't set. This effectively means we let them be represented by DefaultConfiguration.

          The presence of the "only use this for test" set method on SiteConfiguration leads me to presume some set of tests must be relying on the ability to poke values into the Hadoop Configuration object. As much as I'd like to just make this all read-only and throw away the Hadoop Configuration object entirely after construction, I didn't want to muddle things with possible test failures or having to rework them to no longer presume they can mutate on-disk configuration values.

          Show
          busbey Sean Busbey added a comment - Right now I do extract all of the properties that are set. This is just the list where we say to always default ot the parent if they weren't set. This effectively means we let them be represented by DefaultConfiguration. The presence of the "only use this for test" set method on SiteConfiguration leads me to presume some set of tests must be relying on the ability to poke values into the Hadoop Configuration object. As much as I'd like to just make this all read-only and throw away the Hadoop Configuration object entirely after construction, I didn't want to muddle things with possible test failures or having to rework them to no longer presume they can mutate on-disk configuration values.
          Hide
          elserj Josh Elser added a comment -

          Right now I do extract all of the properties that are set

          Ahh, apologies. I see this now. That leads me to wonder...

          -    String value = getXmlConfig().get(key);
          +    /* Check the available-on-load configs and fall-back to the possibly-update Configuration object. */
          +    String value = staticConfigs.containsKey(key) ? staticConfigs.get(key) : getXmlConfig().get(key);
          

          Why can't this just be String value = staticConfigs.get(key) then? We know that we have an explicit Accumulo config key already (given the method signature). Am I missing something else?

          yeah, I was confused for awhile about what was going on with zooCache, since it was static but was initialized in the constructor of the singleton. Eventually decided it just must have been a mistake.

          Makes sense, the bit-rot is real . HdfsZooInstance looks like it was always like this (from initial import).

          Show
          elserj Josh Elser added a comment - Right now I do extract all of the properties that are set Ahh, apologies. I see this now. That leads me to wonder... - String value = getXmlConfig().get(key); + /* Check the available-on-load configs and fall-back to the possibly-update Configuration object. */ + String value = staticConfigs.containsKey(key) ? staticConfigs.get(key) : getXmlConfig().get(key); Why can't this just be String value = staticConfigs.get(key) then? We know that we have an explicit Accumulo config key already (given the method signature). Am I missing something else? yeah, I was confused for awhile about what was going on with zooCache, since it was static but was initialized in the constructor of the singleton. Eventually decided it just must have been a mistake. Makes sense, the bit-rot is real . HdfsZooInstance looks like it was always like this (from initial import).
          Hide
          busbey Sean Busbey added a comment -

          Right now I do extract all of the properties that are set

          Ahh, apologies. I see this now. That leads me to wonder...

          -    String value = getXmlConfig().get(key);
          +    /* Check the available-on-load configs and fall-back to the possibly-update Configuration object. */
          +    String value = staticConfigs.containsKey(key) ? staticConfigs.get(key) : getXmlConfig().get(key);
          

          Why can't this just be String value = staticConfigs.get(key) then? We know that we have an explicit Accumulo config key already (given the method signature). Am I missing something else?

          SiteConfiguration has a set method that allows folks to add additional configuration keys to the underlying Hadoop Configuration object after instantiation. Any calls to set won't be reflected in staticConfigs, because they're read-only after instantiation of SiteConfiguration so that concurrent access can be safe.

          Show
          busbey Sean Busbey added a comment - Right now I do extract all of the properties that are set Ahh, apologies. I see this now. That leads me to wonder... - String value = getXmlConfig().get(key); + /* Check the available-on-load configs and fall-back to the possibly-update Configuration object. */ + String value = staticConfigs.containsKey(key) ? staticConfigs.get(key) : getXmlConfig().get(key); Why can't this just be String value = staticConfigs.get(key) then? We know that we have an explicit Accumulo config key already (given the method signature). Am I missing something else? SiteConfiguration has a set method that allows folks to add additional configuration keys to the underlying Hadoop Configuration object after instantiation. Any calls to set won't be reflected in staticConfigs , because they're read-only after instantiation of SiteConfiguration so that concurrent access can be safe.
          Hide
          busbey Sean Busbey added a comment -

          Forgot to mention above that the set method (that we effectively ignore for any configs that have been set in the xml configs) is expressly labeled as something that's only for test, which is why I think it's fine to ignore it at runtime.

          Show
          busbey Sean Busbey added a comment - Forgot to mention above that the set method (that we effectively ignore for any configs that have been set in the xml configs) is expressly labeled as something that's only for test, which is why I think it's fine to ignore it at runtime.
          Hide
          busbey Sean Busbey added a comment -

          Tests look good. Any other concerns here Josh Elser? Anyone concerned about cachedHdfsZooInstance not being lazily initialized anymore? Looking at the methods, I wasn't sure why you'd reference the HdfsZooInstance class (which will cause it to be initialized and the cachedHdfsZooInstance to be created) without calling getInstance, but maybe I'm missing something.

          Show
          busbey Sean Busbey added a comment - Tests look good. Any other concerns here Josh Elser ? Anyone concerned about cachedHdfsZooInstance not being lazily initialized anymore? Looking at the methods, I wasn't sure why you'd reference the HdfsZooInstance class (which will cause it to be initialized and the cachedHdfsZooInstance to be created) without calling getInstance , but maybe I'm missing something.
          Hide
          elserj Josh Elser added a comment -

          I think this is OK as-is Sean Busbey, but I'm still left wondering:

          It seems like it would be more simple if we just extracted all Accumulo properties (instead of this reduced list). Is there a big impact on memory for this?

          I just can't get over a worry that the HOT_PATH_PROPERTIES will rot and your optimization will be undone (or less efficient). We don't have that many properties, I think it might be better long-term to just put(prop.getKey(), null) for all of the Property's that are not explicitly defined.

          Forgot to mention above that the set method (that we effectively ignore for any configs that have been set in the xml configs) is expressly labeled as something that's only for test, which is why I think it's fine to ignore it at runtime.

          Any calls to set won't be reflected in staticConfigs, because they're read-only after instantiation of SiteConfiguration so that concurrent access can be safe.

          Agreed. Makes me think that we could do better at an API which allows injection of properties in some TestConfiguration (instead of trying to shim things in via set()). Maybe my (immediately) above worry could be alleviated by better test-API and then simplification of things here? (e.g. follow-on)

          Show
          elserj Josh Elser added a comment - I think this is OK as-is Sean Busbey , but I'm still left wondering: It seems like it would be more simple if we just extracted all Accumulo properties (instead of this reduced list). Is there a big impact on memory for this? I just can't get over a worry that the HOT_PATH_PROPERTIES will rot and your optimization will be undone (or less efficient). We don't have that many properties, I think it might be better long-term to just put(prop.getKey(), null) for all of the Property's that are not explicitly defined. Forgot to mention above that the set method (that we effectively ignore for any configs that have been set in the xml configs) is expressly labeled as something that's only for test, which is why I think it's fine to ignore it at runtime. Any calls to set won't be reflected in staticConfigs, because they're read-only after instantiation of SiteConfiguration so that concurrent access can be safe. Agreed. Makes me think that we could do better at an API which allows injection of properties in some TestConfiguration (instead of trying to shim things in via set() ). Maybe my (immediately) above worry could be alleviated by better test-API and then simplification of things here? (e.g. follow-on)
          Hide
          busbey Sean Busbey added a comment -

          It seems like it would be more simple if we just extracted all Accumulo properties (instead of this reduced list). Is there a big impact on memory for this?

          I just can't get over a worry that the HOT_PATH_PROPERTIES will rot and your optimization will be undone (or less efficient). We don't have that many properties, I think it might be better long-term to just put(prop.getKey(), null) for all of the Property's that are not explicitly defined.

          If we want to do that, it's simpler to just not keep around / use the underlying Hadoop Configuration object and default to the parent config for anything that's not in the cached hashmap. I'll push this and then make a follow on to give that a try.

          I agree on the general concern for HOT_PATH_PROPERTIES, but I think we need an answer for regular automated checks for perf regressions with or without it.

          Thanks for the feedback!

          Show
          busbey Sean Busbey added a comment - It seems like it would be more simple if we just extracted all Accumulo properties (instead of this reduced list). Is there a big impact on memory for this? I just can't get over a worry that the HOT_PATH_PROPERTIES will rot and your optimization will be undone (or less efficient). We don't have that many properties, I think it might be better long-term to just put(prop.getKey(), null) for all of the Property's that are not explicitly defined. If we want to do that, it's simpler to just not keep around / use the underlying Hadoop Configuration object and default to the parent config for anything that's not in the cached hashmap. I'll push this and then make a follow on to give that a try. I agree on the general concern for HOT_PATH_PROPERTIES, but I think we need an answer for regular automated checks for perf regressions with or without it. Thanks for the feedback!
          Hide
          elserj Josh Elser added a comment -

          Awesome, and thank you for digging into this. It sounds like it is a very nice improvement.

          I'd be curious about numbers before/after (casually collected) if you have them too

          Show
          elserj Josh Elser added a comment - Awesome, and thank you for digging into this. It sounds like it is a very nice improvement. I'd be curious about numbers before/after (casually collected) if you have them too
          Hide
          ctubbsii Christopher Tubbs added a comment -

          Is there a reason we need the HOT_PATHS_PROPERTIES at all? Why would we ever go back to getXmlConfig()? Why not just always default to the parent (DefaultConfiguration) when it doesn't exist in SiteConfiguration's static set of props on first load?

          Show
          ctubbsii Christopher Tubbs added a comment - Is there a reason we need the HOT_PATHS_PROPERTIES at all? Why would we ever go back to getXmlConfig()? Why not just always default to the parent (DefaultConfiguration) when it doesn't exist in SiteConfiguration's static set of props on first load?
          Hide
          ctubbsii Christopher Tubbs added a comment -

          Also, containsKey(key) followed by get(key) is unnecessarily inefficient. Just do a get, and check if it's null.

          Also, since this was committed already to 1.8, I'm going to merge it to master... because I'm merging another change, and this one hasn't yet been merged to master, so it'll get picked up with mine.

          Show
          ctubbsii Christopher Tubbs added a comment - Also, containsKey(key) followed by get(key) is unnecessarily inefficient. Just do a get, and check if it's null. Also, since this was committed already to 1.8, I'm going to merge it to master... because I'm merging another change, and this one hasn't yet been merged to master, so it'll get picked up with mine.
          Hide
          elserj Josh Elser added a comment - - edited

          Is there a reason we need the HOT_PATHS_PROPERTIES at all? Why would we ever go back to getXmlConfig()? Why not just always default to the parent (DefaultConfiguration) when it doesn't exist in SiteConfiguration's static set of props on first load?

          I think I had asked this same question. Is this still unclear from the earlier discussion?

          Also, containsKey(key) followed by get(key) is unnecessarily inefficient. Just do a get, and check if it's null.

          Another thing I had noticed, but I don't think that the semantics are the same because he loads a null value into the map. I believe what he has presently is correct.

          Show
          elserj Josh Elser added a comment - - edited Is there a reason we need the HOT_PATHS_PROPERTIES at all? Why would we ever go back to getXmlConfig()? Why not just always default to the parent (DefaultConfiguration) when it doesn't exist in SiteConfiguration's static set of props on first load? I think I had asked this same question. Is this still unclear from the earlier discussion? Also, containsKey(key) followed by get(key) is unnecessarily inefficient. Just do a get, and check if it's null. Another thing I had noticed, but I don't think that the semantics are the same because he loads a null value into the map. I believe what he has presently is correct.
          Hide
          ctubbsii Christopher Tubbs added a comment -

          Is this still unclear from the earlier discussion?

          Yes, because of what Sean Busbey said:

          If we want to do that, it's simpler to just not keep around / use the underlying Hadoop Configuration object and default to the parent config for anything that's not in the cached hashmap. I'll push this and then make a follow on to give that a try.

          It's not clear that was attempted yet... but I think it's the right way to go.

          I believe what he has presently is correct.

          You're right. I was confused about the semantics of inserting null values. If we adopt the strategy I quoted from Sean, then that whole part goes away, of course.

          Show
          ctubbsii Christopher Tubbs added a comment - Is this still unclear from the earlier discussion? Yes, because of what Sean Busbey said: If we want to do that, it's simpler to just not keep around / use the underlying Hadoop Configuration object and default to the parent config for anything that's not in the cached hashmap. I'll push this and then make a follow on to give that a try. It's not clear that was attempted yet... but I think it's the right way to go. I believe what he has presently is correct. You're right. I was confused about the semantics of inserting null values. If we adopt the strategy I quoted from Sean, then that whole part goes away, of course.
          Hide
          busbey Sean Busbey added a comment -

          I was still running tests on my 1.8 merge into master before pushing it.

          Show
          busbey Sean Busbey added a comment - I was still running tests on my 1.8 merge into master before pushing it.
          Hide
          busbey Sean Busbey added a comment - - edited

          If we want to do that, it's simpler to just not keep around / use the underlying Hadoop Configuration object and default to the parent config for anything that's not in the cached hashmap. I'll push this and then make a follow on to give that a try.

          It's not clear that was attempted yet... but I think it's the right way to go.

          I'll link the follow on jira here once I file it. I was waiting until I finished merging. The earlier discussion about the use of SiteConfiguration#set covers why I didn't do it here.

          Show
          busbey Sean Busbey added a comment - - edited If we want to do that, it's simpler to just not keep around / use the underlying Hadoop Configuration object and default to the parent config for anything that's not in the cached hashmap. I'll push this and then make a follow on to give that a try. It's not clear that was attempted yet... but I think it's the right way to go. I'll link the follow on jira here once I file it. I was waiting until I finished merging. The earlier discussion about the use of SiteConfiguration#set covers why I didn't do it here.
          Hide
          busbey Sean Busbey added a comment -

          Linking the follow-on jira for improvement ot SiteConfiguration, ACCUMULO-4460

          Show
          busbey Sean Busbey added a comment - Linking the follow-on jira for improvement ot SiteConfiguration, ACCUMULO-4460
          Hide
          ctubbsii Christopher Tubbs added a comment -

          I was still running tests on my 1.8 merge into master before pushing it.

          Cool. Thanks for the clarification. You probably already know this, but for anybody else who may not, you can merge locally, and push all the branches at once using: git push origin :

          Show
          ctubbsii Christopher Tubbs added a comment - I was still running tests on my 1.8 merge into master before pushing it. Cool. Thanks for the clarification. You probably already know this, but for anybody else who may not, you can merge locally, and push all the branches at once using: git push origin :

            People

            • Assignee:
              busbey Sean Busbey
              Reporter:
              busbey Sean Busbey
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 50m
                50m

                  Development