Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-14238

[Umbrella] Rechecking Guava's object is not exposed to user-facing API

    Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Target Version/s:

      Description

      This is reported by Hitesh Shah on HADOOP-10101.
      At least, AMRMClient#waitFor takes Guava's Supplier instance as an instance.

        Issue Links

          Activity

          Hide
          ozawa Tsuyoshi Ozawa added a comment - - edited

          I'm checking with a following method for judging whether user-facing api use Guava's object, though it's not elegant one:

          1. Executing find . -name "*.java" | xargs grep "google.common" under libraries which seems user-facing api (hadoop-hdfs-project/hadoop-hdfs-client/, hadoop-common-project, , hadoop-yarn-project/hadoop-yarn/{hadoop-yarn-client, hadoop-yarn-client}, hadoop-mapreduce-project/hadoop-mapreduce-client, )
          2. Open code and whether it has @InterfaceAudience.Private. If so, we don't need to check the code.
          3. Check the usage of guava's objects.

          Show
          ozawa Tsuyoshi Ozawa added a comment - - edited I'm checking with a following method for judging whether user-facing api use Guava's object, though it's not elegant one: 1. Executing find . -name "*.java" | xargs grep "google.common" under libraries which seems user-facing api (hadoop-hdfs-project/hadoop-hdfs-client/, hadoop-common-project, , hadoop-yarn-project/hadoop-yarn/{hadoop-yarn-client, hadoop-yarn-client}, hadoop-mapreduce-project/hadoop-mapreduce-client, ) 2. Open code and whether it has @InterfaceAudience.Private. If so, we don't need to check the code. 3. Check the usage of guava's objects.
          Hide
          ozawa Tsuyoshi Ozawa added a comment -

          I checked hadoop-hdfs-project/hadoop-hdfs-client/, and there are no exposure of Guava's object.

          Show
          ozawa Tsuyoshi Ozawa added a comment - I checked hadoop-hdfs-project/hadoop-hdfs-client/, and there are no exposure of Guava's object.
          Hide
          andrew.wang Andrew Wang added a comment -

          Since HADOOP-10101 was committed, is the plan to finish this investigation, then shade Guava everywhere?

          A more sure way of finding API pollution would be writing a bytecode analysis tool. We'd filter for classes and methods annotated IA.Public and LimitedPrivate, then check those method signatures for Guava class references.

          Another possible quick thing you could try is using JACC or JDiff to compare current vs. shaded Guava.

          Show
          andrew.wang Andrew Wang added a comment - Since HADOOP-10101 was committed, is the plan to finish this investigation, then shade Guava everywhere? A more sure way of finding API pollution would be writing a bytecode analysis tool. We'd filter for classes and methods annotated IA.Public and LimitedPrivate, then check those method signatures for Guava class references. Another possible quick thing you could try is using JACC or JDiff to compare current vs. shaded Guava.
          Hide
          ozawa Tsuyoshi Ozawa added a comment -

          Thanks Andrew for the information. Annotation File Utilities, being included in Checker Framework, can extract Annotations included in classes. Hence, we can use it to check whether classes have IA.Public and LimitedPrivate. Let me try.

          https://checkerframework.org/annotation-file-utilities/#extract-annotations

          Show
          ozawa Tsuyoshi Ozawa added a comment - Thanks Andrew for the information. Annotation File Utilities, being included in Checker Framework, can extract Annotations included in classes. Hence, we can use it to check whether classes have IA.Public and LimitedPrivate. Let me try. https://checkerframework.org/annotation-file-utilities/#extract-annotations
          Hide
          andrew.wang Andrew Wang added a comment -

          I'm upgrading this to a blocker, since it's a prereq for shading. I'm also going to file a blocker JIRA for shading guava everywhere.

          Show
          andrew.wang Andrew Wang added a comment - I'm upgrading this to a blocker, since it's a prereq for shading. I'm also going to file a blocker JIRA for shading guava everywhere.
          Hide
          andrew.wang Andrew Wang added a comment -

          Hi Tsuyoshi Ozawa, do you have an update for this JIRA?

          Show
          andrew.wang Andrew Wang added a comment - Hi Tsuyoshi Ozawa , do you have an update for this JIRA?
          Hide
          andrew.wang Andrew Wang added a comment -

          Can we resolve this given that the shaded client work seems to be working okay?

          Show
          andrew.wang Andrew Wang added a comment - Can we resolve this given that the shaded client work seems to be working okay?
          Hide
          bharatviswa Bharat Viswanadham added a comment -

          Andrew Wang Vinod Kumar Vavilapalli Can we use java8 supplier here, instead of guava to avoid this issue?

          Show
          bharatviswa Bharat Viswanadham added a comment - Andrew Wang Vinod Kumar Vavilapalli Can we use java8 supplier here, instead of guava to avoid this issue?
          Hide
          ozawa Tsuyoshi Ozawa added a comment -

          Andrew Wang Vinod Kumar Vavilapalli As far as I know and as Bharat Viswanadham mentioned, it would be ideal to substitute Guava's Supplier with java's one.

          Show
          ozawa Tsuyoshi Ozawa added a comment - Andrew Wang Vinod Kumar Vavilapalli As far as I know and as Bharat Viswanadham mentioned, it would be ideal to substitute Guava's Supplier with java's one.
          Hide
          bharatviswa Bharat Viswanadham added a comment -

          Tsuyoshi Ozawa As this umbrella jira, created a subtask for updating AMRMClient and AMRMClientAysnc.java files.

          Show
          bharatviswa Bharat Viswanadham added a comment - Tsuyoshi Ozawa As this umbrella jira, created a subtask for updating AMRMClient and AMRMClientAysnc.java files.
          Hide
          andrew.wang Andrew Wang added a comment -

          Hi Bharat Viswanadham is this JIRA tracking towards completion for beta1 (end-of-week)?

          Sean Busbey mentioned the apilyzer plugin to me which Accumulo uses, it might help for finding these classes.

          Show
          andrew.wang Andrew Wang added a comment - Hi Bharat Viswanadham is this JIRA tracking towards completion for beta1 (end-of-week)? Sean Busbey mentioned the apilyzer plugin to me which Accumulo uses, it might help for finding these classes.
          Hide
          bharatviswa Bharat Viswanadham added a comment -

          Andrew Wang As this is umbrella jira, created subtask for AMRMClient.
          And in the comments it is mentioned that hadoop-hdfs-project/hadoop-hdfs-client/ does not have any references.

          I can look over this apilyzer plugin, but i have not played with that and it may take some time for me, if any one has idea about that, they can take this up work item if it is needed work item by end of week.

          Show
          bharatviswa Bharat Viswanadham added a comment - Andrew Wang As this is umbrella jira, created subtask for AMRMClient. And in the comments it is mentioned that hadoop-hdfs-project/hadoop-hdfs-client/ does not have any references. I can look over this apilyzer plugin, but i have not played with that and it may take some time for me, if any one has idea about that, they can take this up work item if it is needed work item by end of week.
          Hide
          andrew.wang Andrew Wang added a comment -

          Okay, thanks Bharat. I wrote a little shell script possibly useful with apilyzer to generate a regex for fully-qualified classnames that mention "InterfaceAudience.Public":

          echo -n "("
          for f in `ag --java -l InterfaceAudience.Public`; do
              package=$(grep package "$f" | cut -d " " -f 2 | tr -d ";")
              classname=$(basename "$f" ".java")
              echo -n $package.$classname"|"
          done
          echo -n ")"
          

          I couldn't get apilyzer to work after playing with it for about an hour. A different tool might still be better.

          Show
          andrew.wang Andrew Wang added a comment - Okay, thanks Bharat. I wrote a little shell script possibly useful with apilyzer to generate a regex for fully-qualified classnames that mention "InterfaceAudience.Public": echo -n "(" for f in `ag --java -l InterfaceAudience.Public`; do package =$(grep package "$f" | cut -d " " -f 2 | tr -d ";" ) classname=$(basename "$f" ".java" ) echo -n $ package .$classname "|" done echo -n ")" I couldn't get apilyzer to work after playing with it for about an hour. A different tool might still be better.
          Hide
          busbey Sean Busbey added a comment -

          If we've taken care of all the problems we know about, let's close this out. We can open new jiras for new problems.

          Sad to hear apilyzer didn't work out. Andrew Wang could you send them a ping about the difficulty? Maybe we can get one of its maintainers to walk us through use.

          Show
          busbey Sean Busbey added a comment - If we've taken care of all the problems we know about, let's close this out. We can open new jiras for new problems. Sad to hear apilyzer didn't work out. Andrew Wang could you send them a ping about the difficulty? Maybe we can get one of its maintainers to walk us through use.
          Hide
          andrew.wang Andrew Wang added a comment -

          Thanks Sean, I downgraded this to critical to get it off the blocker list.

          Sure, I can ping the apilyzer folks.

          Show
          andrew.wang Andrew Wang added a comment - Thanks Sean, I downgraded this to critical to get it off the blocker list. Sure, I can ping the apilyzer folks.
          Hide
          haibochen Haibo Chen added a comment -

          I checked mapreduce client modules manually, that is, hadoop-mapreduce-client-jobclient, hadoop-mapreduce-client-core, hadoop-mapreduce-client-common. There is no exposure of GUAVA in the public api.

          Show
          haibochen Haibo Chen added a comment - I checked mapreduce client modules manually, that is, hadoop-mapreduce-client-jobclient, hadoop-mapreduce-client-core, hadoop-mapreduce-client-common. There is no exposure of GUAVA in the public api.
          Hide
          haibochen Haibo Chen added a comment -

          Manually checking hadoop-yarn-api, hadoop-yarn-applications, hadoop-yarn-common, hadoop-yarn-client and hadoop-yarn-registry,
          there is no exposure either.
          We do, however, expose Optional in ReconfigurationTaskStatus which is marked as public and stable. I think this needs to be fixed.

          It'll still be great if we can still get the tool to double check just in case, cc Andrew Wang.

          Show
          haibochen Haibo Chen added a comment - Manually checking hadoop-yarn-api, hadoop-yarn-applications, hadoop-yarn-common, hadoop-yarn-client and hadoop-yarn-registry, there is no exposure either. We do, however, expose Optional in ReconfigurationTaskStatus which is marked as public and stable. I think this needs to be fixed. It'll still be great if we can still get the tool to double check just in case, cc Andrew Wang .
          Hide
          stevel@apache.org Steve Loughran added a comment -

          thanks for doing this

          Show
          stevel@apache.org Steve Loughran added a comment - thanks for doing this
          Hide
          busbey Sean Busbey added a comment -

          apilyzer should work with our annotations now. Folks have a preferred way to get an initial report? I could hook it into Yetus, or I could just run it manually and put up a report output?

          Show
          busbey Sean Busbey added a comment - apilyzer should work with our annotations now. Folks have a preferred way to get an initial report? I could hook it into Yetus, or I could just run it manually and put up a report output?
          Hide
          haibochen Haibo Chen added a comment -

          +1 on hooking it into Yetus if it is not a lot of work.

          Show
          haibochen Haibo Chen added a comment - +1 on hooking it into Yetus if it is not a lot of work.

            People

            • Assignee:
              Unassigned
              Reporter:
              ozawa Tsuyoshi Ozawa
            • Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:

                Development