Uploaded image for project: 'HCatalog'
  1. HCatalog
  2. HCATALOG-332

HCatalog needs interface classification

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      All the public facing interfaces should be marked for intended audience and stability. Ref: HADOOP-5073

      1. HCAT-332-2.patch
        21 kB
        Alan Gates
      2. HCAT-332.patch
        18 kB
        Alan Gates

        Activity

        Hide
        ashutoshc Ashutosh Chauhan added a comment -

        In particular, apis introduced in HCATALOG-287 are limited-private and evolving.

        Show
        ashutoshc Ashutosh Chauhan added a comment - In particular, apis introduced in HCATALOG-287 are limited-private and evolving.
        Hide
        alangates Alan Gates added a comment -

        This patch adds stability and intended audience labels to classes and interfaces that I assume will be used directly by HCat users. All other interfaces are assumed to be private. I did mark the new transfer APIs as Public Evolving instead of Limited Private.

        Reviewers should check that I didn't miss any important classes and that my classifications make sense.

        Show
        alangates Alan Gates added a comment - This patch adds stability and intended audience labels to classes and interfaces that I assume will be used directly by HCat users. All other interfaces are assumed to be private. I did mark the new transfer APIs as Public Evolving instead of Limited Private. Reviewers should check that I didn't miss any important classes and that my classifications make sense.
        Hide
        alangates Alan Gates added a comment -

        New version of the patch that includes some files I missed in the first version.

        Show
        alangates Alan Gates added a comment - New version of the patch that includes some files I missed in the first version.
        Hide
        toffer Francis Liu added a comment -

        Some comments:

        HCatStorageHandler - I believe the goals is to deprecate this and have storageHandlers depend on the hive one only. After HIVE-2773, I believe we are at parity, though until random read framework gets fully baked we will still internally use this for experimentation?
        HCatBaseInputFormat and HCatBaseOutputFormat - Not sure what public means in this context? Does that mean we provide support to users for subclassing these classes? Or do they only have access to public methods?
        PartInfo - this class needs some refactoring since it contains job related fields, I'd consider it evolving?

        Show
        toffer Francis Liu added a comment - Some comments: HCatStorageHandler - I believe the goals is to deprecate this and have storageHandlers depend on the hive one only. After HIVE-2773 , I believe we are at parity, though until random read framework gets fully baked we will still internally use this for experimentation? HCatBaseInputFormat and HCatBaseOutputFormat - Not sure what public means in this context? Does that mean we provide support to users for subclassing these classes? Or do they only have access to public methods? PartInfo - this class needs some refactoring since it contains job related fields, I'd consider it evolving?
        Hide
        alangates Alan Gates added a comment -

        So are you saying HCatStorageHandler should be marked deprecated now? Or should we mark it private/evolving?

        On HCatBase*Format and PartInfo, the issue is that both of these are exposed (eventually) through HCatInputFormat and HCatOutputFormat. HCIF/HCOF have to be public/stable. And it makes no sense to say an API that is stable relies on classes that are evolving.

        Show
        alangates Alan Gates added a comment - So are you saying HCatStorageHandler should be marked deprecated now? Or should we mark it private/evolving? On HCatBase*Format and PartInfo, the issue is that both of these are exposed (eventually) through HCatInputFormat and HCatOutputFormat. HCIF/HCOF have to be public/stable. And it makes no sense to say an API that is stable relies on classes that are evolving.
        Hide
        toffer Francis Liu added a comment -

        HCatStorageHandler - private/evolving sounds good to me

        PartInfo is not part of the IF/OF. It is a metadata for a partition. Just like HCatTableInfo is for a table.

        As for HCatBase*Format. Making it public means that users are free to subclass this and I believe this is something internal (and should probably be deprecated in the future).

        Show
        toffer Francis Liu added a comment - HCatStorageHandler - private/evolving sounds good to me PartInfo is not part of the IF/OF. It is a metadata for a partition. Just like HCatTableInfo is for a table. As for HCatBase*Format. Making it public means that users are free to subclass this and I believe this is something internal (and should probably be deprecated in the future).
        Hide
        toffer Francis Liu added a comment -

        As for PartInfo's exposure. My understanding of having it evolving means that it's interface may change. Which is exclusive of HCIF's interface changing though they maybe be related but isn't that one of the reasons for having these annotations? To have a finer grained control over interface scoping?

        Show
        toffer Francis Liu added a comment - As for PartInfo's exposure. My understanding of having it evolving means that it's interface may change. Which is exclusive of HCIF's interface changing though they maybe be related but isn't that one of the reasons for having these annotations? To have a finer grained control over interface scoping?
        Hide
        alangates Alan Gates added a comment -

        As for HCatBase*Format. Making it public means that users are free to subclass this and I believe this is something internal (and should probably be deprecated in the future).

        So this is a semantics question. If we say HCatInputFormat is public but HCatBaseInputFormat is private, I guess that makes sense. Both must be stable though, because HCatBaseInputFormat defines part of HCIF's interface.

        InputJobInfo.getPartitions() returns a list of PartInfos. InputJobInfo is stable. It seems meaningless to declare InputJobInfo to have a stable interface while threatening to evolve classes used in that interface. My assertion is that stability is cumulative. No part of a stable interface can be less than stable.

        Show
        alangates Alan Gates added a comment - As for HCatBase*Format. Making it public means that users are free to subclass this and I believe this is something internal (and should probably be deprecated in the future). So this is a semantics question. If we say HCatInputFormat is public but HCatBaseInputFormat is private, I guess that makes sense. Both must be stable though, because HCatBaseInputFormat defines part of HCIF's interface. InputJobInfo.getPartitions() returns a list of PartInfos. InputJobInfo is stable. It seems meaningless to declare InputJobInfo to have a stable interface while threatening to evolve classes used in that interface. My assertion is that stability is cumulative. No part of a stable interface can be less than stable.
        Hide
        toffer Francis Liu added a comment -

        So this is a semantics question. If we say HCatInputFormat is public but HCatBaseInputFormat is private, I guess that makes sense. Both must be stable though, because HCatBaseInputFormat defines part of HCIF's interface.

        That's fine, I only want to prevent users from subclassing the base class since this is something internal.

        InputJobInfo.getPartitions() returns a list of PartInfos. InputJobInfo is stable. It seems meaningless to declare InputJobInfo to have a stable interface while threatening to evolve classes used in that interface. My assertion is that stability is cumulative. No part of a stable interface can be less than stable.

        If we decide to follow that semantic then anything using PartInfo shouldn't be stable. As it will be changing soon. HCATALOG-308 alluded to the fact that we are missing a partition descriptor class and some of that class' responsibility is currently being done by PartInfo.

        Show
        toffer Francis Liu added a comment - So this is a semantics question. If we say HCatInputFormat is public but HCatBaseInputFormat is private, I guess that makes sense. Both must be stable though, because HCatBaseInputFormat defines part of HCIF's interface. That's fine, I only want to prevent users from subclassing the base class since this is something internal. InputJobInfo.getPartitions() returns a list of PartInfos. InputJobInfo is stable. It seems meaningless to declare InputJobInfo to have a stable interface while threatening to evolve classes used in that interface. My assertion is that stability is cumulative. No part of a stable interface can be less than stable. If we decide to follow that semantic then anything using PartInfo shouldn't be stable. As it will be changing soon. HCATALOG-308 alluded to the fact that we are missing a partition descriptor class and some of that class' responsibility is currently being done by PartInfo.
        Hide
        alangates Alan Gates added a comment -

        Marking something as Stable doesn't mean it can't change. It means it can't change in a non-backward compatible way. With these designations we can still change PartInfo in the ways you suggest. Worst case we can create a PartInfo2 and deprecate PartInfo, as long as we continue to support it for a while.

        Show
        alangates Alan Gates added a comment - Marking something as Stable doesn't mean it can't change. It means it can't change in a non-backward compatible way. With these designations we can still change PartInfo in the ways you suggest. Worst case we can create a PartInfo2 and deprecate PartInfo, as long as we continue to support it for a while.
        Hide
        traviscrawford Travis Crawford added a comment -

        Canceling patch to clear up the "patch available" report; there's a good discussion going here though!

        Show
        traviscrawford Travis Crawford added a comment - Canceling patch to clear up the "patch available" report; there's a good discussion going here though!
        Hide
        traviscrawford Travis Crawford added a comment -

        These interface classifications will be super useful, since right now its not always clear what our contract with users is.

        An issue discussed here is how to treat base classes, since they may be needed by public/stable classes. It seems like we could initially take the approach of making as little as possible public, but anything public/stable depends on would be declared stable.

        Thoughts?

        Show
        traviscrawford Travis Crawford added a comment - These interface classifications will be super useful, since right now its not always clear what our contract with users is. An issue discussed here is how to treat base classes, since they may be needed by public/stable classes. It seems like we could initially take the approach of making as little as possible public, but anything public/stable depends on would be declared stable. Thoughts?
        Hide
        toffer Francis Liu added a comment -

        but anything public/stable depends on would be declared stable.

        This would be an ideal to follow but it seems a bit restrictive to me. What if I add a new experimental method and class as an argument: HCatInputFormat.foo(Bar arg). Does that mean that we can't add new methods until we can mark all the new classes it exposes as stable?

        Show
        toffer Francis Liu added a comment - but anything public/stable depends on would be declared stable. This would be an ideal to follow but it seems a bit restrictive to me. What if I add a new experimental method and class as an argument: HCatInputFormat.foo(Bar arg). Does that mean that we can't add new methods until we can mark all the new classes it exposes as stable?
        Hide
        traviscrawford Travis Crawford added a comment -

        That's a good point, it would be restrictive in the case you mentioned above.

        What do you think about marking the classes that are obviously public as such (HCatLoader, HCatInputFormat, ...) and leaving others unmarked, implying they're private. This would allow us to get some experience evolving interfaces consistent with the stability contract, without having to figure out the corner cases. As we gain experience hopefully we learn how to deal with corner cases like you mentioned above. For example, maybe just the experimental method is marked as such. I just checked Hive and found an example where just a single method is annotated suggesting we could go that route too.

        I think the high-level goal of these annotations is to let users know what's safe to use. If we start small and build from there I think we'll end up giving some pretty good guidance.

        Show
        traviscrawford Travis Crawford added a comment - That's a good point, it would be restrictive in the case you mentioned above. What do you think about marking the classes that are obviously public as such (HCatLoader, HCatInputFormat, ...) and leaving others unmarked, implying they're private. This would allow us to get some experience evolving interfaces consistent with the stability contract, without having to figure out the corner cases. As we gain experience hopefully we learn how to deal with corner cases like you mentioned above. For example, maybe just the experimental method is marked as such. I just checked Hive and found an example where just a single method is annotated suggesting we could go that route too. I think the high-level goal of these annotations is to let users know what's safe to use. If we start small and build from there I think we'll end up giving some pretty good guidance.
        Hide
        toffer Francis Liu added a comment -

        That sounds good to me.

        Show
        toffer Francis Liu added a comment - That sounds good to me.

          People

          • Assignee:
            alangates Alan Gates
            Reporter:
            ashutoshc Ashutosh Chauhan
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:

              Development