HCatalog
  1. HCatalog
  2. HCATALOG-332

HCatalog needs interface classification

    Details

    • Type: New Feature New Feature
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      All the public facing interfaces should be marked for intended audience and stability. Ref: HADOOP-5073

      1. HCAT-332-2.patch
        21 kB
        Alan Gates
      2. HCAT-332.patch
        18 kB
        Alan Gates

        Activity

        Hide
        Ashutosh Chauhan added a comment -

        In particular, apis introduced in HCATALOG-287 are limited-private and evolving.

        Show
        Ashutosh Chauhan added a comment - In particular, apis introduced in HCATALOG-287 are limited-private and evolving.
        Hide
        Alan Gates added a comment -

        This patch adds stability and intended audience labels to classes and interfaces that I assume will be used directly by HCat users. All other interfaces are assumed to be private. I did mark the new transfer APIs as Public Evolving instead of Limited Private.

        Reviewers should check that I didn't miss any important classes and that my classifications make sense.

        Show
        Alan Gates added a comment - This patch adds stability and intended audience labels to classes and interfaces that I assume will be used directly by HCat users. All other interfaces are assumed to be private. I did mark the new transfer APIs as Public Evolving instead of Limited Private. Reviewers should check that I didn't miss any important classes and that my classifications make sense.
        Hide
        Alan Gates added a comment -

        New version of the patch that includes some files I missed in the first version.

        Show
        Alan Gates added a comment - New version of the patch that includes some files I missed in the first version.
        Hide
        Francis Liu added a comment -

        Some comments:

        HCatStorageHandler - I believe the goals is to deprecate this and have storageHandlers depend on the hive one only. After HIVE-2773, I believe we are at parity, though until random read framework gets fully baked we will still internally use this for experimentation?
        HCatBaseInputFormat and HCatBaseOutputFormat - Not sure what public means in this context? Does that mean we provide support to users for subclassing these classes? Or do they only have access to public methods?
        PartInfo - this class needs some refactoring since it contains job related fields, I'd consider it evolving?

        Show
        Francis Liu added a comment - Some comments: HCatStorageHandler - I believe the goals is to deprecate this and have storageHandlers depend on the hive one only. After HIVE-2773 , I believe we are at parity, though until random read framework gets fully baked we will still internally use this for experimentation? HCatBaseInputFormat and HCatBaseOutputFormat - Not sure what public means in this context? Does that mean we provide support to users for subclassing these classes? Or do they only have access to public methods? PartInfo - this class needs some refactoring since it contains job related fields, I'd consider it evolving?
        Hide
        Alan Gates added a comment -

        So are you saying HCatStorageHandler should be marked deprecated now? Or should we mark it private/evolving?

        On HCatBase*Format and PartInfo, the issue is that both of these are exposed (eventually) through HCatInputFormat and HCatOutputFormat. HCIF/HCOF have to be public/stable. And it makes no sense to say an API that is stable relies on classes that are evolving.

        Show
        Alan Gates added a comment - So are you saying HCatStorageHandler should be marked deprecated now? Or should we mark it private/evolving? On HCatBase*Format and PartInfo, the issue is that both of these are exposed (eventually) through HCatInputFormat and HCatOutputFormat. HCIF/HCOF have to be public/stable. And it makes no sense to say an API that is stable relies on classes that are evolving.
        Hide
        Francis Liu added a comment -

        HCatStorageHandler - private/evolving sounds good to me

        PartInfo is not part of the IF/OF. It is a metadata for a partition. Just like HCatTableInfo is for a table.

        As for HCatBase*Format. Making it public means that users are free to subclass this and I believe this is something internal (and should probably be deprecated in the future).

        Show
        Francis Liu added a comment - HCatStorageHandler - private/evolving sounds good to me PartInfo is not part of the IF/OF. It is a metadata for a partition. Just like HCatTableInfo is for a table. As for HCatBase*Format. Making it public means that users are free to subclass this and I believe this is something internal (and should probably be deprecated in the future).
        Hide
        Francis Liu added a comment -

        As for PartInfo's exposure. My understanding of having it evolving means that it's interface may change. Which is exclusive of HCIF's interface changing though they maybe be related but isn't that one of the reasons for having these annotations? To have a finer grained control over interface scoping?

        Show
        Francis Liu added a comment - As for PartInfo's exposure. My understanding of having it evolving means that it's interface may change. Which is exclusive of HCIF's interface changing though they maybe be related but isn't that one of the reasons for having these annotations? To have a finer grained control over interface scoping?
        Hide
        Alan Gates added a comment -

        As for HCatBase*Format. Making it public means that users are free to subclass this and I believe this is something internal (and should probably be deprecated in the future).

        So this is a semantics question. If we say HCatInputFormat is public but HCatBaseInputFormat is private, I guess that makes sense. Both must be stable though, because HCatBaseInputFormat defines part of HCIF's interface.

        InputJobInfo.getPartitions() returns a list of PartInfos. InputJobInfo is stable. It seems meaningless to declare InputJobInfo to have a stable interface while threatening to evolve classes used in that interface. My assertion is that stability is cumulative. No part of a stable interface can be less than stable.

        Show
        Alan Gates added a comment - As for HCatBase*Format. Making it public means that users are free to subclass this and I believe this is something internal (and should probably be deprecated in the future). So this is a semantics question. If we say HCatInputFormat is public but HCatBaseInputFormat is private, I guess that makes sense. Both must be stable though, because HCatBaseInputFormat defines part of HCIF's interface. InputJobInfo.getPartitions() returns a list of PartInfos. InputJobInfo is stable. It seems meaningless to declare InputJobInfo to have a stable interface while threatening to evolve classes used in that interface. My assertion is that stability is cumulative. No part of a stable interface can be less than stable.
        Hide
        Francis Liu added a comment -

        So this is a semantics question. If we say HCatInputFormat is public but HCatBaseInputFormat is private, I guess that makes sense. Both must be stable though, because HCatBaseInputFormat defines part of HCIF's interface.

        That's fine, I only want to prevent users from subclassing the base class since this is something internal.

        InputJobInfo.getPartitions() returns a list of PartInfos. InputJobInfo is stable. It seems meaningless to declare InputJobInfo to have a stable interface while threatening to evolve classes used in that interface. My assertion is that stability is cumulative. No part of a stable interface can be less than stable.

        If we decide to follow that semantic then anything using PartInfo shouldn't be stable. As it will be changing soon. HCATALOG-308 alluded to the fact that we are missing a partition descriptor class and some of that class' responsibility is currently being done by PartInfo.

        Show
        Francis Liu added a comment - So this is a semantics question. If we say HCatInputFormat is public but HCatBaseInputFormat is private, I guess that makes sense. Both must be stable though, because HCatBaseInputFormat defines part of HCIF's interface. That's fine, I only want to prevent users from subclassing the base class since this is something internal. InputJobInfo.getPartitions() returns a list of PartInfos. InputJobInfo is stable. It seems meaningless to declare InputJobInfo to have a stable interface while threatening to evolve classes used in that interface. My assertion is that stability is cumulative. No part of a stable interface can be less than stable. If we decide to follow that semantic then anything using PartInfo shouldn't be stable. As it will be changing soon. HCATALOG-308 alluded to the fact that we are missing a partition descriptor class and some of that class' responsibility is currently being done by PartInfo.
        Hide
        Alan Gates added a comment -

        Marking something as Stable doesn't mean it can't change. It means it can't change in a non-backward compatible way. With these designations we can still change PartInfo in the ways you suggest. Worst case we can create a PartInfo2 and deprecate PartInfo, as long as we continue to support it for a while.

        Show
        Alan Gates added a comment - Marking something as Stable doesn't mean it can't change. It means it can't change in a non-backward compatible way. With these designations we can still change PartInfo in the ways you suggest. Worst case we can create a PartInfo2 and deprecate PartInfo, as long as we continue to support it for a while.
        Hide
        Travis Crawford added a comment -

        Canceling patch to clear up the "patch available" report; there's a good discussion going here though!

        Show
        Travis Crawford added a comment - Canceling patch to clear up the "patch available" report; there's a good discussion going here though!
        Hide
        Travis Crawford added a comment -

        These interface classifications will be super useful, since right now its not always clear what our contract with users is.

        An issue discussed here is how to treat base classes, since they may be needed by public/stable classes. It seems like we could initially take the approach of making as little as possible public, but anything public/stable depends on would be declared stable.

        Thoughts?

        Show
        Travis Crawford added a comment - These interface classifications will be super useful, since right now its not always clear what our contract with users is. An issue discussed here is how to treat base classes, since they may be needed by public/stable classes. It seems like we could initially take the approach of making as little as possible public, but anything public/stable depends on would be declared stable. Thoughts?
        Hide
        Francis Liu added a comment -

        but anything public/stable depends on would be declared stable.

        This would be an ideal to follow but it seems a bit restrictive to me. What if I add a new experimental method and class as an argument: HCatInputFormat.foo(Bar arg). Does that mean that we can't add new methods until we can mark all the new classes it exposes as stable?

        Show
        Francis Liu added a comment - but anything public/stable depends on would be declared stable. This would be an ideal to follow but it seems a bit restrictive to me. What if I add a new experimental method and class as an argument: HCatInputFormat.foo(Bar arg). Does that mean that we can't add new methods until we can mark all the new classes it exposes as stable?
        Hide
        Travis Crawford added a comment -

        That's a good point, it would be restrictive in the case you mentioned above.

        What do you think about marking the classes that are obviously public as such (HCatLoader, HCatInputFormat, ...) and leaving others unmarked, implying they're private. This would allow us to get some experience evolving interfaces consistent with the stability contract, without having to figure out the corner cases. As we gain experience hopefully we learn how to deal with corner cases like you mentioned above. For example, maybe just the experimental method is marked as such. I just checked Hive and found an example where just a single method is annotated suggesting we could go that route too.

        I think the high-level goal of these annotations is to let users know what's safe to use. If we start small and build from there I think we'll end up giving some pretty good guidance.

        Show
        Travis Crawford added a comment - That's a good point, it would be restrictive in the case you mentioned above. What do you think about marking the classes that are obviously public as such (HCatLoader, HCatInputFormat, ...) and leaving others unmarked, implying they're private. This would allow us to get some experience evolving interfaces consistent with the stability contract, without having to figure out the corner cases. As we gain experience hopefully we learn how to deal with corner cases like you mentioned above. For example, maybe just the experimental method is marked as such. I just checked Hive and found an example where just a single method is annotated suggesting we could go that route too. I think the high-level goal of these annotations is to let users know what's safe to use. If we start small and build from there I think we'll end up giving some pretty good guidance.
        Hide
        Francis Liu added a comment -

        That sounds good to me.

        Show
        Francis Liu added a comment - That sounds good to me.

          People

          • Assignee:
            Alan Gates
            Reporter:
            Ashutosh Chauhan
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:

              Development