Details

    • Type: Improvement
    • Status: Reopened
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Documentation
    • Labels:
      None

      Description

      GenericUDFs are very poorly documented, and so is everything they relate to:

      • ObjectInspector (JavaDoc not really helpful for someone not familiar with Hive)
      • ObjectInspectorFactory
      • ObjectInspectorConverters
      • ...

      An example would help, as would a unit test for one of the built-in GenericUDFs. Writing a normal UDF is pretty well documented, but GenericUDFs (and UDTFs/UDAFs) require more knowledge about the inner workings of Hive, and that could be documented better.

        Activity

        Edward Capriolo added a comment -

        What are you saying? Do you want more JavaDoc? Hive currently uses a wiki that all are free to edit. I do not think we should be opening up Jira issues for documentation. Hive already has enough issues assigned to no one, to be completed never.

        Lars Francke added a comment -

        What's the "Documentation" component for then?

        The Wiki is nice, but it's currently not a very good source of information beyond the very basic things. It is also incomplete, and I guess not every page is up to date either.

        The nature of this issue also requires someone with more knowledge about Hive than a regular user to take a look at it. So I think it's a bad idea to close this issue just because there are other unassigned issues. This is an issue tracker after all.

        Edward Capriolo added a comment -

        Lars,
        I understand that no one likes to see issues closed as "WONT FIX" (I am not trying to be snotty). Hive currently has hundreds of open issues. Opening an issue like "Document X" is vague. You are correct in saying this is an issue tracker, but it is quite common to first come on IRC or the ML and discuss the feature you want.

        IMHO, what this boils down to is that if the developers had more time to document, they would. If we had to open an issue for each thing that needed more documentation, Jira would be unusable (many things need documentation).
        Generally, if the user submitting the request is not willing to assign it to themselves, there is little chance of it getting done by anyone else (as evidenced by the number of open unassigned tickets). These issues should be actionable in the near term. If no one is going to actively work on the issue, we do not get anything from having a ticket open on it.

        Lars Francke added a comment -

        I tried to add details about what should be documented and why, so I had hoped it wouldn't be vague. Let me know what needs clarifying and I'll gladly do it.

        I've been on IRC and the mailing list, and we've asked questions about these things and tried to figure them out on our own. That's why I decided to open the ticket.

        I didn't assign the issue to myself because I don't feel like I have any idea about what's going on there. I also don't agree about the value of open and unassigned tickets. They give an overview of what still needs to be done, and perhaps one of these days someone is going to focus on documentation; I think it would be helpful then to know what's missing.

        But I'll leave this issue alone now so as not to take any more time. Thanks for looking at it anyway. One of your GenericUDFs can be found on Google, and that's about the only documentation/example I could find, so you've already helped.

        Patrick Angeles added a comment -

        FWIW, +1 on better documentation, particularly on extension points like UD*Fs, SerDes and StorageHandlers. These are things that are worth taking on up front because they increase user engagement and adoption and reduce the burden of support/education on core committers in the long term.

        Edward Capriolo added a comment -

        The best way to learn about these features currently is to look through the unit tests, code, and .q files. There are other UDFs, like atan, that are easy to follow. There is a UDTF, for example, that splits a URL into parts. Looking at the split() or case() UDFs gives you an idea of some of the more complex things that can be done. Looking at the struct() or list() UDFs shows you a lot about how to use object inspectors to detect and return different types. If you want to come on IRC I can help with specific questions.
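
To make the "detect and return different types" point concrete, here is a rough, dependency-free sketch of the two-phase pattern a GenericUDF follows: argument types are checked once up front, then the function is evaluated per row. Everything in it (TypeInspector, GenericFunction, inspectorFor, Nvl, callNvl) is a hypothetical stand-in written for illustration and is not Hive's API. In Hive itself the corresponding pieces are the initialize and evaluate methods of GenericUDF together with the ObjectInspector classes, so treat this as a model of the idea rather than code to paste into a Hive build.

```java
class GenericUdfSketch {

    /** Hypothetical stand-in for an object inspector: names a type and checks values against it. */
    interface TypeInspector {
        String typeName();
        boolean accepts(Object value);
    }

    /** Hypothetical factory; plays the role an inspector factory would play. */
    static TypeInspector inspectorFor(Class<?> javaType) {
        return new TypeInspector() {
            public String typeName() {
                return javaType.getSimpleName().toLowerCase();
            }
            public boolean accepts(Object value) {
                // null is allowed for every type, as in SQL
                return value == null || javaType.isInstance(value);
            }
        };
    }

    /** Hypothetical two-phase contract: initialize once, evaluate per row. */
    interface GenericFunction {
        TypeInspector initialize(TypeInspector[] argumentInspectors);
        Object evaluate(Object[] arguments);
    }

    /** nvl(value, fallback): returns value unless it is null; the return type follows the arguments. */
    static class Nvl implements GenericFunction {
        private TypeInspector returnInspector;

        public TypeInspector initialize(TypeInspector[] args) {
            // Validate argument count and types once, before any row is processed.
            if (args.length != 2) {
                throw new IllegalArgumentException("nvl takes exactly 2 arguments");
            }
            if (!args[0].typeName().equals(args[1].typeName())) {
                throw new IllegalArgumentException(
                    "nvl arguments must share a type: " + args[0].typeName() + " vs " + args[1].typeName());
            }
            returnInspector = args[0];
            return returnInspector;
        }

        public Object evaluate(Object[] arguments) {
            Object result = arguments[0] != null ? arguments[0] : arguments[1];
            if (!returnInspector.accepts(result)) {
                throw new IllegalStateException("value is not a " + returnInspector.typeName());
            }
            return result;
        }
    }

    /** Convenience wrapper: runs both phases for two arguments of the same declared type. */
    static Object callNvl(Class<?> type, Object value, Object fallback) {
        Nvl nvl = new Nvl();
        nvl.initialize(new TypeInspector[] { inspectorFor(type), inspectorFor(type) });
        return nvl.evaluate(new Object[] { value, fallback });
    }
}
```

With this sketch, callNvl(String.class, null, "n/a") yields "n/a" and callNvl(Integer.class, 7, 0) yields 7, while initializing Nvl with a string inspector and an integer inspector fails before any row is evaluated. That early type check is the part a plain UDF cannot express.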

        Jeff Hammerbacher added a comment -

        It's ridiculous to close an issue as "Won't Fix" that many people think should be fixed.

        Edward Capriolo added a comment -

        I do not care that much. I just dislike seeing things stay open forever that no one is going to work on.
        https://issues.apache.org/jira/browse/HIVE-29


          People

          • Assignee:
            Unassigned
            Reporter:
            Lars Francke
          • Votes:
            0
            Watchers:
            2

            Dates

            • Created:
              Updated:
