  HCatalog / HCATALOG-341

InitializeInput improvements

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5
    • Component/s: None
    • Labels: None

    Description

      This came up in HCATALOG-328.

      InitializeInput is the HCatalog class that queries the HiveMetaStore and stores the query result. It could be improved in the following ways:

      • The class has only static methods, so a private no-argument constructor should be added to prevent people from accidentally creating instances.
      • Instead of querying the HiveMetaStore each time info is requested, the results should be cached after the first query using a key of db+table+filter.
      • setInput and getSerializedHcatKeyJobInfo require an existing InputJobInfo argument, yet the point of calling those methods is to populate an InputJobInfo with info from the metastore. While this reduces the number of arguments (one object instead of database name, table name, and partition filter), it confuses the user because it's not clear that only db/table/filter should be set on the InputJobInfo passed as an argument.
      • getSerializedHcatKeyJobInfo should be renamed getInputJobInfo and return an unserialized InputJobInfo. This avoids unnecessary serialization/deserialization in the front-end when it's not necessary to read from the job configuration.
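
      The proposals above could be combined roughly as in the following sketch. It is not the code from the attached patches: the class name InitializeInputSketch, the Key class, the simplified InputJobInfo stand-in, and the metastoreQuery parameter are all hypothetical, and the metastore call is replaced by a caller-supplied function so the example runs on its own. It assumes Java 8 or later.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: only the names InputJobInfo and getInputJobInfo come
// from the issue text; everything else is illustrative.
final class InitializeInputSketch {

    /** Simplified stand-in for HCatalog's InputJobInfo (assumption). */
    static final class InputJobInfo {
        final String database;
        final String table;
        final String filter;

        InputJobInfo(String database, String table, String filter) {
            this.database = database;
            this.table = table;
            this.filter = filter;
        }
    }

    /** Cache key built from db + table + filter; filter may be null. */
    static final class Key {
        final String database;
        final String table;
        final String filter;

        Key(String database, String table, String filter) {
            this.database = database;
            this.table = table;
            this.filter = filter;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            return Objects.equals(database, other.database)
                && Objects.equals(table, other.table)
                && Objects.equals(filter, other.filter);
        }

        @Override
        public int hashCode() {
            return Objects.hash(database, table, filter);
        }
    }

    // Results are stored after the first lookup for a given key.
    private static final Map<Key, InputJobInfo> CACHE = new ConcurrentHashMap<>();

    // Private no-argument constructor: the class only has static methods.
    private InitializeInputSketch() {
        throw new AssertionError("no instances");
    }

    /**
     * Takes db/table/filter directly instead of a half-populated InputJobInfo
     * and returns an unserialized InputJobInfo. metastoreQuery stands in for
     * the real HiveMetaStore lookup and is only invoked on a cache miss.
     */
    static InputJobInfo getInputJobInfo(String database, String table, String filter,
                                        Function<Key, InputJobInfo> metastoreQuery) {
        return CACHE.computeIfAbsent(new Key(database, table, filter), metastoreQuery);
    }
}
```

      With this shape, a second call with the same database, table, and filter returns the cached object without invoking the lookup again. A real implementation would also need to decide when cached entries become stale, which the issue does not specify.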

      Attachments

        1. HCATALOG-341.patch
          6 kB
          Travis Crawford
        2. HCATALOG-341.2.patch
          23 kB
          Travis Crawford

          People

            Assignee: Travis Crawford
            Reporter: Travis Crawford
            Votes: 0
            Watchers: 4
