HIVE-23175

Skip serializing hadoop and tez config on HS side

Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Tez

    Description

      HiveServer spends a lot of time serializing configuration objects. We can skip putting the Hadoop and Tez config XML files in the payload, assuming the configs are the same on both the HS2 side and the task side. This depends on Tez loading local XML configs when it creates config objects (TEZ-4137, https://issues.apache.org/jira/browse/TEZ-4137).
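The idea of leaving locally available properties out of the payload can be sketched in plain Java. This is only an illustrative model, not the patch itself: the `Entry` class, the `LOCAL_ON_TASK_SIDE` set, and both method names are hypothetical, and the sketch assumes each property records the resource it was loaded from and that the listed files are identical on both sides.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a config whose properties remember their source resource.
final class PayloadFilterSketch {
    static final class Entry {
        final String value;
        final String source;
        Entry(String value, String source) {
            this.value = value;
            this.source = source;
        }
    }

    // Assumption: these resources exist, with the same content, on the task side.
    static final Set<String> LOCAL_ON_TASK_SIDE =
            new HashSet<>(Arrays.asList("core-site.xml", "tez-site.xml"));

    // HS2 side: keep only properties that the task side cannot load locally.
    static Map<String, String> buildPayload(Map<String, Entry> full) {
        Map<String, String> payload = new LinkedHashMap<>();
        for (Map.Entry<String, Entry> e : full.entrySet()) {
            if (!LOCAL_ON_TASK_SIDE.contains(e.getValue().source)) {
                payload.put(e.getKey(), e.getValue().value);
            }
        }
        return payload;
    }

    // Task side: start from the locally loaded properties, then overlay the payload.
    static Map<String, String> rebuildOnTask(Map<String, String> local,
                                             Map<String, String> payload) {
        Map<String, String> merged = new LinkedHashMap<>(local);
        merged.putAll(payload);
        return merged;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        Map<String, Entry> hs2 = new LinkedHashMap<>();
        hs2.put("fs.defaultFS", new Entry("hdfs://nn:8020", "core-site.xml"));
        hs2.put("hive.execution.engine", new Entry("tez", "hive-site.xml"));
        Map<String, String> payload = buildPayload(hs2);
        System.out.println("payload keys: " + payload.keySet());
    }
}
```

Note that properties sourced from hive-site.xml stay in the payload in this sketch, matching the constraint described in this issue.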

      Ideally we should be able to skip hive-site.xml too. However, if we skip hive-site.xml at that stage, we make wrong choices at the Tez DAG build stage because of the missing configs.

      In the ideal version of this, the DAG and vertex build phases would not both read configs from and write new configs to the same config object. Instead, we would look up values from HS2's HiveConf object and write to a brand new JobConf for each vertex. That way no vertex's JobConf would carry any unnecessary item. However, the DAG and vertex build stages (TezTask#build) and a lot of other components called from there treat a single config object as both the source of HS2-side config and the target JobConf that they put vertex-level options into. It is very hard to separate these concerns now.
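The separation described above can be sketched as follows. This is a simplified model using plain maps rather than HiveConf and JobConf; the class name, the method name, the `vertex.*` keys, and the fallback value are all hypothetical and chosen only for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: reads come from a read-only session config,
// writes go only to a fresh per-vertex map.
final class VertexConfSketch {
    static Map<String, String> buildVertexConf(Map<String, String> sessionConf,
                                               String vertexName) {
        // Read-only view: the build step cannot write back into the source.
        Map<String, String> source = Collections.unmodifiableMap(sessionConf);
        // Brand new target: starts empty, so it holds only what this vertex needs.
        Map<String, String> vertexConf = new LinkedHashMap<>();

        // The fallback "1024" is illustrative, not Hive's actual default.
        String containerMb = source.getOrDefault("hive.tez.container.size", "1024");
        vertexConf.put("vertex.name", vertexName);       // hypothetical key
        vertexConf.put("vertex.memory.mb", containerMb); // hypothetical key
        return vertexConf;
    }

    public static void main(String[] args) {
        Map<String, String> session = new LinkedHashMap<>();
        session.put("hive.tez.container.size", "4096");
        session.put("hive.execution.engine", "tez");
        System.out.println(buildVertexConf(session, "Map 1"));
    }
}
```

With this shape, the per-vertex config cannot accumulate unrelated session-level entries, which is the property the current shared-object design lacks.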

      With this patch, the size of the JobConf (per vertex) is reduced by ~65%. This should improve transmit latency. However, the most significant gain is in the CPU time spent compressing job configs, since the config objects are now much smaller.
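One rough way to see how config size feeds into serialization and compression work is to serialize and deflate synthetic configs of different sizes. The sketch below is an assumption-laden stand-in: it uses a simple `key=value` text encoding and `java.util.zip` deflate, which may differ from what Hive and Tez actually use, and the property names and counts are synthetic rather than measurements from this patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.DeflaterOutputStream;

// Hypothetical sketch comparing raw and compressed sizes of two configs.
final class ConfCompressionSketch {
    // Encode the config as newline-separated key=value pairs (illustrative format).
    static byte[] serialize(Map<String, String> conf) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Deflate the serialized bytes; deflate is a stand-in compressor here.
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Build a synthetic config with n made-up properties.
    static Map<String, String> synthetic(int n) {
        Map<String, String> conf = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            conf.put("example.property." + i, "value-" + i);
        }
        return conf;
    }

    public static void main(String[] args) {
        byte[] full = serialize(synthetic(1000));
        byte[] reduced = serialize(synthetic(350));
        System.out.println("raw bytes: " + full.length + " vs " + reduced.length);
        System.out.println("compressed bytes: " + compress(full).length
                + " vs " + compress(reduced).length);
    }
}
```

Real gains depend on the actual property set and compressor, so timing on a real HiveServer workload is the only reliable measurement.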

      Attachments

        1. HIVE-23175.1.patch
          3 kB
          Mustafa İman
        2. HIVE-23175.2.patch
          4 kB
          Mustafa İman
        3. HIVE-23175.3.patch
          4 kB
          Mustafa İman

            People

              Assignee: Mustafa İman (mustafaiman)
              Reporter: Mustafa İman (mustafaiman)
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 50m