HIVE-23175

Skip serializing hadoop and tez config on HS side



    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Tez


      HiveServer2 (HS2) spends a lot of time serializing configuration objects. We can skip putting the Hadoop and Tez config XML files in the payload, assuming that the configs are the same on both the HS2 side and the task side. This depends on Tez loading local XML configs when creating config objects: https://issues.apache.org/jira/browse/TEZ-4137
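      The idea above can be sketched roughly as follows. This is only an illustrative model, not the code in the attached patches: the config is modeled as a plain map whose entries remember which resource file they came from, and the class name, the `Entry` type, the `buildPayload` helper, and the list of skipped file names are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class PayloadFilterSketch {
    /** A config value together with the resource it was loaded from (hypothetical model). */
    public static final class Entry {
        public final String value;
        public final String source;

        public Entry(String value, String source) {
            this.value = value;
            this.source = source;
        }
    }

    // Assumption: these files exist with identical content on the task side,
    // so their entries do not need to travel in the payload.
    public static final Set<String> SKIPPED_SOURCES = new HashSet<>(Arrays.asList(
        "core-site.xml", "hdfs-site.xml", "yarn-site.xml", "mapred-site.xml", "tez-site.xml"));

    /** Keeps only the entries that did not come from a skipped resource file. */
    public static Map<String, String> buildPayload(Map<String, Entry> conf) {
        Map<String, String> payload = new LinkedHashMap<>();
        for (Map.Entry<String, Entry> e : conf.entrySet()) {
            if (!SKIPPED_SOURCES.contains(e.getValue().source)) {
                payload.put(e.getKey(), e.getValue().value);
            }
        }
        return payload;
    }

    public static void main(String[] args) {
        Map<String, Entry> conf = new LinkedHashMap<>();
        // Values and source assignments below are made up for illustration.
        conf.put("fs.defaultFS", new Entry("hdfs://nn:8020", "core-site.xml"));
        conf.put("tez.am.resource.memory.mb", new Entry("4096", "tez-site.xml"));
        conf.put("hive.execution.engine", new Entry("tez", "hive-site.xml"));
        conf.put("hive.query.id", new Entry("q-1", "programmatically"));
        // Only the hive-site.xml and programmatic entries remain in the payload.
        System.out.println(buildPayload(conf).keySet());
    }
}
```

      Note that entries from hive-site.xml are kept in this sketch, which matches the description: only the Hadoop and Tez files are skipped.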

      Ideally we should be able to skip hive-site.xml too. However, if we skip hive-site.xml at that stage, we make wrong choices at the Tez DAG build stage due to missing configs.

      In the ideal version of this, the DAG and vertex build phases would not both read configs from and write new configs to the same config object. Instead, they would read from HS2's HiveConf object and write to a brand-new JobConf for each vertex. That way, no vertex's JobConf would contain any unnecessary items. However, the DAG and vertex build stages (TezTask#build) and many other components called from there treat a single config object as both the source of HS2-side config and the target JobConf into which they put vertex-level options. It is very hard to separate these concerns now.
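      A minimal sketch of the "ideal version" described above, assuming plain maps stand in for HiveConf and JobConf. The class, the `buildVertexConf` helper, and the `vertex.*` keys are hypothetical and exist only to show the separation of the read side from the write side; `hive.tez.container.size` is used as an example of an HS2-side setting that is looked up.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class VertexConfSketch {
    /**
     * Builds a fresh per-vertex config. Lookups go to the HS2-side config,
     * which is never written to; writes go only to the new per-vertex map.
     */
    public static Map<String, String> buildVertexConf(Map<String, String> hs2Conf, String vertexName) {
        // Read-only view: any accidental write to the source would throw.
        Map<String, String> source = Collections.unmodifiableMap(hs2Conf);
        // Brand-new target, so nothing unrelated from HS2 leaks into it.
        Map<String, String> vertexConf = new HashMap<>();

        String containerSize = source.getOrDefault("hive.tez.container.size", "-1");
        vertexConf.put("vertex.name", vertexName);                  // hypothetical key
        vertexConf.put("vertex.container.size.mb", containerSize);  // hypothetical key
        return vertexConf;
    }

    public static void main(String[] args) {
        Map<String, String> hs2Conf = new HashMap<>();
        // Example values, made up for illustration.
        hs2Conf.put("hive.tez.container.size", "2048");
        hs2Conf.put("hive.metastore.uris", "thrift://ms:9083");

        Map<String, String> mapVertex = buildVertexConf(hs2Conf, "Map 1");
        // The vertex config holds only what was explicitly written to it.
        System.out.println(mapVertex.size() + " entries, metastore key present: "
            + mapVertex.containsKey("hive.metastore.uris"));
    }
}
```

      In the current code the two roles are played by one object, which is why the patch filters whole resource files rather than building such a minimal per-vertex config.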

      With this patch, we reduce the size of the JobConf (per vertex) by ~65%. This should improve transmit latency. However, the most significant gains are in CPU time spent compressing job configs, as the config objects are much smaller now.


        1. HIVE-23175.1.patch
          3 kB
          Mustafa İman
        2. HIVE-23175.2.patch
          4 kB
          Mustafa İman
        3. HIVE-23175.3.patch
          4 kB
          Mustafa İman




              • Assignee:
                mustafaiman Mustafa İman
              • Votes:
                0
              • Watchers:
                3


                • Created:

                  Time Tracking

                  Original Estimate - Not Specified
                  Remaining Estimate - 0h
                  Time Spent - 50m