HIVE-477

Some optimization thoughts for Hive

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      Before we start working on HIVE-461, I have been doing some profiling of Hive. Here are some thoughts for improvements:

      Minor:
      1) Add a new HiveText class to replace Text. It can avoid a byte copy when initializing a LazyString. I have a draft implementation, and it shows a ~1% performance gain.
      2) Change StructObjectInspector's

           public List<Object> getStructFieldsDataAsList(Object data);
          

      to be

           public Object[] getStructFieldsDataAsArray(Object data);
          

      In my profiling test this showed some performance gain, but in actual execution it did not. Either way, returning a Java array would reduce the GC burden of collecting short-lived ArrayList instances.
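
      A minimal sketch of the two minor suggestions, for illustration only. ByteRangeText and ArrayStructInspector are hypothetical names, not the draft HiveText mentioned above and not Hive's actual classes; the sketch assumes a struct row is represented as a plain Object[]. The first class keeps a reference to a slice of a shared buffer instead of copying it, and the second contrasts a list-returning accessor with an array-returning one.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the proposed HiveText: a view over a shared buffer. */
final class ByteRangeText {
    private byte[] bytes; // shared with the caller, never copied
    private int start;
    private int length;

    /** Points this object at a slice of an existing buffer without copying it. */
    void setRef(byte[] buffer, int start, int length) {
        if (start < 0 || length < 0 || length > buffer.length - start) {
            throw new IndexOutOfBoundsException("invalid range");
        }
        this.bytes = buffer;
        this.start = start;
        this.length = length;
    }

    int getLength() {
        return length;
    }

    /** True if the backing array is the given buffer itself, i.e. no copy was made. */
    boolean sharesBufferWith(byte[] buffer) {
        return bytes == buffer;
    }

    /** Decodes lazily, only when the string value is actually requested. */
    @Override
    public String toString() {
        return new String(bytes, start, length, StandardCharsets.UTF_8);
    }
}

/** Hypothetical inspector, assuming a struct row is stored as an Object[]. */
final class ArrayStructInspector {
    /** List-returning variant: allocates a new ArrayList on every call. */
    List<Object> getStructFieldsDataAsList(Object data) {
        return new ArrayList<>(Arrays.asList((Object[]) data));
    }

    /** Array-returning variant: hands back the existing array, no allocation. */
    Object[] getStructFieldsDataAsArray(Object data) {
        return (Object[]) data;
    }
}
```

      The trade-off in both cases is aliasing: the view and the returned array stay valid only as long as the caller does not reuse or mutate the underlying buffer.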

      Not so minor:
      3) Move FileSinkOperator's Writer into a separate thread, with a producer-consumer array as the bridge between the operator thread and the writer thread.
      4) The operator stack is fairly deep. To avoid instruction cache misses and make better use of the data cache, I suggest letting Hive's operators process an array of rows at a time instead of a single row.
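
      A rough sketch of how suggestions 3) and 4) could fit together, under stated assumptions. AsyncBatchSink is a hypothetical class, not FileSinkOperator's real code: rows are plain Objects, the real Writer is replaced by an in-memory list, and a java.util.concurrent.ArrayBlockingQueue serves as the producer-consumer bridge. The operator thread hands over whole batches of rows, and a dedicated thread drains them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sink that hands row batches to a dedicated writer thread. */
final class AsyncBatchSink implements AutoCloseable {
    private static final Object[] POISON = new Object[0]; // end-of-stream marker

    private final BlockingQueue<Object[]> queue;
    private final List<Object> written = new ArrayList<>(); // stands in for the real Writer
    private final Thread writerThread;

    AsyncBatchSink(int queueCapacity) {
        this.queue = new ArrayBlockingQueue<>(queueCapacity);
        this.writerThread = new Thread(this::drain, "sink-writer");
        this.writerThread.start();
    }

    /** Called on the operator thread; blocks only when the queue is full. */
    void processBatch(Object[] rows, int size) {
        // Copy so the caller can reuse its batch buffer for the next rows.
        Object[] batch = new Object[size];
        System.arraycopy(rows, 0, batch, 0, size);
        put(batch);
    }

    /** Runs on the writer thread until the end-of-stream marker arrives. */
    private void drain() {
        try {
            while (true) {
                Object[] batch = queue.take();
                if (batch == POISON) {
                    return;
                }
                for (Object row : batch) {
                    written.add(row); // a real sink would write the row out here
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void put(Object[] batch) {
        try {
            queue.put(batch);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while queuing a batch", e);
        }
    }

    /** Signals end of input and waits for the writer thread to finish. */
    @Override
    public void close() {
        put(POISON);
        try {
            writerThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while closing", e);
        }
    }

    /** Safe to read only after close(), once the writer thread has been joined. */
    List<Object> writtenRows() {
        return written;
    }
}
```

      The bounded queue gives back-pressure: if the writer falls behind, the operator thread blocks in processBatch rather than buffering without limit. Passing batches instead of single rows also means one queue operation per batch rather than per row.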


            People

            • Assignee:
              Unassigned
              Reporter:
              He Yongqiang
            • Votes:
              0
              Watchers:
              3
