  1. Hive
  2. HIVE-25874

Slow filter evaluation of nested struct fields in vectorized execution

Details

    Description

      Time is spent resizing vectors, either around here or in some other "ensureSize" method. To reproduce:

      
      create table t as
      select
      named_struct('id',13,'str','string','nest',named_struct('id',12,'str','string','arr',array('value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value')))
      s;
      
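      -- go up to 1M rows: each insert multiplies the row count by 10
      -- (the five active inserts give 100K rows; enable the sixth for 1M)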
      insert into table t select * from t union all select * from t union all select * from t union all select * from t union all select * from t union all select * from t union all select * from t union all select * from t union all select * from t;
      insert into table t select * from t union all select * from t union all select * from t union all select * from t union all select * from t union all select * from t union all select * from t union all select * from t union all select * from t;
      insert into table t select * from t union all select * from t union all select * from t union all select * from t union all select * from t union all select * from t union all select * from t union all select * from t union all select * from t;
      insert into table t select * from t union all select * from t union all select * from t union all select * from t union all select * from t union all select * from t union all select * from t union all select * from t union all select * from t;
      insert into table t select * from t union all select * from t union all select * from t union all select * from t union all select * from t union all select * from t union all select * from t union all select * from t union all select * from t;
      -- insert into table t select * from t union all select * from t union all select * from t union all select * from t union all select * from t union all select * from t union all select * from t union all select * from t union all select * from t;
      
      
      set hive.fetch.task.conversion=none;
      
      select count(1) from t;
      --explain
      select s
      .id from t
      where 
      s
      .nest
      .id  > 0;
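
      To confirm that the filter on s.nest.id actually runs vectorized, the plan can be inspected. This is a minimal sketch, assuming vectorization is toggled through hive.vectorized.execution.enabled and that the Hive version in use supports EXPLAIN VECTORIZATION; neither is stated in this report.

      ```sql
      -- Assumption: vectorization is controlled by this setting in the session.
      set hive.vectorized.execution.enabled=true;
      set hive.fetch.task.conversion=none;

      -- Prints per-operator vectorization details for the slow query,
      -- including whether the filter expression was vectorized.
      explain vectorization detail
      select s.id from t
      where s.nest.id > 0;
      ```

      Running the same explain on the query variants further down should make it easier to see whether the plans differ or only the runtime does.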
      
       

      Interestingly, the issue is not present:

      • for a query that does not look into the nested struct (shown below)
      • when the struct with the array is at the top level
      select count(1) from t;
      --explain
      select s
      .id from t
      where 
      s
      -- .nest
      .id  > 0;
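
      The second case (struct with the array at the top level) has no script in this report. One way it might be set up is sketched here; the table name t_top, the shortened array, and the choice of filter column are assumptions for illustration, not the reporter's actual test.

      ```sql
      -- Hypothetical variant: 'arr' sits directly in the top-level struct,
      -- with no intermediate 'nest' struct.
      create table t_top as
      select
      named_struct('id',13,'str','string','arr',array('value','value','value','value'))
      s;

      -- Grow the table with the same repeated "insert ... union all" pattern
      -- used for t above before timing the query.

      set hive.fetch.task.conversion=none;

      select s.id from t_top
      where s.id > 0;
      ```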
      

            People

              kgyrtkirk Zoltan Haindrich
