IMPALA / IMPALA-2017

Lazy materialization of Parquet columns during query


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: Impala 1.4, Impala 2.0, Impala 2.1, Impala 2.2
    • Fix Version/s: None
    • Component/s: Backend

    Description

      When I run a query over a 4-billion-row table that returns a single row, it takes about 30 seconds with 'select * ...' but only about 3 seconds with 'select field1, field2 ...'. The difference is repeatable.

      Given these times, the 'select *' query appears to materialize all fields of every row, whether or not the row matches the predicate.

      Lazily materializing columns, i.e. evaluating the predicate on the filter column first and reading the remaining columns only for rows that pass, could improve performance.
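The eager-versus-lazy difference can be illustrated with a toy, in-memory model. This is a hypothetical sketch, not Impala's Parquet scanner: a dict of Python lists stands in for one decoded row group, the names `scan_eager`, `scan_lazy`, and `ScanStats` are invented for illustration, and "materializing" a value is approximated by counting how many values are touched. The eager scan assembles every column of every row and then filters; the lazy scan evaluates the predicate on `event_id` alone, builds a selection vector of matching row indices, and fetches the other columns only for those rows.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical stand-in for one decoded Parquet row group: column name -> values.
# A real reader decodes pages from disk; here we only count values touched.
Columns = Dict[str, List[object]]


@dataclass
class ScanStats:
    values_materialized: int = 0


def scan_eager(cols: Columns, pred_col: str, key: object, stats: ScanStats) -> List[dict]:
    """Materialize every column of every row, then apply the predicate."""
    out = []
    for r in range(len(cols[pred_col])):
        row = {name: values[r] for name, values in cols.items()}
        stats.values_materialized += len(cols)  # all columns touched for this row
        if row[pred_col] == key:
            out.append(row)
    return out


def scan_lazy(cols: Columns, pred_col: str, key: object, stats: ScanStats) -> List[dict]:
    """Filter on the predicate column first, then materialize the rest only for matches."""
    pred_values = cols[pred_col]
    stats.values_materialized += len(pred_values)  # only the predicate column is read
    selected = [r for r, v in enumerate(pred_values) if v == key]  # selection vector
    out = []
    for r in selected:
        row = {pred_col: pred_values[r]}  # already decoded during filtering
        for name, values in cols.items():
            if name != pred_col:
                row[name] = values[r]
                stats.values_materialized += 1
        out.append(row)
    return out


# Usage: 1,000 rows, 35 columns (event_id plus 34 others), exactly one match.
cols: Columns = {"event_id": list(range(1000))}
for i in range(34):
    cols[f"c{i}"] = [f"v{i}_{r}" for r in range(1000)]

eager, lazy = ScanStats(), ScanStats()
assert scan_eager(cols, "event_id", 417, eager) == scan_lazy(cols, "event_id", 417, lazy)
print(eager.values_materialized, lazy.values_materialized)  # 35000 1034
```

Both scans return the same row, but the eager one touches 1,000 × 35 values while the lazy one touches 1,000 predicate values plus 34 values for the single match. In a real Parquet scanner the savings would come from skipping the decoding (and ideally the I/O) of non-predicate column pages, so an actual speedup would depend on column widths and encodings rather than on this simple count.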

      These four queries were run back to back. The actual returned data is elided (sorry). The table has 35 fields.

      0: jdbc:hive2://atl1c1r2data09.vldb-bo.secure> select * from events where event_id=1416403791; 
      <elided>
      1 row selected (33.777 seconds)
      0: jdbc:hive2://atl1c1r2data09.vldb-bo.secure> select event_id, client_id from events where event_id=1416403791;
      +-------------+------------+--+
      |  event_id   | client_id  |
      +-------------+------------+--+
      | 1416403791  | <elided>   |
      +-------------+------------+--+
      1 row selected (3.363 seconds)
      0: jdbc:hive2://atl1c1r2data09.vldb-bo.secure> select * from events where event_id=1416403791; 
      <elided>
      1 row selected (33.138 seconds)
      0: jdbc:hive2://atl1c1r2data09.vldb-bo.secure> select event_id, client_id from events where event_id=1416403791;
      +-------------+------------+--+
      |  event_id   | client_id  |
      +-------------+------------+--+
      | 1416403791  | <elided>   |
      +-------------+------------+--+
      1 row selected (3.074 seconds)
      0: jdbc:hive2://atl1c1r2data09.vldb-bo.secure>
      


          People

            Assignee: Abhishek Rawat
            Reporter: Lou Bershad
