PARQUET-1337

Current block alignment logic may lead to several row groups per block

Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: parquet-mr

    Description

      When the size of buffered data gets near the desired row group size, Parquet flushes the data to a row group. However, at this point the data for the last page is not yet encoded or compressed, so the row group may end up significantly smaller than intended.

      If the row group ends up so small that its end is farther from the next disk block boundary than the maximum padding, Parquet will try to create a new row group in the same disk block, this time targeting the remaining space. This one may also be flushed prematurely, creating an even smaller row group, which may in turn be followed by an even smaller one. This repeats until the write position is close enough to the block boundary for padding to be applied. The resulting superfluous row groups can lead to bad read performance.
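
      The cascade described above can be illustrated with a small simulation. This is only a sketch of the behaviour as described in this issue, not parquet-mr's actual code: the function name, the parameters, and in particular the constant `fill_percent` (the share of the targeted size that a prematurely flushed row group really occupies on disk) are assumptions made for illustration.

```python
def simulate_block(block_size, row_group_size, max_padding, fill_percent):
    """Simulate row group creation inside one disk block (simplified model).

    fill_percent is a hypothetical constant: each row group is assumed to end
    up at this percentage of the size it was targeting.
    Returns (list of row group sizes, padding applied at the end of the block).
    """
    sizes = []
    pos = 0
    while True:
        remaining = block_size - pos
        # Close enough to the block boundary: pad and move to the next block.
        if remaining <= max_padding:
            return sizes, remaining
        # Otherwise target the space left in the block (capped at the row group size).
        target = min(row_group_size, remaining)
        # Premature flush: the row group comes out smaller than targeted.
        actual = max(1, target * fill_percent // 100)
        sizes.append(actual)
        pos += actual


# Hypothetical settings: 10 MB block and row group size, 500 kB maximum padding,
# row groups ending up at 65% of their target.
sizes, padding = simulate_block(10_000_000, 10_000_000, 500_000, 65)
print(sizes, padding)  # [6500000, 2275000, 796250] 428750
```

      Under these assumed numbers a single disk block ends up holding three row groups of decreasing size before padding is applied, which mirrors the pattern reported in this issue.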

      An example of the structure of a Parquet file suffering from this problem can be seen below. For easier interpretation, the row groups are visually grouped by disk blocks:

      row group 1:  RC:18774 TS:22182960 OFFSET:       4
      row group 2:  RC: 2896 TS: 3428160 OFFSET: 6574564
      row group 3:  RC: 1964 TS: 2322560 OFFSET: 7679844
      row group 4:  RC: 1074 TS: 1268880 OFFSET: 8732964
      
      row group 5:  RC:18808 TS:22228560 OFFSET:10000000
      row group 6:  RC: 2872 TS: 3389520 OFFSET:16612640
      row group 7:  RC: 1930 TS: 2284960 OFFSET:17716800
      row group 8:  RC: 1040 TS: 1233520 OFFSET:18768240
      
      row group 9:  RC:18852 TS:22275520 OFFSET:20000000
      row group 10: RC: 2831 TS: 3345680 OFFSET:26656320
      row group 11: RC: 1893 TS: 2244640 OFFSET:27757200
      row group 12: RC: 1008 TS: 1195520 OFFSET:28806560
      
      row group 13: RC:18841 TS:22263360 OFFSET:30000000
      row group 14: RC: 2835 TS: 3350480 OFFSET:36652000
      row group 15: RC: 1900 TS: 2249040 OFFSET:37753600
      row group 16: RC: 1016 TS: 1198640 OFFSET:38803600
      
      row group 17: RC: 1466 TS: 1740320 OFFSET:40000000
      

      In this example, both the disk block size and the row group size were set to 10000000. The data would fit in 5 row groups of this size, but instead, each of the disk blocks (except the last) is split into 4 row groups of progressively decreasing size.
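
      The grouping in the listing can be checked mechanically. The following sketch parses the listing above and assigns each row group to a disk block by its starting offset. The helper name and the regular expression are illustrative assumptions, and RC is taken to be the row count of each row group.

```python
import re
from collections import defaultdict

BLOCK_SIZE = 10_000_000  # disk block size stated in the description

LISTING = """\
row group 1:  RC:18774 TS:22182960 OFFSET:       4
row group 2:  RC: 2896 TS: 3428160 OFFSET: 6574564
row group 3:  RC: 1964 TS: 2322560 OFFSET: 7679844
row group 4:  RC: 1074 TS: 1268880 OFFSET: 8732964
row group 5:  RC:18808 TS:22228560 OFFSET:10000000
row group 6:  RC: 2872 TS: 3389520 OFFSET:16612640
row group 7:  RC: 1930 TS: 2284960 OFFSET:17716800
row group 8:  RC: 1040 TS: 1233520 OFFSET:18768240
row group 9:  RC:18852 TS:22275520 OFFSET:20000000
row group 10: RC: 2831 TS: 3345680 OFFSET:26656320
row group 11: RC: 1893 TS: 2244640 OFFSET:27757200
row group 12: RC: 1008 TS: 1195520 OFFSET:28806560
row group 13: RC:18841 TS:22263360 OFFSET:30000000
row group 14: RC: 2835 TS: 3350480 OFFSET:36652000
row group 15: RC: 1900 TS: 2249040 OFFSET:37753600
row group 16: RC: 1016 TS: 1198640 OFFSET:38803600
row group 17: RC: 1466 TS: 1740320 OFFSET:40000000
"""

ROW = re.compile(r"row group\s+(\d+):\s+RC:\s*(\d+)\s+TS:\s*(\d+)\s+OFFSET:\s*(\d+)")


def group_by_block(listing, block_size):
    """Map disk block index -> list of (row group number, row count)."""
    blocks = defaultdict(list)
    for match in ROW.finditer(listing):
        number, rows, _total_size, offset = map(int, match.groups())
        # A row group belongs to the block in which its starting offset falls.
        blocks[offset // block_size].append((number, rows))
    return dict(blocks)


for block, groups in sorted(group_by_block(LISTING, BLOCK_SIZE).items()):
    numbers = [number for number, _ in groups]
    rows = sum(count for _, count in groups)
    print(f"block {block}: row groups {numbers}, {rows} rows")
```

      Running this reproduces the visual grouping: four row groups in each of the first four disk blocks and a single row group in the last one.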

          People

            Assignee: Unassigned
            Reporter: Gabor Szadovszky (gszadovszky)
            Votes: 1
            Watchers: 8
