KUDU-2437

Split a tablet into primary key ranges by size

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.8.0
    • Component/s: client, tablet
    • Labels: None

      Description

      When reading a Kudu table with Spark, a tablet that holds a large amount of data takes a long time to read. The reason is that KuduRDD generates one scan token per tablet, so a single Spark task has to process all the data in that tablet.

      We think the TabletServer should provide an RPC interface that splits a tablet into multiple primary key ranges by size. The Kudu client can then choose whether to scan those ranges in parallel, depending on the case.
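To illustrate the client-side choice described above, here is a minimal sketch in Python. It is not the Kudu client API: `scan_ranges`, `make_fake_scanner`, and the tuple representation of a key range are hypothetical names introduced only for this example. It assumes the server has already returned a list of `[start, stop)` primary key ranges and that the client has some function that scans one range.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for a key range: (inclusive start, exclusive stop).
# None means the range is unbounded on that side.
KeyRange = Tuple[Optional[bytes], Optional[bytes]]


def scan_ranges(
    ranges: List[KeyRange],
    scan_range: Callable[[KeyRange], List[bytes]],
) -> List[bytes]:
    """Scan one range serially, or several ranges in parallel.

    Results are returned in the order of `ranges`, so sorted,
    non-overlapping ranges yield rows in primary key order.
    """
    if len(ranges) <= 1:
        # Nothing to parallelize: behave like a single per-tablet scan.
        return [row for r in ranges for row in scan_range(r)]
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        # Executor.map preserves the input order of the ranges.
        per_range = list(pool.map(scan_range, ranges))
    return [row for rows in per_range for row in rows]


def make_fake_scanner(keys: List[bytes]) -> Callable[[KeyRange], List[bytes]]:
    """Build an in-memory scanner over sorted keys (demo stand-in only)."""
    def scan_range(key_range: KeyRange) -> List[bytes]:
        start, stop = key_range
        return [
            k for k in keys
            if (start is None or k >= start) and (stop is None or k < stop)
        ]
    return scan_range
```

With one range the sketch degrades to the existing one-task-per-tablet behaviour. With several ranges each one becomes an independent unit of work, which is what would let a Spark job assign more than one task to a large tablet.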

      RPC interface:

      // A request to split a tablet into primary key ranges. The request
      // does not change the layout of the tablet.
      message SplitKeyRangeRequestPB {
       required bytes tablet_id = 1;
      
       // Encoded primary key to begin scanning at (inclusive).
       optional bytes start_primary_key = 2 [(kudu.REDACT) = true];
       // Encoded primary key to stop scanning at (exclusive).
       optional bytes stop_primary_key = 3 [(kudu.REDACT) = true];
      
       // Number of bytes to try to return in each chunk. This is a hint.
       // The tablet server may return chunks larger or smaller than this value.
       optional uint64 target_chunk_size_bytes = 4;
      
       // The columns to consider when chunking.
       // If specified, then the size estimate used for 'target_chunk_size_bytes'
       // should only include these columns. This can be used if a query will
       // only scan a certain subset of the columns.
       repeated ColumnSchemaPB columns = 5;
      }
      
      // The primary key range of a Kudu tablet.
      message KeyRangePB {
       // Encoded primary key to begin scanning at (inclusive).
       optional bytes start_primary_key = 1 [(kudu.REDACT) = true];
       // Encoded primary key to stop scanning at (exclusive).
       optional bytes stop_primary_key = 2 [(kudu.REDACT) = true];
       // Estimated number of bytes in this chunk.
       required uint64 size_bytes_estimates = 3;
      }
      
      message SplitKeyRangeResponsePB {
       // The error, if an error occurred with this request.
       optional TabletServerErrorPB error = 1;
      
       repeated KeyRangePB ranges = 2;
      }
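
The intended semantics of `target_chunk_size_bytes` and `KeyRangePB` can be sketched as a greedy grouping over size estimates. This is only an illustration under stated assumptions, not the tablet server implementation: the `blocks` input (sorted pairs of first key and estimated size) is a hypothetical stand-in for whatever size information the server keeps, and `split_key_range` and the `KeyRange` dataclass are names invented for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class KeyRange:
    """Mirrors the fields of KeyRangePB."""
    start_primary_key: Optional[bytes]  # inclusive; None means unbounded
    stop_primary_key: Optional[bytes]   # exclusive; None means unbounded
    size_bytes_estimates: int


def split_key_range(
    blocks: List[Tuple[bytes, int]],
    target_chunk_size_bytes: int,
    start_primary_key: Optional[bytes] = None,
    stop_primary_key: Optional[bytes] = None,
) -> List[KeyRange]:
    """Greedily group sorted (first_key, size_bytes) blocks into ranges.

    `blocks` is assumed to be sorted by key and already limited to the
    requested [start_primary_key, stop_primary_key) interval.
    """
    if target_chunk_size_bytes <= 0:
        raise ValueError("target_chunk_size_bytes must be positive")

    ranges: List[KeyRange] = []
    chunk_start = start_primary_key
    chunk_size = 0
    for first_key, size_bytes in blocks:
        # Close the current chunk once it has reached the target. The
        # target is a hint, so a chunk may end up larger or smaller.
        if chunk_size >= target_chunk_size_bytes:
            ranges.append(KeyRange(chunk_start, first_key, chunk_size))
            chunk_start = first_key
            chunk_size = 0
        chunk_size += size_bytes
    # The last chunk runs to the requested stop key.
    ranges.append(KeyRange(chunk_start, stop_primary_key, chunk_size))
    return ranges
```

Because each chunk stops exactly where the next one starts (exclusive stop, inclusive start), the returned ranges cover the requested interval without gaps or overlap, and nothing about the tablet itself is modified.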
      
              People

              • Assignee: oclarms Xu Yao
              • Reporter: oclarms Xu Yao