Apache Hudi / HUDI-5516

Reduce memory footprint on workloads with thousands of active partitions



    Description

      We can reduce the memory footprint of workloads with thousands of active partitions between checkpoints. Such workloads arise with a wide checkpoint interval; more specifically, an active partition here is a special case of an active fileId.
      The write client holds a map of write handles in order to create a ReplaceHandle between checkpoints. This leads to OutOfMemoryError on such workloads because each write handle is a huge object.
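
      To illustrate the pattern (a minimal sketch, not Hudi's actual API; the names HandleRegistry, WriteHandle, WriteMetadata, and onHandleFinished below are hypothetical), the difference is whether the per-fileId map retains the full handle or only the small piece of metadata needed later to build the replace/commit:

      import java.util.HashMap;
      import java.util.Map;

      // Hypothetical sketch of the memory pattern described above: between two
      // checkpoints the writer keeps one entry per active fileId. If that entry
      // is the full write handle (buffers, streams, record maps), heap usage grows
      // with the number of active partitions/fileIds and can end in OutOfMemoryError.
      class HandleRegistry {

          // Lightweight result of a finished write: just enough to reference the file later.
          static class WriteMetadata {
              final String fileId;
              WriteMetadata(String fileId) { this.fileId = fileId; }
          }

          // A write handle is a heavy object (the buffer stands in for its real state).
          static class WriteHandle {
              final String fileId;
              final byte[] writeBuffer = new byte[1 << 20];
              WriteHandle(String fileId) { this.fileId = fileId; }
              WriteMetadata close() { return new WriteMetadata(fileId); }
          }

          // Problematic: O(active fileIds) heavy handles retained until the next checkpoint.
          final Map<String, WriteHandle> heavyCache = new HashMap<>();

          // Reduced footprint: close each handle eagerly and keep only its metadata.
          final Map<String, WriteMetadata> lightCache = new HashMap<>();

          void onHandleFinished(WriteHandle handle) {
              // Instead of heavyCache.put(handle.fileId, handle) ...
              lightCache.put(handle.fileId, handle.close());
          }
      }

      With thousands of active partitions between checkpoints, keeping only the metadata bounds the retained state per fileId to a few fields instead of a whole handle.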

      
      create table source (
          `id` int,
          `data` string
      ) with (
          'connector' = 'datagen',
          'rows-per-second' = '100',
          'fields.id.kind' = 'sequence',
          'fields.id.start' = '0',
          'fields.id.end' = '3000'
      );
      create table sink (
          `id` int primary key not enforced,
          `data` string,
          `part` string
      ) partitioned by (`part`) with (
          'connector' = 'hudi',
          'path' = '/tmp/sink',
          'write.batch.size' = '0.001',  -- 1024 bytes
          'write.task.max.size' = '101.001',  -- 101.001MB
          'write.merge.max_memory' = '1'  -- 1024 bytes
      );
      
      insert into sink select `id`, `data`, concat('part', cast(`id` as string)) as `part` from source;
      
      

       


            People

              danny0405 Danny Chen
              trushev Alexander Trushev
              Votes: 0
              Watchers: 2
