FLINK-20538

sink.rolling-policy.file-size does not work in filesystem connector

Details

    Description

      When I use the SQL filesystem connector to write data to HDFS and set sink.rolling-policy.file-size to 50MB, the setting does not seem to take effect: files larger than 100MB are still produced.

      My table DDL is:

       

      CREATE TABLE cpc_bd_recall_log_hdfs (
         log_timestamp BIGINT,
         ip STRING,
         `raw` STRING,
         `day` STRING, `hour` STRING,`minute` STRING
      ) PARTITIONED BY (`day` , `hour` ,`minute`) WITH (
         'connector'='filesystem',
         'path'='hdfs://xxx/test.db/hdfs_test',
         'format'='parquet',
         'parquet.compression'='SNAPPY',
         'sink.rolling-policy.file-size' = '50MB',
         'sink.partition-commit.policy.kind' = 'success-file',
         'sink.partition-commit.delay'='60s'
      );
      

      The resulting HDFS files are:

       

       

           0 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/_SUCCESS
      -rw-r--r--   3 hadoop hadoop     31.7 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-0-2500
      -rw-r--r--   3 hadoop hadoop    121.8 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-0-2501
      -rw-r--r--   3 hadoop hadoop     31.9 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-1-2499
      -rw-r--r--   3 hadoop hadoop    122.0 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-1-2500
      -rw-r--r--   3 hadoop hadoop     31.8 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-10-2501
      -rw-r--r--   3 hadoop hadoop    121.8 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-10-2502
      -rw-r--r--   3 hadoop hadoop     31.9 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-11-2500
      -rw-r--r--   3 hadoop hadoop    122.2 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-11-2501
      -rw-r--r--   3 hadoop hadoop     31.9 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-12-2500
      -rw-r--r--   3 hadoop hadoop    122.2 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-12-2501
      -rw-r--r--   3 hadoop hadoop     31.8 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-13-2499
      -rw-r--r--   3 hadoop hadoop    122.0 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-13-2500
      -rw-r--r--   3 hadoop hadoop     31.6 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-14-2500
      -rw-r--r--   3 hadoop hadoop    122.1 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-14-2501
      -rw-r--r--   3 hadoop hadoop     31.9 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-15-2498
      -rw-r--r--   3 hadoop hadoop    121.8 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-15-2499
      -rw-r--r--   3 hadoop hadoop     31.7 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-16-2501
      -rw-r--r--   3 hadoop hadoop    122.0 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-16-2502
      -rw-r--r--   3 hadoop hadoop     31.7 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-17-2500
      -rw-r--r--   3 hadoop hadoop    122.5 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-17-2501
      -rw-r--r--   3 hadoop hadoop     31.8 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-18-2500
      -rw-r--r--   3 hadoop hadoop    121.7 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-18-2501
      -rw-r--r--   3 hadoop hadoop     31.9 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-19-2501
      -rw-r--r--   3 hadoop hadoop    121.7 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-19-2502
      -rw-r--r--   3 hadoop hadoop     31.6 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-2-2499
      -rw-r--r--   3 hadoop hadoop    121.6 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-2-2500
      -rw-r--r--   3 hadoop hadoop     31.8 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-3-2500
      -rw-r--r--   3 hadoop hadoop    121.8 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-3-2501
      -rw-r--r--   3 hadoop hadoop     31.6 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-4-2499
      -rw-r--r--   3 hadoop hadoop    122.1 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-4-2500
      -rw-r--r--   3 hadoop hadoop     31.6 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-5-2499
      -rw-r--r--   3 hadoop hadoop    121.8 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-5-2500
      -rw-r--r--   3 hadoop hadoop     31.8 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-6-2499
      -rw-r--r--   3 hadoop hadoop    121.5 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-6-2500
      -rw-r--r--   3 hadoop hadoop     31.6 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-7-2500
      -rw-r--r--   3 hadoop hadoop    122.0 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-7-2501
      -rw-r--r--   3 hadoop hadoop     31.7 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-8-2501
      -rw-r--r--   3 hadoop hadoop    122.0 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-8-2502
      -rw-r--r--   3 hadoop hadoop     31.9 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-9-2501
      -rw-r--r--   3 hadoop hadoop    121.9 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-9-2502
      

       

       

      However, when I dug into the source code, I found that writing an element to a bucket invokes `shouldRollOnEvent` in TableRollingPolicy, so I would expect the size limit to be checked on each write.

      I don't understand how this can happen. Is this a bug, or am I misunderstanding something?
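
      To make the expectation concrete, here is a minimal, hypothetical sketch of the kind of per-record size check I assumed `shouldRollOnEvent` performs. It is not the actual Flink source: the class names `SizeRollingSketch`, `PartFileInfo` and `SizeBasedPolicy` are made up for illustration, and the sizes are only rough stand-ins for the 50MB limit and the file sizes listed above.

```java
// Hypothetical, simplified sketch of a size-based "roll on event" check.
// This is NOT Flink's TableRollingPolicy; all names here are illustrative.
public class SizeRollingSketch {

    /** Minimal stand-in for the part-file state a rolling policy inspects. */
    static final class PartFileInfo {
        private final long sizeInBytes;

        PartFileInfo(long sizeInBytes) {
            this.sizeInBytes = sizeInBytes;
        }

        long getSize() {
            return sizeInBytes;
        }
    }

    /** Rolls the in-progress part file once its reported size exceeds the limit. */
    static final class SizeBasedPolicy {
        private final long maxPartSizeBytes;

        SizeBasedPolicy(long maxPartSizeBytes) {
            this.maxPartSizeBytes = maxPartSizeBytes;
        }

        /** Called for every written element; true means "close this file and start a new one". */
        boolean shouldRollOnEvent(PartFileInfo partFile) {
            return partFile.getSize() > maxPartSizeBytes;
        }
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        SizeBasedPolicy policy = new SizeBasedPolicy(50 * mb); // assumed 50MB limit

        // A file of roughly 32MB is below the limit, so no roll is expected.
        System.out.println(policy.shouldRollOnEvent(new PartFileInfo(32 * mb)));  // prints "false"
        // A file of roughly 122MB is above the limit, so a roll is expected.
        System.out.println(policy.shouldRollOnEvent(new PartFileInfo(122 * mb))); // prints "true"
    }
}
```

      Under this assumption a part file should be rolled shortly after its reported size passes 50MB. The check can only be as accurate as the size the writer reports for the in-progress file, which this sketch simply takes as given.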

       


          People

            Assignee: Unassigned
            Reporter: ZhuShang (zhuxiaoshang)
            Votes: 0
            Watchers: 4
