Details
Type: Bug
Status: Open
Priority: Not a Priority
Resolution: Unresolved
Affects Version/s: 1.11.1
Fix Version/s: None
Description
When I use the SQL filesystem connector to write data to HDFS with sink.rolling-policy.file-size set to 50MB, the setting does not seem to take effect: there are still files larger than 100 MB.
My table DDL is:
CREATE TABLE cpc_bd_recall_log_hdfs (
  log_timestamp BIGINT,
  ip STRING,
  `raw` STRING,
  `day` STRING,
  `hour` STRING,
  `minute` STRING
) PARTITIONED BY (`day`, `hour`, `minute`) WITH (
  'connector' = 'filesystem',
  'path' = 'hdfs://xxx/test.db/hdfs_test',
  'format' = 'parquet',
  'parquet.compression' = 'SNAPPY',
  'sink.rolling-policy.file-size' = '50MB',
  'sink.partition-commit.policy.kind' = 'success-file',
  'sink.partition-commit.delay' = '60s'
);
The resulting HDFS files are:
0 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/_SUCCESS
-rw-r--r-- 3 hadoop hadoop 31.7 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-0-2500
-rw-r--r-- 3 hadoop hadoop 121.8 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-0-2501
-rw-r--r-- 3 hadoop hadoop 31.9 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-1-2499
-rw-r--r-- 3 hadoop hadoop 122.0 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-1-2500
-rw-r--r-- 3 hadoop hadoop 31.8 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-10-2501
-rw-r--r-- 3 hadoop hadoop 121.8 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-10-2502
-rw-r--r-- 3 hadoop hadoop 31.9 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-11-2500
-rw-r--r-- 3 hadoop hadoop 122.2 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-11-2501
-rw-r--r-- 3 hadoop hadoop 31.9 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-12-2500
-rw-r--r-- 3 hadoop hadoop 122.2 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-12-2501
-rw-r--r-- 3 hadoop hadoop 31.8 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-13-2499
-rw-r--r-- 3 hadoop hadoop 122.0 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-13-2500
-rw-r--r-- 3 hadoop hadoop 31.6 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-14-2500
-rw-r--r-- 3 hadoop hadoop 122.1 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-14-2501
-rw-r--r-- 3 hadoop hadoop 31.9 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-15-2498
-rw-r--r-- 3 hadoop hadoop 121.8 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-15-2499
-rw-r--r-- 3 hadoop hadoop 31.7 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-16-2501
-rw-r--r-- 3 hadoop hadoop 122.0 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-16-2502
-rw-r--r-- 3 hadoop hadoop 31.7 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-17-2500
-rw-r--r-- 3 hadoop hadoop 122.5 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-17-2501
-rw-r--r-- 3 hadoop hadoop 31.8 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-18-2500
-rw-r--r-- 3 hadoop hadoop 121.7 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-18-2501
-rw-r--r-- 3 hadoop hadoop 31.9 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-19-2501
-rw-r--r-- 3 hadoop hadoop 121.7 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-19-2502
-rw-r--r-- 3 hadoop hadoop 31.6 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-2-2499
-rw-r--r-- 3 hadoop hadoop 121.6 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-2-2500
-rw-r--r-- 3 hadoop hadoop 31.8 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-3-2500
-rw-r--r-- 3 hadoop hadoop 121.8 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-3-2501
-rw-r--r-- 3 hadoop hadoop 31.6 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-4-2499
-rw-r--r-- 3 hadoop hadoop 122.1 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-4-2500
-rw-r--r-- 3 hadoop hadoop 31.6 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-5-2499
-rw-r--r-- 3 hadoop hadoop 121.8 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-5-2500
-rw-r--r-- 3 hadoop hadoop 31.8 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-6-2499
-rw-r--r-- 3 hadoop hadoop 121.5 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-6-2500
-rw-r--r-- 3 hadoop hadoop 31.6 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-7-2500
-rw-r--r-- 3 hadoop hadoop 122.0 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-7-2501
-rw-r--r-- 3 hadoop hadoop 31.7 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-8-2501
-rw-r--r-- 3 hadoop hadoop 122.0 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-8-2502
-rw-r--r-- 3 hadoop hadoop 31.9 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-9-2501
-rw-r--r-- 3 hadoop hadoop 121.9 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-9-2502
However, when I dug into the source code, I found that writing an element to a bucket invokes `shouldRollOnEvent` in TableRollingPolicy, so I would expect the size limit to be checked on every write.
I don't understand how this can happen. Is this a bug, or am I getting something wrong?
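The expectation behind this report can be sketched as a per-element size check. The class below is a hypothetical, simplified model for illustration only: it is not Flink's TableRollingPolicy, and the class name, field, and comparison are assumptions.

```java
// Hypothetical, simplified model of a size-based rolling check.
// NOT Flink's TableRollingPolicy; the names and logic here are illustrative assumptions.
class SizeRollingSketch {
    private final long maxPartSizeBytes;

    SizeRollingSketch(long maxPartSizeBytes) {
        this.maxPartSizeBytes = maxPartSizeBytes;
    }

    // Called once per written element with the size the part file reports at that moment.
    boolean shouldRollOnEvent(long reportedPartSizeBytes) {
        return reportedPartSizeBytes > maxPartSizeBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        SizeRollingSketch policy = new SizeRollingSketch(50 * mb);
        // A part file reporting about 32 MB stays open under this model.
        System.out.println(policy.shouldRollOnEvent(32 * mb));
        // A part file reporting about 122 MB should already have been rolled.
        System.out.println(policy.shouldRollOnEvent(122 * mb));
    }
}
```

Under this model a part file would roll shortly after its reported size crosses 50 MB, provided the reported size tracks the bytes actually written. The 121 MB+ files in the listing above appear to contradict that.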