Details
- Sub-task
- Status: Closed
- Blocker
- Resolution: Fixed
- None
- None
Description
This issue aims to verify FLINK-29635.
Please verify the following in batch mode (the documentation is at https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/hive/hive_read_write/#file-compaction):
1. Enable auto-compaction and write some data to a Hive table such that the average file size is less than compaction.small-files.avg-size (16MB by default); verify that these files are merged.
2. Enable auto-compaction, set compaction.small-files.avg-size to a smaller value, then write some data to a Hive table such that the average file size is greater than compaction.small-files.avg-size; verify that these files are not merged.
3. Set sink.parallelism manually and check that the parallelism of the compact operator is equal to sink.parallelism.
4. Set compaction.parallelism manually and check that the parallelism of the compact operator is equal to compaction.parallelism.
5. Set compaction.file-size and check that the size of each merged target file is about `compaction.file-size`.
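For checks 1, 2 and 5, the file sizes under the table (or partition) directory can be inspected after the job finishes. The following is only a rough helper sketch, not part of Flink: it assumes the table location is a local directory, that files whose names start with `.` or `_` are not data files, and that at most one merged file is a smaller remainder. The function names and the 25% tolerance are hypothetical.

```python
import os

MB = 1024 * 1024


def data_file_sizes(table_dir):
    """Return the sizes (bytes) of data files under table_dir.

    Assumption: names starting with '.' or '_' are hidden or marker
    files/directories and are skipped.
    """
    sizes = []
    for root, dirs, files in os.walk(table_dir):
        # Prune hidden/marker directories in place so os.walk skips them.
        dirs[:] = [d for d in dirs if not d.startswith((".", "_"))]
        for name in files:
            if not name.startswith((".", "_")):
                sizes.append(os.path.getsize(os.path.join(root, name)))
    return sizes


def expect_compaction(sizes, avg_size_threshold=16 * MB):
    """Checks 1 and 2: merging is expected when the average size of the
    written files is below compaction.small-files.avg-size."""
    if not sizes:
        return False
    return sum(sizes) / len(sizes) < avg_size_threshold


def merged_sizes_plausible(sizes, target_size, tolerance=0.25):
    """Check 5: every merged file is 'about' compaction.file-size.

    Assumption: the smallest file may be a remainder and is exempt from
    the lower bound; no file may exceed the target by more than tolerance.
    """
    if not sizes:
        return False
    ordered = sorted(sizes)
    if ordered[-1] > target_size * (1 + tolerance):
        return False
    return all(s >= target_size * (1 - tolerance) for s in ordered[1:])
```

`expect_compaction` applies to the sizes of the files as written before merging (for example, from a run with auto-compaction disabled), while `merged_sizes_plausible` applies to the files left after merging. With several partitions or a compact parallelism greater than 1 there may be more than one remainder file, so it may be more meaningful to run the size check per partition directory.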
We should verify this when writing to a non-partitioned table, a static-partition table, and a dynamic-partition table.
Example SQL for creating and writing to Hive tables can be found in the codebase: [HiveTableCompactSinkITCase|https://github.com/apache/flink/blob/0915c9850d861165e283acc0f60545cd836f0567/flink-connectors/flink-connector-hive/src/test/java/org/apache/flink/connectors/hive/HiveTableCompactSinkITCase.java].
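To cover the three table kinds with the same set of options, the statements can be generated rather than written by hand. The sketch below is illustrative only: it assumes the options named above are passed as table properties in Hive-dialect DDL, and the table names, column names and partition value are hypothetical. Check the linked test for the exact statements used in the codebase.

```python
def tbl_properties(options):
    """Render a dict of option names/values as a TBLPROPERTIES clause."""
    pairs = ", ".join("'%s' = '%s'" % (k, v) for k, v in options.items())
    return "TBLPROPERTIES (%s)" % pairs


def create_table_sql(name, options, partitioned=False):
    """Build a CREATE TABLE statement; column names a, b, p are hypothetical."""
    part = " PARTITIONED BY (p STRING)" if partitioned else ""
    return "CREATE TABLE %s (a INT, b STRING)%s %s" % (
        name, part, tbl_properties(options))


def insert_sql(name, source, mode):
    """Build an INSERT for one of the three table kinds to verify."""
    if mode == "non-partitioned":
        return "INSERT INTO %s SELECT a, b FROM %s" % (name, source)
    if mode == "static":
        # Partition value is fixed in the statement.
        return "INSERT INTO %s PARTITION (p = 'p1') SELECT a, b FROM %s" % (
            name, source)
    if mode == "dynamic":
        # Partition value comes from the last selected column.
        return "INSERT INTO %s SELECT a, b, p FROM %s" % (name, source)
    raise ValueError("unknown mode: %s" % mode)


# Options for check 2 combined with check 3 (values are examples only).
options = {
    "auto-compaction": "true",
    "compaction.small-files.avg-size": "1b",
    "sink.parallelism": "4",
}
for mode in ("non-partitioned", "static", "dynamic"):
    print(create_table_sql("t_sink", options, partitioned=(mode != "non-partitioned")))
    print(insert_sql("t_sink", "t_src", mode))
```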
Attachments
Issue Links
- is caused by: FLINK-29635 Hive sink should support merge small files in batch mode (Resolved)
- relates to: FLINK-31132 compact without setting parallelism does not follow the configured sink parallelism for HiveTableSink (Closed)