Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version: 1.19.0
- Fix Version: None
- Component: None
Description
The Streaming Hive Source cannot track tables that have more than 32,767 partitions.
Root Cause:
The Streaming Hive Source uses the following code to list all partitions of a table, in HivePartitionFetcherContextBase.java:

```java
@Override
public List<ComparablePartitionValue> getComparablePartitionValueList() throws Exception {
    List<ComparablePartitionValue> partitionValueList = new ArrayList<>();
    switch (partitionOrder) {
        case PARTITION_NAME:
            List<String> partitionNames =
                    metaStoreClient.listPartitionNames(
                            tablePath.getDatabaseName(),
                            tablePath.getObjectName(),
                            Short.MAX_VALUE);
            for (String partitionName : partitionNames) {
                partitionValueList.add(getComparablePartitionByName(partitionName));
            }
            break;
        case CREATE_TIME:
            Map<List<String>, Long> partValuesToCreateTime = new HashMap<>();
            partitionNames =
                    metaStoreClient.listPartitionNames(
                            tablePath.getDatabaseName(),
                            tablePath.getObjectName(),
                            Short.MAX_VALUE);
            // ...
```
Here `metaStoreClient` is a wrapper around `IMetaStoreClient`, whose `listPartitionNames` method takes a `short` limit and can therefore return at most `Short.MAX_VALUE` (32,767) partition names.
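The root of the cap is Java's `short` type itself: the limit parameter is a signed 16-bit value, so no call through this signature can ask for more than 32,767 names, and casting a larger count down to `short` silently wraps. A minimal, self-contained demonstration (the `partitionCount` value is hypothetical):

```java
public class ShortLimitDemo {
    public static void main(String[] args) {
        // The hard cap passed to listPartitionNames.
        System.out.println(Short.MAX_VALUE); // prints 32767

        // A table with more partitions than the short limit can express.
        int partitionCount = 50_000;

        // Narrowing a larger count to short wraps around and goes negative,
        // so a short-typed limit can never represent such a table's size.
        short capped = (short) partitionCount;
        System.out.println(capped); // prints -15536
    }
}
```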
For tables with more partitions than that, the source fails to track new partitions and cannot read from them.
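One possible direction for a fix: the Hive metastore API conventionally treats a negative `max_parts` as "no limit", so passing `(short) -1` instead of `Short.MAX_VALUE` would return every partition name. The sketch below illustrates that convention with a hypothetical `PartitionLister` stand-in and fabricated partition names; it is not the real Hive client:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ListAllPartitionsSketch {
    // Hypothetical stand-in with the same (db, table, max) shape as
    // IMetaStoreClient.listPartitionNames; a negative max means "no limit".
    interface PartitionLister {
        List<String> listPartitionNames(String db, String table, short maxParts);
    }

    public static void main(String[] args) {
        // Fake metastore holding more partitions than Short.MAX_VALUE.
        List<String> all = IntStream.range(0, 40_000)
                .mapToObj(i -> "p=" + i)
                .collect(Collectors.toList());

        PartitionLister client = (db, table, maxParts) ->
                maxParts < 0 ? all : all.subList(0, Math.min(maxParts, all.size()));

        // Short.MAX_VALUE caps the result; -1 returns every partition.
        System.out.println(
                client.listPartitionNames("db", "tbl", Short.MAX_VALUE).size()); // prints 32767
        System.out.println(
                client.listPartitionNames("db", "tbl", (short) -1).size()); // prints 40000
    }
}
```

An alternative would be to page through partition names in chunks of `Short.MAX_VALUE`, but the metastore API's name-listing call offers no offset parameter, which makes the "no limit" value the simpler route.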