Flink / FLINK-35118

StreamingHiveSource cannot track tables that have more than 32,767 partitions


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.19.0
    • Fix Version/s: None
    • Component/s: Connectors / Hive
    • Labels: None

    Description

      Description:

      The Streaming Hive Source cannot track tables that have more than 32,767 partitions.

      Root Cause:

      The Streaming Hive Source uses the following code to list all partitions of a table:

      (GitHub link)

      HivePartitionFetcherContextBase.java:

          @Override
          public List<ComparablePartitionValue> getComparablePartitionValueList() throws Exception {
              List<ComparablePartitionValue> partitionValueList = new ArrayList<>();
              switch (partitionOrder) {
                  case PARTITION_NAME:
                      // The limit argument is a short, so at most Short.MAX_VALUE (32,767) names come back.
                      List<String> partitionNames =
                              metaStoreClient.listPartitionNames(
                                      tablePath.getDatabaseName(),
                                      tablePath.getObjectName(),
                                      Short.MAX_VALUE);
                      for (String partitionName : partitionNames) {
                          partitionValueList.add(getComparablePartitionByName(partitionName));
                      }
                      break;
                  case CREATE_TIME:
                      Map<List<String>, Long> partValuesToCreateTime = new HashMap<>();
                      // Same cap here: only up to 32,767 partition names are listed.
                      partitionNames =
                              metaStoreClient.listPartitionNames(
                                      tablePath.getDatabaseName(),
                                      tablePath.getObjectName(),
                                      Short.MAX_VALUE);
                      // ... (the rest of the method is not included in this excerpt)
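      The cap in both calls follows from the type of the limit argument. Assuming, as the excerpt suggests, that the limit is declared as a `short`, no value above 32,767 can be passed, and casting a larger count simply wraps around to a negative number. A small standalone check in plain Java (no Hive dependency; the class and method names are hypothetical):

```java
public class ShortLimitDemo {

    // Largest value a Java short can hold; this is the limit the excerpt passes.
    static int maxListable() {
        return Short.MAX_VALUE;
    }

    // Narrowing an int to short keeps only the low 16 bits,
    // so a request for 40,000 partitions would wrap to a negative number.
    static short narrow(int requested) {
        return (short) requested;
    }

    public static void main(String[] args) {
        System.out.println("Short.MAX_VALUE = " + maxListable()); // 32767
        System.out.println("(short) 40000   = " + narrow(40_000)); // -25536
    }
}
```

      In other words, simply passing a larger number through a `short` parameter cannot raise the limit.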

      Here, `metaStoreClient` is a wrapper around Hive's `IMetaStoreClient`, and its `listPartitionNames` method takes the maximum number of partitions to return as a `short`. Because `Short.MAX_VALUE` is passed as that limit, at most 32,767 partition names are returned.

      For tables with more partitions than that, the source fails to track new partitions and to read from them.
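      The effect on partition discovery can be simulated without Hive. The sketch below uses a hypothetical in-memory stand-in for the metastore that returns at most `maxParts` names. It assumes names come back in sorted order and that new partitions sort after existing ones (for example, zero-padded IDs or dates); real metastore ordering may differ. Under those assumptions, a table with 40,000 partitions never exposes its newest partition:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class TruncatedListingDemo {

    // Hypothetical metastore stand-in (not the Hive or Flink API):
    // returns at most maxParts names, in sorted order (an assumption).
    static List<String> listPartitionNames(TreeSet<String> partitions, short maxParts) {
        List<String> result = new ArrayList<>();
        for (String name : partitions) {
            if (result.size() >= maxParts) {
                break; // the limit is reached; remaining partitions are silently dropped
            }
            result.add(name);
        }
        return result;
    }

    // Builds a table with `count` partitions named "id=00000", "id=00001", ...
    static TreeSet<String> table(int count) {
        TreeSet<String> partitions = new TreeSet<>();
        for (int i = 0; i < count; i++) {
            partitions.add(String.format("id=%05d", i));
        }
        return partitions;
    }

    // Returns true if one listing round with the Short.MAX_VALUE limit sees the partition.
    static boolean isVisible(TreeSet<String> partitions, String partition) {
        Set<String> seen = new TreeSet<>(listPartitionNames(partitions, Short.MAX_VALUE));
        return seen.contains(partition);
    }

    public static void main(String[] args) {
        // Small table: the newest partition is within the limit and is discovered.
        TreeSet<String> small = table(100);
        System.out.println(isVisible(small, "id=00099")); // true

        // Large table: only the first 32,767 of 40,000 names are listed.
        TreeSet<String> large = table(40_000);
        System.out.println(listPartitionNames(large, Short.MAX_VALUE).size()); // 32767
        System.out.println(isVisible(large, "id=39999")); // false
    }
}
```

      The truncation is silent: the caller receives a normal-looking list, so nothing signals that partitions were dropped.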


          People

            Assignee: Unassigned
            Reporter: roland (heywxl)
            Votes: 0
            Watchers: 1
