Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-10422

Follow AWS specs in Kinesis Consumer

    XMLWordPrintableJSON

Details

    Description

      Related conversation in mailing list:

      https://lists.apache.org/thread.html/96de3bac9761564767cf283b58d664f5ae1b076e0c4431620552af5b@%3Cdev.flink.apache.org%3E

      Summary:

      Flink Kinesis consumer checks shards id for a particular pattern:

      "^shardId-\\d{12}"
      

      https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/model/StreamShardHandle.java#L132

      While this inlines with current Kinesis streams server implementation (all streams follows this pattern), it confronts with AWS docs:

       

      ShardId
       The unique identifier of the shard within the stream.
       Type: String
       Length Constraints: Minimum length of 1. Maximum length of 128.
      Pattern: [a-zA-Z0-9_.-]+
       Required: Yes
      

       

      https://docs.aws.amazon.com/kinesis/latest/APIReference/API_Shard.html

      Intention:
      We have no guarantees and can't rely on patterns other than provided in AWS manifest.
      Any custom implementation of Kinesis mock should rely on AWS manifest which claims ShardID to be alfanums. This prevents anyone to use Flink with such kind of mocks.

      The reason behind the scene to use particular pattern "^shardId-d12" is to create Flink's custom Shard comparator, filter already seen shards, and pass latest shard for client.listShards only to limit the scope for RPC call to AWS.

      In the meantime, I think we can get rid of this logic at all. The current usage in project is:

      Attachments

        Issue Links

          Activity

            People

              eyushin eugen yushin
              eyushin eugen yushin
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 10m
                  10m