FLUME-1419

Use system time if the 'timestamp' property is absent from the event header

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Not a Problem
    • Affects Version/s: v1.2.0
    • Fix Version/s: v1.3.0
    • Component/s: Sinks+Sources
    • Labels:
      None

      Description

      If a pattern is used to generate the HDFS path but the 'timestamp' property is absent from the event header, the exception shown in the stack trace is thrown.

      Events may come from several Sources that do not use an Interceptor to populate the 'timestamp' property, so it is normal for an event to lack this property.
      The proposal is to fall back to the local system time in that case.

      java.lang.RuntimeException: Flume wasn't able to parse timestamp header in the event to resolve time based bucketing. Please check that you're correctly populating timestamp header (for example using TimestampInterceptor source interceptor).
      at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:149)
      at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:318)
      at org.apache.flume.formatter.output.TestBucketPath.testDateFormatHours(TestBucketPath.java:46)
      Caused by: java.lang.NumberFormatException: null
      at java.lang.Long.parseLong(Unknown Source)
      at java.lang.Long.valueOf(Unknown Source)
      at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:147)
      ... 26 more
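
      The trace shows a NumberFormatException raised while parsing a null header value. A minimal sketch of that failure and of the proposed fallback follows, using a plain map in place of the event headers. The class and method names are hypothetical and this is not Flume's BucketPath code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not taken from Flume's BucketPath.
final class TimestampFallbackSketch {

    // Strict lookup: a missing header yields null, and parsing null
    // throws NumberFormatException, as in the stack trace.
    static long strictTimestamp(Map<String, String> headers) {
        return Long.valueOf(headers.get("timestamp"));
    }

    // Proposed behaviour: use the supplied local time when the header is absent.
    static long timestampOrNow(Map<String, String> headers, long nowMillis) {
        String raw = headers.get("timestamp");
        return raw == null ? nowMillis : Long.parseLong(raw);
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        try {
            strictTimestamp(headers);
        } catch (NumberFormatException e) {
            System.out.println("strict lookup failed: " + e);
        }
        // Falls back to the current system time.
        System.out.println(timestampOrNow(headers, System.currentTimeMillis()));
    }
}
```

      Passing the current time in as a parameter is a choice made for this sketch so that the fallback can be exercised with a fixed value.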

        Activity

        Denny Ye added a comment -

        This may conflict with the original intention behind the handling of a missing timestamp.

        Mike Percy added a comment -

        The functionality here is already available using the timestamp interceptor, and this changes the semantics so that the bucketing is not consistent: whereas right now the logic is event centric, it now can be local time centric, but that depends on whether the event is annotated with timestamp or not.

        This change seems like it would make things more confusing and less predictable.
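
        The inconsistency described above can be sketched as follows, assuming an hourly bucket pattern and a mixed rule that uses the header time when present and the sink's local time otherwise. The class name, the pattern, and the sample instants are all hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of mixed event-time / local-time bucketing.
final class BucketConsistencySketch {

    // Hourly bucket, formatted in UTC so the result does not depend on the host zone.
    private static final DateTimeFormatter HOUR_BUCKET =
            DateTimeFormatter.ofPattern("yyyy-MM-dd/HH").withZone(ZoneOffset.UTC);

    // Header time if present, otherwise the time at which the sink sees the event.
    static String bucket(Map<String, String> headers, Instant sinkNow) {
        String raw = headers.get("timestamp");
        Instant t = raw == null ? sinkNow : Instant.ofEpochMilli(Long.parseLong(raw));
        return HOUR_BUCKET.format(t);
    }

    public static void main(String[] args) {
        // Both events are generated at 10:59:30 and reach the sink at 11:00:05.
        Instant generated = Instant.parse("2012-08-01T10:59:30Z");
        Instant sinkNow = Instant.parse("2012-08-01T11:00:05Z");

        Map<String, String> stamped = new HashMap<>();
        stamped.put("timestamp", Long.toString(generated.toEpochMilli()));
        Map<String, String> unstamped = new HashMap<>();

        System.out.println(bucket(stamped, sinkNow));   // 2012-08-01/10
        System.out.println(bucket(unstamped, sinkNow)); // 2012-08-01/11
    }
}
```

        Two events generated at the same moment land in different buckets purely because one of them carries the header, which is the unpredictability the comment warns about.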

        Hari Shreedharan added a comment -

        I am not sure whether this is the right thing to do. The timestamp is normally used to record exactly when the event was generated, so that the event can be stored in HDFS based on that time. The exception tells the user that they need to use the interceptor at the first hop, so that the generation time is known and the data is inserted into the correct file.

        I am not sure if this should go in or not, so I am going to wait for other developers to comment.

        Thanks!
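
        The first-hop stamping described here can be sketched with a plain map standing in for the event headers. This is only an illustration of the idea, not the timestamp interceptor's code; the class and method names are hypothetical, and keeping an existing value is a choice made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of stamping events where they enter the pipeline.
final class FirstHopStampSketch {

    // Returns a copy of the headers with "timestamp" set to nowMillis,
    // unless an earlier hop has already recorded a value.
    static Map<String, String> stamp(Map<String, String> headers, long nowMillis) {
        Map<String, String> out = new HashMap<>(headers);
        out.putIfAbsent("timestamp", Long.toString(nowMillis));
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> firstHop = stamp(new HashMap<>(), 1000L);
        Map<String, String> secondHop = stamp(firstHop, 2000L);
        // The generation time recorded at the first hop survives later hops.
        System.out.println(secondHop.get("timestamp")); // 1000
    }
}
```

        Because the time is recorded once at the entry point, every later hop and the final sink agree on which bucket the event belongs to.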


          People

          • Assignee:
            Denny Ye
          • Reporter:
            Denny Ye
          • Votes:
            0
          • Watchers:
            3
