Uploaded image for project: 'Flume'
  1. Flume
  2. FLUME-2620

File channel throws NullPointerException if a header value is null

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.8.0
    • Component/s: File Channel
    • Labels:
      None

      Description

      File channel throws NullPointerException if a header value is null.
      If this is intended, it should be reported correctly in the logs.

      Sample trace:

      org.apache.flume.ChannelException: Unable to put batch on required channel: FileChannel chan

      { dataDirs: [/var/lib/ingestion-csv/chan/data] }

      at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:200)
      at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:236)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.NullPointerException
      at org.apache.flume.channel.file.proto.ProtosFactory$FlumeEventHeader$Builder.setValue(ProtosFactory.java:7415)
      at org.apache.flume.channel.file.Put.writeProtos(Put.java:85)
      at org.apache.flume.channel.file.TransactionEventRecord.toByteBuffer(TransactionEventRecord.java:174)
      at org.apache.flume.channel.file.Log.put(Log.java:622)
      at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doPut(FileChannel.java:469)
      at org.apache.flume.channel.BasicTransactionSemantics.put(BasicTransactionSemantics.java:93)
      at org.apache.flume.channel.BasicChannelSemantics.put(BasicChannelSemantics.java:80)
      at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:189)

      1. FLUME-2620-5.patch
        10 kB
        Marcell Hegedus
      2. FLUME-2620-4.patch
        13 kB
        Neerja Khattar
      3. FLUME-2620-3.patch
        9 kB
        Neerja Khattar
      4. FLUME-2620-2.patch
        8 kB
        Neerja Khattar
      5. FLUME-2620-1.patch
        9 kB
        Neerja Khattar
      6. FLUME-2620-0.patch
        8 kB
        Neerja Khattar
      7. FLUME-2620.patch
        1 kB
        Neerja Khattar
      8. FLUME-2620.patch
        3 kB
        Neerja Khattar

        Issue Links

          Activity

          Hide
          hudson Hudson added a comment -

          ABORTED: Integrated in Jenkins build Flume-trunk-hbase-1 #304 (See https://builds.apache.org/job/Flume-trunk-hbase-1/304/)
          FLUME-2620. File Channel to support empty values in headers (denes: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=1e69fc7c29f104a2117a62de11cba9b2a2c740e1)

          • (edit) flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/Put.java
          • (edit) flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/proto/ProtosFactory.java
          • (edit) flume-ng-channels/flume-file-channel/src/main/proto/filechannel.proto
          • (edit) flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java
          • (edit) flume-ng-core/src/test/java/org/apache/flume/channel/TestMemoryChannel.java
          Show
          hudson Hudson added a comment - ABORTED: Integrated in Jenkins build Flume-trunk-hbase-1 #304 (See https://builds.apache.org/job/Flume-trunk-hbase-1/304/ ) FLUME-2620 . File Channel to support empty values in headers (denes: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=1e69fc7c29f104a2117a62de11cba9b2a2c740e1 ) (edit) flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/Put.java (edit) flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/proto/ProtosFactory.java (edit) flume-ng-channels/flume-file-channel/src/main/proto/filechannel.proto (edit) flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java (edit) flume-ng-core/src/test/java/org/apache/flume/channel/TestMemoryChannel.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Jenkins build Flume-trunk-hbase-1 #261 (See https://builds.apache.org/job/Flume-trunk-hbase-1/261/)
          FLUME-2620. File Channel to support empty values in headers (denes: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=1e69fc7c29f104a2117a62de11cba9b2a2c740e1)

          • (edit) flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/Put.java
          • (edit) flume-ng-channels/flume-file-channel/src/main/proto/filechannel.proto
          • (edit) flume-ng-core/src/test/java/org/apache/flume/channel/TestMemoryChannel.java
          • (edit) flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java
          • (edit) flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/proto/ProtosFactory.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Jenkins build Flume-trunk-hbase-1 #261 (See https://builds.apache.org/job/Flume-trunk-hbase-1/261/ ) FLUME-2620 . File Channel to support empty values in headers (denes: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=1e69fc7c29f104a2117a62de11cba9b2a2c740e1 ) (edit) flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/Put.java (edit) flume-ng-channels/flume-file-channel/src/main/proto/filechannel.proto (edit) flume-ng-core/src/test/java/org/apache/flume/channel/TestMemoryChannel.java (edit) flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java (edit) flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/proto/ProtosFactory.java
          Hide
          denes Denes Arvay added a comment -

          Thank you all for the contribution, I have committed and pushed this change. Marking the ticket as resolved.

          Show
          denes Denes Arvay added a comment - Thank you all for the contribution, I have committed and pushed this change. Marking the ticket as resolved.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1e69fc7c29f104a2117a62de11cba9b2a2c740e1 in flume's branch refs/heads/trunk from Marcell Hegedus
          [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=1e69fc7 ]

          FLUME-2620. File Channel to support empty values in headers

          Flume user guide does not specify whether a value in event header could be null or not.
          Given an external system generating events which header values can be null and a user configures
          Flume with Memory Channel then he will have no trouble.
          Later on when the user changes Memory Channel to File Channel then Flume will fail with NPE.
          It is because FC is serializing events with protocol buffer and header values are defined as
          required in the proto file.
          In this patch I have changed the value field to optional. However protocol buffer does not have
          a notation for null and setting a field to null raises NPE again. Added a null check before
          serialization to prevent this.
          There is on caveat: When an optional field is not set, at deserialization it will be set to a
          default value: in this case it will be empty string.

          Reviewers: Miklos Csanady

          (Marcell Hegedus via Denes Arvay)

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1e69fc7c29f104a2117a62de11cba9b2a2c740e1 in flume's branch refs/heads/trunk from Marcell Hegedus [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=1e69fc7 ] FLUME-2620 . File Channel to support empty values in headers Flume user guide does not specify whether a value in event header could be null or not. Given an external system generating events which header values can be null and a user configures Flume with Memory Channel then he will have no trouble. Later on when the user changes Memory Channel to File Channel then Flume will fail with NPE. It is because FC is serializing events with protocol buffer and header values are defined as required in the proto file. In this patch I have changed the value field to optional. However protocol buffer does not have a notation for null and setting a field to null raises NPE again. Added a null check before serialization to prevent this. There is on caveat: When an optional field is not set, at deserialization it will be set to a default value: in this case it will be empty string. Reviewers: Miklos Csanady (Marcell Hegedus via Denes Arvay)
          Hide
          marcellhegedus Marcell Hegedus added a comment -

          Uploaded patch to reviewboard

          Show
          marcellhegedus Marcell Hegedus added a comment - Uploaded patch to reviewboard
          Hide
          roshan_naik Roshan Naik added a comment -

          Marcell Hegedus It is easier to review the patches if put on Review Board

          Show
          roshan_naik Roshan Naik added a comment - Marcell Hegedus It is easier to review the patches if put on Review Board
          Hide
          roshan_naik Roshan Naik added a comment -

          Marcell Hegedus i have added you as contributor. You should be able to go ahead and assign this JIRA yourself if you like.

          Show
          roshan_naik Roshan Naik added a comment - Marcell Hegedus i have added you as contributor. You should be able to go ahead and assign this JIRA yourself if you like.
          Hide
          marcellhegedus Marcell Hegedus added a comment -

          Attaching new patch

          Show
          marcellhegedus Marcell Hegedus added a comment - Attaching new patch
          Hide
          marcellhegedus Marcell Hegedus added a comment -

          There was no activity on this issue in the past 6 months. Neerja Khattar, do you mind if I assign it to me? Your proposed patch fixes the issue in HTTP Source however the issue lays in File Channel and can be reproduced with other sources (eg. JMS Source). I'd prefer if File Channel behaved similar to Memory Channel: accept null values in header.

          Show
          marcellhegedus Marcell Hegedus added a comment - There was no activity on this issue in the past 6 months. Neerja Khattar , do you mind if I assign it to me? Your proposed patch fixes the issue in HTTP Source however the issue lays in File Channel and can be reproduced with other sources (eg. JMS Source). I'd prefer if File Channel behaved similar to Memory Channel: accept null values in header.
          Hide
          mpercy Mike Percy added a comment -

          Also Jarek Jarcec Cecho regarding your comment back in May, I don't think the Flume data model really supports null values in the headers because the schema for the header values is Map<String, String>. I would be more comfortable always casting null -> empty string and even implicitly performing this case by accepting null values in the FileChannel and coercing them to empty strings. Thoughts?

          Show
          mpercy Mike Percy added a comment - Also Jarek Jarcec Cecho regarding your comment back in May, I don't think the Flume data model really supports null values in the headers because the schema for the header values is Map<String, String>. I would be more comfortable always casting null -> empty string and even implicitly performing this case by accepting null values in the FileChannel and coercing them to empty strings. Thoughts?
          Hide
          mpercy Mike Percy added a comment -

          The above is just my opinion, I am happy for others to chime in if they prefer this solution

          Show
          mpercy Mike Percy added a comment - The above is just my opinion, I am happy for others to chime in if they prefer this solution
          Hide
          mpercy Mike Percy added a comment -

          Neerja Khattar: See my comment on reviewboard from a few weeks ago. Here is what I wrote: This doesn't look like it changes the file channel at all. Isn't that the underlying issue? Couldn't the FileChannel allow a null value in a header, or treat it as an empty string? Maybe that would be a cleaner solution.

          Show
          mpercy Mike Percy added a comment - Neerja Khattar : See my comment on reviewboard from a few weeks ago. Here is what I wrote: This doesn't look like it changes the file channel at all. Isn't that the underlying issue? Couldn't the FileChannel allow a null value in a header, or treat it as an empty string? Maybe that would be a cleaner solution.
          Hide
          neerjakhattar Neerja Khattar added a comment -

          Jarek Jarcec Cecho any update?

          Show
          neerjakhattar Neerja Khattar added a comment - Jarek Jarcec Cecho any update?
          Hide
          neerjakhattar Neerja Khattar added a comment -

          Jarek Jarcec Cecho let me know if you need anything else. I updated the patch with doc info.

          Show
          neerjakhattar Neerja Khattar added a comment - Jarek Jarcec Cecho let me know if you need anything else. I updated the patch with doc info.
          Hide
          neerjakhattar Neerja Khattar added a comment -

          new patch is attached with doc change.

          Show
          neerjakhattar Neerja Khattar added a comment - new patch is attached with doc change.
          Hide
          neerjakhattar Neerja Khattar added a comment -

          updated patch attached

          Show
          neerjakhattar Neerja Khattar added a comment - updated patch attached
          Hide
          neerjakhattar Neerja Khattar added a comment -

          New patch is attached based on review board comments

          Show
          neerjakhattar Neerja Khattar added a comment - New patch is attached based on review board comments
          Hide
          neerjakhattar Neerja Khattar added a comment -

          new patch is attached ( fixed the review board comments)

          Show
          neerjakhattar Neerja Khattar added a comment - new patch is attached ( fixed the review board comments)
          Hide
          neerjakhattar Neerja Khattar added a comment -

          New patch is attached for review

          Show
          neerjakhattar Neerja Khattar added a comment - New patch is attached for review
          Hide
          jarcec Jarek Jarcec Cecho added a comment -

          I have two comments to the latest patch:

          • Our code style is to put opening bracket next to the previous statement. E.g. if(headerVal.getValue()==null) { on one single line rather then two. Can you please fix the method checkAndReplaceNullInHeaders to look like that?
          • Can we introduce a configuration option that will allow user to configure the default "replacement" value for null? I'm concerned that someone might need to distinguish the case of value being null or empty.
          Show
          jarcec Jarek Jarcec Cecho added a comment - I have two comments to the latest patch : Our code style is to put opening bracket next to the previous statement. E.g. if(headerVal.getValue()==null) { on one single line rather then two. Can you please fix the method checkAndReplaceNullInHeaders to look like that? Can we introduce a configuration option that will allow user to configure the default "replacement" value for null ? I'm concerned that someone might need to distinguish the case of value being null or empty.
          Hide
          neerjakhattar Neerja Khattar added a comment -

          New patch is attached with test cases included.

          Show
          neerjakhattar Neerja Khattar added a comment - New patch is attached with test cases included.
          Hide
          neerjakhattar Neerja Khattar added a comment -

          Patch is attached to handle null values in header

          for example:
          [{
          "headers" :

          { "timestamp" : "434324343", "host" : null }

          ,
          "body" : "random_body"
          }]

          This used to fail previously and now it works.

          Show
          neerjakhattar Neerja Khattar added a comment - Patch is attached to handle null values in header for example: [{ "headers" : { "timestamp" : "434324343", "host" : null } , "body" : "random_body" }] This used to fail previously and now it works.
          Hide
          hshreedharan Hari Shreedharan added a comment -

          I don't think it is, this is a minor fix. I will submit a patch soon.

          Show
          hshreedharan Hari Shreedharan added a comment - I don't think it is, this is a minor fix. I will submit a patch soon.

            People

            • Assignee:
              marcellhegedus Marcell Hegedus
              Reporter:
              smolav Santiago M. Mola
            • Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development