  1. Flume
  2. FLUME-2517

Performance issue: SimpleDateFormat constructor takes 30% of HDFSEventSink.process()

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.0.1
    • Fix Version/s: 1.6.0
    • Component/s: Sinks+Sources
    • Labels:
    • Environment:

      linux i686
      java version "1.7.0_45"

      Description

      I started investigating why the HDFS sink has such poor throughput in v1.5.0.0. It seems to be better in 1.6.0.0 (current trunk).

      The PseudoTx channel was filling up because the HDFS sink could not write as fast as data arrived from the source.

      Profiling with jconsole revealed that constructing SimpleDateFormat objects accounts for 30% of the time spent in the HDFSEventSink.process() method. SimpleDateFormat is notoriously heavy and time-consuming to create. It is also not thread-safe.

      The HDFS sink uses it to calculate paths that contain date-time wildcards. I will provide a patch that caches SimpleDateFormat objects per thread. With this patch, the PseudoTx channel I used for testing no longer filled up constantly, and throughput was much better.
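
      The per-thread caching idea could be sketched roughly as follows. This is an illustration only, not the contents of flume_2517.patch: the class name CachedDateFormats and its methods are hypothetical, and the fixed Locale.ROOT is an assumption made so that the output is reproducible. Each thread keeps its own map from pattern to SimpleDateFormat, so an instance is constructed once per thread and pattern, and never shared between threads.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;

/** Hypothetical helper for illustration; not the code from the attached patch. */
final class CachedDateFormats {
    // One map per thread: SimpleDateFormat is not thread-safe, so instances
    // must never be shared across threads.
    private static final ThreadLocal<Map<String, SimpleDateFormat>> CACHE =
        new ThreadLocal<Map<String, SimpleDateFormat>>() {
            @Override
            protected Map<String, SimpleDateFormat> initialValue() {
                return new HashMap<String, SimpleDateFormat>();
            }
        };

    private CachedDateFormats() {
    }

    /** Returns this thread's formatter for the pattern, constructing it on first use. */
    public static SimpleDateFormat get(String pattern) {
        Map<String, SimpleDateFormat> formats = CACHE.get();
        SimpleDateFormat format = formats.get(pattern);
        if (format == null) {
            // The expensive constructor runs once per (thread, pattern) pair.
            // Locale.ROOT is an assumption made for reproducible output.
            format = new SimpleDateFormat(pattern, Locale.ROOT);
            formats.put(pattern, format);
        }
        return format;
    }

    /** Formats a timestamp with the cached formatter, setting the time zone on each call. */
    public static String format(String pattern, TimeZone zone, long epochMillis) {
        SimpleDateFormat format = get(pattern);
        format.setTimeZone(zone);
        return format.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        // prints "1970/01/01/00"
        System.out.println(format("yyyy/MM/dd/HH", utc, 0L));
    }
}
```

      The time zone is set on every call because the cached instance is mutable and a later caller on the same thread may ask for a different zone. One trade-off of this approach is that cached formatters live as long as their thread does, which is acceptable when the set of distinct patterns is small.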

        Attachments

        1. flume_2517.patch
          2 kB
          Pal Konyves
        2. flume_2517.png
          660 kB
          Pal Konyves

            People

            • Assignee: Pal Konyves (pkonyves)
            • Reporter: Pal Konyves (pkonyves)
            • Votes: 1
            • Watchers: 6
