Chukwa
  1. Chukwa
  2. CHUKWA-30

Remove HDFS flush & connection holding (Collector)

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.2.0
    • Component/s: Data Collection
    • Labels:
      None
    1. CHUKWA-30-3.patch
      29 kB
      Jerome Boulon
    2. CHUKWA-30-2.patch
      23 kB
      Jerome Boulon
    3. CHUKWA-30.patch
      21 kB
      Jerome Boulon

      Activity

      Hide
      Hudson added a comment -
      Show
      Hudson added a comment - Integrated in Chukwa-trunk #45 (See http://hudson.zones.apache.org/hudson/job/Chukwa-trunk/45/ )
      Hide
      Hudson added a comment -

      Integrated in Chukwa-trunk #36 (See http://hudson.zones.apache.org/hudson/job/Chukwa-trunk/36/)
      . Missed some files.

      Show
      Hudson added a comment - Integrated in Chukwa-trunk #36 (See http://hudson.zones.apache.org/hudson/job/Chukwa-trunk/36/ ) . Missed some files.
      Hide
      Hudson added a comment -

      Integrated in Chukwa-trunk #35 (See http://hudson.zones.apache.org/hudson/job/Chukwa-trunk/35/)
      . Remove HDFS flush and connection holding. (Contributed by Jerome Boulon)

      Show
      Hudson added a comment - Integrated in Chukwa-trunk #35 (See http://hudson.zones.apache.org/hudson/job/Chukwa-trunk/35/ ) . Remove HDFS flush and connection holding. (Contributed by Jerome Boulon)
      Hide
      Ari Rabkin added a comment -

      I just committed this. Thanks, Jerome!

      Show
      Ari Rabkin added a comment - I just committed this. Thanks, Jerome!
      Hide
      Ari Rabkin added a comment -

      Gonna push this for 0.2.0

      Show
      Ari Rabkin added a comment - Gonna push this for 0.2.0
      Hide
      Ari Rabkin added a comment -

      Looks alright. +1 to commit to trunk.

      Show
      Ari Rabkin added a comment - Looks alright. +1 to commit to trunk.
      Hide
      Jerome Boulon added a comment -

      Same patch + test case that validate SeqFileWriter output against LocalWriter output.

      Show
      Jerome Boulon added a comment - Same patch + test case that validate SeqFileWriter output against LocalWriter output.
      Hide
      Jerome Boulon added a comment -

      Remove static for seqFileWriter and the associated lock

      Show
      Jerome Boulon added a comment - Remove static for seqFileWriter and the associated lock
      Hide
      Jerome Boulon added a comment -

      Theoretically, yes, in practice it's not currently possible.
      I have access o the FSOutputStream so I can call flush and it should have just flush to disk except that the HDFS implementation is not doing anything on the flush instead there's a flushbuffer method that write to disk but that one is private or protected ...

      I gonna see with the HDFS team if there's a way to flush without too much overhead because the flushbuffer also compute the crc. If that's possible then we can extends the class or play with introspection to get access to this method ... but yes, I want to be able to flush after every post so I'll work on that direction.

      Show
      Jerome Boulon added a comment - Theoretically, yes, in practice it's not currently possible. I have access o the FSOutputStream so I can call flush and it should have just flush to disk except that the HDFS implementation is not doing anything on the flush instead there's a flushbuffer method that write to disk but that one is private or protected ... I gonna see with the HDFS team if there's a way to flush without too much overhead because the flushbuffer also compute the crc. If that's possible then we can extends the class or play with introspection to get access to this method ... but yes, I want to be able to flush after every post so I'll work on that direction.
      Hide
      Ari Rabkin added a comment -

      Rock.

      I think for now it makes sense to leave the direct-to-HDFS as default until we've done a bit more testing. I was suggesting that we add all the relevant properties to chukwa-collector-conf.xml. They shouldn't be commented out, but they should have innocuous defaults.

      10X performance gain is impressive! Is there some way we can actually force data to local disk before returning Acks, without losing all that speedup?

      Show
      Ari Rabkin added a comment - Rock. I think for now it makes sense to leave the direct-to-HDFS as default until we've done a bit more testing. I was suggesting that we add all the relevant properties to chukwa-collector-conf.xml. They shouldn't be commented out, but they should have innocuous defaults. 10X performance gain is impressive! Is there some way we can actually force data to local disk before returning Acks, without losing all that speedup?
      Hide
      Jerome Boulon added a comment -

      Unit testing the collector is difficult since we don't have a end-to-end testing tools but this is something we are going to work on.
      That been said, this code is running for one week now collecting System metrics from 3700 machines.

      What do you mean by "add the necessary/optional conf options to chukwa-collector-conf.xml.template"?
      Activate the new Writer in place of the current one or just add all properties but comment the xml block?

      We had to remove the 10 seconds lock (hdfs flush) for performance reason.
      Then the reason of writing to local first is because local file system tend to be more reliable than writing cross network and because we have a use case where people want to use the DataCollection pipeline but without HDFS at all.

      This give me a 10X improvement compare to the default writer. In order to collect System Metrics from 3700 machines I had to have 5 collectors running and data was still late.
      With the new collector with only one instance running, I've been able to handle all SM for all machines from a single collector.
      Also, Demux is more efficient since I have fewer and bigger dataSink files.

      Show
      Jerome Boulon added a comment - Unit testing the collector is difficult since we don't have a end-to-end testing tools but this is something we are going to work on. That been said, this code is running for one week now collecting System metrics from 3700 machines. What do you mean by "add the necessary/optional conf options to chukwa-collector-conf.xml.template"? Activate the new Writer in place of the current one or just add all properties but comment the xml block? We had to remove the 10 seconds lock (hdfs flush) for performance reason. Then the reason of writing to local first is because local file system tend to be more reliable than writing cross network and because we have a use case where people want to use the DataCollection pipeline but without HDFS at all. This give me a 10X improvement compare to the default writer. In order to collect System Metrics from 3700 machines I had to have 5 collectors running and data was still late. With the new collector with only one instance running, I've been able to handle all SM for all machines from a single collector. Also, Demux is more efficient since I have fewer and bigger dataSink files.
      Hide
      Ari Rabkin added a comment -

      Looks promising. Couple comments:

      • Why are seqFileWriter and the associated lock static? This makes it harder to unit test and seems unnecessary.
      • Why aren't there unit tests?
      • Can you add the necessary/optional conf options to chukwa-collector-conf.xml.template?
      • What's the point of doing local buffering if it doesn't give us any better reliability semantics?
      Show
      Ari Rabkin added a comment - Looks promising. Couple comments: Why are seqFileWriter and the associated lock static? This makes it harder to unit test and seems unnecessary. Why aren't there unit tests? Can you add the necessary/optional conf options to chukwa-collector-conf.xml.template? What's the point of doing local buffering if it doesn't give us any better reliability semantics?
      Hide
      Jerome Boulon added a comment -

      Add a new Writer class.
      This new writer will first write to local then move the final to HDFS without waiting for the data to be written to disk before sending back the acknowledge.
      See java doc for more information on how to use it.

      Show
      Jerome Boulon added a comment - Add a new Writer class. This new writer will first write to local then move the final to HDFS without waiting for the data to be written to disk before sending back the acknowledge. See java doc for more information on how to use it.

        People

        • Assignee:
          Jerome Boulon
          Reporter:
          Jerome Boulon
        • Votes:
          0 Vote for this issue
          Watchers:
          1 Start watching this issue

          Dates

          • Created:
            Updated:
            Resolved:

            Development