  1. Chukwa (retired)
  2. CHUKWA-444

Redefine Chukwa time series storage

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Component/s: Data Processors
    • Labels: None
    • Environment: Redhat EL 5.1, Java 6

    • Release Note: Added HBaseWriter for storing time series data in HBase for faster random read/write.

    Description

      The current Chukwa Record format is not suitable for data visualization. It is closer to an archive format: it combines data from multiple sources (hosts) and groups them into a sorted, time-partitioned sequence file. Most people collect data for two reasons: archiving and data analysis. The current Chukwa Record format is fine for archiving, but it is not well suited to data analysis. Data analysis can be further broken down into two types: 1) data that can be aggregated and summarized, such as metrics; 2) data that cannot be summarized, such as job history. Type 1 data is useful for visualization as graphs, and type 2 data is useful for plain-text viewing or for searching for a particular event.
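
      To illustrate what "aggregated and summarized" means for type 1 data, here is a minimal sketch that averages raw metric samples into fixed-width time buckets, which is the kind of reduction a graph typically needs. The class name MetricDownsampler and its method are hypothetical and are not part of Chukwa or of the attached patch.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Chukwa: shows why metrics (type 1 data)
// can be summarized while free-form events (type 2 data) cannot.
final class MetricDownsampler {

    /**
     * Averages samples into buckets of bucketMillis width.
     * The returned map is keyed by the bucket start time and sorted by time.
     */
    static TreeMap<Long, Double> averageByBucket(long[] timestamps, double[] values, long bucketMillis) {
        if (timestamps.length != values.length) {
            throw new IllegalArgumentException("timestamps and values must have the same length");
        }
        if (bucketMillis <= 0) {
            throw new IllegalArgumentException("bucketMillis must be positive");
        }
        // For each bucket keep a running {sum, count} pair.
        TreeMap<Long, double[]> sums = new TreeMap<>();
        for (int i = 0; i < timestamps.length; i++) {
            long bucket = Math.floorDiv(timestamps[i], bucketMillis) * bucketMillis;
            double[] sumCount = sums.get(bucket);
            if (sumCount == null) {
                sumCount = new double[2];
                sums.put(bucket, sumCount);
            }
            sumCount[0] += values[i];
            sumCount[1] += 1;
        }
        // Turn each {sum, count} pair into an average.
        TreeMap<Long, Double> averages = new TreeMap<>();
        for (Map.Entry<Long, double[]> e : sums.entrySet()) {
            averages.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return averages;
    }
}
```

      Job history entries have no comparable reduction: averaging two log lines is meaningless, so type 2 data would be stored and searched as-is.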

      By the above rationale, it probably makes sense to restructure Chukwa Records for data analysis. Outside of the Hadoop world, RRDtool is great for time series data storage and is optimized for metrics from a single source, i.e. a host. RRD data files fragment badly when there are hundreds of thousands of sources. Chukwa time series data storage should be able to combine multiple data sources into one Chukwa file to combat the file fragmentation problem.
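
      The idea of combining many sources into one store can be sketched as follows. This is only an illustration under stated assumptions: an in-memory TreeMap stands in for a sorted store, and the row key layout (zero-padded timestamp, then source name) and the class name CombinedSeriesStore are hypothetical, not the schema used by Chukwa or by HBaseWriter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

// Hypothetical sketch: one sorted store holds samples from every source,
// instead of one file per source as with RRD.
final class CombinedSeriesStore {
    // Sorted by row key, so samples are ordered by time first, then by source.
    private final TreeMap<String, Double> rows = new TreeMap<>();

    // Zero-padding to 13 digits makes string order match numeric order
    // for non-negative millisecond timestamps.
    static String keyPrefix(long timestampMillis) {
        if (timestampMillis < 0) {
            throw new IllegalArgumentException("timestamp must be non-negative");
        }
        return String.format(Locale.ROOT, "%013d/", timestampMillis);
    }

    // Stores one value per (timestamp, source) pair; a later put overwrites.
    void put(String source, long timestampMillis, double value) {
        rows.put(keyPrefix(timestampMillis) + source, value);
    }

    // Returns values from all sources with fromMillis <= timestamp < toMillis,
    // in row key order.
    List<Double> scan(long fromMillis, long toMillis) {
        return new ArrayList<>(rows.subMap(keyPrefix(fromMillis), keyPrefix(toMillis)).values());
    }
}
```

      With this layout a single range scan answers "what did every host report in this time window", and adding a host adds rows rather than files. Putting the source name first in the key would instead favor reading one host's full history; which order is better depends on the query pattern.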

      Attachments

        1. CHUKWA-444-2.patch
          62 kB
          Eric Yang

            People

              Assignee: Eric Yang (eyang)
              Reporter: Eric Yang (eyang)
              Votes: 0
              Watchers: 4