  1. Accumulo
  2. ACCUMULO-4730

Create an Entry length summarizer

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: None

    Description

      It would be very useful to have a built-in Summarizer that computes summary information about field lengths, specifically key length, row length, family length, qualifier length, visibility length, and value length. Whatever stats are computed must be computable incrementally. For example, min, max, count, sum, and a log2 histogram can all be computed incrementally. I think these would be good stats to start with. Count and sum can be used to compute the average. There is an example of computing a log2 histogram in the Summarizer javadoc.
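
      As a rough illustration of what "incrementally" means here, the sketch below keeps running min, max, sum, and a log2 histogram for one field and can merge two partial results. It is plain Java and does not use the Accumulo Summarizer API. The class name LengthStats, its methods, and the choice of floor(log2(length)) as the histogram bucket (with zero-length fields counted in bucket 0) are all assumptions made for this example.

```java
/** Hypothetical helper (not part of Accumulo): running length stats for one field. */
final class LengthStats {
  long min = Long.MAX_VALUE; // stays at MAX_VALUE until the first length is seen
  long max = 0;
  long sum = 0;
  // Lengths are non-negative ints, so 32 buckets cover every possible exponent.
  final long[] logHist = new long[32];

  /** Folds one observed field length into the running statistics. */
  void accept(int length) {
    min = Math.min(min, length);
    max = Math.max(max, length);
    sum += length;
    logHist[bucket(length)]++;
  }

  /** Assumed bucketing: floor(log2(length)); lengths 0 and 1 both map to bucket 0. */
  static int bucket(int length) {
    return length <= 1 ? 0 : 31 - Integer.numberOfLeadingZeros(length);
  }

  /** Combines another partial result into this one; order of merging does not matter. */
  void merge(LengthStats other) {
    min = Math.min(min, other.min);
    max = Math.max(max, other.max);
    sum += other.sum;
    for (int i = 0; i < logHist.length; i++) {
      logHist[i] += other.logHist[i];
    }
  }
}
```

      Each of these statistics can be updated one entry at a time and combined across partial results without revisiting the data, which is the property the description asks for.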

      The Summarizer could be named EntryLengthSummarizer and could produce summaries like the following.

      count=XXX     //does not need to be tracked per field; it's the same for all
      key.min=XXX
      key.max=XXX
      key.sum=XXX
      key.logHist.8=XXX   //only output nonzero exponents
      key.logHist.9=XXX
      row.min=XXX
      row.max=XXX
      row.sum=XXX
      row.logHist.7=XXX
      row.logHist.8=XXX
      row.logHist.10=XXX
      family.min=XXX
      family.max=XXX
      family.sum=XXX
      family.logHist.6=XXX
      family.logHist.7=XXX
      etc...
      

      This new summarizer would be placed in the summarizers package.
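
      The key layout proposed above could be produced along the following lines. This is only a sketch in plain Java: the class SummaryKeys and its methods emit and average are hypothetical names, and the summary is modeled as a simple Map<String, Long> rather than through any Accumulo interface. It writes the min, max, and sum keys for one field, writes only the nonzero histogram exponents, and derives the average from the shared count.

```java
import java.util.Map;

/** Hypothetical helper (not part of Accumulo) for the proposed summary key layout. */
final class SummaryKeys {

  /** Writes prefix.min, prefix.max, prefix.sum, and only the nonzero prefix.logHist.N entries. */
  static void emit(String prefix, long min, long max, long sum, long[] logHist,
      Map<String, Long> out) {
    out.put(prefix + ".min", min);
    out.put(prefix + ".max", max);
    out.put(prefix + ".sum", sum);
    for (int exp = 0; exp < logHist.length; exp++) {
      if (logHist[exp] != 0) {
        out.put(prefix + ".logHist." + exp, logHist[exp]);
      }
    }
  }

  /** Average field length from the shared "count" and the per-field sum; 0.0 when empty. */
  static double average(Map<String, Long> summary, String prefix) {
    long count = summary.getOrDefault("count", 0L);
    long sum = summary.getOrDefault(prefix + ".sum", 0L);
    return count == 0 ? 0.0 : (double) sum / count;
  }
}
```

      Keeping a single count for all fields works because every entry contributes exactly one length to each field.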

            People

              jkrdev Jared R
              kturner Keith Turner
              Votes: 0
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h 20m