  1. Hadoop YARN
  2. YARN-9395

Short Names for repeated HBase Column names

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: ATSv2
    • Labels:
      None

      Description

      Currently, the ATS HBase tables store each config name / metric name as a column name, and these names are long. The names repeat in every row and consume a lot of storage space. We have already seen customers' HBase tables consume more than 1.5 TB in a few days.

      Example Configs:
      c:yarn.timeline-service.webapp.rest-csrf.methods-to-ignore
      c:yarn.timeline-service.entity-group-fs-store.active-dir
      c:yarn.scheduler.configuration.zk-store.parent-path
      
      Example Metrics:
      m:REDUCE:org.apache.hadoop.mapreduce.FileSystemCounter:HDFS_READ_OPS
      m:REDUCE:org.apache.hadoop.mapreduce.TaskCounter:COMBINE_INPUT_RECORDS
      m:REDUCE:org.apache.hadoop.mapreduce.TaskCounter:PHYSICAL_MEMORY_BYTES
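
      To give a rough sense of the overhead, here is a minimal sketch that counts the qualifier bytes in the example names above and compares them with a short code. HBase stores the column qualifier with every cell, so the saving scales with the number of cells. The 2-byte short code width and the helper names are assumptions made for illustration, and the estimate ignores any compression or data block encoding configured on the table.

```python
# Column names copied from the examples above; the part before the first
# ":" is the column family ("c" for configs, "m" for metrics).
EXAMPLE_COLUMNS = [
    "c:yarn.timeline-service.webapp.rest-csrf.methods-to-ignore",
    "c:yarn.timeline-service.entity-group-fs-store.active-dir",
    "c:yarn.scheduler.configuration.zk-store.parent-path",
    "m:REDUCE:org.apache.hadoop.mapreduce.FileSystemCounter:HDFS_READ_OPS",
    "m:REDUCE:org.apache.hadoop.mapreduce.TaskCounter:COMBINE_INPUT_RECORDS",
    "m:REDUCE:org.apache.hadoop.mapreduce.TaskCounter:PHYSICAL_MEMORY_BYTES",
]

# Assumption: a hypothetical fixed-width short name of 2 bytes.
SHORT_CODE_BYTES = 2


def qualifier_bytes(column):
    """Return the UTF-8 size of the qualifier, i.e. everything after 'family:'."""
    _family, qualifier = column.split(":", 1)
    return len(qualifier.encode("utf-8"))


def bytes_saved(columns, cells_per_column):
    """Estimate raw bytes saved if each qualifier shrinks to SHORT_CODE_BYTES."""
    return sum(
        (qualifier_bytes(column) - SHORT_CODE_BYTES) * cells_per_column
        for column in columns
    )


if __name__ == "__main__":
    for column in EXAMPLE_COLUMNS:
        print(column, qualifier_bytes(column))
    # Illustrative cell count only, not a measurement from any cluster.
    print(bytes_saved(EXAMPLE_COLUMNS, cells_per_column=1_000_000))
```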
      

      We need to use short column names, as per the HBase best practice (http://moi.vonos.net/bigdata/avro-hbase-colnames/). The challenge is that ATS does not know the column names until the rows are inserted. We could provide a mapping file, which customers configure upfront, that maps the repeated config / metric / info names from different applications to unique numbers and so saves storage space. This is similar to what Phoenix does:

      https://blogs.apache.org/phoenix/entry/column-mapping-and-immutable-data
      https://phoenix.apache.org/columnencoding.html
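
      The mapping-file idea could be sketched roughly as follows. This is not an ATS or Phoenix API: the `name=number` file format, the `#<number>` stored form, and the names `load_mapping` and `ColumnNameMapper` are all hypothetical. Names missing from the mapping are stored unchanged, which covers the case where ATS meets a column it was not told about upfront.

```python
def load_mapping(lines):
    """Parse hypothetical 'long.column.name=number' lines into a dict.

    Blank lines and lines starting with '#' are skipped.
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _sep, code = line.rpartition("=")
        mapping[name.strip()] = int(code)
    return mapping


class ColumnNameMapper:
    """Two-way mapping between long column names and short numeric codes."""

    def __init__(self, mapping):
        self._to_short = dict(mapping)
        self._to_long = {code: name for name, code in mapping.items()}
        if len(self._to_long) != len(self._to_short):
            raise ValueError("short codes must be unique")

    def encode(self, name):
        """Return '#<code>' for a mapped name, or the name itself if unmapped."""
        code = self._to_short.get(name)
        return name if code is None else "#" + str(code)

    def decode(self, stored):
        """Reverse encode(); assumes real column names never start with '#'."""
        if stored.startswith("#") and stored[1:].isdigit():
            return self._to_long.get(int(stored[1:]), stored)
        return stored


if __name__ == "__main__":
    mapper = ColumnNameMapper(load_mapping([
        "yarn.scheduler.configuration.zk-store.parent-path=1",
        "REDUCE:org.apache.hadoop.mapreduce.TaskCounter:PHYSICAL_MEMORY_BYTES=2",
    ]))
    short = mapper.encode("yarn.scheduler.configuration.zk-store.parent-path")
    print(short, "->", mapper.decode(short))
```

      One design point to settle in a real implementation is that the mapping must stay stable once data is written, since changing a number would make existing cells decode to the wrong name.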

            People

            • Assignee:
              prabhujoseph Prabhu Joseph
              Reporter:
              prabhujoseph Prabhu Joseph
            • Votes:
              0
              Watchers:
              4

              Dates

              • Created:
                Updated: