Kylin: KYLIN-998

Finish the hive intermediate table clean up job in org.apache.kylin.job.hadoop.cube.StorageCleanupJob


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v0.7.1, v0.7.2
    • Fix Version/s: v1.1
    • Component/s: Storage - HBase
    • Labels: None

    Description

      Currently, the last step of a Kylin cube build job, named “Garbage Collection”, removes the intermediate data in HDFS/HBase/Hive. But if the job stops unexpectedly (for example, because of a problem in the Hadoop cluster or a bad cube design) or is discarded by the user, that data is left undeleted.

      In such cases, we can run "hbase org.apache.hadoop.util.RunJar $KYLIN_HOME/lib/kylin-job-0.8.1-incubating-SNAPSHOT.jar org.apache.kylin.job.hadoop.cube.StorageCleanupJob --delete true" to remove the data. However, the method "cleanUnusedIntermediateHiveTable" is not finished.

      My first patch finishes this method: it removes unused Hive tables whose names begin with "kylin_intermediate_".
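The table-selection step of such a cleanup can be sketched in Java, the project's language. This is a minimal, hypothetical sketch, not the code in the attached patches: it assumes that each intermediate table name ends with the owning job's UUID (hyphens replaced by underscores) and that a table is "unused" when its job is not in the set of still-running jobs. The class `IntermediateTableSelector` and its methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not Kylin API) showing how unused intermediate Hive tables could be selected.
final class IntermediateTableSelector {
    static final String PREFIX = "kylin_intermediate_";
    // A canonical UUID string is 36 characters long.
    static final int UUID_LENGTH = 36;

    private IntermediateTableSelector() {}

    /**
     * Returns the tables to drop: names starting with PREFIX whose trailing job UUID
     * (an assumed naming convention) does not belong to a running job.
     */
    static List<String> selectUnused(List<String> hiveTables, Set<String> runningJobIds) {
        List<String> toDrop = new ArrayList<>();
        for (String table : hiveTables) {
            // Hive reports table names in lower case; normalize anyway to be safe.
            String name = table.trim().toLowerCase(Locale.ROOT);
            if (!name.startsWith(PREFIX) || name.length() < PREFIX.length() + UUID_LENGTH) {
                continue; // not a Kylin intermediate table, or no room for a UUID suffix
            }
            // Recover the job UUID from the suffix: "..._8f1c0e3a_5b6d_..." -> "8f1c0e3a-5b6d-...".
            String jobId = name.substring(name.length() - UUID_LENGTH).replace('_', '-');
            if (!runningJobIds.contains(jobId)) {
                toDrop.add(name);
            }
        }
        return toDrop;
    }

    /** Builds one HiveQL script that drops every selected table. */
    static String dropScript(List<String> tables) {
        StringBuilder sql = new StringBuilder();
        for (String t : tables) {
            sql.append("DROP TABLE IF EXISTS ").append(t).append(";\n");
        }
        return sql.toString();
    }
}
```

In a real job the table list would come from something like `hive -e "show tables 'kylin_intermediate_*'"` and the script would be passed back to the Hive CLI; that wiring is left out of this sketch. Skipping tables of running jobs is the key safety check, since those tables are still being read by in-flight build steps.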

      My second patch adds methods to delete unused data by UUIDs, given either on the command line or in a file.
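The input side of the second patch's idea can be sketched as well. This is a hypothetical illustration, not the patch itself: it assumes UUIDs arrive either as one comma-separated command-line value or as a file with one UUID per line, and the class name `UuidArgs` and the `#` comment convention are inventions for this example. The resulting set could then be matched against leftover HDFS paths, HBase tables, and Hive tables.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper (not Kylin API): gathers the UUIDs whose leftover data should be deleted.
final class UuidArgs {
    // Canonical 8-4-4-4-12 lower-case hexadecimal UUID layout.
    private static final Pattern UUID_PATTERN = Pattern.compile(
            "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}");

    private UuidArgs() {}

    /** Parses a comma-separated list such as "uuid1,uuid2" given on the command line. */
    static Set<String> fromArgument(String csv) {
        Set<String> uuids = new LinkedHashSet<>();
        for (String token : csv.split(",")) {
            addIfValid(uuids, token);
        }
        return uuids;
    }

    /** Reads one UUID per line; blank lines and lines starting with '#' are skipped. */
    static Set<String> fromFile(Path file) throws IOException {
        Set<String> uuids = new LinkedHashSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                addIfValid(uuids, trimmed);
            }
        }
        return uuids;
    }

    // Normalizes to lower case, ignores empty tokens, and rejects anything that is not a UUID.
    private static void addIfValid(Set<String> uuids, String raw) {
        String candidate = raw.trim().toLowerCase(Locale.ROOT);
        if (candidate.isEmpty()) {
            return;
        }
        if (!UUID_PATTERN.matcher(candidate).matches()) {
            throw new IllegalArgumentException("Not a UUID: " + raw);
        }
        uuids.add(candidate);
    }
}
```

Validating against the canonical UUID layout up front means a typo in the argument or file fails loudly instead of silently matching nothing; a `LinkedHashSet` also drops duplicates while keeping the user's order.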

      I don't know whether the second patch is useful to you; we use it in our Kylin server to remove data after a cube is deleted.

      Attachments

        1. KYLIN-998-UUIDS.patch
          19 kB
          nichunen
        2. KYLIN-998-0.8-v3.patch
          6 kB
          nichunen
        3. KYLIN-998-0.7-staging-v3.patch
          6 kB
          nichunen
        4. KYLIN-998-0.8.patch
          3 kB
          Shao Feng Shi
        5. KYLIN-998-0.7-staging.patch
          4 kB
          Shao Feng Shi


          People

            Assignee: nichunen
            Reporter: nichunen
            Votes: 0
            Watchers: 2
