Tajo / TAJO-421

Improve split for compression file

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: Storage
    • Labels: None

      Description

      If a compressed file is smaller than the HDFS block size, its disk volume information can be used.

      1. TAJO-421_2.patch
        1 kB
        Jinho Kim
      2. TAJO-421.patch
        1 kB
        Jinho Kim

        Activity

        jhkim Jinho Kim added a comment -

        I've verified 'mvn clean install' and TPC-H 1,3.

        blrunner Jaehwa Jung added a comment -

        +1 for the patch.

        'mvn clean install' verified successfully.
        Ship it.

        jihoonson Jihoon Son added a comment -

        Wait please.
        In this patch, the block size is taken from the first BlockStorageLocation, so it can differ from file to file.
        It would be better to get the block size from the configuration, as follows.

        conf.getInt("dfs.block.size", -1);
        
        jhkim Jinho Kim added a comment -

        Jihoon,
        This needs the real block size.
        A user can set a different block size for each file.
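
To illustrate the disagreement above, here is a minimal sketch in plain Java with no Hadoop dependency. `FileInfo`, `configuredBlockSize`, `effectiveBlockSize`, and the path `/data/test.snappy` are hypothetical stand-ins, not Tajo or HDFS APIs. The only assumption taken from the thread is that a file may have been written with a block size different from the configured `dfs.block.size` value.

```java
import java.util.HashMap;
import java.util.Map;

public class PerFileBlockSizeSketch {
    /** Hypothetical stand-in for per-file metadata; not an HDFS class. */
    static final class FileInfo {
        final String path;
        final long blockSize; // block size the file was actually written with

        FileInfo(String path, long blockSize) {
            this.path = path;
            this.blockSize = blockSize;
        }
    }

    /** Cluster-wide default read from a configuration map, or -1 if unset. */
    static long configuredBlockSize(Map<String, Long> conf) {
        return conf.getOrDefault("dfs.block.size", -1L);
    }

    /** Prefer the file's own block size; fall back to the configured default. */
    static long effectiveBlockSize(FileInfo file, Map<String, Long> conf) {
        return file.blockSize > 0 ? file.blockSize : configuredBlockSize(conf);
    }

    public static void main(String[] args) {
        Map<String, Long> conf = new HashMap<>();
        conf.put("dfs.block.size", 64L * 1024 * 1024);

        // Hypothetical file written with a 128 MB block size.
        FileInfo file = new FileInfo("/data/test.snappy", 128L * 1024 * 1024);

        System.out.println("configured: " + configuredBlockSize(conf));
        System.out.println("effective:  " + effectiveBlockSize(file, conf));
    }
}
```

In this sketch the configured value and the file's own value disagree, which is the case Jinho describes: a decision based only on the configuration would use the wrong size for that file.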

        jihoonson Jihoon Son added a comment -

        Sorry, this is hard for me to understand.
        Would you please explain the issue in more detail?

        Also, even if the block size is taken from the first BlockStorageLocation, it looks like it should be

        blockStorageLocations[0].getLength()

        instead of

        blockStorageLocations[0].getLength() - blockStorageLocations[0].getOffset()

        .
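
The difference between the two expressions can be seen in a small sketch, assuming that `getOffset()` is the block's start offset within the file and `getLength()` is the block's length. `Block` below is a hypothetical stand-in, not the Hadoop class, and the 64 MB figures are illustrative only.

```java
public class BlockLengthSketch {
    /** Hypothetical stand-in for a block location: start offset in the file, and block length. */
    static final class Block {
        final long offset;
        final long length;

        Block(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Block size taken directly from the block's length. */
    static long sizeFromLength(Block b) {
        return b.length;
    }

    /** The expression questioned in the comment above: length minus offset. */
    static long sizeFromLengthMinusOffset(Block b) {
        return b.length - b.offset;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        Block first = new Block(0, 64 * mb);        // first block starts at offset 0
        Block second = new Block(64 * mb, 64 * mb); // second block starts after the first

        // For the first block both expressions agree, because the offset is 0.
        System.out.println(sizeFromLength(first) == sizeFromLengthMinusOffset(first));

        // For a later block the subtraction no longer yields the block's length.
        System.out.println(sizeFromLength(second) + " vs " + sizeFromLengthMinusOffset(second));
    }
}
```

Under these assumptions the subtraction is harmless only for the first block, where the offset is 0, so using the length alone states the intent more directly.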

        jhkim Jinho Kim added a comment - - edited

        Thank you for the nice finding.
        In the current implementation, a compressed text file is not splittable,
        so it cannot use disk volume scheduling. However, if the compressed file fits within a single block, we can use volume scheduling.

        Example: test.snappy
        
        hdfs block size : 64MB
        disk volume : 1
        file size <= 64 MB
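
The condition in this example can be sketched as a single check. This is a hedged illustration, not the code from the patch: `compressedFileCanUseVolumeInfo` is a hypothetical helper, and the sketch follows the example's "<=" comparison rather than the strict "less than" wording of the description.

```java
public class VolumeSchedulingSketch {
    /**
     * Hypothetical check for a non-splittable compressed file: if the whole file
     * fits in one block, its single fragment maps to one block, so that block's
     * disk volume information can be used for scheduling.
     */
    static boolean compressedFileCanUseVolumeInfo(long fileLength, long blockSize) {
        return blockSize > 0 && fileLength <= blockSize;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // 64 MB, as in the example above

        System.out.println(compressedFileCanUseVolumeInfo(10L * 1024 * 1024, blockSize)); // small file
        System.out.println(compressedFileCanUseVolumeInfo(blockSize, blockSize));         // exactly one block
        System.out.println(compressedFileCanUseVolumeInfo(blockSize + 1, blockSize));     // spans two blocks
    }
}
```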
        
        jihoonson Jihoon Son added a comment -

        Thanks, Jinho.
        +1 for the latest patch.

        jhkim Jinho Kim added a comment -

        Thanks, everyone.
        I've just committed it.

        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-trunk-postcommit #623 (See https://builds.apache.org/job/Tajo-trunk-postcommit/623/)
        TAJO-421: Improve split for compression file. (jinho) (jinossy: https://git-wip-us.apache.org/repos/asf?p=incubator-tajo.git&a=commit&h=14160face45894a73f742f528e87b5f8ec2e10b9)

        • CHANGES.txt
        • tajo-core/tajo-core-storage/src/main/java/org/apache/tajo/storage/AbstractStorageManager.java

          People

          • Assignee: jhkim Jinho Kim
          • Reporter: jhkim Jinho Kim
          • Votes: 0
          • Watchers: 4
