Details

    • Type: Improvement
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Java
    • Labels:

      Issue Links

        Activity

        Hide
        iamhumanbeing iamhumanbeing added a comment -

        reopen it

        Show
        iamhumanbeing iamhumanbeing added a comment - reopen it
        Hide
        owen.omalley Owen O'Malley added a comment -

        Is the ISA-L API-compatible with zlib or is the change bigger than that?

        Show
        owen.omalley Owen O'Malley added a comment - Is the ISA-L API-compatible with zlib or is the change bigger than that?
        Hide
        iamhumanbeing iamhumanbeing added a comment -

        the ISA-L API is compatible with ZLIB C API。We have tested it on Hive 1.2.1。For “insert table” SQL, 10% performance improvement。

        Show
        iamhumanbeing iamhumanbeing added a comment - the ISA-L API is compatible with ZLIB C API。We have tested it on Hive 1.2.1。For “insert table” SQL, 10% performance improvement。
        Hide
        gopalv Gopal V added a comment - - edited

        iamhumanbeing: the really interesting part is not the compressor, but the inflate_fast() performance - do you have any numbers on the read throughput gains?

        Show
        gopalv Gopal V added a comment - - edited iamhumanbeing : the really interesting part is not the compressor, but the inflate_fast() performance - do you have any numbers on the read throughput gains?
        Hide
        iamhumanbeing iamhumanbeing added a comment - - edited

        Gopal V:inflate performance--》 ZLIB:197MB/s;ISA-L:356MB/s

        perf@perf-master:~/git/isa-l/igzip$ ./igzip_inflate_perf part-00127
        isal_inflate_perf:
        Using igzip compression
        igzip_zlib_inflate_perf: part-00127 215 iterations
        file part-00127 - in_size=4632710 out_size=4632710 iter=215
        igzip_file: runtime = 5053893 usecs, bandwidth 949 MB in 5.0539 sec = 197.08 MB/s
        End of igzip_zlib_inflate_perf

        isal_inflate_stateless_perf: part-00127 215 iterations
        file part-00127 - in_size=4632710 out_size=4632710 iter=215
        igzip_file: runtime = 2794267 usecs, bandwidth 949 MB in 2.7943 sec = 356.46 MB/s
        End of isal_inflate_stateless_perf

        Show
        iamhumanbeing iamhumanbeing added a comment - - edited Gopal V :inflate performance--》 ZLIB:197MB/s;ISA-L:356MB/s perf@perf-master:~/git/isa-l/igzip$ ./igzip_inflate_perf part-00127 isal_inflate_perf: Using igzip compression igzip_zlib_inflate_perf: part-00127 215 iterations file part-00127 - in_size=4632710 out_size=4632710 iter=215 igzip_file: runtime = 5053893 usecs, bandwidth 949 MB in 5.0539 sec = 197.08 MB/s End of igzip_zlib_inflate_perf isal_inflate_stateless_perf: part-00127 215 iterations file part-00127 - in_size=4632710 out_size=4632710 iter=215 igzip_file: runtime = 2794267 usecs, bandwidth 949 MB in 2.7943 sec = 356.46 MB/s End of isal_inflate_stateless_perf
        Hide
        gopalv Gopal V added a comment -

        igzip is not testing the same codepath as ORC, because ORC does use a mix of LZ77 widths and tries to use faster decode loops, at lower compression levels (it uses different zlib combinations for different column types).

        https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/impl/ZlibCodec.java#L138

        Does ISA-L offer any meaningful speedups for those combinations of Zlib-compatible algorithms?

        Show
        gopalv Gopal V added a comment - igzip is not testing the same codepath as ORC, because ORC does use a mix of LZ77 widths and tries to use faster decode loops, at lower compression levels (it uses different zlib combinations for different column types). https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/impl/ZlibCodec.java#L138 Does ISA-L offer any meaningful speedups for those combinations of Zlib-compatible algorithms?
        Hide
        iamhumanbeing iamhumanbeing added a comment - - edited

        Gopal V:got orc-benchmark data for ISA-L
        ColumnProjectionBenchmark.orc isal taxi avgt 3 944503.179 卤 43834.261 us/op
        ColumnProjectionBenchmark.orc zlib taxi avgt 3 1029682.551 卤 26364.565 us/op

        FullReadBenchmark.orc isal taxi avgt 3 14192224.371 卤 180230.436 us/op
        FullReadBenchmark.orc zlib taxi avgt 3 16234465.657 卤 415264.953 us/op

        seems 8%-14% speedups

        file size almost the same
        rw-rr- 1 pengxiao pengxiao 495492595 8鏈▒ 15 14:21 orc.isal
        rw-r-r- 1 pengxiao pengxiao 491068985 8鏈▒ 15 09:09 orc.zlib

        Show
        iamhumanbeing iamhumanbeing added a comment - - edited Gopal V :got orc-benchmark data for ISA-L ColumnProjectionBenchmark.orc isal taxi avgt 3 944503.179 卤 43834.261 us/op ColumnProjectionBenchmark.orc zlib taxi avgt 3 1029682.551 卤 26364.565 us/op FullReadBenchmark.orc isal taxi avgt 3 14192224.371 卤 180230.436 us/op FullReadBenchmark.orc zlib taxi avgt 3 16234465.657 卤 415264.953 us/op seems 8%-14% speedups file size almost the same rw-r r - 1 pengxiao pengxiao 495492595 8鏈▒ 15 14:21 orc.isal rw-r- r - 1 pengxiao pengxiao 491068985 8鏈▒ 15 09:09 orc.zlib
        Hide
        iamhumanbeing iamhumanbeing added a comment -
        Show
        iamhumanbeing iamhumanbeing added a comment - Owen O'Malley
        Show
        iamhumanbeing iamhumanbeing added a comment - https://github.com/apache/orc/pull/159
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user t3rmin4t0r commented on a diff in the pull request:

        https://github.com/apache/orc/pull/159#discussion_r134427978

        — Diff: java/bench/src/java/org/apache/orc/bench/CompressionKind.java —
        @@ -53,6 +54,8 @@ public OutputStream create(OutputStream out) throws IOException {
        return new GZIPOutputStream(out);
        case SNAPPY:
        return new SnappyCodec().createOutputStream(out);
        + case ISAL:
        — End diff –

        ISAL is just the library for gzip? That doesn't need a new codec - have you tried LD_PRELOAD to load ISAL instead of libz?

        Show
        githubbot ASF GitHub Bot added a comment - Github user t3rmin4t0r commented on a diff in the pull request: https://github.com/apache/orc/pull/159#discussion_r134427978 — Diff: java/bench/src/java/org/apache/orc/bench/CompressionKind.java — @@ -53,6 +54,8 @@ public OutputStream create(OutputStream out) throws IOException { return new GZIPOutputStream(out); case SNAPPY: return new SnappyCodec().createOutputStream(out); + case ISAL: — End diff – ISAL is just the library for gzip? That doesn't need a new codec - have you tried LD_PRELOAD to load ISAL instead of libz?
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user iamhumanbeing commented on a diff in the pull request:

        https://github.com/apache/orc/pull/159#discussion_r134650087

        — Diff: java/bench/src/java/org/apache/orc/bench/CompressionKind.java —
        @@ -53,6 +54,8 @@ public OutputStream create(OutputStream out) throws IOException {
        return new GZIPOutputStream(out);
        case SNAPPY:
        return new SnappyCodec().createOutputStream(out);
        + case ISAL:
        — End diff –

        1. igzip is only part of the ISAL.
        2. igzip only support level 0 - level 1 compression, it can not replace libz totally.
        3. igzip's API can not replace ligz's API directly.

        Show
        githubbot ASF GitHub Bot added a comment - Github user iamhumanbeing commented on a diff in the pull request: https://github.com/apache/orc/pull/159#discussion_r134650087 — Diff: java/bench/src/java/org/apache/orc/bench/CompressionKind.java — @@ -53,6 +54,8 @@ public OutputStream create(OutputStream out) throws IOException { return new GZIPOutputStream(out); case SNAPPY: return new SnappyCodec().createOutputStream(out); + case ISAL: — End diff – 1. igzip is only part of the ISAL. 2. igzip only support level 0 - level 1 compression, it can not replace libz totally. 3. igzip's API can not replace ligz's API directly.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user 10110346 commented on the issue:

        https://github.com/apache/orc/pull/159

        LGTM

        Show
        githubbot ASF GitHub Bot added a comment - Github user 10110346 commented on the issue: https://github.com/apache/orc/pull/159 LGTM
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user omalley commented on the issue:

        https://github.com/apache/orc/pull/159

        Is the ISA-L zlib support sufficient to read and write the ORC files with zlib compression? I agree with Gopal that it doesn't feel like a separate compression codec.

        I'm don't think it is a good idea to build in support for proprietary compression formats. If it is just an optimization on performance, that might be workable. As a unique codec, it isn't.

        Show
        githubbot ASF GitHub Bot added a comment - Github user omalley commented on the issue: https://github.com/apache/orc/pull/159 Is the ISA-L zlib support sufficient to read and write the ORC files with zlib compression? I agree with Gopal that it doesn't feel like a separate compression codec. I'm don't think it is a good idea to build in support for proprietary compression formats. If it is just an optimization on performance, that might be workable. As a unique codec, it isn't.
        Hide
        gopalv Gopal V added a comment -

        The fact that the underlying codec is compatible means that we should be able to extend the decompression speedups to all users who already use Zlib for their storage - which is a big advantage if it can be pulled off.

        If Hadoop's ZlibCodec could be built to work against ISA-L, then all of the projects would get ISA-L support at the same time.

        That might be much easier than trying to build it into ORC alone, since hadoop-common already has a partial dependency on ISA-L and JNI code which depends on it.

        https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/CMakeLists.txt#L126

        find_library(ISAL_LIBRARY
            NAMES isal
             PATHS ${CUSTOM_ISAL_PREFIX} ${CUSTOM_ISAL_PREFIX}/lib
                      ${CUSTOM_ISAL_PREFIX}/lib64 ${CUSTOM_ISAL_LIB} /usr/lib)
        
        Show
        gopalv Gopal V added a comment - The fact that the underlying codec is compatible means that we should be able to extend the decompression speedups to all users who already use Zlib for their storage - which is a big advantage if it can be pulled off. If Hadoop's ZlibCodec could be built to work against ISA-L, then all of the projects would get ISA-L support at the same time. That might be much easier than trying to build it into ORC alone, since hadoop-common already has a partial dependency on ISA-L and JNI code which depends on it. https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/CMakeLists.txt#L126 find_library(ISAL_LIBRARY NAMES isal PATHS ${CUSTOM_ISAL_PREFIX} ${CUSTOM_ISAL_PREFIX}/lib ${CUSTOM_ISAL_PREFIX}/lib64 ${CUSTOM_ISAL_LIB} /usr/lib)
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user iamhumanbeing commented on the issue:

        https://github.com/apache/orc/pull/159

        @omalley: How about we add a compiling option which use ISA-L ZLIB support for ZLIB compression&decompression? this option is just an optimization on performance.

        Show
        githubbot ASF GitHub Bot added a comment - Github user iamhumanbeing commented on the issue: https://github.com/apache/orc/pull/159 @omalley: How about we add a compiling option which use ISA-L ZLIB support for ZLIB compression&decompression? this option is just an optimization on performance.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user xndai commented on the issue:

        https://github.com/apache/orc/pull/159

        @iamhumanbeing did you compare it with zstd? Based on my experience, zstd is way better than igzip. I would expect a similar result with ISA-L. It doesn't seem to be adding a lot of value if we plan to support zstd in near future.

        Show
        githubbot ASF GitHub Bot added a comment - Github user xndai commented on the issue: https://github.com/apache/orc/pull/159 @iamhumanbeing did you compare it with zstd? Based on my experience, zstd is way better than igzip. I would expect a similar result with ISA-L. It doesn't seem to be adding a lot of value if we plan to support zstd in near future.

          People

          • Assignee:
            iamhumanbeing iamhumanbeing
            Reporter:
            iamhumanbeing iamhumanbeing
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:

              Time Tracking

              Estimated:
              Original Estimate - 336h
              336h
              Remaining:
              Remaining Estimate - 336h
              336h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development