Details

      Description

      The in-memory representation of HDFS files in the catalog is highly inefficient and can be significantly improved. Currently, the catalog uses ~400-500 bytes per THdfsFileDescriptor object, which essentially consists of a) the file name and b) a list of THdfsFileBlocks. Every file block stores information about its replicas, their disk IDs, and whether each replica is cached. All of this information is currently stored in Thrift objects and can be compressed significantly.
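      To make the compression point concrete, here is a minimal Java sketch. It stores one block's replica metadata in primitive arrays instead of one object per replica. Everything in it is hypothetical and is not the layout used by the linked change: the class name `CompactBlock`, the 15-bit host index into a shared host list, the cached flag in bit 15, and one-byte disk IDs.

```java
// Hypothetical sketch, not Impala's actual class: packs the per-replica metadata of
// one HDFS block into primitive arrays instead of one Thrift object per replica.
final class CompactBlock {
  // Bit 15 of each packed replica marks a cached replica; bits 0-14 hold a host index
  // into an assumed table-wide list of hosts.
  private static final int CACHED_MASK = 0x8000;
  private static final int HOST_IDX_MASK = 0x7FFF;

  private final long offset;
  private final long length;
  private final short[] replicas; // packed host index + cached flag
  private final byte[] diskIds;   // one disk id per replica

  CompactBlock(long offset, long length, int[] hostIdxs, boolean[] cached, int[] disks) {
    if (hostIdxs.length != cached.length || hostIdxs.length != disks.length) {
      throw new IllegalArgumentException("replica arrays must have equal length");
    }
    this.offset = offset;
    this.length = length;
    this.replicas = new short[hostIdxs.length];
    this.diskIds = new byte[hostIdxs.length];
    for (int i = 0; i < hostIdxs.length; i++) {
      if (hostIdxs[i] < 0 || hostIdxs[i] > HOST_IDX_MASK) {
        throw new IllegalArgumentException("host index out of range: " + hostIdxs[i]);
      }
      if (disks[i] < 0 || disks[i] > Byte.MAX_VALUE) {
        throw new IllegalArgumentException("disk id out of range: " + disks[i]);
      }
      replicas[i] = (short) (hostIdxs[i] | (cached[i] ? CACHED_MASK : 0));
      diskIds[i] = (byte) disks[i];
    }
  }

  long offset() { return offset; }
  long length() { return length; }
  int numReplicas() { return replicas.length; }
  // Masking after sign extension of the short recovers the 15-bit host index.
  int hostIndex(int i) { return replicas[i] & HOST_IDX_MASK; }
  boolean isCached(int i) { return (replicas[i] & CACHED_MASK) != 0; }
  int diskId(int i) { return diskIds[i]; }

  // Raw payload size in bytes, ignoring JVM object and array headers.
  int payloadBytes() { return 2 * Long.BYTES + replicas.length * (Short.BYTES + Byte.BYTES); }

  public static void main(String[] args) {
    CompactBlock b = new CompactBlock(0L, 134_217_728L,
        new int[] {3, 17, 42}, new boolean[] {false, true, false}, new int[] {0, 2, 1});
    // prints "17 true 1 25"
    System.out.println(b.hostIndex(1) + " " + b.isCached(1) + " " + b.diskId(2) + " " + b.payloadBytes());
  }
}
```

      Packing replicas this way removes the per-object headers and references of one object per replica. The JVM still adds array headers, so `payloadBytes()` is a lower bound, not a measured footprint.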

      Also, the catalog and the Impalad services spend a lot of time (and memory) serializing and deserializing Thrift objects. Using a more efficient serialization library (e.g. FlatBuffers) can significantly improve memory efficiency and speed when processing catalog updates.
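      A key property of FlatBuffers is that readers access fields directly in the serialized bytes instead of first deserializing them into an object graph. The Java sketch below illustrates only that zero-copy idea, using plain `java.nio`. It is not the FlatBuffers API, which adds vtables, offsets and generated accessors. The class `FileDescView` and its fixed layout are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hand-rolled illustration of zero-copy field access; NOT the FlatBuffers API.
// Hypothetical fixed little-endian layout of a file descriptor:
//   [0, 8)   file length (long)
//   [8, 16)  modification time (long)
//   [16, 20) number of blocks (int)
//   [20, ..) block offsets, one long each
final class FileDescView {
  private static final int LEN_OFF = 0;
  private static final int MTIME_OFF = 8;
  private static final int NBLOCKS_OFF = 16;
  private static final int BLOCKS_OFF = 20;

  private final ByteBuffer buf;

  // Wraps an existing buffer; no fields are copied or decoded up front.
  FileDescView(ByteBuffer buf) {
    this.buf = buf.duplicate().order(ByteOrder.LITTLE_ENDIAN);
  }

  // Writes the fields at their fixed offsets into a freshly allocated buffer.
  static ByteBuffer encode(long fileLength, long modTime, long[] blockOffsets) {
    ByteBuffer b = ByteBuffer.allocate(BLOCKS_OFF + blockOffsets.length * Long.BYTES)
        .order(ByteOrder.LITTLE_ENDIAN);
    b.putLong(LEN_OFF, fileLength);
    b.putLong(MTIME_OFF, modTime);
    b.putInt(NBLOCKS_OFF, blockOffsets.length);
    for (int i = 0; i < blockOffsets.length; i++) {
      b.putLong(BLOCKS_OFF + i * Long.BYTES, blockOffsets[i]);
    }
    return b;
  }

  // Each accessor reads straight from the buffer at a fixed offset.
  long fileLength() { return buf.getLong(LEN_OFF); }
  long modTime() { return buf.getLong(MTIME_OFF); }
  int numBlocks() { return buf.getInt(NBLOCKS_OFF); }

  long blockOffset(int i) {
    if (i < 0 || i >= numBlocks()) {
      throw new IndexOutOfBoundsException("block " + i);
    }
    return buf.getLong(BLOCKS_OFF + i * Long.BYTES);
  }

  public static void main(String[] args) {
    ByteBuffer wire = encode(300_000_000L, 1_490_000_000_000L,
        new long[] {0L, 134_217_728L, 268_435_456L});
    FileDescView fd = new FileDescView(wire);
    // prints "3 268435456"
    System.out.println(fd.numBlocks() + " " + fd.blockOffset(2));
  }
}
```

      Because nothing is decoded up front, a reader pays only for the fields it touches. The trade-off is schema evolution: FlatBuffers handles added or deprecated fields through its schema, whereas a fixed layout like this one does not.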

        Activity

        Dimitris Tsirogiannis (dtsirogiannis) added a comment:

        Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
        Reviewed-on: http://gerrit.cloudera.org:8080/6406
        Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
        Tested-by: Impala Public Jenkins

        M CMakeLists.txt
        A common/fbs/CMakeLists.txt
        A common/fbs/CatalogObjects.fbs
        M common/thrift/CatalogObjects.thrift
        M fe/CMakeLists.txt
        M fe/pom.xml
        M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
        M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
        M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
        M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
        M fe/src/main/java/org/apache/impala/catalog/Table.java
        M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
        M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
        M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
        14 files changed, 572 insertions, 323 deletions

        Approvals:
        Impala Public Jenkins: Verified
        Dimitris Tsirogiannis: Looks good to me, approved


          People

          • Assignee: Dimitris Tsirogiannis (dtsirogiannis)
          • Reporter: Dimitris Tsirogiannis (dtsirogiannis)
          • Votes: 0
          • Watchers: 1
