Details

      Description

      Given that unaligned memory accesses have been getting faster on modern architectures, we should revisit our tuple memory layout, which adds padding to avoid unaligned accesses.

      The code for computing our memory layout is in TupleDescriptor.java, and any change to the layout must also be reflected in TupleDescriptor::GenerateLlvmStruct() in descriptors.cc.

      I did a simple experiment (diff attached), where we switch to a packed layout, and the results look encouraging.
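
To make the padded vs. packed trade-off concrete, here is a minimal sketch of how the two tuple sizes could be compared. It assumes a simplified model in which every slot is aligned to a multiple of its own size; the actual rules in TupleDescriptor.java may differ, and the function names and slot sizes are hypothetical.

```python
def aligned_size(slot_sizes):
    """Tuple size in bytes when each slot starts at a multiple of its own size.

    Simplified model for illustration only, not the actual Impala rules.
    """
    offset = 0
    for size in slot_sizes:
        # Round the current offset up to the next multiple of the slot size.
        offset = (offset + size - 1) // size * size
        offset += size
    return offset


def packed_size(slot_sizes):
    """Tuple size in bytes when slots are laid out back to back, no padding."""
    return sum(slot_sizes)


# Hypothetical slot sizes in bytes, in declaration order.
slots = [1, 8, 4, 8, 2]
print("aligned:", aligned_size(slots))
print("packed: ", packed_size(slots))
```

In this model the padded layout spends bytes on gaps between slots, which the packed layout removes at the cost of unaligned accesses.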

      Perf results vs. cdh5-trunk are here:

      TPCH-300
      http://sandbox.jenkins.cloudera.com/view/Impala/view/Cluster%20Runs/view/10-node-cdh5/job/impala-workload-runner-10node-cdh5/1239/

      TPCDS-500
      http://sandbox.jenkins.cloudera.com/view/Impala/view/Cluster%20Runs/view/10-node-cdh5/job/impala-workload-runner-10node-cdh5/1238/

      I think we could further optimize the layout by ordering the slots by descending size and by putting the null bits last. We could also pack var-len slots into 12 bytes (4-byte len + ptr) instead of 16 (4-byte len, padded, + ptr).
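
The ordering idea can be sketched as follows. This is only an illustration under stated assumptions: fixed-size slots have power-of-two sizes, one null bit is needed per nullable slot, and pointers are 8 bytes. The function name `sorted_layout` and the constants are hypothetical and do not come from the Impala code.

```python
def sorted_layout(slot_sizes, num_nullable):
    """Place slots in descending size order, then append the null-indicator bytes.

    Returns (offsets, null_bytes_offset, total_size), where offsets maps the
    original slot index to its byte offset. Hypothetical helper for illustration.
    """
    # Stable sort: slots of equal size keep their declaration order.
    order = sorted(range(len(slot_sizes)), key=lambda i: -slot_sizes[i])
    offsets = {}
    offset = 0
    for i in order:
        offsets[i] = offset
        offset += slot_sizes[i]
    null_bytes = (num_nullable + 7) // 8  # one bit per nullable slot
    return offsets, offset, offset + null_bytes


# Hypothetical fixed-size slots (bytes), three of them nullable.
sizes = [1, 8, 4, 8, 2]
offsets, null_offset, total = sorted_layout(sizes, num_nullable=3)

# With power-of-two sizes in descending order, every slot lands on a
# multiple of its own size even though no padding was inserted.
all_aligned = all(offsets[i] % sizes[i] == 0 for i in offsets)
print(offsets, null_offset, total, all_aligned)

# Var-len slot footprint: 4-byte length plus an (assumed) 8-byte pointer.
VARLEN_PADDED = 4 + 4 + 8  # length, padding, pointer
VARLEN_PACKED = 4 + 8      # length, pointer
print(VARLEN_PADDED - VARLEN_PACKED, "bytes saved per var-len slot")
```

Note that a 12-byte var-len slot is not a power of two, so mixing it into the sorted order would leave later 8-byte slots unaligned; that is consistent with the packed approach, which tolerates unaligned accesses.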

            People

            • Assignee:
              alex.behm Alexander Behm
              Reporter:
              alex.behm Alexander Behm