IMPALA-4300

Use SIMD to speed up BloomFilter::Or

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.6.0
    • Fix Version/s: None
    • Component/s: Backend
    • Labels:

      Description

      BloomFilter::Or can benefit from SIMD instructions to speed up merging Bloom filters. The current implementation ORs the directories one byte at a time:

      void BloomFilter::Or(const TBloomFilter& in, TBloomFilter* out) {
        DCHECK(out != NULL);
        DCHECK_EQ(in.log_heap_space, out->log_heap_space);
        out->always_true |= in.always_true;
        if (out->always_true) {
          out->directory.resize(0);
          return;
        }
      
        for (int i = 0; i < in.directory.size(); ++i) out->directory[i] |= in.directory[i];
      }
      

      When the coordinator is merging filters, around 10-15% of CPU time is spent in BloomFilter::Or:

      impala::BloomFilter::Or(impala::TBloomFilter const&, impala::TBloomFilter*)  /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.208/lib/impala/sbin-retail/impalad
       21.89 │30:   movzbl (%rax,%rbx,1),%r12d
       12.83 │      mov    0x10(%rbp),%rax
        2.65 │      mov    -0x8(%rax),%edx
        8.96 │      test   %edx,%edx
             │    ↓ js     4c
             │      mov    %r13,%rdi
             │    → callq  std::string::_M_leak_hard()@plt
             │      mov    0x10(%rbp),%rax
       12.74 │4c:   or     %r12b,(%rax,%rbx,1)
       35.67 │      add    $0x1,%rbx
        2.89 │      mov    0x10(%r14),%rax
        2.37 │      cmp    -0x18(%rax),%rbx
             │    ↑ jb     30
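
The profile above shows a byte-at-a-time loop (movzbl, then or %r12b) that re-reads the buffer pointer and checks std::string::_M_leak_hard on every iteration, which suggests that directory is a std::string. A minimal sketch of a vectorized alternative follows, assuming SSE2 is available and falling back to a scalar loop otherwise. OrBytes and OrDirectories are hypothetical names used for illustration; they are not taken from the Impala source and this is not necessarily how the issue was fixed.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical helper (not from the Impala source): ORs n bytes of `in` into `out`.
// The caller fetches the raw pointers once, so the loop body contains no
// per-element std::string accessor calls.
inline void OrBytes(const uint8_t* in, uint8_t* out, size_t n) {
  size_t i = 0;
#if defined(__SSE2__)
  // Main loop: 16 bytes per iteration using unaligned loads and stores.
  for (; i + 16 <= n; i += 16) {
    const __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in + i));
    const __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(out + i));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), _mm_or_si128(a, b));
  }
#endif
  // Scalar tail; also covers the whole range on targets without SSE2.
  for (; i < n; ++i) out[i] |= in[i];
}

// Hypothetical wrapper over std::string buffers, standing in for the
// `directory` fields. Returns false if the sizes differ (the original code
// enforces matching sizes with a DCHECK on log_heap_space instead).
inline bool OrDirectories(const std::string& in, std::string* out) {
  if (in.size() != out->size()) return false;
  if (in.empty()) return true;
  OrBytes(reinterpret_cast<const uint8_t*>(in.data()),
          reinterpret_cast<uint8_t*>(&(*out)[0]), in.size());
  return true;
}
```

The same structure extends to wider registers (for example 32 bytes per iteration with AVX), provided the binary is only run on CPUs that support the chosen instruction set or the wider path is selected at runtime.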
      

            People

            • Assignee: Jim Apple (jbapple)
            • Reporter: Mostafa Mokhtar (mmokhtar)
