IMPALA-7214

Update Impala docs to reflect coordinator/executor separation and decoupling from DataNodes.


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.12.0
    • Fix Version/s: Impala 3.2.0
    • Component/s: Docs
    • Labels:
      None

      Description

      The docs tend to conflate DataNodes (an HDFS service) with Impala daemons. I think this stems from the original deployment practice of always colocating Impala daemons with HDFS DataNodes so that HDFS data could always be read from a local DataNode.

      I'm a bit pedantic so the conflation feels wrong to me regardless, but I think this will become increasingly confusing as alternative deployments without colocated HDFS DataNodes become more common (e.g. running against S3, running with a separate HDFS service).

      E.g. picking an example at random:

              In Impala 1.4.0 and higher, the <codeph>LIMIT</codeph> clause is now optional (rather than required) for
              queries that use the <codeph>ORDER BY</codeph> clause. Impala automatically uses a temporary disk work area
              to perform the sort if the sort operation would otherwise exceed the Impala memory limit for a particular
              DataNode.
      

      This is wrong because the memory limit is for an Impala daemon, which is the process that does the actual sorting. So here I think it should be "Impala daemon" instead of "DataNode".


            People

            • Assignee: Alexandra Rodoni (arodoni)
            • Reporter: Tim Armstrong (tarmstrong)
