Spark / SPARK-21638

Warning message of RF is not accurate


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: ML
    • Labels:
      None

      Description

      When training a RandomForest (RF) model, many warning messages like the following are logged:

      WARN RandomForest: Tree learning is using approximately 268492800 bytes per iteration, which exceeds requested limit maxMemoryUsage=268435456. This allows splitting 2622 nodes in this iteration.

      This warning is unnecessary, and the numbers it reports are not accurate.
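The logged numbers are consistent with this. Assuming (an inference, not something the log states) that every node in that group had the same aggregate size, the values decompose as follows: each node costs 102400 bytes, only 2621 nodes fit under the 268435456-byte limit, and both the reported memUsage and the reported node count include one extra node that was never added to the group. A minimal Scala sketch of that arithmetic (the object name WarningArithmetic is hypothetical):

```scala
// Decomposes the numbers in the warning above.
// ASSUMPTION: every node in the group had the same aggregate size; the real
// per-node size depends on the feature subset and the number of bins.
object WarningArithmetic {
  val maxMemoryUsage: Long = 268435456L // 256 MiB, the limit quoted in the log
  val loggedMemUsage: Long = 268492800L // memUsage reported in the log
  val loggedNodes: Int = 2622           // numNodesInGroup reported in the log

  // Per-node size implied by the log (268492800 / 2622 divides exactly).
  val perNodeBytes: Long = loggedMemUsage / loggedNodes

  // How many nodes of that size actually fit under the limit.
  val nodesThatFit: Long = maxMemoryUsage / perNodeBytes

  // Memory used by the nodes that were really added to the group.
  val actualMemUsage: Long = nodesThatFit * perNodeBytes

  // Difference between the reported and the real usage.
  val overshootBytes: Long = loggedMemUsage - actualMemUsage
}
```

Under that assumption, the 2621 nodes actually grouped use 268390400 bytes, which is within the limit, and the overshoot is exactly one node's worth of memory, so no warning should have been logged for this iteration.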

      In practice, the warning is shown whenever the pending nodes cannot all be split in a single iteration. That is the usual situation, so in most cases the warning is emitted on every iteration.
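The effect can be reproduced with a simplified Scala model of the node-grouping loop quoted in this issue. This is a sketch under stated assumptions: a node is represented only by its size in bytes (a hypothetical stand-in for aggregateSizeForNode(...) * 8L), feature subsampling and the per-tree bookkeeping are dropped, and splitting a group does not push new child nodes. The names BuggyGrouping, Group, selectGroup, and run are hypothetical.

```scala
// Simplified model of the grouping loop, reproducing the faulty accounting.
// ASSUMPTIONS: a node is just its byte size; the head of the List plays the
// role of nodeStack.top, and taking the tail plays the role of nodeStack.pop().
object BuggyGrouping {
  final case class Group(nodes: Int, memUsage: Long, numNodesInGroup: Int, warned: Boolean)

  def selectGroup(stack: List[Long], maxMemoryUsage: Long): (Group, List[Long]) = {
    var remaining = stack
    var memUsage = 0L
    var numNodesInGroup = 0
    var added = 0 // nodes really taken into this group
    while (remaining.nonEmpty && (memUsage < maxMemoryUsage || memUsage == 0)) {
      val nodeMemUsage = remaining.head
      if (memUsage + nodeMemUsage <= maxMemoryUsage || memUsage == 0) {
        remaining = remaining.tail
        added += 1
      }
      // Same problem as the quoted code: the counters advance even when the
      // node was NOT added, which also pushes memUsage over the limit.
      numNodesInGroup += 1
      memUsage += nodeMemUsage
    }
    (Group(added, memUsage, numNodesInGroup, warned = memUsage > maxMemoryUsage), remaining)
  }

  // Forms groups until no nodes remain; one Group per iteration.
  def run(nodeSizes: List[Long], maxMemoryUsage: Long): List[Group] = {
    val groups = List.newBuilder[Group]
    var remaining = nodeSizes
    while (remaining.nonEmpty) {
      val (g, rest) = selectGroup(remaining, maxMemoryUsage)
      groups += g
      remaining = rest
    }
    groups.result()
  }
}
```

For example, run(List.fill(10)(30L), 100L) produces groups of 3, 3, 3 and 1 nodes. Each of the first three groups really uses 90 bytes, under the 100-byte limit, but memUsage is reported as 120 and numNodesInGroup as 4, so the warning fires on every iteration except the last one.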

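      The cause of both problems is in the node-grouping loop, which updates its counters even for a node that does not fit: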

      while (nodeStack.nonEmpty && (memUsage < maxMemoryUsage || memUsage == 0)) {
        val (treeIndex, node) = nodeStack.top
        // Choose subset of features for node (if subsampling).
        val featureSubset: Option[Array[Int]] = if (metadata.subsamplingFeatures) {
          Some(SamplingUtils.reservoirSampleAndCount(Range(0,
            metadata.numFeatures).iterator, metadata.numFeaturesPerNode, rng.nextLong())._1)
        } else {
          None
        }
        // Check if enough memory remains to add this node to the group.
        val nodeMemUsage = RandomForest.aggregateSizeForNode(metadata, featureSubset) * 8L
        if (memUsage + nodeMemUsage <= maxMemoryUsage || memUsage == 0) {
          nodeStack.pop()
          mutableNodesForGroup.getOrElseUpdate(treeIndex, new mutable.ArrayBuffer[LearningNode]()) +=
            node
          mutableTreeToNodeToIndexInfo
            .getOrElseUpdate(treeIndex, new mutable.HashMap[Int, NodeIndexInfo]())(node.id)
            = new NodeIndexInfo(numNodesInGroup, featureSubset)
        }
        // Problem: when the node does NOT fit, it is not added to mutableNodesForGroup above,
        // yet numNodesInGroup and memUsage are still incremented here.
        numNodesInGroup += 1
        memUsage += nodeMemUsage
      }
      if (memUsage > maxMemoryUsage) {
        // If maxMemoryUsage is 0, we should still allow splitting 1 node.
        logWarning(s"Tree learning is using approximately $memUsage bytes per iteration, which" +
          s" exceeds requested limit maxMemoryUsage=$maxMemoryUsage. This allows splitting" +
          s" $numNodesInGroup nodes in this iteration.")
      }
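One way to correct the accounting is to update numNodesInGroup and memUsage only when a node is actually added, and to end the group as soon as the next node does not fit. The sketch below shows this on a simplified model in which a node is represented only by its byte size; the names FixedGrouping, Group, selectGroup, and the groupDone flag are hypothetical, and this is not necessarily the patch that was merged.

```scala
// Corrected grouping loop on a simplified model.
// ASSUMPTIONS: a node is just its byte size; the head of the List plays the
// role of nodeStack.top, and taking the tail plays the role of nodeStack.pop().
object FixedGrouping {
  final case class Group(nodes: Int, memUsage: Long, warned: Boolean)

  def selectGroup(stack: List[Long], maxMemoryUsage: Long): (Group, List[Long]) = {
    var remaining = stack
    var memUsage = 0L
    var numNodesInGroup = 0
    var groupDone = false
    while (remaining.nonEmpty && !groupDone) {
      val nodeMemUsage = remaining.head
      if (memUsage + nodeMemUsage <= maxMemoryUsage || memUsage == 0) {
        // Counters change only for a node that is really added to the group.
        remaining = remaining.tail
        numNodesInGroup += 1
        memUsage += nodeMemUsage
      } else {
        // The next node does not fit: leave it for a later iteration.
        groupDone = true
      }
    }
    // memUsage can now exceed the limit only when a single node is larger than
    // maxMemoryUsage; it is still admitted so that at least one node is split.
    (Group(numNodesInGroup, memUsage, warned = memUsage > maxMemoryUsage), remaining)
  }
}
```

With ten 30-byte nodes and a 100-byte limit, the first group now reports 3 nodes and 90 bytes and logs no warning. A warning remains only when a single node is larger than the limit, for example a 150-byte node with a 100-byte limit, which is still admitted on its own.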
      


            People

            • Assignee: Peng Meng (peng.meng@intel.com)
              Reporter: Peng Meng (peng.meng@intel.com)
            • Votes: 0
              Watchers: 2
