Spark / SPARK-21638

Warning message of RF is not accurate

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: ML
    • Labels: None

    Description

      When training an RF (random forest) model, many warning messages like this are logged:

      WARN RandomForest: Tree learning is using approximately 268492800 bytes per iteration, which exceeds requested limit maxMemoryUsage=268435456. This allows splitting 2622 nodes in this iteration.

      This warning is unnecessary, and the memory usage and node count it reports are not accurate.

      The warning is logged whenever the pending nodes cannot all be split in a single iteration. In most cases they cannot, so the warning appears in nearly every iteration.

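      This is because numNodesInGroup and memUsage are updated in the node-selection loop even when the node is not added to the group: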

      while (nodeStack.nonEmpty && (memUsage < maxMemoryUsage || memUsage == 0)) {
        val (treeIndex, node) = nodeStack.top
        // Choose subset of features for node (if subsampling).
        val featureSubset: Option[Array[Int]] = if (metadata.subsamplingFeatures) {
          Some(SamplingUtils.reservoirSampleAndCount(Range(0,
            metadata.numFeatures).iterator, metadata.numFeaturesPerNode, rng.nextLong())._1)
        } else {
          None
        }
        // Check if enough memory remains to add this node to the group.
        val nodeMemUsage = RandomForest.aggregateSizeForNode(metadata, featureSubset) * 8L
        if (memUsage + nodeMemUsage <= maxMemoryUsage || memUsage == 0) {
          nodeStack.pop()
          mutableNodesForGroup.getOrElseUpdate(treeIndex, new mutable.ArrayBuffer[LearningNode]()) +=
            node
          mutableTreeToNodeToIndexInfo
            .getOrElseUpdate(treeIndex, new mutable.HashMap[Int, NodeIndexInfo]())(node.id)
            = new NodeIndexInfo(numNodesInGroup, featureSubset)
        }
        // Problem: these counters advance even when the node above was not added to
        // mutableNodesForGroup, so the logged values overshoot.
        numNodesInGroup += 1
        memUsage += nodeMemUsage
      }
      if (memUsage > maxMemoryUsage) {
        // If maxMemoryUsage is 0, we should still allow splitting 1 node.
        logWarning(s"Tree learning is using approximately $memUsage bytes per iteration, which" +
          s" exceeds requested limit maxMemoryUsage=$maxMemoryUsage. This allows splitting" +
          s" $numNodesInGroup nodes in this iteration.")
      }
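
To make the overshoot concrete, here is a minimal standalone sketch that replays only the accounting of the loop above, without any Spark classes. Everything in it is an assumption for illustration: the names GroupAccountingSketch, GroupStats, quotedAccounting and alignedAccounting are hypothetical, the per-node size of 102400 bytes is chosen only because 2622 × 102400 = 268492800 matches the warning quoted earlier, and the 5000 pending nodes are arbitrary. The second function shows one possible way to keep the counters in step with the nodes actually selected; it is not necessarily the change that was merged.

```scala
object GroupAccountingSketch {
  // What the warning would report versus what was really put into the group.
  final case class GroupStats(reportedNodes: Int, reportedBytes: Long,
                              selectedNodes: Int, selectedBytes: Long)

  /** Replays the quoted loop's accounting: counters advance even for a rejected node. */
  def quotedAccounting(nodeSizes: Seq[Long], maxMemoryUsage: Long): GroupStats = {
    var remaining = nodeSizes.toList
    var memUsage = 0L
    var numNodesInGroup = 0
    var selectedNodes = 0
    var selectedBytes = 0L
    while (remaining.nonEmpty && (memUsage < maxMemoryUsage || memUsage == 0)) {
      val nodeMemUsage = remaining.head
      if (memUsage + nodeMemUsage <= maxMemoryUsage || memUsage == 0) {
        remaining = remaining.tail // stands in for nodeStack.pop()
        selectedNodes += 1
        selectedBytes += nodeMemUsage
      }
      // Same placement as in the quoted code: outside the if.
      numNodesInGroup += 1
      memUsage += nodeMemUsage
    }
    GroupStats(numNodesInGroup, memUsage, selectedNodes, selectedBytes)
  }

  /** Hypothetical alternative: count a node only when it is added, stop otherwise. */
  def alignedAccounting(nodeSizes: Seq[Long], maxMemoryUsage: Long): GroupStats = {
    var remaining = nodeSizes.toList
    var memUsage = 0L
    var numNodesInGroup = 0
    var groupDone = false
    while (remaining.nonEmpty && !groupDone) {
      val nodeMemUsage = remaining.head
      if (memUsage + nodeMemUsage <= maxMemoryUsage || memUsage == 0) {
        remaining = remaining.tail
        numNodesInGroup += 1
        memUsage += nodeMemUsage
      } else {
        groupDone = true // the next node does not fit; leave it for a later group
      }
    }
    GroupStats(numNodesInGroup, memUsage, numNodesInGroup, memUsage)
  }

  def main(args: Array[String]): Unit = {
    val maxMemoryUsage = 268435456L          // limit from the quoted warning
    val nodeSizes = Seq.fill(5000)(102400L)  // assumed uniform per-node size
    println(quotedAccounting(nodeSizes, maxMemoryUsage))
    // prints GroupStats(2622,268492800,2621,268390400)
    println(alignedAccounting(nodeSizes, maxMemoryUsage))
    // prints GroupStats(2621,268390400,2621,268390400)
  }
}
```

Under these assumptions the quoted accounting reports 2622 nodes and 268492800 bytes, while only 2621 nodes using 268390400 bytes were selected, which is within the limit. With the counters kept inside the if, memUsage never exceeds maxMemoryUsage in this scenario, so the warning would not be logged.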
      

          People

            Assignee: Peng Meng (peng.meng@intel.com)
            Reporter: Peng Meng (peng.meng@intel.com)
            Votes: 0
            Watchers: 2
