Spark / SPARK-14567

Add instrumentation logs to MLlib training algorithms


Details

    • Type: Umbrella
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2.0
    • Component/s: ML, MLlib
    • Labels: None

    Description

      In order to debug performance issues when training MLlib algorithms,
      it is useful to log some metrics about the training dataset, the training parameters, etc.

      This ticket is an umbrella to add simple logging messages to the most common MLlib estimators. There should be no performance impact on the current implementations, and the output is simply written to the logs.

      Here are some values that are of interest when debugging training tasks:

      • number of features
      • number of instances
      • number of partitions
      • number of classes
      • input RDD/DF cache level
      • hyper-parameters
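A minimal sketch of the kind of instrumentation this umbrella proposes: collect the values above once at the start of a fit and emit them as log lines, leaving the training code itself untouched. The class and method names (`TrainingInstrumentation`, `log_named_value`) are illustrative and are not the actual Spark API.

```python
import logging

class TrainingInstrumentation:
    """Hypothetical helper that records and logs named training metrics
    (number of features, instances, partitions, hyper-parameters, ...)
    for a single estimator fit."""

    def __init__(self, estimator_name):
        self.estimator_name = estimator_name
        self.entries = {}  # insertion-ordered in Python 3.7+
        self.logger = logging.getLogger(estimator_name)

    def log_named_value(self, name, value):
        # Store the value and emit one "name = value" log line; this is
        # pure bookkeeping, so training performance is unaffected.
        self.entries[name] = value
        self.logger.info("%s: %s = %s", self.estimator_name, name, value)

    def summary(self):
        # One-line summary of everything logged so far.
        return ", ".join(f"{k}={v}" for k, v in self.entries.items())
```

For example, a classifier's `fit` could call `log_named_value("numFeatures", ...)`, `log_named_value("numClasses", ...)`, and so on before starting iterations, so a single grep over the driver logs recovers the training setup.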


          People

            Assignee: Timothy Hunter (timhunter)
            Reporter: Timothy Hunter (timhunter)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved:
