[SPARK-14567] Add instrumentation logs to MLlib training algorithms - ASF JIRA

Details

Type: Umbrella
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.2.0
Component/s: ML, MLlib
Labels: None

Description

To debug performance issues when training MLlib algorithms, it is useful to log some metrics about the training dataset, the training parameters, etc.

This ticket is an umbrella for adding simple log messages to the most common MLlib estimators. There should be no performance impact on the current implementation, and the output is simply printed in the logs.

Here are some values that are of interest when debugging training tasks:

number of features
number of instances
number of partitions
number of classes
input RDD/DF cache level
hyper-parameters
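
As a rough illustration of the kind of message this ticket has in mind, the sketch below gathers the values listed above into a single structured log line. It is not Spark's implementation: the function names, the field names, and the logger name are all hypothetical, and the values are passed in by the caller rather than read from an RDD or DataFrame.

```python
import json
import logging

# Hypothetical logger name; not a name used by Spark.
logger = logging.getLogger("training.instrumentation")


def format_training_metadata(estimator_name, num_features, num_instances,
                             num_partitions, num_classes, storage_level,
                             params):
    """Build one structured log line describing a training run."""
    record = {
        "estimator": estimator_name,
        "numFeatures": num_features,
        "numInstances": num_instances,
        "numPartitions": num_partitions,
        "numClasses": num_classes,
        "storageLevel": storage_level,
        "params": params,
    }
    # sort_keys keeps the output stable, so lines are easy to grep and diff.
    return json.dumps(record, sort_keys=True)


def log_training_metadata(**metadata):
    """Emit the metadata at INFO level; the line is only built if INFO is enabled."""
    if logger.isEnabledFor(logging.INFO):
        logger.info("training metadata: %s",
                    format_training_metadata(**metadata))


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Illustrative values only; they do not come from a real training run.
    log_training_metadata(
        estimator_name="LogisticRegression",
        num_features=10,
        num_instances=1000,
        num_partitions=4,
        num_classes=2,
        storage_level="MEMORY_AND_DISK",
        params={"maxIter": 100, "regParam": 0.01},
    )
```

Checking the log level before building the line keeps the cost close to zero when INFO logging is off, which is in keeping with the requirement that the instrumentation have no performance impact.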

Sub-Tasks

1.	Log instrumentation in logistic regression as a first task	Resolved	Timothy Hunter
2.	Log instrumentation in KMeans	Resolved	Xin Ren
3.	Log instrumentation in Random forests	Resolved	Benjamin Fradet
4.	Log instrumentation in ALS	Resolved	Miao Wang
5.	Log instrumentation in GBTs	Resolved	Seth Hendrickson
6.	Log instrumentation in GMM	Resolved	Ruifeng Zheng
7.	Log instrumentation in OneVsRest, CrossValidator, TrainValidationSplit	Resolved	Sue Ann Hong
8.	Log instrumentation in CrossValidator	Closed	Unassigned
9.	Log instrumentation in MPC, NB, LDA, AFT, GLR, Isotonic, LinReg	Resolved	Ruifeng Zheng

People

Assignee: Timothy Hunter

Reporter: Timothy Hunter

Votes: 0

Watchers: 3

Dates

Created: 12/Apr/16 18:34

Updated: 17/Jan/17 23:40

Resolved: 17/Jan/17 23:40
