Details
Type: Sub-task
Status: Closed
Priority: Major
Resolution: Won't Do
Description
We provide summary statistics for a Table through Summarizer. Users can easily get the total row count and the basic column-wise metrics: max, min, mean, variance, standardDeviation, normL1, normL2, the number of missing values, and the number of valid values.
Spark ML provides the same functionality: http://spark.apache.org/docs/latest/ml-statistics.html#summarizer
Example
String[] colNames = new String[]{"id", "height", "weight"};
Row[] data = new Row[]{
    Row.of(1, 168, 48.1),
    Row.of(2, 165, 45.8),
    Row.of(3, 160, 45.3),
    Row.of(4, 163, 41.9),
    Row.of(5, 149, 40.5),
};
Table input = MLSession.createBatchTable(data, colNames);
TableSummary summary = new Summarizer(input).collectResult();
System.out.println(summary.mean("height"));
System.out.println(summary);
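The ticket lists the column-wise metrics but does not say how they are defined or computed. As a rough illustration only, the plain-Java sketch below computes them for a single numeric column. It rests on several assumptions that are not stated in the ticket: `ColumnStats` is a hypothetical helper (not a Flink class), a `null` or NaN entry counts as a missing value, and variance is the unbiased sample variance (divided by n - 1).

```java
/** Hypothetical helper, not part of Flink: summary metrics for one numeric column. */
final class ColumnStats {
    final long count;       // all entries, including missing ones
    final long numMissing;  // entries that are null or NaN (assumed definition)
    final long numValid;    // count - numMissing
    final double min, max, mean, variance, standardDeviation, normL1, normL2;

    private ColumnStats(long count, long numMissing, double min, double max,
                        double mean, double variance, double normL1, double normL2) {
        this.count = count;
        this.numMissing = numMissing;
        this.numValid = count - numMissing;
        this.min = min;
        this.max = max;
        this.mean = mean;
        this.variance = variance;
        this.standardDeviation = Math.sqrt(variance);
        this.normL1 = normL1;
        this.normL2 = normL2;
    }

    static ColumnStats of(Double[] column) {
        long valid = 0, missing = 0;
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        double sum = 0.0, sumAbs = 0.0, sumSq = 0.0;

        // First pass: counts, extremes, and the sums behind mean, L1 and L2 norms.
        for (Double v : column) {
            if (v == null || v.isNaN()) {
                missing++;
                continue;
            }
            valid++;
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
            sumAbs += Math.abs(v);
            sumSq += v * v;
        }

        // With no valid values, min, max and mean are undefined.
        if (valid == 0) {
            min = Double.NaN;
            max = Double.NaN;
        }
        double mean = valid > 0 ? sum / valid : Double.NaN;

        // Second pass: sum of squared deviations from the mean.
        double squaredDeviations = 0.0;
        for (Double v : column) {
            if (v != null && !v.isNaN()) {
                double d = v - mean;
                squaredDeviations += d * d;
            }
        }
        // Assumption: unbiased sample variance, undefined for fewer than two values.
        double variance = valid > 1 ? squaredDeviations / (valid - 1) : Double.NaN;

        return new ColumnStats(valid + missing, missing, min, max,
                mean, variance, sumAbs, Math.sqrt(sumSq));
    }

    public static void main(String[] args) {
        // The "height" column from the example above.
        ColumnStats height = ColumnStats.of(new Double[]{168.0, 165.0, 160.0, 163.0, 149.0});
        System.out.println(height.mean);
        System.out.println(height.variance);
    }
}
```

The two-pass variance is chosen here for readability. A distributed implementation would more likely keep mergeable partial aggregates per partition, but that design is not described in the ticket.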