
Details

    Description

      Summarizer provides summary statistics for a Table. Users can get the total row count and the basic column-wise metrics: max, min, mean, variance, standardDeviation, normL1, normL2, the number of missing values, and the number of valid values.

      Spark ML provides the same functionality: http://spark.apache.org/docs/latest/ml-statistics.html#summarizer

       

      Example
      
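              // Column names for the input table.
              String[] colNames = new String[]{"id", "height", "weight"};

              // Sample rows: (id, height, weight).
              Row[] data = new Row[]{
                  Row.of(1, 168, 48.1),
                  Row.of(2, 165, 45.8),
                  Row.of(3, 160, 45.3),
                  Row.of(4, 163, 41.9),
                  Row.of(5, 149, 40.5),
              };

              // Build a batch Table from the in-memory rows.
              Table input = MLSession.createBatchTable(data, colNames);

              // Compute the summary statistics for every column of the table.
              TableSummary summary = new Summarizer(input).collectResult();

              // Query a single metric for one column.
              System.out.println(summary.mean("height"));

              // Print the full summary.
              System.out.println(summary);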
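
      To make the listed metrics concrete, here is a minimal plain-Java sketch that computes them for a single numeric column. It is not the proposed Summarizer implementation. The class ColumnSummarySketch and its fields are hypothetical, missing values are modelled as null, and variance is assumed to be the unbiased sample variance (dividing by n - 1), which the description does not specify.

```java
/**
 * Illustrative sketch only: plain-Java versions of the column-wise metrics
 * named in the description. The class name, field names, and edge-case rules
 * are assumptions, not part of the proposed Summarizer API.
 */
public class ColumnSummarySketch {
    final long count;       // all entries, including missing ones
    final long numValid;    // non-null entries
    final long numMissing;  // null entries
    final double max, min, mean, variance, standardDeviation, normL1, normL2;

    private ColumnSummarySketch(long count, long numValid, double max, double min,
                                double mean, double variance, double normL1, double normL2) {
        this.count = count;
        this.numValid = numValid;
        this.numMissing = count - numValid;
        this.max = max;
        this.min = min;
        this.mean = mean;
        this.variance = variance;
        this.standardDeviation = Math.sqrt(variance);
        this.normL1 = normL1;
        this.normL2 = normL2;
    }

    /** Summarizes one column; null entries are counted as missing and otherwise skipped. */
    static ColumnSummarySketch summarize(Double[] column) {
        long numValid = 0;
        double sum = 0.0, sumAbs = 0.0, sumSquares = 0.0;
        double max = Double.NEGATIVE_INFINITY, min = Double.POSITIVE_INFINITY;

        // First pass: counts, extremes, and the sums behind mean, normL1, and normL2.
        for (Double v : column) {
            if (v == null) {
                continue;
            }
            numValid++;
            sum += v;
            sumAbs += Math.abs(v);
            sumSquares += v * v;
            max = Math.max(max, v);
            min = Math.min(min, v);
        }
        if (numValid == 0) {
            // Assumption: metrics are undefined (NaN) when there are no valid values.
            max = Double.NaN;
            min = Double.NaN;
        }
        double mean = numValid > 0 ? sum / numValid : Double.NaN;

        // Second pass: sum of squared deviations from the mean.
        double squaredDeviations = 0.0;
        for (Double v : column) {
            if (v != null) {
                double d = v - mean;
                squaredDeviations += d * d;
            }
        }
        // Assumption: unbiased sample variance, and 0.0 with fewer than two valid values.
        double variance = numValid > 1 ? squaredDeviations / (numValid - 1) : 0.0;

        return new ColumnSummarySketch(column.length, numValid, max, min,
                mean, variance, sumAbs, Math.sqrt(sumSquares));
    }

    public static void main(String[] args) {
        // The "height" column from the example above.
        Double[] height = {168.0, 165.0, 160.0, 163.0, 149.0};
        ColumnSummarySketch s = summarize(height);
        System.out.println(s.mean);      // 161.0
        System.out.println(s.variance);  // 53.5
    }
}
```

      A two-pass variance is used here for readability. A distributed implementation would more likely keep mergeable running sums per partition, but that choice is outside what the description states.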
      

       

       

          People

            Assignee: Unassigned
            Reporter: Xu Yang (xuyang1706)