Flink / FLINK-12470 FLIP39: Flink ML pipeline and ML libs / FLINK-13924

Add summarizer and summary for sparse vector and dense vector.

Details

    Description

      A summarizer is the class that calculates statistics; a summary is the summarizer's result class and defines the methods for reading those statistics. The data may contain both dense and sparse vectors, and the vectors may differ in size, so if a DenseVectorSummarizer visits a sparse vector, it switches to a SparseVectorSummarizer.
      Statistics include vectorSize, count, mean, variance, min, max, standardDeviation, normL1, normL2.
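
One way to picture how the statistics listed above can come out of a single pass is sketched below. This is not the code attached to this issue: the class name `DenseSummarizerSketch`, its fields, and its method names are hypothetical. It also assumes sample variance (dividing by count - 1) and, for brevity, rejects vectors of differing size instead of handling them.

```java
import java.util.Arrays;

/** Hypothetical sketch only; not the summarizer added by this issue. */
class DenseSummarizerSketch {
    private long count;
    private double[] sum;        // per-dimension sum of x
    private double[] squareSum;  // per-dimension sum of x^2
    private double[] absSum;     // per-dimension sum of |x|, i.e. normL1
    private double[] min;
    private double[] max;

    /** Folds one dense vector into the running aggregates. */
    DenseSummarizerSketch visit(double[] v) {
        if (sum == null) {
            sum = new double[v.length];
            squareSum = new double[v.length];
            absSum = new double[v.length];
            min = new double[v.length];
            max = new double[v.length];
            Arrays.fill(min, Double.POSITIVE_INFINITY);
            Arrays.fill(max, Double.NEGATIVE_INFINITY);
        } else if (v.length != sum.length) {
            // Simplifying assumption: this sketch rejects vectors of differing size.
            throw new IllegalArgumentException("vector size mismatch");
        }
        for (int i = 0; i < v.length; i++) {
            sum[i] += v[i];
            squareSum[i] += v[i] * v[i];
            absSum[i] += Math.abs(v[i]);
            min[i] = Math.min(min[i], v[i]);
            max[i] = Math.max(max[i], v[i]);
        }
        count++;
        return this;
    }

    int vectorSize() { return sum == null ? 0 : sum.length; }
    long count() { return count; }
    double mean(int i) { return sum[i] / count; }

    /** Sample variance (assumed); returns 0 when fewer than two vectors were seen. */
    double variance(int i) {
        if (count < 2) {
            return 0.0;
        }
        double v = (squareSum[i] - sum[i] * sum[i] / count) / (count - 1);
        return Math.max(0.0, v); // guard against tiny negative values from rounding
    }

    double standardDeviation(int i) { return Math.sqrt(variance(i)); }
    double min(int i) { return min[i]; }
    double max(int i) { return max[i]; }
    double normL1(int i) { return absSum[i]; }
    double normL2(int i) { return Math.sqrt(squareSum[i]); }

    public static void main(String[] args) {
        DenseSummarizerSketch s = new DenseSummarizerSketch()
            .visit(new double[] {1.0, 2.0})
            .visit(new double[] {3.0, 6.0});
        System.out.println("count=" + s.count() + " mean[0]=" + s.mean(0)
            + " variance[1]=" + s.variance(1));
    }
}
```

Keeping only the count and the per-dimension sums, square sums, absolute sums, minima and maxima means every listed statistic can be derived on demand, and two partial results could be merged by adding their fields.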

      • Add SparseVectorSummarizer which will calculate statistics for sparse vector.
      • Add SparseVectorSummary which can get statistics of sparse vector.
      • Add DenseVectorSummarizer which will calculate statistics for dense vector.
      • Add DenseVectorSummary which can get statistics of dense vector.
      • Add StatisticsUtil which provides utility functions for summarizer and summary.
      • Add VectorSummarizerUtil which provides utility functions for VectorSummarizer.
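
The switch from a dense summarizer to a sparse one described above could look roughly like the following. All names here (`SwitchingSummarizerSketch`, `Summarizer`, `Dense`, `Sparse`) are hypothetical and are not the classes added by this issue. Only the count and per-dimension sum are tracked to keep the sketch short, and two assumptions are made: entries missing from a sparse vector count as zero, and a shorter dense vector is treated as zero-padded.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the dense-to-sparse hand-off; names are illustrative. */
class SwitchingSummarizerSketch {

    /** Each visit returns the summarizer the caller should keep using. */
    interface Summarizer {
        Summarizer visitDense(double[] values);
        Summarizer visitSparse(int[] indices, double[] values);
        double mean(int dim);
    }

    /** Keeps sums only for dimensions that were actually seen. */
    static final class Sparse implements Summarizer {
        long count;
        final Map<Integer, Double> sum = new HashMap<>();

        public Summarizer visitDense(double[] values) {
            for (int i = 0; i < values.length; i++) {
                sum.merge(i, values[i], Double::sum);
            }
            count++;
            return this;
        }

        public Summarizer visitSparse(int[] indices, double[] values) {
            for (int k = 0; k < indices.length; k++) {
                sum.merge(indices[k], values[k], Double::sum);
            }
            count++;
            return this;
        }

        // Assumption: a dimension absent from a vector contributes zero.
        public double mean(int dim) {
            return sum.getOrDefault(dim, 0.0) / count;
        }
    }

    /** Array-backed summarizer that hands its state to Sparse on the first sparse vector. */
    static final class Dense implements Summarizer {
        long count;
        double[] sum = new double[0];

        public Summarizer visitDense(double[] values) {
            if (values.length > sum.length) {
                // Assumption: shorter vectors seen earlier are treated as zero-padded.
                sum = Arrays.copyOf(sum, values.length);
            }
            for (int i = 0; i < values.length; i++) {
                sum[i] += values[i];
            }
            count++;
            return this;
        }

        public Summarizer visitSparse(int[] indices, double[] values) {
            // Copy the accumulated dense state, then continue as a sparse summarizer.
            Sparse sparse = new Sparse();
            sparse.count = count;
            for (int i = 0; i < sum.length; i++) {
                sparse.sum.put(i, sum[i]);
            }
            return sparse.visitSparse(indices, values);
        }

        public double mean(int dim) {
            return dim < sum.length ? sum[dim] / count : 0.0;
        }
    }
}
```

Returning the summarizer from each visit lets the caller write `s = s.visitSparse(...)` and pick up the replacement without knowing which concrete class is in use.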

    People

      Assignee: Unassigned
      Reporter: Xu Yang (xuyang1706)