SystemDS / SYSTEMDS-1379

Investigate script metadata to simplify MLContext script interaction


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: Not Applicable
    • Component/s: Algorithms, APIs
    • Labels: None

    Description

      Currently many scripts contain usage comments such as the following:

      # THIS SCRIPT COMPUTES AN APPROXIMATE FACTORIZATION OF A LOW-RANK MATRIX X INTO TWO MATRICES U AND V
      # USING ALTERNATING-LEAST-SQUARES (ALS) ALGORITHM WITH CONJUGATE GRADIENT 
      # MATRICES U AND V ARE COMPUTED BY MINIMIZING A LOSS FUNCTION (WITH REGULARIZATION)
      #
      # INPUT   PARAMETERS:
      # ---------------------------------------------------------------------------------------------
      # NAME    TYPE     DEFAULT  MEANING
      # ---------------------------------------------------------------------------------------------
      # X       String   ---      Location to read the input matrix X to be factorized
      # U       String   ---      Location to write the factor matrix U
      # V       String   ---      Location to write the factor matrix V
      # rank    Int      10       Rank of the factorization
      # reg     String   "L2"     Regularization:
      #                           "L2" = L2 regularization;
      #                           "wL2" = weighted L2 regularization
      # lambda  Double   0.000001 Regularization parameter, no regularization if 0.0
      # maxi    Int      50       Maximum number of iterations
      # check   Boolean  FALSE    Check for convergence after every iteration, i.e., updating U and V once
      # thr     Double   0.0001   Assuming check is set to TRUE, the algorithm stops and convergence is declared 
      #                           if the decrease in loss in any two consecutive iterations falls below this threshold; 
      #                           if check is FALSE thr is ignored
      # fmt     String   "text"   The output format of the factor matrices U and V, such as "text" or "csv"
      # ---------------------------------------------------------------------------------------------
      # OUTPUT: 
      # 1- An m x r matrix U, where r is the factorization rank 
      # 2- An r x n matrix V
      #
      # HOW TO INVOKE THIS SCRIPT - EXAMPLE:
      # hadoop jar SystemML.jar -f ALS-CG.dml -nvargs X=INPUT_DIR/X U=OUTPUT_DIR/U V=OUTPUT_DIR/V rank=10 reg="L2" lambda=0.0001 fmt=csv
      

      Comments such as these are hard to access from an interactive programming environment such as the Spark Shell. If the same information were provided in a parseable format, such as JSON or XML, it could be parsed and exposed programmatically, for example through the MLContext API in the Spark Shell.
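To illustrate what a parseable form of this metadata could look like, the following Python sketch converts the parameter table of a usage header, in the column layout shown above, into JSON. It is an illustration under stated assumptions only: the `HEADER` excerpt, the `parse_params` and `parse_default` functions, and the JSON field names (`name`, `type`, `default`, `required`, `meaning`) are hypothetical and are not part of SystemDS or the MLContext API. It also assumes every parameter row uses one of the types String, Int, Double, or Boolean, that `---` marks a required parameter, and that continuation lines are indented into the MEANING column.

```python
import json
import re

# Hypothetical excerpt of a DML usage header, in the column layout of ALS-CG.dml.
HEADER = """\
# NAME    TYPE     DEFAULT  MEANING
# ---------------------------------------------------------------------------------------------
# X       String   ---      Location to read the input matrix X to be factorized
# rank    Int      10       Rank of the factorization
# reg     String   "L2"     Regularization:
#                           "L2" = L2 regularization;
#                           "wL2" = weighted L2 regularization
# lambda  Double   0.000001 Regularization parameter, no regularization if 0.0
# check   Boolean  FALSE    Check for convergence after every iteration
# ---------------------------------------------------------------------------------------------
"""

# A parameter row: "# NAME TYPE DEFAULT MEANING" (assumed set of type names).
ROW = re.compile(r"^#\s(\S+)\s+(String|Int|Double|Boolean)\s+(\S+)\s+(.*)$")
# A continuation row: "#" followed by two or more spaces, then more description text.
CONT = re.compile(r"^#\s{2,}(\S.*)$")


def parse_default(typ, raw):
    """Convert the DEFAULT column into a typed value (None if required)."""
    if raw == "---":
        return None  # assumed: "---" means no default, i.e. the parameter is required
    if typ == "Int":
        return int(raw)
    if typ == "Double":
        return float(raw)
    if typ == "Boolean":
        return raw.upper() == "TRUE"
    return raw.strip('"')  # String defaults are written in double quotes


def parse_params(header):
    """Turn the commented parameter table into a list of dictionaries."""
    params = []
    for line in header.splitlines():
        line = line.rstrip()
        m = ROW.match(line)
        if m:
            name, typ, raw, meaning = m.groups()
            params.append({
                "name": name,
                "type": typ,
                "default": parse_default(typ, raw),
                "required": raw == "---",
                "meaning": meaning,
            })
            continue
        c = CONT.match(line)
        if c and params:
            # Continuation line: append it to the previous parameter's description.
            params[-1]["meaning"] += " " + c.group(1)
    return params


if __name__ == "__main__":
    print(json.dumps({"script": "ALS-CG.dml", "parameters": parse_params(HEADER)}, indent=2))
```

Scraping column-aligned comments is fragile, since a single misaligned edit breaks the parse; a sturdier design would keep the metadata in a separate JSON or XML file next to each script. The sketch only shows that the existing tables already carry enough structure to be turned into such a file.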


          People

            Assignee: Jon Deron Eriksson
            Reporter: Jon Deron Eriksson
            Votes: 0
            Watchers: 1
