Uploaded image for project: 'Apache MADlib'
  1. Apache MADlib
  2. MADLIB-819

LDA: better error messaging for invalid test data

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      Related to MADLIB-658

      When the datatypes of docid, wordid in test table for LDA predict are not INT4 then the lda_predict function returns an error. The message returned, however, is confusing and unclear. This needs to be improved to reflect the actual problem.

      • Input to reproduce:
        -- vocabulary table
        CREATE TABLE my_vocab(wordid INT4, word TEXT);
        INSERT INTO my_vocab VALUES
            (0, 'code'), (1, 'data'), (2, 'graph'), (3, 'image'),
            (4, 'input'), (5, 'layer'), (6, 'learner'), (7, 'loss'),
            (8, 'model'), (9, 'network'), (10, 'neuron'), (11, 'object'),
            (12, 'output'), (13, 'rate'), (14, 'set'), (15, 'signal'),
            (16, 'sparse'), (17, 'spatial'), (18, 'system'), (19, 'training');
        
        -- training data table
        CREATE TABLE my_training
        (
            docid INT4,
            wordid INT4,
            count INT4
        );
        INSERT INTO my_training VALUES
            (0, 0, 2),(0, 3, 2),(0, 5, 1),(0, 7, 1),(0, 8, 1),(0, 9, 1),(0, 11, 1),
            (0, 13, 1), (1, 0, 1),(1, 3, 1),(1, 4, 1),(1, 5, 1),(1, 6, 1),(1, 7, 1),
            (1, 10, 1),(1, 14, 1),(1, 17, 1),(1, 18, 1), (2, 4, 2),(2, 5, 1),(2, 6, 2),
            (2, 12, 1),(2, 13, 1),(2, 15, 1),(2, 18, 2), (3, 0, 1),(3, 1, 2),(3, 12, 3),
            (3, 16, 1),(3, 17, 2),(3, 19, 1), (4, 1, 1),(4, 2, 1),(4, 3, 1),(4, 5, 1),
            (4, 6, 1),(4, 10, 1),(4, 11, 1),(4, 14, 1),(4, 18, 1),(4, 19, 1), (5, 0, 1),
            (5, 2, 1),(5, 5, 1),(5, 7, 1),(5, 10, 1),(5, 12, 1),(5, 16, 1),(5, 18, 1),
            (5, 19, 2),(6, 1, 1),(6, 3, 1),(6, 12, 2),(6, 13, 1),(6, 14, 2),(6, 15, 1),
            (6, 16, 1),(6, 17, 1), (7, 0, 1),(7, 2, 1),(7, 4, 1),(7, 5, 1),(7, 7, 2),
            (7, 8, 1),(7, 11, 1),(7, 14, 1),(7, 16, 1), (8, 2, 1),(8, 4, 4),(8, 6, 2),
            (8, 11, 1),(8, 15, 1),(8, 18, 1), (9, 0, 1),(9, 1, 1),(9, 4, 1),(9, 9, 2),
            (9, 12, 2),(9, 15, 1),(9, 18, 1),(9, 19, 1);
        
        -- testing data
        DROP TABLE IF EXISTS my_testing;
        CREATE TABLE my_testing
        (
            docid float8,  -- same error if this is TEXT
            wordid float8, -- same error if this is TEXT
            count INT4
        );
        INSERT INTO my_testing VALUES
            (0, 0, 2),(0, 8, 1),(0, 9, 1),(0, 10, 1),(0, 12, 1),(0, 15, 2),(0, 18, 1),
            (0, 19, 1), (1, 0, 1),(1, 2, 1),(1, 5, 1),(1, 7, 1),(1, 12, 2),(1, 13, 1),
            (1, 16, 1),(1, 17, 1),(1, 18, 1), (2, 0, 1),(2, 1, 1),(2, 2, 1),(2, 3, 1),
            (2, 4, 1),(2, 5, 1),(2, 6, 1),(2, 12, 1),(2, 14, 1),(2, 18, 1), (3, 2, 2),
            (3, 6, 2),(3, 7, 1),(3, 9, 1),(3, 11, 2),(3, 14, 1),(3, 15, 1), (4, 1, 1),
            (4, 2, 2),(4, 3, 1),(4, 5, 2),(4, 6, 1),(4, 11, 1),(4, 18, 2);
        
        SELECT madlib.lda_train( 'my_training',
                                 'my_model',
                                 'my_outdata',
                                 20,
                                 5,
                                 10,
                                 5,
                                 0.01
                               );
        
        SELECT madlib.lda_predict( 'my_testing',
                                   'my_model',
                                   'my_pred'
                                 );
        
      • Ouptut error:
        # ERROR:  plpy.Error: the my_testing must exist and should have docid, wordid, and count (plpython.c:4648)
        DETAIL:  columns
        CONTEXT:  Traceback (most recent call last):
          PL/Python function "lda_predict", line 22, in <module>
            lda.lda_predict(schema_madlib, data_table, model_table, output_table)
          PL/Python function "lda_predict", line 453, in lda_predict
          PL/Python function "lda_predict", line 938, in __validate_data_table
          PL/Python function "lda_predict", line 929, in __check_data_table
          PL/Python function "lda_predict", line 894, in __assert
        PL/Python function "lda_predict"
        madlibtest=# 
        

        Attachments

          Activity

            People

            • Assignee:
              riyer Rahul Iyer
              Reporter:
              riyer Rahul Iyer
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated: