Apache MADlib / MADLIB-1270

Unexpected behavior in vec2cols function

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: v1.15.1
    • Component/s: Module: Utilities
    • Labels: None

    Description

      vec2cols behaves unexpectedly when the vector column to be split contains vectors with different numbers of elements. For example:

      Input table:

      select * from test order by id;
       id |    t
      ----+---------
        1 | {a,b}
        2 | {c,d}
        3 | {e,f}
        4 | {g,h,i}
        5 | {j}
      (5 rows)

      select madlib.vec2cols('test','test_out_5','t',array['c1','c2','c3'],'id');
      ERROR: plpy.Error: vec2cols: Mismatch between size of vector_col and number of cols in feature_names.
      CONTEXT: Traceback (most recent call last):
      PL/Python function "vec2cols", line 23, in <module>
      return vec2cols_obj.vec2cols(**globals())
      PL/Python function "vec2cols", line 149, in vec2cols
      PL/Python function "vec2cols", line 112, in get_names_for_split_output_cols
      PL/Python function "vec2cols", line 77, in _assert
      PL/Python function "vec2cols"

       

      select madlib.vec2cols('test','test_out_5','t',array['c1','c2'],'id');
      vec2cols
      ----------

      (1 row)

      select * from test_out_5 order by id;
       id | c1 | c2
      ----+----+----
        1 | a  | b
        2 | c  | d
        3 | e  | f
        4 | g  | h
        5 | j  |
      (5 rows)

      select madlib.vec2cols('test','test_out_6','t',array['c1'],'id');

      ERROR: plpy.Error: vec2cols: Mismatch between size of vector_col and number of cols in feature_names.
      CONTEXT: Traceback (most recent call last):
      PL/Python function "vec2cols", line 23, in <module>
      return vec2cols_obj.vec2cols(**globals())
      PL/Python function "vec2cols", line 149, in vec2cols
      PL/Python function "vec2cols", line 112, in get_names_for_split_output_cols
      PL/Python function "vec2cols", line 77, in _assert
      PL/Python function "vec2cols"
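
The three calls above can be summarized with a small Python model. This is an illustrative sketch only, not MADlib's implementation: the function name `split_like_observed` and the `checked_len` parameter are hypothetical, and the idea that the size check compares against a single length of 2 is an assumption chosen to match the reported results. The source does not say which row's length the check actually uses.

```python
# Illustrative model of the reported behavior. NOT MADlib's code.
def split_like_observed(rows, feature_names, checked_len):
    """Split each vector into len(feature_names) columns.

    rows          -- list of (id, vector) pairs
    feature_names -- requested output column names
    checked_len   -- ASSUMPTION: the one vector length the size check uses
    """
    if len(feature_names) != checked_len:
        raise ValueError(
            "vec2cols: Mismatch between size of vector_col "
            "and number of cols in feature_names."
        )
    width = len(feature_names)
    out = []
    for row_id, vec in rows:
        # Elements beyond `width` are silently dropped (row 4 loses 'i');
        # missing elements become None, i.e. NULL (row 5 has no c2).
        cells = [vec[i] if i < len(vec) else None for i in range(width)]
        out.append((row_id, *cells))
    return out


test_rows = [
    (1, ["a", "b"]),
    (2, ["c", "d"]),
    (3, ["e", "f"]),
    (4, ["g", "h", "i"]),
    (5, ["j"]),
]

# Matches the successful call with array['c1','c2'].
result = split_like_observed(test_rows, ["c1", "c2"], checked_len=2)
```

Under this model, passing one or three feature names raises the mismatch error, while two names succeeds but silently truncates row 4 and pads row 5 with NULL. That is the inconsistency this issue reports.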

       

      ----- Update -----

      A few decisions need to be made about supporting arrays of different lengths:
      - If we size the output by the longest array in vector_col, what do we do when the user's feature_names does not have the same number of elements?
      - What is the performance cost of scanning vector_col to find the longest array?
      - How will we handle default feature names: will we create a feature name for every element of the longest array?
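
To make the questions above concrete, here is one possible policy sketched in Python. It is a hypothetical option for discussion, not the fix that was adopted: the helper names `resolve_feature_names` and `pad_to`, the default names `f1, f2, ...`, and the choice to reject a feature_names list whose length differs from the longest array are all assumptions.

```python
# Hypothetical policy sketch: size the output by the longest vector.
def resolve_feature_names(vectors, feature_names=None):
    """Return the output column names for a column of ragged vectors."""
    # Finding the maximum requires one full pass over the column,
    # which is the performance cost raised in the second question.
    max_len = max((len(v) for v in vectors), default=0)
    if feature_names is None:
        # Assumed default naming: one name per element of the longest vector.
        return ["f{}".format(i) for i in range(1, max_len + 1)]
    if len(feature_names) != max_len:
        # Assumed choice: fail loudly rather than truncate or pad silently.
        raise ValueError(
            "feature_names has {} entries but the longest vector has {}".format(
                len(feature_names), max_len
            )
        )
    return list(feature_names)


def pad_to(vec, width):
    """Pad a short vector with None (NULL) up to `width` elements."""
    return list(vec) + [None] * (width - len(vec))


vectors = [["a", "b"], ["c", "d"], ["e", "f"], ["g", "h", "i"], ["j"]]
names = resolve_feature_names(vectors)
padded = [pad_to(v, len(names)) for v in vectors]
```

With the example table, this policy yields three output columns, keeps the third element of row 4, and fills the missing cells of the shorter rows with NULL. A different policy (for example, truncating to the user's feature_names) would be equally easy to express; the point is only that the rule should be applied uniformly to every row.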

People

    • Assignee: Unassigned
    • Reporter: Rashmi Raghu