Details
Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Description
vec2cols behaves unexpectedly when the vector column to be split contains vectors with different numbers of elements. E.g.
Input table:
select * from test order by id;
id | t
---+--------
1 | {a,b}
2 | {c,d}
3 | {e,f}
4 | {g,h,i}
5 | {j}
(5 rows)
select madlib.vec2cols('test','test_out_5','t',array['c1','c2','c3'],'id');
ERROR: plpy.Error: vec2cols: Mismatch between size of vector_col and number of cols in feature_names.
CONTEXT: Traceback (most recent call last):
PL/Python function "vec2cols", line 23, in <module>
return vec2cols_obj.vec2cols(**globals())
PL/Python function "vec2cols", line 149, in vec2cols
PL/Python function "vec2cols", line 112, in get_names_for_split_output_cols
PL/Python function "vec2cols", line 77, in _assert
PL/Python function "vec2cols"
select madlib.vec2cols('test','test_out_5','t',array['c1','c2'],'id');
vec2cols
----------
(1 row)
select * from test_out_5 order by id;
id | c1 | c2
---+----+----
1 | a | b
2 | c | d
3 | e | f
4 | g | h
5 | j |
(5 rows)
select madlib.vec2cols('test','test_out_6','t',array['c1'],'id');
ERROR: plpy.Error: vec2cols: Mismatch between size of vector_col and number of cols in feature_names.
CONTEXT: Traceback (most recent call last):
PL/Python function "vec2cols", line 23, in <module>
return vec2cols_obj.vec2cols(**globals())
PL/Python function "vec2cols", line 149, in vec2cols
PL/Python function "vec2cols", line 112, in get_names_for_split_output_cols
PL/Python function "vec2cols", line 77, in _assert
PL/Python function "vec2cols"
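The three calls above are consistent with a size check that looks at only one row's array length (2 here) instead of every row. That is a guess from the output, not something confirmed from the MADlib source. A minimal Python model of that hypothesis follows; `split_rows` and its use of the first row are hypothetical and only mimic the observed results:

```python
# Hypothetical model of the behavior reported above; NOT MADlib's actual code.
from typing import List, Optional, Sequence


def split_rows(
    rows: Sequence[Sequence[str]], feature_names: Sequence[str]
) -> List[List[Optional[str]]]:
    """Split each array into len(feature_names) columns.

    Assumption: the size check compares feature_names against the length of a
    single sampled row (here, the first one), not against every row.
    """
    sampled_len = len(rows[0])
    if len(feature_names) != sampled_len:
        raise ValueError(
            "vec2cols: Mismatch between size of vector_col "
            "and number of cols in feature_names."
        )
    n = len(feature_names)
    # Longer arrays are silently truncated; shorter arrays are padded with
    # None, which stands in for SQL NULL.
    return [list(r[:n]) + [None] * (n - len(r)) for r in rows]


rows = [["a", "b"], ["c", "d"], ["e", "f"], ["g", "h", "i"], ["j"]]
result = split_rows(rows, ["c1", "c2"])
```

Under this model, two feature names succeed (dropping `i` from row 4 and padding row 5), while one or three names raise the mismatch error, matching the session above.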
Update:
There are several decisions to be made about supporting arrays of different lengths:
- If we size the output by the longest array in vector_col, what do we do if the user's feature_names has a different number of elements?
- What is the performance cost of scanning vector_col to find the longest array?
- How will we handle default feature names: will we create one feature name for every element of the longest array?
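One possible answer to the questions above, sketched in plain Python rather than PL/Python: size the output by the longest array, reject a feature_names list of any other length, and pad shorter arrays with NULL. The helper names and the `f1, f2, ...` default naming are assumptions made for illustration, not a decision recorded in this ticket. In SQL, finding the maximum would presumably be a single aggregate such as `max(array_upper(t, 1))`, which costs one extra pass over the table.

```python
# A sketch under stated assumptions; helper names and defaults are hypothetical.
from typing import List, Optional, Sequence


def max_vector_length(rows: Sequence[Sequence[str]]) -> int:
    """Length of the longest array; requires one full pass over the rows."""
    return max((len(r) for r in rows), default=0)


def resolve_feature_names(
    rows: Sequence[Sequence[str]],
    feature_names: Optional[Sequence[str]] = None,
    prefix: str = "f",  # assumed default prefix
) -> List[str]:
    """Return one output column name per element of the longest array."""
    width = max_vector_length(rows)
    if feature_names is None:
        # Default names: one per element of the longest array.
        return [f"{prefix}{i}" for i in range(1, width + 1)]
    if len(feature_names) != width:
        raise ValueError(
            f"feature_names has {len(feature_names)} entries, "
            f"but the longest vector has {width} elements"
        )
    return list(feature_names)


def split_padded(
    rows: Sequence[Sequence[str]], names: Sequence[str]
) -> List[List[Optional[str]]]:
    """Split every array into len(names) columns, padding with None (NULL)."""
    n = len(names)
    return [list(r[:n]) + [None] * (n - len(r)) for r in rows]


rows = [["a", "b"], ["c", "d"], ["e", "f"], ["g", "h", "i"], ["j"]]
names = resolve_feature_names(rows, ["c1", "c2", "c3"])
table = split_padded(rows, names)
```

With this choice, the three-name call from the description would succeed and the two-name call would be rejected, so no element is ever dropped silently. Whether to raise or truncate on a mismatch remains the open design question.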