Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
Hi MADlib developers,
We have been trying to use MADlib on Greenplum to run an in-database linear regression on a large dataset (789,626,243 rows, segmented into ~475,000 groups). However, after the following SQL statement has run for a little more than ten minutes, it fails with the error message shown below.
SQL statement:
SELECT madlib.linregr_train(
'xinos_plus_case_dlinterference_v2.temp_neighbor_pair_cqi_prb_nonull',
'xinos_plus_case_dlinterference_v2.taipei_lm_result_temp',
'average_cqi', 'array[1, prb_utilization]',
'main_lnbts_id,main_lncel_id,lnbts_id,lncel_id');
Error message:
ERROR: plpy.SPIError: Function "madlib.linregr_merge_states(madlib.bytea8,madlib.bytea8)": ByteString improperly aligned for alignment request in seek(). (UDF_impl.hpp:210) (seg2 59-120-199-107.HINET-IP.hinet.net:50002 pid=9137) (plpython.c:4648)
If we reduce the input data to 269,837,688 rows, the same SQL statement completes successfully.
We are not sure whether this is a bug or a problem with how we are using the MADlib linear regression function, and we would greatly appreciate any pointers.
We are happy to provide more information about the input data (e.g. the data schema) for further investigation if needed.
Thank you very much for looking into this issue.
David