We have a very strange LLVM issue on our cluster which has been preventing us from using codegen for quite some time.
The symptoms are simple: execute a query that uses runtime code generation on a DOUBLE column, and the query will not only fail but also crash all the daemons that took part in it. The following behavior is observed when issuing the query from the shell:
From an operations point of view, these nodes die off with no warning or error. With GLOG set to 2, the last message logged was that LLVM was going to be used.
As mentioned in the title, the crashes seem to only be reproducible with doubles.
Here is an example query with LLVM on and off, calculating the AVG of a column.
With GLOG = 2, here is the last observed line before the daemon crashes unexpectedly:
The issue seems to affect only doubles. I can use AVG, MAX, etc. with INTs without any problem, and I think the following queries show it best.
LLVM ON With INTS
LLVM ON With INTS & DOUBLES
1 to 1.0
This bug seems to occur only on this specific hardware with CentOS 5.x. I reformatted this node with CentOS 6.x and the issue effectively disappeared, so it seems to affect only this specific hardware on this family of Red Hat.
In addition, I attempted to reproduce this using systest, virtual machines, etc. on CentOS 5.x and was unsuccessful.
Attached is a GLOG 2 log of an LLVM query failing. I tried to analyze a core dump, but since there is compiled code in there, GDB is unable to look into it due to an "Unsupported JIT Version" error...
With --module_output=/tmp/module.out added, module output is written for ALL LLVM queries except the ones that crash. I'm assuming those queries crash before they get to the point of writing module output.