Description
Currently, SerializeRowBlock operates column by column, which means we iterate over the selection bitmap once per column. This code isn't particularly well optimized: in TPC-H Q6, about 10% of CPU time is spent in BitmapFindFirst. We should look at alternative implementations that amortize the bitmap iteration cost across all of the columns, and micro-optimize the iteration itself.
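
One possible direction, as a rough sketch rather than a proposal for the actual patch: walk the selection bitmap once, materialize the indexes of the selected rows, and reuse that index list for every column. The names SelectedRowIndexes and CopySelectedCells are hypothetical and not existing functions in the codebase. The sketch also assumes the bitmap is stored as 64-bit words with row 0 in the least significant bit, fixed-width cells, and a GCC/Clang compiler (for __builtin_ctzll); the real selection vector layout may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: scan the selection bitmap once and return the indexes
// of all selected rows. Assumes 64-bit words, row 0 in the least significant bit.
std::vector<uint32_t> SelectedRowIndexes(const std::vector<uint64_t>& bitmap,
                                         size_t num_rows) {
  std::vector<uint32_t> rows;
  for (size_t w = 0; w < bitmap.size(); ++w) {
    uint64_t word = bitmap[w];
    while (word != 0) {
      // Index of the lowest set bit (GCC/Clang builtin; word is non-zero here).
      const int bit = __builtin_ctzll(word);
      const size_t row = w * 64 + static_cast<size_t>(bit);
      if (row >= num_rows) return rows;  // ignore padding bits past the block
      rows.push_back(static_cast<uint32_t>(row));
      word &= word - 1;  // clear the lowest set bit
    }
  }
  return rows;
}

// Hypothetical helper: copy the selected fixed-width cells of one column into
// dst, using the precomputed index list instead of re-scanning the bitmap.
void CopySelectedCells(const uint8_t* src, size_t cell_size,
                       const std::vector<uint32_t>& rows, uint8_t* dst) {
  for (uint32_t r : rows) {
    std::memcpy(dst, src + r * cell_size, cell_size);
    dst += cell_size;
  }
}
```

With this shape, the bitmap is scanned once per row block, and each column only pays for a linear pass over the index list. Whether that beats a tuned per-column scan for highly selective or nearly full bitmaps would need to be measured.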