With Lucene's default codec, when writing dimensional points, we only give BKDWriter 16 MB of heap to use for sorting, regardless of how large IW's indexing buffer is. A custom codec can change this, but that's a little steep.
I've been testing indexing performance on a points-heavy dataset (1.2 billion taxi rides from http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml ), indexing with a 1 GB IW buffer. The small 16 MB heap limit causes clear performance problems: flushing the large segments forces BKDWriter to switch to offline sorting, which causes the DWPTs to take too long to flush. They then fall behind, and Lucene does a hard stall on incoming indexing threads until they catch up.
Robert Muir had a simple idea to let IW pass the allowed temp heap usage to PointsWriter.writeField.
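To make the effect of the budget concrete, here is a rough, self-contained sketch of the arithmetic. It is not Lucene's actual BKDWriter code: the class and method names are hypothetical, and the 12 bytes per point (two 4-byte dimensions plus a 4-byte docID) and the 20 million point segment are assumptions chosen only for illustration. It shows why a fixed 16 MB budget would push a large flushed segment to offline sorting, while a budget tied to a 1 GB buffer would not.

```java
// Hypothetical model of the heap-vs-offline sort decision; NOT Lucene's real code.
public class HeapBudgetSketch {

    // How many points fit in the sort budget, assuming a fixed per-point size.
    static long maxPointsInHeap(double heapMB, int bytesPerPoint) {
        long budgetBytes = (long) (heapMB * 1024 * 1024);
        return budgetBytes / bytesPerPoint;
    }

    // True if a segment with this many points would exceed the in-heap budget
    // and therefore (in this simplified model) fall back to offline sorting.
    static boolean needsOfflineSort(long pointCount, double heapMB, int bytesPerPoint) {
        return pointCount > maxPointsInHeap(heapMB, bytesPerPoint);
    }

    public static void main(String[] args) {
        int bytesPerPoint = 12;            // assumption: 2 dims * 4 bytes + 4-byte docID
        long segmentPoints = 20_000_000L;  // assumption: points in one flushed segment

        // Fixed 16 MB budget, as with the default codec described above.
        System.out.println("16 MB budget, offline sort needed: "
            + needsOfflineSort(segmentPoints, 16, bytesPerPoint));

        // Budget derived from a 1 GB IW buffer, as the proposal would allow.
        System.out.println("1024 MB budget, offline sort needed: "
            + needsOfflineSort(segmentPoints, 1024, bytesPerPoint));
    }
}
```

The real writer has additional overhead per point and per dimension, so the actual thresholds will differ. The sketch only shows that the threshold scales with whatever budget IW hands down.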