The methods WriteDictionaryPage(), CheckDictionarySizeLimit(), WriteValues(), and WriteValuesSpaced() in TypedColumnWriterImpl (cpp/src/parquet/column_writer.cc) each dynamic_cast the current_dict_ object to either a DictEncoder or a ValueEncoderType pointer. When WriteBatch() is called with a large number of values, the cost of the cast is spread over the whole batch and is acceptable, but when writing batches of 1 (as the stream API does), these dynamic_casts can consume a great deal of CPU. Profiling with gperftools against code I wrote to do a log-structured merge of several Parquet files, I measured the dynamic_casts taking as much as 25% of execution time.
By modifying TypedColumnWriterImpl to save downcasted observer (non-owning) pointers of the appropriate types, I was able to cut my execution time from 32 to 24 seconds, which is consistent with the gperftools results. I've attached a patch to show what I did.
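
For readers without the patch at hand, the idea can be sketched roughly as follows. This is not the attached patch and not Arrow's actual code: Encoder, ValueEncoder, DictEncoder, Int32DictEncoder, ColumnWriterSketch, and all member names are hypothetical stand-ins, and the sketch assumes the encoder is installed once at construction. The point is only that the dynamic_casts run once when the encoder is set, and the per-value methods reuse the cached pointers.

```cpp
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the encoder hierarchy; not Arrow's real classes.
struct Encoder {
  virtual ~Encoder() = default;
};

struct ValueEncoder : virtual Encoder {
  virtual void Put(const int32_t* values, int num_values) = 0;
};

struct DictEncoder : virtual Encoder {
  virtual int dict_encoded_size() const = 0;
};

// A toy encoder that implements both interfaces and just stores the values.
class Int32DictEncoder : public ValueEncoder, public DictEncoder {
 public:
  void Put(const int32_t* values, int num_values) override {
    values_.insert(values_.end(), values, values + num_values);
  }
  int dict_encoded_size() const override {
    return static_cast<int>(values_.size() * sizeof(int32_t));
  }

 private:
  std::vector<int32_t> values_;
};

// Hypothetical writer: owns the encoder and caches the downcasted pointers.
class ColumnWriterSketch {
 public:
  explicit ColumnWriterSketch(std::unique_ptr<Encoder> encoder)
      : current_encoder_(std::move(encoder)),
        // The dynamic_casts happen once here, not on every write.
        current_value_encoder_(
            dynamic_cast<ValueEncoder*>(current_encoder_.get())),
        current_dict_encoder_(
            dynamic_cast<DictEncoder*>(current_encoder_.get())) {}

  // Hot path: no cast, only a virtual call through the cached pointer.
  // Returns false if the encoder does not implement ValueEncoder.
  bool WriteValues(const int32_t* values, int num_values) {
    if (current_value_encoder_ == nullptr) return false;
    current_value_encoder_->Put(values, num_values);
    return true;
  }

  // Returns 0 when the current encoder is not a dictionary encoder.
  int DictionarySize() const {
    return current_dict_encoder_ != nullptr
               ? current_dict_encoder_->dict_encoded_size()
               : 0;
  }

 private:
  std::unique_ptr<Encoder> current_encoder_;  // owning pointer
  ValueEncoder* current_value_encoder_;       // observer, may be null
  DictEncoder* current_dict_encoder_;         // observer, may be null
};
```

If the encoder can be replaced during the writer's lifetime (for example on fallback from dictionary encoding), the cached pointers would have to be refreshed at the same place the owning pointer is reassigned; otherwise they would dangle.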