Description
We initially created the common tokens files (the top 20k tokens by document frequency) from Wikipedia with Lucene 6.x. We should rerun that code with an updated Lucene on the off chance that there are slight changes in tokenization.
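For reference, a minimal sketch of the kind of document-frequency counting involved, using a current Lucene analyzer. This is not the actual generation code; the `StandardAnalyzer`, the field name `"text"`, and the class/method names are assumptions for illustration only, and the real job runs over full Wikipedia dumps with per-language analyzers.

```java
// Sketch only: count document frequency per token with a current Lucene analyzer
// and keep the top-n tokens. Analyzer choice and names are hypothetical.
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

import java.io.IOException;
import java.util.*;
import java.util.stream.Collectors;

public class CommonTokensSketch {

    /** Document frequency: each token counts at most once per document. */
    static Map<String, Integer> documentFrequencies(Analyzer analyzer,
                                                    Iterable<String> docs) throws IOException {
        Map<String, Integer> df = new HashMap<>();
        for (String doc : docs) {
            Set<String> seen = new HashSet<>();
            try (TokenStream ts = analyzer.tokenStream("text", doc)) {
                CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
                ts.reset();
                while (ts.incrementToken()) {
                    seen.add(term.toString());
                }
                ts.end();
            }
            for (String t : seen) {
                df.merge(t, 1, Integer::sum);
            }
        }
        return df;
    }

    /** Keep the top-n tokens by document frequency. */
    static List<String> topTokens(Map<String, Integer> df, int n) {
        return df.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        List<String> docs = Arrays.asList("the quick brown fox", "the lazy dog");
        try (Analyzer analyzer = new StandardAnalyzer()) {
            Map<String, Integer> df = documentFrequencies(analyzer, docs);
            System.out.println(topTokens(df, 20_000));
        }
    }
}
```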
While doing this work, I found a trivial bug in filtering common tokens that we should fix as well.