Currently `CountVectorizer.scala` requires the vocabulary to be non-empty.
This is not a necessary constraint: the model should also work when the vocabulary is empty.
Relaxing this constraint also makes it possible to run the model over empty datasets. `HashingTF` already works in such scenarios; `CountVectorizer` does not.
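
To illustrate why an empty vocabulary is a well-defined case, here is a minimal, Spark-free sketch of count vectorization. It is not the Spark implementation: `EmptyVocabSketch` and `countVectorize` are hypothetical names introduced only for this example.

```scala
// Hypothetical sketch for illustration; these names are not Spark APIs.
object EmptyVocabSketch {

  /** Counts how often each vocabulary term occurs in `tokens`, in vocabulary order. */
  def countVectorize(tokens: Seq[String], vocabulary: Array[String]): Array[Int] = {
    // Map each vocabulary term to its position in the output vector.
    val index = vocabulary.zipWithIndex.toMap
    val counts = new Array[Int](vocabulary.length)
    // Tokens that are not in the vocabulary are ignored.
    tokens.foreach(token => index.get(token).foreach(i => counts(i) += 1))
    counts
  }
}
```

Under these assumptions, `EmptyVocabSketch.countVectorize(Seq("a", "b"), Array.empty[String])` simply returns a zero-length array rather than failing, which is the behaviour this change asks of the model.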
spark-user discussion reference: http://apache-spark-user-list.1001560.n3.nabble.com/Ability-to-have-CountVectorizerModel-vocab-as-empty-td38396.html