Description
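Aaron Davidson has a full reproduction. He found a case where the first run of a query returns corrupted results but the second run does not. In the corrupted rows of the output below, the lang column holds the count value instead of a language code. The same corruption does not occur when reading from HDFS a second time...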
sql("SELECT lang, COUNT(*) AS cnt FROM tweetTable GROUP BY lang ORDER BY cnt DESC").collect.foreach(println)
[bg,16636]
[16266,16266]
[16223,16223]
[16161,16161]
[16047,16047]
[lt,11405]
[hu,11380]
[el,10845]
[da,10289]
[fi,10261]
[9897,9897]
[9765,9765]
[9751,9751]