Aaron Davidson has a full reproduction. He has found a case where the first run returns corrupted results but the second run does not. The same does not occur when reading from HDFS a second time...
sql("SELECT lang, COUNT(*) AS cnt FROM tweetTable GROUP BY lang ORDER BY cnt DESC").collect.foreach(println)