Spark returns duplicate rows when a `path` SerDe property is present on a Parquet table.
Confirmed versions affected: Spark 2.2, Spark 2.3, Spark 2.4.
Confirmed unaffected versions: Spark 2.1 and earlier (tested with Spark 1.6 at least).
(Everything works correctly at this point. Now exit the session and, in Hive for example, add a `path` SerDe property to the table.)
After this, the table's LOCATION and its `path` SerDe property point to the same directory.
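To check whether a table is in this state, one can compare the two values after light normalization. This is a minimal sketch, assuming the LOCATION and `path` values have already been fetched as strings (for example from `DESCRIBE FORMATTED` output). `normalize_location` and `same_location` are hypothetical helpers, and the normalization rules (case-insensitive scheme and authority, collapsed separators, no trailing slash) are assumptions, not the rules Spark or Hive apply.

```python
import posixpath
from urllib.parse import urlsplit


def normalize_location(uri: str) -> tuple:
    """Normalize a table location URI for comparison (hypothetical rules)."""
    parts = urlsplit(uri)
    # Assumption: scheme and authority (host:port) compare case-insensitively.
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()
    # Collapse redundant separators and drop any trailing slash.
    path = posixpath.normpath(parts.path) if parts.path else "/"
    return (scheme, netloc, path)


def same_location(location: str, path_prop: str) -> bool:
    """True if LOCATION and the `path` SerDe property name the same directory."""
    return normalize_location(location) == normalize_location(path_prop)


print(same_location("hdfs://nn/warehouse/db.db/t", "hdfs://nn/warehouse/db.db/t/"))
```

Note that this sketch treats an unqualified path such as `/warehouse/db.db/t` as different from `hdfs://nn/warehouse/db.db/t`, so it errs on the side of reporting a mismatch rather than guessing the default file system.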
Now a count in Spark returns two records instead of one.
Also note that the presence of the `path` SerDe property makes the table location
show up twice.
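The duplicated location is consistent with the doubled count: if both the table LOCATION and the `path` value are used as input roots, every data file is listed, and therefore read, twice. The following is a simplified Python model of that effect under this assumption, not Spark's actual file-listing code. The in-memory file system, `list_input_files`, `count_rows`, and the `dedup` flag are all hypothetical; the model only shows that deduplicating the roots would restore the expected count.

```python
from typing import Dict, List

# Hypothetical in-memory "file system": directory -> {data file: row count}.
FS: Dict[str, Dict[str, int]] = {
    "/warehouse/db.db/t": {"part-00000.parquet": 1},
}


def list_input_files(roots: List[str], dedup: bool) -> List[str]:
    """Return every file under each root; optionally deduplicate the roots first."""
    roots = [r.rstrip("/") for r in roots]
    if dedup:
        # Keep the first occurrence of each root, preserving order.
        roots = list(dict.fromkeys(roots))
    files = []
    for root in roots:
        for name in FS.get(root, {}):
            files.append(f"{root}/{name}")
    return files


def count_rows(files: List[str]) -> int:
    """Sum the row counts of the listed files (a file listed twice counts twice)."""
    total = 0
    for f in files:
        root, name = f.rsplit("/", 1)
        total += FS[root][name]
    return total


location = "/warehouse/db.db/t"
path_prop = "/warehouse/db.db/t"  # same directory as LOCATION
roots = [location, path_prop]

print(count_rows(list_input_files(roots, dedup=False)))  # 2: the file is read twice
print(count_rows(list_input_files(roots, dedup=True)))   # 1
```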
We have applications that create Parquet tables in Hive with a `path` SerDe property,
and this duplicates data in Spark query results.
Hive, Impala, etc., as well as Spark 2.1 and earlier, read such tables correctly; Spark 2.2 and later releases do not.