Description
When a left outer join is performed on a non-empty JavaPairRDD against a JavaPairRDD created from the emptyRDD() method, the resulting RDD is empty, even though a left outer join should preserve every row of the left side. In the following unit test the last assert fails.
import static org.assertj.core.api.Assertions.assertThat;

import java.util.Collections;
import lombok.val;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.junit.Test;
import scala.Tuple2;

public class SparkTest {

    @Test
    public void joinEmptyRDDTest() {
        val sparkConf = new SparkConf().setAppName("test").setMaster("local");
        try (val sparkContext = new JavaSparkContext(sparkConf)) {
            val oneRdd = sparkContext.parallelize(Collections.singletonList("one"));
            val twoRdd = sparkContext.parallelize(Collections.singletonList("two"));
            val threeRdd = sparkContext.emptyRDD();

            val onePair = oneRdd.mapToPair(t -> new Tuple2<Integer, String>(1, t));
            val twoPair = twoRdd.groupBy(t -> 1);
            val threePair = threeRdd.groupBy(t -> 1);

            assertThat(onePair.leftOuterJoin(twoPair).collect()).isNotEmpty();
            assertThat(onePair.leftOuterJoin(threePair).collect()).isNotEmpty();
        }
    }
}
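To make the expected behavior concrete, here is a minimal plain-Java sketch (no Spark, hypothetical helper names) of left outer join semantics: every left-side key must appear in the output, with the right-side value wrapped in an Optional that is empty when the right side has no match. This is what the failing assert above expects from leftOuterJoin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Sketch of left outer join semantics over plain Maps.
// An absent right value becomes Optional.empty(); the left row is never dropped.
public class LeftOuterJoinSemantics {

    static <K, V, W> List<Map.Entry<K, Map.Entry<V, Optional<W>>>> leftOuterJoin(
            Map<K, V> left, Map<K, W> right) {
        List<Map.Entry<K, Map.Entry<V, Optional<W>>>> out = new ArrayList<>();
        for (Map.Entry<K, V> e : left.entrySet()) {
            Optional<W> w = Optional.ofNullable(right.get(e.getKey()));
            out.add(Map.entry(e.getKey(), Map.entry(e.getValue(), w)));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, String> onePair = Map.of(1, "one");
        Map<Integer, String> twoPair = Map.of(1, "two");
        Map<Integer, String> emptyPair = Map.of();

        // Both joins keep the single left row.
        System.out.println(leftOuterJoin(onePair, twoPair).size());   // 1
        System.out.println(leftOuterJoin(onePair, emptyPair).size()); // 1, not 0
    }
}
```

Per the linked SPARK-8048, the observed empty result appears to stem from emptyRDD() producing an RDD with zero partitions, which the outer join then propagates.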
Attachments
Issue Links
- is duplicated by: SPARK-8048 Explicit partitionning of an RDD with 0 partition will yield empty outer join (Resolved)
- links to