[CRUNCH-586] SparkPipeline does not work with HBaseSourceTarget - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.13.0
Fix Version/s: 0.14.0
Component/s: Spark
Labels: None

Description

final Pipeline pipeline = new SparkPipeline("local", "crunchhbase", HBaseInputSource.class, conf);
final PTable<ImmutableBytesWritable, Result> read = pipeline.read(new HBaseSourceTarget("t1", new Scan()));

This returns an empty table, while the same code works with MRPipeline.
The root cause is the combination of Spark's getJavaRDDLike method:

source.configureSource(job, -1);
Converter converter = source.getConverter();
JavaPairRDD<?, ?> input = runtime.getSparkContext().newAPIHadoopRDD(
job.getConfiguration(),
CrunchInputFormat.class,
converter.getKeyClass(),
converter.getValueClass());
which assumes CrunchInputFormat.class as the input format (and always passes -1 as the inputId),
and the HBase configureSource method:

if (inputId == -1) {
  job.setMapperClass(CrunchMapper.class);
  job.setInputFormatClass(inputBundle.getFormatClass());
  inputBundle.configure(conf);
} else {
  Path dummy = new Path("/hbase/" + table);
  CrunchInputs.addInputPath(job, dummy, inputBundle, inputId);
}

The easiest solution I see is to always call CrunchInputs.addInputPath in every source.
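To make the mismatch concrete, here is a toy model in plain Java. It is only a sketch and does not use any Crunch, Spark, or HBase classes: ToyJob, configureSourceOriginal, configureSourceAlwaysRegister, and readThroughRegistry are hypothetical names, and the string "TableInputFormat" is a placeholder. It assumes that the reader on the Spark side only looks at inputs registered through something like CrunchInputs.addInputPath, so a source that sets the input format directly in the -1 branch is never seen.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InputRegistrationSketch {

    /** Hypothetical stand-in for a Hadoop Job: a directly set input format plus a registry of inputs. */
    static final class ToyJob {
        String inputFormatClass;                                       // set directly in the inputId == -1 branch
        final Map<Integer, String> registeredInputs = new HashMap<>(); // plays the role of CrunchInputs
    }

    /** Mirrors the branch structure quoted in the report: -1 bypasses the registry. */
    static void configureSourceOriginal(ToyJob job, String table, int inputId) {
        if (inputId == -1) {
            job.inputFormatClass = "TableInputFormat"; // placeholder name, nothing is registered
        } else {
            job.registeredInputs.put(inputId, "/hbase/" + table);
        }
    }

    /** The reporter's suggestion: always register the input, whatever the inputId. */
    static void configureSourceAlwaysRegister(ToyJob job, String table, int inputId) {
        job.registeredInputs.put(inputId, "/hbase/" + table);
    }

    /** Assumed behaviour of a delegating input format: it only reads registered inputs. */
    static List<String> readThroughRegistry(ToyJob job) {
        return new ArrayList<>(job.registeredInputs.values());
    }

    public static void main(String[] args) {
        ToyJob original = new ToyJob();
        configureSourceOriginal(original, "t1", -1);
        System.out.println("original: " + readThroughRegistry(original));        // prints "original: []"

        ToyJob patched = new ToyJob();
        configureSourceAlwaysRegister(patched, "t1", -1);
        System.out.println("always register: " + readThroughRegistry(patched));  // prints "always register: [/hbase/t1]"
    }
}
```

In this model the original branch leaves the registry empty when called with -1, which corresponds to the empty table seen with SparkPipeline. Whether the attached patch takes exactly this approach is not stated in the description.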

Attachments

CRUNCH-586.patch
19/Jan/16 06:34
11 kB
Josh Wills

People

Assignee: Josh Wills
Reporter: Stefan De Smit
Votes: 0
Watchers: 3

Dates

Created: 13/Jan/16 12:47
Updated: 08/May/16 04:14
Resolved: 23/Feb/16 18:45