  1. Crunch (Retired)
  2. CRUNCH-586

SparkPipeline does not work with HBaseSourceTarget

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.14.0
    • Component/s: Spark
    • Labels: None

    Description

      final Pipeline pipeline = new SparkPipeline("local", "crunchhbase", HBaseInputSource.class, conf);
      final PTable<ImmutableBytesWritable, Result> read = pipeline.read(new HBaseSourceTarget("t1", new Scan()));

      This returns an empty table, while the same code works with MRPipeline.
      The root cause is the combination of Spark's getJavaRDDLike method:

      source.configureSource(job, -1);
      Converter converter = source.getConverter();
      JavaPairRDD<?, ?> input = runtime.getSparkContext().newAPIHadoopRDD(
          job.getConfiguration(),
          CrunchInputFormat.class,
          converter.getKeyClass(),
          converter.getValueClass());
      This assumes "CrunchInputFormat.class" (and always passes an inputId of -1),
      and the HBase configureSource method:

      if (inputId == -1) {
        job.setMapperClass(CrunchMapper.class);
        job.setInputFormatClass(inputBundle.getFormatClass());
        inputBundle.configure(conf);
      } else {
        Path dummy = new Path("/hbase/" + table);
        CrunchInputs.addInputPath(job, dummy, inputBundle, inputId);
      }

      The easiest solution I see is to always call CrunchInputs.addInputPath in every source.
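
      To illustrate why the -1 branch plausibly yields an empty table, here is a toy model of the mismatch in plain Java. It is only a sketch and does not use Crunch, Spark, or HBase: every name in it (InputRegistrationSketch, the two configuration keys, registerInput, readRegisteredInputs) is hypothetical. It assumes that the multiplexing input format only sees inputs that were registered in the configuration, which is the role CrunchInputs.addInputPath plays in the snippets above, and that it ignores an input format class set directly on the job.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the mismatch; none of these names are real Crunch APIs. */
public class InputRegistrationSketch {
    // Hypothetical key under which the multiplexing input format looks up its inputs.
    static final String INPUTS_KEY = "sketch.inputs";
    // Hypothetical key standing in for the job-level input format class.
    static final String FORMAT_KEY = "sketch.inputformat.class";

    /** Mirrors the branch in the HBase source: an inputId of -1 skips registration. */
    static void configureSourceOriginal(Map<String, String> conf, String table, int inputId) {
        if (inputId == -1) {
            // Only the job-level format is set; nothing is registered as an input.
            conf.put(FORMAT_KEY, "TableInputFormat");
        } else {
            registerInput(conf, "/hbase/" + table, inputId);
        }
    }

    /** Models the proposal: register the input regardless of inputId. */
    static void configureSourceAlwaysRegister(Map<String, String> conf, String table, int inputId) {
        registerInput(conf, "/hbase/" + table, inputId);
    }

    /** Appends "inputId;path" to the comma-separated list of registered inputs. */
    static void registerInput(Map<String, String> conf, String path, int inputId) {
        String entry = inputId + ";" + path;
        conf.merge(INPUTS_KEY, entry, (a, b) -> a + "," + b);
    }

    /** Stand-in for the multiplexing format: reads registered inputs, ignores FORMAT_KEY. */
    static List<String> readRegisteredInputs(Map<String, String> conf) {
        List<String> inputs = new ArrayList<>();
        String raw = conf.get(INPUTS_KEY);
        if (raw != null) {
            for (String entry : raw.split(",")) {
                inputs.add(entry);
            }
        }
        return inputs;
    }

    public static void main(String[] args) {
        Map<String, String> original = new HashMap<>();
        configureSourceOriginal(original, "t1", -1);
        System.out.println("original: " + readRegisteredInputs(original));
        // prints "original: []"

        Map<String, String> fixed = new HashMap<>();
        configureSourceAlwaysRegister(fixed, "t1", -1);
        System.out.println("always register: " + readRegisteredInputs(fixed));
        // prints "always register: [-1;/hbase/t1]"
    }
}
```

      In this model the reader finds nothing to read in the original -1 case, which matches the empty table seen with SparkPipeline, and finds the table once registration is unconditional. Whether the real fix takes exactly this shape depends on the attached patch.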

      Attachments

        1. CRUNCH-586.patch
          11 kB
          Josh Wills

          People

            Assignee: Josh Wills (jwills)
            Reporter: Stefan De Smit (desmit)
            Votes: 0
            Watchers: 3
