Crunch / CRUNCH-586

SparkPipeline does not work with HBaseSourceTarget


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.14.0
    • Component/s: Spark
    • Labels:
      None

      Description

      final Pipeline pipeline = new SparkPipeline("local", "crunchhbase", HBaseInputSource.class, conf);
      final PTable<ImmutableBytesWritable, Result> read = pipeline.read(new HBaseSourceTarget("t1", new Scan()));

      returns an empty table, while the same code works with MRPipeline.
      The root cause is the combination of the Spark runtime's getJavaRDDLike method:

      source.configureSource(job, -1);
      Converter converter = source.getConverter();
      JavaPairRDD<?, ?> input = runtime.getSparkContext().newAPIHadoopRDD(
          job.getConfiguration(),
          CrunchInputFormat.class,
          converter.getKeyClass(),
          converter.getValueClass());
      This hard-codes CrunchInputFormat.class as the input format and always passes -1 as the input ID,
      and the HBase source's configureSource method:

      if (inputId == -1) {
        job.setMapperClass(CrunchMapper.class);
        job.setInputFormatClass(inputBundle.getFormatClass());
        inputBundle.configure(conf);
      } else {
        Path dummy = new Path("/hbase/" + table);
        CrunchInputs.addInputPath(job, dummy, inputBundle, inputId);
      }
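      To make the failure mode concrete, here is a minimal, dependency-free Java model of the interaction above. It is a sketch under stated assumptions: FakeJob, configureHBaseSource, crunchInputFormatSplits and sparkRead are hypothetical stand-ins, not Crunch, Spark or HBase APIs, and the model assumes (as the description implies) that CrunchInputFormat builds splits only from inputs registered through CrunchInputs, ignoring any input format set directly on the job.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the CRUNCH-586 interaction. These classes are NOT
// Crunch/Spark/HBase APIs; they only mimic the control flow quoted above.
public class Crunch586Model {

  /** Stand-in for a Hadoop Job: a directly-set format plus a registry of tagged inputs. */
  static class FakeJob {
    String inputFormatClass;                                            // set by the -1 branch
    final Map<Integer, String> registeredInputs = new LinkedHashMap<>(); // set by addInputPath
  }

  /** Mimics the HBase source's configureSource, which branches on inputId. */
  static void configureHBaseSource(FakeJob job, int inputId, String table) {
    if (inputId == -1) {
      // Single-input path: set the format on the job directly, register nothing.
      job.inputFormatClass = "TableInputFormat";
    } else {
      // Multi-input path: register a dummy path under the input ID.
      job.registeredInputs.put(inputId, "/hbase/" + table);
    }
  }

  /** Assumed CrunchInputFormat behaviour: one split per registered input only. */
  static List<String> crunchInputFormatSplits(FakeJob job) {
    List<String> splits = new ArrayList<>();
    for (Map.Entry<Integer, String> e : job.registeredInputs.entrySet()) {
      splits.add("split(" + e.getKey() + ", " + e.getValue() + ")");
    }
    return splits;
  }

  /** Mimics the Spark path: configureSource(job, -1), then read via CrunchInputFormat. */
  static List<String> sparkRead(String table) {
    FakeJob job = new FakeJob();
    configureHBaseSource(job, -1, table);
    return crunchInputFormatSplits(job);
  }

  public static void main(String[] args) {
    // prints "Spark-style read splits: []" because nothing was registered
    System.out.println("Spark-style read splits: " + sparkRead("t1"));

    FakeJob tagged = new FakeJob();
    configureHBaseSource(tagged, 0, "t1");
    // prints "Tagged read splits: [split(0, /hbase/t1)]"
    System.out.println("Tagged read splits: " + crunchInputFormatSplits(tagged));
  }
}
```

      In the model, the -1 branch leaves the registry empty, so a reader that consults only the registry sees zero splits, which matches the empty table observed with SparkPipeline.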

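      Because the inputId == -1 branch sets the input format directly on the job and never registers the source through CrunchInputs, CrunchInputFormat (the format Spark is told to use) finds no registered inputs, so the resulting RDD is empty. MRPipeline is unaffected because, in that branch, it reads through the format set directly on the job. The easiest solution I see is to always call CrunchInputs.addInputPath, in every source.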
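      The suggested fix can be sketched with a small, dependency-free Java model. This is a hypothetical illustration of the reporter's proposal, not the contents of the attached CRUNCH-586.patch: FakeJob, addInputPath, configureHBaseSource and splits are illustrative stand-ins, with addInputPath playing the role of CrunchInputs.addInputPath. Whether -1 is a safe input ID inside real Crunch is not something this model checks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the proposed fix: the source always registers itself,
// whatever inputId it receives. Stand-ins only, not Crunch APIs.
public class Crunch586FixModel {

  /** Stand-in for the job configuration: input IDs mapped to (possibly dummy) paths. */
  static final class FakeJob {
    final Map<Integer, String> registeredInputs = new LinkedHashMap<>();
  }

  /** Stand-in for CrunchInputs.addInputPath: tag an input with its ID. */
  static void addInputPath(FakeJob job, String path, int inputId) {
    job.registeredInputs.put(inputId, path);
  }

  /** Proposed configureSource: no special case for inputId == -1. */
  static void configureHBaseSource(FakeJob job, int inputId, String table) {
    addInputPath(job, "/hbase/" + table, inputId);
  }

  /** Stand-in for CrunchInputFormat: one split per registered input. */
  static List<String> splits(FakeJob job) {
    List<String> out = new ArrayList<>();
    job.registeredInputs.forEach((id, path) -> out.add(id + ":" + path));
    return out;
  }

  public static void main(String[] args) {
    FakeJob sparkJob = new FakeJob();
    configureHBaseSource(sparkJob, -1, "t1"); // the Spark path still passes -1
    System.out.println(splits(sparkJob));     // prints [-1:/hbase/t1]

    FakeJob multiJob = new FakeJob();
    configureHBaseSource(multiJob, 0, "t1");  // two sources in one job stay distinct
    configureHBaseSource(multiJob, 1, "t2");
    System.out.println(splits(multiJob));     // prints [0:/hbase/t1, 1:/hbase/t2]
  }
}
```

      Keying registrations by input ID is what lets several sources share one job; the LinkedHashMap only keeps the printed order stable.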

        Attachments

        1. CRUNCH-586.patch
          11 kB
          Josh Wills


            People

            • Assignee:
              Josh Wills (jwills)
              Reporter:
              Stefan De Smit (desmit)
            • Votes:
              0
              Watchers:
              3
