  1. Spark
  2. SPARK-1443

Unable to Access MongoDB GridFS data with Spark using mongo-hadoop API

    Details

      Description

      I saved a 2 GB PDF file into MongoDB using GridFS. Now I want to process the data in those GridFS collections using the Java Spark MapReduce API. Previously I successfully processed regular MongoDB collections with Apache Spark using the Mongo-Hadoop connector, but I am unable to read the GridFS collections with the following code.

      MongoConfigUtil.setInputURI(config, "mongodb://localhost:27017/pdfbooks.fs.chunks");
      MongoConfigUtil.setOutputURI(config, "mongodb://localhost:27017/" + output);
      JavaPairRDD<Object, BSONObject> mongoRDD = sc.newAPIHadoopRDD(config,
              com.mongodb.hadoop.MongoInputFormat.class, Object.class, BSONObject.class);
      JavaRDD<String> words = mongoRDD.flatMap(
              new FlatMapFunction<Tuple2<Object, BSONObject>, String>() {
                  @Override
                  public Iterable<String> call(Tuple2<Object, BSONObject> arg) {
                      // Each record read from fs.chunks is one GridFS chunk document.
                      System.out.println(arg._2.toString());
                      ...
      Please suggest/provide better API methods to access MongoDB GridFS data.
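
      For context on why the code above does not yield text: GridFS splits each stored file into documents in the fs.chunks collection, each carrying a files_id, a sequence number n, and a binary data payload. Records read from fs.chunks are therefore binary pieces of the PDF, and they only form the original file once the chunks of one files_id are ordered by n and concatenated. The following is a minimal sketch of that reassembly step in plain Java. The class GridFsChunkSketch and its Chunk type are hypothetical stand-ins for illustration, not part of mongo-hadoop or Spark, and the sketch assumes the chunks passed in already belong to a single file.

      ```java
      import java.io.ByteArrayOutputStream;
      import java.nio.charset.StandardCharsets;
      import java.util.ArrayList;
      import java.util.Comparator;
      import java.util.List;

      // Hypothetical helper for illustration; not part of mongo-hadoop or Spark.
      final class GridFsChunkSketch {

          // Stand-in for one fs.chunks document: its sequence number "n" and its "data" payload.
          static final class Chunk {
              final int n;
              final byte[] data;

              Chunk(int n, byte[] data) {
                  this.n = n;
                  this.data = data;
              }
          }

          // Sorts the chunks of ONE file by "n" and concatenates their payloads.
          // Assumes every chunk in the list shares the same files_id.
          static byte[] reassemble(List<Chunk> chunksOfOneFile) {
              List<Chunk> sorted = new ArrayList<>(chunksOfOneFile);
              sorted.sort(Comparator.comparingInt((Chunk c) -> c.n));
              ByteArrayOutputStream out = new ByteArrayOutputStream();
              for (Chunk c : sorted) {
                  out.write(c.data, 0, c.data.length);
              }
              return out.toByteArray();
          }

          public static void main(String[] args) {
              // Chunks deliberately added out of order, as they may arrive from a distributed read.
              List<Chunk> chunks = new ArrayList<>();
              chunks.add(new Chunk(1, "world".getBytes(StandardCharsets.UTF_8)));
              chunks.add(new Chunk(0, "hello ".getBytes(StandardCharsets.UTF_8)));
              byte[] whole = reassemble(chunks);
              System.out.println(new String(whole, StandardCharsets.UTF_8)); // prints "hello world"
          }
      }
      ```

      In a Spark job, the equivalent step would presumably key each record by its files_id field before grouping and ordering by n. Note that a Java byte array holds at most about 2^31 - 1 bytes, so a file close to 2 GB may not fit into a single array and might need to be processed chunk by chunk instead.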


            People

            • Assignee: Unassigned
            • Reporter: Pavan Kumar Varma (PavanKumarVarma)
            • Votes: 0
            • Watchers: 2


                Time Tracking

                Original Estimate: 12h
                Remaining Estimate: 12h
                Time Spent: Not Specified