  1. Hadoop Map/Reduce
  2. MAPREDUCE-5820

Unable to process MongoDB GridFS collection data in Hadoop MapReduce

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Invalid
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: task
    • Labels: None
    • Environment: Hadoop, Mongodb

    Description

      I saved a 2 GB PDF file into MongoDB using GridFS. Now I want to process the data in those GridFS collections using Java Spark MapReduce. Previously I successfully processed MongoDB collections with Hadoop MapReduce using the Mongo-Hadoop connector, but now I am unable to handle the binary data coming from the input GridFS collections.

      // Read from the GridFS chunks collection; write results to the output collection.
      MongoConfigUtil.setInputURI(config, "mongodb://localhost:27017/pdfbooks.fs.chunks");
      MongoConfigUtil.setOutputURI(config, "mongodb://localhost:27017/" + output);
      JavaPairRDD<Object, BSONObject> mongoRDD = sc.newAPIHadoopRDD(config,
              com.mongodb.hadoop.MongoInputFormat.class, Object.class, BSONObject.class);
      JavaRDD<String> words = mongoRDD.flatMap(
              new FlatMapFunction<Tuple2<Object, BSONObject>, String>() {
                  @Override
                  public Iterable<String> call(Tuple2<Object, BSONObject> arg) {
                      // Prints the whole chunk document, including its binary "data" field.
                      System.out.println(arg._2.toString());
                      ...
      In the code above I am accessing the fs.chunks collection as the input to my mapper, so the mapper receives each record as a BSONObject. The problem is that the data in the input BSONObject is in an unreadable binary format. For example, the statement "System.out.println(arg._2.toString());" in the program above gives the following result:

      { "_id" : { "$oid" : "533e53048f0c8bcb0b3a7ff7"} , "files_id" : { "$oid" : "533e5303fac7a2e2c4afea08"} , "n" : 0 , "data" : <Binary Data>}
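The `<Binary Data>` placeholder is only how the document prints itself; the `data` field still holds the raw bytes of one chunk. A minimal sketch of a hex preview follows, assuming the field value has already been obtained as a `byte[]` (with the MongoDB Java driver, `get("data")` commonly yields either a `byte[]` or an `org.bson.types.Binary`, whose `getData()` returns a `byte[]`; which one the connector produces here is an assumption to verify). `ChunkPreview` and `toHex` are hypothetical names, not part of any library.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the MongoDB driver or Mongo-Hadoop.
final class ChunkPreview {
    private ChunkPreview() {}

    /** Formats at most maxBytes of the payload as space-separated two-digit hex values. */
    static String toHex(byte[] data, int maxBytes) {
        int n = Math.min(data.length, maxBytes);
        StringBuilder sb = new StringBuilder(n * 3);
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Mask with 0xff so negative bytes print as 00..ff.
            sb.append(String.format("%02x", data[i] & 0xff));
        }
        if (data.length > n) {
            sb.append(" ...");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the bytes of the "data" field; a PDF file begins with "%PDF-".
        byte[] data = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toHex(data, 5)); // prints "25 50 44 46 2d ..."
    }
}
```

Because a PDF is a binary format, a preview like this is useful for checking that the bytes arrived intact, but it does not make the text of the document readable on its own.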

      How do I print or access that data in a readable format? Can I use the GridFS API to do that? If so, please suggest how to convert the input BSONObject to a GridFS object, or any better way to do this. Thank you in advance!
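One point that may help frame the question: each fs.chunks document is only a slice of the stored file, identified by `files_id` and ordered by `n`, so a single chunk is generally not a usable PDF. The sketch below shows the reassembly step in plain Java, assuming the chunks of one file have already been collected (for example by grouping on `files_id`) and their `data` values extracted as `byte[]`. `GridFsChunkAssembler` and `Chunk` are hypothetical stand-ins, not classes from the driver or the connector.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration; not part of the MongoDB driver or Mongo-Hadoop.
final class GridFsChunkAssembler {
    private GridFsChunkAssembler() {}

    /** Stand-in for one fs.chunks document: sequence number "n" and the "data" bytes. */
    static final class Chunk {
        final int n;
        final byte[] data;

        Chunk(int n, byte[] data) {
            this.n = n;
            this.data = data;
        }
    }

    /** Writes the chunks of a single file to out in ascending order of n. */
    static void writeInOrder(List<Chunk> chunks, OutputStream out) {
        List<Chunk> sorted = new ArrayList<>(chunks); // leave the caller's list untouched
        sorted.sort(Comparator.comparingInt((Chunk c) -> c.n));
        try {
            for (Chunk c : sorted) {
                out.write(c.data);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Chunks deliberately supplied out of order.
        List<Chunk> chunks = Arrays.asList(
                new Chunk(1, new byte[] {3, 4}),
                new Chunk(0, new byte[] {1, 2}));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeInOrder(chunks, sink);
        System.out.println(Arrays.toString(sink.toByteArray())); // prints "[1, 2, 3, 4]"
    }
}
```

For a file as large as 2 GB the output stream would need to be a file or similar sink rather than an in-memory buffer, and extracting readable text from the reassembled bytes would still require a PDF parser.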

          People

            Assignee: Unassigned
            Reporter: sivaram (vegi.sivaram)
            Votes: 0
            Watchers: 4
