Details
Type: Improvement
Status: Resolved
Priority: Critical
Resolution: Done
Affects Version/s: 0.9.0
Fix Version/s: None
Environment: Java 1.7, Hadoop 2.2.0, Spark 0.9.0, Ubuntu 12.04
Description
I saved a 2 GB PDF file into MongoDB using GridFS. Now I want to process that GridFS collection data with the Java Spark MapReduce API. I have previously processed regular MongoDB collections with Apache Spark through the Mongo-Hadoop connector, but I am unable to read GridFS collections with the following code:
    MongoConfigUtil.setInputURI(config, "mongodb://localhost:27017/pdfbooks.fs.chunks");
    MongoConfigUtil.setOutputURI(config, "mongodb://localhost:27017/" + output);
    JavaPairRDD<Object, BSONObject> mongoRDD = sc.newAPIHadoopRDD(config,
            com.mongodb.hadoop.MongoInputFormat.class, Object.class,
            BSONObject.class);
    JavaRDD<String> words = mongoRDD.flatMap(
            new FlatMapFunction<Tuple2<Object, BSONObject>, String>() {
                @Override
                public Iterable<String> call(Tuple2<Object, BSONObject> arg) {
                    System.out.println(arg._2.toString());
                    ...
Please suggest or provide better API methods for accessing MongoDB GridFS data.
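One detail worth keeping in mind when reading fs.chunks through the connector: each BSONObject the RDD yields is a single GridFS chunk, not the whole file. Per the GridFS format, each chunk document carries a "files_id" (the parent file), an "n" (the chunk's sequence number), and a binary "data" field, so recovering the original bytes means grouping chunks by "files_id" and concatenating them in order of "n". Below is a minimal, self-contained sketch of that reassembly step; plain Maps stand in for BSONObjects here (an assumption made so the snippet compiles without the driver), but the same logic applies to the documents the connector hands back.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GridFsReassemble {
    // Reassemble one file's GridFS chunks into its original byte stream.
    // Each chunk map mirrors an fs.chunks document: "n" -> Integer sequence
    // number, "data" -> byte[] payload. (Maps stand in for BSONObjects.)
    public static byte[] reassemble(List<Map<String, Object>> chunks) {
        // Chunks may arrive in any order; sort by the "n" sequence field.
        Collections.sort(chunks,
                Comparator.comparingInt(c -> (Integer) c.get("n")));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map<String, Object> chunk : chunks) {
            byte[] data = (byte[]) chunk.get("data");
            out.write(data, 0, data.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Two chunks added out of order, as a cluster might deliver them.
        List<Map<String, Object>> chunks = new ArrayList<>();
        Map<String, Object> second = new HashMap<>();
        second.put("n", 1);
        second.put("data", "world".getBytes());
        Map<String, Object> first = new HashMap<>();
        first.put("n", 0);
        first.put("data", "hello ".getBytes());
        chunks.add(second);
        chunks.add(first);
        System.out.println(new String(reassemble(chunks)));
    }
}
```

In a Spark job this would translate to keying the RDD by the chunk's "files_id", grouping, and applying a reassembly like the one above; for a 2 GB file, though, collecting all chunks of one file onto a single task may be impractical, which is why reading the file directly through the driver's GridFS API on one machine can be the simpler route.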