Details
Type: IT Help
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 3.4.1
None
None
Environment:
Python 3.8
PySpark 3.4.1
Operating system: Ubuntu 20.04
Description
Hadoop is not deployed on my machine. I want to use the PySpark APIs to read data from OBS buckets and convert it to an RDD. How can I achieve this?
The following code reports the error: No FileSystem for scheme "obs". Can Spark read and write OBS without Hadoop being installed and configured?
I'm also not familiar with PySpark. Is the code wrong?
from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf()
conf.set("spark.app.name", "read and write OBS")
conf.set("spark.security.credentials.hbase.enabled", "true")
conf.set("spark.hadoop.fs.obs.access.key", ak)
conf.set("spark.hadoop.fs.obs.secret.key", sk)
conf.set("spark.hadoop.fs.obs.endpoint", "http://xxx")

spark = SparkSession.builder.config(conf=conf).getOrCreate()
df = spark.read.json('obs://bucket_name/xxx.json')
df.coalesce(2).write.json("obs://bucket_name/", "overwrite")
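For context, the "No FileSystem for scheme" error generally means that no FileSystem implementation is registered for the obs:// scheme on Spark's classpath. A possible direction, sketched below, is to add an OBS connector jar and point fs.obs.impl at its FileSystem class. This is only a sketch under assumptions: the helper names obs_spark_settings and build_session are hypothetical, the jar path is a placeholder, and the class name org.apache.hadoop.fs.obs.OBSFileSystem is assumed from the hadoop-huaweicloud connector and should be checked against the connector version actually used. The connector may also need its own dependency jars.

```python
def obs_spark_settings(access_key, secret_key, endpoint, connector_jar):
    """Return Spark settings intended to register a FileSystem for obs://.

    Hypothetical helper: it only builds a dict and does not contact OBS.
    """
    return {
        "spark.app.name": "read and write OBS",
        # Assumption: local path to an OBS connector jar (placeholder value).
        "spark.jars": connector_jar,
        # Assumption: FileSystem class shipped by the hadoop-huaweicloud connector.
        "spark.hadoop.fs.obs.impl": "org.apache.hadoop.fs.obs.OBSFileSystem",
        # Same credential and endpoint keys as in the code above.
        "spark.hadoop.fs.obs.access.key": access_key,
        "spark.hadoop.fs.obs.secret.key": secret_key,
        "spark.hadoop.fs.obs.endpoint": endpoint,
    }


def build_session(settings):
    """Create a SparkSession from a dict of settings."""
    # Imported here so obs_spark_settings can be checked without pyspark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in settings.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Possible usage (not run here; ak, sk, and the jar path are placeholders):
#   spark = build_session(obs_spark_settings(ak, sk, "http://xxx", "/path/to/connector.jar"))
#   rdd = spark.read.json("obs://bucket_name/xxx.json").rdd  # DataFrame to RDD of Rows
```

If this approach applies, no separate Hadoop cluster should be needed, since the pip-installed PySpark package bundles the Hadoop client libraries. Only the connector for the obs:// scheme is missing.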