Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 3.0.0
None
20/06/29 07:52:19 WARN Utils: Your hostname, sanjeevs-MacBook-Pro-2.local resolves to a loopback address: 127.0.0.1; using 10.0.0.8 instead (on interface en0)
20/06/29 07:52:19 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
20/06/29 07:52:19 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
20/06/29 07:52:26 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context Web UI available at http://10.0.0.8:4041
Spark context available as 'sc' (master = local[*], app id = local-1593442346864).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.0
      /_/
Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_251)
Type in expressions to have them evaluated.
Type :help for more information.
Description
We are planning to move to Spark 3, but the read performance of our JSON files is unacceptable. Following are the performance numbers compared with Spark 2.4:
Spark 2.4
scala> spark.time(spark.read.json("/data/20200528"))
Time taken: 19691 ms
res61: org.apache.spark.sql.DataFrame = [created: bigint, id: string ... 5 more fields]
scala> spark.time(res61.count())
Time taken: 7113 ms
res64: Long = 2605349
Spark 3.0
scala> spark.time(spark.read.json("/data/20200528"))
20/06/29 08:06:53 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
Time taken: 849652 ms
res0: org.apache.spark.sql.DataFrame = [created: bigint, id: string ... 5 more fields]
scala> spark.time(res0.count())
Time taken: 8201 ms
res2: Long = 2605349
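Putting the two pairs of measurements side by side shows where the time goes:

```latex
\frac{t^{3.0}_{\text{read}}}{t^{2.4}_{\text{read}}} = \frac{849652\ \text{ms}}{19691\ \text{ms}} \approx 43.1,
\qquad
\frac{t^{3.0}_{\text{count}}}{t^{2.4}_{\text{count}}} = \frac{8201\ \text{ms}}{7113\ \text{ms}} \approx 1.15
```

Because no schema is supplied, `spark.read.json` has to scan the input to infer one, so almost all of the regression sits in that inference pass; the subsequent `count()` scan is only about 15% slower.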
I am attaching sample data (please delete it once you are able to reproduce the issue). It is much smaller than the actual data set, but the performance difference can still be verified with it.
To reproduce the issue, untar the attachment. It contains multiple .json.gz files; each line of each file is a self-contained JSON document similar to the following:
{"id":"954e7819e91a11e981f60050569979b6","created":1570463599492,"properties":{"WANAccessType":"2","deviceClassifiers":["ARRIS HNC IGD","Annex F Gateway","Supports.Collect.Optimized.Workflow","Fast.Inform","Supports.TR98.Traceroute","InternetGatewayDevice:1.4","Motorola.ServiceType.IP","Supports Arris FastPath Speed Test","Arris.NVG468MQ.9.3.0h0","Wireless.Common.IGD.DualRadio","001E46.NVG468MQ.Is.WANIP","Device.Supports.HNC","Device.Type.RG","Arris.NVG4xx.Missing.CA","Supports.TR98.IPPing","Arris.NVG468MQ.9.3.0+","Wireless","ARRIS HNC IGD EUROPA","Arris.NVG.Wireless","WLAN.Radios.Action.Common.TR098","VoiceService:1.0","ConnecticutDeviceTypes","Device.Supports.SpeedTest","Motorola.Device.Supports.VoIP","Arris.NVG468MQ","Motorola.device","CaptivePortal:1","Arris.NVG4xx","All.TR069.RG.Devices","TraceRoute:1","Arris.NVG4xx.9.3.0+","datamodel.igd","Arris.NVG4xxQ","IPPing:1","Device.ServiceType.IP","001E46.NVG468MQ.Is.WANEth","Arris.NVG468MQ.9.2.4+","broken.device.no.notification"],"deviceType":"IGD","firstInform":"1570463619543","groups":["Self-Service Diagnostics","SLF-SRVC_DGNSTCS000","TCW - NVG4xx - First Contact"],"hardwareVersion":"NVG468MQ_0200240031004E","hncEnable":"0","lastBoot":"1587765844155","lastInform":"1590624062260","lastPeriodic":"1590624062260","manufacturerName":"Motorola","modelName":"NVG468MQ","productClass":"NVG468MQ","protocolVersion":"cwmp10","provisioningCode":"","softwareVersion":"9.3.0h0d55","tags":["default"],"timeZone":"EST+5EDT,M3.2.0/2,M11.1.0/2","wan":{"ethDuplexMode":"Full","ethSyncBitRate":"1000"},"wifi":{"0":{"Enable":"1","SSID":"Frontier3136","SSIDAdvertisementEnabled":"1"},"1":{"Enable":"0","SSID":"Guest3136","SSIDAdvertisementEnabled":"1"},"2":{"Enable":"0","SSID":"Frontier3136_D2","SSIDAdvertisementEnabled":"1"},"3":{"Enable":"0","SSID":"Frontier3136_D3","SSIDAdvertisementEnabled":"1"},"4":{"Enable":"1","SSID":"Frontier3136_5G","SSIDAdvertisementEnabled":"1"},"5":{"Enable":"0","SSID":"Guest3136_5G","SSIDAdvertisementEnabled":"1"},"6":{"Enable":"1","SSID":"Frontier3136_5G-TV","SSIDAdvertisementEnabled":"0"},"7":{"Enable":"0","SSID":"Frontier3136_5G_D2","SSIDAdvertisementEnabled":"1"}}},"ts":1590624062260}
{"id":"bf0448736d09e2e677ea383ef857d5bc","created":1517843609967,"properties":{"WANAccessType":"2","arrisNvgDbCheck":"1:success","deviceClassifiers":["ARRIS HNC IGD","Annex F Gateway","Supports.Collect.Optimized.Workflow","Fast.Inform","InternetGatewayDevice:1.4","Supports.TR98.Traceroute","Supports Arris FastPath Speed Test","Motorola.ServiceType.IP","Arris.NVG468MQ.9.3.0h0","Wireless.Common.IGD.DualRadio","001E46.NVG468MQ.Is.WANIP","Device.Supports.HNC","Device.Type.RG","Arris.NVG4xx.Missing.CA","Supports.TR98.IPPing","Arris.NVG468MQ.9.3.0+","Wireless","ARRIS HNC IGD EUROPA","Arris.NVG.Wireless","VoiceService:1.0","WLAN.Radios.Action.Common.TR098","ConnecticutDeviceTypes","Device.Supports.SpeedTest","Motorola.Device.Supports.VoIP","Arris.NVG468MQ","Motorola.device","CaptivePortal:1","Arris.NVG4xx","All.TR069.RG.Devices","TraceRoute:1","Arris.NVG4xx.9.3.0+","datamodel.igd","Arris.NVG4xxQ","IPPing:1","Device.ServiceType.IP","001E46.NVG468MQ.Is.WANEth","Arris.NVG468MQ.9.2.4+","broken.device.no.notification"],"deviceType":"IGD","firstInform":"1517843629132","groups":["Total Control","GPON_100M_100M","Self-Service Diagnostics","HSI","SLF-SRVC_DGNSTCS000","HS002","TTL_CNTRL000","GPN_100M_100M001"],"hardwareVersion":"NVG468MQ_0200240031004E","hncEnable":"0","lastBoot":"1590196375084","lastInform":"1590624060253","lastPeriodic":"1590624060253","manufacturerName":"Motorola","modelName":"NVG468MQ","productClass":"NVG468MQ","protocolVersion":"cwmp10","provisioningCode":"","softwareVersion":"9.3.0h0d55","tags":["default"],"timeZone":"EST+5EDT,M3.2.0/2,M11.1.0/2","wan":{"ethDuplexMode":"Full","ethSyncBitRate":"1000"},"wifi":{"0":{"Enable":"1","SSID":"NE-TB12-GOAT-2G","SSIDAdvertisementEnabled":"1"},"1":{"Enable":"1","SSID":"TP-Link_extender_2.4GHz","SSIDAdvertisementEnabled":"1"},"2":{"Enable":"0","SSID":"Frontier5360_D2","SSIDAdvertisementEnabled":"1"},"3":{"Enable":"0","SSID":"Frontier5360_D3","SSIDAdvertisementEnabled":"1"},"4":{"Enable":"1","SSID":"NE-TB12-GOAT-5G","SSIDAdvertisementEnabled":"1"},"5":{"Enable":"0","SSID":"Guest5360_5G","SSIDAdvertisementEnabled":"1"},"6":{"Enable":"1","SSID":"Frontier5360_5G-TV","SSIDAdvertisementEnabled":"0"},"7":{"Enable":"0","SSID":"Frontier5360_5G_D2","SSIDAdvertisementEnabled":"1"}}},"ts":1590624060253}
{"id":"1512b1b8526211e9acf100505699063c","created":1553891682535,"properties":{"WANAccessType":"2","arrisNvgDbCheck":"1:success","deviceClassifiers":["ARRIS HNC IGD","Annex F Gateway","Supports.Collect.Optimized.Workflow","Fast.Inform","InternetGatewayDevice:1.4","Supports.TR98.Traceroute","Motorola.ServiceType.IP","Supports Arris FastPath Speed Test","Arris.NVG468MQ.9.3.0h0","Wireless.Common.IGD.DualRadio","001E46.NVG468MQ.Is.WANIP","Device.Supports.HNC","Arris.NVG4xx.Missing.CA","Device.Type.RG","Supports.TR98.IPPing","Arris.NVG468MQ.9.3.0+","Wireless","ARRIS HNC IGD EUROPA","Arris.NVG.Wireless","WLAN.Radios.Action.Common.TR098","VoiceService:1.0","ConnecticutDeviceTypes","Device.Supports.SpeedTest","Motorola.Device.Supports.VoIP","Arris.NVG468MQ","Motorola.device","Arris.NVG4xx","CaptivePortal:1","All.TR069.RG.Devices","TraceRoute:1","Arris.NVG4xx.9.3.0+","datamodel.igd","Arris.NVG4xxQ","IPPing:1","Device.ServiceType.IP","001E46.NVG468MQ.Is.WANEth","Arris.NVG468MQ.9.2.4+","broken.device.no.notification"],"deviceType":"IGD","firstInform":"1553891708717","groups":["Total Control","HSI","Self-Service Diagnostics","SLF-SRVC_DGNSTCS000","HS004","TTL_CNTRL000","TCW - NVG4xx - First Contact","GPON_200M_200M","TCW Enabled","GPN_200M_200M000"],"hardwareVersion":"NVG468MQ_0200240031004E","hncEnable":"1","lastBoot":"1590537703734","lastInform":"1590624061415","lastPeriodic":"1590624061415","manufacturerName":"Motorola","modelName":"NVG468MQ","productClass":"NVG468MQ","protocolVersion":"cwmp10","provisioningCode":"","softwareVersion":"9.3.0h0d55","tags":["default"],"timeZone":"EST+5EDT,M3.2.0/2,M11.1.0/2","wan":{"ethDuplexMode":"Full","ethSyncBitRate":"1000"},"wifi":{"0":{"Enable":"1","SSID":"Frontier7968","SSIDAdvertisementEnabled":"1"},"1":{"Enable":"0","SSID":"Guest7968","SSIDAdvertisementEnabled":"1"},"2":{"Enable":"0","SSID":"Frontier7968_D2","SSIDAdvertisementEnabled":"1"},"3":{"Enable":"0","SSID":"Frontier7968_D3","SSIDAdvertisementEnabled":"1"},"4":{"Enable":"1","SSID":"Frontier7968","SSIDAdvertisementEnabled":"1"},"5":{"Enable":"0","SSID":"Guest7968_5G","SSIDAdvertisementEnabled":"1"},"6":{"Enable":"1","SSID":"Frontier7968_5G-TV","SSIDAdvertisementEnabled":"0"},"7":{"Enable":"0","SSID":"Frontier7968_5G_D2","SSIDAdvertisementEnabled":"1"}}},"ts":1590624061415}
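A minimal sketch for timing the read on these files, assuming a standalone Scala application with spark-sql 3.x on the classpath and the sample untarred under the path from the REPL session above (`/data/20200528`; replace it with your own). It compares the default read with two mitigations: passing the JSON option `inferTimestamp` as `false`, and supplying a schema up front with `DataFrameReader.schema` so that no inference pass runs at all. Per the Spark SQL migration guide, timestamp inference is disabled by default from 3.0.1 on; whether 3.0.0 honors `inferTimestamp` is an assumption here, and an unrecognized reader option should simply be ignored.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: time JSON schema inference with and without timestamp inference.
// Assumes spark-sql 3.x on the classpath; InputPath is the directory from the
// report and should point at wherever the sample .json.gz files were untarred.
object JsonReadTiming {
  val InputPath = "/data/20200528"

  // Evaluates `body` once and returns its result plus elapsed wall-clock milliseconds.
  def timed[T](body: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("json-read-timing")
      .master("local[*]")
      .getOrCreate()

    // 1. Default read: without a schema, Spark scans the files to infer one.
    val (defaultDf, defaultMs) = timed(spark.read.json(InputPath))
    println(s"default read: $defaultMs ms")

    // 2. Same read with timestamp inference switched off via the JSON option.
    //    Assumption: the running Spark version recognizes `inferTimestamp`.
    val (noTsDf, noTsMs) = timed(spark.read.option("inferTimestamp", "false").json(InputPath))
    println(s"inferTimestamp=false: $noTsMs ms")

    // If both inferred schemas are equal, disabling the option loses nothing for this data.
    println(s"same schema: ${defaultDf.schema == noTsDf.schema}")

    // 3. A known schema skips inference entirely; in practice it could be saved
    //    once (e.g. as a JSON string via schema.json) and reloaded for every run.
    val (_, explicitMs) = timed(spark.read.schema(noTsDf.schema).json(InputPath))
    println(s"explicit schema: $explicitMs ms")

    spark.stop()
  }
}
```

The first read also warms the operating system's file cache, so for a fair comparison each variant is better run in a fresh session, or repeated a few times.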
Attachments
Issue Links
- is caused by: SPARK-26246 Infer timestamp types from JSON (Resolved)
- links to
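The linked cause, SPARK-26246, made JSON schema inference check whether string values can be parsed as timestamps. Below is a simplified model, not Spark's actual inference code, of why that can be costly on data like the sample: every string value (each `deviceClassifiers` entry, every `SSID`, and so on) pays for a parse attempt, and in this model nearly all attempts fail by throwing. The pattern `yyyy-MM-dd'T'HH:mm:ss` and all names in the block are hypothetical stand-ins, the pattern playing the role of the reader's `timestampFormat` option.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import scala.util.Try

// Simplified model (not Spark's implementation) of inferring a type for one JSON string value.
object TimestampInferenceModel {
  sealed trait InferredType
  case object StringT extends InferredType
  case object TimestampT extends InferredType

  // Hypothetical stand-in for the reader's timestampFormat option.
  private val Pattern = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss")

  // Timestamp inference off: a JSON string is always a string, at no extra cost.
  def inferWithoutTimestamps(value: String): InferredType = StringT

  // Timestamp inference on: every string value pays for a parse attempt;
  // a failed attempt throws DateTimeParseException, which Try catches.
  def inferWithTimestamps(value: String): InferredType =
    if (Try(LocalDateTime.parse(value, Pattern)).isSuccess) TimestampT else StringT

  // Returns (number of parse attempts, number of values recognized as timestamps).
  def parseAttempts(values: Seq[String]): (Int, Int) =
    (values.size, values.count(v => inferWithTimestamps(v) == TimestampT))

  def main(args: Array[String]): Unit = {
    // String values taken from the sample records in the description.
    val sample = Seq("ARRIS HNC IGD", "Annex F Gateway", "Fast.Inform", "Frontier3136", "1570463619543")
    val (attempts, hits) = parseAttempts(sample)
    println(s"$attempts parse attempts, $hits timestamps found")
  }
}
```

In this model, inference over the full data set multiplies that per-value cost by every string in every record, which is consistent with the reported slowdown being confined to the inference step.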