Hadoop Common / HADOOP-17842

S3a parquet reads slow with Spark on Kubernetes (EKS)


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Works for Me
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels: None

    Description

      I am trying to read Parquet data saved in S3 via Spark on EKS, using hadoop-aws 3.2.0. There are 112 partitions (each around 130 MB) for a particular month.
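
      For context, the read itself is nothing unusual; a minimal sketch of the kind of job involved (bucket and path names below are placeholders, not the real ones):

      import org.apache.spark.sql.SparkSession

      object ReadMonth {
        def main(args: Array[String]): Unit = {
          val spark = SparkSession.builder()
            .appName("s3a-parquet-read")
            .getOrCreate()

          // Read one month of the partitioned Parquet table through the s3a:// connector.
          // ptn_val_txt is the partition column; 20200229 selects the month in question.
          val df = spark.read
            .parquet("s3a://my-bucket/path/to/table_fact_mtd_c")
            .filter("ptn_val_txt = '20200229'")

          // Force a full scan so the slow reads show up.
          println(df.count())

          spark.stop()
        }
      }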

       

      The data is being read, but very slowly. I keep seeing the log lines below repeated over and over, while only a very small amount of data is actually fetched.

       

      21/08/09 05:07:05 DEBUG Executor task launch worker for task 60.0 in stage 3.0 (TID 63) Invoker: Values passed - text: read on s3a://uat1-prp-rftu-25-045552507264-us-east-1/xxxx/yyyy/zzzz/table_fact_mtd_c/ptn_val_txt=20200229/part-00012-32dbfb10-b43c-4066-a70e-d3575ea530d5-c000.snappy.parquet, idempotent: true, Retried: org.apache.hadoop.fs.s3a.S3AFileSystem$$Lambda$1199/2130521693@5259f9d0, Operation:org.apache.hadoop.fs.s3a.Invoker$$Lambda$1239/37396157@454de3d3

      21/08/09 05:07:05 DEBUG Executor task launch worker for task 60.0 in stage 3.0 (TID 63) Invoker: retryUntranslated begin

      21/08/09 05:07:05 DEBUG Executor task launch worker for task 60.0 in stage 3.0 (TID 63) Invoker: Values passed - text: lazySeek on s3a://uat1-prp-rftu-25-045552507264-us-east-1/xxxx/yyyy/zzzz/table_fact_mtd_c/ptn_val_txt=20200229/part-00012-32dbfb10-b43c-4066-a70e-d3575ea530d5-c000.snappy.parquet, idempotent: true, Retried: org.apache.hadoop.fs.s3a.S3AFileSystem$$Lambda$1199/2130521693@5259f9d0, Operation:org.apache.hadoop.fs.s3a.Invoker$$Lambda$1239/37396157@3776ef6c

      21/08/09 05:07:05 DEBUG Executor task launch worker for task 60.0 in stage 3.0 (TID 63) Invoker: retryUntranslated begin

      21/08/09 05:07:05 DEBUG Executor task launch worker for task 60.0 in stage 3.0 (TID 63) Invoker: Values passed - text: read on s3a://uat1-prp-rftu-25-045552507264-us-east-1/xxxx/yyyy/zzzz/table_fact_mtd_c/ptn_val_txt=20200229/part-00012-32dbfb10-b43c-4066-a70e-d3575ea530d5-c000.snappy.parquet, idempotent: true, Retried: org.apache.hadoop.fs.s3a.S3AFileSystem$$Lambda$1199/2130521693@5259f9d0, Operation:org.apache.hadoop.fs.s3a.Invoker$$Lambda$1239/37396157@3602676a

      21/08/09 05:07:05 DEBUG Executor task launch worker for task 60.0 in stage 3.0 (TID 63) Invoker: retryUntranslated begin

       

      Here is the Spark configuration for hadoop-aws:

      spark.hadoop.fs.s3a.assumed.role.sts.endpoint: https://sts.amazonaws.com
      spark.hadoop.fs.s3a.assumed.role.sts.endpoint.region: us-east-1
      spark.hadoop.fs.s3a.attempts.maximum: 20
      spark.hadoop.fs.s3a.aws.credentials.provider: org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
      spark.hadoop.fs.s3a.block.size: 128M
      spark.hadoop.fs.s3a.connection.establish.timeout: 50000
      spark.hadoop.fs.s3a.connection.maximum: 50
      spark.hadoop.fs.s3a.connection.ssl.enabled: true
      spark.hadoop.fs.s3a.connection.timeout: 2000000
      spark.hadoop.fs.s3a.endpoint: s3.us-east-1.amazonaws.com
      spark.hadoop.fs.s3a.etag.checksum.enabled: false
      spark.hadoop.fs.s3a.experimental.input.fadvise: normal
      spark.hadoop.fs.s3a.fast.buffer.size: 1048576
      spark.hadoop.fs.s3a.fast.upload: true
      spark.hadoop.fs.s3a.fast.upload.active.blocks: 8
      spark.hadoop.fs.s3a.fast.upload.buffer: bytebuffer
      spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
      spark.hadoop.fs.s3a.list.version: 2
      spark.hadoop.fs.s3a.max.total.tasks: 30
      spark.hadoop.fs.s3a.metadatastore.authoritative: false
      spark.hadoop.fs.s3a.metadatastore.impl: org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
      spark.hadoop.fs.s3a.multiobjectdelete.enable: true
      spark.hadoop.fs.s3a.multipart.purge: true
      spark.hadoop.fs.s3a.multipart.purge.age: 86400
      spark.hadoop.fs.s3a.multipart.size: 32M
      spark.hadoop.fs.s3a.multipart.threshold: 64M
      spark.hadoop.fs.s3a.paging.maximum: 5000
      spark.hadoop.fs.s3a.readahead.range: 65536
      spark.hadoop.fs.s3a.retry.interval: 500ms
      spark.hadoop.fs.s3a.retry.limit: 20
      spark.hadoop.fs.s3a.retry.throttle.interval: 500ms
      spark.hadoop.fs.s3a.retry.throttle.limit: 20
      spark.hadoop.fs.s3a.s3.client.factory.impl: org.apache.hadoop.fs.s3a.DefaultS3ClientFactory
      spark.hadoop.fs.s3a.s3guard.ddb.background.sleep: 25
      spark.hadoop.fs.s3a.s3guard.ddb.max.retries: 20
      spark.hadoop.fs.s3a.s3guard.ddb.region: us-east-1
      spark.hadoop.fs.s3a.s3guard.ddb.table: s3-data-guard-master
      spark.hadoop.fs.s3a.s3guard.ddb.table.capacity.read: 500
      spark.hadoop.fs.s3a.s3guard.ddb.table.capacity.write: 100
      spark.hadoop.fs.s3a.s3guard.ddb.table.create: true
      spark.hadoop.fs.s3a.s3guard.ddb.throttle.retry.interval: 1s
      spark.hadoop.fs.s3a.socket.recv.buffer: 8388608
      spark.hadoop.fs.s3a.socket.send.buffer: 8388608
      spark.hadoop.fs.s3a.threads.keepalivetime: 60
      spark.hadoop.fs.s3a.threads.max: 50
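
      (For reference: Spark strips the spark.hadoop. prefix from these entries and passes the rest into the Hadoop Configuration that S3AFileSystem reads, so the same settings could equally be applied on a running session. A minimal sketch with a few of the values from above:)

      // Same effect as the corresponding spark.hadoop.fs.s3a.* entries above:
      // the settings land in the Hadoop Configuration used by S3AFileSystem.
      val hadoopConf = spark.sparkContext.hadoopConfiguration
      hadoopConf.set("fs.s3a.experimental.input.fadvise", "normal")
      hadoopConf.set("fs.s3a.readahead.range", "65536")
      hadoopConf.set("fs.s3a.connection.maximum", "50")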

       

      Not sure if you need it, but here is the rest of the Spark configuration as well:

      spark.app.id: spark-b97cb651f3f14c6cb3197079376a74c7
      spark.app.startTime: 1628476986471
      spark.blockManager.port: 0
      spark.broadcast.compress: true
      spark.checkpoint.compress: true
      spark.cleaner.periodicGC.interval: 2min
      spark.cleaner.referenceTracking: true
      spark.cleaner.referenceTracking.blocking: true
      spark.cleaner.referenceTracking.blocking.shuffle: true
      spark.cleaner.referenceTracking.cleanCheckpoints: true
      spark.cores.max: 5
      spark.driver.bindAddress: 28.132.124.86
      spark.driver.blockManager.port: 0
      spark.driver.cores: 5
      spark.driver.extraJavaOptions: -XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:InitiatingHeapOccupancyPercent=35 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p'
      spark.driver.host: xxx-xxxx-xxx-8be6777b28caacc7-driver-svc.default.svc
      spark.driver.maxResultSize: 10008m
      spark.driver.memory: 10008m
      spark.driver.memoryOverhead: 384m
      spark.driver.port: 7078
      spark.driver.rpc.io.clientThreads: 5
      spark.driver.rpc.io.serverThreads: 5
      spark.driver.rpc.netty.dispatcher.numThreads: 5
      spark.driver.shuffle.io.clientThreads: 5
      spark.driver.shuffle.io.serverThreads: 5
      spark.dynamicAllocation.cachedExecutorIdleTimeout: 600s
      spark.dynamicAllocation.enabled: false
      spark.dynamicAllocation.executorAllocationRatio: 1.0
      spark.dynamicAllocation.executorIdleTimeout: 60s
      spark.dynamicAllocation.initialExecutors: 1
      spark.dynamicAllocation.maxExecutors: 2147483647
      spark.dynamicAllocation.minExecutors: 1
      spark.dynamicAllocation.schedulerBacklogTimeout: 1s
      spark.dynamicAllocation.shuffleTracking.enabled: true
      spark.dynamicAllocation.shuffleTracking.timeout: 600s
      spark.dynamicAllocation.sustainedSchedulerBacklogTimeout: 1s
      spark.eventLog.dir: /opt/efs/spark
      spark.eventLog.enabled: true
      spark.eventLog.logStageExecutorMetrics: false
      spark.excludeOnFailure.enabled: true
      spark.executor.cores: 5
      spark.executor.extraJavaOptions: -XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:InitiatingHeapOccupancyPercent=35 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p'
      spark.executor.id: driver
      spark.executor.instances: 22
      spark.executor.logs.rolling.enableCompression: false
      spark.executor.logs.rolling.maxRetainedFiles: 5
      spark.executor.logs.rolling.maxSize: 10m
      spark.executor.logs.rolling.strategy: size
      spark.executor.memory: 10008m
      spark.executor.memoryOverhead: 384m
      spark.executor.processTreeMetrics.enabled: false
      spark.executor.rpc.io.clientThreads: 5
      spark.executor.rpc.io.serverThreads: 5
      spark.executor.rpc.netty.dispatcher.numThreads: 5
      spark.executor.shuffle.io.clientThreads: 5
      spark.executor.shuffle.io.serverThreads: 5
      spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version: 2
      spark.history.fs.driverlog.cleaner.enabled: true
      spark.history.fs.driverlog.cleaner.maxAge: 2d
      spark.history.fs.logDirectory: /opt/efs/spark
      spark.history.ui.port: 4040
      spark.io.compression.codec: org.apache.spark.io.SnappyCompressionCodec
      spark.io.compression.snappy.blockSize: 32k
      spark.jars: local:///opt/spark/examples/xxx.jar,local:///opt/spark/examples/yyy.jar
      spark.kryo.referenceTracking: false
      spark.kryo.registrationRequired: false
      spark.kryo.unsafe: true
      spark.kryoserializer.buffer: 8m
      spark.kryoserializer.buffer.max: 1024m
      spark.kubernetes.allocation.batch.delay: 1s
      spark.kubernetes.allocation.batch.size: 5
      spark.kubernetes.allocation.executor.timeout: 600s
      spark.kubernetes.appKillPodDeletionGracePeriod: 5s
      spark.kubernetes.authenticate.driver.serviceAccountName: spark
      spark.kubernetes.configMap.maxSize: 1572864
      spark.kubernetes.container.image: xxx/xxx:latest
      spark.kubernetes.container.image.pullPolicy: Always
      spark.kubernetes.driver.connectionTimeout: 10000
      spark.kubernetes.driver.limit.cores: 8
      spark.kubernetes.driver.master: https://asdkadalksjdas.gr7.us-east-1.eks.amazonaws.com:443
      spark.kubernetes.driver.pod.name: xxx-ddd-rrrr-8be6777b28caacc7-driver
      spark.kubernetes.driver.request.cores: 5
      spark.kubernetes.driver.requestTimeout: 10000
      spark.kubernetes.driver.volumes.persistentVolumeClaim.efs-pvc-mount-d.mount.path: /opt/efs/spark
      spark.kubernetes.driver.volumes.persistentVolumeClaim.efs-pvc-mount-d.mount.readOnly: false
      spark.kubernetes.driver.volumes.persistentVolumeClaim.efs-pvc-mount-d.mount.subPath: spark
      spark.kubernetes.driver.volumes.persistentVolumeClaim.efs-pvc-mount-d.options.claimName: efs-pvc
      spark.kubernetes.driver.volumes.persistentVolumeClaim.efs-pvc-mount-d.options.storageClass: manual
      spark.kubernetes.dynamicAllocation.deleteGracePeriod: 5s
      spark.kubernetes.executor.apiPollingInterval: 60s
      spark.kubernetes.executor.checkAllContainers: true
      spark.kubernetes.executor.deleteOnTermination: false
      spark.kubernetes.executor.eventProcessingInterval: 5s
      spark.kubernetes.executor.limit.cores: 8
      spark.kubernetes.executor.missingPodDetectDelta: 30s
      spark.kubernetes.executor.podNamePrefix: uscb-exec
      spark.kubernetes.executor.request.cores: 5
      spark.kubernetes.executor.volumes.persistentVolumeClaim.efs-pvc-mount-e.mount.path: /opt/efs/spark
      spark.kubernetes.executor.volumes.persistentVolumeClaim.efs-pvc-mount-e.mount.readOnly: false
      spark.kubernetes.executor.volumes.persistentVolumeClaim.efs-pvc-mount-e.mount.subPath: spark
      spark.kubernetes.executor.volumes.persistentVolumeClaim.efs-pvc-mount-e.options.claimName: efs-pvc
      spark.kubernetes.executor.volumes.persistentVolumeClaim.efs-pvc-mount-e.options.storageClass: manual
      spark.kubernetes.local.dirs.tmpfs: false
      spark.kubernetes.memoryOverheadFactor: 0.1
      spark.kubernetes.namespace: default
      spark.kubernetes.report.interval: 5s
      spark.kubernetes.resource.type: java
      spark.kubernetes.submission.connectionTimeout: 10000
      spark.kubernetes.submission.requestTimeout: 10000
      spark.kubernetes.submission.waitAppCompletion: true
      spark.kubernetes.submitInDriver: true
      spark.local.dir: /tmp
      spark.locality.wait: 3s
      spark.locality.wait.node: 3s
      spark.locality.wait.process: 3s
      spark.locality.wait.rack: 3s
      spark.master: k8s://https://NKSLODISNJSKSJSKKLS.gr7.us-east-1.eks.amazonaws.com:443
      spark.memory.fraction: 0.6
      spark.memory.offHeap.enabled: false
      spark.memory.storageFraction: 0.5
      spark.network.io.preferDirectBufs: true
      spark.network.maxRemoteBlockSizeFetchToMem: 200m
      spark.network.timeout: 120s
      spark.port.maxRetries: 16
      spark.rdd.compress: false
      spark.reducer.maxBlocksInFlightPerAddress: 2147483647
      spark.reducer.maxReqsInFlight: 2147483647
      spark.reducer.maxSizeInFlight: 48m
      spark.repl.local.jars: local:///opt/spark/examples/asdasdasd.jar
      spark.rpc.askTimeout: 120s
      spark.rpc.io.backLog: 256
      spark.rpc.io.clientThreads: 5
      spark.rpc.io.serverThreads: 5
      spark.rpc.lookupTimeout: 120s
      spark.rpc.message.maxSize: 128
      spark.rpc.netty.dispatcher.numThreads: 5
      spark.rpc.numRetries: 3
      spark.rpc.retry.wait: 3s
      spark.scheduler.excludeOnFailure.unschedulableTaskSetTimeout: 120s
      spark.scheduler.listenerbus.eventqueue.appStatus.capacity: 10000
      spark.scheduler.listenerbus.eventqueue.capacity: 10000
      spark.scheduler.listenerbus.eventqueue.eventLog.capacity: 10000
      spark.scheduler.listenerbus.eventqueue.executorManagement.capacity: 10000
      spark.scheduler.listenerbus.eventqueue.shared.capacity: 10000
      spark.scheduler.maxRegisteredResourcesWaitingTime: 30s
      spark.scheduler.minRegisteredResourcesRatio: 0.8
      spark.scheduler.mode: FIFO
      spark.scheduler.resource.profileMergeConflicts: false
      spark.scheduler.revive.interval: 1s
      spark.serializer: org.apache.spark.serializer.KryoSerializer
      spark.serializer.objectStreamReset: 100
      spark.shuffle.accurateBlockThreshold: 104857600
      spark.shuffle.compress: true
      spark.shuffle.file.buffer: 128m
      spark.shuffle.io.backLog: -1
      spark.shuffle.io.maxRetries: 3
      spark.shuffle.io.numConnectionsPerPeer: 4
      spark.shuffle.io.preferDirectBufs: true
      spark.shuffle.io.retryWait: 5s
      spark.shuffle.maxChunksBeingTransferred: 9223372036854775807
      spark.shuffle.registration.maxAttempts: 3
      spark.shuffle.registration.timeout: 200
      spark.shuffle.service.enabled: false
      spark.shuffle.service.index.cache.size: 100m
      spark.shuffle.service.port: 7737
      spark.shuffle.sort.bypassMergeThreshold: 200
      spark.shuffle.spill.compress: true
      spark.speculation: false
      spark.speculation.interval: 5s
      spark.speculation.multiplier: 1.5
      spark.speculation.quantile: 0.75
      spark.speculation.task.duration.threshold: 10s
      spark.sql.adaptive.coalescePartitions.enabled: true
      spark.sql.adaptive.enabled: true
      spark.sql.adaptive.fetchShuffleBlocksInBatch: true
      spark.sql.adaptive.forceApply: false
      spark.sql.adaptive.localShuffleReader.enabled: true
      spark.sql.adaptive.logLevel: debug
      spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin: 0
      spark.sql.adaptive.skewJoin.enabled: true
      spark.sql.adaptive.skewJoin.skewedPartitionFactor: 5
      spark.sql.adaptive.skewJoin.skewedPartitionThresholdInByte: 256MB
      spark.sql.addPartitionInBatch.size: 100
      spark.sql.analyzer.failAmbiguousSelfJoin: true
      spark.sql.analyzer.maxIterations: 100
      spark.sql.ansi.enabled: false
      spark.sql.autoBroadcastJoinThreshold: 10MB
      spark.sql.avro.filterPushdown.enabled: true
      spark.sql.broadcastExchange.maxThreadThreshold: 128
      spark.sql.bucketing.coalesceBucketsInJoin.enabled: false
      spark.sql.bucketing.coalesceBucketsInJoin.maxBucketRatio: 4
      spark.sql.cache.serializer: org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer
      spark.sql.cartesianProductExec.buffer.in.memory.threshold: 4096
      spark.sql.caseSensitive: false
      spark.sql.catalogImplementation: in-memory
      spark.sql.cbo.enabled: false
      spark.sql.cbo.joinReorder.card.weight: 0
      spark.sql.cbo.joinReorder.dp.star.filter: false
      spark.sql.cbo.joinReorder.dp.threshold: 12
      spark.sql.cbo.joinReorder.enabled: false
      spark.sql.cbo.planStats.enabled: false
      spark.sql.cbo.starJoinFTRatio: 0
      spark.sql.cbo.starSchemaDetection: false
      spark.sql.codegen.aggregate.fastHashMap.capacityBit: 16
      spark.sql.codegen.aggregate.map.twolevel.enabled: true
      spark.sql.codegen.aggregate.map.vectorized.enable: false
      spark.sql.codegen.aggregate.splitAggregateFunc.enabled: true
      spark.sql.codegen.cache.maxEntries: 100
      spark.sql.codegen.comments: false
      spark.sql.codegen.fallback: true
      spark.sql.codegen.hugeMethodLimit: 65535
      spark.sql.codegen.logging.maxLines: 1000
      spark.sql.codegen.maxFields: 100
      spark.sql.codegen.methodSplitThreshold: 1024
      spark.sql.codegen.splitConsumeFuncByOperator: true
      spark.sql.codegen.useIdInClassName: true
      spark.sql.codegen.wholeStage: true
      spark.sql.columnVector.offheap.enabled: false
      spark.sql.constraintPropagation.enabled: true
      spark.sql.crossJoin.enabled: true
      spark.sql.csv.filterPushdown.enabled: true
      spark.sql.csv.parser.columnPruning.enabled: true
      spark.sql.datetime.java8API.enabled: false
      spark.sql.debug: false
      spark.sql.debug.maxToStringFields: 25
      spark.sql.decimalOperations.allowPrecisionLoss: true
      spark.sql.event.truncate.length: 2147483647
      spark.sql.exchange.reuse: true
      spark.sql.execution.arrow.enabled: false
      spark.sql.execution.arrow.fallback.enabled: true
      spark.sql.execution.arrow.maxRecordsPerBatch: 10000
      spark.sql.execution.arrow.sparkr.enabled: false
      spark.sql.execution.broadcastHashJoin.outputPartitioningExpandLimit: 8
      spark.sql.execution.fastFailOnFileFormatOutput: false
      spark.sql.execution.pandas.convertToArrowArraySafely: false
      spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled: false
      spark.sql.execution.rangeExchange.sampleSizePerPartition: 100
      spark.sql.execution.removeRedundantProjects: true
      spark.sql.execution.removeRedundantSorts: true
      spark.sql.execution.reuseSubquery: true
      spark.sql.execution.sortBeforeRepartition: true
      spark.sql.execution.useObjectHashAggregateExec: true
      spark.sql.files.ignoreCorruptFiles: false
      spark.sql.files.ignoreMissingFiles: false
      spark.sql.files.maxPartitionBytes: 128MB
      spark.sql.files.maxRecordsPerFile: 0
      spark.sql.filesourceTableRelationCacheSize: 1000
      spark.sql.function.concatBinaryAsString: false
      spark.sql.function.eltOutputAsString: false
      spark.sql.globalTempDatabase: global_temp
      spark.sql.groupByAliases: true
      spark.sql.groupByOrdinal: true
      spark.sql.hive.advancedPartitionPredicatePushdown.enabled: true
      spark.sql.hive.convertCTAS: false
      spark.sql.hive.gatherFastStats: true
      spark.sql.hive.manageFilesourcePartitions: true
      spark.sql.hive.metastorePartitionPruning: true
      spark.sql.hive.metastorePartitionPruningInSetThreshold: 1000
      spark.sql.hive.verifyPartitionPath: false
      spark.sql.inMemoryColumnarStorage.batchSize: 10000
      spark.sql.inMemoryColumnarStorage.compressed: true
      spark.sql.inMemoryColumnarStorage.enableVectorizedReader: true
      spark.sql.inMemoryColumnarStorage.partitionPruning: true
      spark.sql.inMemoryTableScanStatistics.enable: false
      spark.sql.join.preferSortMergeJoin: true
      spark.sql.json.filterPushdown.enabled: true
      spark.sql.jsonGenerator.ignoreNullFields: true
      spark.sql.legacy.addSingleFileInAddFile: false
      spark.sql.legacy.allowHashOnMapType: false
      spark.sql.legacy.allowNegativeScaleOfDecimal: false
      spark.sql.legacy.allowParameterlessCount: false
      spark.sql.legacy.allowUntypedScalaUDF: false
      spark.sql.legacy.bucketedTableScan.outputOrdering: false
      spark.sql.legacy.castComplexTypesToString.enabled: false
      spark.sql.legacy.charVarcharAsString: false
      spark.sql.legacy.createEmptyCollectionUsingStringType: false
      spark.sql.legacy.createHiveTableByDefault: true
      spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue: false
      spark.sql.legacy.doLooseUpcast: false
      spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName: true
      spark.sql.legacy.exponentLiteralAsDecimal.enabled: false
      spark.sql.legacy.extraOptionsBehavior.enabled: false
      spark.sql.legacy.followThreeValuedLogicInArrayExists: true
      spark.sql.legacy.fromDayTimeString.enabled: false
      spark.sql.legacy.integerGroupingId: false
      spark.sql.legacy.json.allowEmptyString.enabled: false
      spark.sql.legacy.keepCommandOutputSchema: false
      spark.sql.legacy.literal.pickMinimumPrecision: true
      spark.sql.legacy.notReserveProperties: false
      spark.sql.legacy.parseNullPartitionSpecAsStringLiteral: false
      spark.sql.legacy.parser.havingWithoutGroupByAsWhere: false
      spark.sql.legacy.pathOptionBehavior.enabled: false
      spark.sql.legacy.sessionInitWithConfigDefaults: false
      spark.sql.legacy.setCommandRejectsSparkCoreConfs: true
      spark.sql.legacy.setopsPrecedence.enabled: false
      spark.sql.legacy.sizeOfNull: true
      spark.sql.legacy.statisticalAggregate: false
      spark.sql.legacy.storeAnalyzedPlanForView: false
      spark.sql.legacy.typeCoercion.datetimeToString.enabled: false
      spark.sql.legacy.useCurrentConfigsForView: false
      spark.sql.limit.scaleUpFactor: 4
      spark.sql.maxMetadataStringLength: 100
      spark.sql.metadataCacheTTLSeconds: -1
      spark.sql.objectHashAggregate.sortBased.fallbackThreshold: 128
      spark.sql.optimizeNullAwareAntiJoin: true
      spark.sql.optimizer.disableHints: false
      spark.sql.optimizer.dynamicPartitionPruning.enabled: true
      spark.sql.optimizer.dynamicPartitionPruning.fallbackFilterRatio: 0
      spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly: true
      spark.sql.optimizer.dynamicPartitionPruning.useStats: true
      spark.sql.optimizer.enableJsonExpressionOptimization: true
      spark.sql.optimizer.expression.nestedPruning.enabled: true
      spark.sql.optimizer.inSetConversionThreshold: 10
      spark.sql.optimizer.inSetSwitchThreshold: 400
      spark.sql.optimizer.maxIterations: 100
      spark.sql.optimizer.metadataOnly: false
      spark.sql.optimizer.nestedPredicatePushdown.supportedFileSources: parquet,orc
      spark.sql.optimizer.nestedSchemaPruning.enabled: true
      spark.sql.optimizer.replaceExceptWithFilter: true
      spark.sql.optimizer.serializer.nestedSchemaPruning.enabled: true
      spark.sql.orderByOrdinal: true
      spark.sql.parquet.binaryAsString: false
      spark.sql.parquet.columnarReaderBatchSize: 4096
      spark.sql.parquet.compression.codec: snappy
      spark.sql.parquet.enableVectorizedReader: true
      spark.sql.parquet.filterPushdown: true
      spark.sql.parquet.filterPushdown.date: true
      spark.sql.parquet.filterPushdown.decimal: true
      spark.sql.parquet.filterPushdown.string.startsWith: true
      spark.sql.parquet.filterPushdown.timestamp: true
      spark.sql.parquet.int96AsTimestamp: true
      spark.sql.parquet.int96TimestampConversion: false
      spark.sql.parquet.mergeSchema: false
      spark.sql.parquet.output.committer.class: org.apache.parquet.hadoop.ParquetOutputCommitter
      spark.sql.parquet.pushdown.inFilterThreshold: 10
      spark.sql.parquet.recordLevelFilter.enabled: false
      spark.sql.parquet.respectSummaryFiles: false
      spark.sql.parquet.writeLegacyFormat: false
      spark.sql.parser.escapedStringLiterals: false
      spark.sql.parser.quotedRegexColumnNames: false
      spark.sql.pivotMaxValues: 10000
      spark.sql.planChangeLog.level: trace
      spark.sql.pyspark.jvmStacktrace.enabled: false
      spark.sql.repl.eagerEval.enabled: false
      spark.sql.repl.eagerEval.maxNumRows: 20
      spark.sql.repl.eagerEval.truncate: 20
      spark.sql.retainGroupColumns: true
      spark.sql.runSQLOnFiles: true
      spark.sql.scriptTransformation.exitTimeoutInSeconds: 5s
      spark.sql.selfJoinAutoResolveAmbiguity: true
      spark.sql.shuffle.partitions: 200
      spark.sql.sort.enableRadixSort: true
      spark.sql.sources.binaryFile.maxLength: 2147483647
      spark.sql.sources.bucketing.autoBucketedScan.enabled: true
      spark.sql.sources.bucketing.enabled: true
      spark.sql.sources.bucketing.maxBuckets: 100000
      spark.sql.sources.commitProtocolClass: org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
      spark.sql.sources.default: parquet
      spark.sql.sources.fileCompressionFactor: 1
      spark.sql.sources.ignoreDataLocality: false
      spark.sql.sources.parallelPartitionDiscovery.parallelism: 10000
      spark.sql.sources.parallelPartitionDiscovery.threshold: 32
      spark.sql.sources.partitionColumnTypeInference.enabled: true
      spark.sql.sources.validatePartitionColumns: true
      spark.sql.statistics.fallBackToHdfs: false
      spark.sql.statistics.histogram.enabled: false
      spark.sql.statistics.histogram.numBins: 254
      spark.sql.statistics.ndv.maxError: 0
      spark.sql.statistics.parallelFileListingInStatsComputation.enabled: true
      spark.sql.statistics.percentile.accuracy: 10000
      spark.sql.statistics.size.autoUpdate.enabled: false
      spark.sql.streaming.continuous.epochBacklogQueueSize: 10000
      spark.sql.streaming.continuous.executorPollIntervalMs: 100
      spark.sql.streaming.continuous.executorQueueSize: 1024
      spark.sql.streaming.metricsEnabled: true
      spark.sql.subexpressionElimination.cache.maxEntries: 100
      spark.sql.subexpressionElimination.enabled: true
      spark.sql.subquery.maxThreadThreshold: 16
      spark.sql.thriftServer.incrementalCollect: false
      spark.sql.thriftServer.queryTimeout: 20s
      spark.sql.thriftserver.ui.retainedSessions: 200
      spark.sql.thriftserver.ui.retainedStatements: 200
      spark.sql.truncateTable.ignorePermissionAcl.enabled: false
      spark.sql.ui.explainMode: formatted
      spark.sql.ui.retainedExecutions: 500
      spark.sql.variable.substitute: true
      spark.sql.view.maxNestedViewDepth: 100
      spark.sql.warehouse.dir: file:/opt/spark/work-dir/spark-warehouse
      spark.sql.windowExec.buffer.in.memory.threshold: 4096
      spark.stage.maxConsecutiveAttempts: 4
      spark.storage.replication.proactive: true
      spark.submit.deployMode: cluster
      spark.submit.pyFiles: 
      spark.task.cpus: 1
      spark.task.maxFailures: 4
      spark.task.reaper.enabled: true
      spark.task.reaper.killTimeout: -1
      spark.task.reaper.pollingInterval: 20s
      spark.task.reaper.threadDump: true

       

      Any quick help will be greatly appreciated.

People

    • Assignee: Unassigned
    • Reporter: Abhinav Kumar (abhinavofficial)
    • Votes: 0
    • Watchers: 2

Dates

    • Created:
    • Updated:
    • Resolved: