Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Works for Me
- Affects Version/s: 3.2.0
- Fix Version/s: None
- Component/s: None
Description
I am trying to read parquet data stored in S3 via Spark on EKS, using hadoop-aws 3.2.0. There are 112 partitions (each around 130 MB) for a particular month.
The data is being read, but very, very slowly. I just keep seeing the log lines below, while only a very small amount of data actually gets fetched.
21/08/09 05:07:05 DEBUG Executor task launch worker for task 60.0 in stage 3.0 (TID 63) Invoker: Values passed - text: read on s3a://uat1-prp-rftu-25-045552507264-us-east-1/xxxx/yyyy/zzzz/table_fact_mtd_c/ptn_val_txt=20200229/part-00012-32dbfb10-b43c-4066-a70e-d3575ea530d5-c000.snappy.parquet, idempotent: true, Retried: org.apache.hadoop.fs.s3a.S3AFileSystem$$Lambda$1199/2130521693@5259f9d0, Operation:org.apache.hadoop.fs.s3a.Invoker$$Lambda$1239/37396157@454de3d3
21/08/09 05:07:05 DEBUG Executor task launch worker for task 60.0 in stage 3.0 (TID 63) Invoker: retryUntranslated begin
21/08/09 05:07:05 DEBUG Executor task launch worker for task 60.0 in stage 3.0 (TID 63) Invoker: Values passed - text: lazySeek on s3a://uat1-prp-rftu-25-045552507264-us-east-1/xxxx/yyyy/zzzz/table_fact_mtd_c/ptn_val_txt=20200229/part-00012-32dbfb10-b43c-4066-a70e-d3575ea530d5-c000.snappy.parquet, idempotent: true, Retried: org.apache.hadoop.fs.s3a.S3AFileSystem$$Lambda$1199/2130521693@5259f9d0, Operation:org.apache.hadoop.fs.s3a.Invoker$$Lambda$1239/37396157@3776ef6c
21/08/09 05:07:05 DEBUG Executor task launch worker for task 60.0 in stage 3.0 (TID 63) Invoker: retryUntranslated begin
21/08/09 05:07:05 DEBUG Executor task launch worker for task 60.0 in stage 3.0 (TID 63) Invoker: Values passed - text: read on s3a://uat1-prp-rftu-25-045552507264-us-east-1/xxxx/yyyy/zzzz/table_fact_mtd_c/ptn_val_txt=20200229/part-00012-32dbfb10-b43c-4066-a70e-d3575ea530d5-c000.snappy.parquet, idempotent: true, Retried: org.apache.hadoop.fs.s3a.S3AFileSystem$$Lambda$1199/2130521693@5259f9d0, Operation:org.apache.hadoop.fs.s3a.Invoker$$Lambda$1239/37396157@3602676a
21/08/09 05:07:05 DEBUG Executor task launch worker for task 60.0 in stage 3.0 (TID 63) Invoker: retryUntranslated begin
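For context, the read is essentially the following (bucket and table path copied from the DEBUG log above; the partition filter value is illustrative, and `spark` is an existing SparkSession):
import org.apache.spark.sql.DataFrame
// Read the partitioned parquet table from S3 through the s3a:// connector.
val df: DataFrame = spark.read.parquet(
  "s3a://uat1-prp-rftu-25-045552507264-us-east-1/xxxx/yyyy/zzzz/table_fact_mtd_c/")
// Restrict to one month; this resolves to the 112 partition files (~130 MB each) mentioned above.
val monthDf = df.filter(df("ptn_val_txt") === "20200229")
monthDf.count()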
Here is the Spark configuration for hadoop-aws:
spark.hadoop.fs.s3a.assumed.role.sts.endpoint: https://sts.amazonaws.com |
spark.hadoop.fs.s3a.assumed.role.sts.endpoint.region: us-east-1 |
spark.hadoop.fs.s3a.attempts.maximum: 20 |
spark.hadoop.fs.s3a.aws.credentials.provider: org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider |
spark.hadoop.fs.s3a.block.size: 128M |
spark.hadoop.fs.s3a.connection.establish.timeout: 50000 |
spark.hadoop.fs.s3a.connection.maximum: 50 |
spark.hadoop.fs.s3a.connection.ssl.enabled: true |
spark.hadoop.fs.s3a.connection.timeout: 2000000 |
spark.hadoop.fs.s3a.endpoint: s3.us-east-1.amazonaws.com |
spark.hadoop.fs.s3a.etag.checksum.enabled: false |
spark.hadoop.fs.s3a.experimental.input.fadvise: normal |
spark.hadoop.fs.s3a.fast.buffer.size: 1048576 |
spark.hadoop.fs.s3a.fast.upload: true |
spark.hadoop.fs.s3a.fast.upload.active.blocks: 8 |
spark.hadoop.fs.s3a.fast.upload.buffer: bytebuffer |
spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem |
spark.hadoop.fs.s3a.list.version: 2 |
spark.hadoop.fs.s3a.max.total.tasks: 30 |
spark.hadoop.fs.s3a.metadatastore.authoritative: false |
spark.hadoop.fs.s3a.metadatastore.impl: org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore |
spark.hadoop.fs.s3a.multiobjectdelete.enable: true |
spark.hadoop.fs.s3a.multipart.purge: true |
spark.hadoop.fs.s3a.multipart.purge.age: 86400 |
spark.hadoop.fs.s3a.multipart.size: 32M |
spark.hadoop.fs.s3a.multipart.threshold: 64M |
spark.hadoop.fs.s3a.paging.maximum: 5000 |
spark.hadoop.fs.s3a.readahead.range: 65536 |
spark.hadoop.fs.s3a.retry.interval: 500ms |
spark.hadoop.fs.s3a.retry.limit: 20 |
spark.hadoop.fs.s3a.retry.throttle.interval: 500ms |
spark.hadoop.fs.s3a.retry.throttle.limit: 20 |
spark.hadoop.fs.s3a.s3.client.factory.impl: org.apache.hadoop.fs.s3a.DefaultS3ClientFactory |
spark.hadoop.fs.s3a.s3guard.ddb.background.sleep: 25 |
spark.hadoop.fs.s3a.s3guard.ddb.max.retries: 20 |
spark.hadoop.fs.s3a.s3guard.ddb.region: us-east-1 |
spark.hadoop.fs.s3a.s3guard.ddb.table: s3-data-guard-master |
spark.hadoop.fs.s3a.s3guard.ddb.table.capacity.read: 500 |
spark.hadoop.fs.s3a.s3guard.ddb.table.capacity.write: 100 |
spark.hadoop.fs.s3a.s3guard.ddb.table.create: true |
spark.hadoop.fs.s3a.s3guard.ddb.throttle.retry.interval: 1s |
spark.hadoop.fs.s3a.socket.recv.buffer: 8388608 |
spark.hadoop.fs.s3a.socket.send.buffer: 8388608 |
spark.hadoop.fs.s3a.threads.keepalivetime: 60 |
spark.hadoop.fs.s3a.threads.max: 50 |
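For reference, all of these spark.hadoop.fs.s3a.* properties are passed straight through to the S3A filesystem. In this job they are supplied via the Spark configuration shown above; a minimal sketch of setting a few of them programmatically instead (values copied from the list, for illustration only):
import org.apache.spark.sql.SparkSession
// Illustrative only: the same fs.s3a.* settings applied on the SparkSession builder.
val spark = SparkSession.builder()
  .appName("s3a-read-example")
  .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
  .config("spark.hadoop.fs.s3a.endpoint", "s3.us-east-1.amazonaws.com")
  .config("spark.hadoop.fs.s3a.aws.credentials.provider",
    "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
  .config("spark.hadoop.fs.s3a.experimental.input.fadvise", "normal") // read policy in effect for these reads
  .config("spark.hadoop.fs.s3a.readahead.range", "65536")
  .getOrCreate()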
Not sure if you need it, but here is the rest of the Spark configuration as well:
spark.app.id: spark-b97cb651f3f14c6cb3197079376a74c7 |
spark.app.startTime: 1628476986471 |
spark.blockManager.port: 0 |
spark.broadcast.compress: true |
spark.checkpoint.compress: true |
spark.cleaner.periodicGC.interval: 2min |
spark.cleaner.referenceTracking: true |
spark.cleaner.referenceTracking.blocking: true |
spark.cleaner.referenceTracking.blocking.shuffle: true |
spark.cleaner.referenceTracking.cleanCheckpoints: true |
spark.cores.max: 5 |
spark.driver.bindAddress: 28.132.124.86 |
spark.driver.blockManager.port: 0 |
spark.driver.cores: 5 |
spark.driver.extraJavaOptions: -XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:InitiatingHeapOccupancyPercent=35 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p' |
spark.driver.host: xxx-xxxx-xxx-8be6777b28caacc7-driver-svc.default.svc |
spark.driver.maxResultSize: 10008m |
spark.driver.memory: 10008m |
spark.driver.memoryOverhead: 384m |
spark.driver.port: 7078 |
spark.driver.rpc.io.clientThreads: 5 |
spark.driver.rpc.io.serverThreads: 5 |
spark.driver.rpc.netty.dispatcher.numThreads: 5 |
spark.driver.shuffle.io.clientThreads: 5 |
spark.driver.shuffle.io.serverThreads: 5 |
spark.dynamicAllocation.cachedExecutorIdleTimeout: 600s |
spark.dynamicAllocation.enabled: false |
spark.dynamicAllocation.executorAllocationRatio: 1.0 |
spark.dynamicAllocation.executorIdleTimeout: 60s |
spark.dynamicAllocation.initialExecutors: 1 |
spark.dynamicAllocation.maxExecutors: 2147483647 |
spark.dynamicAllocation.minExecutors: 1 |
spark.dynamicAllocation.schedulerBacklogTimeout: 1s |
spark.dynamicAllocation.shuffleTracking.enabled: true |
spark.dynamicAllocation.shuffleTracking.timeout: 600s |
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout: 1s |
spark.eventLog.dir: /opt/efs/spark |
spark.eventLog.enabled: true |
spark.eventLog.logStageExecutorMetrics: false |
spark.excludeOnFailure.enabled: true |
spark.executor.cores: 5 |
spark.executor.extraJavaOptions: -XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:InitiatingHeapOccupancyPercent=35 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p' |
spark.executor.id: driver |
spark.executor.instances: 22 |
spark.executor.logs.rolling.enableCompression: false |
spark.executor.logs.rolling.maxRetainedFiles: 5 |
spark.executor.logs.rolling.maxSize: 10m |
spark.executor.logs.rolling.strategy: size |
spark.executor.memory: 10008m |
spark.executor.memoryOverhead: 384m |
spark.executor.processTreeMetrics.enabled: false |
spark.executor.rpc.io.clientThreads: 5 |
spark.executor.rpc.io.serverThreads: 5 |
spark.executor.rpc.netty.dispatcher.numThreads: 5 |
spark.executor.shuffle.io.clientThreads: 5 |
spark.executor.shuffle.io.serverThreads: 5 |
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version: 2 |
spark.history.fs.driverlog.cleaner.enabled: true |
spark.history.fs.driverlog.cleaner.maxAge: 2d |
spark.history.fs.logDirectory: /opt/efs/spark |
spark.history.ui.port: 4040 |
spark.io.compression.codec: org.apache.spark.io.SnappyCompressionCodec |
spark.io.compression.snappy.blockSize: 32k |
spark.jars: local:///opt/spark/examples/xxx.jar,local:///opt/spark/examples/yyy.jar |
spark.kryo.referenceTracking: false |
spark.kryo.registrationRequired: false |
spark.kryo.unsafe: true |
spark.kryoserializer.buffer: 8m |
spark.kryoserializer.buffer.max: 1024m |
spark.kubernetes.allocation.batch.delay: 1s |
spark.kubernetes.allocation.batch.size: 5 |
spark.kubernetes.allocation.executor.timeout: 600s |
spark.kubernetes.appKillPodDeletionGracePeriod: 5s |
spark.kubernetes.authenticate.driver.serviceAccountName: spark |
spark.kubernetes.configMap.maxSize: 1572864 |
spark.kubernetes.container.image: xxx/xxx:latest |
spark.kubernetes.container.image.pullPolicy: Always |
spark.kubernetes.driver.connectionTimeout: 10000 |
spark.kubernetes.driver.limit.cores: 8 |
spark.kubernetes.driver.master: https://asdkadalksjdas.gr7.us-east-1.eks.amazonaws.com:443 |
spark.kubernetes.driver.pod.name: xxx-ddd-rrrr-8be6777b28caacc7-driver |
spark.kubernetes.driver.request.cores: 5 |
spark.kubernetes.driver.requestTimeout: 10000 |
spark.kubernetes.driver.volumes.persistentVolumeClaim.efs-pvc-mount-d.mount.path: /opt/efs/spark |
spark.kubernetes.driver.volumes.persistentVolumeClaim.efs-pvc-mount-d.mount.readOnly: false |
spark.kubernetes.driver.volumes.persistentVolumeClaim.efs-pvc-mount-d.mount.subPath: spark |
spark.kubernetes.driver.volumes.persistentVolumeClaim.efs-pvc-mount-d.options.claimName: efs-pvc |
spark.kubernetes.driver.volumes.persistentVolumeClaim.efs-pvc-mount-d.options.storageClass: manual |
spark.kubernetes.dynamicAllocation.deleteGracePeriod: 5s |
spark.kubernetes.executor.apiPollingInterval: 60s |
spark.kubernetes.executor.checkAllContainers: true |
spark.kubernetes.executor.deleteOnTermination: false |
spark.kubernetes.executor.eventProcessingInterval: 5s |
spark.kubernetes.executor.limit.cores: 8 |
spark.kubernetes.executor.missingPodDetectDelta: 30s |
spark.kubernetes.executor.podNamePrefix: uscb-exec |
spark.kubernetes.executor.request.cores: 5 |
spark.kubernetes.executor.volumes.persistentVolumeClaim.efs-pvc-mount-e.mount.path: /opt/efs/spark |
spark.kubernetes.executor.volumes.persistentVolumeClaim.efs-pvc-mount-e.mount.readOnly: false |
spark.kubernetes.executor.volumes.persistentVolumeClaim.efs-pvc-mount-e.mount.subPath: spark |
spark.kubernetes.executor.volumes.persistentVolumeClaim.efs-pvc-mount-e.options.claimName: efs-pvc |
spark.kubernetes.executor.volumes.persistentVolumeClaim.efs-pvc-mount-e.options.storageClass: manual |
spark.kubernetes.local.dirs.tmpfs: false |
spark.kubernetes.memoryOverheadFactor: 0.1 |
spark.kubernetes.namespace: default |
spark.kubernetes.report.interval: 5s |
spark.kubernetes.resource.type: java |
spark.kubernetes.submission.connectionTimeout: 10000 |
spark.kubernetes.submission.requestTimeout: 10000 |
spark.kubernetes.submission.waitAppCompletion: true |
spark.kubernetes.submitInDriver: true |
spark.local.dir: /tmp |
spark.locality.wait: 3s |
spark.locality.wait.node: 3s |
spark.locality.wait.process: 3s |
spark.locality.wait.rack: 3s |
spark.master: k8s://https://NKSLODISNJSKSJSKKLS.gr7.us-east-1.eks.amazonaws.com:443 |
spark.memory.fraction: 0.6 |
spark.memory.offHeap.enabled: false |
spark.memory.storageFraction: 0.5 |
spark.network.io.preferDirectBufs: true |
spark.network.maxRemoteBlockSizeFetchToMem: 200m |
spark.network.timeout: 120s |
spark.port.maxRetries: 16 |
spark.rdd.compress: false |
spark.reducer.maxBlocksInFlightPerAddress: 2147483647 |
spark.reducer.maxReqsInFlight: 2147483647 |
spark.reducer.maxSizeInFlight: 48m |
spark.repl.local.jars: local:///opt/spark/examples/asdasdasd.jar |
spark.rpc.askTimeout: 120s |
spark.rpc.io.backLog: 256 |
spark.rpc.io.clientThreads: 5 |
spark.rpc.io.serverThreads: 5 |
spark.rpc.lookupTimeout: 120s |
spark.rpc.message.maxSize: 128 |
spark.rpc.netty.dispatcher.numThreads: 5 |
spark.rpc.numRetries: 3 |
spark.rpc.retry.wait: 3s |
spark.scheduler.excludeOnFailure.unschedulableTaskSetTimeout: 120s |
spark.scheduler.listenerbus.eventqueue.appStatus.capacity: 10000 |
spark.scheduler.listenerbus.eventqueue.capacity: 10000 |
spark.scheduler.listenerbus.eventqueue.eventLog.capacity: 10000 |
spark.scheduler.listenerbus.eventqueue.executorManagement.capacity: 10000 |
spark.scheduler.listenerbus.eventqueue.shared.capacity: 10000 |
spark.scheduler.maxRegisteredResourcesWaitingTime: 30s |
spark.scheduler.minRegisteredResourcesRatio: 0.8 |
spark.scheduler.mode: FIFO |
spark.scheduler.resource.profileMergeConflicts: false |
spark.scheduler.revive.interval: 1s |
spark.serializer: org.apache.spark.serializer.KryoSerializer |
spark.serializer.objectStreamReset: 100 |
spark.shuffle.accurateBlockThreshold: 104857600 |
spark.shuffle.compress: true |
spark.shuffle.file.buffer: 128m |
spark.shuffle.io.backLog: -1 |
spark.shuffle.io.maxRetries: 3 |
spark.shuffle.io.numConnectionsPerPeer: 4 |
spark.shuffle.io.preferDirectBufs: true |
spark.shuffle.io.retryWait: 5s |
spark.shuffle.maxChunksBeingTransferred: 9223372036854775807 |
spark.shuffle.registration.maxAttempts: 3 |
spark.shuffle.registration.timeout: 200 |
spark.shuffle.service.enabled: false |
spark.shuffle.service.index.cache.size: 100m |
spark.shuffle.service.port: 7737 |
spark.shuffle.sort.bypassMergeThreshold: 200 |
spark.shuffle.spill.compress: true |
spark.speculation: false |
spark.speculation.interval: 5s |
spark.speculation.multiplier: 1.5 |
spark.speculation.quantile: 0.75 |
spark.speculation.task.duration.threshold: 10s |
spark.sql.adaptive.coalescePartitions.enabled: true |
spark.sql.adaptive.enabled: true |
spark.sql.adaptive.fetchShuffleBlocksInBatch: true |
spark.sql.adaptive.forceApply: false |
spark.sql.adaptive.localShuffleReader.enabled: true |
spark.sql.adaptive.logLevel: debug |
spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin: 0 |
spark.sql.adaptive.skewJoin.enabled: true |
spark.sql.adaptive.skewJoin.skewedPartitionFactor: 5 |
spark.sql.adaptive.skewJoin.skewedPartitionThresholdInByte: 256MB |
spark.sql.addPartitionInBatch.size: 100 |
spark.sql.analyzer.failAmbiguousSelfJoin: true |
spark.sql.analyzer.maxIterations: 100 |
spark.sql.ansi.enabled: false |
spark.sql.autoBroadcastJoinThreshold: 10MB |
spark.sql.avro.filterPushdown.enabled: true |
spark.sql.broadcastExchange.maxThreadThreshold: 128 |
spark.sql.bucketing.coalesceBucketsInJoin.enabled: false |
spark.sql.bucketing.coalesceBucketsInJoin.maxBucketRatio: 4 |
spark.sql.cache.serializer: org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer |
spark.sql.cartesianProductExec.buffer.in.memory.threshold: 4096 |
spark.sql.caseSensitive: false |
spark.sql.catalogImplementation: in-memory |
spark.sql.cbo.enabled: false |
spark.sql.cbo.joinReorder.card.weight: 0 |
spark.sql.cbo.joinReorder.dp.star.filter: false |
spark.sql.cbo.joinReorder.dp.threshold: 12 |
spark.sql.cbo.joinReorder.enabled: false |
spark.sql.cbo.planStats.enabled: false |
spark.sql.cbo.starJoinFTRatio: 0 |
spark.sql.cbo.starSchemaDetection: false |
spark.sql.codegen.aggregate.fastHashMap.capacityBit: 16 |
spark.sql.codegen.aggregate.map.twolevel.enabled: true |
spark.sql.codegen.aggregate.map.vectorized.enable: false |
spark.sql.codegen.aggregate.splitAggregateFunc.enabled: true |
spark.sql.codegen.cache.maxEntries: 100 |
spark.sql.codegen.comments: false |
spark.sql.codegen.fallback: true |
spark.sql.codegen.hugeMethodLimit: 65535 |
spark.sql.codegen.logging.maxLines: 1000 |
spark.sql.codegen.maxFields: 100 |
spark.sql.codegen.methodSplitThreshold: 1024 |
spark.sql.codegen.splitConsumeFuncByOperator: true |
spark.sql.codegen.useIdInClassName: true |
spark.sql.codegen.wholeStage: true |
spark.sql.columnVector.offheap.enabled: false |
spark.sql.constraintPropagation.enabled: true |
spark.sql.crossJoin.enabled: true |
spark.sql.csv.filterPushdown.enabled: true |
spark.sql.csv.parser.columnPruning.enabled: true |
spark.sql.datetime.java8API.enabled: false |
spark.sql.debug: false |
spark.sql.debug.maxToStringFields: 25 |
spark.sql.decimalOperations.allowPrecisionLoss: true |
spark.sql.event.truncate.length: 2147483647 |
spark.sql.exchange.reuse: true |
spark.sql.execution.arrow.enabled: false |
spark.sql.execution.arrow.fallback.enabled: true |
spark.sql.execution.arrow.maxRecordsPerBatch: 10000 |
spark.sql.execution.arrow.sparkr.enabled: false |
spark.sql.execution.broadcastHashJoin.outputPartitioningExpandLimit: 8 |
spark.sql.execution.fastFailOnFileFormatOutput: false |
spark.sql.execution.pandas.convertToArrowArraySafely: false |
spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled: false |
spark.sql.execution.rangeExchange.sampleSizePerPartition: 100 |
spark.sql.execution.removeRedundantProjects: true |
spark.sql.execution.removeRedundantSorts: true |
spark.sql.execution.reuseSubquery: true |
spark.sql.execution.sortBeforeRepartition: true |
spark.sql.execution.useObjectHashAggregateExec: true |
spark.sql.files.ignoreCorruptFiles: false |
spark.sql.files.ignoreMissingFiles: false |
spark.sql.files.maxPartitionBytes: 128MB |
spark.sql.files.maxRecordsPerFile: 0 |
spark.sql.filesourceTableRelationCacheSize: 1000 |
spark.sql.function.concatBinaryAsString: false |
spark.sql.function.eltOutputAsString: false |
spark.sql.globalTempDatabase: global_temp |
spark.sql.groupByAliases: true |
spark.sql.groupByOrdinal: true |
spark.sql.hive.advancedPartitionPredicatePushdown.enabled: true |
spark.sql.hive.convertCTAS: false |
spark.sql.hive.gatherFastStats: true |
spark.sql.hive.manageFilesourcePartitions: true |
spark.sql.hive.metastorePartitionPruning: true |
spark.sql.hive.metastorePartitionPruningInSetThreshold: 1000 |
spark.sql.hive.verifyPartitionPath: false |
spark.sql.inMemoryColumnarStorage.batchSize: 10000 |
spark.sql.inMemoryColumnarStorage.compressed: true |
spark.sql.inMemoryColumnarStorage.enableVectorizedReader: true |
spark.sql.inMemoryColumnarStorage.partitionPruning: true |
spark.sql.inMemoryTableScanStatistics.enable: false |
spark.sql.join.preferSortMergeJoin: true |
spark.sql.json.filterPushdown.enabled: true |
spark.sql.jsonGenerator.ignoreNullFields: true |
spark.sql.legacy.addSingleFileInAddFile: false |
spark.sql.legacy.allowHashOnMapType: false |
spark.sql.legacy.allowNegativeScaleOfDecimal: false |
spark.sql.legacy.allowParameterlessCount: false |
spark.sql.legacy.allowUntypedScalaUDF: false |
spark.sql.legacy.bucketedTableScan.outputOrdering: false |
spark.sql.legacy.castComplexTypesToString.enabled: false |
spark.sql.legacy.charVarcharAsString: false |
spark.sql.legacy.createEmptyCollectionUsingStringType: false |
spark.sql.legacy.createHiveTableByDefault: true |
spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue: false |
spark.sql.legacy.doLooseUpcast: false |
spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName: true |
spark.sql.legacy.exponentLiteralAsDecimal.enabled: false |
spark.sql.legacy.extraOptionsBehavior.enabled: false |
spark.sql.legacy.followThreeValuedLogicInArrayExists: true |
spark.sql.legacy.fromDayTimeString.enabled: false |
spark.sql.legacy.integerGroupingId: false |
spark.sql.legacy.json.allowEmptyString.enabled: false |
spark.sql.legacy.keepCommandOutputSchema: false |
spark.sql.legacy.literal.pickMinimumPrecision: true |
spark.sql.legacy.notReserveProperties: false |
spark.sql.legacy.parseNullPartitionSpecAsStringLiteral: false |
spark.sql.legacy.parser.havingWithoutGroupByAsWhere: false |
spark.sql.legacy.pathOptionBehavior.enabled: false |
spark.sql.legacy.sessionInitWithConfigDefaults: false |
spark.sql.legacy.setCommandRejectsSparkCoreConfs: true |
spark.sql.legacy.setopsPrecedence.enabled: false |
spark.sql.legacy.sizeOfNull: true |
spark.sql.legacy.statisticalAggregate: false |
spark.sql.legacy.storeAnalyzedPlanForView: false |
spark.sql.legacy.typeCoercion.datetimeToString.enabled: false |
spark.sql.legacy.useCurrentConfigsForView: false |
spark.sql.limit.scaleUpFactor: 4 |
spark.sql.maxMetadataStringLength: 100 |
spark.sql.metadataCacheTTLSeconds: -1 |
spark.sql.objectHashAggregate.sortBased.fallbackThreshold: 128 |
spark.sql.optimizeNullAwareAntiJoin: true |
spark.sql.optimizer.disableHints: false |
spark.sql.optimizer.dynamicPartitionPruning.enabled: true |
spark.sql.optimizer.dynamicPartitionPruning.fallbackFilterRatio: 0 |
spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly: true |
spark.sql.optimizer.dynamicPartitionPruning.useStats: true |
spark.sql.optimizer.enableJsonExpressionOptimization: true |
spark.sql.optimizer.expression.nestedPruning.enabled: true |
spark.sql.optimizer.inSetConversionThreshold: 10 |
spark.sql.optimizer.inSetSwitchThreshold: 400 |
spark.sql.optimizer.maxIterations: 100 |
spark.sql.optimizer.metadataOnly: false |
spark.sql.optimizer.nestedPredicatePushdown.supportedFileSources: parquet,orc |
spark.sql.optimizer.nestedSchemaPruning.enabled: true |
spark.sql.optimizer.replaceExceptWithFilter: true |
spark.sql.optimizer.serializer.nestedSchemaPruning.enabled: true |
spark.sql.orderByOrdinal: true |
spark.sql.parquet.binaryAsString: false |
spark.sql.parquet.columnarReaderBatchSize: 4096 |
spark.sql.parquet.compression.codec: snappy |
spark.sql.parquet.enableVectorizedReader: true |
spark.sql.parquet.filterPushdown: true |
spark.sql.parquet.filterPushdown.date: true |
spark.sql.parquet.filterPushdown.decimal: true |
spark.sql.parquet.filterPushdown.string.startsWith: true |
spark.sql.parquet.filterPushdown.timestamp: true |
spark.sql.parquet.int96AsTimestamp: true |
spark.sql.parquet.int96TimestampConversion: false |
spark.sql.parquet.mergeSchema: false |
spark.sql.parquet.output.committer.class: org.apache.parquet.hadoop.ParquetOutputCommitter |
spark.sql.parquet.pushdown.inFilterThreshold: 10 |
spark.sql.parquet.recordLevelFilter.enabled: false |
spark.sql.parquet.respectSummaryFiles: false |
spark.sql.parquet.writeLegacyFormat: false |
spark.sql.parser.escapedStringLiterals: false |
spark.sql.parser.quotedRegexColumnNames: false |
spark.sql.pivotMaxValues: 10000 |
spark.sql.planChangeLog.level: trace |
spark.sql.pyspark.jvmStacktrace.enabled: false |
spark.sql.repl.eagerEval.enabled: false |
spark.sql.repl.eagerEval.maxNumRows: 20 |
spark.sql.repl.eagerEval.truncate: 20 |
spark.sql.retainGroupColumns: true |
spark.sql.runSQLOnFiles: true |
spark.sql.scriptTransformation.exitTimeoutInSeconds: 5s |
spark.sql.selfJoinAutoResolveAmbiguity: true |
spark.sql.shuffle.partitions: 200 |
spark.sql.sort.enableRadixSort: true |
spark.sql.sources.binaryFile.maxLength: 2147483647 |
spark.sql.sources.bucketing.autoBucketedScan.enabled: true |
spark.sql.sources.bucketing.enabled: true |
spark.sql.sources.bucketing.maxBuckets: 100000 |
spark.sql.sources.commitProtocolClass: org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol |
spark.sql.sources.default: parquet |
spark.sql.sources.fileCompressionFactor: 1 |
spark.sql.sources.ignoreDataLocality: false |
spark.sql.sources.parallelPartitionDiscovery.parallelism: 10000 |
spark.sql.sources.parallelPartitionDiscovery.threshold: 32 |
spark.sql.sources.partitionColumnTypeInference.enabled: true |
spark.sql.sources.validatePartitionColumns: true |
spark.sql.statistics.fallBackToHdfs: false |
spark.sql.statistics.histogram.enabled: false |
spark.sql.statistics.histogram.numBins: 254 |
spark.sql.statistics.ndv.maxError: 0 |
spark.sql.statistics.parallelFileListingInStatsComputation.enabled: true |
spark.sql.statistics.percentile.accuracy: 10000 |
spark.sql.statistics.size.autoUpdate.enabled: false |
spark.sql.streaming.continuous.epochBacklogQueueSize: 10000 |
spark.sql.streaming.continuous.executorPollIntervalMs: 100 |
spark.sql.streaming.continuous.executorQueueSize: 1024 |
spark.sql.streaming.metricsEnabled: true |
spark.sql.subexpressionElimination.cache.maxEntries: 100 |
spark.sql.subexpressionElimination.enabled: true |
spark.sql.subquery.maxThreadThreshold: 16 |
spark.sql.thriftServer.incrementalCollect: false |
spark.sql.thriftServer.queryTimeout: 20s |
spark.sql.thriftserver.ui.retainedSessions: 200 |
spark.sql.thriftserver.ui.retainedStatements: 200 |
spark.sql.truncateTable.ignorePermissionAcl.enabled: false |
spark.sql.ui.explainMode: formatted |
spark.sql.ui.retainedExecutions: 500 |
spark.sql.variable.substitute: true |
spark.sql.view.maxNestedViewDepth: 100 |
spark.sql.warehouse.dir: file:/opt/spark/work-dir/spark-warehouse |
spark.sql.windowExec.buffer.in.memory.threshold: 4096 |
spark.stage.maxConsecutiveAttempts: 4 |
spark.storage.replication.proactive: true |
spark.submit.deployMode: cluster |
spark.submit.pyFiles: |
spark.task.cpus: 1 |
spark.task.maxFailures: 4 |
spark.task.reaper.enabled: true |
spark.task.reaper.killTimeout: -1 |
spark.task.reaper.pollingInterval: 20s |
spark.task.reaper.threadDump: true |
Any quick help would be greatly appreciated.
Attachments
Issue Links
- is related to: HADOOP-18179 Boost S3A Stream Read Performance (Open)