Spark / SPARK-44111

Prepare Apache Spark 4.0.0


Details

    • Type: Umbrella
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 4.0.0
    • Fix Version/s: None
    • Component/s: Build

    Description

      For now, this issue aims to collect ideas for planning Apache Spark 4.0.0.

      More items will be added here as they are excluded from Apache Spark 3.5.0 (Feature Freeze: July 16th, 2023).

      Spark 1: 2014.05 (1.0.0) ~ 2016.11 (1.6.3)
      Spark 2: 2016.07 (2.0.0) ~ 2021.05 (2.4.8)
      Spark 3: 2020.06 (3.0.0) ~ 2026.xx (3.5.x)
      Spark 4: 2024.06 (4.0.0, NEW)
      

        Sub-Tasks

        1. Drop mesos support (Resolved; Sean R. Owen)
        2. Drop K8s v1.25 and lower version support (Resolved; Dongjoon Hyun)
        3. Drop K8s v1.26 Support (Resolved; Dongjoon Hyun)
        4. Remove shim classes for Hive prior 2.0.0 (Resolved; Cheng Pan)
        5. Remove deprecated `BinaryClassificationMetrics.scoreLabelsWeight` (Resolved; Dongjoon Hyun)
        6. Upgrade Scala to 2.13.12 (Resolved; Yang Jie)
        7. Upgrade Scala to 2.13.13 (Resolved; BingKun Pan)
        8. Enable spark.shuffle.service.removeShuffle by default (Resolved; Dongjoon Hyun)
        9. Enable spark.eventLog.compress by default (Resolved; Dongjoon Hyun)
        10. Enable spark.eventLog.rolling.enabled by default (Resolved; Dongjoon Hyun)
        11. Enable `spark.metrics.appStatusSource.enabled` by default (Resolved; Dongjoon Hyun)
        12. Update `spark.speculation.multiplier` to 3 and `spark.speculation.quantile` to 0.9 (Resolved; Dongjoon Hyun)
        13. Deprecate spark.sql.parser.escapedStringLiterals (Resolved; Max Gekk)
        14. Change default of spark.sql.legacy.timeParserPolicy from EXCEPTION to CORRECTED (Resolved; Serge Rielau)
        15. Make EventLoggingListenerSuite independent from spark.eventLog.compress conf (Resolved; Dongjoon Hyun)
        16. Fix EventLogFileWriters to handle `none` codec case (Resolved; Dongjoon Hyun)
        17. Upgrade `Volcano` to 1.8.0 (Resolved; Dongjoon Hyun)
        18. Upgrade `Volcano` to 1.8.1 (Resolved; Dongjoon Hyun)
        19. Upgrade `Volcano` to 1.8.2 (Resolved; Dongjoon Hyun)
        20. Migrate antlr4 from 4.9 to 4.10+ (Resolved; Yang Jie)
        21. Upgrade Python to 3.11 in Maven builds (Resolved; Hyukjin Kwon)
        22. Support Python 3.12 (Resolved; Dongjoon Hyun)
        23. Remove pinned version of torch for Python 3.12 support (Resolved; Hyukjin Kwon)
        24. Upgrade Pandas to 2.2.0 (Resolved; Haejoon Lee)
        25. Remove `distutils` usage (Resolved; Dongjoon Hyun)
        26. Remove deprecated Hadoop-2 `LocatedFileStatus` constructor (Resolved; Dongjoon Hyun)
        27. Support AWS_ENDPOINT_URL env variable (Resolved; Dongjoon Hyun)
        28. Improve InMemoryFileIndex to use FileSystem.listFiles API (Resolved; Dongjoon Hyun)
        29. Change RocksDB as default shuffle service db backend (Resolved; Jia Fan)
        30. Eliminate unnecessary reflection invocation in Hive shim classes (Resolved; Cheng Pan)
        31. Upgrade kubernetes-client to 6.9.0 for K8s 1.28 (Resolved; Dongjoon Hyun)
        32. Upgrade `kubernetes-client` to 6.9.1 (Resolved; Dongjoon Hyun)
        33. Upgrade kubernetes-client to 6.10.0 for K8s v1.29.0 (Resolved; Bjørn Jørgensen)
        34. Upgrade kubernetes-client to 6.11.0 (Resolved; Bjørn Jørgensen)
        35. Upgrade `kubernetes-client` to 6.12.0 (Resolved; Dongjoon Hyun)
        36. Upgrade kubernetes-client to 6.12.1 (Resolved; Bjørn Jørgensen)
        37. Use `built-in` storage classes in PVTestsSuite (Resolved; Dongjoon Hyun)
        38. Create and use a K8s test tag for `PersistentVolume` (Resolved; Dongjoon Hyun)
        39. Use the latest minikube in K8s IT (Resolved; Dongjoon Hyun)
        40. Remove threeten-extra exclusion in enforceBytecodeVersion rule (Resolved; Dongjoon Hyun)
        41. Update `YuniKorn` docs with v1.4 (Resolved; Dongjoon Hyun)
        42. Update `YuniKorn` docs with v1.5 (Resolved; Dongjoon Hyun)
        43. Upgrade Apache ORC to 2.0 (Resolved; Dongjoon Hyun)
        44. Support ORC Brotli codec (Resolved; dzcxzl)
        45. Fix ORC tests to be independent from default compression (Resolved; Dongjoon Hyun)
        46. Use `zstd` as the default ORC compression (Resolved; Dongjoon Hyun)
        47. Use the default ORC compression in OrcReadBenchmark (Resolved; Dongjoon Hyun)
        48. Improve `TPCDSQueryBenchmark` to support other file formats (Resolved; Dongjoon Hyun)
        49. Use default ORC compression in data source benchmarks (Resolved; Dongjoon Hyun)
        50. Upgrade Avro to 1.11.3 (Resolved; Dongjoon Hyun)
        51. Add `VolumeSuite` to K8s IT (Resolved; Dongjoon Hyun)
        52. Enable `spark.ui.prometheus.enabled` by default (Resolved; Dongjoon Hyun)
        53. Document a few missed `spark.ui.*` configs to `Configuration` page (Resolved; Dongjoon Hyun)
        54. Upgrade Maven to 3.9.6 for MNG-7913 (Resolved; Dongjoon Hyun)
        55. Use Scala 2.13 Spark distribution in HiveExternalCatalogVersionsSuite (Resolved; Dongjoon Hyun)
        56. Add Apple Silicon Maven build test to GitHub Action CI (Resolved; Dongjoon Hyun)
        57. Add Daily Apple Silicon Github Action Job (Java/Scala) (Resolved; Hyukjin Kwon)
        58. Migrate from AppVeyor to GitHub Actions for SparkR tests on Windows (Resolved; Hyukjin Kwon)
        59. Fix docker-integration-tests on Apple Chips (Resolved; Kent Yao)
        60. Attach codec extension to avro datasource files (Resolved; Kent Yao)
        61. Benchmarking Avro with Compression Codecs (Resolved; Kent Yao)
        62. Codec xz and zstandard support compression level for avro files (Resolved; Kent Yao)
        63. Disable unsupported `ExtendedLevelDBTest` on `MacOS/aarch64` (Resolved; Yang Jie)
        64. Change to use bcprov/bcpkix-jdk18on for test (Resolved; Yang Jie)
        65. Add `bouncy-castle` test dependencies to `sql/core` module for Hadoop 3.4.0 (Resolved; Yang Jie)
        66. Add `bcpkix-jdk18on` test dependencies to `hive` module for Hadoop 3.4.0 (Resolved; Dongjoon Hyun)
        67. Upgrade `bouncycastle` to 1.78 (Resolved; Dongjoon Hyun)
        68. Use Hadoop 3.3.5 winutils in AppVeyor build (Resolved; BingKun Pan)
        69. Upgrade Hadoop to 3.3.6 (Resolved; Dongjoon Hyun)
        70. Fix `IsolatedClientLoader.supportsHadoopShadedClient` to handle Hadoop 3.4+ (Resolved; Dongjoon Hyun)
        71. Exclude `logback` dependency from SBT like Maven (Resolved; Dongjoon Hyun)
        72. Ignore `IntentionallyFaultyConnectionProvider` error in `CliSuite` (Resolved; Dongjoon Hyun)
        73. Upgrade Hadoop to 3.4.0 (Resolved; Dongjoon Hyun)
        74. Set spark.hadoop.fs.s3a.connection.establish.timeout to 30s (Resolved; Dongjoon Hyun)
        75. Regenerate benchmark results (Resolved; Dongjoon Hyun)
        76. Use hadoop 3.4.0 in some docs (Resolved; BingKun Pan)
        77. Upgrade R version from 4.3.1 to 4.3.2 in AppVeyor (Resolved; Hyukjin Kwon)
        78. Use R 4.3.3 in `windows` R GitHub Action job (Resolved; Dongjoon Hyun)
        79. Use `Ubuntu 22.04` in `dev/infra/Dockerfile` (Resolved; Dongjoon Hyun)
        80. Support MergeInto in DataFrameWriterV2 (Resolved; Huaxin Gao)
        81. Upgrade Arrow to 14.0.0 (Resolved; Yang Jie)
        82. Upgrade Arrow to 14.0.1 (Resolved; Dongjoon Hyun)
        83. Upgrade Arrow to 14.0.2 (Resolved; Dongjoon Hyun)
        84. Upgrade Arrow to 15.0.0 (Resolved; Yang Jie)
        85. Upgrade Arrow to 15.0.2 (Resolved; BingKun Pan)
        86. Upgrade the minimum version of PyArrow to 10.0.0 (Resolved; Haejoon Lee)
        87. Upgrade the minimum version of `arrow` R package to 10.0.0 (Resolved; Dongjoon Hyun)
        88. Move `o.a.s.variant` to `o.a.s.types.variant` (Resolved; Dongjoon Hyun)
        89. Remove Spark 3.0~3.2 pyspark/version.py workaround from release scripts (Resolved; Dongjoon Hyun)
        90. Add `slf4j-api` jar to the class path first before the others of `jars` directory (Resolved; Dongjoon Hyun)
        91. Use Java 21 instead of 21-jre in K8s Dockerfile (Resolved; Dongjoon Hyun)
        92. Make Spark build with -release instead of -target (Resolved; Yang Jie)
        93. Use `HiveConf.getConfVars` or Hive conf names directly (Resolved; Dongjoon Hyun)
        94. Upgrade hive-service-rpc 4.0.0 (Resolved; Cheng Pan)
        95. Upgrade Kafka to 3.7.0 (Resolved; BingKun Pan)
        96. Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of failing (Resolved; Dongjoon Hyun)
        97. Remove redundant rules from `MimaExcludes` (Resolved; Dongjoon Hyun)
        98. Skip deleting pod from k8s if the pod does not exists (Resolved; leesf)
        99. Run `ANSI` SQL CI twice per day (Resolved; Dongjoon Hyun)
        100. Use ANSI SQL mode by default (Resolved; Dongjoon Hyun)
        101. Switch ANSI SQL CI job to NON-ANSI SQL CI job (Resolved; Dongjoon Hyun)
        102. Switch `spark.history.store.serializer` to use `PROTOBUF` by default (In Progress; Dongjoon Hyun)
        103. Use Magic Committer for all S3 buckets by default (In Progress; Dongjoon Hyun)
        104. Support Hive 4.0 metastore (Open; Attila Zsolt Piros)
        105. Spark to support S3 Express One Zone Storage (Open; Unassigned)
        106. Enable `spark.authenticate` by default in K8s environment (Open; Unassigned)
        107. Remove/Reduce usage of TypeTag in public APIs (Open; Unassigned)
        108. Drop legacy Hive-based ORC file format (Open; Unassigned)
        109. Disable spark.sql.legacy.createHiveTableByDefault by default (Open; Unassigned)
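
        Several of the resolved sub-tasks above change configuration defaults. As a rough sketch (not part of the ticket), the settings named in those titles could be pinned explicitly so that a job behaves the same on 3.5.x and 4.0.0. The values below are taken from the sub-task titles only ("enable by default" is read as `true`); the dictionary name and the two helper functions are hypothetical, and the list is illustrative rather than complete.

```python
# New defaults as stated in the sub-task titles above (illustrative subset).
# NEW_DEFAULTS and both helpers are hypothetical names, not Spark APIs.
NEW_DEFAULTS = {
    "spark.shuffle.service.removeShuffle": "true",
    "spark.eventLog.compress": "true",
    "spark.eventLog.rolling.enabled": "true",
    "spark.metrics.appStatusSource.enabled": "true",
    "spark.ui.prometheus.enabled": "true",
    "spark.speculation.multiplier": "3",
    "spark.speculation.quantile": "0.9",
    "spark.sql.legacy.timeParserPolicy": "CORRECTED",
}


def to_conf_flags(settings):
    """Render settings as a flat list of spark-submit `--conf key=value` arguments."""
    flags = []
    for key in sorted(settings):
        flags += ["--conf", f"{key}={settings[key]}"]
    return flags


def to_defaults_file(settings):
    """Render settings in spark-defaults.conf style: one `key value` pair per line."""
    return "\n".join(f"{key} {settings[key]}" for key in sorted(settings))


if __name__ == "__main__":
    print(to_defaults_file(NEW_DEFAULTS))
```

        To keep an earlier behavior instead, the same helpers can be given a dictionary of the older values; those values are not stated in this ticket and would need to be checked against the configuration documentation of the release in use.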

          People

            Assignee: Unassigned
            Reporter: Dongjoon Hyun