Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.0
    • Component/s: Shuffle
    • Labels: None

      Description

      Encrypted shuffle was introduced in Hadoop 2.6 and makes the shuffle data path safer. This feature is also needed in Spark. AES is a specification for the encryption of electronic data; it has five common modes of operation, of which CTR is one. We use two codecs, JceAesCtrCryptoCodec and OpensslAesCtrCryptoCodec, to enable Spark encrypted shuffle; the same codecs are used in Hadoop encrypted shuffle. JceAesCtrCryptoCodec uses the encryption algorithms the JDK provides, while OpensslAesCtrCryptoCodec uses the encryption algorithms OpenSSL provides.
      Because UGI credential info is used in the encrypted shuffle process, we enable encrypted shuffle on the Spark-on-YARN framework first.
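
      For illustration, here is a minimal, self-contained sketch of AES/CTR through the JDK's JCE provider, the primitive that JceAesCtrCryptoCodec wraps. This is illustrative only, not code from the patch; the object name is made up:

          import java.security.SecureRandom
          import javax.crypto.Cipher
          import javax.crypto.spec.{IvParameterSpec, SecretKeySpec}

          object JceAesCtrSketch {
            def main(args: Array[String]): Unit = {
              val random = new SecureRandom()
              val key = new Array[Byte](16) // 128-bit AES key
              val iv = new Array[Byte](16)  // 128-bit initial counter block
              random.nextBytes(key)
              random.nextBytes(iv)

              val enc = Cipher.getInstance("AES/CTR/NoPadding")
              enc.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv))
              val ciphertext = enc.doFinal("shuffle block".getBytes("UTF-8"))

              // CTR decryption regenerates the same keystream and XORs it back.
              val dec = Cipher.getInstance("AES/CTR/NoPadding")
              dec.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv))
              println(new String(dec.doFinal(ciphertext), "UTF-8")) // prints: shuffle block
            }
          }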

        Issue Links

          Activity

          apachespark Apache Spark added a comment -

          User 'kellyzly' has created a pull request for this issue:
          https://github.com/apache/spark/pull/4491

          kellyzly liyunzhang added a comment -

          How to test the encrypted shuffle feature on the Spark-on-YARN framework:

          1. build (the pom.xml was modified): mvn package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -DskipTests
          2. to enable encrypted shuffle, add the following to conf/spark-defaults.conf:
            spark.encrypted.shuffle true
          3. start the master and workers: sbin/start-all.sh
          4. run wordcount in yarn-cluster and yarn-client mode:
            1. ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 1g --executor-memory 1g --executor-cores 1 wordcount.jar
            2. ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 3 --driver-memory 1g --executor-memory 1g --executor-cores 1 wordcount.jar
          WangTaoTheTonic WangTaoTheTonic added a comment -

          If we are testing this on YARN mode, why do we need the third step (starting the master and workers)?

          srowen Sean Owen added a comment -

          If this depends on Hadoop 2.6, it can't be used in Spark anytime soon.

          kellyzly liyunzhang added a comment -

          Hi Sean Owen:
          Encrypted shuffle makes the shuffle process safer, and I think it is necessary in Spark. The previous design reused the Hadoop encrypted shuffle algorithm to enable Spark encrypted shuffle. That design has a big problem: it imports many crypto classes, such as CryptoInputStream and CryptoOutputStream, that are marked "private" in Hadoop. My teammates and I have now decided to write the crypto classes in Spark, so there is no dependency on Hadoop 2.6. We are not directly copying Hadoop code into Spark; we only carry over the crypto algorithms, like JCE/AES-NI, that Hadoop uses. Maybe I should rename the JIRA from "Reuse hadoop encrypted shuffle algorithm to enable spark encrypted shuffle" to "Add encrypted shuffle in spark". Any advice is welcome.

          kellyzly liyunzhang added a comment -

          Sorry to reply so late. If Spark runs in YARN mode, is there no need to start the master and workers? I'm a newbie to Spark; any guidance is welcome.

          kellyzly liyunzhang added a comment -

          Sean Owen, I have submitted a new design doc, "Design Document of Encrypted Spark Shuffle_20150318", and pushed the newest code to the pull request. This submission makes the following big changes:

          • Deleted the hadoop-2.6 profile. We no longer depend on Hadoop 2.6, because I added crypto classes such as CryptoInputStream.scala and CryptoOutputStream.scala to the org.apache.spark.crypto package in the core module.
          • AES is a specification for the encryption of electronic data; it has five common modes of operation, of which CTR is one. We use two codecs, JceAesCtrCryptoCodec and OpensslAesCtrCryptoCodec, to enable Spark encrypted shuffle; the same codecs are used in Hadoop encrypted shuffle. JceAesCtrCryptoCodec uses the encryption algorithms the JDK provides, while OpensslAesCtrCryptoCodec uses the encryption algorithms OpenSSL provides. The current code only implements JceAesCtrCryptoCodec; OpensslAesCtrCryptoCodec will be implemented later.

          How to test?

          • download the code from https://github.com/kellyzly/spark
          • build: mvn package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests
          • to enable encrypted shuffle, add the following to conf/spark-defaults.conf:
            spark.encrypted.shuffle true
            spark.job.encrypted-intermediate-data true
            spark.security.crypto.cipher.suite AES/CTR/NoPadding
            spark.security.crypto.codec.classes.aes.ctr.nopadding org.apache.spark.crypto.JceAesCtrCryptoCodec
          • start the master and workers: sbin/start-all.sh
          • edit the SparkPi source code into a word count, then run wordcount:
            • ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 1g --executor-memory 1g --executor-cores 1 examples/target/my.spark-examples_2.10-1.3.0-SNAPSHOT.jar
            • ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 3 --driver-memory 1g --executor-memory 1g --executor-cores 1 examples/target/my.spark-examples_2.10-1.3.0-SNAPSHOT.jar
          kellyzly liyunzhang added a comment - edited

          Hi all:
          There are two ways to avoid using the crypto classes, like CryptoInputStream.java, provided in Hadoop 2.6:

          • Isolate code like CryptoInputStream/CryptoOutputStream from the Hadoop source into a separate library, publish it to the Maven repository, and let other projects depend on it.
          • Write CryptoInputStream/CryptoOutputStream and the rest directly in the Spark code.

          Each method has its advantages and disadvantages:

          • Method 1:
            Disadvantage: it needs the Hadoop or Spark community to review the code in the separate library. Only after all the code has been reviewed and the library published to the Maven repository can we introduce it into Spark. That may take a long time.
            Advantage: with the review of the Hadoop or Spark community, we can be confident in the quality of the code. If fixes to the crypto classes are made, someone updates the separate library and we only bump the Maven dependency in Spark.
          • Method 2:
            Disadvantage: we need to keep an eye on later fixes to the crypto classes in future Hadoop releases. If something changes, we need to update the Scala code.
            Advantage: no dependency on another library, and it is convenient for us to make changes if they are really needed in Spark.

          My teammate is working on method 1. For method 2, the code in the pull request is finished and waiting for review. Can anyone give me some advice?

          kellyzly liyunzhang added a comment -

          Hi all, I have a question: is there any API in Spark like getInstance(className: String): AnyRef? I looked at org.apache.spark.sql.hive.thriftserver.ReflectionUtils.scala, but it does not provide a getInstance function.
          For now I wrote a function, org.apache.spark.crypto.CryptoCodec#getInstance: in my getInstance(className: String) I check the class name against "JceAesCtrCryptoCodec" and "OpensslAesCtrCryptoCodec", and if the name equals "JceAesCtrCryptoCodec" I create the instance via the scala.reflect.runtime.universe API. The code could be written more generically, as below, but I do not know how to complete it. If someone knows, please tell me.

             def getInstance1(className: String): AnyRef = {
               val m = universe.runtimeMirror(getClass.getClassLoader)
               val classLoader: ClassLoader = Thread.currentThread.getContextClassLoader
               val aClass: Class[_] = Class.forName(className, true, classLoader)
               val aType: scala.reflect.api.TypeTags.TypeTag = // how to write this line?
               val classCryptoCodec = universe.typeOf[aType].typeSymbol.asClass
               val cm = m.reflectClass(classCryptoCodec)
               val ctor = universe.typeOf[aType].declaration(universe.nme.CONSTRUCTOR).asMethod
               val ctorm = cm.reflectConstructor(ctor)
               val p = ctorm()
               p
             }
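
          A simpler route, sketched below under the assumption that each codec class has a public no-arg constructor, is to skip scala.reflect entirely and use plain Java reflection. This is only an illustrative sketch, not code from the pull request:

             def getInstance(className: String): AnyRef = {
               // Prefer the context classloader, falling back to this class's loader.
               val classLoader = Option(Thread.currentThread.getContextClassLoader)
                 .getOrElse(getClass.getClassLoader)
               val clazz = Class.forName(className, true, classLoader)
               // Instantiate via the public no-arg constructor.
               clazz.getConstructor().newInstance().asInstanceOf[AnyRef]
             }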
          
          apachespark Apache Spark added a comment -

          User 'kellyzly' has created a pull request for this issue:
          https://github.com/apache/spark/pull/5307

          kellyzly liyunzhang added a comment - edited

          Hi all:
          There are now two ways to implement SPARK-5682 (Add encrypted shuffle in spark).
          Method 1: use Chimera (a project that strips the code related to CryptoInputStream/CryptoOutputStream out of Hadoop to facilitate AES-NI based data encryption in other projects) to implement Spark encrypted shuffle. Pull request: https://github.com/apache/spark/pull/5307.
          Method 2: add a crypto package to the spark-core module, containing CryptoInputStream.scala, CryptoOutputStream.scala, and so on. Pull request: https://github.com/apache/spark/pull/4491.
          The latest design doc, "Design Document of Encrypted Spark Shuffle_20150402", has been submitted.
          Which one is better? Any advice/guidance is welcome!

          kellyzly liyunzhang added a comment -

          Design Document of Encrypted Spark Shuffle_20150506.docx is the latest design doc.

          kellyzly liyunzhang added a comment -

          hujiayin: thanks for your comment.

              The solution relied on hadoop API and maybe downgrade the performance.

          Regarding the reliance on the Hadoop API: you mean that I use org.apache.hadoop.io.Text in CommonConfigurationKeys. I see this differently. The declaration of org.apache.hadoop.io.Text is:

              @Stringable
              @InterfaceAudience.Public
              @InterfaceStability.Stable
              public class Text extends BinaryComparable

          This shows that org.apache.hadoop.io.Text is stable, which means the interfaces it provides will not change much in later releases.

          Regarding the performance downgrade: do you have any test results that show this?

          kellyzly liyunzhang added a comment -

          hujiayin: thanks for your comment.

          This feature is not based on Hadoop 2.6; only the original design was. The latest design doc (20150506) shows the two ways we now have to implement encrypted shuffle in Spark; currently we implement it only on the Spark-on-YARN framework. One way is based on Chimera, a project that strips the code related to CryptoInputStream/CryptoOutputStream out of Hadoop to facilitate AES-NI based data encryption in other projects (see https://github.com/apache/spark/pull/5307). The other way implements all the crypto classes, like CryptoInputStream/CryptoOutputStream, in Scala under the core/src/main/scala/org/apache/spark/crypto/ package (see https://github.com/apache/spark/pull/4491).

          As for importing a Hadoop API into Spark: if the interface of a Hadoop class is public and stable, it can be used in Spark. https://hadoop.apache.org/docs/current/api/org/apache/hadoop/classification/InterfaceStability.html says:

              Incompatible changes must not be made to classes marked as stable.

          which means that once a class is marked stable, later releases will not change it incompatibly.

          kellyzly liyunzhang added a comment -

          hujiayin:

              the AES solution is a bit heavy to encode/decode the live steaming data.

          Is there another solution for encrypting/decrypting live streaming data? Please share your suggestion with us.

          yoderme Mike Yoder added a comment -

          One quick question about AES/CTR. This cipher mode has many nice properties, but is only safe to use when the key/IV pair are never reused. What assurances do you have that the key/IV aren't reused in your scheme? (I skimmed the doc, but didn't see an obvious answer; please forgive me if the answer was in there.)

          Ferd Ferdinand Xu added a comment -

          Thank you for your question. The key is generated by a key generator instantiated for the specified keygen algorithm; that part of the work is in the method CryptoConf#initSparkShuffleCredentials. More detailed information is available in the PR (https://github.com/apache/spark/pull/8880). For the IV part, we are using Chimera (https://github.com/intel-hadoop/chimera) as an external library in the latest PR (https://github.com/intel-hadoop/chimera/blob/master/src/main/java/com/intel/chimera/JceAesCtrCryptoCodec.java#L70 and https://github.com/intel-hadoop/chimera/blob/master/src/main/java/com/intel/chimera/OpensslAesCtrCryptoCodec.java#L81). You can also dig into the code for how the IV is calculated from the counter and the initial IV (https://github.com/intel-hadoop/chimera/blob/master/src/main/java/com/intel/chimera/AesCtrCryptoCodec.java#L42). The initial IV is generated by a secure random source.
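
          For reference, here is a rough sketch of that derivation, modeled on the Hadoop-style AesCtrCryptoCodec logic the links above point to (the method name and signature here are illustrative, not the exact Chimera API): the 16-byte IV is the random initial IV plus the stream's block counter, added as a big-endian integer with carry, so each key/IV pair is used for only one position in one stream.

             def calculateIV(initIV: Array[Byte], counter: Long, iv: Array[Byte]): Unit = {
               var i = iv.length // 16 bytes for AES
               var j = 0         // the counter occupies the low 8 bytes
               var sum = 0
               var ctr = counter
               while (i > 0) {
                 i -= 1
                 // Each byte = initIV byte + counter byte + carry from the byte below.
                 sum = (initIV(i) & 0xff) + (sum >>> 8)
                 if (j < 8) {
                   sum += (ctr & 0xff).toInt
                   ctr >>>= 8
                   j += 1
                 }
                 iv(i) = sum.toByte
               }
             }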

          apachespark Apache Spark added a comment -

          User 'winningsix' has created a pull request for this issue:
          https://github.com/apache/spark/pull/8880

          krish.dey Krish Dey added a comment - edited

          The constructor still seems to be unchanged. Doesn't it need to change to accommodate encryption of spills to disk? Moreover, rather than passing DummySerializerInstance, it should be possible to pass any Serializer.

          public UnsafeSorterSpillWriter(
              BlockManager blockManager,
              int fileBufferSize,
              ShuffleWriteMetrics writeMetrics,
              int numRecordsToWrite) throws IOException {
            final Tuple2<TempLocalBlockId, File> spilledFileInfo =
                blockManager.diskBlockManager().createTempLocalBlock();
            this.file = spilledFileInfo._2();
            this.blockId = spilledFileInfo._1();
            this.numRecordsToWrite = numRecordsToWrite;
            // Unfortunately, we need a serializer instance in order to construct a DiskBlockObjectWriter.
            // Our write path doesn't actually use this serializer (since we end up calling the `write()`
            // OutputStream methods), but DiskBlockObjectWriter still calls some methods on it. To work
            // around this, we pass a dummy no-op serializer.
            writer = blockManager.getDiskWriter(
                blockId, file, DummySerializerInstance.INSTANCE, fileBufferSize, writeMetrics);
            // Write the number of records
            writeIntToBuffer(numRecordsToWrite, 0);
            writer.write(writeBuffer, 0, 4);
          }

            People

            • Assignee: Ferd Ferdinand Xu
            • Reporter: kellyzly liyunzhang
            • Votes: 0
            • Watchers: 30
