Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.11.0
    • Fix Version/s: None
    • Component/s: parquet-mr
    • Labels: None

    Description

      How to reproduce:

      git clone https://github.com/apache/spark.git && cd spark
      git fetch origin pull/26804/head:PARQUET-1746
      git checkout PARQUET-1746
      build/sbt "sql/test-only *StreamSuite"
      

      output:

      sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 
      Decoded objects do not match expected objects:
      expected: WrappedArray(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
      actual:   WrappedArray(0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 2)
      assertnotnull(upcast(getcolumnbyordinal(0, LongType), LongType, - root class: "scala.Long"))
      +- upcast(getcolumnbyordinal(0, LongType), LongType, - root class: "scala.Long")
         +- getcolumnbyordinal(0, LongType)
      
               
      	at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
      	at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
      	at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
      	at org.scalatest.Assertions.fail(Assertions.scala:1091)
      	at org.scalatest.Assertions.fail$(Assertions.scala:1087)
      	at org.scalatest.FunSuite.fail(FunSuite.scala:1560)
      	at org.apache.spark.sql.QueryTest.checkDataset(QueryTest.scala:73)
      	at org.apache.spark.sql.streaming.StreamSuite.$anonfun$new$22(StreamSuite.scala:215)
      	at org.apache.spark.sql.streaming.StreamSuite.$anonfun$new$22$adapted(StreamSuite.scala:208)
      	at org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1(SQLTestUtils.scala:76)
      	at org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1$adapted(SQLTestUtils.scala:75)
      	at org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:161)
      	at org.apache.spark.sql.streaming.StreamSuite.org$apache$spark$sql$test$SQLTestUtils$$super$withTempDir(StreamSuite.scala:51)
      	at org.apache.spark.sql.test.SQLTestUtils.withTempDir(SQLTestUtils.scala:75)
      	at org.apache.spark.sql.test.SQLTestUtils.withTempDir$(SQLTestUtils.scala:74)
      	at org.apache.spark.sql.streaming.StreamSuite.withTempDir(StreamSuite.scala:51)
      	at org.apache.spark.sql.streaming.StreamSuite.$anonfun$new$21(StreamSuite.scala:208)
      	at org.apache.spark.sql.streaming.StreamSuite.$anonfun$new$21$adapted(StreamSuite.scala:207)
      	at org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1(SQLTestUtils.scala:76)
      	at org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1$adapted(SQLTestUtils.scala:75)
      	at org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:161)
      	at org.apache.spark.sql.streaming.StreamSuite.org$apache$spark$sql$test$SQLTestUtils$$super$withTempDir(StreamSuite.scala:51)
      	at org.apache.spark.sql.test.SQLTestUtils.withTempDir(SQLTestUtils.scala:75)
      	at org.apache.spark.sql.test.SQLTestUtils.withTempDir$(SQLTestUtils.scala:74)
      	at org.apache.spark.sql.streaming.StreamSuite.withTempDir(StreamSuite.scala:51)
      	at org.apache.spark.sql.streaming.StreamSuite.assertDF$1(StreamSuite.scala:207)
      	at org.apache.spark.sql.streaming.StreamSuite.$anonfun$new$25(StreamSuite.scala:226)
      	at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:52)
      	at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:36)
      	at org.apache.spark.sql.streaming.StreamSuite.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(StreamSuite.scala:51)
      	at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf(SQLTestUtils.scala:231)
      	at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf$(SQLTestUtils.scala:229)
      	at org.apache.spark.sql.streaming.StreamSuite.withSQLConf(StreamSuite.scala:51)
      	at org.apache.spark.sql.streaming.StreamSuite.$anonfun$new$24(StreamSuite.scala:225)
      	at org.apache.spark.sql.streaming.StreamSuite.$anonfun$new$24$adapted(StreamSuite.scala:224)
      	at scala.collection.immutable.List.foreach(List.scala:392)
      	at org.apache.spark.sql.streaming.StreamSuite.$anonfun$new$20(StreamSuite.scala:224)
      	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
      	at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
      	at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
      	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
      	at org.scalatest.Transformer.apply(Transformer.scala:22)
      	at org.scalatest.Transformer.apply(Transformer.scala:20)
      	at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
      	at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:149)
      	at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
      	at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
      	at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286)
      	at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
      	at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
      	at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:56)
      	at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221)
      	at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214)
      	at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:56)
      	at org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229)
      	at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393)
      	at scala.collection.immutable.List.foreach(List.scala:392)
      	at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381)
      	at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376)
      	at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458)
      	at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229)
      	at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228)
      	at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
      	at org.scalatest.Suite.run(Suite.scala:1124)
      	at org.scalatest.Suite.run$(Suite.scala:1106)
      	at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
      	at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233)
      	at org.scalatest.SuperEngine.runImpl(Engine.scala:518)
      	at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233)
      	at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232)
      	at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:56)
      	at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
      	at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
      	at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
      	at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:56)
      	at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
      	at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
      	at sbt.ForkMain$Run$2.call(ForkMain.java:296)
      	at sbt.ForkMain$Run$2.call(ForkMain.java:286)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      

    Activity

            gszadovszky Gabor Szadovszky added a comment -

            What exactly is reordered here? If it is a list in the Parquet schema, then the order must not change and this would indeed be a serious issue. However, I cannot see how that could happen. Could you explain in more detail from the Parquet point of view?
            yumwang Yuming Wang added a comment -

            It seems 1.12.0-SNAPSHOT works.

            gszadovszky Gabor Szadovszky added a comment -

            For me the issue is reproducible with the current parquet-mr master (1.12.0-SNAPSHOT).

            gszadovszky Gabor Szadovszky added a comment -

            The related Spark test generates 22 Parquet files. The first 11 are empty, meaning they contain no data. (I am not sure whether they are even valid this way.)

            The last 11 each contain a single value:

            $> ls *.parquet| while read file; do echo "$file"; parquet-tools cat $file 2>/dev/null; done
            part-00000-19f5b358-410b-4dd4-b167-4016984ac6ef-c000.snappy.parquet
            part-00000-212d052b-d03a-413b-98f3-1348c2d06855-c000.snappy.parquet
            part-00000-311f4442-4225-47f1-aaf1-c7a8e38a875f-c000.snappy.parquet
            part-00000-459612f9-d564-43a9-bf31-2d174c996fa6-c000.snappy.parquet
            part-00000-5e20cfa6-a5d0-4d5f-a382-741907a74874-c000.snappy.parquet
            part-00000-62881d28-7226-4a78-9fe7-2ed41b895e1c-c000.snappy.parquet
            part-00000-9aaa784f-080a-43ae-9296-20bd033aa300-c000.snappy.parquet
            part-00000-a01e81ab-a987-4929-991d-60f01acab1ca-c000.snappy.parquet
            part-00000-add0de8e-26eb-406b-bf02-702924f89f1a-c000.snappy.parquet
            part-00000-e8dd315d-b97e-4257-917c-34696d0a866c-c000.snappy.parquet
            part-00000-ed8be0d2-508f-4666-b66f-93182413472e-c000.snappy.parquet
            part-00001-20b63b66-8f9a-4e3b-893c-4acb106ddac1-c000.snappy.parquet
            a = 7
            
            part-00001-227ff83d-5341-48be-97be-00cde92cb303-c000.snappy.parquet
            a = 1
            
            part-00001-38e186bb-ca67-4e3d-87fe-780585f25c84-c000.snappy.parquet
            a = 0
            
            part-00001-3b06880b-6d57-49d7-bb63-4220092ef1ae-c000.snappy.parquet
            a = 4
            
            part-00001-449026a6-f486-4fca-81fa-b7cdeaddfa3b-c000.snappy.parquet
            a = 5
            
            part-00001-567ed849-b1e9-494f-b33f-495592826b28-c000.snappy.parquet
            a = 2
            
            part-00001-70fa8c7e-9b45-4103-a99e-5b0f61b6062a-c000.snappy.parquet
            a = 10
            
            part-00001-7399d477-c393-481b-b76f-1289deb72bc0-c000.snappy.parquet
            a = 3
            
            part-00001-93678ef9-27d4-4a5d-aaa1-58492de248e7-c000.snappy.parquet
            a = 6
            
            part-00001-c1b934d8-0058-40e0-87f9-40ee7eca52ed-c000.snappy.parquet
            a = 8
            
            part-00001-c599dd4d-32c8-4032-935a-b1d45bc508e1-c000.snappy.parquet
            a = 9
            

            So the parquet-mr library has nothing to do with the ordering of these values.

            yumwang Yuming Wang added a comment (edited) -

            We can disable parquet.page.write-checksum.enabled to work around this issue:
            https://github.com/apache/spark/pull/26804#discussion_r561044576
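            As a rough illustration of that workaround (not taken from the linked PR; the session setup and output path below are hypothetical), one way to pass the flag to the Parquet writer from a Spark job is via the Hadoop configuration:

            ```scala
            // Sketch of the suggested workaround: turn off Parquet page write checksums.
            // Assumes Spark forwards its Hadoop configuration to the parquet-mr writer;
            // the app name and output path are made up for the example.
            import org.apache.spark.sql.SparkSession

            object DisablePageChecksumExample {
              def main(args: Array[String]): Unit = {
                val spark = SparkSession.builder()
                  .appName("parquet-page-checksum-workaround")
                  .master("local[*]")
                  .getOrCreate()

                // The key discussed in this issue; false skips CRC computation for
                // data pages on the write path.
                spark.sparkContext.hadoopConfiguration
                  .setBoolean("parquet.page.write-checksum.enabled", false)

                // Write the same 0..10 range the failing test uses.
                spark.range(0, 11).toDF("a")
                  .write.mode("overwrite")
                  .parquet("/tmp/parquet-checksum-workaround")

                spark.stop()
              }
            }
            ```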

            githubbot ASF GitHub Bot added a comment -

            wangyum opened a new pull request #857:
            URL: https://github.com/apache/parquet-mr/pull/857

            Make sure you have checked all steps below.

            1. Jira
            • [ ] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR" https://issues.apache.org/jira/browse/PARQUET-XXX
            • In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).

            2. Tests
            • [ ] My PR adds the following unit tests _OR_ does not need testing for this extremely good reason:

            3. Commits
            • [ ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
              1. Subject is separated from body by a blank line
              2. Subject is limited to 50 characters (not including Jira issue reference)
              3. Subject does not end with a period
              4. Subject uses the imperative mood ("add", not "adding")
              5. Body wraps at 72 characters
              6. Body explains "what" and "why", not "how"

            4. Documentation
            • [ ] In case of new functionality, my PR adds documentation that describes how to use it.
            • All the public functions and the classes in the PR contain Javadoc that explain what it does

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on to GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment -

            wangyum closed pull request #857:
            URL: https://github.com/apache/parquet-mr/pull/857

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on to GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment -

            wangyum commented on pull request #857:
            URL: https://github.com/apache/parquet-mr/pull/857#issuecomment-764048376

            Sorry, this document is outdated: http://mail-archives.apache.org/mod_mbox/parquet-dev/201906.mbox/%3CJIRA.13233926.1558083819000.446944.1560276180713@Atlassian.JIRA%3E

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on to GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment -

            wangyum commented on pull request #857:
            URL: https://github.com/apache/parquet-mr/pull/857#issuecomment-767180694

            @bbraams I read an outdated document: http://mail-archives.apache.org/mod_mbox/parquet-dev/201906.mbox/%3CJIRA.13233926.1558083819000.446944.1560276180713@Atlassian.JIRA%3E

            Both the read and write flags default to false:
            ```
            Documentation

            The feature is feature flagged and is disabled by default. Both writing out checksums and
            verifying them on the read path can be turned on individually, via the following two new config
            flags:

            • `parquet.page.write-checksum.enabled` (default: false)
            • `parquet.page.verify-checksum.enabled` (default: false)
            ```
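            For completeness, a minimal sketch of toggling both flags explicitly on a Hadoop configuration (the plain string keys are the ones quoted above; whether a given reader or writer honors them depends on the parquet-mr version in use, and the helper object and method names here are made up for illustration):

            ```scala
            // Hypothetical helper: set both page-checksum flags on a Hadoop Configuration
            // before handing it to a Parquet writer or reader.
            import org.apache.hadoop.conf.Configuration

            object PageChecksumFlags {
              def configure(conf: Configuration, write: Boolean, verify: Boolean): Configuration = {
                conf.setBoolean("parquet.page.write-checksum.enabled", write)   // write path
                conf.setBoolean("parquet.page.verify-checksum.enabled", verify) // read path
                conf
              }

              def main(args: Array[String]): Unit = {
                val conf = configure(new Configuration(), write = false, verify = false)
                println(conf.get("parquet.page.write-checksum.enabled"))  // prints "false"
                println(conf.get("parquet.page.verify-checksum.enabled")) // prints "false"
              }
            }
            ```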

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on to GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            rok Rok Mihevc added a comment -

            This issue has been migrated to issue #2422 on GitHub. Please see the migration documentation for further details.


            People

              Assignee: Unassigned
              Reporter: yumwang Yuming Wang
              Votes: 0
              Watchers: 4