Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.11.0
    • Fix Version/s: None
    • Component/s: parquet-mr
    • Labels: None

    Description

      How to reproduce:

      git clone https://github.com/apache/spark.git && cd spark
      git fetch origin pull/26804/head:PARQUET-1746
      git checkout PARQUET-1746
      build/sbt "sql/test-only *StreamSuite"
      

      output:

      sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 
      Decoded objects do not match expected objects:
      expected: WrappedArray(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
      actual:   WrappedArray(0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 2)
      assertnotnull(upcast(getcolumnbyordinal(0, LongType), LongType, - root class: "scala.Long"))
      +- upcast(getcolumnbyordinal(0, LongType), LongType, - root class: "scala.Long")
         +- getcolumnbyordinal(0, LongType)
      
               
      	at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
      	at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
      	at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
      	at org.scalatest.Assertions.fail(Assertions.scala:1091)
      	at org.scalatest.Assertions.fail$(Assertions.scala:1087)
      	at org.scalatest.FunSuite.fail(FunSuite.scala:1560)
      	at org.apache.spark.sql.QueryTest.checkDataset(QueryTest.scala:73)
      	at org.apache.spark.sql.streaming.StreamSuite.$anonfun$new$22(StreamSuite.scala:215)
      	at org.apache.spark.sql.streaming.StreamSuite.$anonfun$new$22$adapted(StreamSuite.scala:208)
      	at org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1(SQLTestUtils.scala:76)
      	at org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1$adapted(SQLTestUtils.scala:75)
      	at org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:161)
      	at org.apache.spark.sql.streaming.StreamSuite.org$apache$spark$sql$test$SQLTestUtils$$super$withTempDir(StreamSuite.scala:51)
      	at org.apache.spark.sql.test.SQLTestUtils.withTempDir(SQLTestUtils.scala:75)
      	at org.apache.spark.sql.test.SQLTestUtils.withTempDir$(SQLTestUtils.scala:74)
      	at org.apache.spark.sql.streaming.StreamSuite.withTempDir(StreamSuite.scala:51)
      	at org.apache.spark.sql.streaming.StreamSuite.$anonfun$new$21(StreamSuite.scala:208)
      	at org.apache.spark.sql.streaming.StreamSuite.$anonfun$new$21$adapted(StreamSuite.scala:207)
      	at org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1(SQLTestUtils.scala:76)
      	at org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1$adapted(SQLTestUtils.scala:75)
      	at org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:161)
      	at org.apache.spark.sql.streaming.StreamSuite.org$apache$spark$sql$test$SQLTestUtils$$super$withTempDir(StreamSuite.scala:51)
      	at org.apache.spark.sql.test.SQLTestUtils.withTempDir(SQLTestUtils.scala:75)
      	at org.apache.spark.sql.test.SQLTestUtils.withTempDir$(SQLTestUtils.scala:74)
      	at org.apache.spark.sql.streaming.StreamSuite.withTempDir(StreamSuite.scala:51)
      	at org.apache.spark.sql.streaming.StreamSuite.assertDF$1(StreamSuite.scala:207)
      	at org.apache.spark.sql.streaming.StreamSuite.$anonfun$new$25(StreamSuite.scala:226)
      	at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:52)
      	at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:36)
      	at org.apache.spark.sql.streaming.StreamSuite.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(StreamSuite.scala:51)
      	at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf(SQLTestUtils.scala:231)
      	at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf$(SQLTestUtils.scala:229)
      	at org.apache.spark.sql.streaming.StreamSuite.withSQLConf(StreamSuite.scala:51)
      	at org.apache.spark.sql.streaming.StreamSuite.$anonfun$new$24(StreamSuite.scala:225)
      	at org.apache.spark.sql.streaming.StreamSuite.$anonfun$new$24$adapted(StreamSuite.scala:224)
      	at scala.collection.immutable.List.foreach(List.scala:392)
      	at org.apache.spark.sql.streaming.StreamSuite.$anonfun$new$20(StreamSuite.scala:224)
      	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
      	at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
      	at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
      	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
      	at org.scalatest.Transformer.apply(Transformer.scala:22)
      	at org.scalatest.Transformer.apply(Transformer.scala:20)
      	at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
      	at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:149)
      	at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
      	at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
      	at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286)
      	at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
      	at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
      	at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:56)
      	at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221)
      	at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214)
      	at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:56)
      	at org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229)
      	at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393)
      	at scala.collection.immutable.List.foreach(List.scala:392)
      	at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381)
      	at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376)
      	at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458)
      	at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229)
      	at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228)
      	at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
      	at org.scalatest.Suite.run(Suite.scala:1124)
      	at org.scalatest.Suite.run$(Suite.scala:1106)
      	at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
      	at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233)
      	at org.scalatest.SuperEngine.runImpl(Engine.scala:518)
      	at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233)
      	at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232)
      	at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:56)
      	at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
      	at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
      	at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
      	at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:56)
      	at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
      	at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
      	at sbt.ForkMain$Run$2.call(ForkMain.java:296)
      	at sbt.ForkMain$Run$2.call(ForkMain.java:286)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      

    Activity

            gszadovszky Gabor Szadovszky added a comment -

            What exactly is reordered here? If it is a list in the Parquet schema, then the order must not change and this would indeed be a serious issue. However, I cannot see how that could happen. Could you explain in more detail from the Parquet point of view?
            yumwang Yuming Wang added a comment -

            It seems 1.12.0-SNAPSHOT works.

            gszadovszky Gabor Szadovszky added a comment -

            For me the issue is reproducible with the current parquet-mr master (1.12.0-SNAPSHOT).

            gszadovszky Gabor Szadovszky added a comment -

            The related Spark test generates 22 Parquet files. The first 11 are empty, meaning they contain no data. (I am not sure whether they are even valid this way.)

            The last 11 each contain a single value:

            $> ls *.parquet| while read file; do echo "$file"; parquet-tools cat $file 2>/dev/null; done
            part-00000-19f5b358-410b-4dd4-b167-4016984ac6ef-c000.snappy.parquet
            part-00000-212d052b-d03a-413b-98f3-1348c2d06855-c000.snappy.parquet
            part-00000-311f4442-4225-47f1-aaf1-c7a8e38a875f-c000.snappy.parquet
            part-00000-459612f9-d564-43a9-bf31-2d174c996fa6-c000.snappy.parquet
            part-00000-5e20cfa6-a5d0-4d5f-a382-741907a74874-c000.snappy.parquet
            part-00000-62881d28-7226-4a78-9fe7-2ed41b895e1c-c000.snappy.parquet
            part-00000-9aaa784f-080a-43ae-9296-20bd033aa300-c000.snappy.parquet
            part-00000-a01e81ab-a987-4929-991d-60f01acab1ca-c000.snappy.parquet
            part-00000-add0de8e-26eb-406b-bf02-702924f89f1a-c000.snappy.parquet
            part-00000-e8dd315d-b97e-4257-917c-34696d0a866c-c000.snappy.parquet
            part-00000-ed8be0d2-508f-4666-b66f-93182413472e-c000.snappy.parquet
            part-00001-20b63b66-8f9a-4e3b-893c-4acb106ddac1-c000.snappy.parquet
            a = 7
            
            part-00001-227ff83d-5341-48be-97be-00cde92cb303-c000.snappy.parquet
            a = 1
            
            part-00001-38e186bb-ca67-4e3d-87fe-780585f25c84-c000.snappy.parquet
            a = 0
            
            part-00001-3b06880b-6d57-49d7-bb63-4220092ef1ae-c000.snappy.parquet
            a = 4
            
            part-00001-449026a6-f486-4fca-81fa-b7cdeaddfa3b-c000.snappy.parquet
            a = 5
            
            part-00001-567ed849-b1e9-494f-b33f-495592826b28-c000.snappy.parquet
            a = 2
            
            part-00001-70fa8c7e-9b45-4103-a99e-5b0f61b6062a-c000.snappy.parquet
            a = 10
            
            part-00001-7399d477-c393-481b-b76f-1289deb72bc0-c000.snappy.parquet
            a = 3
            
            part-00001-93678ef9-27d4-4a5d-aaa1-58492de248e7-c000.snappy.parquet
            a = 6
            
            part-00001-c1b934d8-0058-40e0-87f9-40ee7eca52ed-c000.snappy.parquet
            a = 8
            
            part-00001-c599dd4d-32c8-4032-935a-b1d45bc508e1-c000.snappy.parquet
            a = 9
            

            So the parquet-mr library has nothing to do with the ordering of these values.

            yumwang Yuming Wang added a comment (edited) -

            We can disable parquet.page.write-checksum.enabled to work around this issue:
            https://github.com/apache/spark/pull/26804#discussion_r561044576
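            As a rough illustration of that workaround (not taken from the linked PR; the session setup and output path below are hypothetical), one way to pass the flag to the Parquet writer from a Spark job is via the Hadoop configuration:

            ```scala
            // Sketch of the suggested workaround: turn off Parquet page write checksums.
            // Assumes Spark forwards its Hadoop configuration to the parquet-mr writer;
            // the app name and output path are made up for the example.
            import org.apache.spark.sql.SparkSession

            object DisablePageChecksumExample {
              def main(args: Array[String]): Unit = {
                val spark = SparkSession.builder()
                  .appName("parquet-page-checksum-workaround")
                  .master("local[*]")
                  .getOrCreate()

                // The key discussed in this issue; false skips CRC computation for
                // data pages on the write path.
                spark.sparkContext.hadoopConfiguration
                  .setBoolean("parquet.page.write-checksum.enabled", false)

                // Write the same 0..10 range the failing test uses.
                spark.range(0, 11).toDF("a")
                  .write.mode("overwrite")
                  .parquet("/tmp/parquet-checksum-workaround")

                spark.stop()
              }
            }
            ```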

            githubbot ASF GitHub Bot added a comment -

            wangyum opened a new pull request #857:
            URL: https://github.com/apache/parquet-mr/pull/857

            Make sure you have checked all steps below.

            1. Jira
            • [ ] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR" https://issues.apache.org/jira/browse/PARQUET-XXX
            • In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).

            2. Tests
            • [ ] My PR adds the following unit tests _OR_ does not need testing for this extremely good reason:

            3. Commits
            • [ ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
              1. Subject is separated from body by a blank line
              2. Subject is limited to 50 characters (not including Jira issue reference)
              3. Subject does not end with a period
              4. Subject uses the imperative mood ("add", not "adding")
              5. Body wraps at 72 characters
              6. Body explains "what" and "why", not "how"

            4. Documentation
            • [ ] In case of new functionality, my PR adds documentation that describes how to use it.
            • All the public functions and the classes in the PR contain Javadoc that explain what it does

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on to GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment -

            wangyum closed pull request #857:
            URL: https://github.com/apache/parquet-mr/pull/857

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on to GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment -

            wangyum commented on pull request #857:
            URL: https://github.com/apache/parquet-mr/pull/857#issuecomment-764048376

            Sorry, this document is outdated: http://mail-archives.apache.org/mod_mbox/parquet-dev/201906.mbox/%3CJIRA.13233926.1558083819000.446944.1560276180713@Atlassian.JIRA%3E

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on to GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment -

            wangyum commented on pull request #857:
            URL: https://github.com/apache/parquet-mr/pull/857#issuecomment-767180694

            @bbraams I read an outdated document: http://mail-archives.apache.org/mod_mbox/parquet-dev/201906.mbox/%3CJIRA.13233926.1558083819000.446944.1560276180713@Atlassian.JIRA%3E

            Both the read and write flags default to false:
            ```
            Documentation

            The feature is feature flagged and is disabled by default. Both writing out checksums and
            verifying them on the read path can be turned on individually, via the following two new config
            flags:

            • `parquet.page.write-checksum.enabled` (default: false)
            • `parquet.page.verify-checksum.enabled` (default: false)
            ```
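            For completeness, a minimal sketch of toggling both flags explicitly on a Hadoop configuration (the plain string keys are the ones quoted above; whether a given reader or writer honors them depends on the parquet-mr version in use, and the helper object and method names here are made up for illustration):

            ```scala
            // Hypothetical helper: set both page-checksum flags on a Hadoop Configuration
            // before handing it to a Parquet writer or reader.
            import org.apache.hadoop.conf.Configuration

            object PageChecksumFlags {
              def configure(conf: Configuration, write: Boolean, verify: Boolean): Configuration = {
                conf.setBoolean("parquet.page.write-checksum.enabled", write)   // write path
                conf.setBoolean("parquet.page.verify-checksum.enabled", verify) // read path
                conf
              }

              def main(args: Array[String]): Unit = {
                val conf = configure(new Configuration(), write = false, verify = false)
                println(conf.get("parquet.page.write-checksum.enabled"))  // prints "false"
                println(conf.get("parquet.page.verify-checksum.enabled")) // prints "false"
              }
            }
            ```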

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on to GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            rok Rok Mihevc added a comment -

            This issue has been migrated to issue #2422 on GitHub. Please see the migration documentation for further details.


            People

              Assignee: Unassigned
              Reporter: yumwang Yuming Wang
              Votes: 0
              Watchers: 4