ORC-1754: Inconsistent parsing of large timestamps

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: C++, Java
    • Labels: None

    Description

      1.

      When reading https://softwareheritage.s3.amazonaws.com/graph/2024-05-16/orc/revision/revision-17bb49bf-e9e4-48b0-9fbb-94fab8116ba3.orc:

      Java: 54798848-02-22 03:08:48.0

      $ java -jar java/tools/orc-tools-2.1.0-SNAPSHOT-uber.jar data revision-17bb49bf-e9e4-48b0-9fbb-94fab8116ba3.orc | grep -a 0f3d534237885f58cb336b112ea5af5aaa98f50f
      [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      Processing data file /srv/softwareheritage/ssd/data/vlorentz/datasets/2024-05-16/orc/revision/revision-17bb49bf-e9e4-48b0-9fbb-94fab8116ba3.orc [length: 4211676881]
      
      {"id":"0f3d534237885f58cb336b112ea5af5aaa98f50f","message":[55,50,48,53,55,53,57,52,48,51,55,57,50,55,57,51,53],"author":[146,112,134,90,24,255,249,220,192,75,57,71,34,58,209,120,99,150,89,242,82,160,236,151,3,92,229,36,207,251,193,34],"date":"54798848-02-22 03:08:48.0","date_offset":0,"date_raw_offset_bytes":[45,48,48,48,48],"committer":[146,112,134,90,24,255,249,220,192,75,57,71,34,58,209,120,99,150,89,242,82,160,236,151,3,92,229,36,207,251,193,34],"committer_date":"54798848-02-22 03:08:48.0","committer_offset":0,"committer_date_raw_offset_bytes":[45,48,48,48,48],"directory":"756fbb086f07dc00cb51c9ae8cb9054e38b012e4","type":"git","raw_manifest":null}
      

      C++: -2011551072-05-31 3.0 (note it's a negative year)

      $ ./tools/src/orc-contents revision-17bb49bf-e9e4-48b0-9fbb-94fab8116ba3.orc --columnNames id,date | grep -a 0f3d534237885f58cb336b112ea5af5aaa98f50f
      
      {"id": "0f3d534237885f58cb336b112ea5af5aaa98f50f", "date": "-2011551072-05-31 3.0"}
      

      (and for comparison, orc-rust reads 72057594037927935 seconds since the Unix epoch)
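
One possible reading of the Java result, offered as a sketch rather than a confirmed diagnosis: 72057594037927935 equals 2^56 - 1, and that many seconds converted to milliseconds does not fit in a signed 64-bit integer. The Python below emulates the wraparound. `wrap_int64` is a hypothetical helper, and the assumption that the Java reader multiplies seconds by 1000 in a `long` has not been checked against the ORC source.

```python
# Assumption (not verified in the ORC source): seconds are converted to
# milliseconds using signed 64-bit arithmetic, which wraps on overflow.
INT64_MODULUS = 1 << 64


def wrap_int64(value: int) -> int:
    """Reduce an arbitrary Python int to a two's-complement signed 64-bit value."""
    value %= INT64_MODULUS
    return value - INT64_MODULUS if value >= (1 << 63) else value


seconds = 72057594037927935  # value reported by orc-rust for this row
assert seconds == 2**56 - 1

# The true product needs more than 64 bits, so the wrapped result is negative.
millis = wrap_int64(seconds * 1000)

# Rough distance from 1970, using 365.25-day years as an approximation.
JULIAN_YEAR_SECONDS = 31_557_600
years_before_epoch = -millis // 1000 // JULIAN_YEAR_SECONDS

print(millis)
print(years_before_epoch)
```

Under that assumption the wrapped value lies about 54.8 million years before 1970, which is the same magnitude as the year Java prints if the year is shown without its era.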

      2.

      When reading https://softwareheritage.s3.amazonaws.com/graph/2024-05-16/orc/revision/revision-0c45576a-59f7-48d1-a9a8-2e5c64098905.orc:

      Java: 1969-12-31 23:59:59.0

      $ java -jar java/tools/orc-tools-2.1.0-SNAPSHOT-uber.jar data /revision-0c45576a-59f7-48d1-a9a8-2e5c64098905.orc | grep -a 114514a0c8259cd395b4e16acd78a89c2e317f5f
      [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      Processing data file /srv/softwareheritage/ssd/data/vlorentz/datasets/2024-05-16/orc/revision/revision-0c45576a-59f7-48d1-a9a8-2e5c64098905.orc [length: 4173056395]
      
      {"id":"114514a0c8259cd395b4e16acd78a89c2e317f5f","message":[226,128,139,10],"author":[51,46,124,25,190,5,4,86,248,30,103,45,127,192,195,26,40,109,46,255,255,148,71,50,157,196,97,226,145,134,70,42],"date":"1969-12-31 23:59:59.0","date_offset":0,"date_raw_offset_bytes":[43,48,48,48,48],"committer":[51,46,124,25,190,5,4,86,248,30,103,45,127,192,195,26,40,109,46,255,255,148,71,50,157,196,97,226,145,134,70,42],"committer_date":"1969-12-31 23:59:59.0","committer_offset":0,"committer_date_raw_offset_bytes":[43,48,48,48,48],"directory":"0150c5877b0b1872b5c9e9bc06fdae2da8d087d8","type":"git","raw_manifest":null}
      

      C++: 219250468-04-01 15:.0

      $ ./tools/src/orc-contents revision-0c45576a-59f7-48d1-a9a8-2e5c64098905.orc --columnNames id,date | grep -a 114514a0c8259cd395b4e16acd78a89c2e317f5f
      
      {"id": "114514a0c8259cd395b4e16acd78a89c2e317f5f", "date": "219250468-04-01 15:.0"}
      

      (and for comparison, orc-rust reads 72057594037927935 seconds since the Unix epoch)

      3.

      When reading https://softwareheritage.s3.amazonaws.com/graph/2024-05-16/orc/revision/revision-dd3b23d4-baa8-48c9-8e22-22e65747a67b.orc:

      Java: 122821817-02-07 14:30:11.0

      $ java -jar java/tools/orc-tools-2.1.0-SNAPSHOT-uber.jar data revision-dd3b23d4-baa8-48c9-8e22-22e65747a67b.orc | grep -a b36a8bbcc8816993d55b69db4e0f74bcf1086243
      [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      Processing data file /srv/softwareheritage/ssd/data/vlorentz/datasets/2024-05-16/orc/revision/revision-dd3b23d4-baa8-48c9-8e22-22e65747a67b.orc [length: 4170501916]
      
      {"id":"b36a8bbcc8816993d55b69db4e0f74bcf1086243","message":[73,32,99,111,109,101,32,102,114,111,109,32,116,104,101,32,102,117,116,117,114,101,44,32,119,97,116,99,104,32,111,117,116,32,102,111,114,32,102,114,111,103,115,33,33,10],"author":[110,182,117,195,11,144,5,13,255,194,120,63,34,37,190,241,91,85,98,247,119,116,175,16,57,25,130,209,18,22,227,174],"date":"122821817-02-07 14:30:11.0","date_offset":60,"date_raw_offset_bytes":[43,48,49,48,48],"committer":[110,182,117,195,11,144,5,13,255,194,120,63,34,37,190,241,91,85,98,247,119,116,175,16,57,25,130,209,18,22,227,174],"committer_date":"122821817-02-07 14:30:11.0","committer_offset":60,"committer_date_raw_offset_bytes":[43,48,49,48,48],"directory":"9bfbc4ca2661b45dfe5e39092e027280c829d772","type":"git","raw_manifest":null}
      

      C++: 1623969404-47-32566.0 (whatever this means)

      $ ./tools/src/orc-contents revision-dd3b23d4-baa8-48c9-8e22-22e65747a67b.orc --columnNames id,date | grep -a b36a8bbcc8816993d55b69db4e0f74bcf1086243
      
      {"id": "b36a8bbcc8816993d55b69db4e0f74bcf1086243", "date": "1623969404-47-32566.0"}
      

      (and for comparison, orc-rust reads 999999999999999999 seconds since the Unix epoch, though that might itself be a bug in orc-rust caused by going through a Decimal type)
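
The year Java prints for this row is consistent with a signed 64-bit overflow during a seconds-to-milliseconds conversion. The following is a sketch under that assumption, not something verified in the ORC Java reader. `wrap_int64` is a hypothetical helper, and the average Gregorian year length is used only as an approximation.

```python
# Assumption (not verified in the ORC source): seconds * 1000 is computed
# in a signed 64-bit integer and silently wraps.
def wrap_int64(value: int) -> int:
    """Reduce an arbitrary Python int to a two's-complement signed 64-bit value."""
    value %= 1 << 64
    return value - (1 << 64) if value >= (1 << 63) else value


seconds = 999_999_999_999_999_999  # value reported by orc-rust for this row

# The exact product is about 1e21, well beyond the signed 64-bit range.
millis = wrap_int64(seconds * 1000)

# Estimate the calendar year with the average Gregorian year (365.2425 days).
GREGORIAN_YEAR_SECONDS = 31_556_952
approx_year = 1970 + millis // 1000 // GREGORIAN_YEAR_SECONDS

print(millis)
print(approx_year)
```

The estimated year comes out as 122821817, the year in the Java output above. The C++ output is not explained by this arithmetic.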

People

    • Assignee: Unassigned
    • Reporter: Valentin Lorentz (progval)