ORC-645

Timestamp between -1 and 0 seconds from UNIX epoch changes after write-read

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      When writing a timestamp between -1 and 0 seconds from the UNIX epoch, like this:

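      import java.sql.Timestamp;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.hive.ql.exec.vector.TimestampColumnVector;
      import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
      import org.apache.orc.OrcFile;
      import org.apache.orc.TypeDescription;
      import org.apache.orc.Writer;

      TypeDescription schema =
        TypeDescription.fromString("struct<x:timestamp>");
      Writer writer = OrcFile.createWriter(new Path("time-file.orc"),
                                           OrcFile.writerOptions(new Configuration())
                                            .setSchema(schema));
      VectorizedRowBatch batch = schema.createRowBatch();
      TimestampColumnVector x = (TimestampColumnVector) batch.cols[0];
      int row = batch.size++;

      // -238 ms from the epoch; this is supposed to be 1969-12-31 23:59:59.762
      x.set(row, new Timestamp(-238L));

      writer.addRowBatch(batch);
      writer.close();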
      

      and reading it back with pyarrow.orc:

      import pyarrow.orc as orc
      pdf = orc.ORCFile("time-file.orc").read().to_pandas()
      print(pdf)
      

      I get a value that is exactly one second later than the one written:

                                    x
      0 1970-01-01 00:00:00.762000000
      

       

      This is probably caused by the special handling of negative timestamps here:

      https://github.com/apache/orc/blob/fa9c011e13e8376d2a185bd76af834bd644f4332/java/core/src/java/org/apache/orc/impl/TreeReaderFactory.java#L1221-L1227

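      In this particular case millis is not < 0, so it is never reduced by MILLIS_PER_SECOND. The positive sub-second part (762 ms) is then added to a non-negative millis value, which yields a timestamp after the epoch instead of before it.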
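
      The arithmetic behind this can be sketched in plain Java. This is a simplified model and not the actual ORC code: the class name, both method names, and the exact form of the sign check are assumptions made for illustration. It assumes the value is split into whole seconds (truncated toward zero) plus a non-negative sub-second part, and that the reader only subtracts one second when the reconstructed millis value is negative.

```java
// Simplified model of the suspected round trip; NOT the actual ORC implementation.
class NegativeTimestampSketch {
    static final long MILLIS_PER_SECOND = 1000L;

    // Hypothetical writer side: whole seconds truncated toward zero,
    // sub-second part stored as non-negative nanoseconds.
    static long[] encode(long epochMillis) {
        long seconds = epochMillis / MILLIS_PER_SECOND;  // -238 / 1000 == 0 in Java
        long nanos = Math.floorMod(epochMillis, MILLIS_PER_SECOND) * 1_000_000L;
        return new long[] {seconds, nanos};
    }

    // Hypothetical reader side: compensate for truncation only when millis < 0.
    static long decodeWithSignCheck(long seconds, long nanos) {
        long millis = seconds * MILLIS_PER_SECOND;
        if (millis < 0 && nanos != 0) {
            millis -= MILLIS_PER_SECOND;
        }
        return millis + nanos / 1_000_000L;
    }

    public static void main(String[] args) {
        long[] a = encode(-238L);   // seconds = 0, so the sign check never fires
        System.out.println(decodeWithSignCheck(a[0], a[1]));  // prints 762

        long[] b = encode(-1238L);  // seconds = -1, so the sign check fires
        System.out.println(decodeWithSignCheck(b[0], b[1]));  // prints -1238
    }
}
```

      Under these assumptions, only values in the open interval (-1000 ms, 0 ms) are affected: their seconds component truncates to 0, so the sign of the original value is lost before the reader's check runs.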

       


            People

              Assignee: Unassigned
              Reporter: Devavret Makkar (devavret)