  1. ORC
  2. ORC-468

Fix incorrect documentation for nanoseconds stream encoding


Details

    • Type: Bug
    • Status: Open
    • Priority: Trivial
    • Resolution: Unresolved
    • None
    • None
    • documentation
    • None

    Description

      According to the ORC specification, "1000 nanoseconds would be serialized as 0x0b and 100000 would be serialized as 0x0d."
      However, the actual encoding results are formatNano(1000) = 0x0a and formatNano(100000) = 0x0c.

      How about changing the documentation as follows?

      "Because the number of nanoseconds often has a large number of trailing zeros, the number has trailing decimal zero digits removed when there are two or more of them, and the last three bits record how many zeros were removed, minus one. Thus 1000 nanoseconds would be serialized as 0x0a and 100000 would be serialized as 0x0c."
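
To make the proposed wording concrete from the reader's side, here is a minimal sketch of the inverse operation. The name decodeNano is hypothetical and is not taken from the ORC sources; it only assumes the layout described above, where the low three bits hold a zero-count field and the remaining bits hold the stripped digits. It needs C++14 or later because of the loop inside a constexpr function.

```cpp
#include <cstdint>

// Hypothetical helper, not part of ORC: reverses the encoding described above.
constexpr int64_t decodeNano(int64_t encoded) {
  // Low three bits: zero-count field. Remaining bits: the stripped digits.
  int64_t zeroField = encoded & 0x7;
  int64_t nanos = encoded >> 3;
  if (zeroField != 0) {
    // A non-zero field z means z + 1 trailing decimal zeros were removed.
    for (int64_t i = 0; i <= zeroField; ++i) {
      nanos *= 10;
    }
  }
  return nanos;
}

// Compile-time checks against the values quoted in this issue.
static_assert(decodeNano(0x0a) == 1000, "0x0a should decode to 1000 ns");
static_assert(decodeNano(0x0c) == 100000, "0x0c should decode to 100000 ns");
static_assert(decodeNano(0x50) == 10, "0x50 should decode to 10 ns");
```

Under this reading, the values currently in the spec text (0x0b and 0x0d) would decode to 10000 and 1000000 nanoseconds, not 1000 and 100000.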

      Below is my test and its output, which confirm the nanosecond encodings.

       

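      #include <cinttypes>
      #include <cstdint>
      #include <cstdio>

      // This is ORC's serialization code from ColumnWriter.cc; ORC encodes nanoseconds with this function.
      // https://github.com/apache/orc/blob/master/c%2B%2B/src/ColumnWriter.cc#L1669
      static int64_t formatNano(int64_t nanos) {
        if (nanos == 0) {
          return 0;
        } else if (nanos % 100 != 0) {
          // Fewer than two trailing zeros: keep the value as-is; the low three bits stay 0.
          return nanos << 3;
        } else {
          // Two or more trailing zeros: strip them and store (zeros removed - 1) in the low three bits.
          nanos /= 100;
          int64_t trailingZeros = 1;
          while (nanos % 10 == 0 && trailingZeros < 7) {
            nanos /= 10;
            trailingZeros += 1;
          }
          return nanos << 3 | trailingZeros;
        }
      }

      int main() {
        for (int nano = 1; nano <= 1000000; nano *= 10) {
          // formatNano returns a 64-bit value, so print it with PRIx64 rather than %x.
          std::printf("formatNano(%d) = 0x%02" PRIx64 "\n", nano,
                      static_cast<uint64_t>(formatNano(nano)));
        }
        return 0;
      }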
      

       

      The result:

      formatNano(1) = 0x08
      formatNano(10) = 0x50
      formatNano(100) = 0x09
      formatNano(1000) = 0x0a
      formatNano(10000) = 0x0b
      formatNano(100000) = 0x0c
      formatNano(1000000) = 0x0d
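
The output is easier to read once each byte is split into its two fields. The following is a small sketch under the assumption that the upper bits carry the remaining digits and the low three bits carry the zero-count field; NanoFields and splitEncoded are illustrative names introduced here, not ORC APIs.

```cpp
#include <cstdint>

// Illustrative type, not part of ORC: the two fields packed into an encoded value.
struct NanoFields {
  int64_t digits;  // value left after trailing zeros were stripped
  int zeroField;   // low three bits; 0 means nothing was stripped
};

// Illustrative helper, not part of ORC: unpacks an encoded nanosecond value.
constexpr NanoFields splitEncoded(int64_t encoded) {
  return NanoFields{encoded >> 3, static_cast<int>(encoded & 0x7)};
}

// 0x50: digits 10, field 0 (only one trailing zero, so nothing is stripped).
static_assert(splitEncoded(0x50).digits == 10 && splitEncoded(0x50).zeroField == 0, "");
// 0x09: digits 1, field 1 (two zeros removed from 100).
static_assert(splitEncoded(0x09).digits == 1 && splitEncoded(0x09).zeroField == 1, "");
// 0x0d: digits 1, field 5 (six zeros removed from 1000000).
static_assert(splitEncoded(0x0d).digits == 1 && splitEncoded(0x0d).zeroField == 5, "");
```

In every row of the output where stripping happens, the field is one less than the number of zeros removed, which is why 1000 yields 0x0a and not 0x0b.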

          People

            Assignee: Unassigned
            Reporter: Kova Tadahito Kobayashi