HIVE-3874

Create a new Optimized Row Columnar file format for Hive

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11.0
    • Labels: None

      Description

      There are several limitations of the current RC File format that I'd like to address by creating a new format:

      • each column value is stored as a binary blob, which means:
        • the entire column value must be read, decompressed, and deserialized
        • the file format can't use smarter type-specific compression
        • push down filters can't be evaluated
      • the start of each row group needs to be found by scanning
      • user metadata can only be added to the file when the file is created
      • the file doesn't store the number of rows per file or row group
      • there is no mechanism for seeking to a particular row number, which is required for external indexes
      • there is no mechanism for storing lightweight indexes within the file to enable push-down filters to skip entire row groups
      • the type of the rows isn't stored in the file
      Attachments

      1. HIVE-3874.D8529.4.patch (745 kB, Phabricator)
      2. HIVE-3874.D8871.1.patch (12 kB, Phabricator)
      3. HIVE-3874.D8529.3.patch (741 kB, Phabricator)
      4. HIVE-3874.D8529.2.patch (740 kB, Phabricator)
      5. HIVE-3874.D8529.1.patch (735 kB, Phabricator)
      6. hive.3874.2.patch (666 kB, Namit Jain)
      7. orc.tgz (49 kB, Owen O'Malley)
      8. OrcFileIntro.pptx (1.10 MB, Owen O'Malley)


          Activity

          Namit Jain added a comment -

          Owen, will it also be more compact overall?
          Will it be possible to perform dictionary encoding for strings? Some initial experiments we performed on top of RCFile suggested that dictionary encoding for low-cardinality columns gives very good overall space savings for some of our top tables.

          It would be useful to have an initial patch/doc out ASAP, so that we can also play around with it.

          Owen O'Malley added a comment -

          Namit,
          Yes, it has dictionary encoding for strings. The dictionary enables both better compression and makes push down filters much more efficient. The dictionaries are local to only the row group, so that row groups can be processed independently of each other. Currently, strings are always dictionary encoded, but it would make sense to allow the writer to pick whether the column should be encoded directly or using a dictionary.
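
           A minimal sketch of the per-row-group dictionary idea described above (hypothetical code, not the actual ORC writer): each group keeps its own dictionary, so groups stay independently decodable.

             import java.util.ArrayList;
             import java.util.LinkedHashMap;
             import java.util.List;
             import java.util.Map;

             // Hypothetical sketch of dictionary encoding scoped to a row group;
             // the real ORC writer differs in layout and flushing.
             class GroupDictionaryEncoder {
               private final Map<String, Integer> dictionary = new LinkedHashMap<>();
               private final List<Integer> encoded = new ArrayList<>();

               void add(String value) {
                 // First sighting assigns the next id; repeats reuse it.
                 Integer id = dictionary.computeIfAbsent(value, k -> dictionary.size());
                 encoded.add(id);
               }

               void endRowGroup() {
                 // Flush dictionary + ids to the stripe (elided), then reset so
                 // the next row group gets a fresh, local dictionary.
                 dictionary.clear();
                 encoded.clear();
               }
             }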

          Vinod Kumar Vavilapalli added a comment -

          +100 !

          He Yongqiang added a comment -

          Will this optimized format support backward compatibility? If it's backward compatible, it will be easier to deploy. New formats without backward compatibility are really a headache, especially when you have a need to convert old data.

          Vinod Kumar Vavilapalli added a comment -

          Bumping up the version number for ORC and transparently forwarding old data to the current file format should work, no?

          He Yongqiang added a comment -

          That should work. I just want to make sure they have similar APIs, so other tools/utilities will automatically work or need only small changes. One example is the block merger.

          Namit Jain added a comment -

          Can the index be made optional? In our typical use-case, the old data is hardly queried - so we are willing to trade off CPU, and not support skipping rows for old data, to save some space.

          These are not v1 requirements, but might be good to have.

          Namit Jain added a comment -

          Owen O'Malley, what are your thoughts on Trevni? From the ppt, ORC strictly looks better than Trevni.
          Should we focus more on ORC in that case?

          Owen O'Malley added a comment -

          He Yongqiang, the APIs to the two formats are significantly different. It would be possible to extend the RCFile reader to recognize an ORC file and to have it delegate to the ORC File reader.

          The other direction (having the ORC file reader parse an RCFile) isn't possible, because ORC provides operations that would be very expensive or impossible to implement in RCFile.

          One concern with making the RCFile reader delegate to the ORC file reader is that RCFile returns binary values that are interpreted by the serde while in ORC deserialization happens in the reader. Therefore, either the adaptor would need to re-serialize the data or would require changes in the serde as well.
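
           A hedged sketch of the delegation idea (illustrative only, not Hive code): the adaptor would sniff the file's leading magic bytes and route to the matching reader. The "ORC" magic value here is an assumption for illustration.

             import java.io.IOException;
             import java.io.InputStream;
             import java.nio.charset.StandardCharsets;
             import java.util.Arrays;

             // Illustrative only: pick a reader by the file's leading magic
             // bytes. The magic value is an assumption, not a confirmed constant.
             class FormatSniffer {
               static String detect(InputStream in) throws IOException {
                 byte[] head = new byte[3];
                 int n = in.read(head);
                 if (n == 3 && Arrays.equals(head,
                     "ORC".getBytes(StandardCharsets.US_ASCII))) {
                   return "orc";     // delegate to the ORC reader
                 }
                 return "rcfile";    // fall back to the existing RCFile path
               }
             }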

          Owen O'Malley added a comment -

          Namit, I obviously did consider Trevni, but it didn't support some of the features that I wanted:

          • using the Hive type model
          • more advanced encodings like dictionaries
          • the ability to support push down predicates for skipping row groups
          • running compression in block mode rather than streaming so that the reader can skip entire compression blocks
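
           To make the last point concrete, here is a minimal sketch of block-mode compression (an assumed layout, not the actual ORC stream format): each chunk is compressed independently and prefixed with its compressed length, so a reader can seek past whole chunks without decompressing them.

             import java.io.ByteArrayOutputStream;
             import java.io.DataOutputStream;
             import java.io.IOException;
             import java.util.zip.Deflater;

             // Assumed layout for illustration: [int length][compressed bytes]
             // per block, letting a reader skip a block by seeking past it.
             class BlockCompressedWriter {
               private final DataOutputStream out;

               BlockCompressedWriter(DataOutputStream out) {
                 this.out = out;
               }

               void writeBlock(byte[] raw) throws IOException {
                 Deflater deflater = new Deflater();
                 deflater.setInput(raw);
                 deflater.finish();
                 ByteArrayOutputStream compressed = new ByteArrayOutputStream();
                 byte[] buffer = new byte[4096];
                 while (!deflater.finished()) {
                   compressed.write(buffer, 0, deflater.deflate(buffer));
                 }
                 deflater.end();
                 out.writeInt(compressed.size()); // length header enables skipping
                 compressed.writeTo(out);
               }
             }
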
          Namit Jain added a comment -

          What I meant was, for a pure Hive user (who does not inherit data from anywhere else), is there any advantage of Trevni over ORC?

          Owen O'Malley added a comment -

          Namit, for pure Hive users there aren't any advantages of Trevni over ORC.

          Doug Cutting added a comment -

          Owen, did you consider proposing improvements to Trevni instead?

          Addressing your four points of distinction with Trevni:

          • How is Trevni's type model incompatible with Hive? Is this irreparable?
          • Might dictionaries be somehow added to Trevni?
          • What sort of indexes are required in addition to those that Trevni supports, where the initial value of every block may be stored before all the blocks, permitting random access by value to the blocks? If something different is required, might that be added to Trevni?
          • Trevni uses relatively small compression blocks (~64k) that may be skipped. How would block mode substantially improve this? If it would, might this change be made to Trevni?

          Thanks!

          He Yongqiang added a comment -

          It would be possible to extend the RCFile reader to recognize an ORC file and to have it delegate to the ORC File reader.

          It will be great to have this support. In this case, what's the file format for the partition/table: rcfile or orcfile?

          When we did the conversion of old data from SequenceFile to RCFile a long time ago, it was a big headache to handle errors like "unrecognized fileformat or corruption" because there is no interoperability between the two formats. Most of the errors we saw were because the table/partition format did not match the actual data format.

          Two examples:
          1. an old partition's data is in rcfile, a new partition's data is in orc format.
          2. within one partition, some files are rcfile, and some files are in orc format.

          Doug Cutting added a comment -

          A way to proceed with improvements to Trevni might be, once HIVE-3585 is committed, propose a patch to Trevni together with a Hive benchmark that illustrates its advantage. Then we could quantitatively demonstrate the advantage of each proposed improvement. With HIVE-3585 we should be able to quantitatively demonstrate advantages of Trevni as-is over RCFile and SequenceFile.

          Russell Jurney added a comment -

          He had told us this work must go in contrib. See HIVE-3585

          He Yongqiang added a comment -

          I want to list a few thoughts on why I think the ORC solution is a much more appealing one.

          1. For a BIG data warehouse that stores more than 90% of its existing data in RCFile (like FB's >100PB warehouse), data conversion from one format to another is something that definitely should be avoided. It is possible to convert some tables if there is a big space-saving advantage. But managing two distinct formats which have no compatibility or interoperability, or which even live in two different code repositories, is another big headache that I would avoid in the first place.
          2. Developing the new ORC format in the hive/hcatalog codebase will make Hive development/operations much easier.
          3. Letting the new ORC format have some backward compatibility with RCFile will save a lot of trouble.

          Vinod Kumar Vavilapalli added a comment -

          Can the index be made optional? In our typical use-case, the old data is hardly queried - so we are willing to trade off CPU, and not support skipping rows for old data, to save some space.

          The way I understand it, index creation can be specified at creation time, so it can be made optional. To start with, we may in fact have no indices and then add them later.

          Namit Jain added a comment -

          In partition metadata, can I somehow specify - don't create index?
          When that partition is over-written, the index would disappear.

          I agree, we can think about that later.

          Owen O'Malley added a comment -

          Namit, I'm using the table properties to manage the other features like compression, so I would probably make a table property like 'orc.create.index' or something. Would that make sense?

          I should note that the indexes are very light. In a sample file:

          • uncompressed text: 370 MB
          • compressed ORC: 86 MB
          • row index in ORC: 140 KB
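
           A hedged sketch of how such a property might be consulted, assuming the name 'orc.create.index' sticks; shouldBuildRowIndex is a hypothetical helper, not real ORC code.

             import java.util.Properties;

             // Hypothetical helper: read the assumed "orc.create.index" table
             // property, defaulting to true since the index is so small.
             public class OrcIndexOption {
               static boolean shouldBuildRowIndex(Properties tableProperties) {
                 return Boolean.parseBoolean(
                     tableProperties.getProperty("orc.create.index", "true"));
               }

               public static void main(String[] args) {
                 Properties tblProps = new Properties();
                 tblProps.setProperty("orc.create.index", "false"); // suppress index
                 System.out.println(shouldBuildRowIndex(tblProps)); // prints false
               }
             }
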
          Owen O'Malley added a comment -

          Doug, of course Trevni could be modified arbitrarily to match the needs of Hive. But Hive will benefit more if there is a deep integration between the file format and the query engine. Both HBase and Accumulo have file formats that were originally based on Hadoop's TFile. But the need for integration with the query engine was such that their projects were better served by having the file format in their project rather than an upstream project.

          Of course the Avro project is free to copy any of the ORC code into Trevni, but Hive has the need to innovate in this area without asking Avro to make changes and waiting for them to be released.

          Sambavi Muthukrishnan added a comment -

          Owen: is the row group expected to be about 250 MB (per stripe size)? Does your implementation attempt to make every row group align with HDFS block size so a split = 1 block?

          Also: do you have an ETA for an initial patch? We would really like to try this out - we have some additional ideas that we would like to try out on top of this.

          Owen O'Malley added a comment -

          Sambavi, I should have a patch ready next week. Yes, the row groups (stripes) are 250MB by default. I currently set the HDFS block size for the files to 2 times the stripe size, but I don't try to align them other than that.

          Namit Jain added a comment -

          Owen, a table property like 'orc.create.index' would be good.

          Joydeep Sen Sarma added a comment -

          A couple of observations:

          • One use case mentioned is external indices. But in my experience, secondary index pointers have little correlation with the primary key ordering. If the use case is to speed up secondary index lookups, then one will be forced to consider smaller row groups. At that point this starts breaking down - large row groups are good for scanning and compression, but poor for lookups.

          A possible way out is a two-level structure - stripes or chunks as the unit of compression (column dictionaries maintained at this level), but a smaller unit for row groups (a single 250MB chunk has many smaller row groups, all encoded using a common dictionary). This can give a good balance of compression and lookup capabilities.

          At this point - I believe - we are closer to an HFile data structure, and I think converging HFile* so it works well for Hive would be a great goal. A lot of people would benefit from letting HBase do indexing and letting Hive/Hadoop chomp on HBase-produced HFiles.

          • Another use case mentioned is pruning based on column ranges. Once again, these use cases typically only benefit columns whose values are correlated with the primary row order. Timestamps, and anything correlated with timestamps, do benefit - but others don't. In systems like Netezza this is used as a substitute for partitioning.

          The issue is that pruning at the block level is not enough, because one has already generated a large number of splits for MR to chomp on. And a large number of splits makes processing really slow, even if everything is pruned out inside each mapper. Unless that issue is addressed, most users would end up repartitioning their data (using Hive's dynamic partitioning) based on column values, and the whole column-range mechanism would largely go unused.

          Yin Huai added a comment -

          One question: why does a small row group size have to be used if we want to speed up secondary index lookups? When using a large row group size, if we store a column in multiple blocks, with an index of this column in this large row group, we do not need to read the entire column from disk. Also, we can use row numbers to locate which blocks should be read from the other columns in the row group.

          If a small row group size is used, the size of a single column can be very small, and a single buffered read may retrieve lots of unnecessary data from unneeded columns.

          Owen O'Malley added a comment -

          Joydeep, I've used a two level strategy:

          • large stripes (default 250MB) to enable large efficient reads
          • relatively frequent row index entries (default 10k rows) to enable skipping within a stripe

          The row index entries have the locations within each column to enable seeking to the right compression block and byte within the decompressed block.

          I obviously did consider HFile, although from a practical point of view it is fairly embedded within HBase. Additionally, since it treats each of the columns as bytes it can't do any type-specific encodings/compression and can't interpret the column values, which is critical for performance.

          Once you have the ability to skip large sets of rows based on the filter predicates, you can sort the table on the secondary keys and achieve a large speed up. For example, if your primary partition is transaction date, you might want to sort the table on state, zip, and last name. Then if you are looking for just the records in CA, the reader won't need to read the records for the other states.
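
           A hedged sketch of the skipping this enables (illustrative, not the ORC reader API): with per-group min/max statistics, a reader can bypass any row group whose range cannot contain the filter value.

             // Illustrative only: min/max-based row-group elimination for an
             // equality predicate on a sorted string column such as state.
             class StringRangeFilter {
               static boolean canSkipGroup(String value, String min, String max) {
                 return value.compareTo(min) < 0 || value.compareTo(max) > 0;
               }

               public static void main(String[] args) {
                 // A group covering states 'CO'..'GA' cannot contain 'CA': skip it.
                 System.out.println(canSkipGroup("CA", "CO", "GA")); // true
               }
             }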

          Owen O'Malley added a comment -

          Yin, large stripes (and I'm defaulting to 250MB) enable efficient reads from HDFS. The row indexes help address the issue of the large stripes by providing the offsets within the large stripes.

          Owen O'Malley added a comment -

          Here's the current version of the code. The seek to row isn't implemented and it is still a standalone project, but it will let people start looking at it.

          Owen O'Malley added a comment -

          I've updated the patch with the index suppression option that Namit asked for.

          Owen O'Malley added a comment -

          I've fixed some bugs.

          Alan Gates added a comment -

          Letting new ORC format have some backward compatibility with RCFile will save a lot of trouble.

          He Yongqiang, I'm curious what trouble this would solve. Hive handles the case where different partitions in a table have different formats. Rather than trying to integrate these two formats at the InputFormat/SerDe level it makes more sense to me to test that when a table has some partitions in RC and some in ORC reads across the partitions still works.

          Namit Jain added a comment -

          Owen O'Malley, do you want to get the patch into a compilable state in contrib?
          That way, we can work on getting it in, and continue development over there.

          Namit Jain added a comment -

          Alan Gates, Ashutosh Chauhan, what do you think?

          Kevin Wilfong added a comment -

          +1 to Namit's suggestion

          Owen O'Malley added a comment -

          Namit Jain, I've got one more feature that I'm working on (seek to row) and then I'll make a patch. I'm aiming to upload the patch on Friday.

          Namit Jain added a comment -

          I took a stab at it. I am attaching it just in case - feel free to ignore it.
          I was not able to get the protocol buffer file auto-generated from ant, so I manually generated it for the purpose of this patch.

          Carl Steinbach added a comment -

          @Namit: The only advantage I see to putting ORC in contrib is that it will make it harder for people to use. Why are people opposed to adding this to the serde module?

          Kevin Wilfong added a comment -

          The reason I supported the idea was that I was hoping this would get it into the repo sooner. Based on my experiences trying it so far, it seems a little unstable, but I would like to help fix the issues. Getting the code in contrib would make it easier for other contributors to provide fixes as we develop them, without suggesting to users that it is as solid as any other piece of code in Hive (relatively speaking of course). I assumed it would be pulled into the serde module after this (short) period of cleanup.

          If people are willing to pull this into the serde module with the knowledge of that instability and that people would be working to fix it, I'd be happy with that too.

          Carl Steinbach added a comment -

          @Kevin: I think there will be more than enough time to stabilize this feature on trunk before the next release. I also think putting this code in the serde module and adding a release note explaining any stability issues is preferable to the alternative which will require users to update all of their scripts when ORC is moved from contrib to serde.

          Kevin Wilfong added a comment -

          @Carl: In that case, I don't see any reason to put it in contrib.

          Kevin Wilfong added a comment -

          @Owen: Regarding some of the issues I've seen:
          1) In the add method in DynamicByteArray, the line which updates remaining seems a little off, and it causes an NPE if newLength > chunkSize / 2.
          I think it should be remaining -= size.
          2) I had trouble reading a column of only null values; I saw division-by-zero exceptions in a couple of methods of DynamicByteArray.
          I wrote up a possible fix here https://reviews.facebook.net/D8361 but I'm not sure if it's the right fix.

          If you want to wait, I can file formal JIRAs for these later instead.
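
           For readers following along, a hedged reconstruction of the invariant behind fix 1 (the real DynamicByteArray code may differ): after copying bytes into the current chunk, the free-space counter must shrink by the bytes actually written, i.e. remaining -= size.

             import java.util.ArrayList;
             import java.util.List;

             // Hypothetical chunked buffer illustrating the accounting invariant;
             // not the actual DynamicByteArray implementation.
             class ChunkedBytes {
               private static final int CHUNK_SIZE = 8;
               private final List<byte[]> chunks = new ArrayList<>();
               private int remaining = 0; // free bytes left in the last chunk

               void add(byte[] src, int offset, int length) {
                 int pos = offset;
                 int left = length;
                 while (left > 0) {
                   if (remaining == 0) {
                     chunks.add(new byte[CHUNK_SIZE]);
                     remaining = CHUNK_SIZE;
                   }
                   byte[] last = chunks.get(chunks.size() - 1);
                   int size = Math.min(left, remaining);
                   System.arraycopy(src, pos, last, CHUNK_SIZE - remaining, size);
                   remaining -= size; // consume exactly what was copied
                   pos += size;
                   left -= size;
                 }
               }
             }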

          Namit Jain added a comment -

          @Carl, I am OK with putting this in the serde.
          We can even add a new config, which clearly says that this is a work in progress type of thing.
          The only reason for contrib was to suggest that this is not stable enough.

          Kevin Wilfong added a comment -

          @Owen: Here's a couple more issues I ran into, and again I can file JIRAs for these later once the code is checked in.

          Incorrect deserialization of doubles (leads to a lot of NaNs)
          https://reviews.facebook.net/D8379

          Strings are written incorrectly when they span two chunks of a DynamicByteArray
          E.g., if the original string is 'abcdefghi', the string written in the ORC file may be 'abcdefabc'.
          https://reviews.facebook.net/D8385

          Owen O'Malley added a comment -

          Thanks for the bug fixes, Kevin. I pushed the DynamicByteArray and double serialization fixes to github. I have the null column problem fixed, but it is tied into my other changes on my row-seek dev branch. I hope to finish up the row-seek today, and then I'll merge it into master and make the patch putting it into Hive.

          Owen O'Malley added a comment -

          I've pushed the current version up to github with the seek to record implemented. Does it make more sense to put ORC into serde or ql? RCFile is in ql, so I'd assumed it would go there. Thoughts?

          Namit Jain added a comment -

          ql is fine.

          Vinod Kumar Vavilapalli added a comment -

          Would it make sense to create a (very) temporary svn branch for capturing various bug fixes from (possibly) different contributors on sub-JIRAs?

          Kevin Wilfong added a comment -

          Thanks for merging my fixes, Owen. Any update on uploading a patch here?

          Owen O'Malley added a comment -

          Kevin, I had some distractions at work, but I should get the patch uploaded today.

          Phabricator added a comment -

          omalley requested code review of "HIVE-3874 [jira] Create a new Optimized Row Columnar file format for Hive".

          Reviewers: JIRA

          HIVE-3874. Create ORC File format.

          There are several limitations of the current RC File format that I'd like to address by creating a new format:

          each column value is stored as a binary blob, which means:

          the entire column value must be read, decompressed, and deserialized
          the file format can't use smarter type-specific compression
          push down filters can't be evaluated

          the start of each row group needs to be found by scanning
          user metadata can only be added to the file when the file is created
          the file doesn't store the number of rows per a file or row group
          there is no mechanism for seeking to a particular row number, which is required for external indexes.
          there is no mechanism for storing light weight indexes within the file to enable push-down filters to skip entire row groups.
          the type of the rows aren't stored in the file

          TEST PLAN
          EMPTY

          REVISION DETAIL
          https://reviews.facebook.net/D8529

          AFFECTED FILES
          build.properties
          build.xml
          ivy/libraries.properties
          ql/build.xml
          ql/ivy.xml
          ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/BitFieldReader.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/BitFieldWriter.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/BooleanColumnStatistics.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/ColumnStatistics.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/ColumnStatisticsImpl.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/CompressionCodec.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/CompressionKind.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/DoubleColumnStatistics.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/DynamicByteArray.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/DynamicIntArray.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/FileDump.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/InStream.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/IntegerColumnStatistics.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/OrcFile.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/OrcInputFormat.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/OrcOutputFormat.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/OrcSerde.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/OrcStruct.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/OrcUnion.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/OutStream.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/PositionProvider.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/PositionRecorder.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/PositionedOutputStream.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/Reader.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/ReaderImpl.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/RecordReader.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/RecordReaderImpl.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/RedBlackTree.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/RunLengthByteReader.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/RunLengthByteWriter.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/RunLengthIntegerReader.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/RunLengthIntegerWriter.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/SerializationUtils.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/SnappyCodec.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/StreamName.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/StringColumnStatistics.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/StringRedBlackTree.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/StripeInformation.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/Writer.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/WriterImpl.java
          ql/src/java/org/apache/hadoop/hive/ql/orc/ZlibCodec.java
          ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestBitFieldReader.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestDynamicArray.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFileDump.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInStream.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcStruct.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRunLengthByteReader.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRunLengthIntegerReader.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestSerializationUtils.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestStreamName.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestStringRedBlackTree.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestZlib.java
          ql/src/test/resources/orc-file-dump.out


          To: JIRA, omalley

          Phabricator added a comment -

          kevinwilfong has commented on the revision "HIVE-3874 [jira] Create a new Optimized Row Columnar file format for Hive".

          A couple of minor style comments, according to the style guide https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-CodingConvention :

          There are a number of places in the code where you are missing spaces around + operators (e.g. line 58 in DynamicByteArray), you're missing a space between for and ( (e.g. line 63 in DynamicByteArray), and you're missing a space before a : in a for-each loop (e.g. line 191 in OrcStruct).

          Mentioning these now as I don't want them to hold up a commit later.

          INLINE COMMENTS
          ql/src/java/org/apache/hadoop/hive/ql/orc/OrcInputFormat.java:149-151 Is this loop necessary? result is a boolean array so all of these entries will default to false anyway
          ql/src/java/org/apache/hadoop/hive/ql/orc/OutStream.java:136-140 I'm a little confused by this, if compressed is null, why aren't you initializing overflow as well?
          ql/src/java/org/apache/hadoop/hive/ql/orc/OrcStruct.java:307 I saw issues with this, and TypeInfoUtils expecting array instead of list.
          ql/src/java/org/apache/hadoop/hive/ql/orc/WriterImpl.java:561-562 As far as I can tell, by storing the intermediate string data in these structures which do not write to a stream until writeStripe is called, the size of string columns is not being accounted for at all when determining whether or not to write out the stripe. (This could be fixed as a follow up)

          REVISION DETAIL
          https://reviews.facebook.net/D8529

          To: JIRA, omalley
          Cc: kevinwilfong

          Phabricator added a comment -

          kevinwilfong has commented on the revision "HIVE-3874 [jira] Create a new Optimized Row Columnar file format for Hive".

          lso

          INLINE COMMENTS
          ql/src/java/org/apache/hadoop/hive/ql/orc/BitFieldReader.java:18 The package name doesn't match the directory structure. This doesn't seem to be causing the build to fail, but in Eclipse it shows up as an error. Could you adjust either the package name or the directory structure so they match.

          REVISION DETAIL
          https://reviews.facebook.net/D8529

          To: JIRA, omalley
          Cc: kevinwilfong

          Phabricator added a comment -

          kevinwilfong has commented on the revision "HIVE-3874 [jira] Create a new Optimized Row Columnar file format for Hive".

          *Ignore the "lso"

          REVISION DETAIL
          https://reviews.facebook.net/D8529

          To: JIRA, omalley
          Cc: kevinwilfong

          Phabricator added a comment -

          njain has commented on the revision "HIVE-3874 [jira] Create a new Optimized Row Columnar file format for Hive".

Do you want to have a simple test for HIVE-4015 as part of this patch?
It doesn't need to be exhaustive, just a very simple select.

          REVISION DETAIL
          https://reviews.facebook.net/D8529

          To: JIRA, omalley
          Cc: kevinwilfong, njain

          Phabricator added a comment -

          omalley updated the revision "HIVE-3874 [jira] Create a new Optimized Row Columnar file format for Hive".

          Addressed Kevin's feedback.

• Fixed 500+ checkstyle warnings - there are a few left that I couldn't
  avoid.
• Fixed all of the cases I could find where operators didn't have spaces
  around them. If we care about that, we should configure checkstyle to check
  for it.
• Moved the source directory to match the package name.
          • Removed the loop initializing the boolean array in OrcInputFormat.
          • Added code to include the dictionary size when estimating memory size.

          Reviewers: JIRA

          REVISION DETAIL
          https://reviews.facebook.net/D8529

          CHANGE SINCE LAST DIFF
          https://reviews.facebook.net/D8529?vs=27621&id=28101#toc

          AFFECTED FILES
          build.properties
          build.xml
          ivy/libraries.properties
          ql/build.xml
          ql/ivy.xml
          ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/BitFieldReader.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/BitFieldWriter.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/BooleanColumnStatistics.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/ColumnStatistics.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/ColumnStatisticsImpl.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/CompressionCodec.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/CompressionKind.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/DoubleColumnStatistics.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicIntArray.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/InStream.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/IntegerColumnStatistics.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSerde.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcStruct.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcUnion.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/OutStream.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/PositionProvider.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/PositionRecorder.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/PositionedOutputStream.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReader.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/RedBlackTree.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthByteReader.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthByteWriter.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReader.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriter.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/SerializationUtils.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/SnappyCodec.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/StreamName.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringColumnStatistics.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/StripeInformation.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/Writer.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/ZlibCodec.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/package-info.java
          ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestBitFieldReader.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestDynamicArray.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFileDump.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInStream.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcStruct.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRunLengthByteReader.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRunLengthIntegerReader.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestSerializationUtils.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestStreamName.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestStringRedBlackTree.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestZlib.java
          ql/src/test/resources/orc-file-dump.out

          To: JIRA, omalley
          Cc: kevinwilfong, njain

          Namit Jain added a comment -

Can you fix Eclipse also?

          Phabricator added a comment -

          njain has commented on the revision "HIVE-3874 [jira] Create a new Optimized Row Columnar file format for Hive".

1. Can you add more comments - especially in the class/interface definitions
Writer/TreeWriter/StreamFactory, to name a few?
2. Can column statistics be made optional? (can be a follow-up)
3. This has a lot of new code - is it possible to use some of the constructs which are
already there, e.g. RedBlackTrees, RLE, etc.? Can you use some existing implementations
instead of writing these from scratch?

          REVISION DETAIL
          https://reviews.facebook.net/D8529

          To: JIRA, omalley
          Cc: kevinwilfong, njain

          Phabricator added a comment -

          njain has commented on the revision "HIVE-3874 [jira] Create a new Optimized Row Columnar file format for Hive".

Right now, the RLE is fixed. Should it be pluggable? I mean, we could have a different scheme to
store deltas.

It is perfectly fine to do all these changes in follow-ups. Can you file jiras for them as you see
appropriate? That way, once the basic framework is in, other people can also jump in.

          INLINE COMMENTS
ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriter.java:84 Is this correct - should you be comparing with literals[0]?
ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriter.java:105 If I understand it right, we are not optimizing for deltas. If the data is:

10
11
12
13
14

we will be storing each value separately.
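
As an aside, here is a minimal sketch of the kind of fixed-delta run detection being asked about - illustrative only, not the actual RunLengthIntegerWriter logic; the Run class and the greedy scan are assumptions:

  // Hypothetical sketch: collapse maximal fixed-delta runs such as 10..14
  // into (base, delta, length) instead of storing each value separately.
  import java.util.ArrayList;
  import java.util.List;

  public class DeltaRleSketch {
    public static final class Run {
      final long base; final long delta; final int length;
      Run(long base, long delta, int length) {
        this.base = base; this.delta = delta; this.length = length;
      }
      @Override public String toString() {
        return "(base=" + base + ", delta=" + delta + ", length=" + length + ")";
      }
    }

    /** Greedily collapses maximal fixed-delta runs; lone values become runs of length 1. */
    public static List<Run> encode(long[] values) {
      List<Run> runs = new ArrayList<>();
      int i = 0;
      while (i < values.length) {
        int j = i + 1;
        long delta = (j < values.length) ? values[j] - values[i] : 0;
        while (j < values.length && values[j] - values[j - 1] == delta) {
          j++;
        }
        runs.add(new Run(values[i], delta, j - i));
        i = j;
      }
      return runs;
    }

    public static void main(String[] args) {
      // Prints a single run: (base=10, delta=1, length=5)
      System.out.println(encode(new long[]{10, 11, 12, 13, 14}));
    }
  }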

          REVISION DETAIL
          https://reviews.facebook.net/D8529

          To: JIRA, omalley
          Cc: kevinwilfong, njain

          Phabricator added a comment -

          omalley updated the revision "HIVE-3874 [jira] Create a new Optimized Row Columnar file format for Hive".

          • fix unit tests

          Reviewers: JIRA

          REVISION DETAIL
          https://reviews.facebook.net/D8529

          CHANGE SINCE LAST DIFF
          https://reviews.facebook.net/D8529?vs=28101&id=28305#toc

          AFFECTED FILES
          build.properties
          build.xml
          ivy/libraries.properties
          ql/build.xml
          ql/ivy.xml
          ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/BitFieldReader.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/BitFieldWriter.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/BooleanColumnStatistics.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/ColumnStatistics.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/ColumnStatisticsImpl.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/CompressionCodec.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/CompressionKind.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/DoubleColumnStatistics.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicIntArray.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/InStream.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/IntegerColumnStatistics.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSerde.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcStruct.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcUnion.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/OutStream.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/PositionProvider.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/PositionRecorder.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/PositionedOutputStream.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReader.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/RedBlackTree.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthByteReader.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthByteWriter.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReader.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriter.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/SerializationUtils.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/SnappyCodec.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/StreamName.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringColumnStatistics.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/StripeInformation.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/Writer.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/ZlibCodec.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/package-info.java
          ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestBitFieldReader.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestDynamicArray.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFileDump.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInStream.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcStruct.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRunLengthByteReader.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRunLengthIntegerReader.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestSerializationUtils.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestStreamName.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestStringRedBlackTree.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestZlib.java
          ql/src/test/resources/orc-file-dump.out

          To: JIRA, omalley
          Cc: kevinwilfong, njain

          Phabricator added a comment -

          omalley has commented on the revision "HIVE-3874 [jira] Create a new Optimized Row Columnar file format for Hive".

Do you want to have a simple test for HIVE-4015 as part of this patch?

Since the tests will fail until I change the grammar, I think it would be better to wait until they can pass.

1. Can you add more comments - especially in the class/interface definitions Writer/TreeWriter/StreamFactory, to name a few?

          Sure.

2. Can column statistics be made optional? (can be a follow-up)

          They are very cheap in practice, but it wouldn't be hard to disable them.

3. This has a lot of new code - is it possible to use some of the constructs which are already there, e.g. RedBlackTrees, RLE, etc.? Can you use some existing implementations instead of writing these from scratch?

I'm a big fan of not writing new code when I can just use someone else's. That said, re-use versus writing new is always a trade-off that involves comparing the requirements against what the other code provides.

I'm not aware of any open source Java red-black trees that work on primitives without allocating multiple objects per entry. Do you have a suggestion?
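
(One common way around the per-node allocation, sketched here as an assumption rather than what RedBlackTree.java actually does, is to back the tree with parallel primitive arrays so that node "pointers" are int indices:)

  // Hypothetical sketch of an array-backed node store: inserting N keys costs
  // a few arrays instead of N node objects. Not the actual ORC RedBlackTree.
  import java.util.Arrays;

  public class PrimitiveNodeStore {
    private int[] key;      // payload per node
    private int[] left;     // index of left child, -1 for none
    private int[] right;    // index of right child, -1 for none
    private boolean[] red;  // red-black color bit
    private int size = 0;

    public PrimitiveNodeStore(int capacity) {
      key = new int[capacity];
      left = new int[capacity];
      right = new int[capacity];
      red = new boolean[capacity];
    }

    /** Adds a node and returns its index, growing the arrays when full. */
    public int add(int k) {
      if (size == key.length) {
        key = Arrays.copyOf(key, size * 2);
        left = Arrays.copyOf(left, size * 2);
        right = Arrays.copyOf(right, size * 2);
        red = Arrays.copyOf(red, size * 2);
      }
      key[size] = k;
      left[size] = -1;
      right[size] = -1;
      red[size] = true; // new nodes start red in a red-black insert
      return size++;
    }
  }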

          The RLE is very specific to ORC format and it didn't seem like any of the implementations available were a good match. I'm also considering how to do a better delta and small integer encoding, but I'll do that in a follow up jira.

Right now, the RLE is fixed. Should it be pluggable? I mean, we could have a different scheme to store deltas.

I think that pluggable encodings would create compatibility problems, since you wouldn't be able to read an ORC file that was written by a different plugin.

          My preferred direction is to use the ColumnEncoding to allow the Writer to pick a different encoding based on the observed data. For example, by looking at the first 100,000 values the writer should be able to decide if a dictionary or direct encoding is better. We could use the same mechanism to add additional encodings.
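
As a rough illustration of that direction (the sample size and threshold below are made-up numbers, not values from this patch):

  // Hypothetical heuristic: sample the first values of a string column and
  // pick dictionary encoding only when few distinct values repeat often.
  import java.util.HashSet;
  import java.util.Set;

  public class EncodingChooser {
    enum Encoding { DICTIONARY, DIRECT }

    static final int SAMPLE_SIZE = 100_000;     // assumed sample size
    static final double DICTIONARY_RATIO = 0.5; // assumed cutoff

    static Encoding choose(String[] sample) {
      int n = Math.min(sample.length, SAMPLE_SIZE);
      Set<String> distinct = new HashSet<>();
      for (int i = 0; i < n; i++) {
        distinct.add(sample[i]);
      }
      double ratio = (n == 0) ? 1.0 : (double) distinct.size() / n;
      return ratio <= DICTIONARY_RATIO ? Encoding.DICTIONARY : Encoding.DIRECT;
    }
  }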

          INLINE COMMENTS
ql/src/java/org/apache/hadoop/hive/ql/orc/OutStream.java:136-140 There is a requirement that the codec's compress method return false rather than take more space than the input. Given that, if compressed is empty, we don't need the overflow.
ql/src/java/org/apache/hadoop/hive/ql/orc/OrcInputFormat.java:149-151 I've removed it.
ql/src/java/org/apache/hadoop/hive/ql/orc/OrcStruct.java:307 Fixed.
ql/src/java/org/apache/hadoop/hive/ql/orc/WriterImpl.java:561-562 I've added the size of the dictionary to the estimate of the memory size, which should be better.
ql/src/java/org/apache/hadoop/hive/ql/orc/BitFieldReader.java:18 I managed to move the directory to the wrong place. Fixed.
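
(A sketch of the compress contract being described - the interface and signature below are assumptions for illustration, not necessarily the exact CompressionCodec interface in the patch:)

  // Hypothetical illustration: compress returns false instead of producing
  // output at least as large as the input, so the caller can fall back to
  // keeping the bytes uncompressed and never needs extra space up front.
  import java.nio.ByteBuffer;

  interface SizeBoundedCodec {
    /** @return true only if the compressed form fit and is smaller than the input. */
    boolean compress(ByteBuffer in, ByteBuffer out);
  }

  class CompressExample {
    static ByteBuffer encode(SizeBoundedCodec codec, ByteBuffer in) {
      ByteBuffer out = ByteBuffer.allocate(in.remaining());
      if (codec.compress(in.duplicate(), out)) {
        out.flip();
        return out;   // compressed form is smaller
      }
      return in;      // fall back to the original bytes
    }
  }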

          REVISION DETAIL
          https://reviews.facebook.net/D8529

          To: JIRA, omalley
          Cc: kevinwilfong, njain

          Phabricator added a comment -

          omalley requested code review of "HIVE-3874 [jira] Create a new Optimized Row Columnar file format for Hive".

          Reviewers: JIRA

          improve some of the comments on WriterImpl

          There are several limitations of the current RC File format that I'd like to address by creating a new format:

          each column value is stored as a binary blob, which means:

          the entire column value must be read, decompressed, and deserialized
          the file format can't use smarter type-specific compression
          push down filters can't be evaluated

the start of each row group needs to be found by scanning
user metadata can only be added to the file when the file is created
the file doesn't store the number of rows per file or row group
there is no mechanism for seeking to a particular row number, which is required for external indexes.
there is no mechanism for storing lightweight indexes within the file to enable push-down filters to skip entire row groups.
the types of the rows aren't stored in the file

          TEST PLAN
          EMPTY

          REVISION DETAIL
          https://reviews.facebook.net/D8871

          AFFECTED FILES
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java

          To: JIRA, omalley

          Phabricator added a comment -

          omalley has abandoned the revision "HIVE-3874 [jira] Create a new Optimized Row Columnar file format for Hive".

          REVISION DETAIL
          https://reviews.facebook.net/D8871

          To: JIRA, omalley

          Phabricator added a comment -

          omalley updated the revision "HIVE-3874 [jira] Create a new Optimized Row Columnar file format for Hive".

          • started updating comments
          • more style changes
          • fix compilation
          • fix unit tests
          • fix more unit tests
          • added more comments

          Reviewers: JIRA

          REVISION DETAIL
          https://reviews.facebook.net/D8529

          CHANGE SINCE LAST DIFF
          https://reviews.facebook.net/D8529?vs=28305&id=28617#toc

          AFFECTED FILES
          build.properties
          build.xml
          ivy/libraries.properties
          ql/build.xml
          ql/ivy.xml
          ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/BitFieldReader.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/BitFieldWriter.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/BooleanColumnStatistics.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/ColumnStatistics.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/ColumnStatisticsImpl.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/CompressionCodec.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/CompressionKind.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/DoubleColumnStatistics.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicIntArray.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/InStream.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/IntegerColumnStatistics.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSerde.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcStruct.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcUnion.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/OutStream.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/PositionProvider.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/PositionRecorder.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/PositionedOutputStream.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReader.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/RedBlackTree.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthByteReader.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthByteWriter.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReader.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriter.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/SerializationUtils.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/SnappyCodec.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/StreamName.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringColumnStatistics.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/StripeInformation.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/Writer.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/ZlibCodec.java
          ql/src/java/org/apache/hadoop/hive/ql/io/orc/package-info.java
          ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestBitFieldReader.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestDynamicArray.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFileDump.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInStream.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcStruct.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRunLengthByteReader.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRunLengthIntegerReader.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestSerializationUtils.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestStreamName.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestStringRedBlackTree.java
          ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestZlib.java
          ql/src/test/resources/orc-file-dump.out

          To: JIRA, omalley
          Cc: kevinwilfong, njain

          Owen O'Malley added a comment -

Ok, I added some additional comments in the Writer as Namit asked, and all of the unit test cases pass.

          Kevin Wilfong added a comment -

          +1

          Thanks Owen, I think this is ready.

          Owen O'Malley added a comment -

I'm actually tracking down a bug that Gunther found with a query. Let me finish tracking it down.

          Kevin Wilfong added a comment -

          K, let me know when it's ready for review again.

          Pamela Vagata added a comment -

Owen, would you mind filing the bug as a separate JIRA and committing the code as is? We are currently hunting down issues and putting together fixes.

          Owen O'Malley added a comment -

          Pamela,
Yeah, that probably makes sense. I'll file the follow-up jiras.

          Kevin Wilfong added a comment -

          Thanks Pam and Owen.

          +1 again

          Gunther Hagleitner added a comment -

          Looks good to me. +1 (non-committer)

          Kevin Wilfong added a comment -

          Committed, thanks Owen!

          Hudson added a comment -

          Integrated in Hive-trunk-h0.21 #2002 (See https://builds.apache.org/job/Hive-trunk-h0.21/2002/)
          HIVE-3874. Create a new Optimized Row Columnar file format for Hive. (Owen O'Malley via kevinwilfong) (Revision 1452992)

          Result = SUCCESS
          kevinwilfong : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1452992
          Files :

          • /hive/trunk/build.properties
          • /hive/trunk/build.xml
          • /hive/trunk/ivy/libraries.properties
          • /hive/trunk/ql/build.xml
          • /hive/trunk/ql/ivy.xml
          • /hive/trunk/ql/src/gen/protobuf
          • /hive/trunk/ql/src/gen/protobuf/gen-java
          • /hive/trunk/ql/src/gen/protobuf/gen-java/org
          • /hive/trunk/ql/src/gen/protobuf/gen-java/org/apache
          • /hive/trunk/ql/src/gen/protobuf/gen-java/org/apache/hadoop
          • /hive/trunk/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive
          • /hive/trunk/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql
          • /hive/trunk/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io
          • /hive/trunk/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc
          • /hive/trunk/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/BitFieldReader.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/BitFieldWriter.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/BooleanColumnStatistics.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ColumnStatistics.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ColumnStatisticsImpl.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/CompressionCodec.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/CompressionKind.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/DoubleColumnStatistics.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicIntArray.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/InStream.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/IntegerColumnStatistics.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSerde.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcStruct.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcUnion.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OutStream.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/PositionProvider.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/PositionRecorder.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/PositionedOutputStream.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReader.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RedBlackTree.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthByteReader.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthByteWriter.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReader.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriter.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/SerializationUtils.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/SnappyCodec.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/StreamName.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringColumnStatistics.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/StripeInformation.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/Writer.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ZlibCodec.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/package-info.java
          • /hive/trunk/ql/src/protobuf
          • /hive/trunk/ql/src/protobuf/org
          • /hive/trunk/ql/src/protobuf/org/apache
          • /hive/trunk/ql/src/protobuf/org/apache/hadoop
          • /hive/trunk/ql/src/protobuf/org/apache/hadoop/hive
          • /hive/trunk/ql/src/protobuf/org/apache/hadoop/hive/ql
          • /hive/trunk/ql/src/protobuf/org/apache/hadoop/hive/ql/io
          • /hive/trunk/ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc
          • /hive/trunk/ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestBitFieldReader.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestDynamicArray.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFileDump.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInStream.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcStruct.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRunLengthByteReader.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRunLengthIntegerReader.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestSerializationUtils.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestStreamName.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestStringRedBlackTree.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestZlib.java
          • /hive/trunk/ql/src/test/resources/orc-file-dump.out
          Hudson added a comment -

          Integrated in Hive-trunk-hadoop2 #138 (See https://builds.apache.org/job/Hive-trunk-hadoop2/138/)
          HIVE-3874. Create a new Optimized Row Columnar file format for Hive. (Owen O'Malley via kevinwilfong) (Revision 1452992)

          Result = FAILURE
          kevinwilfong : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1452992
          Files :

          • /hive/trunk/build.properties
          • /hive/trunk/build.xml
          • /hive/trunk/ivy/libraries.properties
          • /hive/trunk/ql/build.xml
          • /hive/trunk/ql/ivy.xml
          • /hive/trunk/ql/src/gen/protobuf
          • /hive/trunk/ql/src/gen/protobuf/gen-java
          • /hive/trunk/ql/src/gen/protobuf/gen-java/org
          • /hive/trunk/ql/src/gen/protobuf/gen-java/org/apache
          • /hive/trunk/ql/src/gen/protobuf/gen-java/org/apache/hadoop
          • /hive/trunk/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive
          • /hive/trunk/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql
          • /hive/trunk/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io
          • /hive/trunk/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc
          • /hive/trunk/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/BitFieldReader.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/BitFieldWriter.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/BooleanColumnStatistics.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ColumnStatistics.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ColumnStatisticsImpl.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/CompressionCodec.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/CompressionKind.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/DoubleColumnStatistics.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicIntArray.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/InStream.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/IntegerColumnStatistics.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSerde.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcStruct.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcUnion.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OutStream.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/PositionProvider.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/PositionRecorder.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/PositionedOutputStream.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReader.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RedBlackTree.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthByteReader.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthByteWriter.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReader.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriter.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/SerializationUtils.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/SnappyCodec.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/StreamName.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringColumnStatistics.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/StripeInformation.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/Writer.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ZlibCodec.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/package-info.java
          • /hive/trunk/ql/src/protobuf
          • /hive/trunk/ql/src/protobuf/org
          • /hive/trunk/ql/src/protobuf/org/apache
          • /hive/trunk/ql/src/protobuf/org/apache/hadoop
          • /hive/trunk/ql/src/protobuf/org/apache/hadoop/hive
          • /hive/trunk/ql/src/protobuf/org/apache/hadoop/hive/ql
          • /hive/trunk/ql/src/protobuf/org/apache/hadoop/hive/ql/io
          • /hive/trunk/ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc
          • /hive/trunk/ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestBitFieldReader.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestDynamicArray.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFileDump.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInStream.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcStruct.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRunLengthByteReader.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRunLengthIntegerReader.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestSerializationUtils.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestStreamName.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestStringRedBlackTree.java
          • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestZlib.java
          • /hive/trunk/ql/src/test/resources/orc-file-dump.out
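
          For reference, the commit above adds both the writer and reader entry points under org.apache.hadoop.hive.ql.io.orc. Below is a minimal round-trip sketch against that API, assuming the Hive 0.11-era OrcFile signatures exercised in TestOrcFile.java from the file list; the MyRow class, the local path, and the tuning constants are hypothetical, not part of the commit.

          // Sketch: write a few rows to an ORC file, then read them back.
          // Signatures follow the initial Hive 0.11 API; names marked
          // hypothetical are illustration only.
          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.hive.ql.io.orc.CompressionKind;
          import org.apache.hadoop.hive.ql.io.orc.OrcFile;
          import org.apache.hadoop.hive.ql.io.orc.Reader;
          import org.apache.hadoop.hive.ql.io.orc.RecordReader;
          import org.apache.hadoop.hive.ql.io.orc.Writer;
          import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
          import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;

          public class OrcRoundTrip {
            // Hypothetical row type; its ObjectInspector is derived by reflection.
            static class MyRow {
              int id;
              String name;
              MyRow(int id, String name) { this.id = id; this.name = name; }
            }

            public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              FileSystem fs = FileSystem.getLocal(conf);
              Path path = new Path("/tmp/example.orc");  // hypothetical path

              ObjectInspector inspector =
                  ObjectInspectorFactory.getReflectionObjectInspector(
                      MyRow.class,
                      ObjectInspectorFactory.ObjectInspectorOptions.JAVA);

              // Arguments: stripe size, compression codec, compression buffer
              // size, row index stride (all illustrative values).
              Writer writer = OrcFile.createWriter(fs, path, inspector,
                  100000, CompressionKind.ZLIB, 10000, 10000);
              writer.addRow(new MyRow(1, "hello"));
              writer.addRow(new MyRow(2, "world"));
              writer.close();

              Reader reader = OrcFile.createReader(fs, path);
              RecordReader rows = reader.rows(null);  // null => read all columns
              Object row = null;
              while (rows.hasNext()) {
                row = rows.next(row);                 // rows come back as OrcStruct
                System.out.println(row);
              }
              rows.close();
            }
          }

          Passing a boolean[] instead of null to reader.rows() selects a subset of columns, which is how the format delivers the columnar-projection benefit described in this issue.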

            People

            • Assignee:
              Owen O'Malley
            • Reporter:
              Owen O'Malley
            • Votes:
              6
            • Watchers:
              60
