Details

    • Type: Sub-task Sub-task
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: HDFS-5698 (FSImage in protobuf)
    • Component/s: None
    • Labels:
      None

      Description

      As suggested by the documentation of Protocol Buffers, the protobuf specification of the fsimage should specify [packed=true] for all repeated fields.

      1. HDFS-5885.001.patch
        2 kB
        Haohui Mai
      2. HDFS-5885.000.patch
        2 kB
        Haohui Mai

        Activity

        Hide
        Jing Zhao added a comment -

        According to protobuf document: "For historical reasons, repeated fields of basic numeric types aren't encoded as efficiently as they could be. New code should use the special option [packed=true] to get a more efficient encoding."

        So I guess we only need to also add "[packed=true]" for basic numeric types like int64? Do we want to also add it to "repeated BlockProto blocks"? Todd Lipcon, could you please comment on this?

        Show
        Jing Zhao added a comment - According to protobuf document: "For historical reasons, repeated fields of basic numeric types aren't encoded as efficiently as they could be. New code should use the special option [packed=true] to get a more efficient encoding." So I guess we only need to also add " [packed=true] " for basic numeric types like int64? Do we want to also add it to "repeated BlockProto blocks"? Todd Lipcon , could you please comment on this?
        Hide
        Todd Lipcon added a comment -

        As far as I know, the packed=true attribute only applies for primitive fields (based on my reading of the docs and of the protobuf code).

        If we wanted to be extra compact, we could "shred" the BlockProtos into three separate packed lists of primitives, eg:

        // "Shredded" version of BlockProto, used as a more compact encoding for a list
        // of blocks.
        message BlockProtoList {
          repeated uint64 block_ids = 1 [packed = true];
          repeated uint64 gen_stamps = 2 [packed = true];
          repeated uint64 sizes = 3 [packed = true];
        }
        

        The gains here are a couple of bytes per block. Think it's worth it?

        Show
        Todd Lipcon added a comment - As far as I know, the packed=true attribute only applies for primitive fields (based on my reading of the docs and of the protobuf code). If we wanted to be extra compact, we could "shred" the BlockProtos into three separate packed lists of primitives, eg: // "Shredded" version of BlockProto, used as a more compact encoding for a list // of blocks. message BlockProtoList { repeated uint64 block_ids = 1 [packed = true ]; repeated uint64 gen_stamps = 2 [packed = true ]; repeated uint64 sizes = 3 [packed = true ]; } The gains here are a couple of bytes per block. Think it's worth it?
        Hide
        Haohui Mai added a comment -

        Based on the statistics on a FSImage in production, I estimate that this optimization can save around 10% of the space.

        My concern is that the information of blocks might be changed in the near future (e.g, by HDFS-2832). The shredded version makes it less intuitive.

        Therefore I think it might be better to revisit this after the merge. What do you think?

        Show
        Haohui Mai added a comment - Based on the statistics on a FSImage in production, I estimate that this optimization can save around 10% of the space. My concern is that the information of blocks might be changed in the near future (e.g, by HDFS-2832 ). The shredded version makes it less intuitive. Therefore I think it might be better to revisit this after the merge. What do you think?
        Hide
        Haohui Mai added a comment -

        The v1 patch only annotates the repeated primitive fields.

        Show
        Haohui Mai added a comment - The v1 patch only annotates the repeated primitive fields.
        Hide
        Jing Zhao added a comment -

        +1 for the 001 patch. I've committed this.

        Show
        Jing Zhao added a comment - +1 for the 001 patch. I've committed this.

          People

          • Assignee:
            Haohui Mai
            Reporter:
            Haohui Mai
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development