Kafka / KAFKA-281

support multiple root log directories

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: core
    • Labels:
      None

      Description

      Currently, the log layout is {log.dir}/topicname-partitionid and one can only specify 1 {log.dir}. This limits the # of topics we can have per broker. We can potentially support multiple directories for {log.dir} and just assign topics using hashing or round-robin.
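As a rough illustration of the hashing option, the sketch below maps a topicname-partitionid pair to one of several root directories. The function names, the directory paths in the usage note, and the choice of CRC32 are assumptions made for this example; they are not taken from this issue or from the Kafka code base.

```python
import zlib


def pick_log_dir(topic, partition, log_dirs):
    """Pick one root log directory for a topic-partition by hashing its name.

    Hypothetical helper for illustration only.
    """
    key = f"{topic}-{partition}".encode("utf-8")
    # zlib.crc32 gives the same value in every process, unlike Python's
    # built-in hash() for strings, so the mapping survives restarts.
    index = zlib.crc32(key) % len(log_dirs)
    return log_dirs[index]


def partition_path(topic, partition, log_dirs):
    """Build the {log.dir}/topicname-partitionid path described above."""
    return f"{pick_log_dir(topic, partition, log_dirs)}/{topic}-{partition}"
```

For example, `partition_path("events", 0, ["/data1/kafka", "/data2/kafka"])` always returns the same path for the same inputs. Note that with plain modulo hashing, changing the number of directories changes the mapping for most partitions.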

          Activity

          Taylor Gautier added a comment -

          I would recommend against round-robin, since it would require some metadata to keep track of which directory each topic went to. Hashing is easy, but the downside is that the layout is not trivially discoverable when a person browses the directory structure from a command-line shell.
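The metadata concern can be made concrete with a small sketch. In the round-robin assigner below, the `assignments` dict is exactly the state that would have to be persisted somewhere. The second function shows one way a location could be recovered without metadata, by checking each root for the partition directory. All names here are hypothetical and chosen for illustration.

```python
import itertools
import os


class RoundRobinAssigner:
    """Hypothetical round-robin placement of topic-partitions onto log dirs."""

    def __init__(self, log_dirs):
        self._cycle = itertools.cycle(log_dirs)
        # This mapping is the metadata the comment refers to: without it,
        # the assigner cannot tell where an existing partition lives.
        self.assignments = {}

    def assign(self, topic, partition):
        name = f"{topic}-{partition}"
        if name not in self.assignments:
            self.assignments[name] = next(self._cycle)
        return self.assignments[name]


def find_partition_dir(topic, partition, log_dirs):
    """Locate a partition by checking each root; returns None if absent."""
    name = f"{topic}-{partition}"
    for root in log_dirs:
        candidate = os.path.join(root, name)
        if os.path.isdir(candidate):
            return candidate
    return None
```

Scanning costs one directory lookup per root, which is cheap for a handful of mount points and also works for a person browsing from a shell.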

          Jay Kreps added a comment -

          Is this to work around the max subdirectory limits some filesystems have (e.g. I think ext4 has a limit of 64k subdirectories per directory)?

          The other advantage of this is that you can actually get rid of RAID and just run with JBOD, using a separate mount point for each drive and having a data directory per drive (a la Hadoop). We wouldn't do this now, but if we had replication this would be a big win. The overhead of RAID is usually something like a 20-30% perf hit, plus the additional disk space it takes up. In this setup you would be depending on replication for disk failures. The trade-off is that a single drive failure would kill a machine. In practice, due to the RAID resync perf hit, we seem to have this problem already.
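To illustrate the JBOD layout, the sketch below assumes one data directory per drive and shows two things: one possible placement policy (put a new partition on the directory that currently holds the fewest) and which partitions would need to be recovered from replicas if a single drive failed. The policy, the function names, and the mount paths are assumptions for this example and are not stated in this issue.

```python
def least_loaded_dir(partition_counts):
    """Return the log dir with the fewest partitions.

    partition_counts maps each data directory (one per drive) to the number
    of partition directories it holds. Ties go to the lexicographically
    first path so the result is deterministic.
    """
    return min(sorted(partition_counts), key=lambda d: partition_counts[d])


def partitions_on_failed_drive(assignments, failed_dir):
    """List the partitions that lived on a failed drive.

    assignments maps "topicname-partitionid" to its data directory. With
    JBOD and no RAID, these are the partitions replication must restore.
    """
    return sorted(name for name, d in assignments.items() if d == failed_dir)
```

For instance, with counts `{"/mnt/disk1/kafka": 3, "/mnt/disk2/kafka": 5}` the next partition would go to `/mnt/disk1/kafka`.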

          Taylor Gautier added a comment -

          Yes. There's not just a hard limit; there is also a practical limit. We've found that on ext3 that limit is around 20k. The limit has to do with some of the low-level POSIX APIs and how they are implemented. I saw a post some time ago about how to make this better, but for the time being it's generally inefficient in most filesystems to have large numbers of files/directories in a single directory.

          Also, as you point out, a single log directory makes it next to impossible to add storage easily, since there is basically only one mount point.

          Maxime Brugidou added a comment -

          I guess this was done on 0.8 as part of KAFKA-188

          Jun Rao added a comment -

          fixed in KAFKA-188


            People

            • Assignee: Unassigned
            • Reporter: Jun Rao
            • Votes: 1
            • Watchers: 3
