Hadoop HDFS / HDFS-8998

Small files storage supported inside HDFS


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      HDFS handles small files poorly, as described in this blog post (http://blog.cloudera.com/blog/2009/02/the-small-files-problem).
      The post also suggests several ways to store small files in HDFS, but none of them is a good fit here: HAR files and SequenceFiles seem better suited to read-only files.

      Currently each HDFS block belongs to exactly one HDFS file, so a large number of small files produces a large number of small blocks on the DataNodes, which puts a heavy load on them.
      This JIRA will describe how to merge small blocks into larger ones online, how to delete small files, and so on.

      There are currently many open JIRAs for improving HDFS scalability on the NameNode, such as HDFS-7836 and HDFS-8286.
      Small-file metadata (INode and BlocksMap) will therefore still be kept in the NameNode.

      A design document will be uploaded soon.
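
      To give a feel for the block-count problem described above, here is a minimal back-of-the-envelope sketch. It is not taken from the design document: the 128 MiB block size is an assumption (the usual `dfs.blocksize` default), the function names are hypothetical, and the "merged" figure is an idealized lower bound that ignores any index or padding overhead a real merge scheme would need.

```python
# Assumed HDFS block size: 128 MiB (the usual dfs.blocksize default).
BLOCK_SIZE = 128 * 1024 * 1024


def ceil_div(a, b):
    """Integer ceiling division, exact for arbitrarily large ints."""
    return -(-a // b)


def blocks_one_per_file(file_sizes):
    """Block count when every block belongs to exactly one file (current HDFS behaviour)."""
    return sum(ceil_div(size, BLOCK_SIZE) for size in file_sizes)


def blocks_if_merged(file_sizes):
    """Idealized lower bound if small files could share blocks (hypothetical)."""
    return ceil_div(sum(file_sizes), BLOCK_SIZE)


# One million files of 64 KiB each.
sizes = [64 * 1024] * 1_000_000
print(blocks_one_per_file(sizes))  # 1000000
print(blocks_if_merged(sizes))     # 489
```

      Under these assumptions, the same data occupies about a million blocks today versus a few hundred if blocks could be shared, which is the gap an online merge would aim to close on the DataNode side.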

      Attachments

        1. HDFS-8998.design.001.pdf
          387 kB
          Yong Zhang

            People

              Assignee: Yong Zhang (zhangyongxyz)
              Reporter: Yong Zhang (zhangyongxyz)
              Votes: 1
              Watchers: 38
