Hadoop HDFS / HDFS-8555

Random read support on HDFS files using Indexed Namenode feature


Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.5.2
    • Fix Version/s: None
    • Component/s: hdfs-client, namenode
    • Labels: None
    • Environment: Linux

    • randomreads

    Description

      Currently the NameNode does not support random reads. Many tools built on top of HDFS now target exploratory BI and provide SQL over HDFS, so the pressing need is to reduce the number of blocks read for a random read.
      For example, extracting roughly 10 lines of information from a 100 GB file should read only those blocks that could contain those 10 lines.
      This can be achieved by adding a per-block tagging feature to the NameNode: each block written to HDFS will have tags associated with it, stored in an index.
      When accessed through the indexing feature, the NameNode will use this native index to reduce the number of blocks returned to the client.
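      The issue does not specify the tag format or how the index is stored, so the following is only a minimal sketch under stated assumptions: an in-memory inverted index (tag to block IDs) of the kind a NameNode-side feature might keep, where a multi-tag query returns the intersection of the matching block sets. The class `BlockTagIndex`, its methods, and the `day=`/`user=` tag scheme are hypothetical illustrations, not part of any HDFS API. The `main` method also works through the 100 GB example, assuming a 128 MB block size (the Hadoop 2.x default for `dfs.blocksize`).

```java
import java.util.*;

/**
 * Hypothetical per-block tag index (not an HDFS API).
 * Maps each tag to the IDs of the blocks that carry it.
 */
public class BlockTagIndex {
    // Inverted index: tag -> sorted IDs of blocks tagged with it.
    private final Map<String, SortedSet<Long>> tagToBlocks = new HashMap<>();
    // Every block known to the index, used as the full-scan baseline.
    private final SortedSet<Long> allBlocks = new TreeSet<>();

    /** Records the tags attached to a block when it is written. */
    public void addBlock(long blockId, Collection<String> tags) {
        allBlocks.add(blockId);
        for (String tag : tags) {
            tagToBlocks.computeIfAbsent(tag, t -> new TreeSet<>()).add(blockId);
        }
    }

    /**
     * Returns the blocks carrying every requested tag (set intersection).
     * With no tags, returns all blocks, i.e. a full scan.
     */
    public SortedSet<Long> candidateBlocks(Collection<String> tags) {
        SortedSet<Long> result = new TreeSet<>(allBlocks);
        for (String tag : tags) {
            result.retainAll(tagToBlocks.getOrDefault(tag, Collections.emptySortedSet()));
        }
        return result;
    }

    public int totalBlocks() {
        return allBlocks.size();
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;       // assumed 128 MB block size
        long fileSize = 100L * 1024 * 1024 * 1024; // 100 GB file
        long numBlocks = fileSize / blockSize;     // 800 blocks

        BlockTagIndex index = new BlockTagIndex();
        for (long id = 0; id < numBlocks; id++) {
            List<String> tags = new ArrayList<>();
            // Hypothetical tagging: 100 consecutive blocks per "day".
            tags.add("day=" + (id / 100));
            // Hypothetical: only three blocks hold records for user "alice".
            if (id == 305 || id == 306 || id == 512) {
                tags.add("user=alice");
            }
            index.addBlock(id, tags);
        }

        System.out.println("Full scan: " + index.totalBlocks() + " blocks");
        // prints "day=3 AND user=alice: [305, 306]"
        System.out.println("day=3 AND user=alice: "
                + index.candidateBlocks(Arrays.asList("day=3", "user=alice")));
        // prints "user=alice: [305, 306, 512]"
        System.out.println("user=alice: "
                + index.candidateBlocks(Arrays.asList("user=alice")));
    }
}
```

      In this sketch a lookup returns 2 or 3 candidate blocks instead of all 800. Tags act as a coarse filter: a returned block may contain matching records, so the client would still scan each candidate block, but it skips every block the index rules out.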


          People

            Assignee: Saan Afzal
            Reporter: amit sehgal (rocksolid)
            Votes: 2
            Watchers: 12

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated: 720h
                Remaining: 720h
                Logged: Not Specified