Accumulo / ACCUMULO-55

Accumulo Output Format can create numerous empty files


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.5-incubating
    • Fix Version/s: 1.4.0
    • Component/s: client

    Description

      In conjunction with ACCUMULO-52, large numbers of empty files can cause problems. In short, when a reducer receives no records because of the partitioner in use, its output file is still created. We do not want empty files lingering around, and we especially do not want them bulk imported. The fix should be as simple as either not creating the file until a write to it is attempted (more complex), or deleting the file at close time if no records were written (simpler, but with more overhead from creating and then deleting the file).

      Due to the complexity of the patch, I do not think it should be applied before version 1.4. The fix should simply delete the file after closing it if no records were written to it.
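
      The delete-on-close approach can be sketched roughly as follows. This is only an illustration, not the actual Accumulo RecordWriter: the class name DeleteIfEmptyWriter and its write(key, value) signature are hypothetical, and it uses plain java.nio files rather than the Hadoop FileSystem API to stay self-contained.

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical writer (not Accumulo code): the file is created eagerly in the
 * constructor and removed again in close() if no records were written.
 */
final class DeleteIfEmptyWriter implements Closeable {
    private final Path path;
    private final BufferedWriter out;
    private long records = 0;

    DeleteIfEmptyWriter(Path path) throws IOException {
        this.path = path;
        // Opening the writer creates the file on disk immediately.
        this.out = Files.newBufferedWriter(path, StandardCharsets.UTF_8);
    }

    /** Writes one tab-separated record and counts it. */
    void write(String key, String value) throws IOException {
        out.write(key);
        out.write('\t');
        out.write(value);
        out.newLine();
        records++;
    }

    @Override
    public void close() throws IOException {
        out.close();
        // Nothing was ever written, so remove the empty file.
        if (records == 0) {
            Files.deleteIfExists(path);
        }
    }
}
```

      The cost mentioned above is visible here: an empty reducer still pays for one file creation and one deletion.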

      EDIT: As of 1.4, we delete empty files on close() in the RecordWriter. I would like to implement a more robust version that does not create a file until the first write. I will do this for version 1.5 to avoid the risk of breaking things.
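
      A minimal sketch of the create-on-first-write idea, under the same assumptions as any illustration here: LazyFileWriter and its write(key, value) method are hypothetical names, it is not the Accumulo implementation, and it uses java.nio rather than the Hadoop FileSystem API.

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical writer (not Accumulo code): the file is not created until the
 * first call to write(), so a writer that never receives a record leaves
 * nothing on disk.
 */
final class LazyFileWriter implements Closeable {
    private final Path path;
    private BufferedWriter out; // stays null until the first write

    LazyFileWriter(Path path) {
        this.path = path; // no file system access here
    }

    /** Opens the file on first use, then writes one tab-separated record. */
    void write(String key, String value) throws IOException {
        if (out == null) {
            out = Files.newBufferedWriter(path, StandardCharsets.UTF_8);
        }
        out.write(key);
        out.write('\t');
        out.write(value);
        out.newLine();
    }

    @Override
    public void close() throws IOException {
        // Only close the stream if a write ever opened it.
        if (out != null) {
            out.close();
        }
    }
}
```

      Compared with deleting on close, this avoids the create and delete round trip entirely, but anything that runs after the job has to tolerate an output path that was never created.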

          People

            Assignee: Unassigned
            Reporter: John Vines (vines)
