Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.19.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Added HDFS file access times. By default, access times will be precise to the most recent hour boundary. A configuration parameter dfs.access.time.precision (milliseconds) is used to control this precision. Setting a value of 0 will disable persisting access times for HDFS files.
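A matching configuration entry would look roughly like this (an illustrative sketch: the surrounding boilerplate follows standard Hadoop configuration conventions, and the value shown is the one-hour default expressed in milliseconds):

```xml
<!-- Illustrative snippet for the Hadoop configuration file.
     Setting the value to 0 disables persisting access times. -->
<property>
  <name>dfs.access.time.precision</name>
  <value>3600000</value> <!-- 1 hour, in milliseconds (the default) -->
</property>
```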

      Description

      HDFS should support some type of statistics that allows an administrator to determine when a file was last accessed.

      Since HDFS does not have quotas yet, it is likely that users keep on accumulating files in their home directories without much regard to the amount of space they are occupying. This causes memory-related problems with the namenode.

      Access times are costly to maintain. AFS does not maintain access times. I think DCE-DFS does maintain access times with a coarse granularity.

      One proposal for HDFS would be to implement something like an "access bit".
      1. This access-bit is set when a file is accessed. If the access bit is already set, then this call does not result in a transaction.
      2. A FileSystem.clearAccessBits() indicates that the access bits of all files need to be cleared.

      An administrator can effectively use the above mechanism (maybe a daily cron job) to determine files that are recently used.
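The two-step proposal above might be sketched as follows. The class and method names here are invented for illustration and are not actual HDFS APIs; the point is only that a repeated access to an already-marked file costs no transaction:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed "access bit" mechanism.
public class AccessBits {
    private final Map<String, Boolean> bits = new HashMap<>();

    // Called on every file access. Only the first access after a clear
    // changes state (and hence would result in a transaction).
    public boolean markAccessed(String path) {
        Boolean prev = bits.put(path, Boolean.TRUE);
        return prev == null || !prev;   // true => a transaction was needed
    }

    // The proposed FileSystem.clearAccessBits(): reset all bits (e.g. from
    // a daily cron job), returning which files were touched since last run.
    public Map<String, Boolean> clearAccessBits() {
        Map<String, Boolean> snapshot = new HashMap<>(bits);
        bits.clear();
        return snapshot;
    }
}
```

The cron job would call clearAccessBits() on some interval and treat the returned keys as "recently used" files.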

      1. accessTime1.patch
        37 kB
        dhruba borthakur
      2. accessTime4.patch
        41 kB
        dhruba borthakur
      3. accessTime5.patch
        41 kB
        dhruba borthakur
      4. accessTime6.patch
        45 kB
        dhruba borthakur

        Issue Links

          Activity

          dhruba borthakur created issue -
          Doug Cutting added a comment -

          Another approach might be to process the namenode logs (if they're kept) to find which files have been accessed when.

          Also, in HDFS, would access times really be that expensive? We have relatively few files and relatively many blocks. So increasing the data structure size of a file shouldn't be that costly. The larger expense might be logging each time a file is opened. How bad would that be? Perhaps we could make it optional?

          I'm just brainstorming...

          Allen Wittenauer added a comment -

          Currently on our bigger grids, we have a significant number of files that we aren't sure whether anyone is actually using or not (e.g., /tmp). While I recognize that atime is a huge performance killer, whenever one deals with users who have free rein over their space, it is an incredibly important tool to help maintain the system.

          This is especially important given the lack of ACLs. On our larger grids, there are many files that are just kind of scattered all over that we have no real insight as to their purpose, much less their usage pattern. All we know is that we didn't put them there.

          Having an access log would at least tell us whether something is being used. Once users get added to the system, having the user information combined with whether a file was touched will be extremely handy.

          Operationally, I see this being used by dumping the data on a regular interval into an RDBMS or perhaps even inside HDFS itself. It is then fairly trivial to create tools and form policies around data retention.

          dhruba borthakur added a comment -

          I am not too worried about data structure size. But if we have to write a transaction for every file access, that could be a performance killer, do you agree? My proposal was to somehow batch access-time updates to the disk (using the access bit). Using the access bit and a cron job that runs every hour could effectively provide a coarse-grained access time.

          Doug Cutting added a comment -

          > if we have to write a transaction for every file access, that could be a performance killer, do you agree?

          I don't know. Logging file opens should be good enough, right? How much is transaction logging a bottleneck currently? How much worse would this make it? If files average ten or more blocks, and we're reading files not much more often than we're writing them, then the impact might be small.

          Another option to consider is making this a separate log that's buffered, since its data is not as critical to filesystem function. We could flush the buffer every minute or so, so that when the namenode crashes we'd lose only the last minute of access time updates. Might that be acceptable?
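The buffered separate log suggested here can be sketched in a few lines; names and the log format are invented for illustration, and real namenode code would hook this into the open path rather than stand alone:

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Timer;
import java.util.TimerTask;

// Illustrative sketch: access-time records go to a separate buffered log
// that is flushed periodically, so a crash loses at most the last
// flush interval of access-time updates.
public class BufferedAccessLog implements AutoCloseable {
    private final BufferedWriter out;
    private final Timer flusher = new Timer(true);   // daemon thread

    public BufferedAccessLog(String path, long flushIntervalMs) throws IOException {
        out = new BufferedWriter(new FileWriter(path, true));
        flusher.schedule(new TimerTask() {
            public void run() {
                try { synchronized (out) { out.flush(); } }
                catch (IOException ignored) { }
            }
        }, flushIntervalMs, flushIntervalMs);
    }

    // Record a file open; cheap because it only appends to the buffer.
    public void logAccess(String src, long timeMs) throws IOException {
        synchronized (out) { out.write(src + "\t" + timeMs + "\n"); }
    }

    public void close() throws IOException {
        flusher.cancel();
        synchronized (out) { out.flush(); out.close(); }
    }
}
```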

          Sameer Paranjpye added a comment -

          > Another option to consider is making this a separate log that's buffered

          This is a pretty good option. It should cost almost nothing to write the access to a buffered log. I suspect that for the use case Allen describes it would be sufficient to discover the outliers i.e. files and directories that haven't been accessed in months.

          dhruba borthakur added a comment -

          I agree that we can record the timestamp when an "open" occurred in namenode memory, mark the inode as dirty, and then allow an async flush daemon to persist these dirty inodes to disk once per period. That should be optimal and light-weight.

          Enis Soztutar added a comment -

          While working on some jobs, I have realized that nearly all MR jobs somehow create temporary directories/files. Due to various reasons (program bug, program crash, etc.), these may not be deleted properly. We may add a FileSystem#createTempFile() and attach a timestamp to it. After some period, these can be deleted by batch tasks.

          Allen Wittenauer added a comment -

          Now that we have permissions and ownerships, such an access log would also need to differentiate between reads, writes, ownership changes, and permission changes. This would be extremely helpful in case forensics need to be performed to track down someone doing Bad Things(tm).

          Robert Chansler made changes -
          Field Original Value New Value
          Link This issue relates to HADOOP-3209 [ HADOOP-3209 ]
          Chris Douglas made changes -
          Link This issue is related to HADOOP-3336 [ HADOOP-3336 ]
          dhruba borthakur added a comment -

          I plan on doing the following:

          1. Add a 4 byte field to the in-memory inode to maintain access time.
          2. Create a new transaction OP_SET_ACCESSTIME to the edits log.
          3. Every access to read data/metadata for a file will do the following:
          – update the in-memory access time of the inode.
          – write a OP_SET_ACCESSTIME entry for this inode to the edits log buffer, do not sync or flush buffer
          4. Enhance the dfs shell/webUI to display access times of files/directories

          This should not adversely impact the transaction processing rate of the namenode. Other types of transactions (e.g. file creation) will anyway cause the transaction-log-buffer to get synced to disk pretty quickly. This implementation will not distinguish between different kind of metadata accesses and is primarily targeted to weed out files that are not used for a long long time.

          dhruba borthakur made changes -
          Assignee dhruba borthakur [ dhruba ]
          Raghu Angadi added a comment -

          Most tests seem to indicate amount of data synced matters as well (along with number of syncs). I will be surprised if a benchmark that tests mixed load (say 10% writes, 90% reads) is not impacted.

          Joydeep Sen Sarma added a comment -

          +1.

          this looks good enough. I am wondering if you can also log the username somewhere (either the edit log or the namenode log). We are potentially interested in userids reading specific parts of the file system (and then maintaining last-reader information at the directory level), so this may allow us to tail the appropriate log and pick such information up.

          Raghu Angadi added a comment -

          Doesn't HADOOP-3336 do this already in a separate log file?

          dhruba borthakur added a comment -

          I agree that HADOOP-3336 does this from an auditing perspective.

          I am interested in making some form of archival store in HDFS. Files that are not used for a long time can automatically be moved to slower and/or denser storage. Given the rate at which a cluster size increases, and given the fact that the cost to store data for infinitely long time is very low, it makes sense for the file system to make intelligent storage decisions based on how/when data was accessed. This argues for "access time" to be stored in the file system itself.

          HADOOP-3336 can be used to accomplish this to some extent... the separate log that it generates can be periodically merged with the file system image. But, I feel that design is a little awkward and not too elegant.

          Konstantin Shvachko added a comment -

          I think this proposal is in the right direction.
          According to HADOOP-3860 name-node can currently perform 20-25 times more opens per second than creates.
          Which means that if we let every open / getBlockLocation be logged and flushed, we lose big.
          Another observation is that map-reduce does a lot of ls operations, both for directories and individual files.
          I have seen 20,000 per second. This happens when the job starts and depends on the user input data and on how many tasks the job should run.
          So maybe we should not log file access for ls, permission checking, etc. I think it would be sufficient to write
          OP_SET_ACCESSTIME only in the case of getBlockLocations().
          Also, I think we should support access times only for regular files, not for directories.

          Another alternative would be to keep the access time only in the name-node memory. Would that be sufficient to detect "malicious"
          behavior of some users? Name-nodes usually run for months, right? So before, say, upgrading the name-node, or simply every (other) week,
          administrators may look at files that have never been touched during that period and act accordingly.
          My main concern is that even though with Dhruba's approach we will batch access operations and would not lose time on flushing them
          directly, the journaling traffic will double; that is, with each flush more bytes need to be flushed, meaning increased latency for each
          flush and bigger edits files.

          It would be good to have some experimental data measuring throughput and latency for getBlockLocation with and without ACCESSTIME
          transactions. The easy way to test would be to use NNThroughputBenchmark.

          dhruba borthakur added a comment -

          I agree with Konstantin on most counts. I will probably implement access times for files and directories for getblocklocations RPC only and then run NNThroughputBenchmark to determine its impact.

          Konstantin Shvachko added a comment -

          Did you mean files only, directories don't have getBlockLocations()?

          dhruba borthakur added a comment -

          Yes, Konstantin, I meant "access time for files for the getblocklocations RPC only".

          Regarding Joydeep's requirement about recording the user-name of last access, I agree with Raghu that it is more likely a case for HADOOP-3336.

          Raghu Angadi added a comment -

          If and only if increase in EditsLog data noticeably affects performance (which I suspect it will) : Couple of options :

          • Provide an accuracy setting: e.g., if an admin is interested only in day-level accuracy, this setting could be 24 hours, and an access to a file within 24 hours will be recorded only once.
          • Maintain only the in-memory access times. Since the NameNode is not restarted very often, admins could always run a script every week or so to get a snapshot of access times. When the NameNode restarts, a similar script could update the access times from the previous snapshot.
          dhruba borthakur added a comment -

          Hi Raghu, the idea you propose ("update access times once every 24 hours") sounds good. However, how will the namenode remember which inodes are dirty and need to be flushed? It can keep a data structure to record all inodes that have "dirty" access times, but that needs memory. Another option would be to "walk" the entire tree looking for "dirty" inodes. Neither approach is very appealing. Do you have any other options?

          Raghu Angadi added a comment -

          > However, how will the namenode remember which inodes are dirty and need to be flushed?

          It does not need any more structures than what you have proposed ("Add a 4 byte field to the in-memory inode to maintain access time."). So at each access, you check if this 4-byte field is older than the "accuracy setting"; if so, you add an EditLog entry. If you want to keep the in-memory time fully accurate (which I don't think is required), then you need to add 4 more bytes to record the "last logged time". Right?
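This check-before-log tweak can be sketched as below. The class and field names are hypothetical, not the actual FSNamesystem/FSEditLog internals; the point is that the only state needed is the per-inode access-time field already proposed:

```java
// Illustrative sketch of the "accuracy setting": an access is written to
// the edits log only when the inode's recorded access time is older than
// the configured precision.
public class AtimeThrottle {
    private final long precisionMs;
    private long loggedAccessTimeMs;   // the 4-byte atime field on the inode

    public AtimeThrottle(long precisionMs) {
        this.precisionMs = precisionMs;
        this.loggedAccessTimeMs = -precisionMs;  // so the first access always logs
    }

    // Returns true when an OP_SET_ACCESSTIME entry would be written.
    public boolean onAccess(long nowMs) {
        if (nowMs - loggedAccessTimeMs >= precisionMs) {
            loggedAccessTimeMs = nowMs;  // record in memory and log to edits
            return true;
        }
        return false;  // within the precision window: skip the log entry
    }
}
```

With a 24-hour setting, at most one edits-log entry per file per day is produced regardless of how often the file is read.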

          Allen Wittenauer added a comment -

          Until we get volumes or the equivalent, it would be good to have the accuracy setting take a path. For example, I might want to have a more accurate setting for data outside a user's home directory.

          Raghu Angadi added a comment -

          Just to clarify once again, the first option above does not require any more memory or tree traversals. It just reduces the number of entries to EditsLog. Its actually just a tweak.

          dhruba borthakur added a comment -

          Raghu, just to make sure I understand right: let's say the accuracy setting is 24 hours. Now, suppose I read the contents of a file /tmp/foo now at 1PM. The in-memory inode is updated with the access time of 1PM. But it is not recorded in the transaction log. Let's assume that no other files are accessed in the file system for the entire next day.

          When it is 1PM tomorrow, the system has to remember that /tmp/foo needs to be flushed. How does this occur? How does HDFS find out that the inode /tmp/foo is dirty and has to be flushed to the transaction log?

          Raghu Angadi added a comment -

          > The in-memory inode is updated with the accesstime of 1PM. But it is not recorded in the transaction log.

          It is recorded in the transaction log at this time (assuming it was not accessed in 24 hours prior to that).

          > When it is 1 PM tomorrow, the system has to remember that /tmp/foo needs to be flushed. How does this occur? [...]

          It does not need to remember, since the transaction was written at 1 PM previous day.

          I am trying to see if I am missing something here. Note that effect of not sync-ing the editslog file for each access is same as before.

          Raghu Angadi added a comment -

          IOW, a last access time of 't' returned by NameNode for file implies "this file was last accessed during [t, t+24h)".

          Tsz Wo Nicholas Sze added a comment -

          > IOW, a last access time of 't' returned by NameNode for file implies "this file was last accessed during [t, t+24h)".

          Basically, it is recording "access date" for the case above.

          BTW, is it expensive to invoke System.currentTimeMillis()? I don't have any idea.

          dhruba borthakur added a comment -

          I got it Raghu. Thanks for the tip. I like your proposal. +1.

          dhruba borthakur added a comment -

          This patch does the following:

          1. Implements access time for files. Directories do not have access times.
          2. The access times of files are precise up to an hour boundary. This means that all accesses to a file between 2PM and 2:59PM will have an access time of 2PM.
          3. The transaction to write access time to edits is not synced. It is assumed that it will get synced as part of other transactions.
          4. There is a configuration parameter to switch off access-times. By default, it is on.
          5. There is a new FileSystem API setAccessTime() to set the access time on a file.
          6.

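The hour-boundary behavior in item 2 amounts to rounding each access time down to the configured precision. A minimal sketch (the helper name is invented for illustration):

```java
// Illustrative: round an access time down to the configured precision
// (one hour by default), as described in the patch summary.
public class AtimePrecision {
    public static final long HOUR_MS = 60L * 60L * 1000L;

    // e.g. 2:37 PM and 2:59 PM both map to the 2:00 PM boundary.
    public static long roundDown(long accessTimeMs, long precisionMs) {
        return accessTimeMs - (accessTimeMs % precisionMs);
    }
}
```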
          dhruba borthakur made changes -
          Attachment accessTime1.patch [ 12388720 ]
          Raghu Angadi added a comment -

          > 2. The access times of files are precise up to an hour boundary
          Can this be made configurable? If not now, I am 100% certain it will be made configurable in the near future. I don't see any advantage to not making this a config variable (but probably there is one). In fact, we might have a feature request to make it runtime-configurable... but that is not required in this jira.

          Konstantin Shvachko added a comment -

          A few preliminary comments.

          1. I agree we should better make INode.accessTimePrecision a configuration parameter. In any case I did not expect it to be an extra field in INode (not even static!!).
          2. If we introduce hdfs.access.time.precision as a config parameter, then you do not need dfs.support.accessTime because you can either set hdfs.access.time.precision to a big (unreachable) number or pick a special value say "0" to indicate the atime should not be supported.
          3. FileSystem.setAccessTime() and related changes to client classes including ClientProtocol seem to be redundant. We already have such method, just call getBlockLocations().
          4. In FSDirectory.setAccessTime() I would modify atime in memory every time it's requested. This is free once you already have the INode. Only calling of logAccessTime() should be done once an hour.
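          Point 4 above can be sketched roughly as follows (hypothetical field and method names, not the actual FSDirectory code):

```java
public class AtimeUpdateSketch {
    static final long PRECISION_MS = 3600L * 1000L; // assumed 1-hour precision

    long inMemoryAtime;   // always kept current; free once we hold the INode
    long lastLoggedAtime; // persisted value, updated at most once per interval

    // Returns true when the access should also be written to the edit log.
    boolean access(long nowMs) {
        inMemoryAtime = nowMs;
        if (nowMs >= lastLoggedAtime + PRECISION_MS) {
            // Persist a value rounded down to the precision boundary;
            // the caller would invoke logAccessTime() here.
            lastLoggedAtime = (nowMs / PRECISION_MS) * PRECISION_MS;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        AtimeUpdateSketch f = new AtimeUpdateSketch();
        System.out.println(f.access(3_600_000L)); // first access in a new hour: true
        System.out.println(f.access(3_600_500L)); // same hour, already logged: false
    }
}
```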
          dhruba borthakur added a comment -

          Incorporated most review comments. I do not update the in-memory access time every time. The in-memory access time is in sync with the value persisted on disk. Otherwise, the access time of a file could move back in time when a namenode restarts!

          I also ran benchmarks with NNThroughputBenchmark. All benchmarks remain at practically the same performance. In particular, the "open benchmark with 300 threads and 100K files" is as follows:

                patch     trunk
          --------------------------
                59916     59865    ops/sec
                59171     59191    ops/sec

          dhruba borthakur made changes -
          Attachment accessTime4.patch [ 12388838 ]
          dhruba borthakur made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12388838/accessTime4.patch
          against trunk revision 689230.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          -1 javac. The applied patch generated 484 javac compiler warnings (more than the trunk's current 480 warnings).

          -1 findbugs. The patch appears to introduce 1 new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3112/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3112/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3112/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3112/console

          This message is automatically generated.

          Konstantin Shvachko added a comment -
          1. Test data -1-2% looks good.
          2. FSConstants
            • accessTimePrecision is not a constant and is not used anywhere.
            • white space added.
          3. FileSystem
            • I strongly disagree with introduction of FileSystem.setAccessTime().
              File systems usually do not provide such an interface.
              Access time is changed to current when you access a file (call getBlockLocations() in our case), but the clients should not be able to set atime. It is not secure. Users can mistakenly set it to some time in the past or maliciously to a future value and screw the system up.
            • white space added.
          4. FSNamesystem. It is better to replace boolean member supportAccessTime by a method
            boolean isAccessTimeSupported() {
              return accessTimePrecision > 0;
            }
            
          5. INode. Please revert changes to this line
            +  /** * Get last modification time of inode.
            
          6. FSDirectory. The name-node will print the message below every time you access the file.
            NameNode.stateChangeLog.info("File " + src + 
            			" already has an precise access time."
            

            This is going to be a problem for log watchers, and will affect the performance.
            Is it just a debug message? I'd remove it.

          7. I thought that ls should print aTime. Did not find any references to FileStatus.getAccessTime() in shell files.
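          The isAccessTimeSupported() suggestion in point 4 can be sketched standalone, using the dfs.access.time.precision parameter from the release note (hypothetical config handling, not actual FSNamesystem code):

```java
import java.util.Map;

public class AccessTimeConfigSketch {
    // Hypothetical reading of the precision setting; a value of 0
    // disables access times entirely, as suggested in the review above.
    static long precisionFrom(Map<String, String> conf) {
        return Long.parseLong(
                conf.getOrDefault("dfs.access.time.precision", "3600000"));
    }

    static boolean isAccessTimeSupported(long precisionMs) {
        return precisionMs > 0;
    }

    public static void main(String[] args) {
        // Explicitly disabled via precision 0.
        System.out.println(isAccessTimeSupported(
                precisionFrom(Map.of("dfs.access.time.precision", "0")))); // prints false
        // Default (1 hour) leaves access times enabled.
        System.out.println(isAccessTimeSupported(precisionFrom(Map.of()))); // prints true
    }
}
```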
          dhruba borthakur added a comment -

          Address review comments.

          dhruba borthakur made changes -
          Resolution Fixed [ 1 ]
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          dhruba borthakur added a comment -

          Incorporated all of Konstantin's review comments other than the one that said that setAccessTime() call should be removed.

          The setAccessTime() call is a utility method that allows an application to set the access time of a file without having to "open" the file. The permission-access-checks are precisely the same as that for "opening" a file, so there isn't any security concern IMO. Most file systems support setting access times/modification times on a file, see http://linux.die.net/man/2/utimes.
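          As a point of comparison, Java's own NIO API offers the same capability the utimes(2) man page linked above describes; a minimal standalone example (unrelated to the HDFS patch itself; the timestamp value is arbitrary):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributeView;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;

public class SetAtimeExample {
    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("atime-demo", ".txt");
        // Set the access time explicitly, as utimes(2) allows on Unix.
        FileTime atime = FileTime.fromMillis(1_000_000_000_000L);
        Files.getFileAttributeView(f, BasicFileAttributeView.class)
             .setTimes(null, atime, null); // (mtime, atime, creation); null = unchanged
        BasicFileAttributes attrs = Files.readAttributes(f, BasicFileAttributes.class);
        System.out.println(attrs.lastAccessTime().toMillis());
        Files.delete(f);
    }
}
```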

          I purposely did not add a command to the FsShell to display access times. An application can fetch the accessTime using a programmatic API FileSystem.getFileStatus().

          dhruba borthakur made changes -
          Attachment accessTime5.patch [ 12389045 ]
          dhruba borthakur added a comment -

          Mistakenly closed issue, re-opening.

          dhruba borthakur made changes -
          Status Resolved [ 5 ] Reopened [ 4 ]
          Resolution Fixed [ 1 ]
          dhruba borthakur made changes -
          Fix Version/s 0.19.0 [ 12313211 ]
          Status Reopened [ 4 ] Patch Available [ 10002 ]
          Konstantin Shvachko added a comment - - edited

          I don't understand. There must be some secret use case that you don't want to talk about or something. I have all these questions:

          • Why do we need to be able to setAccessTime() to Aug 31, 1991 or Jul 15, 2036?
          • Why do we need setAccessTime() but never needed setModificationTime()?
          • My understanding is that the access time's main use case is for ops to be able to recognize files that have not been used for the last 6 months and remove them on the basis that they are old. So a user can lose files if he by mistake sets aTime to, e.g., 1 B.C. Or alternately a user can set aTime on files 1 year in advance and that will keep ops from removing them for the next 1.5 years.
          • More. Local file system, KFS, S3 all will not support setAccessTime(), but HDFS will. Is it right to make it a generic FileSystem interface?
          • Pointing to utime() is the same as pointing to FSNamesystem.unprotectedSetAccessTime(). I still cannot change aTime or mTime using bash.

          I guess I am saying I am ok with a touchAC() method (as in touch -ac), but it is already there, called getBlockLocations(), and I don't see why you need more.
          The rest looks great.

          Allen Wittenauer added a comment -

          I can see where the capability to set access time would be extremely useful for the FUSE case. (and the NFSv4 proxy case, should that come to pass)

          Pete Wyckoff added a comment -

          > I can see where the capability to set access time would be extremely useful for the FUSE case. (and the NFSv4 proxy case, should that come to pass)

          +1. It would be nice, as "touch /export/hdfs/foo" seems to be a common way of checking fuse-dfs, and the returned IO error is confusing since the module is working but just can't implement touch. I guess it could be a no-op, but when access time is configured it would be nice to have.

          Raghu Angadi added a comment - - edited

          > it would be nice as "touch /export/hdfs/foo"

          Isn't touch just 'append(f); close(f)'? That's what command-line touch seems to do.

          Edit: I am just talking about touch; no strong opinion w.r.t. setModificationTime() and setAccessTime().

          Allen Wittenauer added a comment -

          I dunno about touch, but I was thinking of the tar etc. case, where atime can be restored as part of the unpack operation.

          Pete Wyckoff added a comment -

          in fuse-dfs, this is the error I get:

          touch /export/hdfs/user/pwyckoff/foo

          touch: setting times of `/export/hdfs/user/pwyckoff/foo': Function not implemented

          So I assumed one needs to implement some attribute-setting function. But this is against 0.17, so appends also give me an IO Error.

          Konstantin Shvachko added a comment -

          touch() is getBlockLocations() in terms of hdfs.
          Preserving file times while copying or tar-ing is useful, no doubt about it, but is rather different from being able to set it to an arbitrary value.
          setAccessTime() gives more than what is needed.

          dhruba borthakur added a comment -

          Hi Konstantin, I am not fixated on providing the setAccessTime method. However, I think it is a powerful feature that can be used by any hierarchical storage system. It even has precedent on all Unix and Windows systems:

          http://linux.die.net/man/2/utimes

          If more people feel strongly about not having the setAccessTime API, I will remove it from the patch.

          Raghu Angadi added a comment - - edited

          +1 for setAccessTime() in HDFS.
          -1 for adding it to FileSystem API. As we move towards stable APIs we really need to be more cautious about adding new APIs. Once there is enough utility demonstrated by this API in HDFS, it can move up (perhaps in a different form).

          Regarding touch, its main function is to change the modification time (and create the file if it does not exist).

          Raghu Angadi added a comment -

          > regd touch, its main function is to change modification time
          i.e. getBlockLocations() does not seem sufficient or correct.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12389045/accessTime5.patch
          against trunk revision 689666.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 10 new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3132/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3132/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3132/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3132/console

          This message is automatically generated.

          Konstantin Shvachko added a comment - - edited

          This is what I get from the discussion above.

          • touchAC() method is useful and can find immediate application in fuse.
          • for archival and (remote) copy purposes we will need a flag that would create new files with time (and maybe other) attributes having THE SAME value as the source files.
          • there is no reasonable use case for the setAccessTime method at the moment. I mean nobody needs to set aTime to yesterday or tomorrow unless it is the time of a file being copied or untarred.

          In the spirit of minimizing impact on external APIs I propose to

          1. introduce FileSystem.touchAC() and implement it in HDFS as a call to getBlockLocations().
          2. not introduce setAccessTime() anywhere in hdfs client code including ClientProtocol.

          We can always introduce setAccessTime() when it really becomes necessary, because once introduced it's hard to take back.

          Pete Wyckoff added a comment -

          +1 to not unduly adding stuff to the API, but when the thing in mind is something needed to support a posix API, shouldn't there be an exception? We already know as Allen points out that setAccessTime is needed to implement tar properly, so isn't that an important enough application?

          dhruba borthakur added a comment -

          I still vote for having FileSystem.setAccessTime(). It is very much like Posix and I do not see a security issue anywhere at all. Also, most archiving/unarchiving utilities set the access time. For example, a restore utility will need to set the access time of the file.

          Pete Wyckoff made changes -
          Link This issue is blocked by HADOOP-4028 [ HADOOP-4028 ]
          Pete Wyckoff made changes -
          Link This issue is blocked by HADOOP-4028 [ HADOOP-4028 ]
          Pete Wyckoff made changes -
          Link This issue blocks HADOOP-4028 [ HADOOP-4028 ]
          Konstantin Shvachko added a comment -

          As I said before, setAccessTime has wider semantics than is required for tar.
          setAccessTime lets you set an arbitrary value for aTime, while in tar you just need to replicate the time of an existing file.
          On the other hand, setting only the access time is not sufficient for tar. You also want to keep the same modification time, owner, group and permissions. So to me it makes more sense to introduce a new create method with the FileStatus of the existing file as a parameter, or a copy method with an option to replicate FileStatus fields.
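          The copy-with-attributes idea can be sketched with standard Java NIO, showing how the time fields of an existing file would be carried over (an illustration of the concept only, not a proposed HDFS implementation):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributeView;
import java.nio.file.attribute.BasicFileAttributes;

public class CopyPreservingTimes {
    // Copy src to dst and carry over mtime and atime, as a tar-style
    // restore would need (illustrative sketch only).
    static void copyWithTimes(Path src, Path dst) throws IOException {
        BasicFileAttributes a = Files.readAttributes(src, BasicFileAttributes.class);
        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
        Files.getFileAttributeView(dst, BasicFileAttributeView.class)
             .setTimes(a.lastModifiedTime(), a.lastAccessTime(), null);
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("src", ".txt");
        Path dst = Files.createTempFile("dst", ".txt");
        copyWithTimes(src, dst);
        System.out.println(Files.getLastModifiedTime(dst)
                .equals(Files.getLastModifiedTime(src)));
        Files.delete(src);
        Files.delete(dst);
    }
}
```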

          Konstantin Shvachko added a comment -

          Posix is not an argument in connection with hdfs.
          I always find it hard to argue with my kind when, after all the pros and cons, they say "I still want it".
          What restore utility?

          Raghu Angadi added a comment -

          If we want to provide POSIX-like functionality whenever possible (which I think is a good idea), it would make more sense to keep the names/parameters etc. similar as well. Though not part of this jira, I am +1 for FileSystem.utimes() and -1 for FileSystem.setAccessTime().

          Konstantin Shvachko added a comment -

          Sorry, I created some confusion. When I talked about touch() I meant changing the access time to current, but Raghu just corrected me that touch() in posix can also change the modification time. Just to clarify, I did not want to say anything about modification time; everything I talked about was access times. Precisely, I meant

          touch -ac foo
          

          I'll replace touch() with touchAC() in my earlier posts to clarify, in case anybody is curious enough to read them.

          Allen Wittenauer added a comment -

          Maybe this is a dumb question, but how will hadoop archives (htars?) interact with access times?

          Also, at least to me, an API called touchAC() seems very non-obvious as to its purpose. (esp if I'm doing this on Windows)

          Raghu Angadi added a comment -

          Yes, I don't think we should have a touchAC(); touch would be a utility that could be implemented with other FileSystem APIs.

          dhruba borthakur added a comment - - edited

          I like Raghu's proposal that FileSystem.setAccessTime() can be renamed as FileSystem.utimes(FileStatus). But it creates some other issues:

          1. The FileStatus object has the blockSize of the file. The blockSize cannot be changed. Similarly, the FileStatus object has a field called 'isdir'. What happens to this one?

          2. Similarly, the FileStatus has the length of the file. Are we going to truncate the file (or create a sparse file with holes if the user sets a longer length)?

          3. There are existing APIs: FileSystem.setReplication(), FileSystem.setOwner(), setGroup(), setPermissions(), etc. Will these be deprecated or coexist with the new API?

          I prefer adding a setAccessTime because it allows an application to set the access time to an arbitrary value. If we want to merge all the above APIs into FileSystem.utimes(), I can do it as part of a separate JIRA.

          Raghu, Konstantin: does it sound ok?

          Raghu Angadi added a comment - - edited

          I don't know how FileStatus, blockSize etc. matter. I would have thought it would be something like FileSystem.utimes(path, modTime, aTime), keeping the behaviour as close as possible to the POSIX man page, just like the FileSystem.setAccessTime() interface you added.

          I suggested the name utimes() since you gave utimes() as the justification. If you think setAccessTime() should be the name, that's alright I guess.

          The API stuff could be better done in a different jira IMHO.

          Edit : minor

          dhruba borthakur added a comment -

          Hi Raghu, Thanks. I like your proposal of having

          FileSystem.utimes(path, modTime, aTime).

          I can do this as part of this JIRA. Further cleanup of the API can be done as part of another JIRA. Do we have consensus now?

          Raghu Angadi added a comment -

          +1 for me.. in this jira or a different one, does not matter. sticking to posix-like when possible helps since that interface has already gone through the arguments..

          Actually I wanted to edit my comment to make it clear that I am not too opposed to setAccessTime()...

          Konstantin Shvachko added a comment -

          > setAccessTime because it allows an application to set the access time to an arbitrary value.

          This is exactly the reason why I am against introducing setAccessTime. No arbitrary values for access time. And therefore no utimes() for me.
          I do not oppose setting aTime to the current time.

          dhruba borthakur added a comment -

          Ok, from Raghu's and Konstantin's comments, I guess nobody is stuck on how the API looks. Whether it is setAccessTime() or utimes(), everybody is ok with it. It appears that both Raghu and Konstantin are +1 on this one. Please let me know if this is not the case.

          The point being discussed is whether utimes/setAccessTime allows setting the time to any user-specified value or whether it sets it to the current time on the namenode. I still vote for allowing a user to set any access time... this is what POSIX does, and it allows restore utilities to use a standard API. From Raghu's comments, it appears to me that he is +1 on it too.

          >So to me it makes more sense to introduce a new create method with FileStatus of the existing file as a parameter or a
          > copy method with an option to replicate FileStatus fields.

          I do not like the idea of having a custom API as described above.

          Konstantin Shvachko added a comment -

          Clarifying.
          > setAccessTime() or utimes(), everybody is ok with it. It appears that both Raghu and Konstantin are +1 on this one.
          -1 on both.

          > I still vote for allowing an user to set any access time.
          -1.

          Seriously, it's like nobody is listening to others. Maybe we need a meeting.

          Owen O'Malley added a comment -

          Just to add noise to the fire, I'm +1 to setAccessTime. I also think it is a very good idea to be able to configure it off at the namenode. My case for setAccessTime is that if you expand an archive or do distcp, it is really nice to be able to optionally set all of the times to match the copied files. That includes access time.

          dhruba borthakur made changes -
          Link This issue blocks HADOOP-4058 [ HADOOP-4058 ]
          Allen Wittenauer added a comment -

          I think Konstantin's point is that the only "realistic" use case we can come up with for setting an arbitrary access time is during a file create (such as expansion of an archive and a distcp), therefore the API should reflect that use case since it keeps the number of routines small. Please correct me if I'm wrong.

          To me, that sounds incredibly unclean. It would be better to allow for a separate utimes()-equivalent API so that a) if there is a use case later, it can be covered and b) do we really want a call that is two operations-in-one?

          Raghu Angadi added a comment - - edited

          > a file create (such as expansion of an archive and a distcp), [...]

          During create is not enough even for these use cases. Say distcp copies a 10GB file and sets the mod time at create time (to t - 1 month), and the last block is written 1 min later.. then the mod time after distcp will be (t + 1 min) rather than (t - 1 month). -1 for extra options to create, or close, etc.

          Why not just provide utimes().. since we are using POSIX as a tie breaker?

          Another of Konstantin's points is that the FS should not allow setting a future time.. which sounds ok.. but it is just a file attribute to help users, not something the filesystem inherently depends upon. I don't see the need to police it that much .. and since POSIX is the tie breaker we could just stick to its functionality. Note that in all the use cases we need to be able to set modtime too.

          dhruba borthakur added a comment -

          Given that Raghu, Owen and Allen commented that it is better to follow the POSIX semantics of allowing a user to set either access time or modification time to any arbitrary value he/she likes, I changed my earlier patch slightly to add the following API:

          /**
           * Set access time of a file.
           * @param p The path
           * @param mtime Set the modification time of this file.
           *              The number of milliseconds since Jan 1, 1970.
           *              A value of -1 means that this call should not set modification time.
           * @param atime Set the access time of this file.
           *              The number of milliseconds since Jan 1, 1970.
           *              A value of -1 means that this call should not set access time.
           */
          public void setTimes(Path p, long mtime, long atime) throws IOException;

          This closely follows the POSIX utimes call, but uses the Hadoop naming pattern for method names. It allows setting access time, modification time, or both.
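The -1 sentinel ("leave this field unchanged") has a direct analogue in java.nio, where BasicFileAttributeView.setTimes treats a null FileTime as "do not modify". As an illustration of the same semantics against the local filesystem (this is a sketch of the behaviour being proposed, not the Hadoop API itself):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributeView;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;
import java.util.concurrent.TimeUnit;

public class SetTimesDemo {
    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("atime-demo", ".txt");
        // A restore tool would set times back to the archived values;
        // here "30 days ago" stands in for the archived timestamp.
        long archivedMs = System.currentTimeMillis()
                - TimeUnit.DAYS.toMillis(30);
        FileTime archived = FileTime.from(archivedMs, TimeUnit.MILLISECONDS);
        BasicFileAttributeView view =
                Files.getFileAttributeView(p, BasicFileAttributeView.class);
        // setTimes(lastModified, lastAccess, create): null plays the role
        // of the -1 sentinel above, leaving that attribute unchanged.
        view.setTimes(archived, archived, null);
        BasicFileAttributes attrs =
                Files.readAttributes(p, BasicFileAttributes.class);
        // Filesystems may truncate times to coarser granularity,
        // so compare with some slack.
        System.out.println(
                attrs.lastModifiedTime().toMillis() <= archivedMs + 2000);
        Files.delete(p);
    }
}
```

This is the same pattern distcp or an untar utility would use with the proposed FileSystem.setTimes(): copy the data first, then stamp the source's mtime/atime onto the destination.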

          dhruba borthakur made changes -
          Attachment accessTime6.patch [ 12389459 ]
          dhruba borthakur made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          dhruba borthakur added a comment -

          Submitting patch for HadoopQA tests.

          dhruba borthakur made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Raghu Angadi added a comment -

          What are the permissions required for setting an arbitrary accessTime? Just read permission does not seem enough, at least on a Linux box.

          dhruba borthakur added a comment -

          From my understanding (and this is what I have implemented in this patch), read access is required to be able to set access time on files. Write access is required to be able to set modification time on files.

          Raghu Angadi added a comment -

          That is mostly not correct.. maybe it needs to be changed later.

          dhruba borthakur added a comment -

          Another option would be to allow changing access times and modification times only by the owner of the file and the superuser. But this patch does not do this. In this patch, "read access is required to be able to set access time on files. Write access is required to be able to set modification time on files".

          Sanjay Radia added a comment -

          The main use cases are distcp and restore (or untar).
          Konstantin raises 2 good points:

          • restrict to the create operation. To make this work, the time has to be applied at the close event; otherwise you run into the situation Raghu raises about the file taking a long time to write its last block.
          • restrict times to be <= the NN's current time.
            This could run into problems with distcp between two HDFS clusters whose clocks are out of sync.

          While the extended create operation works for our use case, there are a few advantages to the utimes() approach:

          • it handles other use cases we haven't thought of today
          • if we provide partial POSIX compatibility in the future, one could use POSIX restore/untar tools

          Hence I am in favour of:
          FileSystem.utimes(path, modTime, aTime).
          +1

          Raghu Angadi added a comment - - edited

          Edit: it's fine. getBlockLocations calls the internal dir.setTimes().

          Regarding the setTimes() implementation: we should have a private setTimes that does not do security checks and audit logging, since the most common use is internal (as in getBlockLocations()). Security checks and logging are needed only when a user actively invokes setTimes().. btw, should it be setUTimes()? I haven't looked at the rest of the patch thoroughly.

          dhruba borthakur added a comment -

          Hi Raghu, thanks for reviewing this patch. The current patch does not do any additional security checks or audit logging while setting access times when invoked from getBlockLocations. In the case when FileSystem.setTimes() is called, it checks access privileges and does audit logging. So, it behaves precisely the way you described in your comment. Please let me know if you have any additional comments.

          dhruba borthakur added a comment -

          Hadoop QA did not pick up this patch for tests. Resubmitting....

          dhruba borthakur made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          dhruba borthakur added a comment -

          Hadoop QA did not pick up this patch for tests. Resubmitting....

          dhruba borthakur made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12389459/accessTime6.patch
          against trunk revision 692287.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 10 new Findbugs warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3181/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3181/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3181/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3181/console

          This message is automatically generated.

          dhruba borthakur made changes -
          Link This issue blocks HADOOP-4077 [ HADOOP-4077 ]
          dhruba borthakur added a comment -

          I created HADOOP-4077 to debate the access permissions that are required to invoke the setTimes API.

          The failed test is TestKosmosFileSystem, and it has been failing for the last 5 builds. This failure is not caused by this patch.

          The findbugs warnings are not introduced by this patch. I believe that the test-patch process is getting confused while diffing the findbugs output on trunk with the findbugs output from this patch.

          Konstantin has reviewed this patch earlier. Please let me know if somebody else wants to review this patch. I would like to get it committed by Friday Sept 5 so that it can make it into the 0.19 release.

          Raghu Angadi added a comment -

          Dhruba, I looked mainly at setTimes(); it looks good. Since the rest of the patch hasn't changed, you can commit it.

          dhruba borthakur added a comment -

          Thanks Raghu. I will commit this patch.

          dhruba borthakur added a comment -

          I just committed this.

          dhruba borthakur made changes -
          Hadoop Flags [Incompatible change, Reviewed]
          Release Note HDFS files have access times. By default, access times will be precise to the most recent hour boundary. A configuration parameter dfs.access.time.precision is used to control this precision. Setting a value of 0 will disable persisting access times for HDFS files.
          Hudson added a comment -

          Integrated in Hadoop-trunk #595 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/595/ )
          dhruba borthakur added a comment -

          I have committed this.

          dhruba borthakur made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Hadoop Flags [Reviewed, Incompatible change] [Incompatible change, Reviewed]
          Tsz Wo Nicholas Sze made changes -
          Link This issue is related to HADOOP-4099 [ HADOOP-4099 ]
          Pete Wyckoff made changes -
          Release Note HDFS files have access times. By default, access times will be precise to the most recent hour boundary. A configuration parameter dfs.access.time.precision is used to control this precision. Setting a value of 0 will disable persisting access times for HDFS files. HDFS files have access times. By default, access times will be precise to the most recent hour boundary. A configuration parameter dfs.access.time.precision (milliseconds) is used to control this precision. Setting a value of 0 will disable persisting access times for HDFS files.
          Hadoop Flags [Reviewed, Incompatible change] [Incompatible change, Reviewed]
          Robert Chansler made changes -
          Hadoop Flags [Reviewed, Incompatible change] [Incompatible change, Reviewed]
          Release Note HDFS files have access times. By default, access times will be precise to the most recent hour boundary. A configuration parameter dfs.access.time.precision (milliseconds) is used to control this precision. Setting a value of 0 will disable persisting access times for HDFS files. Added HDFS file access times. By default, access times will be precise to the most recent hour boundary. A configuration parameter dfs.access.time.precision (milliseconds) is used to control this precision. Setting a value of 0 will disable persisting access times for HDFS files.
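The release note's dfs.access.time.precision knob lives in the cluster configuration; a minimal sketch of how it would appear in hdfs-site.xml (the property name and default come from the release note; the file placement and XML layout follow the standard Hadoop configuration format, written out here as an assumption):

```xml
<!-- hdfs-site.xml (sketch): access-time precision, in milliseconds -->
<configuration>
  <property>
    <name>dfs.access.time.precision</name>
    <!-- 3600000 ms = 1 hour, the stated default.
         0 disables persisting access times entirely. -->
    <value>3600000</value>
  </property>
</configuration>
```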
          Nigel Daley made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Tsz Wo Nicholas Sze made changes -
          Link This issue is related to HADOOP-4986 [ HADOOP-4986 ]
          Owen O'Malley made changes -
          Component/s dfs [ 12310710 ]
          Konstantin Shvachko made changes -
          Link This issue relates to HDFS-2712 [ HDFS-2712 ]
          Gavin made changes -
          Link This issue blocks HDFS-220 [ HDFS-220 ]
          Gavin made changes -
          Link This issue is depended upon by HDFS-220 [ HDFS-220 ]
          Gavin made changes -
          Link This issue blocks HADOOP-4077 [ HADOOP-4077 ]
          Gavin made changes -
          Link This issue is depended upon by HADOOP-4077 [ HADOOP-4077 ]

            People

            • Assignee:
              dhruba borthakur
              Reporter:
              dhruba borthakur
            • Votes:
              0
              Watchers:
              6
