Owen O'Malley added a comment - 15/Oct/07 03:53 PM
This should be a static method on the FileInputFormat instead of JobConf, since it won't affect the framework, but only the FileInputFormat's behavior.
The method should probably also have a getter, so the pair would look like:
public static void setInputPathFilter(JobConf job, PathFilter filter);
public static PathFilter getInputPathFilter(JobConf job);

Having a static method on FileInputFormat would make it difficult for an application that dispatches Hadoop jobs (e.g. a webapp) to set filters on a per-job basis.
IMO, it should be configurable at job level.

> IMO, it should be configurable at job level.
Please look more closely at the static methods Owen suggested. The job is a parameter.

We support globbing in input paths now. Doesn't that address this need?
I.e. *.foo

Owen, Doug, got the static methods thing, that would work.
Eric, using wildcards would not work, as they let you say what you want to include but not what you don't want to include. For example, I may have some files like the CRC files (to track other types of information) that I would like to skip.
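
To make the exclusion case concrete, here is a minimal sketch of a filter that skips files by suffix, which a glob alone cannot express. It deliberately avoids the Hadoop classes so it runs standalone: NameFilter is a hypothetical stand-in for PathFilter, and ExcludeSuffixSketch, excludeSuffix and select are illustrative names, not part of any Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Hadoop's PathFilter, using plain strings
// so the sketch runs without Hadoop on the classpath.
interface NameFilter {
    boolean accept(String path);
}

final class ExcludeSuffixSketch {
    // Returns a filter that rejects any path ending with the given suffix.
    static NameFilter excludeSuffix(String suffix) {
        return path -> !path.endsWith(suffix);
    }

    // Keeps only the paths the filter accepts, preserving input order.
    static List<String> select(List<String> paths, NameFilter filter) {
        List<String> kept = new ArrayList<>();
        for (String p : paths) {
            if (filter.accept(p)) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList(
                "in/part-00000", "in/part-00000.crc", "in/part-00001");
        // Prints the listing with the .crc entry dropped.
        System.out.println(select(listing, excludeSuffix(".crc")));
    }
}
```

With the static methods Owen proposed, a filter like this would be registered per job, since the JobConf is passed as a parameter.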
Alejandro Abdelnur made changes - 25/Mar/08 10:19 AM
I've figured out (IMO) a cleaner way of implementing this feature:
Adding the following 2 instance methods to the JobConf:
Modifying the FileInputFormat's listPaths() method to apply the hiddenFileFilter and, if one is set, the filter configured in the JobConf. Globbing still works for pattern-based inclusion, even when a path filter is set. Being able to specify a custom PathFilter makes more complex filters possible, such as exclusion filters and selections that cannot be expressed with a pattern.
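
A rough sketch of how the two filters could be combined, assuming the hidden-file filter rejects names starting with "_" or "." and that a job without a configured filter yields null. SimpleFilter, ListPathsSketch, all and the listPaths signature here are hypothetical stand-ins operating on plain file names, not the actual patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Hadoop's PathFilter; operates on file names.
interface SimpleFilter {
    boolean accept(String name);
}

final class ListPathsSketch {
    // Assumed hidden-file rule: skip names starting with '_' or '.'.
    static final SimpleFilter HIDDEN =
            name -> !name.startsWith("_") && !name.startsWith(".");

    // Accepts a name only if every non-null filter accepts it.
    static SimpleFilter all(SimpleFilter... filters) {
        return name -> {
            for (SimpleFilter f : filters) {
                if (f != null && !f.accept(name)) {
                    return false;
                }
            }
            return true;
        };
    }

    // jobFilter may be null, meaning the job configured no filter.
    static List<String> listPaths(List<String> names, SimpleFilter jobFilter) {
        SimpleFilter combined = all(HIDDEN, jobFilter);
        List<String> out = new ArrayList<>();
        for (String n : names) {
            if (combined.accept(n)) {
                out.add(n);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("_logs", ".part-0.crc", "part-0", "part-0.tmp");
        System.out.println(listPaths(names, null));
        System.out.println(listPaths(names, n -> !n.endsWith(".tmp")));
    }
}
```

The hidden-file filter always applies, so a user-supplied filter can only narrow the listing further, never re-admit hidden files.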
Alejandro Abdelnur made changes - 25/Mar/08 10:36 AM
Alejandro Abdelnur made changes - 25/Mar/08 10:37 AM
Alejandro Abdelnur made changes - 25/Mar/08 10:42 AM
Devaraj Das made changes - 25/Mar/08 11:01 AM
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12378554/patch2055.txt against trunk revision 619744.
@author +1. The patch does not contain any @author tags.
tests included +1. The patch appears to include 3 new or modified tests.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new javac compiler warnings.
release audit +1. The applied patch does not generate any new release audit warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2048/testReport/
This message is automatically generated.
Alejandro Abdelnur made changes - 26/Mar/08 10:02 AM
Refactored the patch per Owen's suggestion, since the functionality is specific to FileInputFormat.
Alejandro Abdelnur made changes - 26/Mar/08 10:05 AM
Alejandro Abdelnur made changes - 26/Mar/08 10:05 AM
Alejandro Abdelnur made changes - 26/Mar/08 10:06 AM
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12378623/patch2055.txt against trunk revision 619744.
@author +1. The patch does not contain any @author tags.
tests included +1. The patch appears to include 3 new or modified tests.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new javac compiler warnings.
release audit +1. The applied patch does not generate any new release audit warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2061/testReport/
This message is automatically generated.

I just committed this. Thanks, Alejandro!
Devaraj Das made changes - 28/Mar/08 12:44 PM
Integrated in Hadoop-trunk #445 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/445/)
Devaraj Das made changes - 17/Apr/08 06:06 AM
Nigel Daley made changes - 21/May/08 08:05 PM
Owen O'Malley made changes - 08/Jul/09 04:52 PM