Details
Improvement
Status: Resolved
Major
Resolution: Fixed
2.2.0, 2.7.1
None
Incompatible change
GlobFilter.compile() and RegexFilter.compile() now return com.google.re2j.Pattern instead of java.util.regex.Pattern.
Description
java.util.regex classes have performance problems with certain wildcard patterns. Specifically, consecutive * characters in a file name that are not escaped as literals can cause commands such as "hadoop fs -ls file******name" to consume 100% CPU and may never return in any reasonable time; the running time grows with the number of *'s.
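To make the mechanism concrete, here is a minimal sketch of a glob-to-regex translator. It is hypothetical (it is not Hadoop's GlobPattern and only understands `*`, `?` and backslash escapes) and assumes the common approach of mapping each unescaped `*` to `.*`. Under that assumption, `file******name` becomes a regex with six stacked `.*` groups, which a backtracking engine such as java.util.regex may explore combinatorially. The optional `collapseStars` flag shows one possible mitigation: a run of `*` matches exactly the same names as a single `*`.

```java
import java.util.regex.Pattern;

// Hypothetical glob-to-regex translator (not Hadoop's GlobPattern). It only
// understands '*', '?' and backslash escapes.
public class GlobToRegexSketch {

    // Characters that must be backslash-escaped to be matched literally.
    private static final String REGEX_META = "\\.[]{}()*+?^$|";

    private static void appendLiteral(StringBuilder regex, char c) {
        if (REGEX_META.indexOf(c) >= 0) {
            regex.append('\\');
        }
        regex.append(c);
    }

    // Translates a glob into a regex. Each unescaped '*' becomes ".*"; when
    // collapseStars is true, a run of unescaped '*' is emitted as a single ".*",
    // which matches exactly the same names.
    public static String toRegex(String glob, boolean collapseStars) {
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < glob.length()) {
            char c = glob.charAt(i);
            if (c == '\\' && i + 1 < glob.length()) {
                appendLiteral(regex, glob.charAt(i + 1)); // escaped: match literally
                i += 2;
            } else if (c == '*') {
                regex.append(".*");
                i++;
                while (collapseStars && i < glob.length() && glob.charAt(i) == '*') {
                    i++; // skip the rest of the run of stars
                }
            } else if (c == '?') {
                regex.append('.');
                i++;
            } else {
                appendLiteral(regex, c);
                i++;
            }
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        String glob = "file******name";
        System.out.println(toRegex(glob, false)); // file.*.*.*.*.*.*name
        System.out.println(toRegex(glob, true));  // file.*name
        System.out.println(Pattern.matches(toRegex(glob, true), "file_x_name")); // true
    }
}
```

Escaped stars (`\*`) are emitted as literal `\*` and are never collapsed. Collapsing runs removes the stacked `.*` groups but does not make a backtracking engine linear-time in general; the change recorded in the release note instead switches to RE2J, whose matching time is linear in the input.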
Here is an example:
hadoop fs -touchz /user/mattp/job_1429571161900_4222-1430338332599-tda%2D%2D\\\+\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\+\\\+\\\+...%270%27%28Stage-1430338580443-39-2000-SUCCEEDED-production%2Dhigh-1430338340360.jhist
hadoop fs -ls /user/mattp/job_1429571161900_4222-1430338332599-tda%2D%2D+******************************+++...%270%27%28Stage-1430338580443-39-2000-SUCCEEDED-production%2Dhigh-1430338340360.jhist
Running the -ls command results in:
PID    COMMAND  %CPU   TIME
14526  java     100.0  01:18.85
Not every string of *'s causes this, but the above filename reproduces this reliably.
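One way to see why some names are slow and others are not is a naive backtracking wildcard matcher that counts its own recursive calls. This sketch is hypothetical: it is neither java.util.regex nor Hadoop code, and the test names are made up. It only mimics the "try every split" behaviour of a backtracking engine on stacked wildcards. When the name matches, only the innermost `*` has to scan forward; when it does not match, every combination of lengths for every `*` is tried before giving up.

```java
// Hypothetical illustration (not java.util.regex and not Hadoop code): a naive
// backtracking wildcard matcher that counts its recursive calls.
public class BacktrackingStepsSketch {

    private long steps;

    // Returns true if the whole name matches; '*' matches any run of characters.
    public boolean matches(String pattern, String name) {
        steps = 0;
        return match(pattern, 0, name, 0);
    }

    public long steps() {
        return steps;
    }

    private boolean match(String p, int pi, String s, int si) {
        steps++;
        if (pi == p.length()) {
            return si == s.length();
        }
        char c = p.charAt(pi);
        if (c == '*') {
            // Try every possible length for this '*', shortest first, and
            // backtrack to the next length when the rest of the pattern fails.
            for (int k = si; k <= s.length(); k++) {
                if (match(p, pi + 1, s, k)) {
                    return true;
                }
            }
            return false;
        }
        return si < s.length() && s.charAt(si) == c && match(p, pi + 1, s, si + 1);
    }

    // Builds "file" + n stars + "name".
    public static String withStars(int n) {
        StringBuilder sb = new StringBuilder("file");
        for (int i = 0; i < n; i++) {
            sb.append('*');
        }
        return sb.append("name").toString();
    }

    public static void main(String[] args) {
        BacktrackingStepsSketch m = new BacktrackingStepsSketch();
        String miss = "file_aaaaaaaaaaaaaaaaaaaa_nam"; // can never match "file*...*name"
        String hit = "file_aaaaaaaaaaaaaaaaaaaa_name";
        for (int n = 1; n <= 6; n++) {
            boolean a = m.matches(withStars(n), miss);
            long missSteps = m.steps();
            boolean b = m.matches(withStars(n), hit);
            System.out.println(n + " star(s): miss=" + a + " steps=" + missSteps
                    + ", hit=" + b + " steps=" + m.steps());
        }
    }
}
```

Running `main` should show the step count for the non-matching name climbing steeply with each extra `*`, while the matching name stays cheap. How closely this tracks java.util.regex on the filename above is an assumption, not something measured here.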
Attachments
Issue Links
is duplicated by
- HADOOP-13099 Glob should return files with special characters in name (Resolved)
is related to
- PARQUET-2158 Upgrade Hadoop dependency to version 3.2.0 (Resolved)
- HADOOP-13051 Add Glob unit test for special characters (Resolved)
relates to
- HDFS-9246 TestGlobPaths#pTestCurlyBracket is failing (Resolved)