NUTCH-1968: File name too long issue in DumpFileUtil.java

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.10
    • Fix Version/s: 1.10
    • Component/s: tool
    • Labels:
    • Environment:

      Nutch 1.10 Revision 1667458

      Description

      With the helpful patch that Renxia posted at https://issues.apache.org/jira/browse/NUTCH-1957, I figured out that we need to resolve the file name collision; otherwise we will lose data. However, when I run bin/nutch dump with this patch applied, I get a "File name too long" error, as follows:
      zhangxin0804@zhangxin0804-VirtualBox:~/Desktop/Nutch/nutch/runtime/local$ bin/nutch dump -outputDir outputDir -segment TestCrawl2/segments
      java.io.FileNotFoundException: /home/zhangxin0804/Desktop/Nutch/nutch/runtime/local/outputDir/86/fc/830433456bfbcff5f7b53661cc24d9d4_maps.php?submitted=true&year=2014&month=6&imgs%5b%5d=nationaltavgrank&imgs%5b%5d=nationaltmaxrank&imgs%5b%5d=nationaltminrank&imgs%5b%5d=nationalpcpnrank&imgs%5b%5d=regionaltavgrank&imgs%5b%5d=regionaltmaxrank&imgs%5b%5d=regionaltminrank&imgs%5b%5d=regionalpcpnrank&imgs%5b%5d=statewidetavgrank&imgs%5b%5d=statewidetmaxrank&imgs%5b%5d=statewidetminrank&imgs%5b%5d=statewidepcpnrank&imgs%5b%5d=divisionaltavgrank&imgs%5b%5d=divisionaltmaxrank&imgs%5b%5d=divisionaltminrank&imgs%5b%5d=divisionalpcpnrank&ts=3 (File name too long)
      at java.io.FileOutputStream.open(Native Method)
      at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
      at java.io.FileOutputStream.<init>(FileOutputStream.java:171)
      at org.apache.nutch.tools.FileDumper.dump(FileDumper.java:221)
      at org.apache.nutch.tools.FileDumper.main(FileDumper.java:309)

      I dug into this patch and found that it only checks the length of fileBaseName in /nutch/trunk/src/java/org/apache/nutch/util/DumpFileUtil.java. Therefore, if the <extension> is too long, the final outputFullPath is still too long, and FileDumper.java throws the exception shown above. Probably not everyone will hit this issue, and it may be a minor bug; correct me if I am wrong. Meanwhile, is it OK to truncate the fileExtension name, as we did for fileBaseName, to solve this problem?
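
      To make the proposal concrete, here is a minimal sketch of truncating both parts of the name before building the output file name. It is not the code in DumpFileUtil.java or in the attached patch: the class name, the method names, and both length limits are assumptions chosen for illustration. The only limit taken as given is that many Linux file systems cap a single path component at 255 bytes.

```java
// Hypothetical sketch, not the actual DumpFileUtil.java implementation.
final class DumpFileNameSketch {
    // Assumed limits for illustration only; the real values may differ.
    static final int MAX_BASE_NAME_LENGTH = 32;
    static final int MAX_EXTENSION_LENGTH = 5;

    // Returns s unchanged if it fits, otherwise its first maxLength characters.
    static String truncate(String s, int maxLength) {
        return s.length() > maxLength ? s.substring(0, maxLength) : s;
    }

    // Builds "<md5>_<baseName>.<extension>", truncating BOTH the base name
    // and the extension so a long query string cannot inflate the result.
    static String createFileName(String md5, String fileBaseName, String fileExtension) {
        String base = truncate(fileBaseName, MAX_BASE_NAME_LENGTH);
        String ext = truncate(fileExtension, MAX_EXTENSION_LENGTH);
        return md5 + "_" + base + "." + ext;
    }

    public static void main(String[] args) {
        // In the failing case above, the URL query string ended up in the extension.
        String longExtension = "php?submitted=true&year=2014&month=6&imgs%5b%5d=nationaltavgrank";
        String name = createFileName("830433456bfbcff5f7b53661cc24d9d4", "maps", longExtension);
        System.out.println(name + " (" + name.length() + " chars)");
    }
}
```

      With these assumed limits, the file name from the stack trace would shrink to the MD5 prefix, the base name maps, and a five-character extension. Because the MD5 prefix is kept in full, truncating the extension should not reintroduce the collisions that NUTCH-1957 addressed, provided the prefix is derived from the full URL.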

        Attachments

        1. EXTENSION_TOO_LONG.patch
          3 kB
          Renxia Wang

            People

            • Assignee: Chris A. Mattmann (chrismattmann)
            • Reporter: Xin Zhang (zhangxin0804)
            • Votes: 0
            • Watchers: 6
