OODT-667

CAS-PGE no longer respects writers and file tags from earlier pgeConfig.xml files



    • Expert (Hard) - Guru knowledge of this project could be required


      It has been a long-standing bug since Apache OODT 0.3 (that is, in 0.4 and beyond) that the updates to CAS-PGE, which simplified its crawling system for met extraction based on files and regExp tags and unified it with the AutoDetectProductCrawler, have caused cas-pge to no longer honor the following blocks from pgeConfig.xml files:

          <files regExp="someRegExp" metWriter="some.class" args="some args"/>

      This was a conscious decision, discussed by Brian Foster, myself, and others on several occasions.


      I support Brian's implementation but I think we took a step back in not offering backwards compatibility that simply:

      1. still reads the <files> tags shown above from pgeConfig.xml, and then;
      2. constructs the appropriate AutoDetectProductCrawler and RenamingConventions and other plumbing behind the scenes.
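As a rough illustration of step 1, the sketch below reads legacy <files .../> entries using only the JDK's DOM parser. It is not the code in the attached patch. The class name LegacyFilesTagReader, the FilesTag holder, and the <root> wrapper element in the example are all hypothetical, and the sketch assumes that only <files> elements carrying a regExp attribute are legacy met-writer entries.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of OODT: collects legacy <files .../> entries.
class LegacyFilesTagReader {

    // Hypothetical holder for the three attributes shown in the issue.
    static final class FilesTag {
        final String regExp;
        final String metWriter;
        final String args;

        FilesTag(String regExp, String metWriter, String args) {
            this.regExp = regExp;
            this.metWriter = metWriter;
            this.args = args;
        }
    }

    // Parses the XML text and returns every <files> element that has a regExp attribute.
    static List<FilesTag> read(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("files");
            List<FilesTag> tags = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                // Assumption: entries without regExp are not legacy met-writer entries.
                if (!e.hasAttribute("regExp")) {
                    continue;
                }
                tags.add(new FilesTag(
                    e.getAttribute("regExp"),
                    e.getAttribute("metWriter"),
                    e.getAttribute("args")));
            }
            return tags;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse config XML", ex);
        }
    }

    public static void main(String[] args) {
        // <root> is a placeholder wrapper, not the real pgeConfig.xml layout.
        String xml = "<root><files regExp=\"someRegExp\" "
            + "metWriter=\"some.class\" args=\"some args\"/></root>";
        for (FilesTag t : read(xml)) {
            System.out.println(t.regExp + " -> " + t.metWriter + " [" + t.args + "]");
        }
    }
}
```

Each FilesTag would then be the input for step 2, where the crawler and its supporting configuration are built behind the scenes; that mapping depends on OODT internals and is deliberately left out here.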

      Note that one of the key features that becomes important in these situations is having CAS-PGE job directories contain the serialized metadata files for offline inspection in case there are errors. We have currently lost support for that (as evidenced by the removal of the met key MET_FILE_EXT). I am also going to add that back in by subclassing AutoDetectProductCrawler in cas-pge and overriding its crawling step to also serialize the met files it generates.
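The subclass-and-override idea can be sketched in isolation as follows. This does not use the real AutoDetectProductCrawler API: SketchCrawler is a hypothetical stand-in base class, MetSerializingCrawler is a hypothetical subclass, and the key=value file layout is an assumption made for brevity rather than the actual OODT met file format. The metFileExt constructor argument plays the role that MET_FILE_EXT would play.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.LinkedHashMap;
import java.util.Map;

// Self-contained sketch; none of these types are OODT classes.
class MetSerializationSketch {

    // Hypothetical stand-in for the crawler being subclassed.
    abstract static class SketchCrawler {
        protected abstract Map<String, String> extractMetadata(File product);

        // The "crawling step": here it only extracts metadata for one product.
        public Map<String, String> crawl(File product) {
            return extractMetadata(product);
        }
    }

    // Overrides the crawling step so the metadata is also written next to the product.
    abstract static class MetSerializingCrawler extends SketchCrawler {
        private final String metFileExt;

        MetSerializingCrawler(String metFileExt) {
            this.metFileExt = metFileExt;
        }

        @Override
        public Map<String, String> crawl(File product) {
            Map<String, String> met = super.crawl(product);
            StringBuilder body = new StringBuilder();
            for (Map.Entry<String, String> e : met.entrySet()) {
                body.append(e.getKey()).append('=').append(e.getValue()).append('\n');
            }
            // Assumption: the met file is named <product>.<ext> in the same directory.
            File metFile = new File(product.getPath() + "." + metFileExt);
            try {
                Files.write(metFile.toPath(),
                    body.toString().getBytes(StandardCharsets.UTF_8));
            } catch (IOException ex) {
                throw new UncheckedIOException(ex);
            }
            return met;
        }
    }

    // Crawls one empty product in a temporary job directory and returns the met file text.
    static String run() {
        try {
            File jobDir = Files.createTempDirectory("cas-pge-job").toFile();
            File product = new File(jobDir, "output.dat");
            Files.write(product.toPath(), new byte[0]);
            MetSerializingCrawler crawler = new MetSerializingCrawler("met") {
                @Override
                protected Map<String, String> extractMetadata(File p) {
                    Map<String, String> m = new LinkedHashMap<>();
                    m.put("Filename", p.getName());
                    return m;
                }
            };
            crawler.crawl(product);
            File metFile = new File(jobDir, "output.dat.met");
            return new String(Files.readAllBytes(metFile.toPath()),
                StandardCharsets.UTF_8);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.print(run());
    }
}
```

Because the override calls super.crawl first and returns its result unchanged, callers see the same metadata as before; the serialized file is a side effect left in the job directory for offline inspection.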

      That will get us back to full forwards and backwards compat support starting in 0.7 for all versions of CAS-PGE pgeConfig.xml files. Wish me luck!


        1. OODT-667.Mattmann.060914.patch.txt
          11 kB
          Chris A. Mattmann



            chrismattmann Chris A. Mattmann