Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-2099

AllLoader in trunk does not work properly with JSON schemas



    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 0.8.0, 0.8.1
    • None
    • data
    • None
    • piggybank


      The AllLoader in the Piggybank in trunk does not pass JSON-defined schemas to the child loaders it instantiates. If the schema is defined in the LOAD function, when Pig calls getSchema on the AllLoader the AllLoader instantiates the child loader and calls the child's getSchema if it respects the LoadMetadata interface. If the AllLoader finds the JSON schema in a file, it does not instantiate the child loader until prepareToRead is called, and the child does not receive the schema. I have hacked this in by adding to the AllLoader:

      transient String location = null;
      transient Job job = null;

      then in AllLoader::setLocation:

      this.location = location;
      this.job = job;

      then in AllLoader::prepareToRead:

      if (childLoadFunc instanceof LoadMetadata)

      { ((LoadMetadata) childLoadFunc).getSchema(location, job); }

      Although I suspect it is not good practice to store the location/job in the class variables like that, I don't know a better way to fix this.


      Also, getFuncSpecFromContent in the accompanying LoadFuncHelper class with the AllLoader should be modified:

      funcSpec = new FuncSpec("org.apache.pig.piggybank.storage.PigStorageSchema()");

      since it currently instantiates a normal PigStorage object, which does not understand pre-defined schemas. The documentation for the AllLoader should reference PigStorageSchema instead of PigStorage as well.




            Unassigned Unassigned
            chrispesto Chris Pesto
            0 Vote for this issue
            0 Start watching this issue