Parquet / PARQUET-1172

Question on reading Parquet files with the Pig loader

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.9.0, 1.9.1
    • Fix Version/s: None
    • Component/s: parquet-mr, parquet-pig
    • Labels: None

    Description

      When I save a Parquet file with Spark, the schema looks like this:

      optional group attref (LIST) {
        repeated group list {
          optional group element {
            optional binary nid (UTF8);
            optional binary nss (UTF8);
          }
        }
      }
      

      I then read this file with parquet-pig-bundle. Loading works, but accessing "nid" is a problem.

      When I read a different file that was saved by the Pig storer and need the list of nid values, the Pig command is:

       
      B = foreach A generate value.addr.clientIp_bag.clientIp, value.guid , value.attref.nid;
      

      But to read the version saved by Spark, I have to use this:

      B = foreach M generate value.addr.clientIp, value.guid , flatten(value.attref);
      C = foreach B generate clientIp, guid, attref::element.nid; 
      

      but this command flattens the column, so each attref element ends up in its own row.

      My question: does the Pig loader have a problem loading Parquet files saved by Spark?
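
      To make the difference in nesting concrete, here is a minimal sketch in plain Python (not Pig). It models one row in each layout with dictionaries and lists. The sample values are invented, the helper `project` is hypothetical, and the Pig-stored shape is an assumption inferred from the fact that `value.attref.nid` works on that file. It only illustrates why the Spark layout needs one extra `element` step; it says nothing about how parquet-pig is implemented.

```python
# Shape written by Spark (three-level LIST), per the schema above:
# attref -> list -> element -> {nid, nss}. Values are made up.
spark_row = {
    "attref": [
        {"element": {"nid": "n1", "nss": "s1"}},
        {"element": {"nid": "n2", "nss": "s2"}},
    ]
}

# ASSUMED shape of the Pig-stored file: attref is a bag whose tuples
# hold nid and nss directly, with no intermediate "element" level.
pig_row = {
    "attref": [
        {"nid": "n1", "nss": "s1"},
        {"nid": "n2", "nss": "s2"},
    ]
}


def project(bag, *path):
    """Follow a field path inside every tuple of a bag, keeping one result list per row."""
    result = []
    for item in bag:
        for key in path:
            item = item[key]
        result.append(item)
    return result


# One-step projection is enough for the assumed Pig-stored layout.
pig_nids = project(pig_row["attref"], "nid")

# The Spark layout needs the extra "element" step to reach the same values.
spark_nids = project(spark_row["attref"], "element", "nid")

print(pig_nids)
print(spark_nids)
```

      Both projections keep the values grouped per row, which is the result wanted from the Spark-written file without flattening.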

          People

            Assignee: Unassigned
            Reporter: abel_ke