Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-4073

Pig query (parsing) only works on one file rather than on a folder or a list of files

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Critical
    • Resolution: Unresolved
    • 0.13.0
    • None
    • parser
    • None

    Description

      I am executing a parsing query (see page_views_convert_asterix.pig attached + pigmix.jar)
      The query outputs a different result whether it:
      a) executed on a single file
      b) executed on a folder of two files

      The two outputs were too large to attach but you can see the erroneous results already from their first lines:
      Expected results - running on one file:
      ================================
      {"id":0,"user" : "KupnfSFXW]_oFEZFVrA","action" : 1,"timespent" : 9,"ip_addr" : 38,"timestamp" : 35,"estimated_revenue" : -4.238605486656678E7,"page_info":

      {"f":"EHm[ER\\QXr","g":"iFQwBwywtbAi","d":"iXTDmYaE","e":"VIuDFnCoBW","b":"LYbTgBVX","c":"o]OQd^yy","a":"TSHysjRq","n":"HikofJJ","l":"BpqMXQH","m":"lAiRwyyJhLS","j":"ibiQvyxr","k":"EHxGYWJM","h":"_gQYvoeBe","i":"keJOUp_" }

      , "page_links": {{

      { "f":"WmUnCMpnpgg","g":"PGWWxQBKw[WW","d":"CussyvG_rAr]","e":"RZUSjZv_S_Dh","b":"cDYGBRDCKVrl","c":"YcDM`aGyD","a":"xnbXjDmhfJwi","l":"FgjjZaEwL^@","j":"ODtQaLv`GSv","k":"HlWLRRucI","h":"BF]y_rVh_Ea","i":"\\r]ODckjoUL" }

      ,

      { "f":"VZe`Xe[G","g":"JcxGPR`[`","d":"Qeq\\HgN_gaJk","e":"\\DbThsT`Gar","b":"_OSs`txLnnp[","c":"fj_vboF`OrZ","a":"sqgGamCUruny","h":"qNtrLHV","i":"ySrqlQI[C^@" }

      ,

      { "f":"ov^eekm","g":"CSy]jpA_","d":"iCNeW`ylQw","e":"ciQG`uoC\\kn","b":"QfAFspC\\Ian","c":"eYlFKjtLws","a":"lckQFXeZ","n":"pYbnZU_EwNa","o":"HP^krpEOTSVo","l":"EH]]AWy^WO^[","m":"kqAuNFd","j":"ZhmGaekGbA`b","k":"MXG\\mZfwZTe","h":"JC_TxwcVZ","i":"UkRogKk","w":"lKnrumw\\V^@","v":"DsFlPJ`Jv","u":"vijvBCjHLjk","t":"jUH^iZrncHux","s":"`vBRjrCj","r":"[BrvMD\\ln","q":"fCLkUqkKw","p":"XEOqPBNOk_" }

      ,

      { "f":"l[LkbUw[xJyG","g":"J]eB_BIkn]ux","d":"UMybhxHXO","e":"FTMKnVVAI","b":"FQD\\rnHGK","c":"mqNJvV`YtebF","a":"IvyPuZvB","n":"FOTRgMxIi]Uq","o":"^G\\^LPTZWF","l":"qbPTSkl","m":"EMYsayFNe_T","j":"gRRFGdx","k":"\\oYObBaWt","h":"MDWlMgKTDSS","i":"cm\\x_Kkym","t":"aWRD`Nm^@","s":"PoPoZWwBvM","r":"ttqAgoDKAR","q":"slcrtcLC","p":"B\\a_TCnAk" }

      ,

      { "f":"ESBcvWM","g":"bheg_j^qeeb","d":"UsE_^aslG","e":"LGEuvVYUAa","b":"BleiHUjdwE","c":"[HkS]s]YSbJ_","a":"VftFF`ItY","n":"\\uCYLdDSa","o":"d[m^cKk","l":"]heosu\\ATaGQ","m":"oZmnhdApGu[k","j":"UiZofXu^XS","k":"uGsK\\^wFMA","h":"Dh^lPot_jBMZ","i":"qYweMZrYV","t":"qxDWyEit^@","s":"frF_`kfSRQBM","r":"jowmZjF]D]]","q":"`to`RKYbQNn","p":"lYcSAlb" }

      }} }
      {"id":1,"user" : "kiP
      R]ouAqPdgl]Ecqk]Iw","action" : 0,"timespent" : 1,"ip_addr" : 14707,"timestamp" : 43,"estimated_revenue" : -4.359457905991358E8,"page_info": {"f":"D
      [RkWZe...

      Non expected results - as folder was containing two files:
      ================================================
      As you can see: the id=0 record seems to have stopped in the middle of the parsing (ex: non closing bracket before the next record with id=1)
      {"id":0,"user" : "KupnfSFXW]_oFEZFVrA","action" : 1,"timespent" : 9,"ip_addr" : 38,"timestamp" : 35,"estimated_revenue" : -4.238605486656678E7,"page_info":

      {"f":"EHm[ER\\QXr","g" :"iFQwBwywtbAi","d":"iXTDmYaE","e":"VIuDFnCoBW","b":"LYbTgBVX","c":"o]OQd^yy","a":"TSHysjRq","n":"HikofJJ","l":"BpqMXQH","m":"lAiRwyyJhLS","j":"ibiQvyxr","k":"EHxGYWJM","h":"_gQY voeBe","i":"keJOUp_" }

      , "page_links": {{

      { "f":"WmUnCMpnpgg","g":"PGWWxQBKw[WW","d":"CussyvG_rAr]","e":"RZUSjZv_S_Dh","b":"cDYGBRDCKVrl","c":"YcDM`aGyD","a":"xnbXjDmhfJwi","l": "FgjjZaEwL"c":"fj_vboF`OrZ","a":"sqgGamCUruny","h":"qNtrLHV","i":"ySrqlQI[C"lckQFXeZ","n":"pYbnZU_EwNa","o":"HP^krpEOTSVo","l":"EH]]AWy^WO^[","m":"kqAuNFd","j":"ZhmGaekGbA`b","k":"MXG\\mZfwZTe","h":"JC_TxwcVZ","i":"UkRogKk","w":"lKnrumw\\VJv","u":"vijvBCjHLjk","t":"jUH^iZrncHux","s":"`vBRjrCj","r":"[BrvMD\\ln","q":"fCLkUqkKw","p":"XEOqPBNOk_" }

      , { "f":"l[LkbUw[xJyG","g":"J]eB_BIkn]ux","d":"UMybhxHXO","e":"FTMKnVV
      AI","b":"FQD\\rnHGK","c":"mqNJvV`YtebF","a":"IvyPuZvB","n":"FOTRgMxIi]Uq","o":"^G\\^LPTZWF","l":"qbPTSkl","m":"EMYsayFNe_T","j":"gRRFGdx","k":"
      oYObBaWt","h":"MDWlMgKTDSS","i":"
      cm\\x_Kkym","t":"aWRD`Nm","c":"[HkS]s]YSbJ_","a":"VftFF`ItY","n":"\\uCYLdDSa","o":"d[m^cKk","l":"]heosu\\ATaGQ","m":"oZmnhdApGu[k","j":"UiZofXu^XS","k":"uGsK
      ^wFMA","h":"Dh^lPot_jBMZ","i":"qYweMZrYV","
      t":"qxDWyEit
      {"id":1,"user" : "kiP
      R]ouAqPdgl]Ecqk]Iw","action" : 0,"timespent" : 1,"ip_addr" : 14707,"timestamp" : 43,"estimated_revenue" : -4.359457905991358E8,"page_info": {"f":"D
      [RkWZe...

      Attachments

        1. page_views_convert_asterix.pig
          0.9 kB
          Keren Ouaknine
        2. pigmix.jar
          401 kB
          Keren Ouaknine

        Activity

          People

            Unassigned Unassigned
            keren3000 Keren Ouaknine
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: