  1. Crunch
  2. CRUNCH-627

Shard API doesn't work well with parquet target

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.13.0
    • Component/s: MapReduce Patterns
    • Labels:
    • Environment:
      Linux X86
    • Flags:
      Important

      Description

      PCollection<User> outTable = oldTable.union(newTable);
      Shard.shard(outTable, 10).write(new AvroParquetFileTarget(tempOut + path), Target.WriteMode.OVERWRITE);

      I have another job that reads the output written by the code above and uses one of its fields as the key. That job's output looks like this:
      3.0.3.1.2.CH24_RELEASE 2
      3.0.3.1.2.CH24_RELEASEE 1
      3.0.3.1.2.CH24_RELEASEEA 1
      3.0.3.1.2.CH24_RELEASEEAS 1
      3.0.3.1.2.CH24_RELEASEEASE 29
      3.0.3.1.2.CH24_RELEASEEASES 160
      3.0.3.1.2.CH24_RELEASEEASESE 85
      3.0.3.1.2.CH24_RELEASEEASESEE 14
      3.0.3.1.2.CH24_RELEASEEASESEEE 4
      3.0.3.1.2.CH24_RELEASEEASESEEES 1
      An extra suffix has been appended to the keys of the PTable. All of them
      should end in RELEASE, not RELEASEEASE and so on.
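
      One possible reading of these suffixes, which nothing in this report confirms, is that a reused byte buffer is being written out in full instead of only up to its current logical length. The sketch below is plain Java with no Crunch, Avro, or Parquet calls; the class ReusableText and its methods are hypothetical stand-ins invented for illustration. It only shows that overwriting a buffer that last held a longer key from the normal output (3.0.20.7.2.SX.1.2A_RELEASE) with the shorter key 3.0.3.1.2.CH24_RELEASE leaves the trailing bytes EASE behind.

```java
import java.nio.charset.StandardCharsets;

public class StaleBufferSketch {

    // Hypothetical stand-in for a reusable string holder:
    // a backing array that only grows, plus a logical length.
    static final class ReusableText {
        private byte[] bytes = new byte[0];
        private int length = 0;

        void set(String value) {
            byte[] src = value.getBytes(StandardCharsets.UTF_8);
            if (src.length > bytes.length) {
                bytes = new byte[src.length]; // grow when needed, never shrink
            }
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length; // bytes past this index are stale
        }

        // Incorrect read: ignores the logical length and decodes the whole array.
        String readWholeBuffer() {
            return new String(bytes, StandardCharsets.UTF_8);
        }

        // Correct read: decodes only the bytes that belong to the current value.
        String readValidBytes() {
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        ReusableText text = new ReusableText();
        text.set("3.0.20.7.2.SX.1.2A_RELEASE"); // 26 bytes
        text.set("3.0.3.1.2.CH24_RELEASE");     // 22 bytes, same buffer reused

        System.out.println(text.readWholeBuffer()); // 3.0.3.1.2.CH24_RELEASEEASE
        System.out.println(text.readValidBytes());  // 3.0.3.1.2.CH24_RELEASE
    }
}
```

      If something like this is what happens, copying each value before it is handed to the writer, or writing only the valid byte range, would avoid the stale suffix. Whether the Shard and Parquet code path actually behaves this way is an assumption that still needs to be checked.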

      If I remove the Shard call and keep everything else the same, the output looks normal:
      3.0.0.1.2.CH.1.4_RELEASE 1
      3.0.1.1.2.CH22_RELEASE 1622
      3.0.1.1.2.CH23_RELEASE 10607
      3.0.14.1.2.CH.1.3_RELEASE 18080
      3.0.19.1.2.TC21_RELEASE 5
      3.0.2.1.2.CH11_RELEASE 3
      3.0.2.1.2.TC21_RELEASE 4
      3.0.20.1.2.TC21_RELEASE 247
      3.0.20.7.2.SX.1.2A_RELEASE 2
      3.0.20.8.2.SX.1.3A_RELEASE 1

        Attachments

        1. CRUNCH-627.patch
          4 kB
          Josh Wills

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              leenuxwu Tony Wu
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated: