Sqoop
  1. Sqoop
  2. SQOOP-428

AvroOutputFormat doesn't support compression even though documentation claims it does

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4.0-incubating
    • Fix Version/s: 1.4.1-incubating
    • Component/s: docs
    • Labels:

      Description

      The documentation claims that Avro files can be compressed as well:

      By default, data is not compressed. You can compress your data by using the deflate (gzip) algorithm with the -z or --compress argument, or specify any Hadoop compression codec using the --compression-codec argument. This applies to SequenceFile, text, and Avro files.

      This is not true as the AvroOutputFormat currently doesn't support compression.

      1. TEST-com.cloudera.sqoop.TestAvroImport.txt
        690 kB
        Jarek Jarcec Cecho
      2. SQOOP-428.4.patch
        16 kB
        Lars Francke
      3. SQOOP-428.3.patch
        16 kB
        Lars Francke
      4. SQOOP-428.2.patch
        11 kB
        Lars Francke
      5. SQOOP-428.1.patch
        8 kB
        Lars Francke

        Activity

        Hide
        Lars Francke added a comment -

        I think I can provide a patch for this. The current AvroOutputFormat seems to be a copy from the Avro project or a very similar one at least. So I ported a newer version of it to Sqoop and changed all the package names etc. and it works in my tests.

        I'll need to package it up as a patch and probably write a test so I need to figure out how those work first.

        Show
        Lars Francke added a comment - I think I can provide a patch for this. The current AvroOutputFormat seems to be a copy from the Avro project or a very similar one at least. So I ported a newer version of it to Sqoop and changed all the package names etc. and it works in my tests. I'll need to package it up as a patch and probably write a test so I need to figure out how those work first.
        Hide
        Jarek Jarcec Cecho added a comment -

        That would be greatly appreciate sir. If you would have resources to make full patch in week or so, I would definitely like to include it into 1.4.1 release.

        Thank you very much for your time spent on this issue,
        Jarcec

        Show
        Jarek Jarcec Cecho added a comment - That would be greatly appreciate sir. If you would have resources to make full patch in week or so, I would definitely like to include it into 1.4.1 release. Thank you very much for your time spent on this issue, Jarcec
        Hide
        Tom White added a comment -

        Thanks for working on this Lars.

        > The current AvroOutputFormat seems to be a copy from the Avro project or a very similar one at least.

        Avro doesn't support the new MapReduce API yet, which is what Sqoop uses, hence the AvroOutputFormat in Sqoop. When an Avro release does support the new API (AVRO-593) Sqoop should use that.

        For the test, I would just have a variant of TestAvroImport#testAvroImport() that has compression enabled.

        Show
        Tom White added a comment - Thanks for working on this Lars. > The current AvroOutputFormat seems to be a copy from the Avro project or a very similar one at least. Avro doesn't support the new MapReduce API yet, which is what Sqoop uses, hence the AvroOutputFormat in Sqoop. When an Avro release does support the new API ( AVRO-593 ) Sqoop should use that. For the test, I would just have a variant of TestAvroImport#testAvroImport() that has compression enabled.
        Hide
        Lars Francke added a comment -

        Thanks for the comments and hints.

        I'll try to make a patch for this tomorrow. We need this ourselves and it's much easier to have this in the official distribution.

        Show
        Lars Francke added a comment - Thanks for the comments and hints. I'll try to make a patch for this tomorrow. We need this ourselves and it's much easier to have this in the official distribution.
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3600/
        -----------------------------------------------------------

        Review request for Sqoop.

        Summary
        -------

        This basically only ports all the code from Avro's (1.5.4) AvroOutputFormat to the new MR API.

        I've changed the test to extract the common functionality into a helper method because they are the same apart from the two command line arguments.

        I could have deleted AvroJob completely but as I was told last time that binary compatibility needs to be maintained I left it in. It's not needed anymore as all necessary functionality can be gotten from Avro's own version of that file as far as I can tell. So if it's okay to delete that redundant file (two actually, cloudera and apache package) let me know and I'll provide a new patch.

        This addresses bug SQOOP-428.
        https://issues.apache.org/jira/browse/SQOOP-428

        Diffs


        src/java/org/apache/sqoop/mapreduce/AvroJob.java a57aaf1
        src/java/org/apache/sqoop/mapreduce/AvroOutputFormat.java 96befd7
        src/test/com/cloudera/sqoop/TestAvroImport.java 1b8b046

        Diff: https://reviews.apache.org/r/3600/diff

        Testing
        -------

        All tests pass for hadoopversion=20 but TestColumnTypes fails for me on 23. I can't see how that's related though.

        Thanks,

        Lars

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3600/ ----------------------------------------------------------- Review request for Sqoop. Summary ------- This basically only ports all the code from Avro's (1.5.4) AvroOutputFormat to the new MR API. I've changed the test to extract the common functionality into a helper method because they are the same apart from the two command line arguments. I could have deleted AvroJob completely but as I was told last time that binary compatibility needs to be maintained I left it in. It's not needed anymore as all necessary functionality can be gotten from Avro's own version of that file as far as I can tell. So if it's okay to delete that redundant file (two actually, cloudera and apache package) let me know and I'll provide a new patch. This addresses bug SQOOP-428 . https://issues.apache.org/jira/browse/SQOOP-428 Diffs src/java/org/apache/sqoop/mapreduce/AvroJob.java a57aaf1 src/java/org/apache/sqoop/mapreduce/AvroOutputFormat.java 96befd7 src/test/com/cloudera/sqoop/TestAvroImport.java 1b8b046 Diff: https://reviews.apache.org/r/3600/diff Testing ------- All tests pass for hadoopversion=20 but TestColumnTypes fails for me on 23. I can't see how that's related though. Thanks, Lars
        Hide
        Lars Francke added a comment -

        Attached patch that's currently up for Review on https://reviews.apache.org/r/3600/ as well.

        Show
        Lars Francke added a comment - Attached patch that's currently up for Review on https://reviews.apache.org/r/3600/ as well.
        Hide
        Lars Francke added a comment -

        My patch has four Checkstyle violations. I'll upload a fixed one pending the review.

        Show
        Lars Francke added a comment - My patch has four Checkstyle violations. I'll upload a fixed one pending the review.
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3600/#review4566
        -----------------------------------------------------------

        Looks good. Have you run manual tests with it too?

        src/test/com/cloudera/sqoop/TestAvroImport.java
        <https://reviews.apache.org/r/3600/#comment10194>

        You should check that the files that are written are compressed (by looking at DataFileReader's metadata).

        We also need a test for --compress.

        • Tom

        On 2012-01-24 14:07:58, Lars Francke wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/3600/

        -----------------------------------------------------------

        (Updated 2012-01-24 14:07:58)

        Review request for Sqoop.

        Summary

        -------

        This basically only ports all the code from Avro's (1.5.4) AvroOutputFormat to the new MR API.

        I've changed the test to extract the common functionality into a helper method because they are the same apart from the two command line arguments.

        I could have deleted AvroJob completely but as I was told last time that binary compatibility needs to be maintained I left it in. It's not needed anymore as all necessary functionality can be gotten from Avro's own version of that file as far as I can tell. So if it's okay to delete that redundant file (two actually, cloudera and apache package) let me know and I'll provide a new patch.

        This addresses bug SQOOP-428.

        https://issues.apache.org/jira/browse/SQOOP-428

        Diffs

        -----

        src/java/org/apache/sqoop/mapreduce/AvroJob.java a57aaf1

        src/java/org/apache/sqoop/mapreduce/AvroOutputFormat.java 96befd7

        src/test/com/cloudera/sqoop/TestAvroImport.java 1b8b046

        Diff: https://reviews.apache.org/r/3600/diff

        Testing

        -------

        All tests pass for hadoopversion=20 but TestColumnTypes fails for me on 23. I can't see how that's related though.

        Thanks,

        Lars

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3600/#review4566 ----------------------------------------------------------- Looks good. Have you run manual tests with it too? src/test/com/cloudera/sqoop/TestAvroImport.java < https://reviews.apache.org/r/3600/#comment10194 > You should check that the files that are written are compressed (by looking at DataFileReader's metadata). We also need a test for --compress. Tom On 2012-01-24 14:07:58, Lars Francke wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3600/ ----------------------------------------------------------- (Updated 2012-01-24 14:07:58) Review request for Sqoop. Summary ------- This basically only ports all the code from Avro's (1.5.4) AvroOutputFormat to the new MR API. I've changed the test to extract the common functionality into a helper method because they are the same apart from the two command line arguments. I could have deleted AvroJob completely but as I was told last time that binary compatibility needs to be maintained I left it in. It's not needed anymore as all necessary functionality can be gotten from Avro's own version of that file as far as I can tell. So if it's okay to delete that redundant file (two actually, cloudera and apache package) let me know and I'll provide a new patch. This addresses bug SQOOP-428 . https://issues.apache.org/jira/browse/SQOOP-428 Diffs ----- src/java/org/apache/sqoop/mapreduce/AvroJob.java a57aaf1 src/java/org/apache/sqoop/mapreduce/AvroOutputFormat.java 96befd7 src/test/com/cloudera/sqoop/TestAvroImport.java 1b8b046 Diff: https://reviews.apache.org/r/3600/diff Testing ------- All tests pass for hadoopversion=20 but TestColumnTypes fails for me on 23. I can't see how that's related though. Thanks, Lars
        Hide
        jiraposter@reviews.apache.org added a comment -

        On 2012-01-24 21:50:05, Tom White wrote:

        > Looks good. Have you run manual tests with it too?

        Yes, I've used it and manually checked the result (only with Snappy though) and the result is correct (that's when we stumbled upon SQOOP-429)

        On 2012-01-24 21:50:05, Tom White wrote:

        > src/test/com/cloudera/sqoop/TestAvroImport.java, line 89

        > <https://reviews.apache.org/r/3600/diff/1/?file=70555#file70555line89>

        >

        > You should check that the files that are written are compressed (by looking at DataFileReader's metadata).

        >

        > We also need a test for --compress.

        Thanks for the hint. I'll look into it and update the review.

        • Lars

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3600/#review4566
        -----------------------------------------------------------

        On 2012-01-24 14:07:58, Lars Francke wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/3600/

        -----------------------------------------------------------

        (Updated 2012-01-24 14:07:58)

        Review request for Sqoop.

        Summary

        -------

        This basically only ports all the code from Avro's (1.5.4) AvroOutputFormat to the new MR API.

        I've changed the test to extract the common functionality into a helper method because they are the same apart from the two command line arguments.

        I could have deleted AvroJob completely but as I was told last time that binary compatibility needs to be maintained I left it in. It's not needed anymore as all necessary functionality can be gotten from Avro's own version of that file as far as I can tell. So if it's okay to delete that redundant file (two actually, cloudera and apache package) let me know and I'll provide a new patch.

        This addresses bug SQOOP-428.

        https://issues.apache.org/jira/browse/SQOOP-428

        Diffs

        -----

        src/java/org/apache/sqoop/mapreduce/AvroJob.java a57aaf1

        src/java/org/apache/sqoop/mapreduce/AvroOutputFormat.java 96befd7

        src/test/com/cloudera/sqoop/TestAvroImport.java 1b8b046

        Diff: https://reviews.apache.org/r/3600/diff

        Testing

        -------

        All tests pass for hadoopversion=20 but TestColumnTypes fails for me on 23. I can't see how that's related though.

        Thanks,

        Lars

        Show
        jiraposter@reviews.apache.org added a comment - On 2012-01-24 21:50:05, Tom White wrote: > Looks good. Have you run manual tests with it too? Yes, I've used it and manually checked the result (only with Snappy though) and the result is correct (that's when we stumbled upon SQOOP-429 ) On 2012-01-24 21:50:05, Tom White wrote: > src/test/com/cloudera/sqoop/TestAvroImport.java, line 89 > < https://reviews.apache.org/r/3600/diff/1/?file=70555#file70555line89 > > > You should check that the files that are written are compressed (by looking at DataFileReader's metadata). > > We also need a test for --compress. Thanks for the hint. I'll look into it and update the review. Lars ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3600/#review4566 ----------------------------------------------------------- On 2012-01-24 14:07:58, Lars Francke wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3600/ ----------------------------------------------------------- (Updated 2012-01-24 14:07:58) Review request for Sqoop. Summary ------- This basically only ports all the code from Avro's (1.5.4) AvroOutputFormat to the new MR API. I've changed the test to extract the common functionality into a helper method because they are the same apart from the two command line arguments. I could have deleted AvroJob completely but as I was told last time that binary compatibility needs to be maintained I left it in. It's not needed anymore as all necessary functionality can be gotten from Avro's own version of that file as far as I can tell. So if it's okay to delete that redundant file (two actually, cloudera and apache package) let me know and I'll provide a new patch. This addresses bug SQOOP-428 . https://issues.apache.org/jira/browse/SQOOP-428 Diffs ----- src/java/org/apache/sqoop/mapreduce/AvroJob.java a57aaf1 src/java/org/apache/sqoop/mapreduce/AvroOutputFormat.java 96befd7 src/test/com/cloudera/sqoop/TestAvroImport.java 1b8b046 Diff: https://reviews.apache.org/r/3600/diff Testing ------- All tests pass for hadoopversion=20 but TestColumnTypes fails for me on 23. I can't see how that's related though. Thanks, Lars
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3600/
        -----------------------------------------------------------

        (Updated 2012-01-26 15:43:36.445052)

        Review request for Sqoop.

        Changes
        -------

        Updated after Tom's review.

        The first one didn't actually work.

        Summary
        -------

        This basically only ports all the code from Avro's (1.5.4) AvroOutputFormat to the new MR API.

        I've changed the test to extract the common functionality into a helper method because they are the same apart from the two command line arguments.

        I could have deleted AvroJob completely but as I was told last time that binary compatibility needs to be maintained I left it in. It's not needed anymore as all necessary functionality can be gotten from Avro's own version of that file as far as I can tell. So if it's okay to delete that redundant file (two actually, cloudera and apache package) let me know and I'll provide a new patch.

        This addresses bug SQOOP-428.
        https://issues.apache.org/jira/browse/SQOOP-428

        Diffs (updated)


        src/docs/user/import.txt deddb1a
        src/java/org/apache/sqoop/mapreduce/AvroJob.java a57aaf1
        src/java/org/apache/sqoop/mapreduce/AvroOutputFormat.java 96befd7
        src/java/org/apache/sqoop/mapreduce/ImportJobBase.java ed6954a
        src/test/com/cloudera/sqoop/TestAvroImport.java 1b8b046

        Diff: https://reviews.apache.org/r/3600/diff

        Testing
        -------

        All tests pass for hadoopversion=20 but TestColumnTypes fails for me on 23. I can't see how that's related though.

        Thanks,

        Lars

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3600/ ----------------------------------------------------------- (Updated 2012-01-26 15:43:36.445052) Review request for Sqoop. Changes ------- Updated after Tom's review. The first one didn't actually work. Summary ------- This basically only ports all the code from Avro's (1.5.4) AvroOutputFormat to the new MR API. I've changed the test to extract the common functionality into a helper method because they are the same apart from the two command line arguments. I could have deleted AvroJob completely but as I was told last time that binary compatibility needs to be maintained I left it in. It's not needed anymore as all necessary functionality can be gotten from Avro's own version of that file as far as I can tell. So if it's okay to delete that redundant file (two actually, cloudera and apache package) let me know and I'll provide a new patch. This addresses bug SQOOP-428 . https://issues.apache.org/jira/browse/SQOOP-428 Diffs (updated) src/docs/user/import.txt deddb1a src/java/org/apache/sqoop/mapreduce/AvroJob.java a57aaf1 src/java/org/apache/sqoop/mapreduce/AvroOutputFormat.java 96befd7 src/java/org/apache/sqoop/mapreduce/ImportJobBase.java ed6954a src/test/com/cloudera/sqoop/TestAvroImport.java 1b8b046 Diff: https://reviews.apache.org/r/3600/diff Testing ------- All tests pass for hadoopversion=20 but TestColumnTypes fails for me on 23. I can't see how that's related though. Thanks, Lars
        Hide
        Lars Francke added a comment -

        Updated Patch as on ReviewBoard

        Show
        Lars Francke added a comment - Updated Patch as on ReviewBoard
        Hide
        Lars Francke added a comment -

        Turns out my patch actually contains a big flaw which Tom pointed out.

        Avro expects the codec to be specified as either null, deflate or snappy and it doesn't accept full Codec class names. I've changed the code and the documentation to point this out and also amended the test as well as fixing those Checkstyle issues.

        Updated patch as per ReviewBoard

        Show
        Lars Francke added a comment - Turns out my patch actually contains a big flaw which Tom pointed out. Avro expects the codec to be specified as either null , deflate or snappy and it doesn't accept full Codec class names. I've changed the code and the documentation to point this out and also amended the test as well as fixing those Checkstyle issues. Updated patch as per ReviewBoard
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3600/#review4666
        -----------------------------------------------------------

        +1 Looks good to me.

        src/docs/user/import.txt
        <https://reviews.apache.org/r/3600/#comment10383>

        Should we also support the full classname as an alias so we can avoid this note?

        See also https://issues.apache.org/jira/browse/HADOOP-7323 which will support short names in Hadoop too (from 23 I believe).

        • Tom

        On 2012-01-26 15:43:36, Lars Francke wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/3600/

        -----------------------------------------------------------

        (Updated 2012-01-26 15:43:36)

        Review request for Sqoop.

        Summary

        -------

        This basically only ports all the code from Avro's (1.5.4) AvroOutputFormat to the new MR API.

        I've changed the test to extract the common functionality into a helper method because they are the same apart from the two command line arguments.

        I could have deleted AvroJob completely but as I was told last time that binary compatibility needs to be maintained I left it in. It's not needed anymore as all necessary functionality can be gotten from Avro's own version of that file as far as I can tell. So if it's okay to delete that redundant file (two actually, cloudera and apache package) let me know and I'll provide a new patch.

        This addresses bug SQOOP-428.

        https://issues.apache.org/jira/browse/SQOOP-428

        Diffs

        -----

        src/docs/user/import.txt deddb1a

        src/java/org/apache/sqoop/mapreduce/AvroJob.java a57aaf1

        src/java/org/apache/sqoop/mapreduce/AvroOutputFormat.java 96befd7

        src/java/org/apache/sqoop/mapreduce/ImportJobBase.java ed6954a

        src/test/com/cloudera/sqoop/TestAvroImport.java 1b8b046

        Diff: https://reviews.apache.org/r/3600/diff

        Testing

        -------

        All tests pass for hadoopversion=20 but TestColumnTypes fails for me on 23. I can't see how that's related though.

        Thanks,

        Lars

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3600/#review4666 ----------------------------------------------------------- +1 Looks good to me. src/docs/user/import.txt < https://reviews.apache.org/r/3600/#comment10383 > Should we also support the full classname as an alias so we can avoid this note? See also https://issues.apache.org/jira/browse/HADOOP-7323 which will support short names in Hadoop too (from 23 I believe). Tom On 2012-01-26 15:43:36, Lars Francke wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3600/ ----------------------------------------------------------- (Updated 2012-01-26 15:43:36) Review request for Sqoop. Summary ------- This basically only ports all the code from Avro's (1.5.4) AvroOutputFormat to the new MR API. I've changed the test to extract the common functionality into a helper method because they are the same apart from the two command line arguments. I could have deleted AvroJob completely but as I was told last time that binary compatibility needs to be maintained I left it in. It's not needed anymore as all necessary functionality can be gotten from Avro's own version of that file as far as I can tell. So if it's okay to delete that redundant file (two actually, cloudera and apache package) let me know and I'll provide a new patch. This addresses bug SQOOP-428 . https://issues.apache.org/jira/browse/SQOOP-428 Diffs ----- src/docs/user/import.txt deddb1a src/java/org/apache/sqoop/mapreduce/AvroJob.java a57aaf1 src/java/org/apache/sqoop/mapreduce/AvroOutputFormat.java 96befd7 src/java/org/apache/sqoop/mapreduce/ImportJobBase.java ed6954a src/test/com/cloudera/sqoop/TestAvroImport.java 1b8b046 Diff: https://reviews.apache.org/r/3600/diff Testing ------- All tests pass for hadoopversion=20 but TestColumnTypes fails for me on 23. I can't see how that's related though. Thanks, Lars
        Hide
        jiraposter@reviews.apache.org added a comment -

        On 2012-01-28 00:13:35, Tom White wrote:

        > src/docs/user/import.txt, line 390

        > <https://reviews.apache.org/r/3600/diff/2/?file=70839#file70839line390>

        >

        > Should we also support the full classname as an alias so we can avoid this note?

        >

        > See also https://issues.apache.org/jira/browse/HADOOP-7323 which will support short names in Hadoop too (from 23 I believe).

        Good idea, shouldn't be that hard I hope. I'll look into implementing someting like "getShortName" in Sqoop's CodecMap.

        I'll try to submit a new patch by Monday.

        • Lars

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3600/#review4666
        -----------------------------------------------------------

        On 2012-01-26 15:43:36, Lars Francke wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/3600/

        -----------------------------------------------------------

        (Updated 2012-01-26 15:43:36)

        Review request for Sqoop.

        Summary

        -------

        This basically only ports all the code from Avro's (1.5.4) AvroOutputFormat to the new MR API.

        I've changed the test to extract the common functionality into a helper method because they are the same apart from the two command line arguments.

        I could have deleted AvroJob completely but as I was told last time that binary compatibility needs to be maintained I left it in. It's not needed anymore as all necessary functionality can be gotten from Avro's own version of that file as far as I can tell. So if it's okay to delete that redundant file (two actually, cloudera and apache package) let me know and I'll provide a new patch.

        This addresses bug SQOOP-428.

        https://issues.apache.org/jira/browse/SQOOP-428

        Diffs

        -----

        src/docs/user/import.txt deddb1a

        src/java/org/apache/sqoop/mapreduce/AvroJob.java a57aaf1

        src/java/org/apache/sqoop/mapreduce/AvroOutputFormat.java 96befd7

        src/java/org/apache/sqoop/mapreduce/ImportJobBase.java ed6954a

        src/test/com/cloudera/sqoop/TestAvroImport.java 1b8b046

        Diff: https://reviews.apache.org/r/3600/diff

        Testing

        -------

        All tests pass for hadoopversion=20 but TestColumnTypes fails for me on 23. I can't see how that's related though.

        Thanks,

        Lars

        Show
        jiraposter@reviews.apache.org added a comment - On 2012-01-28 00:13:35, Tom White wrote: > src/docs/user/import.txt, line 390 > < https://reviews.apache.org/r/3600/diff/2/?file=70839#file70839line390 > > > Should we also support the full classname as an alias so we can avoid this note? > > See also https://issues.apache.org/jira/browse/HADOOP-7323 which will support short names in Hadoop too (from 23 I believe). Good idea, shouldn't be that hard I hope. I'll look into implementing someting like "getShortName" in Sqoop's CodecMap. I'll try to submit a new patch by Monday. Lars ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3600/#review4666 ----------------------------------------------------------- On 2012-01-26 15:43:36, Lars Francke wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3600/ ----------------------------------------------------------- (Updated 2012-01-26 15:43:36) Review request for Sqoop. Summary ------- This basically only ports all the code from Avro's (1.5.4) AvroOutputFormat to the new MR API. I've changed the test to extract the common functionality into a helper method because they are the same apart from the two command line arguments. I could have deleted AvroJob completely but as I was told last time that binary compatibility needs to be maintained I left it in. It's not needed anymore as all necessary functionality can be gotten from Avro's own version of that file as far as I can tell. So if it's okay to delete that redundant file (two actually, cloudera and apache package) let me know and I'll provide a new patch. This addresses bug SQOOP-428 . https://issues.apache.org/jira/browse/SQOOP-428 Diffs ----- src/docs/user/import.txt deddb1a src/java/org/apache/sqoop/mapreduce/AvroJob.java a57aaf1 src/java/org/apache/sqoop/mapreduce/AvroOutputFormat.java 96befd7 src/java/org/apache/sqoop/mapreduce/ImportJobBase.java ed6954a src/test/com/cloudera/sqoop/TestAvroImport.java 1b8b046 Diff: https://reviews.apache.org/r/3600/diff Testing ------- All tests pass for hadoopversion=20 but TestColumnTypes fails for me on 23. I can't see how that's related though. Thanks, Lars
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3600/
        -----------------------------------------------------------

        (Updated 2012-01-31 09:50:52.475912)

        Review request for Sqoop.

        Changes
        -------

        Adds the option of providing a Codec class name as well.

        Summary
        -------

        This basically only ports all the code from Avro's (1.5.4) AvroOutputFormat to the new MR API.

        I've changed the test to extract the common functionality into a helper method because they are the same apart from the two command line arguments.

        I could have deleted AvroJob completely but as I was told last time that binary compatibility needs to be maintained I left it in. It's not needed anymore as all necessary functionality can be gotten from Avro's own version of that file as far as I can tell. So if it's okay to delete that redundant file (two actually, cloudera and apache package) let me know and I'll provide a new patch.

        This addresses bug SQOOP-428.
        https://issues.apache.org/jira/browse/SQOOP-428

        Diffs (updated)


        src/java/com/cloudera/sqoop/io/CodecMap.java ffe949b
        src/java/org/apache/sqoop/io/CodecMap.java 5b67206
        src/java/org/apache/sqoop/mapreduce/AvroJob.java a57aaf1
        src/java/org/apache/sqoop/mapreduce/AvroOutputFormat.java 96befd7
        src/java/org/apache/sqoop/mapreduce/ImportJobBase.java ed6954a
        src/test/com/cloudera/sqoop/TestAvroImport.java 1b8b046
        src/test/com/cloudera/sqoop/io/TestCodecMap.java f2f4039

        Diff: https://reviews.apache.org/r/3600/diff

        Testing
        -------

        All tests pass for hadoopversion=20 but TestColumnTypes fails for me on 23. I can't see how that's related though.

        Thanks,

        Lars

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3600/ ----------------------------------------------------------- (Updated 2012-01-31 09:50:52.475912) Review request for Sqoop. Changes ------- Adds the option of providing a Codec class name as well. Summary ------- This basically only ports all the code from Avro's (1.5.4) AvroOutputFormat to the new MR API. I've changed the test to extract the common functionality into a helper method because they are the same apart from the two command line arguments. I could have deleted AvroJob completely but as I was told last time that binary compatibility needs to be maintained I left it in. It's not needed anymore as all necessary functionality can be gotten from Avro's own version of that file as far as I can tell. So if it's okay to delete that redundant file (two actually, cloudera and apache package) let me know and I'll provide a new patch. This addresses bug SQOOP-428 . https://issues.apache.org/jira/browse/SQOOP-428 Diffs (updated) src/java/com/cloudera/sqoop/io/CodecMap.java ffe949b src/java/org/apache/sqoop/io/CodecMap.java 5b67206 src/java/org/apache/sqoop/mapreduce/AvroJob.java a57aaf1 src/java/org/apache/sqoop/mapreduce/AvroOutputFormat.java 96befd7 src/java/org/apache/sqoop/mapreduce/ImportJobBase.java ed6954a src/test/com/cloudera/sqoop/TestAvroImport.java 1b8b046 src/test/com/cloudera/sqoop/io/TestCodecMap.java f2f4039 Diff: https://reviews.apache.org/r/3600/diff Testing ------- All tests pass for hadoopversion=20 but TestColumnTypes fails for me on 23. I can't see how that's related though. Thanks, Lars
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3600/#review4701
        -----------------------------------------------------------

        Ship it!

        Looks good to me. Thanks for updating the patch!

        • Tom

        On 2012-01-31 09:50:52, Lars Francke wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/3600/

        -----------------------------------------------------------

        (Updated 2012-01-31 09:50:52)

        Review request for Sqoop.

        Summary

        -------

        This basically only ports all the code from Avro's (1.5.4) AvroOutputFormat to the new MR API.

        I've changed the test to extract the common functionality into a helper method because they are the same apart from the two command line arguments.

        I could have deleted AvroJob completely but as I was told last time that binary compatibility needs to be maintained I left it in. It's not needed anymore as all necessary functionality can be gotten from Avro's own version of that file as far as I can tell. So if it's okay to delete that redundant file (two actually, cloudera and apache package) let me know and I'll provide a new patch.

        This addresses bug SQOOP-428.

        https://issues.apache.org/jira/browse/SQOOP-428

        Diffs

        -----

        src/java/com/cloudera/sqoop/io/CodecMap.java ffe949b

        src/java/org/apache/sqoop/io/CodecMap.java 5b67206

        src/java/org/apache/sqoop/mapreduce/AvroJob.java a57aaf1

        src/java/org/apache/sqoop/mapreduce/AvroOutputFormat.java 96befd7

        src/java/org/apache/sqoop/mapreduce/ImportJobBase.java ed6954a

        src/test/com/cloudera/sqoop/TestAvroImport.java 1b8b046

        src/test/com/cloudera/sqoop/io/TestCodecMap.java f2f4039

        Diff: https://reviews.apache.org/r/3600/diff

        Testing

        -------

        All tests pass for hadoopversion=20 but TestColumnTypes fails for me on 23. I can't see how that's related though.

        Thanks,

        Lars

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3600/#review4701 ----------------------------------------------------------- Ship it! Looks good to me. Thanks for updating the patch! Tom On 2012-01-31 09:50:52, Lars Francke wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3600/ ----------------------------------------------------------- (Updated 2012-01-31 09:50:52) Review request for Sqoop. Summary ------- This basically only ports all the code from Avro's (1.5.4) AvroOutputFormat to the new MR API. I've changed the test to extract the common functionality into a helper method because they are the same apart from the two command line arguments. I could have deleted AvroJob completely but as I was told last time that binary compatibility needs to be maintained I left it in. It's not needed anymore as all necessary functionality can be gotten from Avro's own version of that file as far as I can tell. So if it's okay to delete that redundant file (two actually, cloudera and apache package) let me know and I'll provide a new patch. This addresses bug SQOOP-428 . https://issues.apache.org/jira/browse/SQOOP-428 Diffs ----- src/java/com/cloudera/sqoop/io/CodecMap.java ffe949b src/java/org/apache/sqoop/io/CodecMap.java 5b67206 src/java/org/apache/sqoop/mapreduce/AvroJob.java a57aaf1 src/java/org/apache/sqoop/mapreduce/AvroOutputFormat.java 96befd7 src/java/org/apache/sqoop/mapreduce/ImportJobBase.java ed6954a src/test/com/cloudera/sqoop/TestAvroImport.java 1b8b046 src/test/com/cloudera/sqoop/io/TestCodecMap.java f2f4039 Diff: https://reviews.apache.org/r/3600/diff Testing ------- All tests pass for hadoopversion=20 but TestColumnTypes fails for me on 23. I can't see how that's related though. Thanks, Lars
        Hide
        Jarek Jarcec Cecho added a comment -

        Hi Lars,
        I wanted to commit your patch into trunk and 1.4.1 branch, however it seems that tests are failing on version 1.0.0 (ant -Dhadoopversion=100 test. Would you mind checking what's the problem? I did not investigated it further yet.

        Jarcec

        Show
        Jarek Jarcec Cecho added a comment - Hi Lars, I wanted to commit your patch into trunk and 1.4.1 branch, however it seems that tests are failing on version 1.0.0 (ant -Dhadoopversion=100 test. Would you mind checking what's the problem? I did not investigated it further yet. Jarcec
        Hide
        Lars Francke added a comment -

        Hey,

        thanks for the hint. I only tried 20 and 23, sorry. The problem is that 1.0.0 doesn't support Snappy compression out of the box so the test that uses it fails. Because I can't figure out an easy way right now to check for the availability I'll just remove the test. I guess that doesn't need a new review? I'll attach the patch shortly, just running tests against all three versions now. 23 fails because of MAPREDUCE-3736 but it succeeded earlier.

        Show
        Lars Francke added a comment - Hey, thanks for the hint. I only tried 20 and 23, sorry. The problem is that 1.0.0 doesn't support Snappy compression out of the box so the test that uses it fails. Because I can't figure out an easy way right now to check for the availability I'll just remove the test. I guess that doesn't need a new review? I'll attach the patch shortly, just running tests against all three versions now. 23 fails because of MAPREDUCE-3736 but it succeeded earlier.
        Hide
        Lars Francke added a comment -

        Patch that fixes test for Hadoop 1.0.0 by removing reference to Snappy.

        Show
        Lars Francke added a comment - Patch that fixes test for Hadoop 1.0.0 by removing reference to Snappy.
        Hide
        Jarek Jarcec Cecho added a comment -

        Thank you Lars for your quick response. I appreciate your time. I will firstly try to solve failing tests on 0.23 (definitely not your fault).

        I'll try to commit your patch after that. However I would prefer to upload your new patch version to review board, just to be consistent in the procedures. Would you mind?

        Jarcec

        Show
        Jarek Jarcec Cecho added a comment - Thank you Lars for your quick response. I appreciate your time. I will firstly try to solve failing tests on 0.23 (definitely not your fault). I'll try to commit your patch after that. However I would prefer to upload your new patch version to review board, just to be consistent in the procedures. Would you mind? Jarcec
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3600/
        -----------------------------------------------------------

        (Updated 2012-02-02 14:31:05.367176)

        Review request for Sqoop.

        Changes
        -------

        Updated to remove any mention of Snappy (which isn't available on Hadoop 1.0.0)

        Summary
        -------

        This basically only ports all the code from Avro's (1.5.4) AvroOutputFormat to the new MR API.

        I've changed the test to extract the common functionality into a helper method because they are the same apart from the two command line arguments.

        I could have deleted AvroJob completely but as I was told last time that binary compatibility needs to be maintained I left it in. It's not needed anymore as all necessary functionality can be gotten from Avro's own version of that file as far as I can tell. So if it's okay to delete that redundant file (two actually, cloudera and apache package) let me know and I'll provide a new patch.

        This addresses bug SQOOP-428.
        https://issues.apache.org/jira/browse/SQOOP-428

        Diffs (updated)


        src/java/com/cloudera/sqoop/io/CodecMap.java ffe949b
        src/java/org/apache/sqoop/io/CodecMap.java 5b67206
        src/java/org/apache/sqoop/mapreduce/AvroJob.java a57aaf1
        src/java/org/apache/sqoop/mapreduce/AvroOutputFormat.java 96befd7
        src/java/org/apache/sqoop/mapreduce/ImportJobBase.java ed6954a
        src/test/com/cloudera/sqoop/TestAvroImport.java 06834fc
        src/test/com/cloudera/sqoop/io/TestCodecMap.java f2f4039

        Diff: https://reviews.apache.org/r/3600/diff

        Testing
        -------

        All tests pass for hadoopversion=20 but TestColumnTypes fails for me on 23. I can't see how that's related though.

        Thanks,

        Lars

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3600/ ----------------------------------------------------------- (Updated 2012-02-02 14:31:05.367176) Review request for Sqoop. Changes ------- Updated to remove any mention of Snappy (which isn't available on Hadoop 1.0.0) Summary ------- This basically only ports all the code from Avro's (1.5.4) AvroOutputFormat to the new MR API. I've changed the test to extract the common functionality into a helper method because they are the same apart from the two command line arguments. I could have deleted AvroJob completely but as I was told last time that binary compatibility needs to be maintained I left it in. It's not needed anymore as all necessary functionality can be gotten from Avro's own version of that file as far as I can tell. So if it's okay to delete that redundant file (two actually, cloudera and apache package) let me know and I'll provide a new patch. This addresses bug SQOOP-428 . https://issues.apache.org/jira/browse/SQOOP-428 Diffs (updated) src/java/com/cloudera/sqoop/io/CodecMap.java ffe949b src/java/org/apache/sqoop/io/CodecMap.java 5b67206 src/java/org/apache/sqoop/mapreduce/AvroJob.java a57aaf1 src/java/org/apache/sqoop/mapreduce/AvroOutputFormat.java 96befd7 src/java/org/apache/sqoop/mapreduce/ImportJobBase.java ed6954a src/test/com/cloudera/sqoop/TestAvroImport.java 06834fc src/test/com/cloudera/sqoop/io/TestCodecMap.java f2f4039 Diff: https://reviews.apache.org/r/3600/diff Testing ------- All tests pass for hadoopversion=20 but TestColumnTypes fails for me on 23. I can't see how that's related though. Thanks, Lars
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3600/#review4821
        -----------------------------------------------------------

        Ship it!

        Thank you for your time spent on this issue sir.

        • Jarek

        On 2012-02-02 14:31:05, Lars Francke wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/3600/

        -----------------------------------------------------------

        (Updated 2012-02-02 14:31:05)

        Review request for Sqoop.

        Summary

        -------

        This basically only ports all the code from Avro's (1.5.4) AvroOutputFormat to the new MR API.

        I've changed the test to extract the common functionality into a helper method because they are the same apart from the two command line arguments.

        I could have deleted AvroJob completely but as I was told last time that binary compatibility needs to be maintained I left it in. It's not needed anymore as all necessary functionality can be gotten from Avro's own version of that file as far as I can tell. So if it's okay to delete that redundant file (two actually, cloudera and apache package) let me know and I'll provide a new patch.

        This addresses bug SQOOP-428.

        https://issues.apache.org/jira/browse/SQOOP-428

        Diffs

        -----

        src/java/com/cloudera/sqoop/io/CodecMap.java ffe949b

        src/java/org/apache/sqoop/io/CodecMap.java 5b67206

        src/java/org/apache/sqoop/mapreduce/AvroJob.java a57aaf1

        src/java/org/apache/sqoop/mapreduce/AvroOutputFormat.java 96befd7

        src/java/org/apache/sqoop/mapreduce/ImportJobBase.java ed6954a

        src/test/com/cloudera/sqoop/TestAvroImport.java 06834fc

        src/test/com/cloudera/sqoop/io/TestCodecMap.java f2f4039

        Diff: https://reviews.apache.org/r/3600/diff

        Testing

        -------

        All tests pass for hadoopversion=20 but TestColumnTypes fails for me on 23. I can't see how that's related though.

        Thanks,

        Lars

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3600/#review4821 ----------------------------------------------------------- Ship it! Thank you for your time spent on this issue sir. Jarek On 2012-02-02 14:31:05, Lars Francke wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3600/ ----------------------------------------------------------- (Updated 2012-02-02 14:31:05) Review request for Sqoop. Summary ------- This basically only ports all the code from Avro's (1.5.4) AvroOutputFormat to the new MR API. I've changed the test to extract the common functionality into a helper method because they are the same apart from the two command line arguments. I could have deleted AvroJob completely but as I was told last time that binary compatibility needs to be maintained I left it in. It's not needed anymore as all necessary functionality can be gotten from Avro's own version of that file as far as I can tell. So if it's okay to delete that redundant file (two actually, cloudera and apache package) let me know and I'll provide a new patch. This addresses bug SQOOP-428 . https://issues.apache.org/jira/browse/SQOOP-428 Diffs ----- src/java/com/cloudera/sqoop/io/CodecMap.java ffe949b src/java/org/apache/sqoop/io/CodecMap.java 5b67206 src/java/org/apache/sqoop/mapreduce/AvroJob.java a57aaf1 src/java/org/apache/sqoop/mapreduce/AvroOutputFormat.java 96befd7 src/java/org/apache/sqoop/mapreduce/ImportJobBase.java ed6954a src/test/com/cloudera/sqoop/TestAvroImport.java 06834fc src/test/com/cloudera/sqoop/io/TestCodecMap.java f2f4039 Diff: https://reviews.apache.org/r/3600/diff Testing ------- All tests pass for hadoopversion=20 but TestColumnTypes fails for me on 23. I can't see how that's related though. Thanks, Lars
        Hide
        Jarek Jarcec Cecho added a comment -

        Patch committed to trunk and branch 1.4.1. Thank you very much for your time Lars.

        Jarcec

        Show
        Jarek Jarcec Cecho added a comment - Patch committed to trunk and branch 1.4.1. Thank you very much for your time Lars. Jarcec
        Hide
        Hudson added a comment -

        Integrated in Sqoop-ant-jdk-1.6 #84 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6/84/)
        SQOOP-428. AvroOutputFormat doesn't support compression even though documentation claims it does

        (Lars Francke via Jarek Jarcec Cecho)

        jarcec : http://svn.apache.org/viewvc/?view=rev&rev=1240613
        Files :

        • /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/io/CodecMap.java
        • /incubator/sqoop/trunk/src/java/org/apache/sqoop/io/CodecMap.java
        • /incubator/sqoop/trunk/src/java/org/apache/sqoop/mapreduce/AvroJob.java
        • /incubator/sqoop/trunk/src/java/org/apache/sqoop/mapreduce/AvroOutputFormat.java
        • /incubator/sqoop/trunk/src/java/org/apache/sqoop/mapreduce/ImportJobBase.java
        • /incubator/sqoop/trunk/src/test/com/cloudera/sqoop/TestAvroImport.java
        • /incubator/sqoop/trunk/src/test/com/cloudera/sqoop/io/TestCodecMap.java
        Show
        Hudson added a comment - Integrated in Sqoop-ant-jdk-1.6 #84 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6/84/ ) SQOOP-428 . AvroOutputFormat doesn't support compression even though documentation claims it does (Lars Francke via Jarek Jarcec Cecho) jarcec : http://svn.apache.org/viewvc/?view=rev&rev=1240613 Files : /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/io/CodecMap.java /incubator/sqoop/trunk/src/java/org/apache/sqoop/io/CodecMap.java /incubator/sqoop/trunk/src/java/org/apache/sqoop/mapreduce/AvroJob.java /incubator/sqoop/trunk/src/java/org/apache/sqoop/mapreduce/AvroOutputFormat.java /incubator/sqoop/trunk/src/java/org/apache/sqoop/mapreduce/ImportJobBase.java /incubator/sqoop/trunk/src/test/com/cloudera/sqoop/TestAvroImport.java /incubator/sqoop/trunk/src/test/com/cloudera/sqoop/io/TestCodecMap.java

          People

          • Assignee:
            Lars Francke
            Reporter:
            Lars Francke
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development