Thejas M Nair, thank you for your comment. From my perspective on this issue, I both agree and disagree with you.
- I use hive_test to test my UDFs: https://github.com/edwardcapriolo/hive_test
- At one point we added a plugin developer kit to Hive, which allowed annotation-based testing of UDFs. It was later removed; there were reports that it was flaky. I was not paying much attention at the time, but I probably would have advocated keeping it.
Now, I do agree that we can get better coverage of some things outside end-to-end tests, but, believe it or not, functions are not one of them.
Why do I say this? A few reasons:
- Most functions are not functional in the pure, side-effect-free sense.
- They actually have state: configuration read at initialization, and reusable objects shared between calls to evaluate.
- UDAFs have an entire aggregation buffer system.
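To make the statefulness point concrete, here is a minimal sketch in plain Java. The class ReusingUpperUdf and the config key udf.suffix are hypothetical; this is not Hive's UDF or GenericUDF API. It only mimics the pattern of reading configuration once at initialization and reusing one mutable output object across evaluate() calls. A test that checks each call in isolation passes, yet a caller that holds on to an earlier result sees it overwritten.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a Hive-style UDF; NOT the real Hive API.
final class ReusingUpperUdf {
    // One mutable output object reused across calls (like a reused Text object).
    private final StringBuilder out = new StringBuilder();
    // Read once from configuration at initialization time.
    private String suffix = "";

    public void initialize(Map<String, String> conf) {
        suffix = conf.getOrDefault("udf.suffix", ""); // hypothetical config key
    }

    // Returns the SAME StringBuilder on every non-null call; callers must copy it.
    public StringBuilder evaluate(String input) {
        if (input == null) {
            return null;
        }
        out.setLength(0); // clear state left over from the previous row
        out.append(input.toUpperCase(Locale.ROOT)).append(suffix);
        return out;
    }

    public static void main(String[] args) {
        ReusingUpperUdf udf = new ReusingUpperUdf();
        udf.initialize(Map.of("udf.suffix", "!"));
        StringBuilder first = udf.evaluate("a");
        String firstCopy = first.toString(); // defensive copy taken before the next call
        udf.evaluate("b");
        System.out.println(firstCopy); // A!
        System.out.println(first);     // B! (the earlier result was overwritten)
    }
}
```

This is exactly the kind of behavior (state carried between rows, nulls, cleanup) that an end-to-end query exercises for free and a one-call-at-a-time unit test can miss.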
To your specific points:
1) Welcome to my life; I have been complaining about our test infrastructure for years. Honestly, now that we have a build system we can test UDFs fairly fast, and there are not that many of them anyway.
2) That can be true. Again, I use hive_test, and I am not against having both unit and end-to-end tests.
3) I agree with this to an extent, but even in a real unit test one still has to write Assert.assertEquals(something, somethingElse), so you are still eyeballing something. From a review standpoint, it's easier to eyeball the .out file than tens or hundreds of asserts.
Again, I am not against having more traditional unit tests and writing code in a functional style that is easier to document and reason about, but I think that to cover all the corner cases, such as exceptions and properly clearing private state, the unit tests will end up uglier than the q tests.
I am discussing the project split on hive-dev. This is one of the things I want to do: move all the end-to-end tests into a final project and really step up unit-style testing.
There are lots of things we can do to make the tests faster:
- Move all the UDFs into one big test to save the overhead of launching multiple tests.
- Optimize 'select udf(column) from table limit 1'; we should be able to make that test scream.
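A rough sketch of the "one big test" idea, assuming a hypothetical harness (BatchedFunctionCheck and its three toy functions are illustrative, not Hive code): every function/input pair runs inside a single JVM, so startup cost is paid once, and the results are rendered as one .out-style text block that can be diffed against a checked-in golden file and reviewed by eyeballing a single diff.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical batched harness; the functions below are toy examples, not Hive UDFs.
final class BatchedFunctionCheck {
    // Display name -> function under test, kept in insertion order for stable output.
    static Map<String, Function<String, String>> functions() {
        Map<String, Function<String, String>> fns = new LinkedHashMap<>();
        fns.put("upper", s -> s == null ? null : s.toUpperCase(Locale.ROOT));
        fns.put("lower", s -> s == null ? null : s.toLowerCase(Locale.ROOT));
        fns.put("reverse", s -> s == null ? null : new StringBuilder(s).reverse().toString());
        return fns;
    }

    // Apply every function to every input in one process and emit one line per call,
    // mimicking the row-per-result layout of a q-test .out file.
    static String render(String... inputs) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Function<String, String>> e : functions().entrySet()) {
            for (String in : inputs) {
                out.append(e.getKey()).append('(').append(in).append(")\t")
                   .append(e.getValue().apply(in)).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // In a real setup this text would be compared against a golden .out file.
        System.out.print(render("Hive", null));
    }
}
```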
Anyway, unlike in the past, when stuff like this sat in the queue forever, we now have a build bot, and I am dedicated to seeing patches reviewed and committed quickly (especially ones like these).
BTW, at a minimum there is show_functions.q, so every time you add a function you have to touch at least that test.