I stumbled upon this request while looking for something related. My team at Palantir Technologies (www.palantirtech.com) encountered this exact situation, so I thought I'd add a quick comment or two about our experience.
JUnit 4.6 supports parallel execution through an experimental test runner. However, that runner has bugs and limitations that make it unusable in the 4.6 release. I recently submitted a patch to fix the problems (it should make JUnit 4.7). For now, my team uses the patched runner with JUnit 4.4. We've used the parallel runner with our continuous suite of ~8K tests, which runs 24/7 without drama. Many of our tests make slow RMI calls, so the speedup was nearly linear: on our current build box, 16 test threads cut test time by roughly 12x. The only caveat is the occasional uninformed developer who writes unit tests that aren't safe to run in parallel (e.g., a static resource shared by many test classes), but that problem is usually uncovered quickly.
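To make the idea concrete, here is a rough, JDK-only sketch of what a parallel runner does. It submits independent tests to a fixed thread pool and then collects the failures. This is not JUnit's runner or API. ParallelTestSketch and runAll are hypothetical names, and each "test" is just a Runnable that throws on failure. Slow calls like RMI mostly wait on I/O, so extra threads overlap that waiting. That is why the speedup can approach the thread count.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of parallel test execution using only java.util.concurrent.
// NOT JUnit's parallel runner: each "test" is just a Runnable that throws on failure.
final class ParallelTestSketch {

    /** Runs every test on a fixed pool of nThreads and returns the names of the tests that failed. */
    static List<String> runAll(Map<String, Runnable> tests, int nThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            // Submit everything first so up to nThreads tests are in flight at once.
            Map<String, Future<?>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, Runnable> test : tests.entrySet()) {
                pending.put(test.getKey(), pool.submit(test.getValue()));
            }
            // Then wait for each result; an AssertionError thrown by a test surfaces as ExecutionException.
            List<String> failures = new ArrayList<>();
            for (Map.Entry<String, Future<?>> result : pending.entrySet()) {
                try {
                    result.getValue().get();
                } catch (ExecutionException e) {
                    failures.add(result.getKey());
                }
            }
            return failures;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tests", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Any test that mutates shared static state would race with its neighbours here. That is exactly the kind of test described above as unsuitable for parallel execution.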
JUnit does support parameterized (data-driven) testing, using the Parameterized runner (@RunWith(Parameterized.class)). The feature is awesome for randomized fuzz testing, especially if you have lots of code implementing only a few interfaces.
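For the fuzz-testing use case, the parameter table is typically every implementation crossed with a set of random seeds. In JUnit 4, those rows would come from a public static @Parameters method on a class run with @RunWith(Parameterized.class). The sketch below skips JUnit and builds the same table by hand, so it runs with just the JDK. IntSorter, InsertionSorter, LibrarySorter, and FuzzMatrix are hypothetical stand-ins. Comparing against Arrays.sort is just one possible choice of reference oracle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical interface standing in for "lots of code implementing only a few interfaces".
interface IntSorter {
    void sort(int[] a);
}

// Two illustrative implementations under test.
final class InsertionSorter implements IntSorter {
    public void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            // Shift larger elements one slot right to make room for key.
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }
}

final class LibrarySorter implements IntSorter {
    public void sort(int[] a) {
        Arrays.sort(a);
    }
}

final class FuzzMatrix {

    /** One parameter row: an implementation paired with a random seed. */
    static final class Row {
        final IntSorter impl;
        final long seed;

        Row(IntSorter impl, long seed) {
            this.impl = impl;
            this.seed = seed;
        }

        @Override
        public String toString() {
            return impl.getClass().getSimpleName() + "[seed=" + seed + "]";
        }
    }

    /** Builds the implementation-by-seed table, playing the role a @Parameters method would. */
    static List<Row> rows(long[] seeds, IntSorter... impls) {
        List<Row> out = new ArrayList<>();
        for (IntSorter impl : impls) {
            for (long seed : seeds) {
                out.add(new Row(impl, seed));
            }
        }
        return out;
    }

    /** Sorts a seeded random array with the row's implementation and compares it to Arrays.sort. */
    static boolean check(Row row) {
        Random rnd = new Random(row.seed); // fixed seed keeps any failure reproducible
        int[] input = new int[rnd.nextInt(50)];
        for (int i = 0; i < input.length; i++) {
            input[i] = rnd.nextInt(1000) - 500;
        }
        int[] expected = input.clone();
        Arrays.sort(expected);
        int[] actual = input.clone();
        row.impl.sort(actual);
        return Arrays.equals(expected, actual);
    }
}
```

Fixing the seed per row means a failing row can be replayed exactly. That matters once the inputs are random.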
Test categorization is not built into JUnit 4, although JUnitExt supports it. If pulling in another library dependency is undesirable, you can always use multiple root test suites (e.g., continuous, nightly, smoke). It's simple and it works, but every once in a while someone checks in a new test and forgets to add it to a suite.
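Here is one possible way to get categories without another dependency. Tag each test class with a custom runtime annotation, then build the root suites by filtering on it. An untagged class is flagged instead of silently never running. The TestKind annotation, SuiteBuilder, and the three test classes are hypothetical, and this is not JUnitExt's API. The list of all test classes is assumed to come from elsewhere (e.g., a classpath scan).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical category marker; must be RUNTIME-retained so reflection can see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface TestKind {
    String[] value();
}

// Hypothetical test classes standing in for real JUnit test classes.
@TestKind({"smoke", "continuous"})
class LoginTest {}

@TestKind("nightly")
class BulkImportTest {}

class ForgottenTest {} // no category: should be flagged rather than silently skipped

final class SuiteBuilder {

    /** Returns the classes tagged with the given category, in input order. */
    static List<Class<?>> suite(String category, List<Class<?>> all) {
        List<Class<?>> out = new ArrayList<>();
        for (Class<?> c : all) {
            TestKind kind = c.getAnnotation(TestKind.class);
            if (kind != null && Arrays.asList(kind.value()).contains(category)) {
                out.add(c);
            }
        }
        return out;
    }

    /** Returns classes with no category, i.e. tests that no root suite would ever run. */
    static List<Class<?>> uncategorized(List<Class<?>> all) {
        List<Class<?>> out = new ArrayList<>();
        for (Class<?> c : all) {
            if (!c.isAnnotationPresent(TestKind.class)) {
                out.add(c);
            }
        }
        return out;
    }
}
```

Failing the build when uncategorized() is non-empty turns the "forgot to add it to a suite" mistake into an immediate error.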
JUnit 4 is literally a drop-in solution (assuming you're already testing on a Java 5 VM), while moving to TestNG requires at least some changes to existing tests.
Ultimately, we concluded that TestNG offered very little benefit over JUnit plus extensions (dare I say none?), so we stuck with JUnit, since it has broader adoption and our developers are already familiar with it.