Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.1.0
    • Fix Version/s: 1.0.0
    • Component/s: general
    • Labels: None

      Description

      Okay Bigtop! Time to get out the sledgehammers.

      • We can't maintain everything in Bigtop at the moment. For example, Hive's metastore and related pieces, which are the core of Hive, aren't working.
      • Basically, anything that is unmaintained and not being used seems like it can be up for the chopping block.

      Keep in mind the priorities for BIGTOP-1582, which prevent some stuff from going away.

          Activity

          cos Konstantin Boudnik added a comment -

          I agree - cutting off the rot makes sense!

          oflebbe Olaf Flebbe added a comment -

          Hive Metastore is not working? Can you please elaborate?

          jayunit100 jay vyas added a comment -

          It's not deployed as part of the Hive Puppet code, and we don't have Tez integration and so on. So we can theoretically package the bits, but if we don't actually provide something that is idiomatic to what the vendors' Hive distributions do, then we aren't really providing something useful. Do you think we can provide first-class Hive support? If so, feel free to dispute or chime in on the Bigtop 1x JIRA, where we are defining the next generation of Bigtop to be leaner and easier to maintain.

          evans_ye Evans Ye added a comment -

          In BIGTOP-1179 Olaf is working on adding Tez. If we can get the patch in, I personally think we should include Hive; however, a consensus should be reached first.
          Olaf, if you need help with the Tez review, feel free to ping me.

          jayunit100 jay vyas added a comment -

          Are Tez or Hive going to be in Bigtop 1x? If so, let's discuss what role they will play and go from there, including who will be able to maintain them.

          cos Konstantin Boudnik added a comment -

          I don't think we should just throw Hive away unless we can provide an alternative SQL engine. And SparkQL isn't anywhere close to being such a solution, as far as I know. So, we need a plan.

          warwithin YoungWoo Kim added a comment -

          I believe Hive has an important role in the Hadoop ecosystem. For instance, Hue 3.x depends on the Hive metastore through HCatalog. That means that if users want to run HCatalog, they need a running instance of the Hive metastore. Using HCatalog, Hive, Pig and other components share the 'system catalog' in Hadoop.

          Hive and Pig can plug in their execution engine, e.g., MapReduce, Tez or Spark. IMO, if Bigtop includes Tez, it would be very useful.

          evans_ye Evans Ye added a comment -

          At a high level, we need a near-real-time SQL solution that supports use cases such as reporting, OLAP or ad hoc queries. Hive seems to be a good candidate because of its maturity and wide adoption. Its support for the Spark, Tez, and MR query engines makes it flexible enough to fit most workloads. Although I don't have much Hive expertise, it will be interesting to learn.

          oflebbe Olaf Flebbe added a comment -

          Tez is ready and working productively at my customer's site with hive-0.13. See BIGTOP-1179.

          I really have had no time to dig into Vagrant and the bigtop-smoke test framework until now, since there are still a ton of issues in the build process itself.

          I do have my own smoke tests ...

          Setting up Tez: I recommend following the Hortonworks documentation, which is rather good.

          oflebbe Olaf Flebbe added a comment -

          After reading all the comments above:

          I have to build a rock-stable application on Debian. AFAIK Hive is one of the stable QL engines with wide adoption for structured mass data.

          Hive itself is not really near-real-time, but Hive on Tez is heading in that direction. The integration of Hive into Hue is really, really nice.

          My customer scenario supports "only" ZooKeeper, Hadoop, Hive on Tez, Hue, Oozie and their transitive dependencies. If you are going to axe Hive, you will lose a big customer audience.

          jayunit100 jay vyas added a comment -

          Thanks Olaf... are you proposing that we use Hive in Bigtop 1x and going forward?
          If so, that's great! Can you put this proposal in BIGTOP-1582, where we are defining the
          Bigtop 1x requirements?
          That is guiding what is on the kill list.

          cos Konstantin Boudnik added a comment -

          Using Hive and "near-real time" in the same sentence cannot be correct. I've dealt with Hive for a very long time (although I haven't touched it since the 0.12 release) and I don't share your view about "maturity". Unless you'd consider very old crap to be mature.

          Back to the point: Hive might have to stay as it seems to be used here and there quite a bit. Although dealing with it is a pain in the butt, of course - every single release seems to be badly broken in one way or another.

          oflebbe Olaf Flebbe added a comment - edited

          Hive on Tez is faster by several orders of magnitude compared to traditional Hive with LZO or raw data. Together with the ORC file format (compressed columnar storage), performance is very suitable for my customer.

          I used it even for complex analytic queries, sub-queries, joins and it worked. Sometimes the syntax to use was a bit counter-intuitive, but Hive never failed. Only Tez 0.52 had failures under high load, but as far as I checked, these problems are already addressed in newer releases.

          Enterprises tend to use old crap because it is stable. Just think about any RDBMS vendor.

          jayunit100 jay vyas added a comment -

          I see where you are coming from. I think this needs to be on the mailing list, as this is causing some confusion.

          Let's put this JIRA on hold for now: I think I made it prematurely.

          Thanks for the feedback, Olaf Flebbe!

          jayunit100 jay vyas added a comment - edited

          Marked as blocked by BIGTOP-1582. Please chime in there (or on the dev list), where we will define the evolution of Bigtop; the kill list can then be crafted objectively from that.

          evans_ye Evans Ye added a comment - edited

          You're right, and thanks for sharing your experience. I should make it clear that it is Hive+Tez that makes it a near-real-time solution. I don't have much prior Hive experience, but our team's experiments with Hive+Tez indicate good performance. Although Hive+Tez is not as "near" as Impala or other Dremel-like solutions, the sexiest part is that it is just a library, avoiding the need to spin up a bunch of daemons to propagate queries and compute results. Stinger.next might also reinforce Hive+Tez in the future.

          cos Konstantin Boudnik added a comment -

          You know, it's interesting... Bigtop has the Hadoop Accelerator from GridGain, which seamlessly boosts any MR job (including Hive's). It does this by running the loads on a non-Hadoop MR, which has all the benefits of MR but none of its silly design flaws, like excessive shuffle, forced persistence between phases, etc. Basically, a lot of the things that Spark is known for, with one huge benefit: one doesn't need to change MR app code to run on Hadoop Accelerator. So, perhaps it might be a decent alternative to Hive+Tez?

          jayunit100 jay vyas added a comment -

          I agree that Hive/YARN/Tez probably isn't a requirement... but more importantly, I think I'm CLOSING in favor of the MAINTAINERS.txt file. Feel free to reopen if you think we still need to explicitly do a kill list!

          jayunit100 jay vyas added a comment -

          Closed. See BIGTOP-1604 and add yourself as a maintainer for a component if you want to keep any particular things alive!


            People

            • Assignee: Unassigned
            • Reporter: jayunit100 jay vyas
            • Votes: 0
            • Watchers: 5
