Phoenix / PHOENIX-3744

Support snapshot scanners for MR-based queries

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.11.0
    • Labels:
      None

      Description

      HBase supports scanning over snapshots, with a SnapshotScanner that accesses the region files directly in HDFS. We should make sure that Phoenix can support that.

      Not sure how we'd want to decide when to run a query over a snapshot. Some ideas:

      • if there's an SCN set (i.e. the query is running at a point in time in the past)
      • if the memstore is empty
      • if the query is being run at a timestamp earlier than any memstore data
      • as a config option on the table
      • as a query hint
      • based on some kind of optimizer rule (i.e. based on estimated # of bytes that will be scanned)

      Phoenix typically runs a query at the timestamp at which it was compiled. Any data committed after this time should not be seen while a query is running.
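That last constraint — a query must not see data committed after the timestamp at which it was compiled — can be modeled with a small self-contained sketch. Plain longs stand in for HBase cell timestamps; the class and method names here are hypothetical, not Phoenix APIs:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of Phoenix's point-in-time read semantics: a query compiled at
// time T must not see cells written after T. Phoenix achieves this by capping
// the scan's time range; here a simple filter over timestamps stands in.
public class CompileTimeVisibility {
    static List<Long> visibleCells(List<Long> cellTimestamps, long compileTs) {
        List<Long> visible = new ArrayList<>();
        for (long ts : cellTimestamps) {
            if (ts <= compileTs) {   // cells committed after compile time are invisible
                visible.add(ts);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Long> cells = List.of(100L, 200L, 300L);
        // Query compiled at ts=250: the cell written at ts=300 must not be seen.
        System.out.println(visibleCells(cells, 250L)); // prints [100, 200]
    }
}
```

A snapshot read gives the same guarantee naturally, since the snapshot's contents are frozen at the time it was taken.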

      1. PHOENIX-3744.patch
        120 kB
        Akshita Malhotra
      2. PHOENIX-3744.patch
        107 kB
        Akshita Malhotra
      3. PHOENIX-3744.patch
        107 kB
        Akshita Malhotra
      4. PHOENIX-3744-4.x-HBase-0.98.patch
        113 kB
        Akshita Malhotra
      5. PHOENIX-3744-4.x-HBase-1.1.patch
        108 kB
        Akshita Malhotra

        Issue Links

          Activity

          jamestaylor James Taylor added a comment -

          Key questions:

          • Do we handle the case in which there are unflushed changes in the memstore for the table being read? Maybe we don't care - at most you'd see data that is 1 hour old.
          • How do we indicate to Phoenix that it should do a snapshot read as opposed to its regular read path? Maybe the MR read path (used by Pig and Spark) always (or optionally) does a snapshot read?
          • What about our Phoenix coprocessors (which I believe will be bypassed if we use HDFS snapshot reads)? If snapshot reads are only for the MR path, we might not need them. If we do need them, we can probably wrap the scanner as needed, but there might be some refactoring required.
          jamestaylor James Taylor added a comment - - edited

          In offline conversation we determined that the simplest path initially will be to support snapshot reads for the MR integration queries (I've updated the JIRA subject to reflect this). We can make it configurable per job whether snapshot reads are used (appropriate whenever the client is OK seeing data that is potentially 1 hour old, which is probably almost all of the time). Also, I believe our coprocessors are not needed in this case (since we only execute simple scans for our MR integration), which simplifies things.

          One hole in this logic is with local indexes. Our coprocessors strip off the region start key prefix from the rows coming back. If we want to use local indexes for Phoenix MR or Spark jobs, we'd still need to do this.

          jamestaylor James Taylor added a comment -

          FYI, Eli Levine - this will improve performance of Pig/Spark/MR queries that come through our Phoenix MR integration APIs.

          elilevine Eli Levine added a comment -

          Thanks for the @-mention, James Taylor! This is indeed interesting.

          jamestaylor James Taylor added a comment - - edited

          Here's an idea on how this can be implemented:

          • In the beginning of PhoenixInputFormat.getQueryPlan(), take a snapshot so we can get the now unchanging region boundaries.
          • Later in PhoenixInputFormat.getQueryPlan(), when we call statement.optimizeQuery(), provide an overloaded version that passes through an interface from which we can get the region boundaries. Have two implementations of this interface: one that does what we do today in BaseResultIterators.getParallelScans():
                    List<HRegionLocation> regionLocations = context.getConnection().getQueryServices()
                            .getAllTableRegions(physicalTableName);
            

            The other implementation would use the snapshot to get the region boundaries instead. This will prevent a race condition in which a split could occur prior to the running of the scans, but after we've already got the region boundaries (or the region boundaries being stale since we get these from the cache on the HConnection). You'd use a new job configuration parameter to determine which implementation to use based on whether or not a snapshot read is being done.

          • As a side note, we might want to leverage the ParallelScanGrouper interface that's already in place to get the region boundaries, as it'll be somewhat tricky to thread a new interface through to the BaseResultIterators class, and we already do this with an alternate ParallelScanGrouper implementation for the MR jobs.
          • In PhoenixRecordReader.initialize(), when doing a snapshot read, instead of instantiating a TableResultIterator (which is the thing that does an htable.getScanner()), instantiate a new TableSnapshotResultIterator which uses the snapshot scanner instead. The ResultIterator interface is very simple - you just need to implement two methods (and the explain method can be a noop):
            public interface ResultIterator extends SQLCloseable {
                /**
                 * Grab the next row's worth of values. The iterator will return a Tuple.
                 * @return Tuple object if there is another row, null if the scanner is
                 * exhausted.
                 * @throws SQLException e
                 */
                public Tuple next() throws SQLException;
                
                public void explain(List<String> planSteps);
            }
            

          FYI, Akshita Malhotra, churro morales, Samarth Jain
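A minimal, self-contained sketch of what such a TableSnapshotResultIterator could look like. Note that Tuple, SQLCloseable, and the scanner here are placeholder stand-ins, not the real Phoenix/HBase types; the real class would wrap HBase's TableSnapshotScanner rather than an in-memory iterator:

```java
import java.sql.SQLException;
import java.util.Iterator;
import java.util.List;

// Sketch of the proposed snapshot-backed ResultIterator. All types are
// simplified stand-ins: the real implementation would pull Results from
// HBase's TableSnapshotScanner and wrap them as Phoenix Tuples.
public class TableSnapshotResultIteratorSketch {
    interface Tuple {}                                  // stand-in for Phoenix's Tuple
    interface SQLCloseable { void close() throws SQLException; }

    interface ResultIterator extends SQLCloseable {
        Tuple next() throws SQLException;               // null once the scanner is exhausted
        void explain(List<String> planSteps);           // may be a no-op for snapshot reads
    }

    static class SnapshotResultIterator implements ResultIterator {
        private final Iterator<Tuple> scanner;          // stands in for the snapshot scanner

        SnapshotResultIterator(Iterator<Tuple> scanner) {
            this.scanner = scanner;
        }

        @Override public Tuple next() {
            return scanner.hasNext() ? scanner.next() : null;
        }
        @Override public void explain(List<String> planSteps) { /* no-op */ }
        @Override public void close() { /* would close the underlying snapshot scanner */ }
    }
}
```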

          akshita.malhotra Akshita Malhotra added a comment - - edited

          • ParallelScanGrouper is extended to differentiate the functionality for getting region boundaries.
          • Added an integration test that compares the snapshot read result against the result of a SELECT query run with the CurrentSCN value set.
          • The configuration parameter is the snapshot name key; if it is set, a snapshot read is done.
          • Used the existing PhoenixIndexDBWritable class for testing; will add a new one as more tests are added.
          • ExpressionProjector functionality is extended for snapshots, as the KeyValue format returned by TableSnapshotScanner differs from ClientScanner's and is therefore not properly interpreted by Phoenix, returning null for projected columns.

          For the same table, the following shows the different KeyValue formats:

          ClientScanner:
          keyvalues=

          {AAPL/_v:\x00\x00\x00\x01/1493061452132/Put/vlen=7/seqid=0/value=�SSDD��}

          TableSnapshotScanner:
          keyvalues=

          {AAPL/0:\x00\x00\x00\x00/1493061673859/Put/vlen=1/seqid=4/value=x, AAPL/0:\x80\x0B/1493061673859/Put/vlen=4/seqid=4/value=SSDD}

          To do:
          Add more integration tests to cover different scenarios, such as WHERE clauses.

          fyi: James Taylor

          githubbot ASF GitHub Bot added a comment -

          Github user JamesRTaylor commented on the issue:

          https://github.com/apache/phoenix/pull/239

          I don't think it's necessary to fully understand the functionality to do the refactoring I've mentioned, @akshita-malhotra. Here's how I'd recommend approaching it:

          • create a new interface solely for abstracting RegionCoprocessorEnvironment access, called RegionContext. The interface would have at least two methods: getRegion and getConfiguration. We might need more if other methods are called on RegionCoprocessorEnvironment.
          • have two implementations of this interface: RegionCoprocessorContext and RegionSnapshotContext. The constructor of RegionCoprocessorContext would take a RegionCoprocessorEnvironment as an argument, while RegionSnapshotContext would take a Region and a Configuration.
          • do an across-the-board replace of RegionCoprocessorEnvironment with RegionContext. You can likely skip this for secondary-index-related code (org.apache.phoenix.hbase.index.Indexer and PhoenixTransactionalIndexer). You'll find out here if other methods are called from RegionCoprocessorEnvironment or ObserverContext (which can be dealt with in a variety of ways, for example by throwing an UnsupportedOperationException if need be in the snapshot implementation).
          • in the top-level coprocessor methods that take a RegionCoprocessorEnvironment (mostly the abstract BaseScannerRegionObserver class), instantiate a RegionCoprocessorContext by passing in the RegionCoprocessorEnvironment. From this point onward, all access will go through the RegionContext interface.

          You could do this refactoring completely separately from PHOENIX-3744 so that you don't mix the two. Then PHOENIX-3744 would have something like a RegionScannerFactory (your RegionObserverUtil) that gives you back a RegionScanner given a RegionContext, and you'd create a RegionSnapshotContext as the backing implementation in your snapshot-reading code.
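The proposed shape can be sketched as follows. Region and Configuration here are placeholder stand-ins for HBase's Region and Hadoop's Configuration, so the sketch is self-contained; beyond the names suggested in the comment above, everything is hypothetical:

```java
// Sketch of the proposed RegionContext abstraction with its snapshot-side
// implementation. In Phoenix, RegionCoprocessorContext would be built from a
// RegionCoprocessorEnvironment; the stand-in types below replace HBase/Hadoop
// classes so this compiles on its own.
public class RegionContextSketch {
    static class Region {}          // stand-in for org.apache.hadoop.hbase.regionserver.Region
    static class Configuration {}   // stand-in for org.apache.hadoop.conf.Configuration

    interface RegionContext {
        Region getRegion();
        Configuration getConfiguration();
        // More methods would be added if other RegionCoprocessorEnvironment
        // methods turn out to be needed.
    }

    // Snapshot-side implementation: holds the Region and Configuration directly.
    static class RegionSnapshotContext implements RegionContext {
        private final Region region;
        private final Configuration conf;

        RegionSnapshotContext(Region region, Configuration conf) {
            this.region = region;
            this.conf = conf;
        }

        @Override public Region getRegion() { return region; }
        @Override public Configuration getConfiguration() { return conf; }
        // Methods that only make sense inside a coprocessor could throw
        // UnsupportedOperationException here, as suggested above.
    }
}
```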

          githubbot ASF GitHub Bot added a comment -

          Github user JamesRTaylor commented on the issue:

          https://github.com/apache/phoenix/pull/239

          Please amend your commit message to be prefixed with PHOENIX-3744 (instead of Phoenix-3744) so that review comments show up as comments on the JIRA, @akshita-malhotra. Also, let's get a patch attached to the JIRA so we can get a test run to make sure there are no regressions. Directions here: https://phoenix.apache.org/contributing.html#Generate_a_patch

          akshita.malhotra Akshita Malhotra added a comment -

          PHOENIX-3744: Support snapshot scanners for MR-based Non-aggregate queries

          githubbot ASF GitHub Bot added a comment -

          Github user akshita-malhotra commented on the issue:

          https://github.com/apache/phoenix/pull/239

          @JamesRTaylor

          • Changed ParallelScanGrouper classes as per the review
          • Changes to BaseTest were to avoid the following error:
            "Restore directory cannot be a sub directory of HBase root directory"
            Previously I was passing true to create the root dir; changed to use a random dir instead to avoid those changes
          • Refactored the util classes to a Factory

          Also, uploaded the patch on the jira.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12868832/PHOENIX-3744.patch
          against master branch at commit 21fb0b31b46da3b7cc27265467d83a1b4cd5c5c5.
          ATTACHMENT ID: 12868832

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 49 warning messages.

          -1 release audit. The applied patch generated 11 release audit warnings (more than the master's current 0 warnings).

          -1 lineLengths. The patch introduces the following lines longer than 100:
          + " FIELD1 VARCHAR NOT NULL , FIELD2 VARCHAR , FIELD3 INTEGER CONSTRAINT pk PRIMARY KEY (FIELD1 ))";
          + PhoenixMapReduceUtil.setInput(job,PhoenixIndexDBWritable.class,SNAPSHOT_NAME,tableName,tmpDir, null, FIELD1, FIELD2, FIELD3);
          + PhoenixMapReduceUtil.setInput(job,PhoenixIndexDBWritable.class,SNAPSHOT_NAME,tableName,tmpDir, FIELD3 + " > 0001", FIELD1, FIELD2, FIELD3);
          + PhoenixMapReduceUtil.setInput(job,PhoenixIndexDBWritable.class,SNAPSHOT_NAME,tableName,tmpDir,inputQuery);
          + private void configureJob(Job job, String tableName, String inputQuery, String condition) throws Exception {
          + ResultSet rs = DriverManager.getConnection(getUrl(), props).createStatement().executeQuery(inputQuery);
          + assertFalse("Should only have stored" + result.size() + "rows in the table for the timestamp!", rs.next());
          + private void upsertData(PreparedStatement stmt, String field1, String field2, int field3) throws SQLException {
          + public static class TableSnapshotMapper extends Mapper<NullWritable, PhoenixIndexDBWritable, ImmutableBytesWritable, NullWritable> {
          + RegionScannerFactory regionScannerFactory = new NonAggregateRegionScannerFactory(context, useNewValueColumnQualifier, encodingScheme);

          -1 core tests. The patch failed these unit tests:
          ./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.ArrayIT

          Test results: https://builds.apache.org/job/PreCommit-PHOENIX-Build/880//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-PHOENIX-Build/880//artifact/patchprocess/patchReleaseAuditWarnings.txt
          Javadoc warnings: https://builds.apache.org/job/PreCommit-PHOENIX-Build/880//artifact/patchprocess/patchJavadocWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-PHOENIX-Build/880//console

          This message is automatically generated.

          jamestaylor James Taylor added a comment -

          Akshita Malhotra - please investigate the above test failure. You can try running it locally and attach the same patch again to get another test run to see if it fails consistently.

          githubbot ASF GitHub Bot added a comment -

          Github user JamesRTaylor commented on a diff in the pull request:

          https://github.com/apache/phoenix/pull/239#discussion_r118310359

          — Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/ScanningResultIterator.java —
          @@ -38,10 +38,12 @@
          public class ScanningResultIterator implements ResultIterator {
          private final ResultScanner scanner;
          private final CombinableMetric scanMetrics;
          + private final boolean snapshotScan;
          — End diff –

          Looks like the changes to this file aren't necessary. Please revert.

          githubbot ASF GitHub Bot added a comment -

          Github user JamesRTaylor commented on a diff in the pull request:

          https://github.com/apache/phoenix/pull/239#discussion_r118311801

          — Diff: phoenix-core/src/main/java/org/apache/phoenix/mapreduce/util/PhoenixMapReduceUtil.java —
          @@ -63,6 +67,66 @@ public static void setInput(final Job job, final Class<? extends DBWritable> inp
          PhoenixConfigurationUtil.setSchemaType(configuration, SchemaType.QUERY);
          }

          + /**
          + *
          + * @param job
          + * @param inputClass DBWritable class
          + * @param snapshotName The name of a snapshot (of a table) to read from
          + * @param tableName Input table name
          + * @param restoreDir a temporary dir to copy the snapshot files into
          + * @param conditions Condition clause to be added to the WHERE clause. Can be <tt>null</tt> if there are no conditions.
          + * @param fieldNames fields being projected for the SELECT query.
          + */
          + public static void setInput(final Job job, final Class<? extends DBWritable> inputClass, final String snapshotName, String tableName,
          — End diff –

          Why wouldn't we want to take the snapshot here instead of passing in the snapshot name?

          githubbot ASF GitHub Bot added a comment -

          Github user JamesRTaylor commented on a diff in the pull request:

          https://github.com/apache/phoenix/pull/239#discussion_r118312938

          — Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelScanGrouper.java —
          @@ -17,46 +17,79 @@
          */
          package org.apache.phoenix.iterate;

          +import java.sql.SQLException;
          import java.util.List;

          +import com.google.common.base.Preconditions;
          +import org.apache.hadoop.hbase.HRegionLocation;
          import org.apache.hadoop.hbase.client.Scan;
          import org.apache.phoenix.compile.QueryPlan;
          +import org.apache.phoenix.compile.StatementContext;
          import org.apache.phoenix.schema.PTable;
          import org.apache.phoenix.schema.PTable.IndexType;
          import org.apache.phoenix.schema.SaltingUtil;
          +import org.apache.phoenix.schema.TableRef;
          import org.apache.phoenix.util.ScanUtil;

          /**
           * Default implementation that creates a scan group if a plan is row key ordered (which requires a merge sort),
           * or if a scan crosses a region boundary and the table is salted or a local index.
           */
          public class DefaultParallelScanGrouper implements ParallelScanGrouper {
          -    private static final DefaultParallelScanGrouper INSTANCE = new DefaultParallelScanGrouper();
          -    public static DefaultParallelScanGrouper getInstance() { return INSTANCE; }
          -    private DefaultParallelScanGrouper() {}
          +    private static DefaultParallelScanGrouper INSTANCE = new DefaultParallelScanGrouper();

               @Override
               public boolean shouldStartNewScan(QueryPlan plan, List<Scan> scans, byte[] startKey, boolean crossedRegionBoundary) {
                   PTable table = plan.getTableRef().getTable();
                   boolean startNewScanGroup = false;
                   if (!plan.isRowKeyOrdered()) {
                       startNewScanGroup = true;
                   } else if (crossedRegionBoundary) {
                       if (table.getIndexType() == IndexType.LOCAL) {
                           startNewScanGroup = true;
                       } else if (table.getBucketNum() != null) {
                           startNewScanGroup = scans.isEmpty() ||
                               ScanUtil.crossesPrefixBoundary(startKey,
                                   ScanUtil.getPrefix(scans.get(scans.size()-1).getStartRow(), SaltingUtil.NUM_SALTING_BYTES),
                                   SaltingUtil.NUM_SALTING_BYTES);
                       }
                   }
                   return startNewScanGroup;
          — End diff –

          I don't think that DefaultParallelScanGrouper can be a singleton with the state of context and tableName inside of it.
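          [Editor's note] A minimal, self-contained sketch of the hazard raised above; all names are invented for illustration and are not the actual Phoenix classes. When per-query state (such as a table name) lives inside a shared singleton, concurrent callers clobber each other's fields:

          ```java
          // Hypothetical sketch: why per-query state cannot live in a singleton.
          // The second setTableName() overwrites the state the first caller
          // still relies on, because both callers share the same instance.
          public class StatefulGrouper {
              private static final StatefulGrouper INSTANCE = new StatefulGrouper();
              private String tableName; // per-query state -- unsafe in a singleton

              private StatefulGrouper() {}

              public static StatefulGrouper getInstance() { return INSTANCE; }

              public void setTableName(String tableName) { this.tableName = tableName; }
              public String describeScan() { return "scanning " + tableName; }

              public static void main(String[] args) {
                  StatefulGrouper a = StatefulGrouper.getInstance();
                  a.setTableName("T1");
                  StatefulGrouper b = StatefulGrouper.getInstance();
                  b.setTableName("T2"); // clobbers the state "a" still depends on
                  System.out.println(a.describeScan()); // prints "scanning T2", not "scanning T1"
              }
          }
          ```

          Giving each query its own instance (or keeping the class stateless) avoids the problem entirely.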

          githubbot ASF GitHub Bot added a comment -

          Github user JamesRTaylor commented on a diff in the pull request:

          https://github.com/apache/phoenix/pull/239#discussion_r118313368

          — Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/TableMRParallelScanGrouper.java —
          @@ -0,0 +1,99 @@
          +package org.apache.phoenix.iterate;
          +
          +import com.google.common.base.Preconditions;
          +import com.google.common.collect.Lists;
          +import org.apache.hadoop.conf.Configuration;
          +import org.apache.hadoop.fs.FileSystem;
          +import org.apache.hadoop.fs.Path;
          +import org.apache.hadoop.hbase.HConstants;
          +import org.apache.hadoop.hbase.HRegionInfo;
          +import org.apache.hadoop.hbase.HRegionLocation;
          +import org.apache.hadoop.hbase.protobuf.generated.HBaseProtos;
          +import org.apache.hadoop.hbase.protobuf.generated.SnapshotProtos;
          +import org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils;
          +import org.apache.hadoop.hbase.snapshot.SnapshotManifest;
          +import org.apache.phoenix.compile.StatementContext;
          +import org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil;
          +
          +import java.sql.SQLException;
          +import java.util.List;
          +
          +public class TableMRParallelScanGrouper extends MapReduceParallelScanGrouper {
          +
          + private static TableMRParallelScanGrouper INSTANCE = null;
          — End diff –

          Same comment here - how can this be a singleton?

          githubbot ASF GitHub Bot added a comment -

          Github user JamesRTaylor commented on a diff in the pull request:

          https://github.com/apache/phoenix/pull/239#discussion_r118313746

          — Diff: phoenix-core/src/main/java/org/apache/phoenix/mapreduce/PhoenixInputFormat.java —
          @@ -185,7 +184,7 @@ private QueryPlan getQueryPlan(final JobContext context, final Configuration con
          scan.setAttribute(BaseScannerRegionObserver.TX_SCN, Bytes.toBytes(Long.valueOf(txnScnValue)));
          }
          // Initialize the query plan so it sets up the parallel scans
          -        queryPlan.iterator(MapReduceParallelScanGrouper.getInstance());
          +        queryPlan.iterator(TableMRParallelScanGrouper.init(configuration));
          — End diff –

          Can't this just instantiate a TableMRParallelScanGrouper and the init method can be invoked in the constructor so that it doesn't need to be exposed in the interface?
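          [Editor's note] A self-contained sketch of the suggestion above; all names are invented and a plain Map stands in for a Hadoop Configuration so the example runs without HBase/Phoenix on the classpath. Folding init into the constructor means each MR job builds its own grouper instance, and no static init()/getInstance() needs to leak into the interface:

          ```java
          // Hypothetical sketch: read job-scoped settings once, at construction
          // time, so the instance is immutable and safely job-local.
          import java.util.HashMap;
          import java.util.Map;

          public class MRScanGrouper {
              private final String snapshotName; // job-scoped, fixed after construction

              public MRScanGrouper(Map<String, String> configuration) {
                  // initialization happens here instead of a separate init() method
                  this.snapshotName = configuration.get("snapshot.name");
              }

              public boolean isSnapshotRead() { return snapshotName != null; }
              public String getSnapshotName() { return snapshotName; }

              public static void main(String[] args) {
                  Map<String, String> conf = new HashMap<>();
                  conf.put("snapshot.name", "nightly_snap");
                  MRScanGrouper grouper = new MRScanGrouper(conf); // one instance per job
                  System.out.println(grouper.isSnapshotRead()); // prints "true"
              }
          }
          ```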

          githubbot ASF GitHub Bot added a comment -

          Github user JamesRTaylor commented on a diff in the pull request:

          https://github.com/apache/phoenix/pull/239#discussion_r118314328

          — Diff: phoenix-core/src/main/java/org/apache/phoenix/mapreduce/util/PhoenixConfigurationUtil.java —
          @@ -192,6 +196,18 @@ public static void setOutputTableName(final Configuration configuration, final S
          public static void setUpsertColumnNames(final Configuration configuration, final String[] columns) {
              setValues(configuration, columns, MAPREDUCE_UPSERT_COLUMN_COUNT, MAPREDUCE_UPSERT_COLUMN_VALUE_PREFIX);
          }

          +
          + public static void setSnapshotNameKey(final Configuration configuration, final String snapshotName) {
          + Preconditions.checkNotNull(configuration);
          + Preconditions.checkNotNull(snapshotName);
          + configuration.set(SNAPSHOT_NAME_KEY, snapshotName);
          — End diff –

          The idea of having the snapshot name in the configuration is not going to translate well when we want to expose snapshot reads for queries in general as the configuration is a global object.
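          [Editor's note] A self-contained illustration of the concern above; the names are invented and a plain Map stands in for a Hadoop Configuration. A process-wide configuration holds exactly one snapshot name at a time, so two concurrent queries overwrite each other, while statement-scoped state lets both coexist:

          ```java
          // Hypothetical sketch: global config (last writer wins) versus
          // statement-scoped state (each query carries its own snapshot name).
          import java.util.HashMap;
          import java.util.Map;

          public class SnapshotScope {
              // global, process-wide configuration map
              static final Map<String, String> GLOBAL_CONF = new HashMap<>();

              // statement-scoped alternative
              static class Statement {
                  final String snapshotName;
                  Statement(String snapshotName) { this.snapshotName = snapshotName; }
              }

              public static void main(String[] args) {
                  GLOBAL_CONF.put("snapshot.name", "snap_for_query1");
                  GLOBAL_CONF.put("snapshot.name", "snap_for_query2"); // query1's name is lost

                  Statement q1 = new Statement("snap_for_query1");
                  Statement q2 = new Statement("snap_for_query2"); // both coexist safely
                  System.out.println(GLOBAL_CONF.get("snapshot.name").equals(q1.snapshotName)); // prints "false"
              }
          }
          ```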

          githubbot ASF GitHub Bot added a comment -

          Github user JamesRTaylor commented on a diff in the pull request:

          https://github.com/apache/phoenix/pull/239#discussion_r118314856

          — Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/RegionContext.java —
          @@ -0,0 +1,15 @@
          +package org.apache.phoenix.iterate;
          +
          +import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
          +
          +/**
          + * Maintains coprocessor environment state to
          + * make region observer functionality work for
          + * non-coprocessor environment
          + */
          +public interface RegionContext {
          +
          + /** @return the Coprocessor environment */
          + RegionCoprocessorEnvironment getEnvironment();
          — End diff –

          If this is the only method, we don't really need this interface. Why not just keep using RegionCoprocessorEnvironment in our APIs and you can just instantiate you snapshot-based one prior to getting the RegionScanner?

          githubbot ASF GitHub Bot added a comment -

          Github user JamesRTaylor commented on the issue:

          https://github.com/apache/phoenix/pull/239

          Thanks for the revision, @akshita-malhotra. It's looking very good. I made some comments inline.

          akshita.malhotra Akshita Malhotra added a comment -

          Updated patch

          githubbot ASF GitHub Bot added a comment -

          Github user akshita-malhotra commented on the issue:

          https://github.com/apache/phoenix/pull/239

          @JamesRTaylor Thanks a lot for the review. I have made the suggested changes and uploaded the updated patch on the jira.
          Regarding creating a snapshot to generalize the use of snapshots for M/R jobs: after our last discussion with @lhofhansl and Rahul G, I was under the impression that we would pass the snapshot name as input.
          If we are to follow the former approach instead, I will go ahead and make the changes.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12870469/PHOENIX-3744.patch
          against master branch at commit cd708bbfa589771730ace12026d755aee58e1c9e.
          ATTACHMENT ID: 12870469

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-PHOENIX-Build/930//console

          This message is automatically generated.

          akshita.malhotra Akshita Malhotra added a comment -

          Updated patch

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12870475/PHOENIX-3744.patch
          against master branch at commit e3fc929e93715a359b4267db9f4d12706247a6a6.
          ATTACHMENT ID: 12870475

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-PHOENIX-Build/934//console

          This message is automatically generated.

          akshita.malhotra Akshita Malhotra added a comment -

          Patch wasn't applying due to recent changes to scan metrics. Resolved conflicts and uploaded the patch.

          githubbot ASF GitHub Bot added a comment -

          Github user JamesRTaylor commented on the issue:

          https://github.com/apache/phoenix/pull/239

          Patch looks very good, @akshita-malhotra. What's the advantage, @lhofhansl, of forcing users to create the snapshot themselves before starting the job? Wouldn't it be simpler for the snapshot to be created during the setup/initialization of the MR job? In the non MR case, when we want to support running arbitrary queries over snapshot(s), seems like we'd want Phoenix to create them, no? Otherwise, we'd need to provide the user with some means of associating a snapshot with a table name (which might get cumbersome). The alternative is to let Phoenix manage this transparently.

          githubbot ASF GitHub Bot added a comment -

          Github user JamesRTaylor commented on the issue:

          https://github.com/apache/phoenix/pull/239

          Had an offline discussion with @lhofhansl and he brought up the use case of having a snapshot at a known point in time (i.e. midnight last night) and wanting to run Phoenix queries over it through MR. That seems like a good reason to pass in the snapshot name rather than build it when the MR starts.

          I'll commit this into 4.x and master. Nice work, @akshita-malhotra!
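          [Editor's note] A small self-contained sketch of the workflow settled on above: the caller derives the name of a known point-in-time snapshot (e.g. last night's midnight snapshot) and passes it into the MR job, rather than having the job create one at startup. The "<table>_yyyyMMdd" naming convention is an assumption for illustration only; Phoenix does not mandate any snapshot naming scheme.

          ```java
          // Hypothetical sketch: compute the nightly snapshot name to hand to
          // the MR job configuration (e.g. via the setSnapshotNameKey helper
          // added by this patch).
          import java.time.LocalDate;
          import java.time.format.DateTimeFormatter;

          public class SnapshotNames {
              static String nightlySnapshotName(String tableName, LocalDate date) {
                  // BASIC_ISO_DATE formats a LocalDate as yyyyMMdd
                  return tableName + "_" + date.format(DateTimeFormatter.BASIC_ISO_DATE);
              }

              public static void main(String[] args) {
                  System.out.println(nightlySnapshotName("MY_TABLE", LocalDate.of(2017, 6, 5))); // prints "MY_TABLE_20170605"
              }
          }
          ```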

          jamestaylor James Taylor added a comment - - edited

          Would you mind attaching a patch that applies to 4.x-HBase-0.98 branch, Akshita Malhotra?

          Also, for 4.x-HBase-1.1 branch, I get the following compilation error. Would you mind attaching a patch that'll work for that branch?

          [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.0:compile (default-compile) on project phoenix-core: Compilation failure: Compilation failure:
          [ERROR] /home/jtaylor/dev/apache/phoenix/phoenix-core/src/main/java/org/apache/phoenix/iterate/TableSnapshotResultIterator.java:[60,53] incompatible types
          [ERROR] required: org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.RestoreMetaChanges
          [ERROR] found:    void
          [ERROR] /home/jtaylor/dev/apache/phoenix/phoenix-core/src/main/java/org/apache/phoenix/iterate/TableSnapshotResultIterator.java:[63,20] cannot find symbol
          [ERROR] symbol:   method getTableDescriptor()
          [ERROR] location: variable meta of type org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.RestoreMetaChanges
          [ERROR] -> [Help 1]
          [ERROR] 
          [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
          [ERROR] Re-run Maven using the -X switch to enable full debug logging.
          [ERROR] 
          [ERROR] For more information about the errors and possible solutions, please read the following articles:
          [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
          [ERROR] 
          [ERROR] After correcting the problems, you can resume the build with the command
          [ERROR]   mvn <goals> -rf :phoenix-core
          

          FYI, looks fine on master and 4.x-HBase-1.2 branches, so I've committed it there.

          hudson Hudson added a comment -

          FAILURE: Integrated in Jenkins build Phoenix-master #1640 (See https://builds.apache.org/job/Phoenix-master/1640/)
          PHOENIX-3744 Support snapshot scanners for MR-based Non-aggregate (jamestaylor: rev faadb544857845d9520e34359c6db341ade1fe02)

          • (edit) phoenix-core/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java
          • (edit) phoenix-core/src/main/java/org/apache/phoenix/iterate/MapReduceParallelScanGrouper.java
          • (edit) phoenix-core/src/main/java/org/apache/phoenix/coprocessor/BaseScannerRegionObserver.java
          • (edit) phoenix-core/src/main/java/org/apache/phoenix/mapreduce/util/PhoenixConfigurationUtil.java
          • (add) phoenix-core/src/main/java/org/apache/phoenix/iterate/TableSnapshotResultIterator.java
          • (edit) phoenix-core/src/main/java/org/apache/phoenix/coprocessor/ScanRegionObserver.java
          • (edit) phoenix-core/src/main/java/org/apache/phoenix/iterate/ParallelScanGrouper.java
          • (edit) phoenix-core/src/main/java/org/apache/phoenix/mapreduce/PhoenixRecordReader.java
          • (edit) phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelScanGrouper.java
          • (add) phoenix-core/src/main/java/org/apache/phoenix/iterate/SnapshotScanner.java
          • (edit) phoenix-core/src/main/java/org/apache/phoenix/mapreduce/util/PhoenixMapReduceUtil.java
          • (add) phoenix-core/src/main/java/org/apache/phoenix/iterate/RegionScannerFactory.java
          • (add) phoenix-core/src/it/java/org/apache/phoenix/end2end/TableSnapshotReadsMapReduceIT.java
          • (add) phoenix-core/src/main/java/org/apache/phoenix/iterate/NonAggregateRegionScannerFactory.java
          • (edit) phoenix-core/src/main/java/org/apache/phoenix/util/IndexUtil.java
          githubbot ASF GitHub Bot added a comment -

          GitHub user akshita-malhotra opened a pull request:

          https://github.com/apache/phoenix/pull/255

          PHOENIX-3744 for 4.x-HBase-0.98

          PHOENIX-3744 patch for 4.x-HBase-0.98 branch

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/akshita-malhotra/phoenix PHOENIX-3744-4.x

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/phoenix/pull/255.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #255


          commit 718dfb2b11c3b57ae1dc94b79d15ada516bba4a9
          Author: Akshita <akshita.malhotra@salesforce.com>
          Date: 2017-06-05T23:49:08Z

          PHOENIX-3744 for 4.x-HBase-0.98


          jamestaylor James Taylor added a comment -

          Pushed to 4.x and master branches. Thanks for the contribution, Akshita Malhotra!


            People

            • Assignee:
              akshita.malhotra Akshita Malhotra
              Reporter:
              jamestaylor James Taylor
            • Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development