IMPALA-9896

OutOfMemoryError: Requested array size exceeds VM limit when LocalCatalog is enabled


Details

    Description

      With LocalCatalog enabled, the catalog server logs OutOfMemoryError: Requested array size exceeds VM limit when a very large table is requested.

      The table in question has:

      101 columns, 785243 partitions, 5729866 files.

      I0626 20:59:04.029678 3392438 jni-util.cc:256] java.lang.OutOfMemoryError: Requested array size exceeds VM limit
      I0626 20:59:04.030231 3392438 status.cc:124] OutOfMemoryError: Requested array size exceeds VM limit
          @           0xb35f19
          @          0x113112e
          @           0xb23b87
          @           0xb0e339
          @           0xc15a52
          @           0xc09e4c
          @           0xb01de9
          @           0xf159e8
          @           0xf0cd7e
          @           0xf0dc11
          @          0x11a1e3f
          @          0x11a29e9
          @          0x1790be9
          @     0x7f55188a2e24
          @     0x7f55185cf35c
      E0626 20:59:04.030258 3392438 catalog-server.cc:176] OutOfMemoryError: Requested array size exceeds VM limit
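
A rough calculation suggests why a table of this size can hit the limit. The sketch below assumes, which the report does not confirm, that the response for the table is serialized into a single Java byte array. Java arrays are indexed by int, so their length cannot exceed Integer.MAX_VALUE. The class and method names are hypothetical and exist only for this estimate.

```java
// Rough estimate only. Assumes (not confirmed by the report) that the catalog
// response for the table ends up in one Java byte[].
class ResponseSizeEstimate {
  // Upper bound on the length of any Java array; the practical JVM limit is
  // typically a few elements lower.
  static final long MAX_ARRAY_LENGTH = Integer.MAX_VALUE;

  // Largest average number of serialized bytes per file that still fits
  // in a single array of MAX_ARRAY_LENGTH bytes.
  static long maxBytesPerFile(long fileCount) {
    return MAX_ARRAY_LENGTH / fileCount;
  }

  public static void main(String[] args) {
    long files = 5_729_866L; // file count taken from the report
    System.out.println("max bytes per file: " + maxBytesPerFile(files));
  }
}
```

Under that assumption, the budget is about 374 bytes of serialized metadata per file before a single array can no longer hold the response.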
      

      The error is logged at catalog-server.cc:176, in the following code:

      void GetPartialCatalogObject(TGetPartialCatalogObjectResponse& resp,
          const TGetPartialCatalogObjectRequest& req) override {
        // TODO(todd): capture detailed metrics on the types of inbound requests, lock
        // wait times, etc.
        // TODO(todd): add some kind of limit on the number of concurrent requests here
        // to avoid thread exhaustion -- eg perhaps it would be best to use a trylock
        // on the catalog locks, or defer these calls to a separate (bounded) queue,
        // so a heavy query workload against a table undergoing a slow refresh doesn't
        // end up taking down the catalog by creating thousands of threads.
        VLOG_RPC << "GetPartialCatalogObject(): request=" << ThriftDebugString(req);
        Status status = catalog_server_->catalog()->GetPartialCatalogObject(req, &resp);
        if (!status.ok()) LOG(ERROR) << status.GetDetail();  // catalog-server.cc:176
        TStatus thrift_status;
        status.ToThrift(&thrift_status);
        resp.__set_status(thrift_status);
        VLOG_RPC << "GetPartialCatalogObject(): response=" << ThriftDebugString(resp);
      }
      

      Related issue: https://issues.apache.org/jira/browse/IMPALA-7436

      The following code still loads all partitions:

      // org.apache.impala.catalog.local.LocalFsTable
      @Override
      public long getTotalHdfsBytes() {
        // TODO(todd): this is slow because it requires loading all partitions. Remove if possible.
        long size = 0;
        for (FeFsPartition p : loadPartitions(getPartitionIds())) {
          size += p.getSize();
        }
        return size;
      }
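
One way to avoid requesting every partition in a single call is to load partition IDs in fixed-size batches and accumulate the sizes. The sketch below is illustrative only and is not the fix adopted for this issue. The class name, the totalBytes method, and the loadSizes callback (a stand-in for a call that loads a batch of partitions and returns their sizes in bytes) are all hypothetical and not part of Impala's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class BatchedTotalBytes {
  // Sums partition sizes by loading at most batchSize partitions per call.
  // loadSizes is a hypothetical stand-in for a batch partition loader that
  // returns one size (in bytes) per requested partition ID.
  static long totalBytes(List<Long> partitionIds, int batchSize,
      Function<List<Long>, List<Long>> loadSizes) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    long total = 0;
    for (int start = 0; start < partitionIds.size(); start += batchSize) {
      int end = Math.min(start + batchSize, partitionIds.size());
      // Only this slice of partitions is requested at once.
      for (long size : loadSizes.apply(partitionIds.subList(start, end))) {
        total += size;
      }
    }
    return total;
  }

  public static void main(String[] args) {
    List<Long> ids = new ArrayList<>();
    for (long i = 0; i < 10; i++) ids.add(i);
    // Stub loader: pretend every partition holds 100 bytes.
    long total = totalBytes(ids, 3, batch -> {
      List<Long> sizes = new ArrayList<>();
      for (Long id : batch) sizes.add(100L);
      return sizes;
    });
    System.out.println(total);
  }
}
```

Batching bounds the size of each individual request, but it still visits every partition, so it does not address the slowness noted in the TODO.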
      

            People

              guojingfeng guojingfeng
              abeltian abeltian