Apache Arrow / ARROW-11427

[C++] Arrow uses AVX512 instructions even when not supported by the OS


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0
    • Component/s: C++, Python
    • Environment: Windows Server 2012 Datacenter, Azure VM (D2_v2), Intel Xeon Platinum 8171M

    Description

      Update: Azure D2_v2 VMs no longer spin up with a Xeon Platinum 8171M, so I'm unable to test it with other OSes. Azure VMs are assigned different CPUs of the same "class" depending on availability. I will try my "luck" later.

      VMs with a Xeon Platinum 8171M running on Azure (D2_v2) start crashing after upgrading from pyarrow 2.0 to pyarrow 3.0. However, this only happens when reading parquet files larger than 4096 bits!?

      Windows terminates Python with exit code 255 and logs the following application error:

      Faulting application name: python.exe, version: 3.8.3150.1013, time stamp: 0x5ebc7702
      Faulting module name: arrow.dll, version: 0.0.0.0, time stamp: 0x60060ce3
      Exception code: 0xc000001d
      Fault offset: 0x000000000047aadc
      Faulting process id: 0x1b10
      Faulting application start time: 0x01d6f4a43dca3c14
      Faulting application path: D:\SvcFab\_App\SomeApp.FabricType_App32\SomeApp.Fabric.Executor.ProcessActorPkg.Code.1.0.218-prod\Python38\python.exe
      Faulting module path: D:\SvcFab\_App\SomeApp.FabricType_App32\temp\Executions\50cfffe8-9250-4ac7-8ba8-08d8c2bb3edf\.venv\lib\site-packages\pyarrow\arrow.dll

      Exception code 0xc000001d is STATUS_ILLEGAL_INSTRUCTION, consistent with the process executing an instruction (here, AVX-512) that the OS/CPU combination does not support.

      Tested on:

      OS                             | Xeon Platinum 8171M or 8272CL | Other CPUs
      Windows Server 2012 Datacenter | Fail                          | OK
      Windows Server 2016 Datacenter | OK                            | OK
      Windows Server 2019 Datacenter |                               |
      Windows 10                     |                               | OK
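      As the issue title says, the CPU supporting AVX-512 is not enough: the OS must also enable AVX-512 state saving, and older Windows releases (such as Server 2012) do not. On Windows this can be probed with the Win32 call `IsProcessorFeaturePresent`. A minimal sketch; the constant value `41` for `PF_AVX512F_INSTRUCTIONS_AVAILABLE` comes from winnt.h, and the non-Windows branch is only a placeholder:

      ```python
      import ctypes
      import sys

      # Win32 constant from winnt.h (assumed value; verify against your SDK).
      PF_AVX512F_INSTRUCTIONS_AVAILABLE = 41

      def os_supports_avx512() -> bool:
          """Return True if the OS reports AVX-512 as usable."""
          if sys.platform != "win32":
              # Placeholder for non-Windows platforms; a real check would
              # read XCR0 via CPUID/XGETBV instead.
              return False
          return bool(ctypes.windll.kernel32.IsProcessorFeaturePresent(
              PF_AVX512F_INSTRUCTIONS_AVAILABLE))

      print(os_supports_avx512())
      ```

      On a Windows Server 2012 VM this would be expected to return False even on an AVX-512-capable Xeon, matching the Fail column above.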

       

      Example code (Python): 

      import numpy as np
      import pandas as pd
      
      data_len = 2**5
      data = pd.DataFrame(
          {"values": np.arange(0., float(data_len), dtype=float)},
          index=np.arange(0, data_len, dtype=int)
      )
      
      data.to_parquet("test.parquet")
      data = pd.read_parquet("test.parquet", engine="pyarrow")  # fails here!
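
      As a possible stop-gap on affected machines, Arrow's C++ runtime reportedly consults an `ARROW_USER_SIMD_LEVEL` environment variable to cap the runtime-dispatched SIMD level; treat the variable name and accepted values as assumptions for your exact build, and note it must be set before pyarrow is loaded:

      ```python
      import os

      # Assumed workaround: cap Arrow's runtime SIMD dispatch so AVX-512
      # code paths are not selected. Accepted values reportedly include
      # "NONE", "SSE4_2", "AVX2". Must be set BEFORE importing pyarrow.
      os.environ["ARROW_USER_SIMD_LEVEL"] = "AVX2"

      # Only now import the pyarrow-backed reader, e.g.:
      # import pandas as pd
      # data = pd.read_parquet("test.parquet", engine="pyarrow")
      ```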
      


            People

              Assignee: Antoine Pitrou (apitrou)
              Reporter: Ali Cetin (ali.cetin)
              Votes: 0
              Watchers: 5


                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h 50m