Apache Arrow / ARROW-16090

Unable to connect to flight server in container using self-signed certificate


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Not A Problem
    • Affects Version/s: 7.0.0
    • Fix Version/s: 7.0.0
    • Component/s: FlightRPC, Python
    • Labels: None

    Description

      Hi,

      I'm trying to build a Python Arrow Flight server in a Docker container. The rationale for moving it into a container is to isolate components of my program, so that if an exception or performance issue causes something to gobble all the memory, I can quickly kill the container without bringing down the entire program.

      I'm having problems connecting a local Python client script to the server in the container. I'm not sure whether it's a certificate issue, a gRPC issue, or an Arrow server configuration issue. I've broken down what I've done below. Any help would be appreciated.

      1. Grabbed the Arrow Python server from the GitHub repo.
      2. Since I want to implement secure communication, I need a certificate; a self-signed one should be fine for development. I generated a development certificate using dotnet dev-certs. After trusting the certificate, I exported it from cmd on Windows.
      dotnet dev-certs https --trust
      dotnet dev-certs https -ep "test.pfx" -p testpassword

      1. My understanding is that the Arrow server only accepts .crt and .key files for the certificate and private key. I used WSL and OpenSSL to convert the .pfx file, following this article from IBM.
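
      One way to sanity-check the converted files is to confirm that each one actually contains the PEM block you expect. The following is a minimal sketch using only the standard library. The helper name pem_block_labels is made up for illustration, and it only inspects PEM header lines; it does not validate the certificate or the key.

```python
import re

# Matches PEM header lines such as "-----BEGIN CERTIFICATE-----".
_PEM_HEADER = re.compile(rb"-----BEGIN ([A-Z0-9 ]+)-----")


def pem_block_labels(data: bytes) -> list[str]:
    """Return the labels of all PEM blocks found in data, in file order."""
    return [m.group(1).decode("ascii") for m in _PEM_HEADER.finditer(data)]
```

      Reading testPublicKey.crt in binary mode and passing the bytes to this helper should yield a list containing "CERTIFICATE" if the file is PEM-encoded. For the key file, a label of "ENCRYPTED PRIVATE KEY" would suggest the key is still passphrase-protected, and an empty list would suggest the file is not PEM at all (for example, still binary DER or PKCS#12).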

      2. I placed the certificate and private key in the same folder as my server script and adjusted the code as follows, so that nothing needs to be passed in via args.

      scheme = "grpc+tls"

      with open("testPublicKey.crt", "rb") as cert_file:
          tls_cert_chain = cert_file.read()
      with open("testPrivateKey.key", "rb") as key_file:
          tls_private_key = key_file.read()

      tls_certificates.append((tls_cert_chain, tls_private_key))
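
      For reference, the same idea can be written as a small standalone sketch. It assumes the certificate and key are PEM files on disk. The helper names load_tls_pair and make_tls_server are hypothetical, and the server built here is a bare pyarrow.flight.FlightServerBase with no handlers, not the example server from the repo.

```python
from pathlib import Path


def load_tls_pair(cert_path: str, key_path: str) -> tuple[bytes, bytes]:
    """Read a PEM certificate chain and its private key as raw bytes."""
    return Path(cert_path).read_bytes(), Path(key_path).read_bytes()


def make_tls_server(host: str, port: int, cert_path: str, key_path: str):
    """Build a bare Flight server that serves over TLS (illustrative only)."""
    # Imported here so load_tls_pair can be used without pyarrow installed.
    import pyarrow.flight as flight

    location = f"grpc+tls://{host}:{port}"
    # tls_certificates expects a list of (certificate chain, private key) pairs.
    return flight.FlightServerBase(
        location,
        tls_certificates=[load_tls_pair(cert_path, key_path)],
    )
```

      With the files from this report, make_tls_server("localhost", 5005, "testPublicKey.crt", "testPrivateKey.key").serve() would start such a server and block until it is shut down.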

      My client code is a slimmed-down version of the one in the repo. As a test, I want to push some dummy data to the server.

      import pyarrow
      import pyarrow.flight
      import pandas as pd

      # Assumes that data is a DataFrame
      def pushToServer(name, data, client):
          objectToSend = pyarrow.Table.from_pandas(data)
          writer, _ = client.do_put(pyarrow.flight.FlightDescriptor.for_path(name), objectToSend.schema)
          writer.write_table(objectToSend)
          writer.close()

      def getClient():
          return pyarrow.flight.FlightClient("grpc+tcp://localhost:5005")

      def main():
          client = getClient()
          data = {'Country': ['Belgium', 'India', 'Brazil'],
                  'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
                  'Population': [11190846, 1303171035, 207847528]}
          df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])
          pushToServer("PredictedValues", df, client)

      if __name__ == '__main__':
          try:
              main()
          except Exception as e:
              print(e)
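
      One thing that may be worth checking: the client above connects with the grpc+tcp scheme, while the server is configured for grpc+tls. The sketch below shows what a TLS client could look like, assuming the server certificate is self-signed and is therefore passed to the client as its own trusted root. The helper names flight_uri and make_tls_client are hypothetical, and whether this mismatch is the actual cause of the error is not confirmed here.

```python
def flight_uri(host: str, port: int, tls: bool) -> str:
    """Build a Flight location URI with the scheme matching the transport."""
    scheme = "grpc+tls" if tls else "grpc+tcp"
    return f"{scheme}://{host}:{port}"


def make_tls_client(host: str, port: int, root_cert_path: str):
    """Create a Flight client that trusts the given PEM certificate."""
    # Imported here so flight_uri can be used without pyarrow installed.
    import pyarrow.flight as flight

    with open(root_cert_path, "rb") as cert_file:
        root_certs = cert_file.read()
    # tls_root_certs takes PEM-encoded bytes; for a self-signed server
    # certificate, the certificate itself acts as the trusted root.
    return flight.FlightClient(
        flight_uri(host, port, tls=True),
        tls_root_certs=root_certs,
    )
```

      In the client above, getClient() could then return make_tls_client("localhost", 5005, "testPublicKey.crt"). The host name used in the URI also has to match a name in the certificate, which for a dotnet dev-certs certificate is normally localhost.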

      3. Running this on my local machine works fine. Now I want to move the server into a container. I set up the Dockerfile in the same folder as the server script; see below (I know it's not ideal, but it does the job).

      FROM mcr.microsoft.com/dotnet/sdk
      EXPOSE 5005
      COPY server.py /home

      Build the image and run the container as below:

      docker build -t test .
      docker run -it -p 5005:5005 test

      4. In the container, I quickly install Python and pyarrow and then start the server.

      apt-get update
      apt-get install python3.10 python3-pip
      pip install pyarrow
      # start the server
      cd home
      python3 server.py
      # responds with "Serving on grpc+tls://localhost:5005"
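
      Note that the server reports it is serving on localhost. Inside a container, a server bound to localhost listens only on the container's own loopback interface, so a port published with -p is generally not reachable from the host; binding to 0.0.0.0 is the usual approach. The sketch below flags a location string that would bind to loopback only. The helper name binds_loopback_only is made up for illustration, and whether this applies here depends on how server.py builds its location.

```python
import ipaddress
from urllib.parse import urlsplit


def binds_loopback_only(location: str) -> bool:
    """Return True if the host part of a location URI is a loopback address."""
    host = urlsplit(location).hostname or ""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (some other host name), so not known to be loopback.
        return False
```

      For the banner above, binds_loopback_only("grpc+tls://localhost:5005") returns True, whereas a location such as "grpc+tls://0.0.0.0:5005" returns False.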

      5. Since the ports were mapped when I started the container, I rerun the client on my local machine and am greeted with this error on the client side:

      gRPC returned unavailable error, with message: failed to connect to all addresses. Client context: IOError: Could not write record batch to stream. Detail: Internal. gRPC client debug context: {"created":"@1648805430.279000000","description":"Failed to pick subchannel","file":"C:\vcpkg\buildtrees\grpc\src\2180080eb4-87c05d756b.clean\src\core\ext\filters\client_channel\client_channel.cc","file_line":3159,"referenced_errors":[{"created":"@1648805430.279000000","description":"failed to connect to all addresses","file":"C:\vcpkg\buildtrees\grpc\src\2180080eb4-87c05d756b.clean\src\core\lib\transport\error_utils.cc","file_line":147,"grpc_status":14}]}. Additionally, could not finish writing record batches before closing 
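
      Because "failed to connect to all addresses" is a transport-level failure, it can help to separate plain TCP reachability from TLS and Flight configuration. Below is a minimal probe using only the standard library; the function name can_connect is hypothetical.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a plain TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

      Running can_connect("localhost", 5005) on the host while the container is up gives a first hint. A False result points at networking rather than certificates. A True result is weaker evidence, since a published Docker port may accept the connection on the host side even when nothing inside the container is listening on a reachable interface.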

      Putting a try/except on the server side doesn't provide any more info, unfortunately.

      I've already ruled out a dodgy certificate: I used the same certificate to set up a basic C# Kestrel server in a container using HTTPS. I've also tried the above with a C# server and hit the same issue.

      Is there anything obvious I'm missing in the config? I haven't found any examples of people using certificates with pyarrow, so I'm a bit at a loss.

       

       


            People

              Assignee: Unassigned
              Reporter: Chris Dunderdale (ThatStatsGuy)