  1. Apache Knox
  2. KNOX-949

WebHDFS proxy replaces %20 encoded spaces in URL with + encoding


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.11.0
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels: None

    Description

      When a file whose name contains spaces (e.g. foo bar.txt) is requested from HDFS through WebHDFS and Knox, Knox rewrites the %20 encoding in the URL sent by the client to + encoding (e.g. foo%20bar.txt -> foo+bar.txt). WebHDFS then looks for a file with literal + characters in its name and returns an HTTP 404, which Knox passes back to the client. Requesting the same file directly from WebHDFS works. Example:

      Client request

      curl "https://<hostname>:18443/gateway/<cluster>/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" \
           -u <username>:<password> -k -s
      

      Knox response body

      {"exception":"FileNotFoundException",
       "javaClassName":"java.io.FileNotFoundException",
       "message":"File /docs/filename+with+spaces.pdf not found."}
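
The file name in the error message shows the difference between two encoding conventions. As a minimal sketch (using Python's standard `urllib.parse` as a stand-in; it illustrates the conventions only and is not a claim about which code path inside Knox performs the rewrite), path-style encoding turns a space into `%20`, while form-style encoding turns it into `+`, and a path decoder leaves `+` untouched:

```python
from urllib.parse import quote, quote_plus, unquote

name = "filename with spaces.pdf"

# Path-style percent-encoding: a space becomes %20.
path_encoded = quote(name)
# Form-style (application/x-www-form-urlencoded) encoding: a space becomes '+'.
form_encoded = quote_plus(name)

# Percent-decoding a URL path treats '+' as a literal plus sign,
# so only the %20 form round-trips back to the original file name.
decoded_from_path = unquote(path_encoded)
decoded_from_form = unquote(form_encoded)

print(path_encoded)
print(form_encoded)
print(decoded_from_path)
print(decoded_from_form)
```

The last value still contains `+` characters, which matches the name WebHDFS reports as not found.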
      

      Knox logs

      ==> /var/log/hadoop/knox/gateway-audit.log <==
      17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|unavailable|Request method: GET
      17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
      17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Groups: []
      17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authorization|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
      17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|unavailable|Request method: GET
      17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|success|Response status: 404
      17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Response status: 404
      
      ==> /var/log/hadoop/knox/gateway.log <==
      2017-05-24 15:51:05,254 INFO  hadoop.gateway (KnoxLdapRealm.java:getUserDn(691)) - Computed userDn: uid=<username>,cn=users,cn=accounts,dc=<cluster> using dnTemplate for principal: <username>
      2017-05-24 15:51:05,259 INFO  hadoop.gateway (AclsAuthorizationFilter.java:doFilter(85)) - Access Granted: true
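
The rewrite can be spotted mechanically by comparing the `access` and `dispatch` entries in the audit log. The following is a rough sketch, not a Knox tool: the field positions (action at index 8, URI at index 10) are inferred only from the sample lines in this report, and the shortened request ID, the cluster name `c1`, the user `alice` and the host `nn.example.com` are hypothetical stand-ins for the redacted values.

```python
from urllib.parse import urlsplit, unquote

# Sample lines modelled on the audit log in this report; names are hypothetical.
AUDIT_LINES = [
    "17/05/24 15:51:05 ||88ce58ea|audit|WEBHDFS||||access|uri|"
    "/gateway/c1/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN"
    "|unavailable|Request method: GET",
    "17/05/24 15:51:05 ||88ce58ea|audit|WEBHDFS|alice|||dispatch|uri|"
    "http://nn.example.com:50070/webhdfs/v1/docs/filename+with+spaces.pdf"
    "?op=OPEN&doAs=alice|unavailable|Request method: GET",
]


def parse_audit_line(line):
    """Return (action, uri) from one pipe-delimited audit line.

    Assumes the layout seen in the sample: action at index 8, URI at index 10.
    """
    fields = line.split("|")
    return fields[8], fields[10]


def hdfs_path(uri):
    """Percent-decode the URI path and return the part after '/webhdfs/v1'."""
    path = unquote(urlsplit(uri).path)
    return path.split("/webhdfs/v1", 1)[1]


paths = {action: hdfs_path(uri) for action, uri in map(parse_audit_line, AUDIT_LINES)}
rewritten = paths["access"] != paths["dispatch"]
print(paths["access"], "->", paths["dispatch"],
      "(rewritten)" if rewritten else "(unchanged)")
```

A mismatch between the two decoded paths indicates that the gateway changed the file name before dispatching the request.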
      

      Direct WebHDFS request for the same file

      # curl -si -u: "http://<namenode>:50070/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" --negotiate -L | head -n40
      HTTP/1.1 401 Authentication required
      Cache-Control: must-revalidate,no-cache,no-store
      Date: Wed, 24 May 2017 19:01:41 GMT
      Pragma: no-cache
      Date: Wed, 24 May 2017 19:01:41 GMT
      Pragma: no-cache
      X-FRAME-OPTIONS: SAMEORIGIN
      WWW-Authenticate: Negotiate
      Set-Cookie: hadoop.auth=; Path=/; HttpOnly
      Content-Type: text/html; charset=iso-8859-1
      Content-Length: 1533
      Server: Jetty(6.1.26.hwx)
      
      HTTP/1.1 307 TEMPORARY_REDIRECT
      Cache-Control: no-cache
      Expires: Wed, 24 May 2017 19:01:42 GMT
      Date: Wed, 24 May 2017 19:01:42 GMT
      Pragma: no-cache
      Expires: Wed, 24 May 2017 19:01:42 GMT
      Date: Wed, 24 May 2017 19:01:42 GMT
      Pragma: no-cache
      X-FRAME-OPTIONS: SAMEORIGIN
      WWW-Authenticate: Negotiate YGkGCSqGSIb3EgECAgIAb1owWKADAgEFoQMCAQ+iTDBKoAMCARKiQwRBQM/auuLcl2xey6wMp6EjCPJFSqK3snscxMzW7RvfgxOo7182GzD5N9jf+OWGr+tjpvlRX0c/7iTBfYKSetf4ekU=
      Set-Cookie: hadoop.auth="u=admin&p=admin@CYSAFA&t=kerberos&e=1495688502002&s=b7p35TgaxItAUTkKJuSXuynoq9E="; Path=/; HttpOnly
      Content-Type: application/octet-stream
      Location: http://<datanode3>:1022/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN&delegation=HgAFYWRtaW4FYWRtaW4AigFcO9YJ8ooBXF_ijfJFAxSBYFUnsXY3up11ZNIi4hIi__5RvRJXRUJIREZTIGRlbGVnYXRpb24PMTcyLjE4LjAuOTo4MDIw&namenoderpcaddress=<namenode>:8020&offset=0
      Content-Length: 0
      Server: Jetty(6.1.26.hwx)
      
      HTTP/1.1 200 OK
      Access-Control-Allow-Methods: GET
      Access-Control-Allow-Origin: *
      Content-Type: application/octet-stream
      Connection: close
      Content-Length: 13365618
      
      %����1.6
      <</Filter/FlateDecode/First 157/Length 5350/N 16/Type/ObjStm>>stream
      ...
      


            People

              Assignee: Larry McCay (lmccay)
              Reporter: Alex Willmer (willmerae)
              Votes: 1
              Watchers: 5