Details

    • Type: Sub-task Sub-task
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: HADOOP-10388
    • Fix Version/s: HADOOP-10388
    • Component/s: native
    • Labels:
      None

      Description

      Implement listStatus and getFileInfo in the native client.

      1. HADOOP-10725-pnative.001.patch
        20 kB
        Colin Patrick McCabe
      2. HADOOP-10725-pnative.002.patch
        58 kB
        Colin Patrick McCabe
      3. HADOOP-10725-pnative.003.patch
        57 kB
        Colin Patrick McCabe
      4. HADOOP-10725-pnative.004.patch
        10 kB
        Colin Patrick McCabe

        Activity

        Hide
        Colin Patrick McCabe added a comment -

        rebase on previous changes, clean up URI stuff a bit

        Show
        Colin Patrick McCabe added a comment - rebase on previous changes, clean up URI stuff a bit
        Hide
        Abraham Elmahrek added a comment -
        1. (hadoop-native-core/src/main/native/fs/fs.c) Pull this out into a separate function? Seems like an operation that will have to be done frequently.
          +    if (hdfs_bld->nn) {
          +        if ((!index(hdfs_bld->nn, '/')) && (index(hdfs_bld->nn, ':'))) {
          +            // If the connection URI was set via hdfsBuilderSetNameNode, it may
          +            // not be a real URI, but just a <hostname>:<port> pair.  This won't
          +            // parse correctly unless we add a hdfs:// scheme in front of it.
          +            err = dynprintf(&uri_str, "hdfs://%s", hdfs_bld->nn);
          +            if (err)
          +                goto done;
          +        } else {
          +            uri_str = strdup(hdfs_bld->nn);
          +            if (!uri_str) {
          +                err = hadoop_lerr_alloc(ENOMEM, "hdfs_builder_parse_conn_uri: OOM");
                           goto done;
                       }    
          -            uri_str = malloced_uri_str;
          
        2. (hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c) Should precedence be given to the explicitly defined "port" member or the pre-existing port in the URI? It seems like an explicit definition in the builder should take precedence?
          -        if (lastColon && (strspn(lastColon + 1, "0123456789") ==                                                                                                                                                                                                                                                                                                                                                                     
          -                          strlen(lastColon + 1))) {
          -            fprintf(stderr, "port %d was given, but URI '%s' already "
          -                "contains a port!\n", bld->port, bld->nn);
          -            return EINVAL;
          +        if (!(lastColon && (strspn(lastColon + 1, "0123456789") ==
          +                          strlen(lastColon + 1)))) {
          +            // If the URI doesn't include a port, add bld->port.
          +            // Note that bld->port is ignored if the URI already has a port.
          +            snprintf(suffix, sizeof(suffix), ":%d", bld->port);
          
        3. (hadoop-native-core/src/main/native/ndfs/ndfs.c) Is this how the previous HDFS clients worked? Using the previous seen filename won't work if the file has been removed. Just curious...
          +        if (entries_len > 0) {
          +            prev = entries[entries_len - 1].mName;
          +        } else {
          +            prev = "";
          +        }
          
        4. (hadoop-native-core/src/main/native/jni/jnifs.c) This code segment appears to be exactly the same as hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c. Maybe a utility function would be useful?
          -        if (lastColon && (strspn(lastColon + 1, "0123456789") ==                                                                                                                                                                                                                                                                                                                                                                     
          -                          strlen(lastColon + 1))) {
          -            fprintf(stderr, "port %d was given, but URI '%s' already "
          -                "contains a port!\n", bld->port, bld->nn);
          -            return EINVAL;
          +        if (!(lastColon && (strspn(lastColon + 1, "0123456789") ==
          +                          strlen(lastColon + 1)))) {
          +            // If the URI doesn't include a port, add bld->port.
          +            // Note that bld->port is ignored if the URI already has a port.
          +            snprintf(suffix, sizeof(suffix), ":%d", bld->port);
          

        All in all good stuff! Make sure to create follow up Jira's. ie: Block location is left out in 'ndfs_list_partial'.

        Show
        Abraham Elmahrek added a comment - (hadoop-native-core/src/main/native/fs/fs.c) Pull this out into a separate function? Seems like an operation that will have to be done frequently. + if (hdfs_bld->nn) { + if ((!index(hdfs_bld->nn, '/')) && (index(hdfs_bld->nn, ':'))) { + // If the connection URI was set via hdfsBuilderSetNameNode, it may + // not be a real URI, but just a <hostname>:<port> pair. This won't + // parse correctly unless we add a hdfs:// scheme in front of it. + err = dynprintf(&uri_str, "hdfs: //%s" , hdfs_bld->nn); + if (err) + goto done; + } else { + uri_str = strdup(hdfs_bld->nn); + if (!uri_str) { + err = hadoop_lerr_alloc(ENOMEM, "hdfs_builder_parse_conn_uri: OOM" ); goto done; } - uri_str = malloced_uri_str; (hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c) Should precedence be given to the explicitly defined "port" member or the pre-existing port in the URI? It seems like an explicit definition in the builder should take precedence? - if (lastColon && (strspn(lastColon + 1, "0123456789" ) == - strlen(lastColon + 1))) { - fprintf(stderr, "port %d was given, but URI '%s' already " - "contains a port!\n" , bld->port, bld->nn); - return EINVAL; + if (!(lastColon && (strspn(lastColon + 1, "0123456789" ) == + strlen(lastColon + 1)))) { + // If the URI doesn't include a port, add bld->port. + // Note that bld->port is ignored if the URI already has a port. + snprintf(suffix, sizeof(suffix), ":%d" , bld->port); (hadoop-native-core/src/main/native/ndfs/ndfs.c) Is this how the previous HDFS clients worked? Using the previous seen filename won't work if the file has been removed. Just curious... + if (entries_len > 0) { + prev = entries[entries_len - 1].mName; + } else { + prev = ""; + } (hadoop-native-core/src/main/native/jni/jnifs.c) This code segment appears to be exactly the same as hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c. Maybe a utility function would be useful? - if (lastColon && (strspn(lastColon + 1, "0123456789" ) == - strlen(lastColon + 1))) { - fprintf(stderr, "port %d was given, but URI '%s' already " - "contains a port!\n" , bld->port, bld->nn); - return EINVAL; + if (!(lastColon && (strspn(lastColon + 1, "0123456789" ) == + strlen(lastColon + 1)))) { + // If the URI doesn't include a port, add bld->port. + // Note that bld->port is ignored if the URI already has a port. + snprintf(suffix, sizeof(suffix), ":%d" , bld->port); All in all good stuff! Make sure to create follow up Jira's. ie: Block location is left out in 'ndfs_list_partial'.
        Hide
        Colin Patrick McCabe added a comment -

        (hadoop-native-core/src/main/native/fs/fs.c) Pull this out into a separate function? Seems like an operation that will have to be done frequently.

        This is a bit of a special case just for the connection URI. I guess the issue is that you have people connecting with stuff like "localhost:8020", which isn't technically a well-formed URI, but which we sort of have to handle (by looking at it as authority=localhost, port=8020). On the other hand, when someone gives you a path that looks like "myfile:123", you just want to parse it with the standard URI parsing code. We might need more massaging for files with colons in them later, but it's a bit of a grey area (see HDFS-13) so I'd like to avoid dealing with it for now. For now, I'd like to keep this hack for the connection uri, but not for others.

        (hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c) Should precedence be given to the explicitly defined "port" member or the pre-existing port in the URI? It seems like an explicit definition in the builder should take precedence?

        So there are three options:
        1. fail with error message (current behavior in trunk)
        2. hdfsBuilderSetNameNodePort wins if set
        3. URI port wins hdfsBuilderSetNameNodePort if set

        #2 is hard to implement for jniFS. If you're given a URI such as hdfs://server:123/foo/bar, you'd have to replace 123 with whatever port you liked through string operations, prior to sending along the URI to the java code.

        I wish we had never added hdfsBuilderSetNameNodePort... it's definitely superfluous, since the port can be in the URI. Maybe we should just stick with option #1 for now and error out when there is a conflict.

        (hadoop-native-core/src/main/native/ndfs/ndfs.c) Is this how the previous HDFS clients worked? Using the previous seen filename won't work if the file has been removed. Just curious...

        Yes, this is how the Java code works. I don't think there's an issue with the previous filename getting removed, either. Doing a listStatus with a filename just means that you want filenames that sort after that filename, not that you necessarily think there is such a filename.

        (hadoop-native-core/src/main/native/jni/jnifs.c) This code segment appears to be exactly the same as hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c. Maybe a utility function would be useful?

        The src/main/native/libhdfs directory is going away, to be replaced by the jnifs/ directory. I haven't done that yet, but it's just an svn delete, not a very interesting patch.

        Show
        Colin Patrick McCabe added a comment - (hadoop-native-core/src/main/native/fs/fs.c) Pull this out into a separate function? Seems like an operation that will have to be done frequently. This is a bit of a special case just for the connection URI. I guess the issue is that you have people connecting with stuff like "localhost:8020", which isn't technically a well-formed URI, but which we sort of have to handle (by looking at it as authority=localhost, port=8020). On the other hand, when someone gives you a path that looks like "myfile:123", you just want to parse it with the standard URI parsing code. We might need more massaging for files with colons in them later, but it's a bit of a grey area (see HDFS-13 ) so I'd like to avoid dealing with it for now. For now, I'd like to keep this hack for the connection uri, but not for others. (hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c) Should precedence be given to the explicitly defined "port" member or the pre-existing port in the URI? It seems like an explicit definition in the builder should take precedence? So there are three options: 1. fail with error message (current behavior in trunk) 2. hdfsBuilderSetNameNodePort wins if set 3. URI port wins hdfsBuilderSetNameNodePort if set #2 is hard to implement for jniFS. If you're given a URI such as hdfs://server:123/foo/bar, you'd have to replace 123 with whatever port you liked through string operations, prior to sending along the URI to the java code. I wish we had never added hdfsBuilderSetNameNodePort ... it's definitely superfluous, since the port can be in the URI. Maybe we should just stick with option #1 for now and error out when there is a conflict. (hadoop-native-core/src/main/native/ndfs/ndfs.c) Is this how the previous HDFS clients worked? Using the previous seen filename won't work if the file has been removed. Just curious... Yes, this is how the Java code works. I don't think there's an issue with the previous filename getting removed, either. Doing a listStatus with a filename just means that you want filenames that sort after that filename, not that you necessarily think there is such a filename. (hadoop-native-core/src/main/native/jni/jnifs.c) This code segment appears to be exactly the same as hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c. Maybe a utility function would be useful? The src/main/native/libhdfs directory is going away, to be replaced by the jnifs/ directory. I haven't done that yet, but it's just an svn delete, not a very interesting patch.
        Hide
        Colin Patrick McCabe added a comment -

        new patch with update:

        • Don't change the existing behavior of bailing out with an error if the port specified with hdfsBuilderSetNameNodePort differs from the URI port
        Show
        Colin Patrick McCabe added a comment - new patch with update: Don't change the existing behavior of bailing out with an error if the port specified with hdfsBuilderSetNameNodePort differs from the URI port
        Hide
        Colin Patrick McCabe added a comment -

        I'm going to split out the umask and uri fixes into separate patches to make this easier to review

        Show
        Colin Patrick McCabe added a comment - I'm going to split out the umask and uri fixes into separate patches to make this easier to review
        Hide
        Colin Patrick McCabe added a comment -

        This version should be a lot easier to read, since I split the URL, conf, and other changes into separate JIRAs.

        Show
        Colin Patrick McCabe added a comment - This version should be a lot easier to read, since I split the URL, conf, and other changes into separate JIRAs.
        Hide
        Abraham Elmahrek added a comment -

        +1... Good stuff dude.

        Show
        Abraham Elmahrek added a comment - +1... Good stuff dude.

          People

          • Assignee:
            Colin Patrick McCabe
            Reporter:
            Colin Patrick McCabe
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development