Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Clownfish
    • Labels:
      None

      Description

      In order to get compiled extensions working on Windows, we'll have to define the visibility of our extern variables and functions. On Windows, every exported symbol of a DLL has to be marked with __declspec(dllexport) when compiling the DLL. If you're linking against a DLL, every symbol imported from the DLL has to be marked with __declspec(dllimport).

      On UNIX, every symbol is exported by default, so defining visibility is not strictly necessary. But hiding symbols that don't have to be exported has the benefit of reducing the size of a shared library and speeding up its loading. Hidden symbols also allow the compiler to generate more optimized code.

      The standard approach is to compile with the GCC option -fvisibility=hidden, which emulates the Windows behavior. A macro is then defined roughly like this (this should probably be handled by charmonizer):

      #if defined __GNUC__
      #  if defined _WIN32 || defined __CYGWIN__
      #    define CHY_EXPORT __attribute__ ((dllexport))
      #  elif __GNUC__ >= 4
      #    define CHY_EXPORT __attribute__ ((visibility ("default")))
      #  else
      #    define CHY_EXPORT
      #  endif
      #elif defined _MSC_VER
      #  define CHY_EXPORT __declspec(dllexport)
      #else
      #  define CHY_EXPORT
      #endif
      

      This macro can then be used like this:

      CHY_EXPORT void
      exported_function();
      
      extern CHY_EXPORT int exported_variable;
      
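
A minimal sketch of the export/hide distinction described above, assuming GCC >= 4 (or a compatible compiler) on an ELF platform with -fvisibility=hidden in effect. The DEMO_* macros and demo_* functions are hypothetical names for illustration only; they are not part of Clownfish or Charmonizer.

```c
/* Hypothetical macros, modeled on the CHY_EXPORT idea from this issue. */
#if defined(_WIN32) || defined(__CYGWIN__)
#  define DEMO_EXPORT __declspec(dllexport)
#  define DEMO_HIDDEN
#elif defined(__GNUC__) && __GNUC__ >= 4
#  define DEMO_EXPORT __attribute__ ((visibility ("default")))
#  define DEMO_HIDDEN __attribute__ ((visibility ("hidden")))
#else
#  define DEMO_EXPORT
#  define DEMO_HIDDEN
#endif

/* Internal helper: usable from every source file of the library, but
 * not meant to appear in its dynamic symbol table. */
DEMO_HIDDEN int
demo_internal_helper(int x) {
    return x * 2;
}

/* Public entry point: explicitly marked so it stays visible even when
 * the library is compiled with -fvisibility=hidden. */
DEMO_EXPORT int
demo_exported_function(int x) {
    return demo_internal_helper(x) + 1;
}
```

Built into a shared library, only demo_exported_function would be expected to show up in the output of nm -D; the helper remains callable inside the library.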

      When compiling an extension, we also have to handle __declspec(dllimport) on Windows. For the generated headers, we could define CHY_IMPORT and use it instead of CHY_EXPORT for "included" headers. For the code in XSBind.[ch], we could define BUILDING_XSBIND only during compilation of XSBind.c and then use something like this:

      #if defined __GNUC__
      #  if defined _WIN32 || defined __CYGWIN__
      #    if BUILDING_XSBIND
      #      define XSBIND_EXPORT __attribute__ ((dllexport))
      #    else
      #      define XSBIND_EXPORT __attribute__ ((dllimport))
      #    endif
      #  elif __GNUC__ >= 4
      #    define XSBIND_EXPORT __attribute__ ((visibility ("default")))
      #  else
      #    define XSBIND_EXPORT
      #  endif
      #elif defined _MSC_VER
      #  if BUILDING_XSBIND
      #    define XSBIND_EXPORT __declspec(dllexport)
      #  else
      #    define XSBIND_EXPORT __declspec(dllimport)
      #  endif
      #else
      #  define XSBIND_EXPORT
      #endif
      
      XSBIND_EXPORT cfish_Obj*
      cfish_XSBind_new_blank_obj(SV *either_sv);
      

        Activity

        Nick Wellnhofer added a comment -

        Here are some first proof-of-concept patches. I'm not sure if the new Charmonizer module works on anything but GCC >= 4.0 on UNIX. It seems that GCC only warns about unknown attributes, so it might not be so easy to detect whether certain attributes are supported.

        Before:

        $ nm -D blib/arch/auto/Lucy/Lucy.so |wc -l
        12457
        

        After:

        $ nm -D blib/arch/auto/Lucy/Lucy.so |wc -l
        11434
        

        We still have to export all the method symbols because they might be used in the initialization of VTables of extension classes (and possibly other places). We could leverage the recent VTable bootstrap changes and make it possible to pass NULL for methods that are not redefined in a subclass. VTable_bootstrap could then fill in the method from the parent class without needing its symbol.
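
A rough sketch of the NULL-filling idea above. This is not the actual VTable_bootstrap code; the demo_VTable layout, the fixed method count, and all demo_* and S_* names are assumptions made for illustration.

```c
#include <stddef.h>

#define DEMO_NUM_METHODS 2

typedef int (*demo_method_t)(int);

/* Hypothetical, heavily simplified vtable: a parent pointer plus a
 * fixed-size table of method implementations. */
typedef struct demo_VTable {
    const struct demo_VTable *parent;
    demo_method_t methods[DEMO_NUM_METHODS];
} demo_VTable;

static int S_base_first(int x)  { return x + 1; }
static int S_base_second(int x) { return x + 2; }
static int S_sub_second(int x)  { return x * 10; }

/* Fill every NULL slot with the parent's implementation. The parent
 * must already be bootstrapped. The subclass never references the
 * parent's function symbols directly, only the parent vtable. */
static void
demo_VTable_bootstrap(demo_VTable *vtable) {
    size_t i;
    if (vtable->parent == NULL) { return; }
    for (i = 0; i < DEMO_NUM_METHODS; i++) {
        if (vtable->methods[i] == NULL) {
            vtable->methods[i] = vtable->parent->methods[i];
        }
    }
}

/* The subclass overrides only the second method and passes NULL for
 * the first one. */
static demo_VTable DEMO_BASE = { NULL, { S_base_first, S_base_second } };
static demo_VTable DEMO_SUB  = { &DEMO_BASE, { NULL, S_sub_second } };
```

After calling demo_VTable_bootstrap(&DEMO_SUB), the first slot of DEMO_SUB points at the inherited implementation, so an extension would only need the parent's vtable symbol rather than one symbol per inherited method.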

        Marvin Humphrey added a comment -

        > We still have to export all the method symbols because they might be used in
        > the initialization of VTables of extension classes (and possibly other
        > places).

        You mean the implementing functions? We should be able to just copy the
        parent VTable, as VTable_singleton() does when creating a new subclass at
        runtime.

        The only place that we have to worry about symbol import is when the
        implementation is defined in a separate file, as it is for host
        implementations such as Lucy_Doc_Get_Size().

        Say that we autogenerate VTable bootstrapping code to look for functions
        beginning with "S_" (because the implementations are static functions). For
        methods where the implementation must be elsewhere, we can have the static
        function wrap an ordinary (implicitly extern) function:

            // core/Lucy/Document/Doc.c
        
            static uint32_t
            S_Doc_get_size(Doc *self) {
                return lucy_Doc_get_size_IMPL(self);
            }
        
            // perl/xs/Lucy/Document/Doc.c
        
            uint32_t
            lucy_Doc_get_size_IMPL(lucy_Doc *self) {
                return self->fields ? HvKEYS((HV*)self->fields) : 0;
            }
        
        Marvin Humphrey added a comment -

        The patches look like a good start to me, +1 to commit!

        Nick Wellnhofer added a comment -

        > You mean the implementing functions? We should be able to just copy the
        > parent VTable, as VTable_singleton() does when creating a new subclass at
        > runtime.
        >
        > The only place that we have to worry about symbol import is when the
        > implementation is defined in a separate file, as it is for host
        > implementations such as Lucy_Doc_Get_Size().

        I wasn't yet thinking of making the implementing functions static, but of hiding them from compiled extensions by not exporting them in the DLL.

        Nick Wellnhofer added a comment -

        With the reworked patchset 03-06 I was able to successfully build a compiled extension on Windows 7 64-bit with ActivePerl and MSVC. And we're down to 9553 exported symbols in Lucy.so.

        The only thing I don't like is the name "method data". Maybe someone has a better idea.

        Marvin Humphrey added a comment -

        > With the reworked patchset 03-06 I was able to successfully build a compiled
        > extension on Windows 7 64-bit with ActivePerl and MSVC. And we're down to
        > 9553 exported symbols in Lucy.so.

        Sweet!!

        • 0003-Unify-method-data-for-initialization-and-callbacks.patch looks good and
          suggests an interesting direction outside the scope of this issue, which I
          will post about on the dev list. FWIW, I'd suggest "cfish_Method" for the
          name of the struct, or possibly "cfish_MetaData" or "cfish_MetaMethod", and
          "_META" for the variables. Auxiliary suggestion: eventually, it would be
          handy to have the host alias stored alongside the rest of the method
          introspection data – that would allow us to simplify some code in VTable.c
          and might eventually enable us to improve certain error messages by spelling
          the method name as documented in the host language binding docs.
        • 0004-Preliminary-Charmonizer-support-for-symbol-export.patch looks perfect.
        • 0005-Switch-to-fvisibility-hidden-and-start-using-CHY_EXP.patch looks good,
          though I have one comment. It took me a while to figure out the reasoning
          behind changing up the export symbols in the autogenerated .h files based on
          whether the .cfh file was "included". Up until now the output of CFC has
          been deterministic for a given set of .cfh files; after this patch, that's
          no longer true. My first instinct would have been to use an #ifdef chain as
          described above; I think you chose to do things this way so that the
          extension author wouldn't have to add an extra pound-define to their C
          files, right?
        • 0006-Install-import-library-on-Windows.patch looks good.

        +1 to commit the patch series!

        Nick Wellnhofer added a comment -

        > 0003-Unify-method-data-for-initialization-and-callbacks.patch looks good and
        > suggests an interesting direction outside the scope of this issue, which I
        > will post about on the dev list. FWIW, I'd suggest "cfish_Method" for the
        > name of the struct, or possibly "cfish_MetaData" or "cfish_MetaMethod", and
        > "_META" for the variables.

        Hmm, "cfish_Method" would clash with the name of a real Method class once it's introduced. Although we could create Method objects without using the struct then, it might be worth keeping the struct for initialization. It's more compact memory-wise than unrolling everything into function arguments.

        "cfish_MetaData" doesn't tell that the metadata is about methods.

        > 0005-Switch-to-fvisibility-hidden-and-start-using-CHY_EXP.patch looks good,
        > though I have one comment. It took me a while to figure out the reasoning
        > behind changing up the export symbols in the autogenerated .h files based on
        > whether the .cfh file was "included". Up until now the output of CFC has
        > been deterministic for a given set of .cfh files; after this patch, that's
        > no longer true. My first instinct would have been to use an #ifdef chain as
        > described above; I think you chose to do things this way so that the
        > extension author wouldn't have to add an extra pound-define to their C
        > files, right?

        We have to exchange CHY_EXPORT for CHY_IMPORT in headers of included classes because of the way DLL symbols are handled on Windows.

        The reason I went with my approach is that we'd have to introduce per-parcel #defines if we want to make the CFC output deterministic. That's more complicated. It would also mean that an extension could never use the same parcel as the project it extends.

        If we use the #ifdef trick, the question is where to put that #ifdef. I think it should go in a per-parcel header that gets installed in Clownfish/_include, but we don't have something like that at the moment.

        We could also autogenerate a #define in parcel.h for every parcel. In Lucy we'd have for example:

        #define LUCY_PUBLIC CHY_EXPORT
        

        In an extension it would be:

        #define LUCY_PUBLIC CHY_IMPORT
        #define EXT_PUBLIC CHY_EXPORT
        

        This would mean keeping track of every parcel encountered during compilation. It would also make the contents of parcel.h non-deterministic again.

        Marvin Humphrey added a comment -

        Here's a report, "bad_encapsulation_report.txt", which shows 33 cases of
        direct usage of implementing functions in the Lucy core, and the script which
        was used to generate the report.

        All of these ought to be fixed to use either proper method invocations,
        SUPER_METHOD, or METHOD.

        Marvin Humphrey added a comment -

        > Hmm, "cfish_Method" would clash with the name of a real Method class once
        > it's introduced. Although we could create Method objects without using the
        > struct then, it might be worth keeping the struct for initialization. It's
        > more compact memory-wise than unrolling everything into function arguments.

        My main concern is that we avoid publishing the struct definition.

        I think both of our objectives can be met by scope-limiting the method data
        struct def to the source file, defining an array of structs and looping over
        the array to feed a constructor. That way, we can e.g. add a member var to
        Clownfish::Method without breaking extensions.
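
One way the suggestion above might look, as a sketch only. The opaque demo_Method type, the fixed-size registry, and every demo_* and S_* name are hypothetical; the real Clownfish::Method API may differ.

```c
#include <stddef.h>

typedef int (*demo_func_t)(int);

/* What a public header would expose: an opaque type only. */
typedef struct demo_Method demo_Method;

/* Private definition; members can be added here without affecting
 * extensions, because they never see the layout. */
struct demo_Method {
    const char  *name;
    demo_func_t  func;
};

#define DEMO_MAX_METHODS 8
static demo_Method demo_registry[DEMO_MAX_METHODS];
static size_t      demo_registry_size = 0;

/* Constructor-style function; returns NULL once the registry is full. */
static demo_Method*
demo_Method_new(const char *name, demo_func_t func) {
    demo_Method *method;
    if (demo_registry_size >= DEMO_MAX_METHODS) { return NULL; }
    method = &demo_registry[demo_registry_size++];
    method->name = name;
    method->func = func;
    return method;
}

/* File-scoped method data: compact static initialization that is
 * never published outside this source file. */
typedef struct {
    const char  *name;
    demo_func_t  func;
} S_MethodData;

static int S_get_size(int x)    { return x; }
static int S_double_size(int x) { return 2 * x; }

static const S_MethodData S_method_data[] = {
    { "get_size",    S_get_size },
    { "double_size", S_double_size }
};

/* Loop over the array and feed each entry to the constructor.
 * Returns the number of registered methods. */
static size_t
demo_bootstrap_methods(void) {
    size_t i;
    size_t count = sizeof(S_method_data) / sizeof(S_method_data[0]);
    for (i = 0; i < count; i++) {
        demo_Method_new(S_method_data[i].name, S_method_data[i].func);
    }
    return demo_registry_size;
}
```

The array keeps the compact initialization data, while only the constructor signature would have to stay stable for extensions.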

        > "cfish_MetaData" doesn't tell that the metadata is about methods.

        Sorry, that was a brain-o.

        That suggestion should have read "cfish_MethodMetadata".

        > We have to exchange CHY_EXPORT for CHY_IMPORT in headers of included classes
        > because of the way DLL symbols are handled on Windows.
        >
        > The reason I went with my approach is that we'd have to introduce per-parcel
        > #defines if we want to make the CFC output deterministic. That's more
        > complicated.

        OK, you've persuaded me. The more I think about it the more I like it!

        > It would also mean that an extension could never use the same
        > parcel as the project it extends.

        Well, that's actually the intent – no two extensions should ever have the
        same parcel. The parcel is the Clownfish unit of distribution. If you had
        two distros with the same parcel, they would collide, both at installation
        time and at runtime.

        > We could also autogenerate a #define in parcel.h for every parcel.

        > This would mean keeping track of every parcel encountered during compilation.

        That's a lot of bookkeeping, all right.

        I believe that under your system, developers will expend less effort and get
        more reliable results. And if something goes wrong, it will be easy to
        tell – there will be massive catastrophic link errors.

        Nick Wellnhofer added a comment -

        I committed the latest patchset with some renamed variables to trunk.

        Nick Wellnhofer added a comment - - edited

        I came to the conclusion that using per-parcel defines which resolve to either CHY_EXPORT or CHY_IMPORT might be a better idea after all. Especially for the C bindings, I'd prefer CFC to generate deterministic output. Otherwise, we'd have to generate a second set of header files to compile the test executables (at least on Windows) or for installation.

        My proposal is to use a per-parcel define like LUCY_VISIBLE for externally visible symbols and to put the following #ifdef in parcel.h:

        #ifdef CFP_LUCY
          #define LUCY_VISIBLE CHY_EXPORT
        #else
          #define LUCY_VISIBLE CHY_IMPORT
        #endif
        

        We'd need a similar #ifdef for every included parcel but this can be done later. Then I'd define CFP_LUCY via compiler flags only when compiling Lucy.
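
A self-contained sketch of how the per-parcel define could fit together with the export and import macros. DEMO_EXPORT and DEMO_IMPORT stand in for CHY_EXPORT and CHY_IMPORT, and the "demo" parcel with its CFP_DEMO flag and demo_answer function is hypothetical. In a real build the flag would come from the compiler command line (for example -DCFP_DEMO) and only while compiling the parcel itself; it is defined inline here just so the sketch compiles on its own.

```c
/* Stand-ins for the Charmonizer-provided macros (assumed definitions). */
#if defined(_WIN32) || defined(__CYGWIN__)
#  define DEMO_EXPORT __declspec(dllexport)
#  define DEMO_IMPORT __declspec(dllimport)
#elif defined(__GNUC__) && __GNUC__ >= 4
#  define DEMO_EXPORT __attribute__ ((visibility ("default")))
#  define DEMO_IMPORT
#else
#  define DEMO_EXPORT
#  define DEMO_IMPORT
#endif

/* Normally supplied by the build system only when compiling the
 * parcel itself; defined here so the example is standalone. */
#ifndef CFP_DEMO
#  define CFP_DEMO
#endif

/* What parcel.h would contain: the same text for the parcel and for
 * every extension, so the generated header stays deterministic. */
#ifdef CFP_DEMO
#  define DEMO_VISIBLE DEMO_EXPORT
#else
#  define DEMO_VISIBLE DEMO_IMPORT
#endif

/* Declaration as it would appear in a generated header. */
DEMO_VISIBLE int
demo_answer(void);

/* Definition in the parcel's own source file. */
int
demo_answer(void) {
    return 42;
}
```

An extension would include the same header without defining CFP_DEMO, so DEMO_VISIBLE would resolve to the import variant there.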

        I'm just not sure what the best name for these macros would be. For example, instead of LUCY_VISIBLE, we could also use LUCY_PUBLIC or LUCY_EXTERN.

        Marvin Humphrey added a comment -

        +1 for LUCY_VISIBLE. There may be some times when we export non-public symbols as an implementation detail – marking them as PUBLIC might cause confusion.

        Thank you for working hard to make CFC's code output deterministic; it's probably going to spare us from some baffling bug hunts.

        Nick Wellnhofer added a comment -

        The symbol visibility changes have been done.


          People

          • Assignee:
            Nick Wellnhofer
            Reporter:
            Nick Wellnhofer
          • Votes:
            0
            Watchers:
            2
