Commons Email
  1. Commons Email
  2. EMAIL-92

Improve support to embed images in HTML eMails

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3
    • Labels:
      None

      Description

      I have created a improvement on top of HtmlEmail class which automatically detects <img src=".."/> tags in the HTML source and which embeds all the URLs correctly in the email so they are sent as attachments as part of the email.

      It is implemented as new class ImageHtmlEmail, no existing code of commons email is touched at all.

      This make sending HTML Mails much easier, now it is as simple as

      ImageHtmlEmail email = new ImageHtmlEmail();
      email.setHostName(SMTP_HOST);
      email.addTo(EMAIL_TO, "DS");
      email.setFrom(EMAIL_FROM, "Me");
      email.setSubject(EMAIL_SUBJECT);

      String html = FileUtils.readFileToString(new File("/tmp/test_email.html"));

      // set the html message
      email.setHtmlMsg(html, new File("/tmp"));

      // set the alternative message
      email.setTextMsg("Your email client does not support HTML messages");

      // send the email
      email.send();

      as stated in the mail, I am making this availalbe under the Apache License 2.0 so that it can be integrated if it sounds useful.

      1. ImageHtmlEmail.java
        4 kB
        Dominik Stadler
      2. ImageHtmlEmail.java
        4 kB
        Dominik Stadler
      3. EMAIL-92-with-test.patch
        41 kB
        Dominik Stadler

        Activity

        Hide
        Siegfried Goeschl added a comment -

        Is now part of commons-email

        Show
        Siegfried Goeschl added a comment - Is now part of commons-email
        Hide
        Siegfried Goeschl added a comment -

        I applied the patch and this patch also fixed a broken test (which was commented out) - looks good to me ...

        Show
        Siegfried Goeschl added a comment - I applied the patch and this patch also fixed a broken test (which was commented out) - looks good to me ...
        Hide
        Dominik Stadler added a comment - - edited

        Looks better now. Here another small patch, it seems the regex was corrupted with the last checkin, there is now a backslash missing in front of "s":

        ### Eclipse Workspace Patch 1.0
        #P commons-email-trunk
        Index: src/java/org/apache/commons/mail/ImageHtmlEmail.java
        ===================================================================
        --- src/java/org/apache/commons/mail/ImageHtmlEmail.java	(revision 963592)
        +++ src/java/org/apache/commons/mail/ImageHtmlEmail.java	(working copy)
        @@ -54,11 +55,11 @@
              * newlines on any place, HTML is not case sensitive and there can be
              * arbitrary text between "IMG" and "SRC" like IDs and other things.
              */
        -    public static final String REGEX_IMG_SRC = "(<[Ii][Mm][Gg]\\s*[^>]*?\\s+[Ss][Rr][Cc]\\s*=s*[\"'])([^\"']+?)([\"'])";
        +    public static final String REGEX_IMG_SRC = "(<[Ii][Mm][Gg]\\s*[^>]*?\\s+[Ss][Rr][Cc]\\s*=\\s*[\"'])([^\"']+?)([\"'])";
         
             public static final String REGEX_SCRIPT_SRC = "(<[Ss][Cc][Rr][Ii][Pp][Tt]\\s*.*?\\s+[Ss][Rr][Cc]\\s*=\\s*[\"'])([^\"']+?)([\"'])";
         
        -    // this pattern looks for the HTML imgage tag which indicates embedded images,
        +    // this pattern looks for the HTML image tag which indicates embedded images,
             // the grouping is necessary to allow to replace the element with the CID
             protected static final Pattern pattern = Pattern.compile(REGEX_IMG_SRC);
         
        Show
        Dominik Stadler added a comment - - edited Looks better now. Here another small patch, it seems the regex was corrupted with the last checkin, there is now a backslash missing in front of "s": ### Eclipse Workspace Patch 1.0 #P commons-email-trunk Index: src/java/org/apache/commons/mail/ImageHtmlEmail.java =================================================================== --- src/java/org/apache/commons/mail/ImageHtmlEmail.java (revision 963592) +++ src/java/org/apache/commons/mail/ImageHtmlEmail.java (working copy) @@ -54,11 +55,11 @@ * newlines on any place, HTML is not case sensitive and there can be * arbitrary text between "IMG" and "SRC" like IDs and other things. */ - public static final String REGEX_IMG_SRC = "(<[Ii][Mm][Gg]\\s*[^>]*?\\s+[Ss][Rr][Cc]\\s*=s*[\" '])([^\ "']+?)([\" '])"; + public static final String REGEX_IMG_SRC = "(<[Ii][Mm][Gg]\\s*[^>]*?\\s+[Ss][Rr][Cc]\\s*=\\s*[\" '])([^\ "']+?)([\" '])"; public static final String REGEX_SCRIPT_SRC = "(<[Ss][Cc][Rr][Ii][Pp][Tt]\\s*.*?\\s+[Ss][Rr][Cc]\\s*=\\s*[\" '])([^\ "']+?)([\" '])"; - // this pattern looks for the HTML imgage tag which indicates embedded images, + // this pattern looks for the HTML image tag which indicates embedded images, // the grouping is necessary to allow to replace the element with the CID protected static final Pattern pattern = Pattern.compile(REGEX_IMG_SRC);
        Hide
        Siegfried Goeschl added a comment -

        Fixed the replacement of identical image resources

        Show
        Siegfried Goeschl added a comment - Fixed the replacement of identical image resources
        Hide
        Siegfried Goeschl added a comment -

        I have a look at it ...

        Show
        Siegfried Goeschl added a comment - I have a look at it ...
        Hide
        Dominik Stadler added a comment -

        Some items that I found:

        • Currently you only replace img-tags under the following condition:
           
          if(!this.inlineEmbeds.containsKey(imageDataSource.getName()))
          

        what if the same image is included in the HTML multiple times? With this check it would only be replaced once and then not any more? It seems the embed() call performs this check internally as well and returns the same cid for existing images, so it would be save to call embed() with every DataSource that we find.

        • I enhanced the regex slightly to cover cases that we handled incorrectly before, the full regex is now:
          public static final String REGEX_IMG_SRC = "(<[Ii][Mm][Gg]\\s*[^>]?\\s+[Ss][Rr][Cc]\\s=
          s*[\"'])([^\"']+?)([\"'])";

        The Following addition to the regex-test verifies that the regex is working with the newly discovered cases:

        		// had a problem with multiple img-source tags
        		matcher = pattern
        				.matcher("<img src=\"file1\"/><img src=\"file2\"/>");
        		assertTrue(matcher.find());
        		assertEquals("file1", matcher.group(2));
        		assertTrue(matcher.find());
        		assertEquals("file2", matcher.group(2));
        
        		matcher = pattern
        				.matcher("<img src=\"file1\"/><img src=\"file2\"/><img src=\"file3\"/><img src=\"file4\"/><img src=\"file5\"/>");
        		assertTrue(matcher.find());
        		assertEquals("file1", matcher.group(2));
        		assertTrue(matcher.find());
        		assertEquals("file2", matcher.group(2));
        		assertTrue(matcher.find());
        		assertEquals("file3", matcher.group(2));
        		assertTrue(matcher.find());
        		assertEquals("file4", matcher.group(2));
        		assertTrue(matcher.find());
        		assertEquals("file5", matcher.group(2));
        
        		// try with invalid HTML that is seens sometimes, i.e. without closing "/" or "</img>"
        		matcher = pattern
        				.matcher("<img src=\"file1\"><img src=\"file2\">");
        		assertTrue(matcher.find());
        		assertEquals("file1", matcher.group(2));
        		assertTrue(matcher.find());
        		assertEquals("file2", matcher.group(2));
        
        Show
        Dominik Stadler added a comment - Some items that I found: Currently you only replace img-tags under the following condition: if (! this .inlineEmbeds.containsKey(imageDataSource.getName())) what if the same image is included in the HTML multiple times? With this check it would only be replaced once and then not any more? It seems the embed() call performs this check internally as well and returns the same cid for existing images, so it would be save to call embed() with every DataSource that we find. I enhanced the regex slightly to cover cases that we handled incorrectly before, the full regex is now: public static final String REGEX_IMG_SRC = "(< [Ii] [Mm] [Gg] \\s* [^>] ?\\s+ [Ss] [Rr] [Cc] \\s = s* [\"'] )( [^\"'] +?)( [\"'] )"; The Following addition to the regex-test verifies that the regex is working with the newly discovered cases: // had a problem with multiple img-source tags matcher = pattern .matcher( "<img src=\" file1\ "/><img src=\" file2\ "/>" ); assertTrue(matcher.find()); assertEquals( "file1" , matcher.group(2)); assertTrue(matcher.find()); assertEquals( "file2" , matcher.group(2)); matcher = pattern .matcher( "<img src=\" file1\ "/><img src=\" file2\ "/><img src=\" file3\ "/><img src=\" file4\ "/><img src=\" file5\ "/>" ); assertTrue(matcher.find()); assertEquals( "file1" , matcher.group(2)); assertTrue(matcher.find()); assertEquals( "file2" , matcher.group(2)); assertTrue(matcher.find()); assertEquals( "file3" , matcher.group(2)); assertTrue(matcher.find()); assertEquals( "file4" , matcher.group(2)); assertTrue(matcher.find()); assertEquals( "file5" , matcher.group(2)); // try with invalid HTML that is seens sometimes, i.e. without closing "/" or "</img>" matcher = pattern .matcher( "<img src=\" file1\ "><img src=\" file2\ ">" ); assertTrue(matcher.find()); assertEquals( "file1" , matcher.group(2)); assertTrue(matcher.find()); assertEquals( "file2" , matcher.group(2));
        Hide
        Siegfried Goeschl added a comment -

        Added the reworked contribution to SVN.

        Show
        Siegfried Goeschl added a comment - Added the reworked contribution to SVN.
        Hide
        Siegfried Goeschl added a comment -

        I nearly finished my work on your code - I leave a quick note when I commit my changes. Have a look at it if everything is fine and hopefully I did not break your usecases

        Show
        Siegfried Goeschl added a comment - I nearly finished my work on your code - I leave a quick note when I commit my changes. Have a look at it if everything is fine and hopefully I did not break your usecases
        Hide
        Dominik Stadler added a comment -

        you are right about &, it needs to be decoded when we read the src-tag, I will look into ways to do general HTML decoding at that point

        for URL, this would reqiure an additional parameter for baseURL to allow us to construct a valid URL here as the HTML only contains a relative adress here. We should probably put this into a second method to have the possibility to provide it or not.

        Show
        Dominik Stadler added a comment - you are right about &, it needs to be decoded when we read the src-tag, I will look into ways to do general HTML decoding at that point for URL, this would reqiure an additional parameter for baseURL to allow us to construct a valid URL here as the HTML only contains a relative adress here. We should probably put this into a second method to have the possibility to provide it or not.
        Hide
        Siegfried Goeschl added a comment -

        I wrote a few additional tests to check if I can easily create an email from my Sonar installation which is currently not possible

        Here the image markup from http://nemo.sonarsource.org/project/index/51834

        "<img alt=\"Chart?ck=xradar&w=120&h=120&c=7fff00|7fff00&m=4&g=0\" src=\"/chart ck=xradar&w=120&h=120&c=7fff00|7fff00&m=4&g=0.2&l=A,C,S,T&v=3.0,3.0,2.0,2.0\"");

        which should result in the following URL

        http://nemo.sonarsource.org/chart?ck=xradar&w=120&h=120&c=7fff00|7fff00&m=4&g=0.2&l=A,C,S,T&v=3.0,3.0,2.0,2.0

        but throws the following exception

        SEVERE: Could not download URL: /chart?ck=xradar&w=120&h=120&c=7fff00|7fff00&m=4&g=0.2&l=A,C,S,T&v=3.0,3.0,2.0,2.0 Exception: no protocol: /chart?ck=xradar&w=120&h=120&c=7fff00|7fff00&m=4&g=0.2&l=A,C,S,T&v=3.0,3.0,2.0,2.0

        Basically I see two problems

        +) creating the correct URL
        +) resolve the "&amp"

        Can you get in touch with me since I'm working on a modified code base (I will attach it as patch)

        Show
        Siegfried Goeschl added a comment - I wrote a few additional tests to check if I can easily create an email from my Sonar installation which is currently not possible Here the image markup from http://nemo.sonarsource.org/project/index/51834 "<img alt=\"Chart?ck=xradar&w=120&h=120&c=7fff00|7fff00&m=4&g=0\" src=\"/chart ck=xradar&w=120&h=120&c=7fff00|7fff00&m=4&g=0.2&l=A,C,S,T&v=3.0,3.0,2.0,2.0\""); which should result in the following URL http://nemo.sonarsource.org/chart?ck=xradar&w=120&h=120&c=7fff00 |7fff00&m=4&g=0.2&l=A,C,S,T&v=3.0,3.0,2.0,2.0 but throws the following exception SEVERE: Could not download URL: /chart?ck=xradar&w=120&h=120&c=7fff00|7fff00&m=4&g=0.2&l=A,C,S,T&v=3.0,3.0,2.0,2.0 Exception: no protocol: /chart?ck=xradar&w=120&h=120&c=7fff00|7fff00&m=4&g=0.2&l=A,C,S,T&v=3.0,3.0,2.0,2.0 Basically I see two problems +) creating the correct URL +) resolve the "&amp" Can you get in touch with me since I'm working on a modified code base (I will attach it as patch)
        Hide
        Dominik Stadler added a comment -

        Hi Siegfried, the latest attachment contains a full patch with a small addition to have <script>-elements also included and a helper method to extract all images from html without sending an email.

        There are also test-cases included that cover the class. One of the cases verifies that you can use something like <IMG SRC="images/somepict.png"/> in the HTML, you only need to pass the correct "base directory" for this to work, i.e. the parent directory of the "images" directory in this case as the code combines the specified base-directory and the image-src when looking for the files.

        Show
        Dominik Stadler added a comment - Hi Siegfried, the latest attachment contains a full patch with a small addition to have <script>-elements also included and a helper method to extract all images from html without sending an email. There are also test-cases included that cover the class. One of the cases verifies that you can use something like <IMG SRC="images/somepict.png"/> in the HTML, you only need to pass the correct "base directory" for this to work, i.e. the parent directory of the "images" directory in this case as the code combines the specified base-directory and the image-src when looking for the files.
        Hide
        Dominik Stadler added a comment -
        • updated code for ImageHtmlEmail.java, we now also support adding script-elements in addition to image-files.
        • added full unit test case

        this is a patch that applies against the latest commons-email from trunk, only thing missing is updating maven build script as I used an eclipse project for building/executing.

        Show
        Dominik Stadler added a comment - updated code for ImageHtmlEmail.java, we now also support adding script-elements in addition to image-files. added full unit test case this is a patch that applies against the latest commons-email from trunk, only thing missing is updating maven build script as I used an eclipse project for building/executing.
        Hide
        Siegfried Goeschl added a comment -

        Is there a way to embed relative referenced images , e.g. <img src="images/logo.png" alt="" /> - as far as I understand it the code does not pick up those images ...

        Siegfried Goeschl

        Show
        Siegfried Goeschl added a comment - Is there a way to embed relative referenced images , e.g. <img src="images/logo.png" alt="" /> - as far as I understand it the code does not pick up those images ... Siegfried Goeschl
        Hide
        Siegfried Goeschl added a comment -

        Hi Dominik,

        do you have any regression tests for the code?

        Siegfried Goeschl

        Show
        Siegfried Goeschl added a comment - Hi Dominik, do you have any regression tests for the code? Siegfried Goeschl
        Hide
        Dominik Stadler added a comment -

        During testing I found some instances of urls that were not catched correctly because the regular expression was not taking more cases into account, it also did not behave nicely on empty input html. The attached updated version fixes these two issues.

        Show
        Dominik Stadler added a comment - During testing I found some instances of urls that were not catched correctly because the regular expression was not taking more cases into account, it also did not behave nicely on empty input html. The attached updated version fixes these two issues.
        Hide
        Dominik Stadler added a comment -

        New class ImageHtmlEmail to automatically embed images from HTML emails.

        Show
        Dominik Stadler added a comment - New class ImageHtmlEmail to automatically embed images from HTML emails.

          People

          • Assignee:
            Siegfried Goeschl
            Reporter:
            Dominik Stadler
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development