Perfect Third Party Ads

~ 31st August 2005 · 15:39 CET ~

Everyone who has ever worked on a high-profile web site built with web standards knows there’s an inevitable moment when the site has to be handed over to its new owners. More often than not, we fall in love with the site’s tight structure and semantic markup. My heart breaks at the moment I have to insert a third-party advertisement script that I know in advance is tag soup. But can we do something about it?

Lucky for Us, We Have Regular Expressions

With regular expressions and a bit of PHP we can clean up the HTML junk in third-party scripts and turn it into nice, tidy markup. I assume that you have at least a working knowledge of PHP.

Let’s look at a typical piece of HTML we often get from an ad supplier:

<style>
a { COLOR: RED; BACKGROUND: GREEN; }
a:hover { COLOR: YELLOW; BACKGROUND: ORANGE; }
</style>
<table border=0 width=100% background=#FFFFF>
<tr>
<td><a href="http://somesite.com/">ad link 1</a></td>
</tr>
<tr>
<td><a href="http://somesite.com/">ad link 2</a></td>
</tr>
<tr>
<td><a href="http://somesite.com/">ad link 3</a></td>
</tr>
</table>

The above is most likely to be inserted somewhere under the vertical navigation or among the rest of the ugly advertisements. The <style> element doesn’t belong anywhere but inside the <head>. I also personally don’t like that table, so we’ll attempt to transform it into a much more appropriate unordered list. But I suppose that’s just my weird taste.

Clean it While It’s in the Buffer

I suppose that if you’re working on a content-driven, large-scale, high-profile web site, you already output everything from a buffer, but that’s a discussion for another occasion. It is best to use output buffering to clean up bad markup before it’s sent to the browser.
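If buffering the whole page is more than you need, one option is to buffer only the piece that prints the ad. The following is a minimal sketch, assuming PHP 7 or later; capture_output() is a hypothetical helper name, not a PHP built-in, and the echoed table merely stands in for an ad supplier’s include:

```php
<?php
// Hypothetical helper: runs a callable that prints markup and returns
// what it printed instead of letting it reach the browser.
function capture_output(callable $emit): string {
    ob_start();                      // open a (possibly nested) output buffer
    $emit();                         // e.g. function () { include 'ad.php'; }
    return (string) ob_get_clean();  // fetch the buffer contents and close it
}

// Stand-in for a third-party include that echoes its markup directly.
$raw = capture_output(function () {
    echo '<table><tr><td>ad link</td></tr></table>';
});
// $raw now holds the markup as a string, ready to be cleaned and echoed.
```

The captured string can then go through whatever cleanup you apply before you echo it into the page, which keeps the replacements away from the rest of your markup.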

Careful Planning is the Key

First, let’s make a new structure. To make the HTML semantically correct we should place the above sample links into an unordered list. The final markup should look like the following:

<div id="advertisementId" class="ads">
<ul>
<li><a href="http://somesite.com/">ad link 1</a></li>
<li><a href="http://somesite.com/">ad link 2</a></li>
<li><a href="http://somesite.com/">ad link 3</a></li>
</ul>
</div>

The CSS should be appended to your main CSS file or, better yet (if the ad supplier permits this kind of modification of the ads), you or your team’s designer should style the ad links to match the site’s overall look and feel. That is still rarely the case, but lately it’s getting better, especially with clients who are beginning to understand that what attracts visitors is the web site’s content.

If you start replacing without a plan, you’ll probably end up wasting valuable hours that could have been spent on cross-browser debugging or accessibility improvements, to name a few.

In PHP, strings are replaced with two very powerful functions: str_replace() and preg_replace(). The former is useful for small, fixed chunks and has the advantage over the latter in speed. preg_replace() deals well with regular expression patterns, but is also more demanding on the server’s processor – something you don’t want to be careless with on a popular web site. However, if applied carefully, it shouldn’t noticeably affect performance.
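To make the difference concrete, here is a small comparison. The input strings are made up for illustration; str_ireplace() is included only as the case-insensitive sibling of str_replace():

```php
<?php
// Illustrative input only; not taken from a real ad feed.
$html = '<TD>ad</TD><td>ad</td>';

// str_replace(): fixed string, case-sensitive. Only the lowercase tag changes.
$fixed = str_replace('<td>', '<li>', $html);
// $fixed is '<TD>ad</TD><li>ad</td>'

// str_ireplace(): still a fixed string, but case-insensitive.
$fixed_ci = str_ireplace('<td>', '<li>', $html);
// $fixed_ci is '<li>ad</TD><li>ad</td>'

// preg_replace(): a pattern, so it can also absorb unknown attributes.
$pattern = preg_replace('/<td\b[^>]*>/i', '<li>', '<td width=100%>ad</td>');
// $pattern is '<li>ad</td>'
```

As a rule of thumb, reach for the fixed-string functions when you know the exact text, and keep preg_replace() for the places where the supplier’s markup varies.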

Step By Step Replacement

First things first – let’s remove white space between HTML elements – it will save us a lot of trouble later:

$content = preg_replace('/>\s+</', '><', $content);
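A quick sketch of what this step does to a fragment. Note that \s already covers \n and \r, so a single character class is enough. The sample string is invented, and it also shows a side effect worth knowing about: the space between two adjacent inline elements disappears as well.

```php
<?php
// Invented fragment: a row whose cell holds two links separated by a space.
$ad = "<tr>\n  <td><a href=\"#\">one</a> <a href=\"#\">two</a></td>\n</tr>";

// Remove any run of white space that sits between a closing '>' and an opening '<'.
$out = preg_replace('/>\s+</', '><', $ad);
// $out is '<tr><td><a href="#">one</a><a href="#">two</a></td></tr>'
```

For a block of table rows that is exactly what we want; for running text with inline links it may not be, so it is safer to apply this to the ad fragment than to prose.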

The next step is also very simple – we’ll remove everything within <style> and </style>, including those two.

$content = preg_replace('/<style.*?style>/si', '', $content);
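If you would rather keep the supplier’s rules around, for instance to review them before moving anything into your main CSS file, the same idea can collect the CSS before removing it. This is only a sketch, assuming PHP 7.1 or later; extract_styles() is a hypothetical name and the slightly stricter pattern is my own choice:

```php
<?php
// Hypothetical helper: returns [markup without <style> elements, collected CSS].
function extract_styles(string $content): array {
    $css = [];
    // Capture the body of every <style> element, whatever its attributes.
    if (preg_match_all('/<style\b[^>]*>(.*?)<\/style>/si', $content, $matches)) {
        $css = array_map('trim', $matches[1]);
    }
    // Then remove the elements themselves from the markup.
    $clean = preg_replace('/<style\b[^>]*>.*?<\/style>/si', '', $content);
    return [$clean, implode("\n", $css)];
}

[$clean, $css] = extract_styles("<style>\na { COLOR: RED; }\n</style><table></table>");
// $clean is '<table></table>'
// $css   is 'a { COLOR: RED; }'
```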

After we got rid of the improperly placed <style> element, we should remove <table> tags and place all those table rows into a division. At this point we can also define an id and a class attribute for that division and also add the unordered list for the list items.

$content = preg_replace('/<table.*?>(.*)?<\/table>/si', '<div id="advertisementId" class="ads"><ul>$1</ul></div>', $content);
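One caveat about the pattern above, offered as a sketch rather than a correction: the (.*) group is greedy, so if the buffer contains more than one table, the match runs from the first <table> to the last </table>. A lazy group handles each table separately. The two-table input and the shortened replacement string are invented for the comparison:

```php
<?php
// Invented input with two separate tables and a paragraph between them.
$two = '<table><tr><td>a</td></tr></table><p>text</p><table><tr><td>b</td></tr></table>';

// Greedy group: one match that swallows everything up to the last </table>.
$greedy = preg_replace('/<table.*?>(.*)?<\/table>/si', '<ul>$1</ul>', $two);
// $greedy is '<ul><tr><td>a</td></tr></table><p>text</p><table><tr><td>b</td></tr></ul>'

// Lazy group: each table is matched and replaced on its own.
$lazy = preg_replace('/<table\b[^>]*>(.*?)<\/table>/si', '<ul>$1</ul>', $two);
// $lazy is '<ul><tr><td>a</td></tr></ul><p>text</p><ul><tr><td>b</td></tr></ul>'
```

With a single ad table in the buffer, both patterns give the same result.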

We simply pulled everything inside the <table></table> and pushed it into a <div id="advertisementId" class="ads"><ul></ul></div>. This is still pretty unappetizing tag soup, but the only thing left to do is transform each table row into a list item…

$content = preg_replace('/<tr><td>/si', '<li>', $content);
$content = preg_replace('/<\/td><\/tr>/si', '</li>', $content);

… and there it is – perfectly tight markup. Below is the complete code, which you can copy to a file with a .php extension and try in the safety of your home or office:

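<?php
// Output-buffer callback: receives the buffered page and returns the cleaned markup.
function clean_HTML($content) {
   // Collapse white space between tags
   $content = preg_replace('/>\s+</', '><', $content);
   // Drop the misplaced <style> element
   $content = preg_replace('/<style.*?style>/si', '', $content);
   // Swap the table wrapper for a division with an unordered list
   $content = preg_replace('/<table.*?>(.*)?<\/table>/si', '<div id="advertisementId" class="ads"><ul>$1</ul></div>', $content);
   // Turn each single-cell row into a list item
   $content = preg_replace('/<tr><td>/si', '<li>', $content);
   $content = preg_replace('/<\/td><\/tr>/si', '</li>', $content);
   return $content;
}
// Everything output after this call passes through clean_HTML()
ob_start('clean_HTML');
?>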

<style>
a { COLOR: RED; BACKGROUND: GREEN; }
a:hover { COLOR: YELLOW; BACKGROUND: ORANGE; }
</style>
<table border=0 width=100% background=#FFFFF>
<TR>
<TD><a href="http://somesite.com/">ad link 1</a></td>
</tr>
<tr>
<td><a href="http://somesite.com/">ad link 2</a></td>
</tr>
<tr>
<td><a href="http://somesite.com/">ad link 3</a></td>
</tr>
</table>
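The row patterns in clean_HTML() only match bare <tr><td> pairs, so a supplier who adds attributes to rows or cells will slip through. Below is a sketch of a more tolerant variant that can be called directly on a string, which also makes it easy to try without output buffering. It assumes PHP 7 or later and one cell per row, as in the sample; ad_table_to_list() is a hypothetical name, and the attribute-tolerant patterns are my own additions:

```php
<?php
// Hypothetical variant of the cleanup, applied to an ad fragment passed in as a string.
// Assumes every row holds exactly one cell, as in the sample ad.
function ad_table_to_list(string $html): string {
    // Collapse white space between tags.
    $html = preg_replace('/>\s+</', '><', $html);
    // Drop <style> elements, with or without attributes.
    $html = preg_replace('/<style\b[^>]*>.*?<\/style>/si', '', $html);
    // Replace each table wrapper (lazy match) with the division and list.
    $html = preg_replace(
        '/<table\b[^>]*>(.*?)<\/table>/si',
        '<div id="advertisementId" class="ads"><ul>$1</ul></div>',
        $html
    );
    // Opening row plus cell, in any case and with any attributes.
    $html = preg_replace('/<tr\b[^>]*><td\b[^>]*>/i', '<li>', $html);
    // Closing cell plus row.
    $html = preg_replace('/<\/td><\/tr>/i', '</li>', $html);
    return $html;
}

// Invented sample with attributes on the row and the cell.
$sample = "<table border=0>\n<TR bgcolor=red>\n"
    . "<TD align=left><a href=\"http://somesite.com/\">ad link 1</a></td>\n"
    . "</tr>\n</table>";
$result = ad_table_to_list($sample);
// $result is '<div id="advertisementId" class="ads"><ul><li><a href="http://somesite.com/">ad link 1</a></li></ul></div>'
```

Rows with several cells would still need their own rule, so check the supplier’s actual markup before relying on this.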

6 Comments

  1. Careful when doing this - usually it’s a violation of an advertiser’s terms - I myself have been denied advertising fees in the past because of modifying an advertiser’s markup.

  2. @Ryan: Of course, such an intervention should be approved by the feed supplier.

    @Terry: If the client is the one who has the final word (between the client and advertiser), then there’s usually no problem about that, since our clients mostly have a fair share of confidence in our competence.

  3. Exactly Marko! Indeed, I was not suggesting it was a problem, nor did I think for a moment that it had anything to do with competence, especially for your company.

    Thanks again for the article.

  4. Sweet. Nice article. :)

    I’d been thinking about this just the other day.. but i lacked the regexp voodoo skills. So, thanks Marko. ;)

    Have any of you guys tried this with Google AdSense? Do you know if it will mess with their rewards?

  5. Since Google Ads are served via JavaScript, the method described is not possible.

  6. Actually, it is possible, but you have to fetch the JS file and parse it with PHP. There’s no difference if you include .txt or .js if you know what to do with it. It’s done here and it works : ).

What is this?

A web log of Marko Dugonjić, web professional from Croatia.
