© 1994-2017. David Sklar. All rights reserved.

DAD: Extended Abstract

Introduction

At Student.Net, Perl serves millions of ad impressions every month with DAD, a mod_perl application with advanced targeting capabilities, a comprehensive administrative interface, and automatic client reporting.

An ad server is a revenue-generating, business-critical application. It may not surprise anyone at the Perl Conference to learn that Perl functions robustly and admirably under those conditions, but every so often, the rest of the world needs a reminder. Mod_perl's architecture gives this critical application the performance it needs.

Most comprehensive ad servers are very expensive and involve not only large start-up fees, but ongoing "support" contracts. Additionally, because they are proprietary commercial products, one has no way of finding out why a bug exists and no control over how long it will take to fix. Plus, while everyone wants to tie their ad server to their user databases to have custom targeted ads, no commercial product allows this without some moderately-to-severely complicated C or Java programming. The cycle to create this targeting of "code, compile-as-shared-library-or-class, restart server, test, hunt down error, re-code" does not make for very efficient ad server customization.

Fortunately, Perl can save the day. DAD is free, open-source, customizable, and portable. It requires mod_perl and Apache, but if embedded Perl interpreters emerge for other webservers, it shouldn't be too difficult to use them with DAD. It runs on any of the many platforms that Perl does, and can use any database with a DBI driver as its backend. If you move your site from Linux and MySQL to Solaris and Oracle, you only need to change one line in DAD's configuration.

From a technical perspective, because DAD is a mod_perl application, it does some things better than CGI, and does other things that CGI just can't do. Because CGI processes only live as long as the request that created them, they need to establish a new database connection each time they are executed. Even on databases with quick connection times, like MySQL, this can lead to a lot of TCP sockets hanging around in TIME_WAIT. With a database that has a slower connection startup time, this delay can make it impossible to serve a large number of requests. Apache::DBI manages per-process DBI connections that persist for the lifetime of an Apache child process, eliminating the problems of establishing a new database connection for each request.

DAD also takes advantage of apache's request chain to make targeting decisions about multiple ads on the same page. Each ad insertion on a page is a subrequest of the main request for the page. If called with an "exclude" flag, DAD can pass information between individual ad insertions through the main request to ensure that the same ad does not appear on the same page more than once. This technique can also be applied to entire ad campaigns. Sometimes, advertisers will demand that their ads not run on the same page as a competitor's ads. If these two campaigns are marked as "enemies", DAD will use this exclude functionality to prevent ads from the two campaigns from appearing on the same page.

Mod_perl also gives DAD the speed of a compiled-in C module, but with much greater customizability and extensibility. New targeting dimensions are only a matter of a few lines of Perl and a column in a table. The functions that DAD uses to identify users and sessions for targeting are abstracted out from DAD's core and can be easily changed. Plus, the rules that DAD uses to determine what ads to serve in fatal conditions (e.g. loss of database connectivity) can be simply altered and expanded.

Organizing Ads

To understand how DAD functions, it is necessary to understand how it organizes ads. Each ad belongs to one campaign and zero or more groups. Campaign is an advertiser-centric collection: all ads from a given advertiser will normally be in the same campaign. Group is a site-centric collection: all ads that could appear in a particular spot on the site will normally be in the same group.

When DAD is invoked from a page, it is passed a group name. The ad selected will generally be chosen from that group. Multiple invocations of DAD from the same page can request ads from the same group or from different groups.

For example, if Citibank has five different creatives that they want to run on your site, each of those ads would be in the 'citibank' campaign. If Citibank wants to run the ads on your sports and your entertainment pages, you would put each of those ads in both your sports and entertainment groups. On your sports pages, you would ask DAD for ads from the sports group, and on your entertainment pages, you would ask DAD for ads from the entertainment group.
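
The relationship between ads, campaigns and groups can be pictured as a small data structure. This is only an illustrative sketch: DAD keeps this information in a database, and the ad ids, the field names (campaign, groups) and the helper ads_in_group() are assumptions made up for the example.

```perl
use strict;
use warnings;

# Hypothetical in-memory picture of the ad library: each ad has exactly
# one campaign and zero or more groups.
my %ads = (
    101 => { campaign => 'citibank', groups => [ 'sports', 'entertainment' ] },
    102 => { campaign => 'citibank', groups => [ 'sports', 'entertainment' ] },
    201 => { campaign => 'houseads', groups => [ 'sports' ] },
    202 => { campaign => 'houseads', groups => [] },   # in no group yet
);

# Return the ids of all ads that belong to the named group, in numeric order.
sub ads_in_group {
    my ($group) = @_;
    return sort { $a <=> $b }
           grep { my $id = $_; grep { $_ eq $group } @{ $ads{$id}{groups} } }
           keys %ads;
}

print join(',', ads_in_group('sports')), "\n";          # 101,102,201
print join(',', ads_in_group('entertainment')), "\n";   # 101,102
```

A page in the sports section would then draw from the first list and a page in the entertainment section from the second, while both Citibank creatives remain in the single citibank campaign.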

Delivering an ad

Ads are inserted into pages through a virtual subrequest. With SSI, this looks like

     <!--#include virtual="/dad/ad?group=groupname" -->

In PHP, this looks like

     <?php virtual("/dad/ad?group=groupname"); ?>

where /dad is the <Location> that DAD has been assigned in the server's configuration. This command will select an ad from those that belong to group groupname and are eligible to be served. Eligibility means:
  1. The current time is after the ad's run start date and before the ad's run end date
  2. The ad has some remaining daily impressions
  3. The ad is not disqualified by any additional targeting dimensions (browser, domain name, etc.)

An ad is selected from the pool of eligible ads with a probability equal to the ratio of its remaining daily impressions to the total remaining daily impressions of all the eligible ads.

Each ad's remaining daily impressions are recalculated once a day: the value is set to the ad's total remaining impressions divided by the number of days remaining in its run. This value, rather than the raw count of impressions remaining in the run, is used for weighting so that an ad's delivery is spread fairly evenly over its entire run.

An example: there are three ads in the library group. Ads 1 and 2 run from September 1 to December 1 and ad 3 runs from October 1 to November 1. Today is October 15 and, at the time of recalculating each ad's remaining daily impressions (RDI), ad 1 has 90,000 total remaining impressions, ad 2 has 45,000 and ad 3 has 30,000. Each ad's RDI is its total remaining impressions divided by the days left in its run. Counting 47 days from October 15 to December 1 and 17 days to November 1, the RDIs come to roughly 1,915 for ad 1, 957 for ad 2 and 1,765 for ad 3.

So when DAD is commanded to deliver an ad from group library, if no other targeting dimensions are involved, it will choose ad 1 about 40% of the time, ad 2 about 20% of the time, and ad 3 about 40% of the time. Note that since each ad's RDI is decremented by 1 each time the ad is delivered, these ratios will fluctuate throughout the day.
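
The arithmetic in this example can be checked with a short sketch. It assumes that the days remaining are counted as the calendar difference between today and the run end date (47 days from October 15 to December 1, 17 days to November 1) and that RDI is rounded to the nearest whole number; neither detail is spelled out in the text. The function names are hypothetical and are not taken from DAD's source.

```perl
use strict;
use warnings;

# Remaining daily impressions: total remaining impressions spread evenly
# over the days left in the run, rounded to the nearest whole impression.
sub remaining_daily_impressions {
    my ($total_remaining, $days_left) = @_;
    return 0 if $days_left <= 0;
    return int($total_remaining / $days_left + 0.5);
}

# Pick one ad id with probability proportional to its RDI.
# $rdi is a hash ref of ad id => RDI. $roll is a number in [0,1); it
# defaults to rand() but can be supplied to make the choice repeatable.
sub pick_weighted {
    my ($rdi, $roll) = @_;
    my $total = 0;
    $total += $_ for values %$rdi;
    return undef unless $total > 0;
    my $target = (defined $roll ? $roll : rand()) * $total;
    my @ids = sort grep { $rdi->{$_} > 0 } keys %$rdi;
    for my $id (@ids) {
        $target -= $rdi->{$id};
        return $id if $target < 0;
    }
    return $ids[-1];   # guard against floating-point edge cases
}

my %rdi = (
    1 => remaining_daily_impressions(90_000, 47),
    2 => remaining_daily_impressions(45_000, 47),
    3 => remaining_daily_impressions(30_000, 17),
);
my $total = 0;
$total += $_ for values %rdi;
printf "ad %d: RDI %d (%.1f%%)\n", $_, $rdi{$_}, 100 * $rdi{$_} / $total
    for sort keys %rdi;
# ad 1: RDI 1915 (41.3%)
# ad 2: RDI 957 (20.6%)
# ad 3: RDI 1765 (38.1%)
```

Ad 3 gets a large share despite having the fewest total impressions left because its run ends much sooner than the other two.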

Targeting Dimensions

In addition to weighting ads by how many impressions they have remaining, DAD's ad selection process can be influenced by additional factors, or dimensions. Before DAD makes its choice of ads based on their remaining impressions, it checks each ad against each targeting dimension and makes an ad ineligible to be chosen if it has the wrong value for a dimension.

Two of the targeting dimensions DAD has are Student.Com-specific: PARTNER and LOCAL. We operate different versions of our site for different content partners and for different localities. The <VirtualHost> definitions for each of these site versions set the environment variables PARTNER and LOCAL. Ads can be targeted to appear only on one or more versions of the site based on these variables. An ad's value for each of these dimensions is stored in the database as a 4-byte bitmask: if a bit is turned on, the ad is eligible to be served for the corresponding PARTNER or LOCAL version of the site, and a value of 0 means the ad is not targeted on that dimension. The following code checks whether an ad is eligible for a given PARTNER version of the site ($ad_partner is the ad's value for this targeting dimension, as retrieved from the database).

    # DIMENSION: PARTNER
    # unless the $PARTNERS{$ENV{'PARTNER'}} bit is set in $ad_partner or $ad_partner is 0, drop this ad
    if ($ad_partner) {
        unless ($ad_partner & $PARTNERS{$ENV{'PARTNER'}}) {
            $r->log_error("DAD:DIMENSION:PARTNER: $ads[$i]{id} out ($ad_partner v. $PARTNERS{$ENV{'PARTNER'}})") if $DAD::config::DEBUG;
            $ads[$i]{id} = -1;
        }
    }
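
To make the bit test concrete, here is a self-contained sketch of the same check outside of Apache. The partner names and their bit assignments are invented for the example, as are the helpers partner_mask() and eligible_for_partner(); only the test itself (a mask of 0 means untargeted, otherwise the current partner's bit must be set) follows the excerpt above.

```perl
use strict;
use warnings;

# Hypothetical bit assignments: one bit per PARTNER version of the site.
# A 4-byte value leaves room for up to 32 versions.
my %PARTNERS = (
    alpha => 1 << 0,
    beta  => 1 << 1,
    gamma => 1 << 2,
);

# Build an ad's mask from the names of the partner versions it may run on.
sub partner_mask {
    my $mask = 0;
    $mask |= $PARTNERS{$_} for @_;
    return $mask;
}

# A mask of 0 means the ad is untargeted and eligible everywhere;
# otherwise the bit for the current partner must be set in the mask.
sub eligible_for_partner {
    my ($ad_partner, $partner) = @_;
    return 1 unless $ad_partner;
    return ($ad_partner & ($PARTNERS{$partner} || 0)) ? 1 : 0;
}

my $ad_partner = partner_mask('alpha', 'gamma');          # bits 0 and 2: 5
print eligible_for_partner($ad_partner, 'alpha'), "\n";   # 1
print eligible_for_partner($ad_partner, 'beta'),  "\n";   # 0
print eligible_for_partner(0, 'beta'),            "\n";   # 1 (untargeted)
```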

Targeting by browser uses a helper function to parse User-Agent strings:

    # DIMENSION: BROWSER
    # Does this user have the right browser?
    if ($ad_browser) {
        my $ua = get_browser($r->header_in('User-Agent'));
        unless ($ad_browser == $BROWSERS{$ua}) {
            $r->log_error("DAD:DIMENSION:BROWSER: $ads[$i]{id} out ([$ua] => $BROWSERS{$ua} vs. $ad_browser)") if $DAD::config::DEBUG;
            $ads[$i]{id} = -1;
        }
    }

[ . . . ]

sub get_browser {
    my $ua = shift || return 'Other';

    # these regexes are listed in order of probability of occurrence
    # so that we do as few tests as possible
    if (($ua =~ m{^Mozilla}i) && !($ua =~ m{\(compatible;}i)) { return 'Netscape'; }
    if ($ua =~ m{\(compatible; MSIE}i)                        { return 'Microsoft'; }
    if ($ua =~ m{^Microsoft Internet Explorer}i)              { return 'Microsoft'; }
    if ($ua =~ m{\(compatible; Opera}i)                       { return 'Opera'; }

    return 'Other';
}

After the targeting dimension checking has been completed, DAD makes an array of just the ads that are still eligible to be served:

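    # remove the inappropriate ads: sorting by id moves every ad that was
    # marked ineligible (id set to -1) to the front of the list
    my @goodAds = sort { $a->{id} <=> $b->{id} } @ads;
    $i = 0;
    # stop at the end of the list so that, when every ad is ineligible, the
    # test does not autovivify an empty element past the last ad
    while ($i < @goodAds && $goodAds[$i]{id} < 0) {
        $r->log_error("DAD: Skipping position $i with negative id") if $DAD::config::DEBUG;
        $i++;
    }
    @ads = splice(@goodAds, $i);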

Limiting Users' Exposure To Ads

Clickthrough rates decline pretty steadily as a user sees the same banner ad over and over again. Advertisers like to be able to limit the number of times that a specific user will see a specific banner. This requires more work by the advertiser -- they need to provide more different banners to your site -- but it can increase clickthroughs. DAD can limit the number of times a given user is exposed to a specific banner.

This capability depends on each user carrying around some kind of session ID. At Student.Com, we use a cookie -- signed-in users carry around an identifier, and users who are not signed in are assigned an anonymous identifier. The method that you use to assign identifiers to users is external to and separate from DAD. To adapt DAD to your user identifier scheme, you need to modify the functions get_session_id() and is_valid_session(). get_session_id() is passed the request structure and needs to return the user's session ID. You have access to everything in the request structure -- cookies, headers, notes, etc. -- to tie into your site's session ID scheme. is_valid_session() is passed a session ID and returns 1 if this session ID represents an actual, signed-in user, and 0 if this session ID represents an anonymous user.

If an ad has an exposure limit set, DAD retrieves the current session ID and attempts to find out how many times the current session ID has seen each of the possible ads. If that value exceeds the exposure count, the ad is marked as ineligible. When DAD eventually picks the actual ad to serve, it increments the session ID's count of times that ad has been viewed.
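
The bookkeeping described here might look like the following sketch. It keeps the counts in a plain in-memory hash, which is an assumption made so the example can run on its own; where DAD actually stores them is not described in this paper. It also treats an ad as ineligible once the session's count has reached the limit, which is one reading of "exceeds". The function names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical store: $seen{$session_id}{$ad_id} = times this ad was shown.
my %seen;

# True while the session has seen the ad fewer than $limit times.
# A limit of 0 (or undef) means no exposure limit is set for the ad.
sub under_exposure_limit {
    my ($session_id, $ad_id, $limit) = @_;
    return 1 unless $limit;
    my $count = exists $seen{$session_id}
              ? ($seen{$session_id}{$ad_id} || 0)
              : 0;
    return $count < $limit ? 1 : 0;
}

# Called once the ad has actually been chosen and served.
sub record_exposure {
    my ($session_id, $ad_id) = @_;
    return ++$seen{$session_id}{$ad_id};
}

# Ad 145 may be shown to any one session at most twice.
for my $view (1 .. 3) {
    if (under_exposure_limit('sess42', 145, 2)) {
        record_exposure('sess42', 145);
        print "view $view: served\n";
    } else {
        print "view $view: skipped\n";
    }
}
# view 1: served
# view 2: served
# view 3: skipped
```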

Targeting Ads by Domain Name

DAD can also target ads by the host or domain name of the computer requesting the ad. An ad can be tagged with a regular expression to ensure that the ad only gets delivered to computers with a hostname that matches the regular expression.

Because doing DNS lookups can be slow, DAD uses subrequests and notes to minimize the number of lookups required. If there are multiple ads on the same page, the first ad will look up the hostname of the client and then store it in the notes table of the main request. Each subsequent ad on the same page, as a subrequest of the main request for the page, will check that notes field before actually doing a DNS lookup, ensuring that the DNS lookup happens only once per page, independent of how many ads there are on the page.
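
The lookup-once pattern can be sketched without Apache by letting a plain hash stand in for the main request's notes table and passing the resolver in as a code reference. The note name hostname, both function names, and the case-insensitive match are assumptions made for the example; inside mod_perl the hash accesses would be calls to $r->main->notes instead.

```perl
use strict;
use warnings;

# $notes stands in for the main request's notes table; $resolver is any
# code ref that maps an IP address to a hostname. The lookup runs only if
# no earlier ad on the page has already stored a result.
sub cached_hostname {
    my ($notes, $ip, $resolver) = @_;
    unless (defined $notes->{hostname}) {
        my $name = $resolver->($ip);
        $notes->{hostname} = defined $name ? $name : '';
    }
    return $notes->{hostname};
}

# An ad tagged with a regular expression is eligible only if the client's
# hostname matches it; an ad with no expression is not domain-targeted.
sub domain_matches {
    my ($hostname, $pattern) = @_;
    return 1 unless defined $pattern && length $pattern;
    return $hostname =~ /$pattern/i ? 1 : 0;
}

my $lookups  = 0;
my $resolver = sub { $lookups++; return 'dorm12.example.edu' };   # fake DNS
my %notes;   # shared by every ad insertion on the page

for my $ad (1 .. 3) {
    my $host = cached_hostname(\%notes, '192.0.2.7', $resolver);
    print "ad $ad eligible: ", domain_matches($host, '\.edu$'), "\n";
}
print "lookups: $lookups\n";   # lookups: 1
```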

Excluding Ads

DAD takes advantage of mod_perl's subrequest and notes mechanisms to provide a powerful feature: ad exclusion. DAD can prevent the same ad from being shown on one page more than once and can prevent ads from certain campaigns from appearing on the same page.

To ensure that ads on the same page are all different from each other, DAD is called with /exclude appended to its path for each ad:

     <!--#include virtual="/dad/ad/exclude?group=sports" -->
     <!--#include virtual="/dad/ad/exclude?group=sports" -->

Each virtual include is a subrequest of the request for the page that contains it. So, if it is in exclude mode, before it deals with any ads, DAD retrieves the contents of the main request's excludeID notes field:

    if ($fExclude) {
        unless ($r->is_main) {
            foreach $i (split(',',$r->main->notes("excludeID"))) {
                $excludeIDs{$i} = 1;
                $r->log_error("DAD: Excluding $i") if $DAD::config::DEBUG;
            }
        }
    }

then, when retrieving possible ads from the database, it ignores ads that have already been shown:

    while (($i) = $sth->fetchrow_array) {
	next if ($fExclude && $excludeIDs{$i});
	push(@ads, { id => $i });
	$r->log_error("DAD: Considering $i") if $DAD::config::DEBUG;
    }

and after delivering an ad, it adds that ad's id to the list of ads that should be excluded:

    if ($fExclude) {
        $excludeIDs{$which} = 1;
        unless ($r->is_main) {
            $r->main->notes("excludeID",join(',',keys %excludeIDs));
        }
    }

This technique is extended to exclude campaigns that are marked as enemies of each other. Frequently, we run ads from competitors simultaneously. AT&T and Sprint have run ads during the same month on Student.Com; so have Visa and American Express. Each is unhappy if their ad runs on the same page as their competitor's. When the Visa and American Express campaigns are marked as enemies of each other, DAD ensures that their ads never run together, whether or not it is given an /exclude flag. The procedure is similar to excluding single ads. First, the enemies list is retrieved from the main request's enemies notes field:

    # grab the enemies list out of the notes
    unless ($r->is_main) {
        foreach $i (split(',',$r->main->notes('enemies'))) {
            $enemies{$i} = 1;
            $r->log_error("DAD: Got Enemy $i") if $DAD::config::DEBUG;
        }
    }

and then, just before each ad's targeting dimensions are analyzed, a check is made to see if that ad belongs to any enemy campaigns ($ad_campaign is the campaign of the ad being considered):

    # is this ad in an enemy campaign?
    if ($enemies{$ad_campaign}) {
	$r->log_error("DAD:ENEMY $ads[$i]{id} in $ad_campaign") if $DAD::config::DEBUG;
	$ads[$i]{id} = -1;
    }

After an ad is delivered, its campaign is added to the main request's enemies list ($enemy_campaign is the campaign of the ad that was selected to be delivered):

    unless ($r->is_main) {
        $enemies{$enemy_campaign} = 1;
        my $enemies_list = join(',',keys %enemies);
        $r->log_error("DAD: Enemies list is [$enemies_list]") if $DAD::config::DEBUG;
        $r->main->notes('enemies',$enemies_list);
    }

Error Recovery

DAD has two levels of error recovery to ensure that as appropriate an ad as possible gets served if something goes wrong. Default ads are served in circumstances where targeting dimensions or remaining impressions make it impossible to fulfill an ad request properly. Fatal ads are served when DAD can't communicate with the database or there are other external errors.

Ads can be individually marked as default. If all ads in a given group have used up all their remaining daily impressions, then DAD tries to make a targeting decision based on the ads' total remaining impressions. If no ads in a group have any remaining impressions, then DAD will randomly select one of the ads in the group that are marked as default. This selection ignores any targeting parameters that the ads may have set. It is the last resort of DAD's targeting logic -- it just tries to serve up an ad in the requested group.

Fatal ads are defined in a hash in DAD itself. They can't be defined in a database or anywhere external because DAD uses them precisely when it can't get access to those external sources. Fatal ads are organized by group name. The code in fatal_ad() picks a fatal ad for a group if that group explicitly has one set. Next, it chooses a fatal ad based on some regex matches on group names. Finally, if nothing else is available, it selects a default fatal ad.
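
The three-step fallback could be sketched as follows. This is not the real fatal_ad(): the hash layout, the pattern list, and every image path and URL below are placeholders invented for the example. Only the order of the steps (explicit entry for the group, then regex match on the group name, then a default) comes from the description above.

```perl
use strict;
use warnings;

# Fatal ads keyed by group name. They live in the code itself so that they
# are still available when the database is not.
my %FATAL_ADS = (
    sports  => { img => '/images/fatal/sports.gif', url => '/sports/' },
    library => { img => '/images/fatal/books.gif',  url => '/library/' },
);

# Ordered [ pattern, key into %FATAL_ADS ] pairs, tried for groups that
# have no explicit entry of their own.
my @FATAL_PATTERNS = (
    [ qr/^sports/i,      'sports'  ],
    [ qr/book|library/i, 'library' ],
);

my $DEFAULT_FATAL_AD = { img => '/images/fatal/house.gif', url => '/' };

sub fatal_ad {
    my ($group) = @_;
    $group = '' unless defined $group;

    # 1. the group explicitly has a fatal ad set
    return $FATAL_ADS{$group} if exists $FATAL_ADS{$group};

    # 2. the group name matches one of the regexes
    for my $rule (@FATAL_PATTERNS) {
        my ($pattern, $key) = @$rule;
        return $FATAL_ADS{$key} if $group =~ $pattern;
    }

    # 3. nothing else is available
    return $DEFAULT_FATAL_AD;
}

print fatal_ad('sports')->{img},        "\n";   # /images/fatal/sports.gif
print fatal_ad('sports-hockey')->{img}, "\n";   # /images/fatal/sports.gif
print fatal_ad('entertainment')->{img}, "\n";   # /images/fatal/house.gif
```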

Logging

As important as a flexible system for deciding what ads to serve is a comprehensive log of what has been served. DAD logs information about ad impressions and clickthroughs in a table called events. When an ad is served, a record is inserted into the events table. If that ad is clicked on, its row in the events table is updated with the time of the click.

The events table looks like this:

CREATE TABLE events (
  id int(10) unsigned DEFAULT '0' NOT NULL auto_increment,
  ad int(10) unsigned DEFAULT '0' NOT NULL,
  gid int(10) unsigned DEFAULT '0' NOT NULL,
  page int(10) unsigned DEFAULT '0' NOT NULL,
  hit datetime DEFAULT '0000-00-00 00:00:00' NOT NULL,
  click datetime,
  partner int(10) unsigned DEFAULT '0' NOT NULL,
  local int(10) unsigned DEFAULT '0' NOT NULL,
  ip char(15) DEFAULT '' NOT NULL,
  usernum int(10) unsigned,
  status tinyint(3) unsigned,
  PRIMARY KEY (id)
);

id is the event id of an impression and its clickthrough, if any.

ad is the id of the ad that was served.

gid is the id of the group that the ad was selected from.

page is the id of the page that the ad was served out of. This is an index into another table of pages that links the ids, which are integers, to the URIs of the pages.

hit is the time of the ad impression. Because the rows in the events table are created when an ad impression is served, each row must have a value set in this column.

click is the time, if any, that this ad impression was clicked on. It is NULL if the ad wasn't clicked on.

partner and local are the Student.Com-specific values for different versions of our sites.

ip is the IP address of the computer that the ad impression was delivered to.

usernum is the user identifier returned by get_session_id() if is_valid_session() returns true; otherwise it is NULL.

status records whether this ad impression was delivered to an internal or external user. This is calculated based on the IP address of the request. We use this to eliminate ad impressions delivered on our staging server or to other internal hosts from reporting.
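
The two writes described in this section, an insert when the ad is served and an update when it is clicked, might be issued through DBI roughly as follows. This is a sketch under stated assumptions rather than DAD's own code: the function names and argument names are hypothetical, $dbh is assumed to be an already-connected DBI handle for a MySQL database (NOW() is MySQL's current-time function), and how the clickthrough handler learns the event id is not covered here.

```perl
use strict;
use warnings;

# Record an impression: one new row in events, with click left NULL.
# The column values are passed as name => value pairs.
sub log_impression {
    my ($dbh, %e) = @_;
    return $dbh->do(
        'INSERT INTO events (ad, gid, page, hit, partner, local, ip, usernum, status) '
      . 'VALUES (?, ?, ?, NOW(), ?, ?, ?, ?, ?)',
        undef,
        @e{qw(ad gid page partner local ip usernum status)},
    );
}

# Record a clickthrough: stamp the click time on the impression's row.
sub log_click {
    my ($dbh, $event_id) = @_;
    return $dbh->do(
        'UPDATE events SET click = NOW() WHERE id = ?',
        undef,
        $event_id,
    );
}
```

Passing usernum => undef for an anonymous session binds NULL, which matches the column description above. Placeholders are used so that values such as the IP address never need to be quoted by hand.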

Administration

The ad delivery and logging module is only part of DAD. Its other half is the administrative interface for managing ads and viewing statistics about their performance on the site.

Demonstrating and explaining the interface is difficult without a ton of screenshots, but the functionality can be broken down into "ad functions", "group functions" and "campaign functions".

Ad functions: You can search for ads by how many remaining impressions they have, or substrings in their clickthrough URL, alt text, or image src. You can create a new ad by filling in a form with information about the ad creative (image src, URL, alt text), run (start date, end date, impressions, groups and campaign), and targeting dimensions.

Group functions: You can list ads in a given group, create a new group, or delete an existing group.

Campaign functions: You can list members of an existing campaign, create a new campaign, edit or delete an existing campaign, and add or remove all members of a campaign to/from a given group.

A list of ads, whether a campaign, group, or search result, is presented as a summary list where each ad links to a more specific report about that ad.

The inline images and links have been removed from the HTML that follows.

A campaign list looks something like:


Campaign: houseads

Ad          Default?   Hits     Clicks   Rate
(removed)   No         72430    543      0.75 %
(removed)   No         68020    560      0.82 %
(removed)   No         51808    257      0.50 %
Total:                 600161   7751     1.29 %

and a detailed report for a specific ad:


Ad: 145
img src: http://www.student.com/images/234ads/234xword.gif
Clickthrough URL: http://www.student.net/xword/
alt text: The LA Times Crossword at WWW.Student.Net
Extra text: The LA Times Crossword at WWW.Student.Net
Run start date: 1997-09-02
Run end date: 1999-12-31
Daily Impressions: 230/296
Remaining Impressions: 181236/300000

Date Hits Clicks Rate
04/27 164 2 1.22 %
04/26 265 4 1.51 %
04/25 240 2 0.83 %
04/24 204 3 1.47 %
04/23 183 1 0.55 %
04/22 201 0 0.00 %
04/21 193 2 1.04 %
04/20 236 6 2.54 %
04/19 267 1 0.37 %
04/18 223 1 0.45 %
04/17 127 1 0.79 %
04/16 122 0 0.00 %
04/15 213 1 0.47 %
04/14 185 6 3.24 %
04/13 205 3 1.46 %
04/12 233 2 0.86 %
04/11 238 2 0.84 %
04/10 289 6 2.08 %
04/09 162 0 0.00 %
04/08 197 3 1.52 %
Total: 4147 46 1.11 %

Edit this Ad


"Edit this Ad" links to a form where settings for this ad can be changed:


Editing Ad 145
img src
Auto-size image? (.gif only)
Or enter image dimensions: width: height:
Clickthrough URL
Clickthrough Type: normal / map
alt text
Extra text
Run start date
Run end date
Impressions
Groups
Partners
Localities
Browser Restriction
Per-User Exposures
Domain Restrictions
Campaign
CPM
Cost Basis
Default Ad?

Components

The DAD functionality discussed so far is implemented in three packages: DAD::public, DAD::private, and DAD::config. DAD::public implements ad delivery. It is typically configured with something in httpd.conf like:

<Location /dad>
SetHandler perl-script
PerlHandler DAD::public
</Location>

DAD::private implements the administrative interface. It is typically configured on a server that is not publicly accessible with something in httpd.conf like:

<Location /dad-admin>
 SetHandler perl-script
 PerlHandler DAD::private
</Location>

DAD::config contains global settings and database connectivity information. It is used by the other modules.

DAD also provides automatic reporting to external clients. Organized on a per-campaign basis, each client gets a URL, username, and password where they have read-only access to reports about ads in their campaign. This is implemented by DAD::report and DAD::auth in httpd.conf as follows:

<Location /dad-report>
 SetHandler perl-script
 AuthName "DAD Reporting"
 AuthType Basic
 require valid-user
 PerlAuthenHandler DAD::auth
 PerlHandler DAD::report
</Location>

DAD also includes some utility programs.

Future Directions

DAD's immediate future development will focus on two things: cleaner code packaging and enhanced targeting abilities.

DAD was developed incrementally and specialized to specific needs at Student.Net Publishing: targeting dimensions, database connectivity, session management, etc. While it is not difficult to adapt it for use in other environments, it would be helpful if it were even easier to add targeting dimensions and their associated UI elements.

Additionally, the more tools in DAD's targeting toolbox, the more situations in which it will be useful. Advertisers never tire of wanting their ads to be served to specific audience slices — DAD is up to the task.