Introduction
At Student.Net, Perl serves millions of ad impressions every
month with DAD, a mod_perl application with advanced targeting
capabilities, a comprehensive administrative interface, and automatic
client reporting.
An ad server is a revenue-generating, business-critical
application. It may not surprise anyone at the Perl Conference to
learn that Perl functions robustly and admirably under those
conditions, but every so often, the rest of the world needs a
reminder. Mod_perl's architecture gives this critical application the
performance it needs.
Most comprehensive ad servers are very expensive and involve
not only large start-up fees, but ongoing "support"
contracts. Additionally, because they are proprietary commercial
products, one has no way of finding out why a bug exists and no
control over how long it will take to fix. Plus, while everyone wants
to tie their ad server to their user databases to have custom targeted
ads, no commercial product allows this without some
moderately-to-severely complicated C or Java programming. The cycle to
create this targeting of "code, compile-as-shared-library-or-class,
restart server, test, hunt down error, re-code" does not make for
very efficient ad server customization.
Fortunately, Perl can save the day. DAD is free, open-source,
customizable, and portable. It requires mod_perl and Apache, but if
embedded Perl interpreters emerge for other webservers, it shouldn't
be too difficult to use them with DAD. It runs on any of the billion
platforms that Perl does, and can use for its backend any of the
databases with DBI drivers. If you move your site from Linux and
MySQL to Solaris and Oracle you only need to change one line in DAD's
configuration.
From a technical perspective, because DAD is a mod_perl
application, it does some things better than CGI, and does other
things that CGI just can't do. Because CGI processes only live as long
as the request that created them does, they need to establish a new
database connection each time they are executed. Even on databases
with quick connection times, like MySQL, this can lead to a lot of TCP
sockets hanging around in TIME_WAIT. With a database that has a slower
connection startup time, this delay can make it impossible to serve a
large number of requests. Apache::DBI manages per-process DBI
connections that persist for the lifetime of an Apache child process,
eliminating the problems of establishing a new database connection for
each request.
DAD also takes advantage of Apache's request chain to make
targeting decisions about multiple ads on the same page. Each ad
insertion on a page is a subrequest of the main request for the
page. If called with an "exclude" flag, DAD can pass information
between individual ad insertions through the main request to ensure
that the same ad does not appear on the same page more than once. This
technique can also be applied to entire ad campaigns. Sometimes,
advertisers will demand that their ads not run on the same page as a
competitor's ads. If these two campaigns are marked as "enemies", DAD
will use this exclude functionality to prevent ads from the two
campaigns from appearing on the same page.
Mod_perl also gives DAD the speed of a compiled-in C module,
but with much greater customizability and extensibility. New targeting
dimensions are only a matter of a few lines of Perl and a column in a
table. The functions that DAD uses to identify users and sessions for
targeting are abstracted out from DAD's core and can be easily
changed. Plus, the rules that DAD uses to determine what ads to serve
in fatal conditions (e.g. loss of database connectivity) can be simply
altered and expanded.
Organizing Ads
To understand how DAD functions, it is necessary to understand how it
organizes ads. Each ad belongs to one campaign and zero or more
groups. Campaign is an advertiser-centric collection:
all ads from a given advertiser will normally be in the same
campaign. Group is a site-centric collection: all ads
that could appear in a particular spot on the site will normally be in
the same group.
When DAD is invoked from a page, it is passed a group name. The ad
selected will generally be chosen from that group. Multiple
invocations of DAD from the same page can request ads from the
same group or from different groups.
For example, if Citibank has five different creatives that they
want to run on your site, each of those ads would be in the
'citibank' campaign. If Citibank wants to run the ads on your
sports and your entertainment pages, you would put each of those
ads in both your sports and entertainment groups. On your sports
pages, you would ask DAD for ads from the sports group, and on
your entertainment pages, you would ask DAD for ads from the
entertainment group.
Delivering an ad
Ads are inserted into pages through a virtual subrequest. With
SSI, this looks like
<!--#include virtual="/dad/ad?group=groupname" -->
in PHP, this looks like
<?php virtual("/dad/ad?group=groupname"); ?>
Where /dad is the <Location> that DAD has been
assigned in the server's configuration.
This command will select an ad from those that belong to group
groupname and are eligible to be served. Eligibility means:
- The current time is after the ad's run start date and before the ad's run end date
- The ad has some remaining daily impressions
- The ad is not disqualified by any additional targeting dimensions (browser, domain name, etc.)
An ad is selected for delivery out of the pool of eligible ads with
probability equal to the ratio of its remaining daily impressions to
the total remaining daily impressions of all eligible ads.
Each ad's remaining daily impressions get recalculated once a day. An
ad's remaining daily impressions is set to the total number of
remaining impressions for the ad divided by the number of days
remaining in the ad's run. This value is used for weighted targeting
instead of just impressions remaining in the ad run to provide some
even distribution of an ad's delivery over its entire run.
An example: there are three ads in the library group. Ads 1
and 2 run from September 1 to December 1 and ad 3 runs from October 1
to November 1. Today is October 15 and at the time of recalculating
each ad's remaining daily impressions, Ad 1 has 90,000 total remaining
impressions, Ad 2 has 45,000 total remaining impressions and Ad 3 has
30,000 total remaining impressions.
Each ad's remaining daily impressions will get calculated as follows:
- Ad 1: 90,000 impressions / 45 days = 2,000 RDI
- Ad 2: 45,000 impressions / 45 days = 1,000 RDI
- Ad 3: 30,000 impressions / 15 days = 2,000 RDI
So when DAD is commanded to deliver an ad from group library, if no
other targeting dimensions are involved, 40% of the time it will
choose ad 1, 20% of the time it will choose ad 2, and 40% of the time
it will choose ad 3. Note that since each ad's RDI is decremented by 1
each time the ad is delivered, these ratios will fluctuate throughout
the day.
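The daily recalculation and the weighted choice can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: the field names (remaining, days_left, rdi) and the functions recalc_rdi() and pick_ad() are hypothetical, not DAD's own, and the guard against a zero-day run is an added assumption.

```perl
use strict;
use warnings;

# Hypothetical ad records mirroring the example: total remaining
# impressions and days left in each ad's run.
my @ads = (
    { id => 1, remaining => 90_000, days_left => 45 },
    { id => 2, remaining => 45_000, days_left => 45 },
    { id => 3, remaining => 30_000, days_left => 15 },
);

# Daily recalculation: RDI = total remaining impressions / days remaining.
sub recalc_rdi {
    my ($ad) = @_;
    # Assumption: treat a run with no days left as one day, to avoid dividing by zero.
    my $days = $ad->{days_left} > 0 ? $ad->{days_left} : 1;
    return int($ad->{remaining} / $days);
}

# Weighted pick: each ad is chosen with probability rdi / (sum of all rdi).
# $roll is optional; passing a number in [0, total) makes the pick repeatable.
sub pick_ad {
    my ($ads, $roll) = @_;
    my $total = 0;
    $total += $_->{rdi} for @$ads;
    return if $total <= 0;              # nothing left to serve today
    $roll = rand($total) unless defined $roll;
    for my $ad (@$ads) {
        return $ad if $roll < $ad->{rdi};
        $roll -= $ad->{rdi};
    }
    return $ads->[-1];
}

$_->{rdi} = recalc_rdi($_) for @ads;

my $chosen = pick_ad(\@ads);
$chosen->{rdi}--;    # delivering an ad uses up one of its daily impressions
print "serving ad $chosen->{id}\n";
```

Because the chosen ad's RDI is decremented on every delivery, the probabilities drift during the day, as noted above.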
Targeting Dimensions
In addition to weighting ads by how many impressions they have
remaining, DAD's ad selection process can be influenced by additional
factors, or dimensions. Before DAD makes its choice of ads
based on their remaining impressions it checks each ad against each
targeting dimension and makes an ad ineligible to be chosen if it has
the wrong value for a dimension.
Two of the targeting dimensions DAD has are Student.Com-specific:
PARTNER and LOCAL. We operate different versions of
our site for different content partners and for different
localities. The <VirtualHost> definitions for each of
these site versions set the environment variables PARTNER and
LOCAL. Ads can be targeted to appear only on one or more
versions of the site based on these variables. An ad's value for this
dimension is stored in the database as a 4-byte bitmask: if a bit is
turned on, the ad is eligible to be served for that
PARTNER or LOCAL version of the site. The following
code checks whether an ad is eligible for a given PARTNER
version of the site. ( is the ad's value for this
targeting dimension that has been retrieved from the database).
# DIMENSION: PARTNER
# unless the bit is set in or is 0, drop this ad
if () {
unless ( & ) {
("DAD:DIMENSION:PARTNER: {id} out ( v. )") if ::config::DEBUG;
{id} = -1;
}
}
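The variable names in the listing above did not survive, so here is a minimal, self-contained sketch of the same kind of bitmask test. Everything named here is an assumption for illustration: the %PARTNER_BIT table, the partner names, and partner_ok() are hypothetical, and treating a mask of 0 as "not targeted by PARTNER" is one reading of the comment in the listing.

```perl
use strict;
use warnings;

# Hypothetical bit assignments, one bit per PARTNER version of the site.
my %PARTNER_BIT = (
    main     => 1 << 0,
    partnerA => 1 << 1,
    partnerB => 1 << 2,
);

# Returns 1 if the ad may be served on the site version identified by $site_bit.
sub partner_ok {
    my ($ad_mask, $site_bit) = @_;
    return 1 unless $ad_mask;    # assumption: 0 means "no PARTNER targeting"
    # Add 0 to force numeric context; Perl's & works on strings
    # character by character if both operands are strings.
    return (($ad_mask + 0) & ($site_bit + 0)) ? 1 : 0;
}

my $ad_mask = $PARTNER_BIT{main} | $PARTNER_BIT{partnerB};
print "partnerA: ", partner_ok($ad_mask, $PARTNER_BIT{partnerA}), "\n";
print "partnerB: ", partner_ok($ad_mask, $PARTNER_BIT{partnerB}), "\n";
```

The numeric coercion matters in practice because values read from environment variables or fetched from a database often arrive as strings.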
Targeting by browser uses a helper function to parse User-Agent strings:
# DIMENSION: BROWSER
# Does this user have the right browser?
if () {
my = get_browser(('User-Agent'));
unless ( == ) {
("DAD:DIMENSION:BROWSER: {id} out ([] => vs. )") if ::config::DEBUG;
{id} = -1;
}
}
[ . . . ]
sub get_browser {
my = shift || return 'Other';
# these regexes are listed in order of probability of occurrence
# so that we do as few tests as possible
if (( =~ m{^Mozilla}i) && !( =~ m{\(compatible;}i)) { return 'Netscape'; }
if ( =~ m{\(compatible; MSIE}i) { return 'Microsoft'; }
if ( =~ m{^Microsoft Internet Explorer}i) { return 'Microsoft'; }
if ( =~ m{\(compatible; Opera}i) { return 'Opera'; }
return 'Other';
}
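For readers who want to try the browser test, the following is a runnable restatement of get_browser() using the regular expressions shown above. The variable name $ua and the sample User-Agent strings are assumptions added for illustration.

```perl
use strict;
use warnings;

# Classify a User-Agent string. The patterns are tried in the same order
# as in the listing above, most likely match first.
sub get_browser {
    my $ua = shift;
    return 'Other' unless defined $ua && length $ua;
    return 'Netscape'  if $ua =~ m{^Mozilla}i && $ua !~ m{\(compatible;}i;
    return 'Microsoft' if $ua =~ m{\(compatible; MSIE}i;
    return 'Microsoft' if $ua =~ m{^Microsoft Internet Explorer}i;
    return 'Opera'     if $ua =~ m{\(compatible; Opera}i;
    return 'Other';
}

# Illustrative User-Agent strings (not taken from DAD's logs).
for my $ua (
    'Mozilla/4.5 [en] (X11; I; Linux 2.2.5 i686)',
    'Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)',
    'Mozilla/4.0 (compatible; Opera 3.60; Windows 95)',
    'Lynx/2.8.1',
) {
    print get_browser($ua), "\n";
}
```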
After the targeting dimension checking has been completed, DAD makes an array of just the ads that are still eligible to be served:
# remove the inappropriate ads
my @goodAds = sort { my %c = %; my %d = %; {id} <=> {id};} @ads;
= 0; while ({id} < 0) {
("DAD: Skipping position with negative id") if ::config::DEBUG;
++;
}
@ads = splice(@goodAds,);
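The effect of this step, keeping only the ads whose id has not been set to -1, can also be sketched with grep. This is an alternative formulation for illustration, not the sort-and-splice code DAD uses, and the sample ids are made up.

```perl
use strict;
use warnings;

# Sample pool after dimension checking: disqualified ads have id -1.
my @ads = ( { id => 12 }, { id => -1 }, { id => 7 }, { id => -1 } );

# Keep only the ads that are still eligible to be served.
my @eligible = grep { $_->{id} >= 0 } @ads;

print join(',', map { $_->{id} } @eligible), "\n";    # prints "12,7"
```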
Limiting Users' Exposure To Ads
Clickthrough rates decline pretty steadily as a user sees the same
banner ad over and over again. Advertisers like to be able to
limit the number of times that a specific user will see a
specific banner. This requires more work by the advertiser --
they need to provide more different banners to your site -- but
it can increase clickthroughs. DAD can limit the number of times
a given user is exposed to a specific banner.
This capability depends on each user carrying around some kind of
session ID. At Student.Com, we use a cookie -- signed-in users
carry around an identifier, and users who are not signed in are assigned
an anonymous identifier. The method that you use to assign
identifiers to users is external to and separate from DAD. To
adapt DAD to your user identifier scheme, you need to modify the
functions get_session_id() and
is_valid_session(). get_session_id() is passed
the Request structure and needs to return the user's
session ID. You have access to everything in the request
structure -- cookies, headers, notes, etc. -- to tie into your
site's session ID scheme. is_valid_session() is passed
a session ID and returns 1 if this session ID represents an
actual, signed-in user, and 0 if this session ID represents an
anonymous user.
If an ad has an exposure limit set, DAD retrieves the current
session ID and attempts to find out how many times the current
session ID has seen each of the possible ads. If that value
exceeds the exposure count, the ad is marked as ineligible. When
DAD eventually picks the actual ad to serve, it increments the
session ID's count of times that ad has been viewed.
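A minimal sketch of this bookkeeping, assuming an in-memory hash in place of whatever store DAD actually uses. The names %views, over_limit(), record_view(), and the exposure_limit field are hypothetical. The sketch uses >= so that an ad is shown at most exposure_limit times; the exact comparison DAD makes may differ.

```perl
use strict;
use warnings;

# Hypothetical store: $views{$session_id}{$ad_id} = number of times seen.
my %views;

# True if this session has already seen the ad as often as its limit allows.
sub over_limit {
    my ($session_id, $ad) = @_;
    return 0 unless $ad->{exposure_limit};    # no limit set for this ad
    my $seen = $views{$session_id}{ $ad->{id} } || 0;
    return $seen >= $ad->{exposure_limit} ? 1 : 0;
}

# Called once the ad has actually been chosen and served.
sub record_view {
    my ($session_id, $ad) = @_;
    $views{$session_id}{ $ad->{id} }++;
    return;
}

my $ad = { id => 5, exposure_limit => 2 };
for my $request (1 .. 3) {
    if (over_limit('abc', $ad)) {
        print "request $request: ad 5 is ineligible\n";
    }
    else {
        record_view('abc', $ad);
        print "request $request: ad 5 served\n";
    }
}
```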
Targeting Ads by Domain Name
DAD can also target ads by the host or domain name of the computer
requesting the ad. An ad can be tagged with a regular expression
to ensure that the ad only gets delivered to computers with a
hostname that matches the regular expression.
Because doing DNS lookups can be slow, DAD uses subrequests and
notes to minimize the number of lookups required. If there are
multiple ads on the same page, the first ad will look up the
hostname of the client and then store it in the notes hash of the
main request. Each subsequent ad on the same page, as a
subrequest of the main request for the page, will check that
notes field before actually doing a DNS lookup, ensuring that
the DNS lookup happens only once per page, independent of how
many ads there are on the page.
Excluding Ads
DAD takes advantage of mod_perl's subrequest and notes mechanisms
to provide a powerful feature: ad exclusion. DAD can prevent the same
ad from being shown on one page more than once and can prevent ads
from certain campaigns from appearing on the same page.
To ensure that ads on the same page are all different from each
other, DAD is called with /exclude appended to its path for
each ad:
<!--#include virtual="/dad/ad/exclude?group=sports" -->
<!--#include virtual="/dad/ad/exclude?group=sports" -->
Each virtual include is a subrequest of the request for the page that
contains them. So, if it is in exclude mode, before it deals with any
ads, DAD retrieves the contents of the main request's
excludeID notes field:
if () {
unless () {
foreach (split(',',->notes("excludeID"))) {
= 1;
("DAD: Excluding ") if ::config::DEBUG;
}
}
}
then, when retrieving possible ads from the database, it ignores ads
that have already been shown:
while (() = ) {
next if ( && );
push(@ads, { id => });
("DAD: Considering ") if ::config::DEBUG;
}
and after delivering an ad, it adds that ad's id to the list of ads
that should be excluded:
if () {
= 1;
unless () {
->notes("excludeID",join(',',keys %excludeIDs));
}
}
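The round trip through the excludeID note can be sketched outside of Apache by letting a plain hash stand in for the main request's notes table. The hash %notes and the functions excluded_ids() and choose_ad() are hypothetical; under mod_perl the reads and writes would go through the main request's notes instead.

```perl
use strict;
use warnings;

# Stand-in for the main request's notes table (an assumption for this sketch).
my %notes;

# Parse the comma-separated excludeID note into a lookup hash.
sub excluded_ids {
    my $raw  = defined $notes{excludeID} ? $notes{excludeID} : '';
    my %seen = map { $_ => 1 } split /,/, $raw;
    return \%seen;
}

# Pick the first candidate not yet shown on this page, then record it
# so that later ad insertions on the same page skip it.
sub choose_ad {
    my (@candidates) = @_;
    my $seen = excluded_ids();
    my ($ad) = grep { !$seen->{$_} } @candidates;
    return unless defined $ad;
    $seen->{$ad} = 1;
    $notes{excludeID} = join ',', sort { $a <=> $b } keys %$seen;
    return $ad;
}

# Two ad insertions on the same page, both drawing from ads 3 and 5.
my $first  = choose_ad(3, 5);
my $second = choose_ad(3, 5);
print "first: $first, second: $second\n";    # prints "first: 3, second: 5"
```

Storing the list as a single comma-separated string keeps it compatible with a notes table, which holds only string values.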
This technique is extended to exclude campaigns that are marked as
enemies of each other. Frequently, we run ads from
competitors simultaneously. AT&T and Sprint have run ads during
the same month on Student.Com; so have Visa and American Express. Each
is unhappy if their ad runs on the same page as their competitor's. By
marking the Visa and American Express campaigns as enemies of
each other, DAD ensures that their ads never run together, whether or
not it is given an /exclude flag. The procedure is similar to
excluding single ads. First, the enemies list is retrieved from the
main request's enemies notes field:
# grab the enemies list out of the notes
unless () {
foreach (split(',',->notes('enemies'))) {
= 1;
("DAD: Got Enemy ") if ::config::DEBUG;
}
}
and then, just before each ad's targeting dimensions are analyzed, a
check is made to see if that ad belongs to any enemy campaigns
( is the campaign of the ad being considered):
# is this ad in an enemy campaign?
if () {
("DAD:ENEMY {id} in ") if ::config::DEBUG;
{id} = -1;
}
After an ad is delivered, its campaign is added to the main request's
enemies list ( is the campaign of the ad that
was selected to be delivered):
unless () {
= 1;
my = join(',',keys %enemies);
("DAD: Enemies list is []") if ::config::DEBUG;
->notes('enemies',);
}
Error Recovery
DAD has two levels of error recovery to ensure that as appropriate
an ad as possible gets served if something goes
wrong. Default ads are served in circumstances where
targeting dimensions or remaining impressions make it impossible
to fulfill an ad request properly. Fatal ads are served
when DAD can't communicate with the database or there are other
external errors.
Ads can be individually marked as default. If all ads in a
given group have used up all their remaining daily impressions,
then DAD tries to make a targeting decision based on the ads'
total remaining impressions. If no ads in a group have any
remaining impressions, then DAD will randomly select one of the
ads in the group that are marked as default. This
selection ignores any targeting parameters that the ads may
have set. It is DAD's targeting logic's last resort -- it just
tries to serve up an ad in the requested group.
Fatal ads are defined in a hash in DAD itself. They can't be
defined in a database or anywhere external because DAD uses them
precisely when it can't get access to those external
sources. Fatal ads are organized by group name. The code
in fatal_ad() picks a fatal ad for a group if that
group explicitly has one set. Next, it chooses a fatal ad based
on some regex matches on group names. Finally, if nothing else
is available, it selects a default fatal ad.
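The three-step fallback can be sketched as follows. The table contents, the pattern list, and the HTML fragments are hypothetical placeholders; only the order of the lookups (exact group, then regex match on the group name, then a default) comes from the description above.

```perl
use strict;
use warnings;

# Hypothetical fatal-ad table, kept in code so it works without the database.
my %FATAL = (
    sports  => '<a href="/sports/"><img src="/images/house-sports.gif" alt="Sports"></a>',
    DEFAULT => '<a href="/"><img src="/images/house.gif" alt="Home"></a>',
);

# Hypothetical pattern fallbacks, tried in order: [ regex, key into %FATAL ].
my @FATAL_PATTERNS = (
    [ qr/^sports/i, 'sports' ],
);

sub fatal_ad {
    my ($group) = @_;
    $group = '' unless defined $group;

    # 1. A fatal ad set explicitly for this group.
    return $FATAL{$group} if exists $FATAL{$group};

    # 2. A fatal ad chosen by matching the group name against patterns.
    for my $rule (@FATAL_PATTERNS) {
        my ($re, $key) = @$rule;
        return $FATAL{$key} if $group =~ $re;
    }

    # 3. The default fatal ad.
    return $FATAL{DEFAULT};
}

print fatal_ad('sports-hockey'), "\n";
```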
Logging
As important as a flexible system for deciding what ads to serve is
a comprehensive log of what has been served. DAD logs information
about ad impressions and clickthroughs in a table called
events. When an ad is served, a record is inserted into the
events table. If that ad is clicked on, its row in the events table is
updated with the time of the click.
Events looks like this:
CREATE TABLE events (
id int(10) unsigned DEFAULT '0' NOT NULL auto_increment,
ad int(10) unsigned DEFAULT '0' NOT NULL,
gid int(10) unsigned DEFAULT '0' NOT NULL,
page int(10) unsigned DEFAULT '0' NOT NULL,
hit datetime DEFAULT '0000-00-00 00:00:00' NOT NULL,
click datetime,
partner int(10) unsigned DEFAULT '0' NOT NULL,
local int(10) unsigned DEFAULT '0' NOT NULL,
ip char(15) DEFAULT '' NOT NULL,
usernum int(10) unsigned,
status tinyint(3) unsigned,
PRIMARY KEY (id)
);
id is the event id of an impression and
clickthrough. ad is the id of the ad that was
served. gid is the group id that the ad was selected
from. page is the id of the page that the ad was served out
of. This is an index into another table of pages that links the ids,
which are integers, to the URIs of the pages. hit is the time
of the ad impression. Because the rows in the events table are created
when an ad impression is served, each row must have a value set in
this column. click is the time, if any, that this ad
impression was clicked on. It is NULL if the ad wasn't
clicked on. partner and local are the
Student.Com-specific values for different versions of our
sites. ip is the ip address of the computer that the ad
impression was delivered to. usernum is the user identifier
returned by get_session_id() if is_valid_session()
returns true, otherwise it is NULL. status is
whether this ad impression was delivered to an internal or external
user. This is calculated based on the IP address of the request. We
use this to eliminate ad impressions delivered on our staging server
or to other internal hosts from reporting.
Administration
The ad delivery and logging module is only part of DAD. Its other half
is the administrative interface for managing ads and viewing
statistics about their performance on the site.
Demonstrating and explaining the interface is difficult without a ton
of screenshots, but the functionality can be broken down into "ad
functions", "group functions" and "campaign functions".
Ad functions: You can search for ads by how many remaining impressions
they have, or substrings in their clickthrough URL, alt text, or image
src. You can create a new ad by filling in a form with information
about the ad creative (image src, URL, alt text), run (start date, end
date, impressions, groups and campaign), and targeting dimensions.
Group functions: You can list ads in a given group, create a new
group, or delete an existing group.
Campaign functions: You can list members of an existing campaign,
create a new campaign, edit or delete an existing campaign, and add
or remove all members of a campaign to/from a given group.
A list of ads, whether a campaign, group, or search result, is
presented as a summary list where each ad links to a more specific
report about that ad.
The inline images and links have been removed from the HTML that
follows.
A campaign list looks something like:
Campaign: houseads

| Ad | Default? | Hits | Clicks | Rate |
|---|---|---|---|---|
| (image) | No | 72430 | 543 | 0.75 % |
| (image) | No | 68020 | 560 | 0.82 % |
| (image) | No | 51808 | 257 | 0.50 % |
| Total: | | 600161 | 7751 | 1.29 % |
and a detailed report for a specific ad:
Ad: 145

| img src | http://www.student.com/images/234ads/234xword.gif |
| Clickthrough URL | http://www.student.net/xword/ |
| alt text | The LA Times Crossword at WWW.Student.Net |
| Extra text | The LA Times Crossword at WWW.Student.Net |
| Run start date | 1997-09-02 |
| Run end date | 1999-12-31 |
| Daily Impressions | 230/296 |
| Remaining Impressions | 181236/300000 |

| Date | Hits | Clicks | Rate |
|---|---|---|---|
| 04/27 | 164 | 2 | 1.22 % |
| 04/26 | 265 | 4 | 1.51 % |
| 04/25 | 240 | 2 | 0.83 % |
| 04/24 | 204 | 3 | 1.47 % |
| 04/23 | 183 | 1 | 0.55 % |
| 04/22 | 201 | 0 | 0.00 % |
| 04/21 | 193 | 2 | 1.04 % |
| 04/20 | 236 | 6 | 2.54 % |
| 04/19 | 267 | 1 | 0.37 % |
| 04/18 | 223 | 1 | 0.45 % |
| 04/17 | 127 | 1 | 0.79 % |
| 04/16 | 122 | 0 | 0.00 % |
| 04/15 | 213 | 1 | 0.47 % |
| 04/14 | 185 | 6 | 3.24 % |
| 04/13 | 205 | 3 | 1.46 % |
| 04/12 | 233 | 2 | 0.86 % |
| 04/11 | 238 | 2 | 0.84 % |
| 04/10 | 289 | 6 | 2.08 % |
| 04/09 | 162 | 0 | 0.00 % |
| 04/08 | 197 | 3 | 1.52 % |
| Total: | 4147 | 46 | 1.11 % |