...composed of an indefinite, perhaps infinite number of hexagonal galleries...

© 1994-2015. David Sklar. All rights reserved.

Converting a PHP Extension to PHP 7

To dig under the hood of PHP 7 a bit, I decided to see how hard it would be to update my Boxwood extension to be compatible with PHP 7.

Boxwood is something I built at Ning a few years ago to do efficient multi-word replacement in text. We used it to “bleep out” naughty words that Network Creators didn’t want to see on their networks. By building a trie of all the words to replace, it’s able to do a single pass through the text that might contain naughtiness and make replacements speedily.

It’s not a terribly complicated PHP extension but it exercises a few Zend Extension API features such as function calls (obviously), parameter parsing and type checking, resources, module globals and hash table traversal.

Guided by the handy instructions at I forked ning/boxwood over to davidsklar/boxwood, created a php7 branch and got to work. You can see the complete diff here.

Total time to make all the changes was about 40 minutes, and that includes some unrelated-to-PHP-7 housekeeping to remove warnings that gcc didn’t care about when I originally released boxwood but clang/LLVM (what I’m using now) complains about.

The interesting changes are all in php_boxwood.c.

First, I had to change how the boxwood resource is handled. Boxwood uses a PHP resource to represent a collection of words to bleep. The resource is created with boxwood_new(), words get added to the resource by boxwood_add_text(), and then boxwood_replace_text() does the replacement. The resource type is now zend_resource (instead of zend_rsrc_list_entry) and there’s a new syntax for creating a resource. Additionally, to retrieve a resource from PHP’s internal resource list when it’s passed as an argument to a userspace function, I had to change to use the zend_fetch_resource() function instead of the ZEND_FETCH_RESOURCE() macro.

Next, there were some changes to string handling in function arguments and return values. Instead of receiving a string argument with s in the zend_parse_parameters() argument specifier string, I use S. (That’s a capital S instead of lowercase s.) This puts the passed-in string into a zend_string structure (instead of separate variables for the character data and length). The val member of the struct has the character data.

After that, I had to update the hash traversal in boxwood_replace_text(). The code became much simpler. Instead of tracking hash position myself and using a for loop with verbose increment and condition steps, I use the dainty ZEND_HASH_FOREACH_VAL() macro which conveniently iterates for me, plopping each hash value in a zval for my use.

The other little cleanup was due to the disappearance of IS_BOOL – the one place I used that I now have to test for IS_TRUE and IS_FALSE separately.

All in all an easy adventure. Next I think I’ll see how it goes getting it to work with HHVM’s ext_zend_compat.

xsane segfault on OS X Mavericks

In attempting to use xsane with my ancient and swell CanoScan LiDE 35 on my less ancient and swell MacBook Pro running OS X Mavericks, I was getting segfaults that lldb told me were happening in libcrypto:

(lldb) bt
* thread #1: tid = 0xbeb58, 0x00000001002548fd libcrypto.1.0.0.dylib`EVP_PKEY_CTX_free + 14, queue = '', stop reason = EXC_BAD_ACCESS (code=1, address=0x1000000000)
  * frame #0: 0x00000001002548fd libcrypto.1.0.0.dylib`EVP_PKEY_CTX_free + 14
    frame #1: 0x0000000100248a73 libcrypto.1.0.0.dylib`EVP_MD_CTX_cleanup + 127
    frame #2: 0x00000001001129cc libnetsnmp.25.dylib`sc_hash + 437
    frame #3: 0x0000000100110b93 libnetsnmp.25.dylib`hash_engineID + 92
    frame #4: 0x00000001001108d3 libnetsnmp.25.dylib`search_enginetime_list + 44
    frame #5: 0x0000000100110cb6 libnetsnmp.25.dylib`set_enginetime + 60
    frame #6: 0x000000010011051d libnetsnmp.25.dylib`init_snmpv3_post_config + 150
    frame #7: 0x0000000100113bbe libnetsnmp.25.dylib`snmp_call_callbacks + 480
    frame #8: 0x00000001035ddbdd`mc_network_discovery + 118
    frame #9: 0x00000001035da4b2`attach_one_config + 635
    frame #10: 0x00000001035d58aa`sanei_configure_attach + 169
    frame #11: 0x00000001035da02b`sane_magicolor_get_devices + 96
    frame #12: 0x000000010001e97d libsane.1.dylib`sane_dll_get_devices + 176
    frame #13: 0x0000000100003cbf scanimage`main + 1751
    frame #14: 0x00007fff90aac5c9 libdyld.dylib`start + 1
    frame #15: 0x00007fff90aac5c9 libdyld.dylib`start + 1

Various brew reinstall incantations and searching around for similar bugs/solutions proved unfruitful. However, grep -i snmp /usr/local/etc/sane.d/* turned up references in kodakaio.conf and magicolor.conf. Commenting out the net autodiscovery line in magicolor.conf solved the problem.

Yes, this is one of those “making a note of it so it gets indexed by a search engine and I (or others) can find it later” posts.

Dating Platform

Everyone who fancies themselves a club promoter / Yenta-in-training / Gladwellan connector should be operating their own dating app. Where is the platform for the turnkey creation of dating apps/web sites? It brings the tech, you bring the style, invites, and spark that makes the trendy bar succeed while the lame bar next door fails.

These guys have a solution for $2 (plus $150 if you don’t want to compile the source code yourself!) Anybody tried it?

David at the O'Reilly Software Architecture Conference

I’m excited to be giving a talk at the upcoming O’Reilly Software Architecture Conference this March in Boston. I’ll be speaking about “How To Talk To Non-Engineers”. Come learn how to make nice to business people, designers, product managers, and all those other alien species who seem to want the impossible!

Optimizing for Understanding

The Raft consensus algorithm, which I ran into via Cockroach, has an interesting design constraint:

After struggling with Paxos ourselves, we set out to find a new consensus algorithm that could provide a better foundation for system building and education. Our approach was unusual in that our primary goal was understandability: could we define a consensus algorithm for practical systems and describe it in a way that is significantly easier to learn than Paxos? Furthermore, we wanted the algorithm to facilitate the development of intuitions that are essential for system builders. It was important not just for the algorithm to work, but for it to be obvious why it works.

(From In Search of an Understandable Consensus Algorithm (Extended Version) by Diego Ongaro and John Ousterhout).

Why I Stopped Using Blue Mail (Type Mail)

I came across Blue Mail when I was looking for better-than-the-default email clients for my Android device. Pretty interface, handled my IMAP settings well, nifty turn-any-message-into-a-reminder functionality. (The name of the app has since been changed to “Type”.)

No support for aliases or identities was a bummer – the address I want my messages coming from does not exactly match the hostname of my mail server. But when I emailed the support folks about it, I got a quick reply about it coming soon.

Fast forward a few weeks when I was noodling around and wondering what sort of network traffic my phone was doing when I did routine tasks. I ran a tcpdump on my home router to capture some traffic and loaded it up into Wireshark to investigate.

Most of the traffic looked familiar: IMAP and SMTP to my mail servers, HTTP to some web hosts I browsed. But there was a connection to port 10101 on an address that resolved to an AWS host. The payload was garbled – probably TLS. What was it?

This handy StackExchange page gave me the info I needed to find out.

A quick brew install android-sdk and android update sdk --no-ui --filter 'platform-tools' later, I could fire up adb shell and grep for 2775 (hex of 10101 base 10) in /proc/*/net/tcp6 to find the culprit. Another StackExchange page helped me map the UID to the process name. Which was – Blue Mail.

I asked Blue Mail support why the client was connected to this host/port and they said “Blue Mail uses AWS currently for its proxy / push services, which are secured and encrypted.” Then I asked them if I could disable it by changing the “Push or Fetch” setting to “Fetch” in my account settings.

This is where things went off the rails a bit. Instead of saying “Yes” or “No, we need to do this for Blue Mail’s awesome features like storing reminders,” I got some enthusiastic but evasive responses about how a client-only solution can’t do things like send scheduled emails when my device is turned off and that “Blue Mail is a modern Email service that will feature dozens of such capabilities”.

I appreciate that the (anonymous) developers of this app have big plans for their service (and that they claim not to store my emails on their servers) but the combination of their evasive responses, no available information about who is actually developing the app (the domain is registered via Domains By Proxy), and an unknown amount of my info flowing to places I don’t control means no more Blue Mail for me.

(Update on February 27, 2015 to include the new name of the app.)

I'm Back!

As you can see by looking at the date on my most recent blog post before this one, it’s been a while. After eight years at Ning and then fifteen months taking a break, I’m excited to be jumping back into freelancing and consulting.

I’ll be focusing on helping clients with software engineering, distributed systems, and engineering culture problems. And perhaps writing something interesting here more than once every four years.

Fast Multiple String Replacement in PHP

At work, we added a language filter to Ning Pro last month. It lets Network Creators have naughty words (for the Network Creator’s definition of “naughty”) replaced with * characters.

A straightforward way to do this in PHP is to pass an array of words to look for and their replacements to a function like str_replace() or str_ireplace(). Or, similarly, use a regular expression that gloms the search terms together (and potentially checks word boundaries.) There are assorted WordPress plugins that work like this.

The problem with this approach is that it’s really slow. Especially if you have a lot of words you’re looking for. The amount of time it takes to do the search and replace grows in proportion to the number of words you’re looking for. This is particularly unfortunate because usually, none of the words are ever found!

For our language filter, we took a different approach. We’ve packaged it up into a PHP extension called Boxwood and are releasing it today as open source. (Find it on GitHub:

With Boxwood, you can have your list of search terms be as long as you like – the search and replace algorithm doesn’t get slower with more words on the list of words to look for. It works by building a trie of all the search terms and then scans your subject text just once, walking down elements of the trie and comparing them to characters in your text. It supports US-ASCII and UTF-8, case-sensitive or insensitive matching, and has some English-centric word boundary checking logic.

Take it for a drive and let us know what you think!

PHP Microbenchmarking

I just posted on the Ning code blog about the PHP microbenchmarking framework we released:

I'm pleased to announce the release of ub, a PHP microbenchmarking framework. You can download it from

The goal is to make it as easy as possible to compare the runtime of alternative approaches to the same problem, such as different regular expressions, or different methods for string or array manipulation.

The source distribution contains a README with some documentation and a bunch of sample benchmarks.

For normal use, it is rare that two similar, but different approaches produce appreciable differences in runtime. (Inefficient regexes and bloated call stacks aside.) The payoff from this kind of benchmarking is really on operations that happen hundreds or thousands of times in a request, or are happening on hundreds or thousands of servers. At that point, shaving off small amounts of runtime performance can really make a difference.

I am looking forward to beefing up the set of included benchmarks -- contributions are welcome!

ZendCon 2008: Static and Dynamic Analysis at Ning

I presented my Static and Dynamic Analysis at Ning talk today at the 2008 Zend/PHP Conference. The conference is much bigger this year than last, and it’s very exciting to see all the different things people are doing.

The slides from my talk are available at .