23 Mar 2015
To dig under the hood of PHP 7 a bit, I decided to see how hard it would be to update my Boxwood extension to be compatible with PHP 7.
Boxwood is something I built at Ning a few years ago to do efficient multi-word replacement in text. We used it to “bleep out” naughty words that Network Creators didn’t want to see on their networks. By building a trie of all the words to replace, it’s able to do a single pass through the text that might contain naughtiness and make replacements speedily.
It’s not a terribly complicated PHP extension but it exercises a few Zend Extension API features such as function calls (obviously), parameter parsing and type checking, resources, module globals and hash table traversal.
Guided by the handy instructions at https://wiki.php.net/phpng-upgrading I forked ning/boxwood over to davidsklar/boxwood, created a php7 branch and got to work. You can see the complete diff here.
Total time to make all the changes was about 40 minutes, and that includes some unrelated-to-PHP-7 housekeeping to remove warnings that gcc didn’t care about when I originally released boxwood but clang/LLVM (what I’m using now) complains about.
The interesting changes are all in php_boxwood.c.
First, I had to change how the boxwood resource is handled. Boxwood uses a PHP resource to represent a collection of words to bleep. The resource is created with boxwood_new(), words get added to the resource by boxwood_add_text(), and then boxwood_replace_text() does the replacement. The resource type is now zend_resource (instead of zend_rsrc_list_entry) and there’s a new syntax for creating a resource. Additionally, to retrieve a resource from PHP’s internal resource list when it’s passed as an argument to a userspace function, I had to switch to the zend_fetch_resource() function.
Next, there were some changes to string handling in function arguments and return values. Instead of receiving a string argument with s in the zend_parse_parameters() argument specifier string, I use S. (That’s a capital S instead of lowercase s.) This puts the passed-in string into a zend_string structure (instead of separate variables for the character data and length). The val member of the struct has the character data.
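To make the "one struct instead of two variables" idea concrete, here is a minimal standalone sketch. Note that demo_string is a hypothetical, simplified stand-in I made up for illustration; it is not the real zend_string definition (which carries more bookkeeping), and the helper names are assumptions too.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a length-carrying string; NOT the real zend_string. */
typedef struct demo_string {
    size_t len;   /* number of bytes in val, not counting the trailing NUL */
    char   val[]; /* character data stored inline, right after the header */
} demo_string;

/* Allocate a demo_string holding a copy of the given bytes. */
demo_string *demo_string_init(const char *bytes, size_t len) {
    demo_string *s = malloc(sizeof *s + len + 1);
    assert(s != NULL);
    s->len = len;
    memcpy(s->val, bytes, len);
    s->val[len] = '\0';
    return s;
}

/* Old style: the caller passes character data and length separately. */
size_t count_spaces_old(const char *data, size_t data_len) {
    size_t n = 0;
    for (size_t i = 0; i < data_len; i++) {
        if (data[i] == ' ') n++;
    }
    return n;
}

/* New style: one pointer carries both the data (val) and the length (len). */
size_t count_spaces_new(const demo_string *s) {
    return count_spaces_old(s->val, s->len);
}
```

The practical upshot is that functions take one argument where they used to take two, and the length can't drift out of sync with the data.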
After that, I had to update the hash traversal in boxwood_replace_text(). The code became much simpler. Instead of tracking hash position myself and using a for loop with verbose increment and condition steps, I use the dainty ZEND_HASH_FOREACH_VAL() macro which conveniently iterates for me, plopping each hash value in a zval for my use.
The other little cleanup was due to the disappearance of IS_BOOL: the one place I used it now has to test for the type constants that replaced it.
All in all an easy adventure. Next I think I’ll see how it goes getting it to work with HHVM’s ext_zend_compat.
20 Feb 2015
In attempting to use xsane with my ancient and swell CanoScan LiDE 35 on my less ancient and swell MacBook Pro running OS X Mavericks, I was getting segfaults that lldb told me were happening in libcrypto:
* thread #1: tid = 0xbeb58, 0x00000001002548fd libcrypto.1.0.0.dylib`EVP_PKEY_CTX_free + 14, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x1000000000)
* frame #0: 0x00000001002548fd libcrypto.1.0.0.dylib`EVP_PKEY_CTX_free + 14
frame #1: 0x0000000100248a73 libcrypto.1.0.0.dylib`EVP_MD_CTX_cleanup + 127
frame #2: 0x00000001001129cc libnetsnmp.25.dylib`sc_hash + 437
frame #3: 0x0000000100110b93 libnetsnmp.25.dylib`hash_engineID + 92
frame #4: 0x00000001001108d3 libnetsnmp.25.dylib`search_enginetime_list + 44
frame #5: 0x0000000100110cb6 libnetsnmp.25.dylib`set_enginetime + 60
frame #6: 0x000000010011051d libnetsnmp.25.dylib`init_snmpv3_post_config + 150
frame #7: 0x0000000100113bbe libnetsnmp.25.dylib`snmp_call_callbacks + 480
frame #8: 0x00000001035ddbdd libsane-magicolor.1.so`mc_network_discovery + 118
frame #9: 0x00000001035da4b2 libsane-magicolor.1.so`attach_one_config + 635
frame #10: 0x00000001035d58aa libsane-magicolor.1.so`sanei_configure_attach + 169
frame #11: 0x00000001035da02b libsane-magicolor.1.so`sane_magicolor_get_devices + 96
frame #12: 0x000000010001e97d libsane.1.dylib`sane_dll_get_devices + 176
frame #13: 0x0000000100003cbf scanimage`main + 1751
frame #14: 0x00007fff90aac5c9 libdyld.dylib`start + 1
frame #15: 0x00007fff90aac5c9 libdyld.dylib`start + 1
brew reinstall incantations and searching around for similar bugs/solutions proved unfruitful. However, grep -i snmp /usr/local/etc/sane.d/* turned up references in kodakaio.conf and magicolor.conf. Commenting out the net autodiscovery line in magicolor.conf solved the problem.
Yes, this is one of those “making a note of it so it gets indexed by a search engine and I (or others) can find it later” posts.
28 Jan 2015
Everyone who fancies themselves a club promoter / Yenta-in-training / Gladwellan connector should be operating their own dating app. Where is the platform for the turnkey creation of dating apps/web sites? It brings the tech, you bring the style, invites, and spark that makes the trendy bar succeed while the lame bar next door fails.
These guys have a solution for $2 (plus $150 if you don’t want to compile the source code yourself!) Anybody tried it?
05 Jan 2015
I’m excited to be giving a talk at the upcoming O’Reilly Software Architecture Conference this March in Boston. I’ll be speaking about “How To Talk To Non-Engineers”. Come learn how to make nice to business people, designers, product managers, and all those other alien species who seem to want the impossible!
07 Nov 2014
The Raft consensus algorithm, which I ran into via Cockroach, has an interesting design constraint:
After struggling with Paxos ourselves, we set out to find a new consensus algorithm that could provide a better foundation for system building and education. Our approach was unusual in that our primary goal was understandability: could we define a consensus algorithm for practical systems and describe it in a way that is significantly easier to learn than Paxos? Furthermore, we wanted the algorithm to facilitate the development of intuitions that are essential for system builders. It was important not just for the algorithm to work, but for it to be obvious why it works.
(From In Search of an Understandable Consensus Algorithm (Extended Version) by Diego Ongaro and John Ousterhout).
14 Oct 2014
I came across Blue Mail when I was looking for better-than-the-default email clients for my Android device. Pretty interface, handled my IMAP settings well, nifty turn-any-message-into-a-reminder functionality. (The name of the app has since been changed to “Type”.)
No support for aliases or identities was a bummer – the address I want my messages coming from does not exactly match the hostname of my mail server. But when I emailed the support folks about it, I got a quick reply about it coming soon.
Fast forward a few weeks when I was noodling around and wondering what sort of network traffic my phone was doing when I did routine tasks. I ran a tcpdump on my home router to capture some traffic and loaded it up into Wireshark to investigate.
Most of the traffic looked familiar: IMAP and SMTP to my mail servers, HTTP to some web hosts I browsed. But there was a connection to port 10101 on an address that resolved to an AWS host. The payload was garbled – probably TLS. What was it?
This handy StackExchange page gave me the info I needed to find out.
brew install android-sdk and android update sdk --no-ui --filter 'platform-tools' later, I could fire up adb shell and grep for 2775 (hex of 10101 base 10) in /proc/*/net/tcp6 to find the culprit. Another StackExchange page helped me map the UID to the process name. Which was com.trtf.blue – Blue Mail.
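For the curious, the lookup above can be sketched in C: turn the decimal port into the hex form that shows up in /proc/net/tcp6, then pull the remote port and owning UID out of one line. This assumes the usual Linux field layout (slot, local address:port, remote address:port, state, queues, timers, retransmits, uid); the function names are mine, not from any tool.

```c
#include <assert.h>
#include <stdio.h>

/* Format a decimal port as 4 uppercase hex digits, the form used in
 * /proc/net/tcp6 (e.g. 10101 becomes "2775"). */
void port_to_proc_hex(unsigned port, char out[5]) {
    snprintf(out, 5, "%04X", port & 0xFFFFu);
}

/* Parse one data line of /proc/net/tcp6 (assumed layout, see above).
 * Returns 1 and fills remote_port and uid on success, 0 otherwise. */
int parse_tcp6_line(const char *line, unsigned *remote_port, unsigned *uid) {
    unsigned local_port;
    assert(line != NULL && remote_port != NULL && uid != NULL);
    int n = sscanf(line,
                   " %*d: %*32[0-9A-Fa-f]:%x"  /* slot, local address:port  */
                   " %*32[0-9A-Fa-f]:%x"       /* remote address:port       */
                   " %*x %*x:%*x %*x:%*x %*x"  /* state, queues, timers     */
                   " %u",                      /* uid of the owning process */
                   &local_port, remote_port, uid);
    return n == 3;
}
```

Feed it each line of the file, compare remote_port against 10101, and the uid that comes back is the one to map to a package name.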
I asked Blue Mail support why the client was connected to this host/port and they said “Blue Mail uses AWS currently for its proxy / push services, which are secured and encrypted.” Then I asked them if I could disable that connection by changing the “Push or Fetch” setting to “Fetch” in my account settings.
This is where things went off the rails a bit. Instead of saying “Yes” or “No, we need to do this for Blue Mail’s awesome features like storing reminders,” I got some enthusiastic but evasive responses about how a client-only solution can’t do things like send scheduled emails when my device is turned off and that “Blue Mail is a modern Email service that will feature dozens of such capabilities”.
I appreciate that the (anonymous) developers of this app have big plans for their service (and that they claim not to store my emails on their servers) but the combination of their evasive responses, no available information about who is actually developing the app (the domain is registered via Domains By Proxy), and an unknown amount of my info flowing to places I don’t control means no more Blue Mail for me.
(Update on February 27, 2015 to include the new name of the app.)
01 Oct 2014
As you can see by looking at the date on my most recent blog post before this one, it’s been a while. After eight years at Ning and then fifteen months taking a break, I’m excited to be jumping back into freelancing and consulting.
I’ll be focusing on helping clients with software engineering, distributed systems, and engineering culture problems. And perhaps writing something interesting here more than once every four years.
29 Sep 2010
At work, we added a language filter to Ning Pro last month. It lets Network Creators have naughty words (for the Network Creator’s definition of “naughty”) replaced with * characters.
A straightforward way to do this in PHP is to pass an array of words to look for and their replacements to a function like str_replace() or str_ireplace(). Or, similarly, use a regular expression that gloms the search terms together (and potentially checks word boundaries.) There are assorted WordPress plugins that work like this.
The problem with this approach is that it’s really slow. Especially if you have a lot of words you’re looking for. The amount of time it takes to do the search and replace grows in proportion to the number of words you’re looking for. This is particularly unfortunate because usually, none of the words are ever found!
For our language filter, we took a different approach. We’ve packaged it up into a PHP extension called Boxwood and are releasing it today as open source. (Find it on github: http://github.com/ning/boxwood.)
With Boxwood, you can have your list of search terms be as long as you like – the search and replace algorithm doesn’t get slower with more words on the list of words to look for. It works by building a trie of all the search terms and then scans your subject text just once, walking down elements of the trie and comparing them to characters in your text. It supports US-ASCII and UTF-8, case-sensitive or insensitive matching, and has some English-centric word boundary checking logic.
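Here is a minimal sketch of the trie idea in plain C. It is not Boxwood's actual code: it works byte by byte, is case-sensitive, and skips the UTF-8 and word boundary handling entirely. All the names (trie_node, trie_add, trie_replace and so on) are made up for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One trie node: a child slot per possible byte value, plus an end-of-word flag. */
typedef struct trie_node {
    struct trie_node *child[256];
    int is_word;
} trie_node;

trie_node *trie_new(void) {
    trie_node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    return n;
}

/* Add one search term, creating nodes along its path as needed. */
void trie_add(trie_node *root, const char *word) {
    trie_node *n = root;
    for (const unsigned char *p = (const unsigned char *)word; *p; p++) {
        if (!n->child[*p]) n->child[*p] = trie_new();
        n = n->child[*p];
    }
    n->is_word = 1;
}

/* Length of the longest search term that starts at text, or 0 if none does. */
static size_t trie_match(const trie_node *root, const unsigned char *text) {
    const trie_node *n = root;
    size_t best = 0, i = 0;
    while (text[i] && (n = n->child[text[i]]) != NULL) {
        i++;
        if (n->is_word) best = i;
    }
    return best;
}

/* Scan text left to right, overwriting every matched term with mask, in place. */
void trie_replace(const trie_node *root, char *text, char mask) {
    unsigned char *p = (unsigned char *)text;
    while (*p) {
        size_t len = trie_match(root, p);
        if (len > 0) {
            memset(p, mask, len);
            p += len;
        } else {
            p++;
        }
    }
}

void trie_free(trie_node *n) {
    if (!n) return;
    for (int i = 0; i < 256; i++) trie_free(n->child[i]);
    free(n);
}
```

The work done at each position in the text depends on the length of the longest search term, not on how many terms there are, which is why a longer word list doesn't slow the scan down.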
Take it for a drive and let us know what you think!
04 May 2010
I just posted on the Ning code blog about the PHP microbenchmarking framework we released:
I'm pleased to announce the release of ub, a PHP microbenchmarking framework. You can download it from http://github.com/ning/ub.
The goal is to make it as easy as possible to compare the runtime of alternative approaches to the same problem, such as different regular expressions, or different methods for string or array manipulation.
The source distribution contains a README with some documentation and a bunch of sample benchmarks.
For normal use, it is rare that two similar, but different approaches produce appreciable differences in runtime. (Inefficient regexes and bloated call stacks aside.) The payoff from this kind of benchmarking is really on operations that happen hundreds or thousands of times in a request, or are happening on hundreds or thousands of servers. At that point, shaving off small amounts of runtime performance can really make a difference.
I am looking forward to beefing up the set of included benchmarks -- contributions are welcome!
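The core idea of a microbenchmark is small enough to sketch. ub itself is PHP and has its own conventions (see its README); what follows is just a rough C illustration of "run each alternative many times and compare elapsed time", with function names I invented for the purpose.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

typedef size_t (*bench_fn)(const char *);

/* Call fn on input the given number of times. Returns elapsed CPU seconds
 * and stores the last return value so the two approaches can be checked
 * against each other. */
double bench(bench_fn fn, const char *input, long iterations, size_t *last_result) {
    volatile size_t sink = 0; /* volatile so the loop isn't optimized away */
    assert(fn != NULL && last_result != NULL);
    clock_t start = clock();
    for (long i = 0; i < iterations; i++) {
        sink = fn(input);
    }
    clock_t end = clock();
    *last_result = sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Two alternative approaches to the same problem: string length. */
size_t len_manual(const char *s) {
    size_t n = 0;
    while (s[n]) n++;
    return n;
}

size_t len_library(const char *s) {
    return strlen(s);
}
```

Run both with a large iteration count, confirm they return the same answer, and compare the two timings. On a single call the difference is noise; it only starts to matter at the volumes described above.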
16 Sep 2008
I presented my Static and Dynamic Analysis at Ning talk today at the 2008 Zend/PHP Conference. The conference is much bigger this year than last, and it’s very exciting to see all the different things people are doing.
The slides from my talk are available at http://www.sklar.com/files/static-dynamic-analysis-zendcon-2008.pdf .