sklar.com

...composed of an indefinite, perhaps infinite number of hexagonal galleries...

© 1994-2017. David Sklar. All rights reserved.

Optimizing for Understanding

The Raft consensus algorithm, which I ran into via Cockroach, has an interesting design goal:

After struggling with Paxos ourselves, we set out to find a new consensus algorithm that could provide a better foundation for system building and education. Our approach was unusual in that our primary goal was understandability: could we define a consensus algorithm for practical systems and describe it in a way that is significantly easier to learn than Paxos? Furthermore, we wanted the algorithm to facilitate the development of intuitions that are essential for system builders. It was important not just for the algorithm to work, but for it to be obvious why it works.

(From "In Search of an Understandable Consensus Algorithm (Extended Version)" by Diego Ongaro and John Ousterhout.)

Why I Stopped Using Blue Mail (Type Mail)

I came across Blue Mail when I was looking for better-than-the-default email clients for my Android device. Pretty interface, handled my IMAP settings well, nifty turn-any-message-into-a-reminder functionality. (The name of the app has since been changed to “Type”.)

No support for aliases or identities was a bummer – the address I want my messages coming from does not exactly match the hostname of my mail server. But when I emailed the support folks about it, I got a quick reply saying the feature was coming soon.

Fast forward a few weeks, to when I was noodling around and wondering what network traffic my phone generated during routine tasks. I ran tcpdump on my home router to capture some traffic and loaded it into Wireshark to investigate.

Most of the traffic looked familiar: IMAP and SMTP to my mail servers, HTTP to some web hosts I browsed. But there was a connection to port 10101 on an address that resolved to an AWS host. The payload was garbled – probably TLS. What was it?

This handy StackExchange page gave me the info I needed to find out.

A quick brew install android-sdk and android update sdk --no-ui --filter 'platform-tools' later, I could fire up adb shell and grep for 2775 (10101 in hexadecimal) in /proc/*/net/tcp6 to find the culprit. Another StackExchange page helped me map the UID that owned the socket to a process name. It was com.trtf.blue: Blue Mail.
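Anyone who wants to repeat the lookup can use this minimal PHP sketch of what the grep is doing. It assumes the usual Linux /proc/net/tcp6 layout. That layout uses whitespace-separated columns: slot, local address, remote address, state, queues, timers, retransmits, UID, and so on. Each address is written as hex digits, a colon, and a four-digit hex port. The function names and the sample socket lines are made up for illustration. Mapping the resulting UID to a package name is not shown.

```php
<?php
// Convert a decimal port to the zero-padded, upper-case hex form used in /proc/net/tcp6.
function portToProcHex(int $port): string
{
    return sprintf('%04X', $port);
}

// Hypothetical helper: return the UIDs that own sockets whose *remote* port is $port.
// Assumes the usual /proc/net/tcp6 columns:
// sl, local_address, rem_address, st, tx_queue:rx_queue, tr:tm->when, retrnsmt, uid, ...
function findUidsForRemotePort(string $procContents, int $port): array
{
    $want = portToProcHex($port);
    $uids = [];
    foreach (explode("\n", $procContents) as $line) {
        $cols = preg_split('/\s+/', trim($line));
        // Skip the header line and blank or truncated lines.
        if (count($cols) < 8 || $cols[0] === 'sl') {
            continue;
        }
        // rem_address is "<32 hex digits of IPv6 address>:<4 hex digits of port>".
        [, $remotePort] = explode(':', $cols[2]);
        if ($remotePort === $want) {
            $uids[] = (int) $cols[7];
        }
    }
    return array_values(array_unique($uids));
}

// Made-up sample in the tcp6 format: one socket to port 0x2775, one to port 0x01BB (443).
$sample = <<<TXT
  sl  local_address                         remote_address                        st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0000000000000000FFFF0000DE01A8C0:B3A2 0000000000000000FFFF000022D8EA36:2775 01 00000000:00000000 00:00000000 00000000 10045        0 81234 1
   1: 0000000000000000FFFF0000DE01A8C0:C012 0000000000000000FFFF00001F0D4E4A:01BB 01 00000000:00000000 00:00000000 00000000 10012        0 81301 1
TXT;

echo implode(', ', findUidsForRemotePort($sample, 10101)), "\n"; // prints 10045
```

Matching on the remote-address column, rather than grepping whole lines, avoids false hits where 2775 happens to appear inside some other column.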

I asked Blue Mail support why the client was connected to this host/port, and they said “Blue Mail uses AWS currently for its proxy / push services, which are secured and encrypted.” Then I asked them if I could disable it by changing the “Push or Fetch” setting to “Fetch” in my account settings.

This is where things went off the rails a bit. Instead of saying “Yes” or “No, we need to do this for Blue Mail’s awesome features like storing reminders,” I got some enthusiastic but evasive responses about how a client-only solution can’t do things like send scheduled emails when my device is turned off and that “Blue Mail is a modern Email service that will feature dozens of such capabilities”.

I appreciate that the (anonymous) developers of this app have big plans for their service (and that they claim not to store my emails on their servers) but the combination of their evasive responses, no available information about who is actually developing the app (the domain is registered via Domains By Proxy), and an unknown amount of my info flowing to places I don’t control means no more Blue Mail for me.

(Update on February 27, 2015 to include the new name of the app.)

I'm Back!

As you can see by looking at the date on my most recent blog post before this one, it’s been a while. After eight years at Ning and then fifteen months taking a break, I’m excited to be jumping back into freelancing and consulting.

I’ll be focusing on helping clients with software engineering, distributed systems, and engineering culture problems. And perhaps writing something interesting here more than once every four years.

Fast Multiple String Replacement in PHP

At work, we added a language filter to Ning Pro last month. It lets Network Creators have naughty words (for the Network Creator’s definition of “naughty”) replaced with * characters.

A straightforward way to do this in PHP is to pass an array of words to look for, along with their replacements, to a function like str_replace() or str_ireplace(). A similar approach is to use a regular expression that gloms the search terms together (and potentially checks word boundaries). There are assorted WordPress plugins that work like this.
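Here is a small PHP sketch of both straightforward approaches. The word list and sample text are hypothetical, and the arrow functions need PHP 7.4 or later. It also shows why the word-boundary check matters.

```php
<?php
// Hypothetical word list; each replacement is a run of '*' as long as the word.
$words = ['darn', 'heck', 'fiddlesticks'];
$replacements = array_map(fn(string $w): string => str_repeat('*', strlen($w)), $words);

$text = 'Oh heck, I forgot the darn keys. Darnation!';

// 1. str_ireplace() with arrays: each search term is applied to the whole string in turn,
//    so the work grows with the number of terms.
$viaStr = str_ireplace($words, $replacements, $text);

// 2. One regex that gloms the terms together, with \b word-boundary checks.
$pattern = '/\b(' . implode('|', array_map(fn(string $w): string => preg_quote($w, '/'), $words)) . ')\b/i';
$viaRegex = preg_replace_callback(
    $pattern,
    fn(array $m): string => str_repeat('*', strlen($m[0])),
    $text
);

echo $viaStr, "\n";   // "Darnation" becomes "****ation": no boundary check
echo $viaRegex, "\n"; // "Darnation" survives thanks to \b
```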

The problem with these approaches is that they’re really slow, especially if you have a lot of words you’re looking for: the time it takes to do the search and replace grows in proportion to the number of words on the list. This is particularly unfortunate because usually none of the words are ever found!

For our language filter, we took a different approach. We’ve packaged it up into a PHP extension called Boxwood and are releasing it today as open source. (Find it on GitHub: http://github.com/ning/boxwood.)

With Boxwood, your list of search terms can be as long as you like: the search and replace algorithm doesn’t get slower as more words are added to the list. It works by building a trie of all the search terms and then scanning your subject text just once, walking down the elements of the trie and comparing them to characters in your text. It supports US-ASCII and UTF-8, case-sensitive or case-insensitive matching, and has some English-centric word boundary checking logic.
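Boxwood itself is a C extension, and its API isn't shown here. The sketch below is a rough, hypothetical illustration of the trie idea in plain PHP. It builds a trie from the search terms, then walks it from each position in the text, replacing the longest match with asterisks. It makes a few simplifying assumptions, noted in the comments: byte-level matching, ASCII-only case folding, and no word-boundary logic. Under those assumptions, the work per position is bounded by the length of the longest term rather than by how many terms are on the list.

```php
<?php
const TRIE_END = '__end__'; // marker key; cannot collide with a single-byte character key

// Build a trie (nested arrays keyed by byte) from lower-cased search terms.
function buildTrie(array $terms): array
{
    $root = [];
    foreach ($terms as $term) {
        $term = strtolower($term); // assumption: ASCII-only case folding
        if ($term === '') {
            continue; // an empty term would match everywhere
        }
        $node = &$root;
        for ($i = 0, $n = strlen($term); $i < $n; $i++) {
            $ch = $term[$i];
            if (!isset($node[$ch])) {
                $node[$ch] = [];
            }
            $node = &$node[$ch];
        }
        $node[TRIE_END] = true; // a complete term ends here
        unset($node);           // break the reference before the next term
    }
    return $root;
}

// Replace each longest match with '*' characters of the same length, case-insensitively.
function trieReplace(array $trie, string $text): string
{
    $lower = strtolower($text);
    $out = '';
    $n = strlen($text);
    $i = 0;
    while ($i < $n) {
        $node = $trie;
        $matchLen = 0;
        // Walk down the trie as far as the text allows, remembering the longest complete term.
        for ($j = $i; $j < $n && isset($node[$lower[$j]]); $j++) {
            $node = $node[$lower[$j]];
            if (isset($node[TRIE_END])) {
                $matchLen = $j - $i + 1;
            }
        }
        if ($matchLen > 0) {
            $out .= str_repeat('*', $matchLen);
            $i += $matchLen;
        } else {
            $out .= $text[$i]; // no term starts here; copy the original byte
            $i++;
        }
    }
    return $out;
}

$trie = buildTrie(['darn', 'heck', 'fiddlesticks']);
echo trieReplace($trie, 'Oh HECK, darn it'), "\n"; // prints "Oh ****, **** it"
```

This sketch restarts the walk at each position to keep things simple. The Aho-Corasick algorithm adds failure links to the trie so that all matches are found without restarting.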

Take it for a drive and let us know what you think!

PHP Microbenchmarking

I just posted on the Ning code blog about the PHP microbenchmarking framework we released:

I'm pleased to announce the release of ub, a PHP microbenchmarking framework. You can download it from http://github.com/ning/ub.

The goal is to make it as easy as possible to compare the runtime of alternative approaches to the same problem, such as different regular expressions, or different methods for string or array manipulation.

The source distribution contains a README with some documentation and a bunch of sample benchmarks.

For normal use, it is rare for two similar but different approaches to produce appreciable differences in runtime (inefficient regexes and bloated call stacks aside). The payoff from this kind of benchmarking really comes with operations that happen hundreds or thousands of times in a request, or on hundreds or thousands of servers. At that point, shaving off small amounts of runtime can really make a difference.

I am looking forward to beefing up the set of included benchmarks -- contributions are welcome!
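ub has its own README and sample benchmarks, and its API is not reproduced here. Here is a bare-bones PHP timing loop as a rough, hypothetical illustration of the kind of comparison it is built for. compareRuntimes() is an invented helper, and hrtime() needs PHP 7.3 or later.

```php
<?php
// Hypothetical helper, not ub's API: time $iterations calls of each callable
// and return average nanoseconds per call, keyed by label, fastest first.
function compareRuntimes(array $candidates, int $iterations = 100000): array
{
    $results = [];
    foreach ($candidates as $label => $fn) {
        $fn(); // warm-up call, not timed
        $start = hrtime(true);
        for ($i = 0; $i < $iterations; $i++) {
            $fn();
        }
        $results[$label] = (hrtime(true) - $start) / $iterations;
    }
    asort($results); // sort by time, keeping labels
    return $results;
}

$subject = str_repeat('hello world ', 100);
$results = compareRuntimes([
    'str_replace'  => fn() => str_replace('world', 'there', $subject),
    'preg_replace' => fn() => preg_replace('/world/', 'there', $subject),
], 10000);

foreach ($results as $label => $ns) {
    printf("%-14s %8.1f ns/call\n", $label, $ns);
}
```

Both candidates pay the same loop and closure-call overhead, so the difference between them is what matters. For steadier numbers, run the comparison several times on an otherwise idle machine.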

ZendCon 2008: Static and Dynamic Analysis at Ning

I presented my Static and Dynamic Analysis at Ning talk today at the 2008 Zend/PHP Conference. The conference is much bigger this year than last; it’s very exciting to see all the different things people are doing.

The slides from my talk are available at http://www.sklar.com/files/static-dynamic-analysis-zendcon-2008.pdf .

PHP + Emacs

Here are some links that are related to the Editor/IDE panel I’m on today at the 2008 DC PHP Conference:

"Little Bobby Tables" vs. the US Government

Chris often cites this xkcd cartoon in security talks, since it’s a) funny and b) a good example of SQL injection.

I was curious to see what sorts of shenanigans one can get away with in a legal name. I’m still waiting to hear back from the NYC agency that issues birth certificates, but here’s what the US Social Security Administration told me:

The maximum number of characters to be shown on the Social Security number (SSN) card for the first and middle name is 26; the maximum number of characters for the last name is 26. Full names will not be reduced to initials unless the combination of first and middle names exceeds 26 characters. The only acceptable characters are alphas, hyphens, and apostrophes. The SSN card will be printed as entered into the enumeration system, but the SSN record will not display the hyphens/apostrophes.

So with hyphens and apostrophes you might be able to get away with a little syntax error mischief, I suppose.
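Here is a hypothetical PHP sketch that makes the mischief concrete. It first checks a name against the character rules quoted above, treating "alphas" as ASCII letters (an assumption). It then shows what happens when such a name is pasted into a naively built query string. The table and column names are made up.

```php
<?php
// Check one name field against the SSA card rules quoted above:
// 1 to 26 characters, only letters ("alphas"), hyphens, and apostrophes.
// Assumption: "alphas" means ASCII A-Z and a-z.
function isSsaCardNamePart(string $name): bool
{
    return preg_match('/^[A-Za-z\'-]{1,26}\z/', $name) === 1;
}

// A last name that passes the SSA rules...
$lastName = "O'Brien--";
var_dump(isSsaCardNamePart($lastName)); // bool(true)

// ...dropped into a naively built query (hypothetical table and column).
$naiveSql = "SELECT * FROM students WHERE last_name = '$lastName'";
echo $naiveSql, "\n";
// The apostrophe closes the string literal early, leaving stray SQL and a syntax error.
```

Bobby's full payload would fail the SSA check, because parentheses, semicolons, and spaces are not among the allowed characters. But a lone apostrophe is enough to break a query that isn't using bound parameters (for example, via PDO::prepare() and PDOStatement::execute()).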

Turns out the SSA has really detailed public documentation of all their procedures.

Prolific Name-Sources

I had never heard of Haskell Curry before I read the preface to SICP just now, but it occurs to me that his distinction of having both of his names (first and last) turned into terms in the discipline he studied (Haskell the programming language, and currying the operation) is a bit like how Glenn Seaborg, working at LBL, could have a letter addressed to him using element names (seaborgium / lawrencium berkelium / californium / americium).

Who else can you think of that falls into this admittedly fuzzy-edged bucket?

Let a thousand string concatenations bloom

How many different ways (in PHP) are there to concatenate the string values in two variables and put the result in a third variable?

Here are a few to start:

  • $alice = $bob . $charlie;
  • $alice = "$bob$charlie";
  • $alice = sprintf('%s%s', $bob, $charlie);
  • ob_start(); echo $bob, $charlie; $alice = ob_get_clean();
  • $alice = implode('', array($bob,$charlie));
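Here is a quick PHP check that the five approaches above really do produce the same string. Wrapping each one in a closure is an arbitrary choice for this sketch. It also makes the approaches easy to time against each other.

```php
<?php
$bob = 'Hello, ';
$charlie = 'world';

// Each closure builds $bob followed by $charlie using one of the five approaches.
$approaches = [
    'dot operator'  => fn() => $bob . $charlie,
    'interpolation' => fn() => "$bob$charlie",
    'sprintf'       => fn() => sprintf('%s%s', $bob, $charlie),
    'output buffer' => function () use ($bob, $charlie) {
        ob_start();
        echo $bob, $charlie;
        return ob_get_clean();
    },
    'implode'       => fn() => implode('', array($bob, $charlie)),
];

foreach ($approaches as $label => $fn) {
    $alice = $fn();
    printf("%-13s => %s\n", $label, $alice);
}
```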

I would say something like “Of course, this entire exercise is just for fun and in practice is totally useless,” but whenever I start out thinking that, something actually productive eventually emerges. So perhaps this is totally useless, perhaps not!