How I Found A 55 Year Old Bug In The First Lunar Lander Game

Update: This kinda blew up! Featured in Hacker News, Ars Technica and PC Gamer, among others.

Just months after Neil Armstrong’s historic moonwalk, Jim Storer, a Lexington High School student in Massachusetts, wrote the first Lunar Landing game. By 1973, it had become “by far and away the single most popular computer game.” It’s a simple text game: you pilot a moon lander, aiming for a gentle touchdown on the moon. All motion is vertical, and you decide every 10 simulated seconds how much fuel to burn.

I recently explored the optimal fuel burn schedule to land as gently as possible and with maximum remaining fuel. Surprisingly, the theoretical best strategy didn’t work. The game falsely thinks the lander doesn’t touch down on the surface when in fact it does. Digging in, I was amazed by the sophisticated physics and numerical computing in the game. Eventually I found a bug: a missing “divide by two” that had seemingly gone unnoticed for nearly 55 years.

Landing with Maximum Fuel

To use the least fuel while landing, you need to land in the shortest time possible. Initially you maximize your speed by keeping your engine off, then at the last possible second you burn full throttle, reducing your speed to zero just as you touch the surface. The Kerbal Space Program community calls this a “suicide burn”, since getting the timing right is hard and it leaves no room for error.

With some trial and error and a bit of (manual) binary search, you can find the schedule that just barely has you landing. You burn nothing for 70 seconds, then 164.31426784 lbs/sec for the next 10 seconds, then the max 200 lbs/sec after that:

The game considers a perfect landing to be less than 1 MPH, but here we land at over 3.5 MPH and are told we “could be better.” Yet burn even 0.00000001 more lbs/sec, and you miss the surface entirely, ascending at 114 MPH:

How did we go from a hard landing to not landing at all, without a soft landing in between?

Physics Simulation: One Smart Kid

I expected to see the simple Euler integration that’s common in video games even today. That’s where you compute the forces at the start of each time step, then use F=ma to compute the acceleration, then assume the acceleration is constant over a time step of \Delta t seconds:

\Delta v = a \Delta t
\Delta x = v \Delta t + \frac{1}{2} a (\Delta t)^2
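
To make this concrete, here’s a minimal sketch of that kind of fixed-step update in Python. The names and values are mine, purely for illustration, and not taken from the game:

def euler_step(altitude, velocity, mass, burn, dt, g, exhaust_velocity):
    # One fixed time step, treating the acceleration as constant over dt.
    # velocity is the downward speed; burn is fuel consumed per second.
    thrust_accel = exhaust_velocity * burn / mass   # a = F/m, with F = burn * v_e
    accel = g - thrust_accel                        # net downward acceleration
    new_velocity = velocity + accel * dt
    new_altitude = altitude - (velocity * dt + 0.5 * accel * dt * dt)
    new_mass = mass - burn * dt
    return new_altitude, new_velocity, new_mass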

Because the mass is changing over the timestep, the acceleration will change too, so assuming it’s constant is only approximate. But surprisingly, Jim used the exact solution, the Tsiolkovsky rocket equation, with a Taylor series expansion for the logarithm. He also used some algebra to simplify it and reduce the amount of round off error. Very impressive for a high school senior in 1969. When I asked him about it:

“I was skilled at calculus at the time and familiar with concepts like a Taylor series, but also my recollection is that my father, who was a physicist, helped me in the derivation of the equations.” – Jim Storer, personal communication

The rocket equation is what makes the suicide burn optimal, and the five terms of the Taylor series he uses, with an argument of at most 0.1212, make it accurate to more than six decimal places. So that’s not the problem we’re looking for.
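
For reference, here’s the exact relationship the game is built on, written in my notation but with the same variable names that show up in the code’s formulas below: M is the mass, K the burn rate, Z the exhaust velocity, and Q = Kt/M is the fraction of the mass burned t seconds into the step. The velocity change due to thrust comes from the rocket equation, and it’s the logarithm that gets replaced by its Taylor series:

\Delta v_{thrust} = Z \ln \dfrac{M}{M - Kt} = -Z \ln(1 - Q)

\ln(1 - Q) = -\left( Q + \dfrac{Q^2}{2} + \dfrac{Q^3}{3} + \dfrac{Q^4}{4} + \dfrac{Q^5}{5} + \cdots \right)

With Q at most 0.1212, the terms dropped after Q^5/5 sum to less than 10^{-6}, which is where the six-decimal-place accuracy comes from.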

Assumptions Go Out The Window When We Hit The Ground

The rocket equation works well until you hit the ground. In general, collisions between solid objects are a really hard part of dynamics engines, and what separates the great ones from the good ones, as I discovered when contributing to Open Dynamics Engine as a postdoc at MIT.

And this is where the Lunar Landing Game faces its biggest challenge. Imagine the lander descending at the start of a 10-second turn but ascending by the end. Simply verifying that it’s above the surface at both points isn’t enough. It might have dipped below the surface somewhere in between. When this happens, the program has to rewind and examine an earlier moment.

An obvious place to look is the lowest point of the trajectory, where the velocity is zero. For the rocket equation, it turns out, there’s no closed-form expression for that lowest point that involves only basic mathematical functions.1 So instead, we can use the physicist’s favourite technique and take only the first few terms of the Taylor series. If you use only the first two terms of the logarithm, the problem simplifies to a quadratic equation and you can use the good old quadratic formula from high school. And the approximation should be pretty good over the span of a 10-second turn, accurate to within 0.1% or so.
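
To sketch what that looks like (my derivation, using the same variable names as the formula quoted below: V is the downward speed at the start of the burn, G gravity, Z the exhaust velocity, K the burn rate and M the mass), the downward speed t seconds into the burn is

v(t) = V + Gt + Z \ln \left( 1 - \dfrac{Kt}{M} \right)

Keeping only the first two terms of the logarithm and setting v(t) = 0 gives a quadratic in t,

\dfrac{ZK^2}{2M^2} t^2 + \left( \dfrac{ZK}{M} - G \right) t - V = 0

whose positive root, straight from the quadratic formula, is the time of the lowest point.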

But that’s not what Jim did. His formula has a square root in the denominator, not the numerator. And its error was 30 times bigger.

How To Know When You’ve Hit Rock Bottom

What could he possibly be doing? I stared at this for a long time, wracking my brain for any other approach to approximating the bottom of the trajectory that would still only use addition, subtraction, multiplication, division and square root. Taking only the first term of the logarithm would give an approximation worse than the quadratic, but wouldn’t involve a square root. Take a third term and we need to solve a cubic, which in general would need a lot more code, and in our case it doesn’t seem to be of any special form that has a simple solution2. There are many approximations to \log(x), but the non-Taylor ones involve advanced functions like x^x that are hard to invert.

Until I looked a little more closely at his square root. It’s of the form:

\sqrt{ \left( \dfrac{b}{2} \right) ^2 - a c}

Which looks an awful lot like the quadratic formula where we’ve divided by 4 inside the square root. It has to be related. But why is it in the denominator? Did he find a quadratic in 1/t? No, because t can be very close to zero, so his formula would need to be accurate over a wide range of very large values, and a quadratic isn’t good for that. Did he make a quadratic approximation to the log, then substitute T=1/t, solve for T, then substitute back? Playing around with that, I re-discovered an alternate form of the quadratic formula with the square root on the bottom! And indeed, this matches the formula in Jim’s code.
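
For reference, here are the two forms side by side. Multiplying the numerator and denominator of the familiar formula by -b \mp \sqrt{b^2 - 4ac} (or, equivalently, treating the equation as a quadratic in 1/x, solving, and flipping back) gives

x = \dfrac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \dfrac{2c}{-b \mp \sqrt{b^2 - 4ac}}

with the square root moved to the denominator.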

And this alternate form has a key advantage. Once we discover that we’ve hit the ground, we need to go back and find the time when we hit the ground. And we do the same thing, truncating the Taylor series to give us a quadratic. In that case, the original form has a divide by zero when the leading coefficient is zero. That happens when the thrust from the rocket engine exactly balances the force of gravity. And that’s probably common for many players, who hover over the surface and gently descend. And they don’t need to be exactly equal. If the thrust is close to the force of gravity, you can get catastrophic cancellation in the numerator, and then the tiny denominator blows up your errors. This alternate form is much more numerically stable, and in fact works just fine when the quadratic term is zero, so we actually have a linear equation and not a quadratic. It’s amazing to me that Jim would either somehow re-derive it, or learn it somewhere. I don’t think it’s taught in textbooks outside numerical computing, and I don’t think it’s even common in physics.

Let’s Double Check The Derivation

But if his formula is equivalent, then why is the approximation error 30 times higher? Deriving the formula ourselves, we get:

\text{Let } W \equiv \dfrac{1 - \frac{MG}{ZK}}{2}

t=\dfrac{MV}{ZK \left( W+\sqrt{W^2 + \dfrac{V}{2Z}} \right) }

Which is identical to Jim’s code, except … he’s missing the 2 in the denominator inside the square root! It was probably a simple error, either when deriving the formula or typing it into the computer. After all, the computer algebra system MACSYMA had only started a year before, and wouldn’t be available at a high school, so he would have had to do everything using pencil and paper.

With this bug, he consistently underestimates the time to the lowest point. He compensates for this in two ways: adding 0.05 sec, and then re-estimating from his new, closer location. And this explains why it misses the time of landing: the first estimate comes while the lander is above the surface and still descending, and the second comes after it has reached the bottom and is ascending again, which takes less than 0.05 sec.

If we add the missing factor of two and remove the 0.05, what happens? Now the best we can do with a suicide burn is:

Our velocity is down to 1.66 MPH, almost three quarters of the way to the perfect landing at 1 MPH. It’s not perfect because we’re still only using the first two terms of the Taylor series. Also, once you’ve decided your lowest point is under the surface, you then need to find the time when you first hit the surface, which involves a similar approximation. Another iteration would help, although with the bug fixed we overestimate the time, so we’d need to go back in time, which might mean we have to pick the other solution to the quadratic. You could simplify that by using only a single term of the Taylor series, which is what’s done in Newton’s method. You could then stop when the magnitude of the velocity is below some threshold, and use the altitude there to decide whether or not you landed. But this is all more work, and the game was fun to play as it is, so why complicate the code?
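
As a rough sketch of what that refinement could look like (purely illustrative: downward_speed_at, accel_at and altitude_at are hypothetical functions evaluating the exact rocket-equation trajectory within the current turn, not anything from the original code):

def altitude_at_lowest_point(downward_speed_at, accel_at, altitude_at,
                             t_start=0.0, tol=1e-4, max_iters=20):
    # Newton's method for the time when the downward speed reaches zero.
    # Each step keeps a single term of the Taylor series: it treats the
    # speed as changing linearly at the current acceleration.
    t = t_start
    for _ in range(max_iters):
        v = downward_speed_at(t)
        if abs(v) < tol:           # close enough to the bottom of the arc
            break
        t = t - v / accel_at(t)    # solve v(t) + a(t) * dt = 0 for dt
    return altitude_at(t)          # below zero means we touched down this turn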

It’s also possible to land gently: you just need to end your 14th turn with a low altitude and velocity, and then use low thrust in your 15th turn, landing somewhere after 150 seconds. It’s just the theoretical full-thrust-on-landing suicide burn, which takes around 148 seconds, that eludes us.

CAPCOM We’re Go For Powered Descent

Overall, this is very impressive work for an 18-year-old high school student in 1969 on a PDP-8. This was long before Computer Science was taught in high school. The numerical computing aspects, such as iteratively improving the estimate using Newton’s method or worrying about catastrophic cancellation, weren’t well known back then. I didn’t learn them until I was studying for a Ph.D. in robotics.

It is also surprising that, as far as I can tell, this bug has been there for almost 55 years and nobody has noticed it before. That’s probably because, even with the bug, it was a fun game, both difficult and possible to land gently.  The quest to not just win, but find the optimal strategy, can certainly lead to trying to understand small discrepancies. I suspect everybody else was just happy to play the game and have fun.

  1. It needs the Lambert W, the inverse of x e^x ↩︎
  2. For example, ax^3-bx^2+bx+d=0, where the 2nd and 3rd coefficients have the same magnitude but different sign, can be factored in the form a(x-r)^3 + s = 0, which has solution x = r - \sqrt[3]{s/a}. ↩︎

Easy Quantum Entanglement: Simplest Description Of A Classically Impossible Result

Entanglement is perhaps the purest form of the strangeness of quantum mechanics. Bell’s Theorem is perhaps the purest demonstration of this, producing a result that is squarely at odds with classical notions of reality. Below is the simplest description of quantum entanglement and Bell’s theorem that I’ve come across, and I haven’t seen it described in blog form anywhere, so I thought I’d share. And it doesn’t require any physics knowledge at all. As Feynman may have said, “If you think you understand quantum mechanics, you don’t understand quantum mechanics.” This is my best attempt at explaining why.1

The Experiment

Three people, creatively named 1, 2 and 3, are a few light-minutes from each other and from a central source.

Each person has a measuring device. It has a knob that they can switch between A and B, and a pair of lights, labeled + and -. Once a minute, something from the source is sent to all three people. Of course, it takes several minutes to get to them. Before it gets there, each person independently sets their knob to either A or B at random. When the thing arrives, the device measures it and one of the two lights turns on. The person then records both the knob position (A or B) and which light lit up (+ or -).

How do they set it “at random”? They could just guess a random A or B each time2. Or they could flip a coin, and if it lands heads they pick A. Or they could use a random number generator in a computer. Or they could use a little circuit that switches between A and B once every nanosecond, and stops when they press a button. Nobody can time their presses to within a nanosecond, so the result is effectively random. Or you could use everybody’s favorite random process, and get a radioactive sample that has a 50% chance of decaying in, say, any given 10 second period: choose a 10 second period, and if it decays in that time, set it to A; if not, B. Or crowd-source the choices, by having a website where people can vote A vs B and taking the one that gets the most votes. Or use a telescope to observe a distant star that flickers randomly, whose light left the star millions or billions of years ago.

And so, once a minute, they make an entry in their log, until they have thousands of entries. Then they bring the entries together to compare them, looking something like this:

Time     Person 1   Person 2   Person 3
12:00    A+         B-         B-
12:01    A+         A-         B-
12:02    B+         B+         A+
12:03    A-         B+         B-

Then they analyze the results. Each person notices that when the knob is set to A, they get a plus 50% of the time, and therefore a minus the other 50%. Same when set to B. And when you compare any two people’s results, say person 1 and person 3, no matter what their knob settings, you see every combination of readings equally often. That is, whether the two knobs were set to AA, AB, BA or BB, you get ++, +-, -+ and -- each 25% of the time.

So far, this isn’t very interesting. The device could be ignoring whatever came from the source, ignoring the knob and just returning some arbitrary result. You don’t need quantum mechanics to explain that. However, when one observer chooses A, and the other two choose B, you notice a pattern: there are either one or three plusses. There are never two plusses, and never all three minuses. So the only observations we see when, say, person 1’s knob is A and the other two are B, are:

A+ B+ B+
A+ B- B-
A- B+ B-
A- B- B+

And each is equally likely, happening 25% of the time. So when 1 observes A+, and 2 observes B+, then we know that if 3 is set to B the result will be +. Even if 3 is set to A, we know that if it had been set to B, we would have gotten +.

In fact, if person 1 measures A and gets +, then we know that 2 and 3’s B measurements would be the same, either both + or both -. Similarly, if person 2 measures A and gets +, then we also know that 1 and 3’s B measurements would be the same. So, in minutes when persons 1 and 2 both measure A and get +, we know that 2’s B must be the same as 3’s B, which must be the same as 1’s B. So we can conclude that all 3 B’s must be the same, which means 3’s A measurement must be +. Just look at the list above.

So we have a prediction: in the minutes when all three knobs were set to A, and the first two both measured +, then the third one must measure +! So we race to the data and what do we see? The exact opposite! Whenever the first two are A+, A+, and the last one is A, it’s always A-! Every. Single. Time.3
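
For readers who already know some quantum mechanics (see footnote 3 for the state being sent), both claims can be checked numerically. Here’s a small sketch of mine, not part of the original argument, that computes the expectation value of the product of the three ±1 results for each combination of knob settings:

import numpy as np

# Knob A measures sigma_x, knob B measures sigma_y (see footnote 3).
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)

up = np.array([1, 0], dtype=complex)
down = np.array([0, 1], dtype=complex)

def kron3(a, b, c):
    return np.kron(np.kron(a, b), c)

# The three-particle state |up up up> - |down down down>, normalized.
psi = (kron3(up, up, up) - kron3(down, down, down)) / np.sqrt(2)

def product_expectation(op1, op2, op3):
    # Expectation value of the product of the three +/-1 results.
    return np.real(psi.conj() @ kron3(op1, op2, op3) @ psi)

print(product_expectation(sx, sy, sy))  # +1: one A and two Bs give one or three plusses
print(product_expectation(sy, sx, sy))  # +1
print(product_expectation(sy, sy, sx))  # +1
print(product_expectation(sx, sx, sx))  # -1: three As can never give three plusses

The first three print +1 and the last prints -1, which is exactly the pattern in the logs.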

So what’s going on under the hood?

If you think about it, this is quite strange. Are all three Bs the same when we don’t measure them? If so, then how could the third A be -? If not, then how does person 3’s device know what’s going on with 1 and 2’s devices?

There are a couple possibilities:

  1. Local Realism: The three particles decide, before they leave the source, which light they’ll trigger for both A and B. For example, perhaps the particle going to 1 is in a state where, when the knob is A it gives +, and for B gives -. We can write that as A+B-. Same for the other two. The problem is, our observations above have ruled that out.
  2. Superdeterminism: Maybe the source somehow knew how the knobs would be set, even though they weren’t set until several minutes after the particles left. Perhaps the source knew some previous state of the three people and their devices, and could 100% accurately simulate everything, including the psychology of a person who chooses A vs B, the biology and physics of a coin flip, the psychology of the crowd during the crowd sourcing, and the light that left the star millions of years ago and hasn’t reached the source yet. Or perhaps the source is deterministic too, and the entire universe has been scripted down to the tiniest detail, including what the knobs will be set to and where the particles will end up. And it’s not just deterministic, but coordinated, so that the particle that leaves the source is somehow in just the right state for whatever the switch will be set to. Seems unlikely.
  3. Spooky action at a distance: Perhaps, when the first person makes a measurement, some influence travels instantly to the other two particles, changing their state. One problem with this is, thanks to Einstein’s special theory of relativity, some other observer would see the influence happen backwards in time. In other words, the properties of the other two would be set before the measurement, in some reference frames.
  4. It Simply Is: Quantum Mechanics rejects all three of the above. The particles aren’t in any definite state, of either A or B, until they are measured. The act of measurement, including which property A or B is being measured, affects the system as a whole.

So faster than light communication then?

Unfortunately no. We can’t use this to communicate information. If person 1 measures along A and gets +, that means 2 and 3 will get the same along B. If person 1 gets -, they’ll get the opposite. But even if 2 and 3 move their devices next to each other and compare notes in real time, they can’t tell what setting person 1’s knob was set to. In fact, even whether or not person 1 makes a measurement can’t be determined by just looking at person 2 or 3. And this isn’t just in our experiment, it’s a general theorem of quantum mechanics, called the No-communication theorem.

Digging Deeper

The strangeness of quantum mechanics comes from combining three things:

  1. That states can be modeled with probability amplitudes, which are complex numbers. When a particle can be in multiple states at once, called superposition, we add the probability amplitudes, resulting in interference. The Born rule states that the probability of observing the particle in a given state is the square of the magnitude of the probability amplitude. A great demonstration of this is the double slit experiment with electrons.
  2. Complementarity, which includes Heisenberg’s uncertainty principle. Mathematically, that some pairs of observables fail to commute.
  3. Entanglement. Superposition where each state in the superposition is a multi-dimensional state, so that the state of one dimension depends on the state of another dimension.

Footnotes

  1. It’s a slight simplification of Sidney Coleman’s description in Quantum Mechanics In Your Face (and the transcript). I’m very grateful to Mark Weitzman for pointing me to it, and in general for all his help in answering questions during my journey so far in quantum mechanics. ↩︎
  2. People are actually bad at producing random sequences this way. For example, people tend to alternate choices more often than the 50% of time you’d get if each were truly random. But for our purposes we don’t need it to be completely random. If you don’t like the idea of each person trying to choose randomly, just ignore that and focus on one of the other methods. ↩︎
  3. For those who do know quantum mechanics, the source is sending three spin one-half particles in the state | \uparrow \uparrow \uparrow \rangle_z  - | \downarrow \downarrow \downarrow \rangle_z and A and B correspond to \hat{\sigma}_x and \hat{\sigma}_y. ↩︎

Deciphering my Dead Mother’s Cipher

My mom passed away in 2020, and in cleaning out her house I found her old diaries. Seventy five years ago, in 1948 at the age of 15, she wrote:

Ah, a puzzle. I like a good puzzle.

Some quick googling revealed it was a pigpen cipher. That’s where you take a 3×3 grid, like a Tic Tac Toe board, put a letter in each square, then use the square or “pen” that a letter is in to represent that letter. Of course, that only works for 9 letters. So you do it again, this time putting a dot in each square. That leaves 8 letters, so now you form an X, put a letter in each “pen” of the X, with or without a dot.

And using that, I quickly decoded the first name, and got … JKLL LXNPNKON. Fail. She must be using a different way to map letters to pig pens. But what is it?

Well, in earlier entries she had crushes on hockey players and politicians. In fact, at age 14 she wrote “I like to imagine I am P.M. [Prime Minister] torn between 4 loves – 1. Cabinet Minister, 2. brother of first husband, 3. baseball big shot, + 4. French diplomat!” Two people in particular were Ontario Premier George Drew and Toronto Maple Leafs star Bill Ezinicki. And Bill fits the first cipher! Well, mostly. It has the right number of letters, the two Ls use the same symbol, and all three Is in Ezinicki use the same symbol. But the I in Bill uses a different symbol. Still it all fits too well to be a coincidence.

Another oddity is that every symbol has a dot. A standard pigpen cipher has 13 different shapes, then repeats those again with a dot inside to give 26 symbols. But if you only use the ones with the dot, you only have 13 symbols, so each symbol must represent two letters. You can see the L in Bill uses the exact same symbol as the E in Ezinicki.

I is the 9th letter, the last letter in the first pass when filling up the grid. What pattern means the center is filled last? A spiral inwards!

Well … that doesn’t work for B, E or C, the letters in Bill Ezinicki that are in the first nine of the alphabet. Back to square one.

So let’s look at the second name. Whose wife would Ezinicki be running around with? It’s a dream so it could be anybody, but a reasonable guess is another teammate. The roster for the 1948 Toronto Maple Leafs shows nobody with a nine letter last name. Curses, foiled again.

With only 13 symbols, each symbol encodes multiple letters. So, just because the same symbol appears in two different locations doesn’t mean they’re the same letter. Think of the L and E in Bill Ezinicki. But the opposite is true, we hope: if two different positions use different symbols, they have to be different letters. So in that nine letter last name, the last seven letters all have to be different. And the first two letters have to be different than any of those last 7, save the seventh letter.

So let’s look at a list of common last names. The U.S. Census has a list of surnames occurring more than 100 times anywhere in the United States. It has 151,671 names. If you restrict to nine letter names, that leaves 14,424, which is still a lot. The most common is Rodriguez. And while the last seven letters of that name are all different, the first letter is the same as the fourth, namely R, but they use different symbols.
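
Here’s roughly the kind of check involved, as a sketch. The ciphertext is represented as a list of arbitrary symbol labels (placeholders of mine, not the actual symbols from the diary), and a name survives if positions showing different symbols always hold different letters:

def consistent_with_symbols(name, symbols):
    # True if positions with different symbols hold different letters.
    # The reverse isn't required: one symbol can stand for two letters.
    if len(name) != len(symbols):
        return False
    for i in range(len(name)):
        for j in range(i + 1, len(name)):
            if symbols[i] != symbols[j] and name[i] == name[j]:
                return False
    return True

# Hypothetical usage, with `surnames` as the census list and `cipher` as the
# nine symbol labels read off the diary page:
# candidates = [s for s in surnames if consistent_with_symbols(s, cipher)]

RODRIGUEZ fails exactly as described above: its first and fourth letters are both R, but those two positions show different symbols.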

Making sure different symbols represent different letters leaves 2,160 names, the most common of which is Dominguez. There’s no pattern that I can see in the most common names.

So we need more information. The last letter uses the same symbol as the I in Ezinicki. Of course, it doesn’t have to represent an I, but what if it did? Here are the 10 most common surnames that remain:

GRABOWSKI
SZYMANSKI
JABLONSKI
CZARNECKI
DABROWSKI
STEFANSKI
ZEBROWSKI
GODLEWSKI
TARNOWSKI
CANGELOSI

Now we’re getting somewhere! If it ends in an I — a big if — the last three letters are probably SKI. However, there’s no real pattern in the earlier letters. So once again, we need to look at the problem differently.

Let’s try putting the known letters in pens, and see if we can see any patterns. We won’t add the SKI since that depends on an assumption, but we’ll do it for all of Bill Ezinicki:

Well, the B is in the upper left, the same spot as A in the standard encoding. And it seems most natural for a pattern to start in the upper left with A. If A & B are the two letters mapped to the upper left, that suggests C & D are next, in the upper middle, then E & F in the upper right, and so on.

And that mostly works! The I and L in Bill are in the wrong place, although interestingly, they’re just missing a top line in their symbol. But all the other letters fit. If we use this with the second name, we get:

W A K K Y | S S A M O W S K I
X B L L Z | T T B N P X T L J

Perhaps you can see it by just looking at the above, but if not, some simple filtering of the surname list gives us: Stanowski. Wally Stanowski played for the Leafs the season before, 1947 – 1948.

Also, as I was researching this post, I found a variant of the pigpen cipher where the location of the dot distinguishes the two letters: left for the first, right for the second. And if you look at, say, the ST at the start of Stanowski, you’ll see the first dot on the left, second on right.

It’s fun exploring my Mom’s old diaries, which span over 30 years of her life, up to when I was 9 years old. She never used a cipher again, but I’m glad I got to figure out the dreams of a 15 year old girl in 1948. I’m sure she never pictured her son using a computer to crack it. The existence of the first computer, ENIAC, was only revealed two years earlier in 1946. And it would be over four more decades until the World Wide Web was opened to the public, and another decade until Wikipedia and search engines made it possible to find old Maple Leafs rosters and learn about Pigpen ciphers.


My Favourite Interview Question To Assess Design

When interviewing higher level software engineers, it’s good to assess not just ability to write code, but also to design systems. To accomplish that, I used to ask people to design something, such as an elevator control system.  However, that leaves some gaps.  For example, if you ask them to design a distributed storage system, they may decide to use a two phase commit.  But do they remember why?  How do simpler schemes fail?  Also, when designing with a group of engineers, are they able to give useful feedback to other people?

So instead, I give them naive designs, and ask them to list the pros and cons of each.  For example:

I want you to help the team design a distributed storage system.  I’ll play the role of the team, suggesting designs, and I’d like you to give feedback on the pros and cons of the designs.

Let’s design a distributed file system. Companies like Facebook and Google don’t use centralized storage, instead they have lots of computers each with commodity disks, and if one fails, they can continue serving requests without losing data.  They have an API that makes the disks look like a single block device with a single, unified address space.

So let’s say the team comes to you with this design:

def write(int address, byte data[4096]):
     Store the data on the computer where the
     write takes place.

def read(int address):
    Lookup the address on the computer where
    the read was issued.
    If we find it, return the data.
    Otherwise, broadcast a message to all
    computers, "please send me the data for
    this address".
    If any computer has it, return that data.
    If no computers have it, return "not found."

 

I then draw a diagram with four computers, and work through an example where one of the computers writes a block, then another computer reads it.

It’s interesting to see what people list as pros and cons. Many talk about the high cost of broadcasting to all nodes. Few catch the potential data inconsistency. The simplicity is a pro, but lack of redundancy is a con.

When they point out that there’s no redundancy, I suggest writing to two machines independently. If both succeed, then we return “success,” otherwise, we return “fail.” Can they spot the problems? Can they point out pros, like that disks will fill up evenly, or will they focus only on cons?
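
In the same pseudocode style as above, that follow-up design might be sketched like this (again deliberately naive, for the candidate to pick apart):

def write(int address, byte data[4096]):
    Pick two computers (say, at random).
    Send the write to each, independently.
    If both succeed, return "success".
    Otherwise, return "fail".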

I find that asking people to critique designs gives me a better idea of how they understand a system. Do they just throw around buzz words and boxes, or can they think through the details? Do they have a sense for where race conditions might hide? (Hint: multiple computers touching the same data; things finishing in a non-intuitive order; nodes failing.) Can they actually work through those examples to find the problem? Can they understand that every design has pluses and minuses, or only focus on the parts they don’t like?

And I can also use it as a background for asking: Do you facilitate discussions and encourage team members to come up with their ideas? Or do you see generating designs and architectures as your job alone?


A Few Things I’ve Learned About Agile

Many years ago I worked on code that integrated tightly with 3rd party software. The APIs were not always documented, and when they were, the documentation sometimes had omissions or was just plain wrong. In a situation like this, what to do?

Management’s idea was to involve experts, people who were already familiar with the 3rd party software; in fact, some had worked for the company that made it and had a hand in creating it. However, the software was vast and nobody could understand more than a fraction of it. On top of that, QA could only test the basics during development; thorough testing would have to wait until just before release. And that’s when we’d discover that our current design wouldn’t work and would need significant revision.

Lesson learned: When there are lots of things you can’t know, iterative development is the key to validating designs and getting a predictable schedule as soon as possible. It’s more important to get something working and testable first, then refine it, than it is to build it “production quality” from the start.

At another job, the team would divide up each project into independent components, and have each person work on a different component. We’d intend to do design reviews, and sometimes even do them. Then, after a few weeks, when it was time for a code review, someone would often point out a simpler way to achieve the same goal, or where more defensive programming was needed throughout the work. Only now it was a lot more work to change it. In other words, each part is exposed to the entire genius and foibles of the individual programmer who worked on it.

So we kept saying we needed more reviews and tried various things to encourage that. The problem is, when developers had a choice of working on their own code or going over someone else’s, they usually chose their own. That’s the task that was on the backlog that everybody else was tracking. Interrupting a co-worker to say “can you show me what’s changed since the last time I stopped by?” felt unusual.

The one thing that worked — and worked well — was for people to trade off tasks that lasted around a day. That way, I need to understand the code you wrote yesterday since I’m modifying it today. Plus, your change probably touched code that I’ve worked on and have opinions on. And discussing those opinions is exactly what we needed.

Lesson learned: It’s important to have multiple developers thinking about the details, and it’s really hard to thoroughly consider the details of something you’re not working on.  So explicitly resist developers’ urge to work on independent parts of a system.  Instead, assign people interdependent tasks that each take around a day.

 

None of this replaces good coding practices such as always checking return values (even if with just an assert), automated tests, readable code, etc. In fact, it’s designed to encourage those, to encourage people to share such best practices early and often.


Higher order functions are (sometimes) more natural

I had a debate with some friends at work the other day, about whether the higher-order function map just adds needless complexity to a language. They considered it a kind of “advanced” function, which surprised me. They said that by not having a variable that holds the current item, it was too concise and cryptic. So I did an experiment.

Daddy: I have a number i, it starts at 1, increases each time and stops at three, and for each i, take Oreo i and eat it.

5 year old: Oreo i? Oreos?

He walked away puzzled. A few days later, I tried again:

Daddy: For each Oreo o, eat o.

5 year old: ??

Again he just walked away without understanding what I’d said.

Daddy: Eat the Oreos.

5 year old: Sure! Yay!

In natural language, at least, we often don’t introduce a term to stand for the thing we’re talking about. Introducing it can just make your expression roundabout and clunky, giving you lots of noun and verb phrases to decode. Strunk’s Elements of Style says “Vigorous writing is concise.” The same can apply in programming.
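
The same contrast shows up in code. Here’s the Oreo conversation in Python, purely as an illustration:

oreos = ["oreo 1", "oreo 2", "oreo 3"]

def eat(oreo):
    print("eating", oreo)

# "I have a number i, it starts at 1, increases each time and stops at three..."
for i in range(1, 4):
    eat(oreos[i - 1])

# "For each Oreo o, eat o."
for o in oreos:
    eat(o)

# "Eat the Oreos."
list(map(eat, oreos))

Each version says a little less about the bookkeeping and a little more about what you actually want done.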


Writing To A Binary Stream In C/C++: Harder Than It Should Be

Suppose you want to write some code to communicate using a binary protocol. You would think C (or C++) was a natural language for this. After all, low level data manipulation is one of its strengths. But it’s surprisingly hard to get this right — and surprisingly easy to write code that works now, but might stop working in the future.

Here’s a simplified version of the request header (the example here is the memcached binary protocol, which comes up again below):

struct request_header {
    uint8_t magic;  // Always 0x80
    uint8_t opcode; // e.g. get, set, add, etc.
    uint16_t key_length;
    uint32_t body_length;
};

How do you format a packet to send? Here’s the most straightforward way:

void write_get_header(uint8_t *buffer, uint16_t key_length)
{
    request_header *header = reinterpret_cast<request_header *>(buffer);
    header->magic = 0x80;
    header->opcode = 0x00; // GET
    header->key_length = key_length;
    header->body_length = key_length;
}

This is exactly what memcached itself does. The problem is, it’s not guaranteed to do what we want, because it uses undefined behavior. The reinterpret_cast is valid (if buffer happens to be aligned to the alignment of request_header, more on this later), but dereferencing the pointer leads to undefined behavior. Note that casting the other direction — casting an existing request_header to uint8_t * — lets you dereference, which is handy when you’re reading from the wire. But when formatting data to send, you’re out of luck.

On x86 with common compilers (g++, clang++, I assume MSVC) this compiles, runs, and does the right thing. So what’s the problem? If you change optimization settings, or upgrade your compiler, it could stop working, because according to the C++ language spec, the compiler is allowed to generate code that does just about anything, including going into an infinite loop or reformatting your hard drive. And this stuff can actually happen. There are contests for Craziest Compiler Output due to Undefined Behavior, where some of the source files look quite reasonable, yet don’t do what you expect. Here’s a good introduction, and What Every C Programmer Should Know About Undefined Behavior.

If the compiler won’t warn you, even with -Wall -Wextra, and the code runs just fine, how can you detect undefined behavior? In general there’s no reliable way. Valgrind finds some of it, but because it only works on the compiled output, it won’t find this. Clang has an “undefined behavior sanitizer,” which doesn’t catch the reinterpret_cast, although it does catch the unaligned writes (see below).

But low level data manipulation like this is a strength of C and C++. So there must be a standard idiom for this, right?

The memcached server itself has this problem when it prepares to send replies. It uses the “union” of the struct with a char array of the same size:

union {
    struct {
        uint8_t magic;  // Always 0x80
        uint8_t opcode; // e.g. get, set, add, etc.
        uint16_t key_length;
        uint32_t body_length;
    } request;
    uint8_t bytes[8];
} request_header;

The idea is to take your uint8_t *, cast it to a request_header *, then write to it through the field. That’s supported in C99, but unfortunately not in C++.

And even in C, it’s only supported with caveats. The struct can have padding, and the amount of padding is up to the compiler and the target architecture.  This doesn’t lead to undefined behavior, but it does lead to incorrect behavior: putting data in the wrong place.  If you restrict yourself to processors with byte addressing and a power-of-two word size you’re probably ok, because of the way the fields are laid out: the total size of all fields before the uint16_t, assuming no padding, is a multiple of 16 bits, and before the uint32_t, 32 bits.

To tell the compiler you don’t want padding, you can use #pragma pack. But now you’re not using ISO standard C, meaning your code isn’t portable to other architectures. And if #pragma pack actually changes the layout of your struct, accesses to it are unaligned, pretty much by definition. What’s the problem with that?

Even on x86_64, uint16_t has an alignment of 2, which means its address needs to be even. If we start our header at an odd address, because the previous request had an odd-length key, we’ll be writing through an odd address which is — you guessed it — undefined behavior, and on some popular architectures can be hundreds of times slower, or can simply seg fault. And even on x86, a compiler is free to assume that the low-order bit of the pointer is zero and store other data there if it likes.

In practice, on x86 such writes turn into movl assembly instructions, which will do the right thing with unaligned addresses. But again, we’d rather not have to rely on intimate knowledge of the compiler and processor. We’d like to say what we mean in valid C/C++, and let the compiler exploit its knowledge of the platform. C/C++ is designed for low level stuff like creating network messages, right? So what’s the right idiom?

The official way is to use memcpy() and trust the compiler to optimize it away:

void write_get_header_with_memcpy(uint8_t *buffer, uint16_t key_length)
{
    uint8_t magic = 0x80;
    memcpy(buffer, &magic, sizeof(uint8_t));  // Not needed for uint8_t, but I like the consistency.

    uint8_t opcode = 0x00;
    memcpy(buffer+1, &opcode, sizeof(uint8_t));  // Not needed for uint8_t, but I like the consistency.

    uint16_t key_length_network_order = htons(key_length);
    memcpy(buffer+2, &key_length_network_order, sizeof(uint16_t));

    uint32_t body_length_network_order = htonl(key_length);
    memcpy(buffer+4, &body_length_network_order, sizeof(uint32_t));
}

Because memcpy needs an address, we need to create temporaries, so every line turns into two. If we’re not worried about padding problems we can use offsetof(), otherwise we need to compute the offsets ourselves. This is pretty verbose and error prone.

Helper functions can make things a little better, e.g.:

void write_uint16_t(uint8_t *buffer, uint16_t value)
{
    uint16_t value_network_order = htons(value);
    memcpy(buffer, &value_network_order, sizeof(uint16_t));
}

And you can introduce constants for the offsets, to make things a little more readable. But it’s still harder to read than the struct version.

You could also write the entire header in a local variable, then memcpy() the entire thing to the buffer, and hope the compiler optimizes out the copy. I tried this, and in simple example code, it worked! But in production code, with argument checks and returning the address of the next byte to write to, neither gcc nor clang were able to optimize away the memcpy().

So in the end, people mostly use the struct, assume there won’t be any extra padding as long as the total size of all fields before a given one is a multiple of the field’s size, with the union (most often, because it works in C) or the reinterpret_cast, and assume that writing to unaligned memory won’t cause any problems. This is exactly what memcached itself does.


In Defense Of Implicit Code In C++

tl;dr: When C programmers start using RAII in C++, they’re less productive at first because they don’t think of return; as cleaning up and returning, just returning. They blame the language, but they just need to adjust their mental habits a little. The problem isn’t C++ (it’s got lots of other problems), it’s just that they’re trying to write C in another language. Once they adjust, they can keep the resource handling code separate from their main logic, they’ll get it correct a lot sooner, etc.

At work we’re having a debate on what language we should use to implement our software, C or C++. There has been a lot written about why C++ sucks, and it really is a bloated language with a lot of traps for programmers at all levels of experience. But I believe that when it comes to writing good, efficient, system-level code, using the right set of C++ features can make your code better. And Resource Acquisition Is Initialization is one of those.

In RAII, the compiler inserts implicit calls to destructors, so you don’t need to remember them, and therefore can’t forget them. But this can lead to confusion, because now it’s harder to know what a particular line of code does. People with a C background look at a statement like “return;” and think “that returns from the function,” not “it cleans up and returns.”  And until they change that mental habit, RAII is murky, confusing and full of gotchas.

Like many things in life, the implicit thing has advantages and disadvantages. Being explicit has the benefit that every line of code is “self contained.” To know what it does, you only need to look at that line of code. At a previous company, we had some horrible implicit code, where a[n] = b[n] created all sorts of temporaries and did all sort of magic under the hood, because operator[] and operator= were overloaded. That style of code was promoted by those who were quick to perceive the benefits of abstraction, but slow to realize its costs. To understand the performance of that one line — where a and b were rows of a matrix — you had to find and understand 5 different classes. That was just too much for most people, and so a lot of sloppy code was written, which we spent months trying to understand and speed up.

So there’s a cost to implicit, but what problems does it let you avoid? With explicit resource management, there’s a convention that when a function allocates some resource, then later encounters an error, it should free the resource before returning. With C, you can’t tell the compiler that explicitly, so you have to do the work of putting in the free() call in all error paths. This means you might forget to put it somewhere, or if someone adds a new error handler they might forget to call free(), or if they reorder the code they might overlook a call to free() they should have added, etc. So while explicit makes it easy to see what a line of code does, it doesn’t make it easy to see if there’s anything missing.

It also means that code implementing separate concerns is mixed together, making it harder to get any one of them correct. The logic for allocating and freeing is mixed in with the main work of the function, as well as error checking. If you’re looking at the code and thinking about the typical, non-error case, it’s easy to overlook a problem with error detection or resource management. So when I’m writing or reading the code, I find it hard to understand all aspects of the code in a single pass. Instead, after I’ve read the code once, I need to do a separate pass of “now did they remember to call free() everywhere?” With practice, you can keep a stack in your head of all the things allocated up to this line of code, and a mental checklist of all the error conditions you might want to check. But that’s a set of mental habits that we need to develop, and before we develop them, we get a lot of leaks and missed error checking.

An alternative is to specify all the steps in one place, e.g. a class definition, then have the compiler insert them for you. That’s what happens with std::unique_ptr, or a class to lock and free a Mutex. So it means that when variables go out of scope, things happen that aren’t explicitly listed in that line of code, so return ; can perform arbitrary computation, as can }. That means you need to change the way you look at such statements, because new code conventions develop. But just because those code conventions are new, and different from what you’re used to, doesn’t mean they’re bad. Any tool takes a little getting used to, and until you do, there will be confusion and gotchas. I’d say the mental habits of the “implicit” approach are different — and easier for people to learn. Which is why I like the implicit approach in moderation.

So it’s true you have to learn something about the order of constructors and destructors to get std::unique_ptr right. But it’s actually pretty easy, because it’s designed to do the obvious thing as much as possible.


Finding A Job You’ll Love: Negotiating An Offer

So you did well in an interview and they’ve made you an offer. And now you just have to decide which offer to accept. Right?

Well, not quite. You can always go back to your top choice and ask for more. And now, when they’ve decided that they want you but before you’ve accepted, is the best time. Once you’re employed, you’ll have very little leverage. So, even though your value to the company goes up dramatically in the first couple years, as you learn more of the details of their code and systems, you won’t get much of a raise.

So you’d better negotiate now. Most companies will ask how much you’re currently making and offer you a small increment on top of that. They’ll even ask you what your “salary expectations” are, meaning they’ll ask you for the smallest increment that you’ll accept! I’ve heard it said that you should respond “I hope to be paid fairly” and otherwise refuse to give them a number. Maybe you can explain to them that you feel the process is unfair, and that you won’t give them a number any more than they would if you asked “What’s the most you’re willing to pay me?”

I’ve never done that, because I’ve been worried that I would come across as difficult to work with, or just generally being obstructive. But as much as possible, I try to follow the principles in Getting to Yes. You should all go out and read that now: You’re guaranteed to have to negotiate something at some point in your life, such as buying a house or a used car. The short version is:

  • Emphasize objective criteria. Look for salary surveys on the web. The tricky part is finding a job description that accurately describes your job, but you should be able to get a range of possibilities. Here’s a tip: most organizations believe they do a thorough job of interviewing people, and thus end up with people who are above average. So once you’ve found the salary range, you should be able to say “I think my background and skills match this position more than most of these people, so I think a fair salary is somewhere in the upper end.”

    Emphasize that you want to be paid fairly, not that you’re on the fence about going there. It’s not about “would I still work for you if you didn’t increase the offer,” it’s “I want to be paid what I’m worth.” In a market based economy, “what I’m worth” is more or less what the market is willing to bear, so…

    Get an offer from another company. This is part of why you can’t just let a recruiter do the whole search for you, they’ll only get an offer from a single company. When another company makes you an offer that’s a little more than what you’re making now, go back to the first company and say “If you offer me a little more than that, I’ll be happy to work for you.”
     

  • Invent options for mutual gain. Salary is the big one that everyone focuses on, but it’s often hard to get approval for more than a small increase in salary. Look at the benefits that other companies are offering. How many holidays, vacation days and sick days? Will they pay for parking or a subway pass? Do they have a 401(k) match? If you’ll be taking a hit on any of those, compared with either your current job or other companies you’re applying to, bring it up. If the company can’t address them directly (e.g. they can’t institute a 401(k) matching program just for you), you can use them to argue for greater salary.
     
And with that, and the info in the rest of the “Finding A Job You’ll Love” series, I hope you’ll be a little better at finding and landing that great job. Good luck, and please let me know if you found it useful!


Finding a Job You’ll Love: The Interview

It’s best if you can do some prep for the interview. Try to understand what the company’s product is. It’s amazing how cryptic a web site can be, especially if it’s enterprise software in some industry you don’t know. Still, make an attempt, and ask about anything that’s confusing. The fact that you’ve understood some basic things about their product, and that you want to understand more, will set you apart. Of course, it will also help you understand what you’d be doing if you worked there.

Then you should try to read up on the sorts of questions people like to ask. For some reason, people ask algorithms questions far out of proportion to how often they come up in an actual job. Pick up a copy of The Algorithm Design Manual and read the first section (chapters 1-10) during lunch or in the evening. Even if you only get half way through, you’ll be better prepared than most.

The interview is lopsided: you’ll spend almost all your time answering their questions and only have a little time to ask yours. That’s ok. If you’re still uncertain after the reverse phone screen and the in-person interview, you can always ask to talk to someone on the phone another day.

Keep in mind that they’re trying to figure out whether to hire you. So while it’s good to mention how great this job would be for you, you want to emphasize how great you would be at this job. They don’t really care that you’ve always wanted to learn Lisp and this job would let you do so. They care that you can design & implement software, quickly understand complexity, etc. With that in mind, ask yourself “what are they really trying to get at with this question?” For example, if they ask about a particular job on your resume, feel free to say that another job is probably more relevant and ask them if they’d rather hear about that.

Generally you’ll meet 5-6 people for an hour each, and there are two great questions you should ask everyone. After the initial small talk, ask “What does someone need to be really successful at this position?” Not only will many interviewers think this is a good question for a candidate to ask, but they’ll tell you what they’re looking for. For the rest of the hour, when answering questions, try to answer with stuff related to what they’re looking for. Also, write down the answer. You’ll want to refer to it much later when you’re trying to decide between the different places you interviewed at.

Save the second question for near the end of the hour. “What is the best and worst thing about working at X?” Again, you should write down the answer, because you’ll have 5 or 6 of them from every company you interview at. Both the best and worst thing are important to consider.

Beyond that, here are some good questions to ask:

  • What are the day-to-day tasks I’d have?
     
  • What day-to-day tasks do you typically do? What have you been doing over the past month?

For a manager:
These were covered in the Reverse Phone Screen post.

For the tech lead:

  • How is scheduling done? How are estimates and deadlines created?
     
  • How are tasks divided up among programmers?
     
  • Who does the overall design/architecture?
     
  • Who does the design/architecture of individual contributor’s tasks?
     
  • How much time do leads spend with their programmers?
     
  • Do you do code reviews? How are they done?
     
  • How much do you look over people’s check ins?
     

Good luck, and try to be relaxed and friendly. But most of all, focus on the technical puzzles they give you. If you got into programming because you enjoy that kind of puzzle solving, hopefully the puzzles will at least be interesting.
