Logs of Our Fathers
At USENIX in Anaheim, back in 2005, George Dyson treated us to a fantastic keynote speech about the early history of computing. You can catch a videotaped reprise of it here, on the TED site. I highly recommend it - there's lots of interesting and quirky stuff. I managed to talk him into giving me a copy of his PowerPoint file, and subsequently tracked him down and am re-posting this material with his permission.
This is the original syslog. It's one of those thread-bound kids' class notebooks, and it contains the run-time log of John Von Neumann's programmable computer from Princeton's Institute for Advanced Study. For those of us who are used to dealing with system log data, it's not a whole lot different from what we're dealing with today.
Von Neumann's system logs were, like today's logs, time-ordered and free-form. Date/time on the left hand side, then details. Then, as today, there are no "standards" in the reporting format. It's just information. When I think of logging standards like RFC 3195 I now wonder "how many notebooks would that take, with all the XML and useless overhead?"
The log message at the bottom of the sheet above reads "Machine ran fine, code isn't. Difficulty indescribable." There are a couple of things here that are interesting. To this day, we refer to programs as "code" - if you've ever wondered where the term comes from, now you know. It is, literally, from the dawn of computing. Another thing this log entry shows is what may be the first case of finger-pointing between the hardware guys and the software guys. I think that every coder - at some point or other - hits a subtle problem they can't understand, and says "the compiler must be screwy!" or that there's a hardware problem. Here we see the origin of the tradition.
Nowadays we can be pretty confident that, when we tell our computers to work on huge numbers, they will mostly do it. There are stories, of course, of flaws in hardware. I remember back in the early 90s when there was a bug in the Intel 486 processor's math unit that made it sometimes give wrong answers. One of our old-school hardware engineers, when I worked at DEC, told me a funny story about a VAX system that had a floating point math accelerator card; the card wasn't seated in the backplane correctly and, as it slowly fell out from fan vibration, the least significant bits of the return values turned to zeroes because the contacts were no longer touching.
In Von Neumann's day, computers' competition was mechanical calculators - things like the Marchant Calculator that Richard Feynman tells stories about in his wonderful "Los Alamos From Below" talk - machines that were fairly slow. Some of the calculations Feynman's team completed for the Manhattan Project (specifically problems regarding expected bomb yield) sometimes took days to complete, with teams of calculator-grinders passing stacks of results from one station to another. Feynman used the word "program" to describe the order in which the sub-calculations of a larger calculation were performed.
How do you make sure the machine is right? Nowadays, we take it for granted thanks to the idea of regression testing: if you run a calculation on the same machine 5 times, you should get the same results - and if you run the same calculation on a different machine, you should still get the same results. With modern hardware, engineers can let the regression test run overnight and cross-test billions of different problems. With Von Neumann's machine, it was a different situation: there was only one computer and if it didn't give the same answer from run to run, it might take a week for a human to figure out what the right answer actually was. The log entry above: "this now is the 3rd different output - I know when I am licked" is some poor operator who had run the same program three times - and gotten three different answers. Ouch!
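The run-it-again-and-compare idea can be sketched in modern terms. This is a minimal illustration, not anything from Von Neumann's lab: the `compute` callable is a hypothetical stand-in for the program under test, and we simply tally the distinct answers across repeated runs - one distinct answer means the runs agreed, three distinct answers is our poor operator's "I know when I am licked."

```python
from collections import Counter

def regression_check(compute, runs=5):
    """Run the same calculation `runs` times and tally the answers.

    Returns (agreed, tally): `agreed` is True when every run produced
    the same result; `tally` maps each distinct answer to how often
    it appeared, so you can see *how* badly the runs disagreed.
    """
    tally = Counter(compute() for _ in range(runs))
    return len(tally) == 1, tally

# A deterministic calculation passes - all five runs agree:
ok, tally = regression_check(lambda: sum(range(10_000)), runs=5)
print(ok)  # True
```

On flaky hardware the tally would come back with two or three entries, which is exactly the situation the log entries above record.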
Our poor operator from the previous slide is now trying to regression-test a calculation that has produced two different results. "Since there is no way to verify against PO#1 without doing it over again, I will invest that much time." ("PO" means "Program Output") How do you resolve that? You run it again and see which result you start getting over and over. Unfortunately, he kept getting two different results.
I've done syslog analysis of one sort or another since the late 1980s, and there have been a few times when I've found log entries that reek of desperation. This one tops them all. The operator must have been one truly frustrated human being.
Nowadays, overclockers have it easy. You can buy a motherboard that comes with software drivers that allow you to fiddle with the CPU clock speed, or memory bus speed, etc. In Von Neumann's lab, the "clock" was a mechanical device and the speed of calculation depended on physical properties of the machine: go too fast and contacts might not touch correctly, or a belt might fly off. One way of regression testing the system was to do a calculation with the clock speed on "low" and see if it gave the same results.
Bugs and Mice
Grace Hopper's "bug" was a moth that got caught in the mechanisms of a Mark II computing machine in 1947. Von Neumann's lab had a "mouse." "Mouse climbed into the blower behind regulator rack, set blower to vibrating: result no more mouse and a !!! of a racket." This shows you the importance of timing on terminology - a mere 6 years' difference and Hawley and Engelbart might have been making "bugs" for desktop input, while programmers were trying to thrash all the "mice" out of their code. Actually, the term "bug" is much older than Grace Hopper's usage - she was just joking. Thomas Edison was known to use "bug" in the 1870s; it may have been a term used by telegraphers.
Can you imagine finding a syslog message like:
kernel: /dev/rst0a emitting tar-like substance
In case you can't read it, it says: "IBM machine putting a tar-like substance on cards."
I think "The smell of burning v-belts is in the air" speaks for itself; it certainly evokes a ruined afternoon.
When I was a newly-minted programmer/system administrator at Gould in 1987, we suffered an air conditioning failure on a Friday night and I came in early on Monday to discover my computer room was about 115 degrees F inside. The power supply on our big Gould PowerNode had wisely shut itself off, but one of our Sun-2 fileservers baked to death. After that, I got an air conditioning thermostat and wired it up across an RS-232 loopback cable, with a process that would watch for that serial port coming ready and send warnings into syslog. I'm glad I didn't have to figure out how to monitor a V-belt, though.
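The same trick is easy to sketch today. This is a hypothetical reconstruction, not the original script: the `probe` callable stands in for testing the serial port's control line (with pyserial you might poll something like the CD or DSR line; here it's just a function returning True when the thermostat contact closes), and `emit` stands in for the call that would hand the warning to syslog.

```python
def watch_thermostat(probe, emit, polls):
    """Poll `probe()` up to `polls` times; each time the thermostat
    contact is closed (probe returns True), hand a warning message
    to `emit` - in real life, emit would be syslog.syslog or similar.
    Returns the number of alarms raised."""
    alarms = 0
    for _ in range(polls):
        if probe():  # thermostat contact closed -> room too hot
            emit("WARNING: machine room over temperature")
            alarms += 1
    return alarms

# Demo with canned readings instead of real hardware:
readings = iter([False, False, True, True, False])
messages = []
n = watch_thermostat(lambda: next(readings), messages.append, 5)
print(n)  # 2 - one alarm per hot reading
```

In production you'd of course sleep between polls and debounce the contact; the point is only how little machinery the original hack needed.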
The Hell With It
I can't think of any applications that leave syslog messages like this one.
I have tremendous sympathy for the poor operators of the early machines. Someone else's programming error could leave you spending an hour running around checking current levels in your system.
State of the Art SIM
Circa 1950, apparently, the way logs were aggregated was - well - they stack fairly neatly.
Von Neumann's computer ran until midnight, July 15, 1958. This was also reflected in the log: a final scrawled signature by the operator.
What's the point of all this? I suppose it could best be summarized by the French expression "Plus ça change, plus c'est la même chose" - the more things change, the more they stay the same. Von Neumann's computer had a tiny fraction of the power and memory of the desktop supercomputers we're used to, nowadays. Our available storage is orders of magnitude greater - but the syslogs? There's really not much difference, is there?
Special thanks to The Archives of The Institute for Advanced Study, and George Dyson.