Someone recently replied to me with comments by Dave Cutler, designer of the kernels for Windows NT/XP(...) and (Open)VMS:
“I have a couple of sayings that are pertinent. The first is: ‘Successful people do what unsuccessful people won’t.’ The second is: ‘If you don’t put them [bugs] in, you don’t have to take them out.’ I am a person that wants to do the work. I don’t want to just think about it and let someone else do it. When presented with a programming problem, I formulate an appropriate solution and then proceed to write the code. While writing the code, I continually mentally execute the code in my head in an attempt to flush out any bugs. I am a great believer in incremental implementation, where a piece of the solution is done, verified to work properly, and then move on to the next piece. In my case, this leads to faster implementation with fewer bugs. Quality is my No. 1 constraint – always. I don’t want to produce any code that has bugs – none.
“So my advice is be a thinker doer,” Cutler concluded. “Focus on the problem to solve and give it your undivided attention and effort. Produce high-quality work that is secure.”
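To make the "incremental implementation" idea concrete, here is a minimal sketch of the style he describes. This is my own illustration in Python, not Cutler's code, and the names are invented: each piece is written, quickly verified, and only then built upon.

    # Toy illustration of incremental implementation: write one piece,
    # verify it works, then build the next piece on the verified one.

    def parse_line(line: str) -> tuple:
        """Piece 1: parse a 'name,count' line into a (name, count) pair."""
        name, count = line.strip().split(",")
        return name, int(count)

    # Verify piece 1 before moving on.
    assert parse_line("apples,3\n") == ("apples", 3)

    def total_counts(lines: list) -> int:
        """Piece 2: written only after piece 1 is known to work."""
        return sum(parse_line(line)[1] for line in lines)

    # Verify piece 2.
    assert total_counts(["apples,3", "pears,2"]) == 5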
Let me give some senior developer feedback on these statements.
Developing kernels is a highly technical job. I can imagine that you have to think hard before writing any code. Once you have figured out a proper solution, you can invest some time in writing high-quality code. But note that no code is written bug-free the first time!
Now contrast this with a consumer-facing application. The biggest challenge there is not stability or technical structure. The biggest challenge is to provide a useful, user-friendly application. You cannot just think this up in your head and hope all is fine. The adage "no plan survives contact with the enemy" very much applies here. There is no point in investing in code stability when that code has a high chance of getting thrown away. Get the "useful and user-friendly" part figured out very fast, using an iterative approach, and only then start focusing on stability and all the rest. Plus, sometimes it makes more sense business-wise to shift focus toward features instead of stability.
Anyway, each project is different. So when you see advice, make sure you first check if it applies in your case or not.
> ... when you see advice, make sure you first check if it applies in your case or not.
I wish everyone understood this. There are four things you need to do to successfully apply advice:
1. Discover it (lots of ways to do this, even more now in the age of the internet).
2. Determine if it is good or not (you can rely on the wisdom of the crowds, other people's judgement based on qualifications, books, references, etc).
3. Determine if it applies to you (you have to make this call, requires thinking about the context of the advice giver and your situation).
4. Actually apply the advice. This requires making the changes suggested and is intrinsically individual.
Step 1 happens pretty naturally, and we're well set up for step 2 (though of course people get fleeced by snake oil salesmen all the time). Step 4 is intuitive. It's step 3, judging whether the advice fits your own situation, that people most often skip.
Correctness and pleasantness are entirely distinct. In most cases incorrectness has little or no effect on commercial success, whereas unpleasantness will sink a product instantly.
There are some areas where incorrectness is also highly unpleasant, kernels and compilers being two examples.
This is correct, yet dangerous to say without mentioning a timeline. I would say: do anything (that works) to get customers to pay. Then do everything (good enough) to get them to stay.
Hmm. There are definitely things to be learned from picking up work that other people won't touch, but I would also qualify it more:
"Successful people choose their battles"
You can spend your entire career fixing other people's crappy bugs because nobody else can be bothered. That's not really going to make you successful; it's just going to make you a good bug fixer.
Choose problems that other people don't want to touch because they're hard or challenging, or because they're a core, valuable part of something.
That's what's brought me success ;) (and headaches)
The problem with advice like this is that the "modern" Scrum Master would demolish the approach in favor of a predictable pace with shallow depth. DC could only have become successful in environments without any Agile/PMP-certified PM.
I find that at Staff+ level you do get the ability to do these things again. Generally you are not beholden to PMs and sprints any more. Most of the work you do cannot be categorized as feature-factory stuff.
So, coming from a mechanic/blue-collar perspective, the idea that everything has to be your job is an insanely dangerous practice. We absolutely have a "not my job" mentality, because there are countless instances where poorly trained or inexperienced diagnostic technicians and day laborers have caused catastrophic loss of life and property damage. Is there no room in coding for inexperience? What is the price of a job half done poorly?
"Successful people do what unsuccessful people won’t." is, in my industry, a neat way to start the story of how you lost your hand.
In tech, the consulting industry is full of "job half done poorly". Consultants overpromise and underdeliver. They take the money and run. Yes, it somewhat works. If you have a bug, no one can help you because their code is a giant ball of spaghetti that requires a Rosetta Stone to decipher. Which they will not give you without you paying more money. Within five years, you are likely to ditch that train wreck and hire a bunch of new consultants to rewrite what will probably be another slow burning dumpster fire.
Unfortunately, in many places, “that’s not my job” has been weaponized by people to avoid pulling their weight or filling a critical gap. Often in very low-stakes situations.
“That’s not my job” for industrial/manufacturing is absolutely critical and should not be downplayed. But refusing to teach the newbie next to you how to use the Jira ticket system because it’s “not your job” is counterproductive.
In software, this is where "catastrophic" data leaks and security vulnerabilities come from: an engineer or team deciding to roll their own version of a security or cryptography library without having the proper experience to even know what they don't know.
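As a sketch of the safer alternative: lean on vetted primitives rather than inventing your own. A minimal Python example using only standard-library calls (hashlib, secrets); the iteration count and salt size shown are illustrative assumptions, not a security review.

    # Sketch: password hashing with vetted stdlib primitives instead of
    # a homegrown scheme. Parameters are illustrative assumptions.
    import hashlib
    import secrets

    ITERATIONS = 600_000  # work factor to slow brute force (illustrative)

    def hash_password(password: str):
        """Return (salt, digest) using PBKDF2-HMAC-SHA256."""
        salt = secrets.token_bytes(16)  # unpredictable per-user salt
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
        return salt, digest

    def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
        return secrets.compare_digest(candidate, digest)  # constant-time compare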
The NT kernel is incredibly high-quality work. It shipped in 1993, and 28 years later it is still powering billions of devices. Windows did have a lot of security problems, granted, but many (if not all) were in the upper layers, of which there were many.
He did the very best that could be expected in the early '90s.
It was exemplary coding, and very little of its fundamentals has changed. I have some support for this from an interview in the late aughts: most of the problems were in layers above the kernel.
That being said, even VMS is not without flaws (we still run it for our manufacturing floor):
No, the best that could be expected in the early '90s is formal proofs of security enforcing multiple independent levels of security (MILS), allowing the execution of arbitrary, malicious programs on the same systems handling TOP SECRET data in complete safety, as achieved contemporaneously by TCSEC Class A1 certified systems.
In contrast, OpenVMS only ever achieved C2, the security-enhanced VMS B1 [1], and the NT kernel C2 [2]. NT-kernel-based systems subsequently achieved at most EAL4+, indicating independent verification that they are adequate to protect against “casual, and inadvertent attacks”, but they failed to demonstrate resistance against attackers with a “moderate” attack potential.
So, no, it is not the very best that could be expected in the early '90s.
If you are not asking that rhetorically: the Gemini Trusted Network Processor (GTNP) using GEMSOS achieved a Class A1 certification in 1995 [1] and included ethernet device support in its specification. This is adequate to implement a TCP/IP stack in an unprivileged context. My quick read of the report does not allow me to confidently assert that they do simultaneous ethernet device multiplexing, which would allow a trivial implementation of multiple independent data streams over the same link, but they do appear to provide at least a generic time-partitioned device multiplexing solution, which would allow multiple programs to transmit and receive serially rather than being bound to a single program at boot time. This is adequate for a large number of use cases, and it removes the TCP/IP stack from the TCB, so the stack requires a lower degree of scrutiny because its failure modes do not cause whole-system failure.
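For readers unfamiliar with the term, here is a toy sketch of time-partitioned device multiplexing. This is my own Python illustration with invented names, not GEMSOS code: one client owns the device per fixed time slice, in a fixed rotation, so several programs share the link serially and deterministically.

    # Toy sketch of time-partitioned device multiplexing (invented names).
    # One client owns the device per fixed slice, in round-robin order.
    from collections import deque

    class TimePartitionedDevice:
        def __init__(self, clients, slice_ticks=10):
            self.schedule = deque(clients)   # fixed rotation order
            self.slice_ticks = slice_ticks
            self.remaining = slice_ticks

        def owner(self):
            return self.schedule[0]

        def tick(self):
            """Advance one clock tick; rotate ownership when the slice expires."""
            self.remaining -= 1
            if self.remaining == 0:
                self.schedule.rotate(-1)
                self.remaining = self.slice_ticks

        def transmit(self, client, frame):
            if client != self.owner():
                raise PermissionError(f"{client} does not own the device now")
            return f"sent {len(frame)} bytes for {client}"

    dev = TimePartitionedDevice(["netstack_a", "netstack_b"], slice_ticks=2)
    print(dev.transmit("netstack_a", b"hello"))  # ok: netstack_a owns this slice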
If you are asking rhetorically: what NT kernel, NT-kernel-derived, NT kernel component, or even any code associated with the NT kernel in any context has been evaluated to Class A1, Class B3, EAL6, EAL7, or equivalent, demonstrating proofs of security adequate for usage in high-assurance systems? Arguing that the NT kernel accepted less security as a tradeoff for solving a broader use case is only valid if they have demonstrated the ability to actually make that tradeoff by achieving high security in a narrower, nominally or at least qualitatively similar use case. If they cannot actually demonstrate the ability to achieve the alternative, then they are not making a tradeoff; they are choosing the only option they can.
And this is what they have demonstrated. At no point have they ever demonstrated an equivalent, or even qualitatively similar, level of security at any scale within orders of magnitude of the scale demonstrated by the Class A1 systems. And it is not as if they have not tried. They have attempted to certify numerous times, demonstrating a desire to succeed and at least meaningful effort to do so, and have failed to certify at anything more rigorous even in highly constrained configurations, to the extent that they have given up.
To use an analogy, this is like having two energy companies, one of which says, "We have chosen a fusion energy solution that does not generate net energy, because we believe that only fusion will be adequate when we have an interstellar civilization that needs to operate in the interstellar void," while the other has a working, economically efficient solar energy solution that generates energy on Earth today. Asking, "Can you point to a solar energy system that could generate adequate energy in the interstellar void?" as a counterargument is just plain silly, because the existing fusion energy solution also does not work there, and in fact does not work anywhere in any context.
Not only was I not asking rhetorically, but I saw the Gemini TNP and am aware that it never had an evaluated TCP/IP stack, only, as you said, the potential for someone to write one.
For about 4 months I was in charge of a Harris NightHawk running a B1 version of SVR3, which was two and a half pains in a half-pain glass.
Saying he doesn't want to produce any code that has bugs is not the same as doing it. Given how accomplished he is, I'm sure he is well aware that sometimes bugs slip through.
A lot of the NT hardening over the last ~20 years was effectively enabling access controls that had previously been set too lax or even disabled, all due to performance issues (GDI being moved in-kernel, with all its security issues, was also due to performance problems in NT 3.x).
The NT kernel is quite nice and solid, the problem is that not everything built on it followed all of the design rules.
> I don’t want to produce any code that has bugs – none.
And there's the reverse thinking: let me intentionally introduce some bugs, so that when I'm tasked with solving them I can fix them in 5 minutes and sleep easy for the rest of the sprint.
True to form, Cutler had a break with DEC, and they gave him a carte-blanche "skunkworks" to build a VAX successor: Prism/Mica.
DEC eventually shut this down, which prompted his departure for Microsoft. This was unfortunate for DEC, as they eventually poured the company into their Alpha RISC processor, which did not live as long as DEC hoped. Prism might have been a superior design.
At this time, Microsoft was still maintaining a UNIX kernel in their Xenix product, so they knew a good kernel engineer when they met one. Microsoft had been the leading UNIX vendor in the early '80s.
Cutler famously disparaged the UNIX kernel (his notable saying was "Get a byte, get a byte, get a byte byte byte" to the tune of the finale of Rossini's William Tell Overture).
Microsoft dumped their Xenix onto SCO about this time.
What is more interesting to me was Cutler's involvement with Azure. He must have had some sway over CBL-Mariner, Microsoft's RPM-based Linux distribution.
Much of Cutler's earlier work is documented in the book "Showstopper!".
> What is more interesting to me was Cutler's involvement with Azure. He must have had some sway over CBL-Mariner, Microsoft's RPM-based Linux distribution.
He was involved with Red Dog (the modified Windows host that powers Azure).
He's not involved with the CBL-Mariner team at all, to my awareness. Mariner is mostly about solving a supply-chain problem at Microsoft... we have a ton of internal teams all using different flavors of Linux, and packages have historically come from all over the place. With CBL-Mariner we are basically trying to unify on that and own the package build and distribution portion as well. There isn't much reason for a kernel designer to be involved in that, as it's a well-understood problem (and an entirely different domain), and we already have internal upstream Linux kernel contributors (which is how we produce the -azure supported kernels).
Apple didn’t end up on a BSD kernel. They started on Mach (from NeXT) and then made it more performant with XNU by not being so pedantic about microkernels.
"XNU was a hybrid kernel derived from version 2.5 of the Mach kernel developed at Carnegie Mellon University, which incorporated the bulk of the 4.3BSD kernel modified to run atop Mach primitives..."
This is from the link I provided: “The BSD portion of the OS X kernel is derived primarily from FreeBSD, a version of 4.4BSD that offers advanced networking, performance, security, and compatibility features. BSD variants in general are derived (sometimes indirectly) from 4.4BSD-Lite Release 2 from the Computer Systems Research Group (CSRG) at the University of California at Berkeley.”
After NeXTStep was adopted by Apple, they hired a bunch of the FreeBSD core developers and updated the BSD service and userland using FreeBSD. Apple was actually already messing around with Mach prior to that with MkLinux so there was some initial speculation that they might port to MkLinux rather than update NeXT's Mach+BSD hybrid.
The code updates are very limited, though, to the point that for one's own sanity it's better to assume the userland is 4.3BSD unless marked otherwise (I've been burned by this myself, in code that assumed every BSD had changes from ~1994 NetBSD).
MacOS X essentially started by updating the NeXT core to the latest OSFMK distribution, which was a hybrid system (BSD server integrated into kernel space) powering the BSD alternative to System V: OSF/1 (the most famous variant being DEC OSF/1, aka Digital Unix, aka Compaq Tru64). They applied bits of FreeBSD to the BSD server code and over time improved its concurrency, but a considerable portion of XNU is code that is called by the BSD server yet is not part of it (IOKit, among other things).
> Cutler famously disparaged the UNIX kernel (his notable saying was "Get a byte, get a byte, get a byte byte byte" to the tune of the finale of Rossini's William Tell Overture).
I'm pretty sure that had to be in reference to STREAMS, because the original UNIX I/O model certainly is not byte-oriented. So basically, entirely irrelevant.
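For context: the classic UNIX interface moves caller-sized blocks, not single bytes; read(2) fills whatever buffer you hand it in one call. A minimal sketch using Python's thin wrappers over the same syscalls; the path is just an example.

    # Sketch: the classic UNIX read(2) interface is buffer-oriented, not
    # byte-oriented; one syscall can move an arbitrarily large block.
    import os

    fd = os.open("/etc/hostname", os.O_RDONLY)  # example path; any file works
    try:
        chunk = os.read(fd, 65536)  # one call, up to 64 KiB; not byte-at-a-time
    finally:
        os.close(fd)
    print(len(chunk), "bytes in a single read")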
Berkeley was paid to port TCP/IP to UNIX, and as far as I know, part of the deal was that the code should be available so TCP/IP could quickly be ported to more systems, including non-BSD ones.
Thus a considerable portion of operating systems at one point or another used a 4.3BSD-derived network stack (sometimes updated with later code), to the detriment of network API evolution, with some developing their own implementations over time.
“Bias for action” is a nice catch phrase and not bad to get people out of paralysis.
The other successful thing to add onto it is to tell people what you're going to do (e.g., a high-level 3-step plan, not more than 5 steps), then do it. If there are objections, ask [what can be done] to mitigate the issue instead of arguing the points.
Don't "bias for action" and do something without setting expectations first.
I have another lost interview with Dave Cutler that Microsoft took down from their website about ten years ago. I can find and post it, if there is interest.
I found this as an MHT file, edited as best I can. Maybe I should get this to archive.org?
Date: Sat, 23 Oct 2004 09:40:05 -0700
From: Windows Contact Us <wincu@css.one.microsoft.com>
We apologize for the delay in our response.
I have attached "The Architects: First, Get the Spec Right" an interview
with Cutler and Mark Lucovsky.
Goldie, Microsoft.com Customer Support
___
The Architects: First, Get the Spec Right
Once upon a time ... there was NT OS/2.
Every month, Nadine Kano prowls the halls of Redmond to profile the real folks behind Windows 2000 development. This month: David Cutler and Mark Lucovsky, who helped guide the operating system from its infancy.
"Well, we were just about to leave," David Cutler says from behind his desk.
I'm five minutes late to my interview with Cutler and Mark Lucovsky, two of the original architects of the Microsoft Windows NT operating system.
As I wilt into the carpet, I realize Lucovsky must have mentioned to his colleague how nervous I was about approaching them. After all, who the hell am I to be talking with these guys? They are developers' developers, two of the visionaries behind the operating system that began as NT OS/2 and has evolved into Windows 2000. Cutler refuses virtually all interviews with the press, but he and Lucovsky are willing to talk to me, a program manager from down the hall. They probably find my nervousness amusing.
I sputter a bit as I look at the clock on Cutler's desk, smile, and take a seat.
Build it, ground up
I was a college senior when Bill Gates personally recruited Cutler, Lucovsky, and others from Digital to begin the NT OS/2 project at Microsoft. Their quest was to build, from the ground up, an advanced PC operating system, something far beyond MS-DOS. NT signified "New Technology."
One of my engineering classmates, David Treadwell, joined Microsoft to work on NT. I remember how excited he was to be a part of this obscure little development effort that none of us really understood. Remember that 1989 was before Windows 3.0. In 1989, Windows was still a nonentity.
"For example, we went with a client-server design, though at first things like context-switching rates and cache misses were too high," Lucovsky recalls. "Things were slow. But we didn't let ourselves get concerned with the size of memory. Not everything can be at the top of the list. We consciously put performance lower on the list than extensibility and didn't pay close attention to memory footprint until (version) 3.5."
We talk about how the operating system evolved with each release, from memory optimizations and Power PC support in versions 3.5 and 3.51 to kernel-mode GDI and User, plus a new user interface in version 4.0. "The basic, internal architecture has not changed, except for Plug and Play," Cutler says.
"We wanted a good design from the beginning so that, ideally, people could put anything on top of the system once it was there. We focused on the underpinnings. We wanted to minimize the need to add to the base or to tear up the base, because doing those sorts of things creates a lot of bugs. If moving forward you don't have to touch the basic code except to make small tweaks, then you know you got it right."
Some things must wait
At the same time, Cutler admits, "Nothing is ever architecturally correct." Needs evolve, and it takes time to build an operating system. Although support for distributed computing and clustering were part of the original vision, features such as the Active Directory haven't come to fruition until Windows 2000. "If we tried to give customers everything with the first release, we would never have finished it," Lucovsky says.
Cutler elaborates on this philosophy. "If what we desire is to have a mature operating system, then we need to achieve revolution through evolution, through incremental improvements. Within five iterations of an operating system like Windows NT, you see a big difference."
Both Cutler and Lucovsky see taking advantage of every opportunity to increase quality as the top priority for Windows.
Reliable is cool
"I'd much rather see the most reliable and usable operating system than the most whizzy-bang operating system," Cutler says. "To increase reliability we have to make choices. For every 10 bugs we fix, we may introduce three more. But do you want to ship with 10 bugs, or do you want to ship with three?"
"Do you want one more new feature," Lucovsky concurs, "or do you want to fix more bugs?
"When the Internet was first catching on, it was OK if your browser crashed once in a while. But these days, if you go to an e-commerce site and you hit the Buy' button, things had better work. When you're dealing with a leading-edge piece of technology, you can play fast and loose with it. But as the technology matures, playing fast and loose isn't acceptable anymore. This is characteristic of the maturing process for a product like Windows. People will put up with more from the bleeding edge."
"What I think is cool," Cutler interjects, "is that the system doesn't crash, and it doesn't lose my work, and it has functionality. I could care less that the visuals are flashy if my 32-gig hard drive goes away."
Communicating quality
"And if you're a consumer," Lucovsky responds, "you want even better reliability." He concludes: "Quality is the most important vision that everyone working on this product needs to share. It isn't always easy to communicate how we're going to do this, particularly as the team gets bigger."
The growth in the number of people working on the project over the last 10 years has other downsides, Lucovsky notes. "When you have a bigger group, quality problems become especially detrimental to productivity. Say it takes 10 developers and testers to fix one bug. Whoever put that bug in there just caused 10 people to lose time. We're working to make sure our development tools keep up with the growth in our system and our team. We're streamlining the process in ways that will make a dramatic difference in the way we build the code."
"If we want to stay competitive," agrees Cutler, "we have to invest money in tools and mechanics as well as features. We need to put guidelines on paper so that people stay good at planning. Simple things like writing a good spec are basic to software engineering."
Museum quality
It all comes back to the spec.
As I leave Cutler's office, I wrap the NT/OS2 spec in my jacket and head back to my building. I have three computers with lots of whiz-bangy, new fangled things running on them, but for a few hours it's the spec that holds my fascination. As a Microsoft geek, I feel like I'm holding a piece of history. And it turns out I am. This fall, the spec I borrowed for a time will join the Information Technology collection at the Smithsonian Institution's National Museum of American History in Washington, D.C. It's another good reason to write a spec.
I sputter a bit as I look at the clock on Cutler's desk, smile, and take a seat.
To begin the interview, I want to set the context. What was the team's initial vision for an operating system?
"We had five or six major goals," says Cutler. He pulls a copy of Helen Custer's book Inside Windows NT from his bookshelf and flips through the pages. Portability, reliability, extensibility, compatibility, performance. I think that's right. Let me see."
He goes back to the bookshelf to retrieve a thick black binder. The label on its spine says "NT OS/2 Design Workbook." He flips through some pages.
"Here," says Cutler, casually handing me the volume. "Why don't you borrow this? As long as I get it back," he continues. "I think it's one of the only copies left."
Four inches of spec
Inside the binder, separated by a dozen neatly arranged tabs, are 4 inches of documents that make up the original specification. Dated 1989 and 1990, they bear names like Kernel Chapter, Virtual Memory, I/O Management, File System, and Subsystem Security. Page 1 of the introduction, written by Lou Perazzoli, reads: "The NT OS/2 system is a portable implementation of OS/2 developed in a high-level language. The initial release of NT OS/2 is targeted for Intel 860-based hardware, which includes both personal computers and servers..."
I try to keep my expression casual as I set the volume on the table next to me. I know these guys find reverence irritating. I hope it's not raining. How will I get the spec back to my office? Doesn't this binder belong in a museum? God, I hope it's not raining.
"Do you think you achieved these goals?" I ask.
"We certainly achieved extensibility and portability," Lucovsky says. "We tested ourselves by not doing the x86 version first. We did the RISC (Reduced Instruction Set Computing) stuff first. It would have been so easy to drop the RISC support; everyone in the company wanted to. But the only way to achieve portability is to develop for more than one platform at a time. It cost us a lot to keep portability alive, but we did, and that has made it easy for us to respond to things like Merced," he says, referring to the 64-bit chip from Intel.
No embedded semantics in the kernel
At every step, Cutler and Lucovsky explain, the team prioritized design. They knew that the code they were building had to last for years. This meant thinking ahead, understanding, for example, that hardware would evolve perhaps drastically.
"We tried to create a system that had a good, solid design, as opposed to one that would run optimally on hardware of the time," Cutler explains. "When we started, we were working on 386/20's. At the time that was a big, honking machine. Since our design had to be portable, we didn't allow people to optimize code in assembly language, which is hardware specific. This was hard for the Microsoft mentality at the time. Everyone wanted to optimize code in assembler."
The original vision kept the operating system nimble. "We didn't embed operating-system semantics into the kernel," Cutler explains. "So when we switched from OS/2 to Windows, we didn't take a major hit. If we had built OS/2 threading or signals into the kernel, we would have been in trouble. Instead we built the OS in layers and created subsystems to handle OS/2, Windows, and POSIX."
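A toy sketch of that layering idea, in Python with invented names; this is my illustration of the concept, not NT code: each "personality" subsystem maps its own semantics onto a small, semantics-neutral base, so the base never has to change.

    # Toy sketch of OS "personality" subsystems layered over a neutral core.
    # Names and interfaces are invented for illustration; this is not NT code.

    class NeutralCore:
        """Small, semantics-free base: it only knows generic kernel objects."""
        def create_object(self, kind):
            return {"kind": kind, "signaled": False}

    class PosixSubsystem:
        """Maps POSIX-flavored calls onto the neutral primitives."""
        def __init__(self, core):
            self.core = core
        def spawn_process(self):
            return self.core.create_object("process")

    class Win32Subsystem:
        """A different personality over the same, unchanged base."""
        def __init__(self, core):
            self.core = core
        def CreateProcess(self):
            return self.core.create_object("process")

    core = NeutralCore()
    posix, win32 = PosixSubsystem(core), Win32Subsystem(core)
    # Both personalities coexist without touching the base.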
Not everything can top the list
But the original vision also required tradeoffs. The team's engineering philosophy was to focus on one major area at a time. "That's why we wrote a spec," says Lucovsky. "The way we see it, write down what you're going to do, and then execute on it. Don't stand around dreaming, telling yourself, 'Wouldn't it be nice if...' We spelled out what we planned to do right there," he says, pointing to the spec sitting next to me, "and we stuck by what we said we would do."
"For example, we went with a client-server design, though at first things like context-switching rates and cache misses were too high," Lucovsky recalls. "Things were slow. But we didn't let ourselves get concerned with the size of memory. Not everything can be at the top of the list. We consciously put performance lower on the list than extensibility and didn't pay close attention to memory footprint until (version) 3.5."
We talk about how the operating system evolved with each release, from memory optimizations and Power PC support in versions 3.5 and 3.51 to kernel-mode GDI and User, plus a new user interface in version 4.0. "The basic, internal architecture has not changed, except for Plug and Play," Cutler says.
"We wanted a good design from the beginning so that, ideally, people could put anything on top of the system once it was there. We focused on the underpinnings. We wanted to minimize the need to add to the base or to tear up the base, because doing those sorts of things creates a lot of bugs. If moving forward you don't have to touch the basic code except to make small tweaks, then you know you got it right."
Some things must wait
At the same time, Cutler admits, "Nothing is ever architecturally correct." Needs evolve, and it takes time to build an operating system. Although support for distributed computing and clustering was part of the original vision, features such as the Active Directory didn't come to fruition until Windows 2000. "If we tried to give customers everything with the first release, we would never have finished it," Lucovsky says.
Cutler elaborates on this philosophy. "If what we desire is to have a mature operating system, then we need to achieve revolution through evolution, through incremental improvements. Within five iterations of an operating system like Windows NT, you see a big difference."
Both Cutler and Lucovsky see taking advantage of every opportunity to increase quality as the top priority for Windows.
Reliable is cool
"I'd much rather see the most reliable and usable operating system than the most whizzy-bang operating system," Cutler says. "To increase reliability we have to make choices. For every 10 bugs we fix, we may introduce three more. But do you want to ship with 10 bugs, or do you want to ship with three?"
"Do you want one more new feature," Lucovsky concurs, "or do you want to fix more bugs?
"When the Internet was first catching on, it was OK if your browser crashed once in a while. But these days, if you go to an e-commerce site and you hit the 'Buy' button, things had better work. When you're dealing with a leading-edge piece of technology, you can play fast and loose with it. But as the technology matures, playing fast and loose isn't acceptable anymore. This is characteristic of the maturing process for a product like Windows. People will put up with more from the bleeding edge."
"What I think is cool," Cutler interjects, "is that the system doesn't crash, and it doesn't lose my work, and it has functionality. I could care less that the visuals are flashy if my 32-gig hard drive goes away."
Communicating quality
"And if you're a consumer," Lucovsky responds, "you want even better reliability." He concludes: "Quality is the most important vision that everyone working on this product needs to share. It isn't always easy to communicate how we're going to do this, particularly as the team gets bigger."
The growth in the number of people working on the project over the last 10 years has other downsides, Lucovsky notes. "When you have a bigger group, quality problems become especially detrimental to productivity. Say it takes 10 developers and testers to fix one bug. Whoever put that bug in there just caused 10 people to lose time. We're working to make sure our development tools keep up with the growth in our system and our team. We're streamlining the process in ways that will make a dramatic difference in the way we build the code."
"If we want to stay competitive," agrees Cutler, "we have to invest money in tools and mechanics as well as features. We need to put guidelines on paper so that people stay good at planning. Simple things like writing a good spec are basic to software engineering."
Museum quality
It all comes back to the spec.
As I leave Cutler's office, I wrap the NT/OS2 spec in my jacket and head back to my building. I have three computers with lots of whiz-bangy, new-fangled things running on them, but for a few hours it's the spec that holds my fascination. As a Microsoft geek, I feel like I'm holding a piece of history. And it turns out I am. This fall, the spec I borrowed for a time will join the Information Technology collection at the Smithsonian Institution's National Museum of American History in Washington, D.C. It's another good reason to write a spec.
Delegate the prioritization by using a template similar to this one.
> "Hey, I noticed [this thing that I think might be a problem]. Would you like me to [do a specific action that would probably fix it] or move onto something else?"
Yes, but quality, naturally enough, takes more time. And while this might be great for organizations with plenty of buffer, for any start-up it has to be "good today beats perfect tomorrow".
There is a difference between quality and overengineered features nobody asked for.
Nobody is suggesting we "do everything in the ideal way". We're suggesting taking 5 minutes to think about the thing you're writing in the next 30. It produces less buggy code, and you'll likely complete the task with less iteration, because you made the mistakes in your head and changed plans before you wrote them out.
I close out 30 tickets a week; other teams at our company are lucky if they close out that many as a team. I'm not trading quantity for quality at all, I'm simply choosing to do the things that matter and ignoring everything else.
Buggy code can be a real timesink for startups... the failures often cannot be ignored for long; they just sit, mature, and compound some interest before they have to be fixed anyway.
And writing bug-free/low-bug code is about experience level and development methodology, not about spending more wall-clock time. (I.e., for the quality you may spend more on salaries per hour, but not more hours; probably a very good idea for a startup, IMO.)
I believe that in many situations high quality code is a LOWER net investment than low quality code.
We can't and shouldn't all aim to produce the Mona Lisa or Starry Night. Such expectations and perfectionism are unhealthy and often counterproductive. Sometimes you just need to showcase the product and fix the issues in beta. Going slow with low bug counts is nice sometimes and in theory, but certainly not every exception or edge case will be foreseeable or preventable, even with an incremental approach.
“I have a couple of sayings that are pertinent. The first is: ‘Successful people do what unsuccessful people won’t.’ The second is: ‘If you don’t put them [bugs] in, you don’t have to take them out.’ I am a person that wants to do the work. I don’t want to just think about it and let someone else do it. When presented with a programming problem, I formulate an appropriate solution and then proceed to write the code. While writing the code, I continually mentally execute the code in my head in an attempt to flush out any bugs. I am a great believer in incremental implementation, where a piece of the solution is done, verified to work properly, and then move on to the next piece. In my case, this leads to faster implementation with fewer bugs. Quality is my No. 1 constraint – always. I don’t want to produce any code that has bugs – none.
“So my advice is be a thinker doer,” Cutler concluded. “Focus on the problem to solve and give it your undivided attention and effort. Produce high-quality work that is secure.”
https://news.microsoft.com/features/the-engineers-engineer-c...
Unfortunately, I do not have this level of talent, but I do what I can.