Finding and Fixing a Five Second Stall (the-witness.net)
95 points by jamesmiller5 on Dec 18, 2012 | 37 comments



While I'm a web dev and not a Windows coder, this post is a great example of why I come back to HN.

> From experience, I’ve learned that it’s always best to fully understand a problem before you fix it. If you just patch over its symptoms but never figure out what the problem really was, it will often come back to haunt you.

Philosophical-level statements like this really help me as a dev, since one of my biggest hurdles is getting traction for even simple 1-2 line refactors that, in my opinion, can simplify code and improve maintainability. Far too often, I hear "just get it done", and I like to be reminded by statements like this of why "just get it done" doesn't always result in a better bottom line for the company.

>For some reason, I took this concept to heart in a programming sense and have found it is a good rule to code by. My version of the static discipline, adapted for software, is that whenever you are making a modification to a piece of code, you should always leave it in a state of stability equal to or better than how you found it. And preferably the latter.

I try to do this where I can too, and again, appreciate the philosophical explanation. It prepares me to more thoroughly and calmly explain my own work style.

The author laments Windows development, and I can only contrast that with my experience on an open source web stack (Ruby on Rails, Backbone.js). The upside of working with open source is that I can crack open the gem (library) I'm using and investigate the logic myself. I could fix the bugs or extend the code to do what I want. That is pretty cool.

The downside of working with open source software is that documentation is horrid. As the author states, man-hours are finite, and people who write OSS typically don't want to spend their finite hours documenting their code. I just take these as tradeoffs. It seems like Microsoft invests heavily in documentation, but I guess the author's point is that they can still do better.


I invariably end up reading the library/framework/whatever sources as well, if only to clarify a particular point of documentation or something.

But, when bughunting, there's often the dilemma of whether to fix the bug at the source (and hopefully, submit a patch to upstream), or to work around it (which can lead to horrific hacks) in your app code.

With the former, you're stuck having to deploy custom packages with your fix in until/if it gets accepted into upstream and a new release is rolled, so all too often I've found myself doing both. I don't know what the solution is; even a perfect patch to a superhumanly responsive maintainer is going to take some time to merge and deploy.

On an entirely unrelated note, a superb way to drive yourself insane is to edit the currently installed system version of the package in question, and then forget that you did some time later.


I'm only about 1-2 years into web dev and OSS, and a couple of months into feeling like I can contribute meaningfully to a gem. With that said, this is my current ideal GitHub workflow:

- fork the gem to make changes/fixes
- complete fix and point my app dependency to my branch
- add testing and docs around my fix
- submit pull request to project owner
- point my app dependency back to original gem when merged

That all said, I consider the activity of a gem before I even bother submitting a pull request. If it hasn't been touched in a month, or if there are open pull requests from 1 month or older, I weigh heavily just writing a 'my team only' solution.

This is admittedly selfish, but I'm hedging my time and emotional energy. I don't want to put care into crafting something that I believe to be useful only to watch it sit around unused because someone else doesn't want to review my work.


> The downside of working with open source software is that documentation is horrid. As the author states, man-hours are finite, and people who write OSS typically don't want to spend their finite hours documenting their code.

Not universally true (although it probably is true in most cases). PostgreSQL has great docs, and I've heard good things about OpenBSD, although I haven't looked myself.


Yes, I agree. I made sure to include 'typically' in my statement because I do see exceptions. Backbone.js has excellent documentation, as do CoffeeScript and Underscore.js.


Windows is saddled with baroque interfaces that are there only because they've always been there. Microsoft OS developers try to simulate or stub out old behaviors, in an effort to keep old app code running without breaking anything too badly.

Such a crust of backward compatibility code weighs heavily on a decades-old OS.


In fact, if the Windows guys had actually taken this backwards compatibility thing seriously, and failed to tidy things up by deprecating DirectInput, this problem probably wouldn't have arisen. DirectInput has a single simple flag to disable the Windows key :)

"DISCL_NOWINKEY - Disable the Windows key. Setting this flag ensures that the user cannot inadvertently break out of the application".

(I found this flag to work entirely reliably on Win2K and WinXP, and I'm pretty sure it worked on Win98 as well. I think it also kindly disabled a bunch of other things for you, leaving Windows with only Alt+Tab and Ctrl+Alt+Del. Which is basically exactly what you want for anything full-screen.)


Reading this really brought back memories of when I was writing Windows code. It really felt like half of any project was spent on working around the OS, rather than with it.


Now for a lot of developers half of any project is spent working around various browser quirks -- how far we've come!


Oh, the joys of Win32 programming.

Back in the day I wrote a networking service that was occasionally failing under stress load with memory corruption. I was staring at rare dumps, re-reading code and documentation to no avail. Finally, after a few days of debugging, I went ahead and located debug symbols and sources (oh, I happened to work at Microsoft at the time) for the build of Windows we used at the test lab. It was a relatively arcane procedure back then (it got better later, but at the time either the symbol server didn't work or the sources didn't match).

So I set up gflags with memory guards and windbg and started waiting. After a few days of stress runs it finally crashed again and there it was: a comment in the code of the crashing library saying "OVERLAPPED can be deallocated at this time if completion ports are used, but we save this value to it here anyways for backward compatibility reasons with bla-bla." Glad you told me, guys; I guess now I have to rewrite it and refcount the OVERLAPPED! I still don't know how I could have debugged it without source access. (Ironically, it also enlightened me as to why a service I had implemented at the startup before was occasionally crashing as well.)
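To illustrate what "refcount the OVERLAPPED" could look like, here is a minimal, portable sketch of the idea rather than the commenter's actual fix. The `io_ctx` type and both functions are hypothetical names; in real Win32 code the context would embed an `OVERLAPPED` member, which is left out here so the example compiles anywhere. The context starts with two references (one for the code that issued the request, one for the in-flight operation), and whichever side finishes last frees it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical per-request context. Real Win32 code would embed an
 * OVERLAPPED here; it is omitted to keep the sketch portable. */
typedef struct io_ctx {
    atomic_int refs;   /* one reference per party that may still touch us */
    int        result; /* stand-in for fields the completion path writes */
} io_ctx;

/* Allocate a context holding two references: one for the caller that
 * issues the request and one for the in-flight operation. */
io_ctx *io_ctx_new(void) {
    io_ctx *ctx = malloc(sizeof *ctx);
    if (ctx == NULL) return NULL;
    atomic_init(&ctx->refs, 2);
    ctx->result = 0;
    return ctx;
}

/* Drop one reference. Returns 1 if this call freed the context, else 0. */
int io_ctx_release(io_ctx *ctx) {
    assert(ctx != NULL);
    /* atomic_fetch_sub returns the value held before the decrement. */
    if (atomic_fetch_sub(&ctx->refs, 1) == 1) {
        free(ctx);
        return 1;
    }
    return 0;
}
```

With this shape, the issuing code calls `io_ctx_release` once it no longer needs the context, and the completion path calls it after its last write, so a late write can never land in freed memory.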

And don't even get me started on implementing SSL support in the service.


Is there some 'platform complexity' measurement metric along the lines of a 'code complexity' indicator? It would be useful for finding platforms/languages/environments with less pain and as a way to track improvements as well.

http://owenfi.com/post/38255106579/windows-keyboard-apis


You could probably take your normal metrics and compute them for the lines of code that specifically contain framework/platform code? Can't say how useful that will be, but I'd be interested to see how those numbers align with my feelings.
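As a rough sketch of that idea, the function below counts how many lines of a source buffer mention any of a set of platform identifiers, which gives a crude "platform density" to compare against other metrics. The function name and the marker list are assumptions made up for illustration, and plain substring matching is a deliberately naive stand-in for real code analysis.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the lines in `src` that contain any of the `markers` as a
 * substring. The total number of lines is written to *total_lines. */
size_t count_platform_lines(const char *src,
                            const char *const *markers, size_t n_markers,
                            size_t *total_lines) {
    assert(src != NULL && total_lines != NULL);
    size_t hits = 0, total = 0;
    const char *line = src;
    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        total++;
        for (size_t i = 0; i < n_markers; i++) {
            size_t mlen = strlen(markers[i]);
            int found = 0;
            /* Search only within the current line. */
            for (size_t j = 0; mlen > 0 && j + mlen <= len; j++) {
                if (memcmp(line + j, markers[i], mlen) == 0) {
                    found = 1;
                    break;
                }
            }
            if (found) {
                hits++;   /* count each line at most once */
                break;
            }
        }
        line += len;
        if (*line == '\n') line++;
    }
    *total_lines = total;
    return hits;
}
```

Dividing the return value by `*total_lines` gives the fraction of lines that touch the platform, which could then be tracked per module or per release.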


I had the opposite experience yesterday. A colleague and I were tracking down a rare race condition that became more likely with more threads. A signal handler was being re-entered, which the man page suggested shouldn't happen because by default the signal is blocked during handler execution.

Because this was Linux, we were able to read the kernel source to see what actually happens. It turns out that the signal is only blocked for the current thread, and Linux will deliver it to the first unblocked thread instead.
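One common way to sidestep this behavior, shown here as a sketch and not as what the commenter actually did, is to block the signal in every thread and consume it synchronously in one dedicated thread with `sigwait`. The two helper names are hypothetical; `pthread_sigmask` and `sigwait` are the standard POSIX calls.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <pthread.h>
#include <signal.h>

/* Block `signo` in the calling thread. Threads created afterwards inherit
 * this mask, so call it in main() before spawning any workers. */
int block_signal(int signo) {
    sigset_t set;
    assert(signo > 0);
    sigemptyset(&set);
    sigaddset(&set, signo);
    return pthread_sigmask(SIG_BLOCK, &set, NULL);
}

/* Wait synchronously for `signo` instead of running an asynchronous
 * handler. Returns the signal number received, or -1 on error. */
int wait_for_signal(int signo) {
    sigset_t set;
    int received = 0;
    assert(signo > 0);
    sigemptyset(&set);
    sigaddset(&set, signo);
    return sigwait(&set, &received) == 0 ? received : -1;
}
```

Because no thread ever has the signal unblocked, there is no "first unblocked thread" for the kernel to pick, and the handling code runs as ordinary thread code that cannot be re-entered.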

This misunderstanding also came from an API doc that was less than explicit, although not outright incorrect. But because we had full stack source we didn't need guesswork, a good night's sleep or inspiration. Just curiosity and persistence.

There's a zen-like calm to knowing you can always, always follow a bug through the entire system, that everything happens for a reason and you can open an editor and see what those reasons are.

I will miss this deeply if I ever return to closed-platform development.


Did you submit a patch for the documentation to make this particular point clearer?


Reminds me of this one job where we had to call LockWindowUpdate to get an Excel/VBA-based application to perform.

http://msdn.microsoft.com/en-us/library/windows/desktop/dd14...


LockWindowUpdate really isn't that unusual if you understand how painting works in Windows. The terminology behind it is weird and so is the fact that it's one window at a time, but it's not all that strange - it's an old enough primitive that you probably couldn't justify making it a per-window flag bit.


My favourite win32 api is this little gem for flushing a file output buffer to the physical disk: http://support.microsoft.com/kb/148505
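For comparison, here is a minimal sketch of the same task on a POSIX system, where two layers have to be flushed separately: `fflush` moves data from the stdio buffer to the OS, and `fsync` asks the OS to push it to the storage device. The function name `write_durably` is made up for this example. On Windows the rough counterparts of `fsync` would be `_commit` or `FlushFileBuffers`, which are not shown here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Write `data` to `path` and push it through both buffering layers.
 * Returns 0 on success, -1 on any failure. */
int write_durably(const char *path, const char *data) {
    assert(path != NULL && data != NULL);
    FILE *fp = fopen(path, "w");
    if (fp == NULL) return -1;
    int ok = fputs(data, fp) != EOF
          && fflush(fp) == 0           /* stdio buffer -> OS */
          && fsync(fileno(fp)) == 0;   /* OS cache -> storage device */
    if (fclose(fp) != 0) ok = 0;
    return ok ? 0 : -1;
}
```

Calling `fflush` alone only guarantees the data has left the process, which is the gap the linked article is about.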


I pity the poor tech writer who had to constantly write "Commode.obj" while maintaining a serious tone.


They are using a low-level hook to disable the expected behavior of a "get me out of here" key.

They deserve all the stalls they can get. The reason it was so complicated to get this behavior in the first place is that it's heavily discouraged.

Not to mention that calling SetWindowsHookEx will flag you as a keylogger in every antivirus snakeoil in existence.


As any gamer in the last 15 years will tell you: the Windows key is the bane of our existence. It's a classic case of know your audience and customer, because if you've ever played a video game on Windows and are now developing one, you know that you should be disabling that key.

You mentioned expected behavior, which is good to abide by. But there's also preventing accidents. Your users are imperfect. It's the reason big red buttons also get a follow-up confirmation dialog.

Now, I'd definitely like to have a conversation about whether there should be a setting for disabling that override. But as for the default: it's pretty straightforward. When playing a fullscreen video game, the number of accidental occurrences of that button press far outweighs the number of intended uses. Pick your battles.


So heavily discouraged that it was in this Microsoft sample code linked in the OP: http://msdn.microsoft.com/en-us/library/windows/desktop/ee41...?


That code is offered as part of the extensive documentation that Microsoft provides on its various frameworks.

It by no means sanctions what still remains a terrible hack. A badly written hook could render a user's system essentially unusable, and as I explained, it's a very common heuristic for antivirus software.

The point remains: the fact that you have to resort to globally hooking all input to break the expected behavior of a key cannot possibly serve as an example of bad documentation or bad API design. It in fact proves the very opposite: you have to work very hard to break what users expect.


Considering that neither party (those who don't want the Windows key enabled during full-screen, and those who always depend on the Windows key functioning normally) is served without an unstable and error-prone method, it would seem the API is failing both parties in some cases.

I think the author would agree with your last statement. By forcing applications to handle this logic and break what some users expect, the API isn't serving users or developers well.


>> The great thing about programming on Windows is that it is the only commercially viable platform where you can ship software to users without getting approval from a giant bureaucracy <<

I kind of had the feeling I could write code for a Mac and ship it.

Wait, in some sense I have done so before with code for computing fractals I shared with other Mac users in my department. It worked perfectly in my Mac and their Mac.

Wait! I have downloaded and paid for apps straight from some developers websites (top of my mind, or top of my menu bar: Hazel, 1Password, Arq)

So I just stopped my reading there, sadly.


-1, sorry.

> So I just stopped my reading there, sadly.

This is a fascinating article, and you're derailing the conversation (the article is not actually about the topic you're discussing at all). I recognize I'm only derailing it further, but I feel it is good etiquette to explain downvotes.


No need to be sorry for it, I probably deserve it for this comment. Of course, I've seen worse things upvoted, loathed or worshipped on HN, but I don't mind a few -1s.

Even with what I read before getting out I knew it wasn't about platforms at all, but after that line I just wasn't as interested as when I decided to check it.


Yeah. That was the line where the author lost most of his credibility with me. I mean, it's not like anyone makes you fill out forms to make packages available for download on Mac and Linux...

If the author meant "Gaming Platform" that makes a bit more sense.


I think it was a stab at Apple porting their app store to Mac OS, but I agree that it was worded poorly as well as out of place.


That would only make sense if Microsoft hadn't made the exact same choice with Windows 8, even down to the warning when running untrusted software.


> Windows is the only commercially viable platform where you can ship software to users without getting approval from a giant bureaucracy (well, perhaps I should say it used to be).

And there's the stab at Windows 8.


Rather than throwing the article away in righteous anger, consider what sort of context could be missing that would make the author's point somewhat plausible. Could it be that he did not mean just any platform, but only a specific type of platform? For example, server platforms: MacOS isn't really a significant player there, so that checks out; Windows is a commercial player, so that checks out; there are no other commercial players, so that does not check out. OK. So maybe he refers to gaming platforms? Xbox, Nintendo, PlayStation, iOS: four commercial platforms that all come with bureaucracy, so that checks out; Windows is a player, check; MacOS is not a significant player, so kind of check. It makes perfect sense if the author is professionally employed developing games. These platforms are where all the money is, so that's the only thing on his mind. See, not everything is a hostile dig at your favorite fruity platform.


It wasn't anger. I was genuinely curious to see what it was about, and I don't have any problem with Windows (used it from 3.1 until XP). But after a long and windy work day with quite a lot of setbacks, that line sent me right out.


The Mac is not a commercially viable gaming platform.


Agreed, at least in its own right. But that wasn't what the line said, and it quite put me off.


I suspect the author means viable for video games, not software. I don't have any proof of this, just the dearth of large-budget games that make their way to the Mac.


Yup, I guess that's the point, but dissing the whole platform because a kind of software product is not particularly buoyant...


In the header of the blog: "The Witness: an exploration-puzzle game on an uninhabited island".

So, yes. Games.



