
First, let's look at what an 'algorithm' is: The examples that got famous were the sorting and searching algorithms in Knuth's TAOCP. So, people quickly coded up bubble sort, which has running time proportional to n^2 for sorting n items, and in practice n^2 grew too darned fast. So, people were eager for Shell sort, quick sort, heap sort, merge sort, etc., which all have running times proportional, or nearly so, to n log(n) in the average case or even the worst case. Then there was radix sort, which could beat n log(n). So, people got excited about 'algorithms'.

So, what did we get? We wanted to sort (or work with trees, etc.). We knew how to do that manually. We could write the code, but our first-cut code tended to be too slow. Heap sort, etc. are really clever. Stare at the code (typically doing an informal proof of correctness by mathematical induction based on the definition of the natural numbers) and you can confirm that it sorts, and with analysis as in Knuth you can confirm the running time, e.g., worst case n log(n). So, net, there's not much question about what it does, that it actually sorts, and how fast it is. Great -- for sorting and searching.
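
For concreteness, here is a minimal heap sort sketch in Python -- just an illustration of the standard technique, not the author's code:

    def heap_sort(a):
        """Sort the list a in place; worst case O(n log n)."""
        n = len(a)

        def sift_down(root, end):
            # Push a[root] down until the subtree rooted there is a max-heap.
            while 2 * root + 1 < end:
                child = 2 * root + 1
                if child + 1 < end and a[child] < a[child + 1]:
                    child += 1          # take the larger child
                if a[root] < a[child]:
                    a[root], a[child] = a[child], a[root]
                    root = child
                else:
                    return

        # Build a max-heap, then repeatedly move the current max to the end.
        for start in range(n // 2 - 1, -1, -1):
            sift_down(start, n)
        for end in range(n - 1, 0, -1):
            a[0], a[end] = a[end], a[0]
            sift_down(0, end)

The induction argument lives in sift_down: if both subtrees of root are already heaps, the loop restores the heap property for the whole subtree.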

Do I ever do such things? Actually, early in my project I wrote a bunch of heap sort routines, polymorphic, etc. I wanted to write some 'stable' versions but didn't get around to it this time. One can also write versions that sort by 'surrogate', that is, move only pointers and not the data. For some of the details of how Microsoft's Visual Basic .NET works with strings and instances of objects, there is some question about the value of sorting by surrogate! Then, with the results, an O(n) routine can take the 'permutation' of the pointers and move the data. Etc. I didn't write everything I could think of in sorting, but some of what I wrote is good and used in my code. And I got some polymorphic code that is only 2-3 times slower than a non-polymorphic version. So, yes, I did some work in algorithms, even some quite old ones.
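
A sketch of the 'surrogate' idea in Python (hypothetical names, and using the built-in sort as a stand-in for the heap sort above): sort an array of indices by key, then apply the resulting permutation to the data in O(n).

    def surrogate_sort(data, key=lambda x: x):
        # Sort 'surrogates' (indices) only; the data itself is not moved.
        return sorted(range(len(data)), key=lambda i: key(data[i]))

    def apply_permutation(data, perm):
        # O(n): move the data into sorted order given the index permutation.
        return [data[i] for i in perm]

    records = [("b", 2), ("a", 1), ("c", 3)]
    order = surrogate_sort(records, key=lambda r: r[0])    # [1, 0, 2]
    in_order = apply_permutation(records, order)           # [("a", 1), ("b", 2), ("c", 3)]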

Next I needed a 'priority queue', so I borrowed the heap data structure from heap sort and, presto, got a nice priority queue routine. It's in my production code.
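
In Python the same binary-heap structure is packaged in the standard library, so a priority queue is a few lines (again just a sketch of the idea, not the author's routine):

    import heapq

    pq = []                                   # the heap, smallest priority first
    heapq.heappush(pq, (2, "send report"))
    heapq.heappush(pq, (1, "fix bug"))
    heapq.heappush(pq, (3, "refill coffee"))

    while pq:
        priority, task = heapq.heappop(pq)    # always yields the minimum
        print(priority, task)                 # 1 fix bug, 2 send report, 3 refill coffee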

So, yes, I worked in algorithms. And for my project, that work is crucial in the sense that without it my software wouldn't work or, had I used code I knew about from others, wouldn't work as well.

But now there is another issue in computing and software. There are old examples; since they are more explicit, let's start with those. For a positive integer n, suppose A is an n x n matrix of real numbers and we seek x, n x 1, to solve Ax = b, where b is also real and n x 1.

Now do we need an 'algorithm'? Not really! An 'algorithm' as in sorting is not enough! Instead we need some mathematics prior to any sense of an 'algorithm'! Usually the mathematics we use is Gauss elimination, and a good chunk of a course in linear algebra makes clear just what it is doing and why it works.

Here's much of the difference: Looking at a description of code for, say, heap sort, it's easy to see that it sorts. Looking at a description of code for Gauss elimination, it's totally obscure why it solves Ax = b, whether Ax = b has no solution, one, or infinitely many, and, if infinitely many, how to characterize them. Instead of just the code, just the 'algorithm' for Gauss elimination, we need the math from, say, linear algebra. Similarly for a numerical solution to an initial value problem for an ordinary differential equation.
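
For reference, a bare-bones Gauss elimination with partial pivoting in Python -- a sketch, not production numerical code; it assumes A is square and nonsingular, so it does not even face the none/one/infinitely-many question that the linear algebra settles:

    def gauss_solve(A, b):
        """Solve A x = b; A is a list of n rows of n floats, b a list of n floats."""
        n = len(A)
        M = [list(A[i]) + [b[i]] for i in range(n)]      # augmented working copy

        for k in range(n):
            # Partial pivoting: take the row with the largest |entry| in column k.
            p = max(range(k, n), key=lambda i: abs(M[i][k]))
            if M[p][k] == 0.0:
                raise ValueError("singular to working precision")
            M[k], M[p] = M[p], M[k]
            # Eliminate column k below the pivot.
            for i in range(k + 1, n):
                factor = M[i][k] / M[k][k]
                for j in range(k, n + 1):
                    M[i][j] -= factor * M[k][j]

        # Back substitution.
        x = [0.0] * n
        for i in range(n - 1, -1, -1):
            s = sum(M[i][j] * x[j] for j in range(i + 1, n))
            x[i] = (M[i][n] - s) / M[i][i]
        return x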

Then, more generally, for some software, it may be that some math, new or old, is crucial for knowing what the software will do, much as the linear algebra was crucial for knowing what some Gauss elimination code will do. E.g., in Gauss elimination, it turns out we have some choice about the 'pivot' elements. Well, typically we want to select the pivot that is largest in absolute value. Why? Some math! Next, there is a lot of 'inner product' arithmetic, and we want to accumulate the inner products in double precision. Next, when we get a solution, we want to do a little more work in double precision and do some 'iterative improvement'. And we would like to know the 'condition number' to get some error bounds -- all in the code and all needing some math to know why it works.
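
A sketch of the 'iterative improvement' step, assuming the gauss_solve sketch above (the classical recipe accumulates the residual in extra precision and reuses the factorization; this toy version just re-solves):

    def residual(A, x, b):
        # r = b - A x; this is the inner-product work to accumulate carefully.
        n = len(A)
        return [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]

    def improve(A, b, x, steps=2):
        # Each round solves A d = r for a correction d and updates x.
        for _ in range(steps):
            r = residual(A, x, b)
            d = gauss_solve(A, r)
            x = [x[i] + d[i] for i in range(len(x))]
        return x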

So, net, for many good solutions to real problems, an 'algorithm' alone does not have enough to recommend it and needs some prior logical support from, say, some applied math. So, in code, algorithms are not everything that is important.

In my project, some original math was crucial and got converted into code. But to have any confidence in what the code will do, one can't just stare at the code; instead, one needs the math. That is, just an algorithm was not enough.

So, in my project there has been work in both algorithms and applied math. That work is crucial in that without it the project would flop or nearly so.

But, right, there is also working through a lot of routine logic. E.g., the code for yesterday was to determine, at the start of the code for a Web page, just where the user came from. There were three cases, the first two legal and the third a symptom of an attempt at hacking. So, I needed to work out the logic. For that I needed some answers about how Microsoft's ASP.NET works. E.g., if we transfer to page B from page A via Server.Transfer("B") in page A's code, then in the code of page B, what is the value of Request.UrlReferrer? Use the TIFO method -- try it and find out. So, right, such 'logic' is routine but takes time.
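
The ASP.NET details aside, the three-case check is roughly the following (a Python sketch with hypothetical page names; what a server-side transfer actually puts in Request.UrlReferrer is exactly the kind of thing the TIFO method settles):

    LEGAL_REFERRERS = ("https://example.com/PageA.aspx",
                       "https://example.com/Login.aspx")      # hypothetical pages

    def classify_referrer(url_referrer):
        if url_referrer is None:
            # Case 1: no referrer at all (direct entry, or whatever the TIFO test shows).
            return "legal: internal transfer or direct entry"
        if url_referrer.startswith(LEGAL_REFERRERS):
            # Case 2: the user followed a link from one of our own pages.
            return "legal: came from a known page"
        # Case 3: anything else looks like an attempt at hacking.
        return "suspicious: possible hacking attempt"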

Then, where did most of the time go? Right: to none of the above! Most of the time, likely over 80%, went to, may I have the envelope, please? Yes, here it is (drum roll): The nominees for where most of the time went are (1) working through poorly written technical documentation, (2) software installation, (3) software configuration, (4) data backup and recovery, especially for the boot partition, (5) system security, e.g., fighting viruses, and (6) fighting bugs. And the winner is (1), working through poorly written technical documentation! Heck, reading Knuth, Ullman, Sedgewick, etc. was fast, fun, and easy, but reading the documentation on the 'platform' was painful and wasteful.

So, the crucial stuff unique to the project took the least time. The rest of the work unique to the project took much more time but still was fast, fun, and easy. Work on (1)-(6), essentially independent of the project, took maybe 80% of the time.

In particular, the most valuable work, that gives the project some potential, took the least time; routine work still unique to the project took much more time but still was fast; nearly all the time went for work not unique to the project.

Why? There is good/bad news. Some of the good news is that there is now so much infrastructure software that we write less of our own code and use code written by others. The bad news is that when using code written by others, we also need to use their documentation, and for the industry so far, writing good documentation is, in practice, essentially too difficult. E.g., I had to use just the TIFO method to determine what the value of Request.UrlReferrer would be. As far as I could tell from my 4000+ Web pages of documentation, the only way to know was the TIFO method. Then, of course, in my code I did put some documentation. I wish I didn't have to write my own documentation for the tools I am using from others.

Net, what's crucial and unique to a project may not take the most time. That is, the work that takes the most time may not be the most important work. And, then, we can't measure the importance of some work to a project by the time the work took.




Interesting, but I noticed you write about your project, singular. Have you had any other projects?


Sure. At one time I was working to support my wife and myself through our Ph.D. programs. The work was in applied math and computing, mostly on US Navy problems. At one point, there was an urgent request: evaluate the survivability of the US SSBN fleet under a controversial scenario of global nuclear war limited to sea.

Point: A claim is that if you find an SSBN, then you can kill it with a nuke -- likely true. Another claim is that you can't find an SSBN -- not really true but close to true, since a promising effort to find the SSBNs would be 'provocative' and raise alarms. Also, if you find and sink one, then that's the start of nuclear war, and the other SSBNs will be free to fire. So, what is tough is to find all the SSBNs at one time and sink them all at once. But in the special scenario, one could find and sink the SSBNs one at a time. Then how long would they last?

The Navy wanted their answer quickly, in two weeks. That was about right, since my wife already had us scheduled for a vacation cabin in Shenandoah starting the day after the due date!

So, I derived some math, wrote some software, and both the Navy and my wife got what they wanted on time!

My understanding is that my work was later sold to another group interested in US national security. I could tell you what that group was, but then I'd have to ...!

There have been other projects!



