Ask HN: Fundamental texts for serious web engineers?
95 points by BadassFractal on Dec 31, 2011 | 36 comments
As someone who would like to be a one-man-army / technical co-founder in the world of web development, I've been recently trying to deepen my technical knowledge of the fundamentals. I realized my understanding of the basics was very simplistic and as far away from "hardcore" as imaginable. I spent years working in the industry, and made the giant early mistake of not asking "how does this actually work underneath?" for way too long, so now I'm embarking on the likely life-long journey to fix that.

I realize that it's impossible to gain truly deep knowledge into every aspect of web engineering, but I believe that given time one can still be proficient in most of its areas. More importantly, this knowledge should be abstracted from the "flavor of the month" technology, and it should enable one to quickly learn and adapt. Also I think this knowledge should be pragmatic, practical and highly relevant to the real world and business application.

For starters I began learning more about how programming languages work underneath, which so far Programming Language Pragmatics 3rd ed has been really good at explaining, at just the right level of detail.

I imagine that some of the other fundamental areas of understanding for a web developer would be:

- operating systems
- networking
- databases
- security
- distributed systems
- UX

For these areas, and more if you can identify them, would you folks be able to recommend modern and pragmatic texts that would give one a solid level of depth? I realize that there would be plenty of overlap with a standard CS curriculum, the difference would be in a higher focus on practical application, rather than theory.




From my experience, what people working as web developers most commonly lack theory-wise, besides general software engineering skills, is in-depth knowledge of HTTP and its guiding architectural principles (REST), so I would recommend:

http://www.w3.org/Protocols/rfc2616/rfc2616.txt (HTTP standard)

http://en.wikipedia.org/wiki/Representational_state_transfer

http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm (Original publication introducing REST)

Not knowing this stuff makes it hard to write applications that perform well and are search engine friendly, so contrary to what people may think, it's often very practical knowledge. Ultimately you should also strive to understand what happens from the moment you type in the URL to the moment the page is displayed on the screen, which means some understanding of DNS, basic routing, etc. You should also know how to use all the classic tools: ping, traceroute, host, dig, netcat, ngrep, nmap and so on. They make it easier to understand the theoretical things and very often come in handy in practice. It's also a step towards another important goal: knowing your infrastructure, so probably UNIX, your web server of choice, your database server etc. You can go a long way by just reading the manuals and the user guides for those things.
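
To make the "URL to page" path a little more concrete, here is a minimal Python sketch of the first steps a client takes for a plain `http://` URL: split the URL, build the request bytes, and read the status line of the reply. The function names (`build_get_request`, `parse_status_line`, `fetch_status`) are made up for illustration, the `Host` header ignores non-default ports, and TLS, redirects and chunked bodies are left out entirely.

```python
import socket
from urllib.parse import urlsplit


def build_get_request(url):
    """Build the bytes a client would send for a plain HTTP/1.1 GET."""
    parts = urlsplit(url)
    # An empty path must be sent as "/" on the request line.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.hostname}",  # the Host header is required in HTTP/1.1
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(raw):
    """Split the first line of a response into (version, code, reason)."""
    first_line = raw.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = first_line.split(" ", 2)
    return version, int(code), reason


def fetch_status(url, timeout=5.0):
    """Resolve the host, open a TCP connection, send the GET, read the status."""
    parts = urlsplit(url)
    # create_connection performs the DNS lookup and the TCP handshake.
    with socket.create_connection((parts.hostname, parts.port or 80), timeout=timeout) as sock:
        sock.sendall(build_get_request(url))
        return parse_status_line(sock.recv(4096))
```

Calling `fetch_status("http://example.com/")` needs network access; the two helper functions can be tried offline. Watching the same exchange with netcat or ngrep shows exactly these bytes on the wire.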

Another aspect is knowing the modern browser well. You should learn some JavaScript, learn to use Firebug or Chrome Developer Tools, learn about the DOM, the security issues that browsers have to deal with etc. Douglas Crockford's videos are a good way to learn this:

http://www.youtube.com/playlist?list=PL7664379246A246CB

http://www.youtube.com/playlist?list=PL5586336C26BDB324&...

There is a nice book about web security from Michal Zalewski, a well-respected security researcher currently at Google:

http://lcamtuf.coredump.cx/tangled/


The guy says he's worked in the industry for years; presumably he knows about REST and ping.

I read his plea in exactly the opposite way -- instead of learning more of (or about) these tools, he's after fundamentals.

In my opinion, the "fundamentals" are just decisions made by concrete people to address concrete concerns (at the time). Understanding them is understanding the motivations behind them. So it's as much a history lesson as it is technical.

Some of those people and decisions may be lost in time, like tears in the rain, but most have been documented. For a "networking" example, see this recent HN thread https://news.ycombinator.com/item?id=3407777 and the discussion there.


I think you are nitpicking in your first sentence. After all, I pointed to those tools as an aid for understanding how things work "under the hood", which is what the OP is after as far as I understand. Even if you take "ping" (which I could just as well have left off the list, as the other tools are less obvious but still tremendously useful), almost everyone knows what it's for, but fewer people know the various options it has, and fewer still know how it really works, what protocol it uses etc. In a similar way, lots of people have used MySQL or PostgreSQL for years, but have never read the manuals and have little understanding of all the various capabilities they have and the ins and outs of how they work. It's a blend of theory and practice.
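
As a small illustration of the "how does ping really work" point: ping normally sends ICMP echo request messages (type 8, code 0) and waits for echo replies. The sketch below only builds such a packet and its checksum in Python; the function names are invented for this example, and actually sending the packet would need a raw socket, which usually requires elevated privileges and is not shown.

```python
import struct


def internet_checksum(data):
    """16-bit one's-complement checksum as used in the ICMP header."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier, sequence, payload=b""):
    """Build an ICMP echo request: type 8, code 0, checksum, id, sequence."""
    # The checksum is computed with the checksum field itself set to zero.
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload
```

A receiver can validate a packet by running the same checksum over the whole message, checksum field included: a result of zero means the packet is intact.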

I agree with the rest of what you said, and while I did not state it explicitly, I share the sentiment. In fact, the Douglas Crockford videos I linked to give quite a detailed account of the development of the Web, including browsers, JavaScript, AJAX etc.


Excellent, then our posts complement each other :)

And sorry, I didn't mean to nitpick and your links are certainly useful; god knows what "years in the industry" means anyway.


I think it's really good advice. I was relegated exclusively to the backend for a while and so I had a terrible understanding of some of those concepts. Spending some time on Rails led to inevitable tangential learning of HTTP and REST, which really opened my eyes about the bigger picture.

Same thing about JavaScript, as you said, it's the language of the web and needs to be mastered. I was pretty abstracted from it for the longest time, but just yesterday a website I had to use was misbehaving and I managed to track down the issue right away with the Chrome tools. It's really empowering, although I'm still far from being proficient.


I'm probably atypical in that I came to the world of web engineering from the depths of low-level C++ systems development. But I can honestly say that once you've written a perf- and security- conscious file system driver, pretty much nothing in the web world will be confusing or surprising (except maybe design/UX).

Of course, this is terrible advice to take at face value and I won't suggest that you should go become a C++ master for great good.

But I stand by my theory that it really doesn't matter how you approach the learning. If you work on challenging problems in any area of computing, you'll realize that most of this stuff is just different people solving the same basic problems with different terminology.


ok, looking at programming language pragmatics it seems like you want fairly serious (university level) theory. so i don't understand why people are talking about regexps - that should be covered in a subsection (regular grammars) of one chapter of that book.

anyway, for databases i would suggest date's "sql and relational theory" - it shows how sql is related to a more elegant, cleaner, underlying theoretical approach and will get you to think in more declarative terms.

for general programming, sicp is worth reading if you haven't read it already.

for algorithms, either (or both - they are very different) cormen et al (lots of tedious detail - you need to choose which bits to read) or the algorithm design manual http://www.algorist.com/

you don't ask about numerical work, but for anyone interested, gershenfeld's "nature of mathematical modelling" is concise, broad, and has some very deep insights (the few pages on wavelets are the best i have read anywhere).

for security, schneier's applied crypto is a pretty good intro (again, skip some bits, and remember it's old). and anderson's "security engineering" is the other classic (which i should read...).

for ai (again, you didn't ask) both norvig's books are good (the more modern one is aimed at modern statistical approaches that work, but the older one has a lot of good basic programming and lisp).


Speaking of Structure and Interpretation of Computer Programs (SICP), there's a good UC Berkeley course based on it up at http://www.academicearth.org/courses/the-structure-and-inter.... It's very long, but if lectures are more your learning style, you might find this very useful.


"so i don't understand why people are talking about regexps"

The author of the post made mention of the knowledge applying to web development and then said the following towards the end of the post:

"the difference would be in a higher focus on practical application"

Regular Expressions is kind of the tl;dr on the topic. They are practical to web development but require a deeper understanding of a particular area of computing.

I took his post to say, I want to go deep, I want it to be applicable to real world web development, and I want it to be targeted topics. I think regular expressions fit that bill. Sure there are others, and maybe some overlap but I personally mentioned regular expressions because it has a high degree of misunderstanding, but also a high degree of usefulness in web development.


One book many engineers have cut their teeth on for OS would be Andrew Tanenbaum's Modern Operating Systems, 3rd ed.

http://www.amazon.com/Modern-Operating-Systems-Andrew-Tanenb...


You could try the HN archives: http://www.hnsearch.com/search#request/submissions&q=eve...

Most of these link to really good papers. You should also read this about memory: http://lwn.net/Articles/250967/


I'd suggest reading this if you want to be a technical co-founder. There's more to running a successful web app than building it.

http://www.amazon.co.uk/Web-Operations-Keeping-Data-Time/dp/...


This book is fantastic.


One of the most often neglected areas of study is regular expressions. Many developers, even senior ones, look at them as a kind of black art. The reality, though, is that regular expressions can save volumes of code when applied to certain pattern matching problems. I cannot stress enough how powerful they are. If you master regular expressions you will be ahead of 90% of developers in your ability to process text for patterns, which is a common need on the web. The mastering series has a book on regular expressions and it has high ratings, but I will stop short of recommending it because I personally have not read it. Someone else may chime in on whether or not the ratings are valid for it.


Yes, also a basic understanding of how regex works under the hood is really helpful in cementing the concept (and also understanding things like why you can't properly parse HTML with them alone).

So I'm talking about stuff like NFAs (nondeterministic finite state automata) and DFAs (deterministic finite state automata). These concepts are nowhere near as complicated as their names make them sound; they are in fact generally quite intuitive. A reasonable grounding in basic set theory will help you here.
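
To show how small a DFA really is, here is a hand-written one in Python for the pattern `[0-9]+(\.[0-9]+)?` (a simple decimal number). It is only a sketch: the state names and helper functions are invented for this example, and a real regex engine builds its automaton from the pattern rather than from a hand-coded table.

```python
import re

# Transition table: (current state, input class) -> next state.
TRANSITIONS = {
    ("start", "digit"): "int",
    ("int", "digit"): "int",
    ("int", "dot"): "dot",
    ("dot", "digit"): "frac",
    ("frac", "digit"): "frac",
}
ACCEPTING = {"int", "frac"}

# The same language expressed as a regular expression, for comparison.
PATTERN = re.compile(r"[0-9]+(\.[0-9]+)?")


def classify(ch):
    """Map a character to the input class used in the transition table."""
    if ch in "0123456789":
        return "digit"
    if ch == ".":
        return "dot"
    return "other"


def dfa_accepts(text):
    """Run the DFA over text: one table lookup per character, no backtracking."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, classify(ch)))
        if state is None:  # no transition defined, so the input is rejected
            return False
    return state in ACCEPTING
```

`dfa_accepts(s)` and `PATTERN.fullmatch(s)` agree on inputs such as "42", "3.14", "3." and ".5". The automaton has a fixed, finite set of states and no stack, which is why it cannot keep track of arbitrarily deep nesting of the kind HTML allows.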

It will also give you a much better understanding of how programming languages are interpreted 'under the hood', in fact one of the first things any compiler/interpreter does to your code is essentially run it through a fancy regex engine.

This leads you to realizations not only about the performance implications of pattern matching in text, but also about how to solve a variety of problems in a simple way.

In reference to a specific text, I learned the concept from the early chapters of 'the dragon book' but I'm sure gentler introductions are available.


Excellent suggestion. Any literature on automata that you'd recommend?


He did mention the dragon book which is actually titled Compilers: Principles, Techniques, and Tools but is better known by the dragon title. It is a seminal work and still regarded today even though it is older. But it is by no means a light read, it will definitely give you a deep understanding of computing, but it's a little heavy if you just want to develop web apps.


I learned from reading Introduction to the Theory of Computation for a class... It's expensive though.


Well, I think REs are horribly overused. I think this comes from an unjustified fear of parsers, or from most programmers' complete ignorance that they even exist. For example, let's write an RE to recognize and extract a phone number. Here is your spec: http://en.wikipedia.org/wiki/Local_conventions_for_writing_t...

Have you screamed yet? For example, in the US the exact same phone number can be written in several ways. Some of the variants I have seen:

(516)-123-4567

1(516)-123-4567

1-516-123-4567

123-4567 old fashioned but valid

1.516.123.4567

Now, with a parser I can use REs for the simple bits that turn a text stream into bits of data, and then hand it off to the parser to figure out if it is a good phone number. So the lexer, where the simple REs live, can turn a stream of text into a stream of tokens that the parser can reason about. For my US example above, here is the list:

1: a string of digits called NUMBER and what they are

2: a ( called LP

3: a ) called RP

4: a - called DASH

5: a . called DOT

Then I can parse the 5 things above to figure out whether it is a valid phone number, in nice, readable and maintainable code.
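
The lexer-plus-parser split described above could look roughly like the following Python sketch. The token names come from the list above; everything else (`lex`, `PhoneParser`, `parse_us_phone`, and the exact rules for which separators are optional) is an assumption made for illustration, covering only the five variants listed.

```python
import re

# The lexer: one simple RE per token kind, nothing clever.
TOKEN_RE = re.compile(r"(?P<NUMBER>\d+)|(?P<LP>\()|(?P<RP>\))|(?P<DASH>-)|(?P<DOT>\.)")


def lex(text):
    """Turn text into (kind, value) tokens; raise on anything unexpected."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


class PhoneParser:
    """The parser: reasons about tokens, not characters."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def take(self, kind, length=None):
        tok_kind, value = self.peek()
        if tok_kind != kind or (length is not None and len(value) != length):
            raise ValueError(f"expected {kind} at token {self.pos}")
        self.pos += 1
        return value

    def skip_separator(self):
        if self.peek()[0] in ("DASH", "DOT"):
            self.pos += 1

    def numbers_left(self):
        return sum(1 for kind, _ in self.tokens[self.pos:] if kind == "NUMBER")

    def parse(self):
        """Return (area, exchange, line); area is None for 7-digit numbers."""
        if self.peek() == ("NUMBER", "1"):  # optional country code
            self.pos += 1
            self.skip_separator()
        area = None
        if self.peek()[0] == "LP":  # area code in parentheses
            self.take("LP")
            area = self.take("NUMBER", 3)
            self.take("RP")
            self.skip_separator()
        elif self.numbers_left() == 3:  # bare area code
            area = self.take("NUMBER", 3)
            self.skip_separator()
        exchange = self.take("NUMBER", 3)
        self.skip_separator()
        line = self.take("NUMBER", 4)
        if self.pos != len(self.tokens):
            raise ValueError("trailing input after phone number")
        return area, exchange, line


def parse_us_phone(text):
    return PhoneParser(lex(text)).parse()
```

With this split, supporting a new written form means touching one branch in `parse`, while the regular expressions in the lexer stay trivial.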

Doing the above also makes you much more immune to changes in the RE lib's behavior. I have gotten bitten by greediness changes in Perl on bug-fix releases because I did not follow my own advice above. Another benefit is that in 6 months you can figure out what you did, and so can the next guy.


You raise a good point. I tend to like them for parsing, but they are not a silver bullet for sure. They do have their weaknesses. That being said, when I use a regular expression I always document it for myself and other developers, even if the information is redundant, because regular expressions are not readable and I tend not to like magic code.
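
One way to document a regular expression in place, sketched here in Python, is the `re.VERBOSE` flag, which ignores whitespace in the pattern and allows `#` comments. The pattern below is an illustrative assumption, not a full phone number validator: it handles only the forms from upthread that include an area code.

```python
import re

# The compact form: correct, but hard to review six months later.
COMPACT = re.compile(r"^(?:1[-.]?)?(?:\((\d{3})\)|(\d{3}))[-.]?(\d{3})[-.](\d{4})$")

# The same pattern with re.VERBOSE: whitespace is ignored, comments are allowed.
DOCUMENTED = re.compile(
    r"""
    ^
    (?: 1 [-.]? )?       # optional country code 1, optional separator
    (?: \( (\d{3}) \)    # group 1: area code written in parentheses
      | (\d{3})          # group 2: area code written bare
    )
    [-.]?                # optional separator after the area code
    (\d{3})              # group 3: exchange
    [-.]                 # separator
    (\d{4})              # group 4: line number
    $
    """,
    re.VERBOSE,
)
```

Both objects match exactly the same strings; for "1-516-123-4567" the area code lands in group 2, and for "(516)-123-4567" in group 1.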


Don't get me wrong, I like REs; it's just that I like them like salt in my food. It is very easy to ruin the food by putting in too much salt, but put in the right amount and it's delicious.


There's room for an introduction to regular expressions for non-programmers. Perhaps a "Learn regular expressions the hard way"?

(I just checked, there is an early LregexTHW. I'm going to be doing some reading.)

(http://regex.learncodethehardway.org/book/)


http://regex.learncodethehardway.org/ that's a good one on regular expressions. He also has a good one about SQL; it's incomplete, but what is done is very sweet. I'm also trying to go down the same path as you. I know a bunch of Java and Python, but when I look at the source code of just about anything I don't understand what anything is doing, and it's very frustrating. So I'm trying to build a very simple framework in Python from scratch, but I've no idea how to go about it. I've got a million texts but no idea how to connect them.


Learn by doing.

Get involved in some open source projects and start contributing.

Doing that has taught me more than any book has.


Well, the OP specifically asked for texts on the fundamental subjects of CS as they relate to web development.

Since you bring up a good point though, what would be useful open source projects to contribute to if you're specifically interested in exploring deeper issues?

For myself, I remember learning about networking (and especially coding the network layer in C) and also a bit about crypto by tinkering with World of Warcraft unofficial open source servers two years ago. Wish I had the chance to contribute something back before real life took over again.


I don't think you need to necessarily choose one or the other. I've gotten a lot out of texts that give you practical assignments that get you thinking about what you just learned.

I feel that learning in general is an iterative alternation between theory and practice. At the end of the day if you get both sides of that coin, you're doing fine.


This, one thousand fold.

A lot of projects have a list of "prerequisite" knowledge on the "Get Involved" page or whatever. Don't let this intimidate you: playing with the source is often a much better way to learn about the subject than trying to research the topic, feeling you understand it, and then looking at the source.


I have been keeping a list of books I used to augment my CS Masters Degree courses on various topics, here are the relevant ones I have found useful for the topics you have listed:

--Computer Organization--:

Computer Systems: A Programmer's Perspective http://www.amazon.com/Computer-Systems-Programmers-Randal-Br...

I liked this much better than Computer Organization and Design by Patterson and Hennessy which everyone has encountered at some point. The developer-centric view was very cool.

--Computer Security--:

Kernel Exploitation: Attacking the Core http://www.amazon.com/Guide-Kernel-Exploitation-Attacking-Co...

Most 'hacking' books are goofy. This one is very good and doubles nicely as a hacker's operating systems text.

Web Application Hacker's Handbook http://www.amazon.com/Web-Application-Hackers-Handbook-Disco...

Very nice overview for web concerns.

--Operating Systems--:

Operating System Design and Implementation http://www.amazon.com/Operating-Systems-Design-Implementatio...

I don't agree with Tanenbaum's views on micro vs. monolithic kernels but this book is a great mix of theory and implementation.

Linux Kernel Development http://www.amazon.com/Linux-Kernel-Development-Robert-Love/d...

I used this to get a feel for the monolithic implementations of topics covered by Tanenbaum.

--Networking--:

TCP/IP Illustrated Series. More than you would ever want to know.


If you're serious about fundamentals, be sure to at least skim Fielding's PhD thesis[1], which lays the groundwork for HTTP, especially the chapter on REST. His ideas are powerful, clear and very influential, but have been misunderstood time and again.

[1] http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm


You should read Richard Stevens' Unix Network Programming for the networking part of that list.


I'm not sure what other people will recommend, but I don't personally feel that you need such a deep knowledge of fundamentals to be a good technical co-founder. More important is your ability to pick things up when you need to.

You don't need to be a decent DBA, a pen tester and a network admin to get your startup off the ground. You need to be able to wear all the hats, but by the time you actually need a master of each skill you'll have the revenue to employ them.

That said, these two paths may actually converge. If you're interested in learning and read a wide range of authors and topics, you'll probably be good at JIT learning stuff when you need to change hats.


Off the top of my head, an understanding of internet architecture is probably a fundamental thing to have. IP, TCP, HTTP, DNS, etc.

http://www.amazon.com/TCP-Illustrated-Vol-Addison-Wesley-Pro...


Link to the 2nd Edition version of this book:

http://www.amazon.com/TCP-Illustrated-Protocols-Addison-Wesl...

(Didn't even realize it had been updated until I checked!)


BTW, if you decide you want the Kindle version, there's a $10 promo code. I'm not sure if you need to be signed up to Amazon Student for this, so YMMV:

http://www.amazon.com/gp/feature.html?ie=UTF8&docId=1000...


Worked for me, thanks!


+1 to this. No matter what you end up doing in your career, having a good idea about what's going on the wire is never wasted.



