
Modern Processor Design: Fundamentals of Superscalar Processors is a great book and a good place to start, since it covers a lot of the details you're asking about. It uses the P6 uarch (so Pentium Pro through Pentium III) as a case study and contrasts it with the PowerPC 620. P6 is also the basis for modern Intel cores, which grew out of it after Intel dropped P4/NetBurst when Dennard scaling hit a wall. Yes, there have been updates since, but the book is still basically on point.

Real quick overview for the archs I know (which all happen to be x86): cache lines coming in from the I$ fill into a shift register. Each byte of the shift register has logic that, in parallel, reports either "if an instruction started here, it would be N bytes long" or "I can't tell yet" (IDK). That information is used to select the instruction boundaries, which are then passed to the full instruction decoders (generally in a later pipeline stage). Once the recognized bytes are consumed, new bytes are shifted in and the process starts over.

This separation between length detection and full decode lets you have 16 (or however many) length decoders but only three or four full decoders. Additionally, the rare and very complex instructions are generally special-cased and only decoded by the byte 0 length/instruction decoders. Even then, the byte 0 decoder sometimes takes a few cycles to fully decode an instruction (e.g. one with lots of prefix bytes).
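To make the length-decode/full-decode split concrete, here's a toy Python model. It's a sketch under made-up assumptions, not how any real x86 front end is built: the ISA is hypothetical (any number of `0xF0` prefix bytes, then an opcode whose low two bits give 0 to 3 operand bytes), and the 16-byte window, the 4 full decoders, and the rule that a prefixed ("complex") instruction may only go to decoder 0 are all illustrative parameters. The function names are hypothetical too. Every byte position gets a length decoder (in hardware they all run at once), boundary selection walks those results and hands up to four instructions to the full decoders, and the consumed bytes are shifted out.

```python
from typing import List, Optional, Tuple

WINDOW = 16        # bytes visible to the length decoders per cycle (assumed)
FULL_DECODERS = 4  # full decoders fed per cycle (assumed)
PREFIX = 0xF0      # prefix byte of the hypothetical toy ISA (assumed)


def length_at(window: bytes, i: int) -> Optional[int]:
    """Toy length decoder for byte position i (hypothetical encoding, not x86).

    Encoding: any number of PREFIX bytes, then an opcode whose low two bits
    give the number of operand bytes (0-3). Returns None ("IDK") when the
    instruction would run past the end of the bytes currently in the window.
    """
    j = i
    while j < len(window) and window[j] == PREFIX:
        j += 1
    if j >= len(window):
        return None
    length = (j - i) + 1 + (window[j] & 0b11)
    return length if i + length <= len(window) else None


def decode_cycle(window: bytes) -> Tuple[List[bytes], int]:
    """One cycle: pick boundaries and hand up to FULL_DECODERS instructions on."""
    # Hardware evaluates every byte position in parallel; here we just compute them all.
    lengths = [length_at(window, i) for i in range(len(window))]
    insts: List[bytes] = []
    pos = 0
    while len(insts) < FULL_DECODERS and pos < len(window):
        n = lengths[pos]
        if n is None:
            break  # need more bytes shifted in before this one can be decoded
        if window[pos] == PREFIX and insts:
            break  # toy rule: "complex" (prefixed) instructions only go to decoder 0
        insts.append(window[pos:pos + n])
        pos += n
    return insts, pos


def run(stream: bytes) -> List[List[bytes]]:
    """Shift the stream through the window; return the instructions decoded per cycle."""
    buf, src, cycles = b"", 0, []
    while True:
        take = WINDOW - len(buf)           # refill: shift new bytes in
        buf += stream[src:src + take]
        src += take
        if not buf:
            break
        insts, used = decode_cycle(buf)
        if used == 0:
            break  # instruction longer than the window, or truncated stream
        cycles.append(insts)
        buf = buf[used:]                   # consumed bytes are shifted out
    return cycles
```

For example, `run(bytes([0x00, 0x01, 0xAA, 0xF0, 0x03, 1, 2, 3, 0x02, 9, 9]))` takes two cycles: the prefixed instruction at offset 3 would land on decoder 2, so it waits and leads the next group instead.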

I imagine superscalar processors for other CISC archs have very similar decoders, maybe just aligned on halfwords rather than bytes if that's all the arch needs (like for 68k and S/390).
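S/390 is a nice example of the halfword case: instructions are always halfword aligned, and the length is encoded in the top two bits of the first opcode byte (00 → 2 bytes, 01 or 10 → 4 bytes, 11 → 6 bytes; e.g. 0x18 LR, 0x58 L, 0xD2 MVC). The Python sketch below (function names are hypothetical) shows that a 16-byte window then only needs a length decoder at each of its 8 even offsets, and each one only has to look at a single byte. 68k is messier because its length also depends on the addressing-mode fields, so it isn't modeled here.

```python
from typing import List


def s390_length(opcode_byte: int) -> int:
    # Top two bits of the first opcode byte: 00 -> 2, 01 -> 4, 10 -> 4, 11 -> 6 bytes.
    return (2, 4, 4, 6)[opcode_byte >> 6]


def length_decode_halfwords(window: bytes) -> List[int]:
    # One length decoder per halfword slot (8 for a 16-byte window), not one per byte.
    return [s390_length(window[i]) for i in range(0, len(window), 2)]


def boundaries(window: bytes) -> List[int]:
    # Walk the per-halfword lengths to find the instruction start offsets that fit.
    lens = length_decode_halfwords(window)
    offsets, pos = [], 0
    while pos < len(window) and pos + lens[pos // 2] <= len(window):
        offsets.append(pos)
        pos += lens[pos // 2]
    return offsets
```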



