
Some complicated pipelining optimization that claims to reduce the critical path of each pipeline stage, increasing the clock speed and/or decreasing the required voltage for a given clock speed, written in impenetrable patentese.

The easiest pipeline optimisation, which is basically free hardware-wise and has been documented in the academic literature for years, is to move the computation of W + K + H to the previous pipeline stage, reducing the critical path. This is a trivial win because all the data is already available (the H input to a round is just the previous round's G, and W and K don't depend on the working state at all) and H can be discarded afterwards, which also has the nice bonus of mapping more efficiently to certain FPGA hardware. It sounds like Intel have taken this and expanded on it to move even more things into earlier and later stages of the pipeline.

Or, as Intel puts it, "The precomputed (H.sub.i+K.sub.i+W.sub.i) may be stored in the 32-bit register 402 dedicated for H.sub.i. This optimization reduces the critical path for the computation of E.sub.i+1 by one CSA or approximately three logic gates." I implemented this back in 2011 in some open source FPGA mining code, and it was an old trick even back then. They don't seem to be citing any prior art, which is a bit dubious.
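For anyone who wants to see the W + K + H trick concretely, here is a minimal software sketch in Python. It is not the FPGA code mentioned above and not Intel's design; it only mimics the data flow, with the sum for round t+1 computed during round t from that round's G. The names (`hkw`, `compress`, `sha256_single_block`) are made up for illustration, and the constants are derived from prime roots rather than typed out.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF

def _primes(n):
    """First n primes by trial division (fine for n = 64)."""
    found, candidate = [], 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found

def _icbrt(n):
    """Floor of the cube root of a non-negative integer, by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

# Round constants and initial state: fractional bits of prime cube/square roots.
K = [_icbrt(p << 96) & MASK for p in _primes(64)]
H0 = [math.isqrt(p << 64) & MASK for p in _primes(8)]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def schedule(block):
    """Expand a 64-byte block into the 64-word message schedule W."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    return w

def compress(state, block):
    w = schedule(block)
    a, b, c, d, e, f, g, h = state
    # "Stage -1": the sum for round 0 is ready before the first round starts.
    hkw = (h + K[0] + w[0]) & MASK
    for t in range(64):
        s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (hkw + s1 + ch) & MASK  # H, K and W are no longer added here
        s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (s0 + maj) & MASK
        if t < 63:
            # Next round's H is this round's G, so its sum can be formed now.
            hkw = (g + K[t + 1] + w[t + 1]) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def sha256_single_block(msg):
    """SHA-256 of a message short enough to fit in one padded block."""
    assert len(msg) <= 55
    block = msg + b"\x80" + b"\x00" * (55 - len(msg)) + struct.pack(">Q", len(msg) * 8)
    return b"".join(struct.pack(">I", x) for x in compress(H0, block)).hex()

# Sanity check against the standard library.
assert sha256_single_block(b"abc") == hashlib.sha256(b"abc").hexdigest()
```

In software this saves nothing, since the additions just happen one iteration earlier. The point is only that `hkw` depends on nothing computed in the current round, which is why hardware can take it off the critical path.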




I take it you understand the specific optimization.

Is there a reason they keep talking about '32-bit' this and that? Or should that be '32-byte', since that would be 256 bits?


SHA-256 operates almost entirely on 32-bit values, because it's designed to be efficient to compute on general-purpose CPUs without special hardware support. The SHA-256 state consists of 8 32-bit integers, of which 2 are updated each round. (SHA-512 uses 64-bit integers instead for some reason.)
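To make the "8 words, 2 updated per round" point concrete, here is a rough Python sketch of a single round. The function name `sha256_round` and the `k_plus_w` parameter are just illustrative; the inputs are the standard initial state plus the first round constant and the first message word of "abc".

```python
MASK = 0xFFFFFFFF  # every value in the state is a 32-bit word

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def sha256_round(state, k_plus_w):
    """One SHA-256 round: only the new A and new E are actually computed."""
    a, b, c, d, e, f, g, h = state
    t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
          + ((e & f) ^ (~e & g)) + k_plus_w) & MASK
    t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
          + ((a & b) ^ (a & c) ^ (b & c))) & MASK
    # The other six words are the old ones shifted down by one position.
    return [(t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g]

# Standard SHA-256 initial state.
IV = [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
      0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19]

# K[0] + W[0] for the message "abc" (W[0] is "abc" followed by the 0x80 pad byte).
after = sha256_round(IV, (0x428a2f98 + 0x61626380) & MASK)

# Six of the eight words are straight copies of the previous state.
copied = after[1:4] == IV[0:3] and after[5:8] == IV[4:7]
```

So the 256-bit state is always handled as eight 32-bit registers, which is why the patent text keeps saying "32-bit".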



