..:: Sandy Bridge Microarchitecture Continued… ::..
Yet another large change brings a physical register file back to the die. Intel had done away with this feature after the NetBurst microarchitecture, but has brought it back to accommodate the new 256-bit AVX operands. With a physical register file, the operands themselves stay in the file, and only pointers to them are passed through the OoO engine, rather than the data itself. This avoids unnecessary data transfers, which saves power, and it also saves die space, because much of the OoO circuitry no longer needs to be wide enough to carry the up to 256-bit operands of the new AVX instructions. From the slide above, you'll also see another benefit of the physical register file: increased buffer sizes and increased dataflow.
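To make the pointer idea a little more concrete, here is a toy Python model of it. This is only a sketch: the class and function names are made up for illustration, and the 144-entry register file size is an assumption, not a figure from Intel's slides. The point is simply that an in-flight micro-op carries small indices into the register file, and the wide operand is read only when the operation actually executes.

```python
# Toy model (illustrative only): in-flight micro-ops carry register-file
# indices, not the 256-bit operand data itself.
OPERAND_BITS = 256   # width of an AVX operand, as described in the text
PRF_ENTRIES = 144    # ASSUMED register file size, for illustration only


def tag_bits(entries):
    """Bits needed to address one of `entries` physical registers."""
    return max(1, (entries - 1).bit_length())


class PhysicalRegisterFile:
    """Hypothetical register file: values live here and nowhere else."""

    def __init__(self, entries):
        self.values = [0] * entries
        self.free = list(range(entries))

    def allocate(self, value):
        # Take a free register, store the value, and return its index.
        idx = self.free.pop()
        self.values[idx] = value
        return idx

    def read(self, idx):
        return self.values[idx]


def execute_add(prf, src_a, src_b):
    """Read operands only at execute time; write the result to a new register."""
    mask = (1 << OPERAND_BITS) - 1
    return prf.allocate((prf.read(src_a) + prf.read(src_b)) & mask)


prf = PhysicalRegisterFile(PRF_ENTRIES)
a = prf.allocate(2**200 + 5)
b = prf.allocate(7)
uop = {"op": "add", "src": (a, b)}    # the micro-op holds two small indices
dst = execute_add(prf, *uop["src"])
pointer_width = tag_bits(PRF_ENTRIES)  # bits per source pointer in this model
```

Under these assumptions, each source operand costs the out-of-order structures a handful of pointer bits instead of the full 256 bits of data, which is the saving the paragraph above describes.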
The execution units of the processor have also undergone changes to accommodate the 256-bit AVX data sets. Sandy Bridge pairs two 128-bit execution units by joining the existing SIMD FP and SIMD INT data paths within the three ports. The resulting 256-bit wide execution paths can process 256-bit AVX data without hampering 128-bit SIMD execution. Intel also allows the upper 128 bits of the execution path to be power gated, so that 128-bit data sets don't incur any additional power cost.
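As a rough illustration of the pairing, the Python sketch below treats a 256-bit operation as two 128-bit halves, each handled by a stand-in for one 128-bit unit. The function names are hypothetical and the model is deliberately simplified; it says nothing about how the hardware actually schedules the two halves. It only shows that a 128-bit operation touches one lane, while a 256-bit operation uses both.

```python
LANE_FLOATS = 4  # four 32-bit floats fit in one 128-bit lane


def unit_add(a, b):
    """Stand-in for one 128-bit SIMD unit: element-wise add of four floats."""
    assert len(a) == len(b) == LANE_FLOATS
    return [x + y for x, y in zip(a, b)]


def simd_add(a, b):
    """Add two 4-float (128-bit) or 8-float (256-bit) vectors.

    Returns the result and the number of 128-bit lanes that were active.
    """
    assert len(a) == len(b) and len(a) in (4, 8)
    result, active_lanes = [], 0
    for i in range(0, len(a), LANE_FLOATS):
        result += unit_add(a[i:i + LANE_FLOATS], b[i:i + LANE_FLOATS])
        active_lanes += 1
    return result, active_lanes


sse_result, sse_lanes = simd_add([1.0] * 4, [2.0] * 4)
avx_result, avx_lanes = simd_add([float(i) for i in range(8)], [1.0] * 8)
```

In this model the 128-bit case leaves the upper lane idle, which corresponds to the half of the data path that Intel says can be power gated.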
Because of the increased performance and the new AVX instructions, the processor's load / store units needed to be revamped. For the Sandy Bridge microarchitecture, Intel made the two load / store pipes symmetrical so that either can serve both purposes. This doubles the load performance of the processor, matching what is effectively a doubling of floating point calculation capability thanks to 256-bit data sets. The store data functionality remains a single unit. As you can see from Intel's own slides above, Sandy Bridge realizes a 50% improvement in L1 cache throughput: with two load units, the processor can handle 48 bytes per cycle versus 32 bytes per cycle before.
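The 48 versus 32 byte figures are easy to check with a quick calculation. The short Python sketch below assumes that each load or store moves 16 bytes (128 bits) per cycle; that per-port width is an assumption chosen to match the totals quoted above, and the function name is made up for illustration.

```python
PORT_BYTES = 16  # ASSUMED: each load or store moves 128 bits per cycle


def l1_bytes_per_cycle(loads, stores, port_bytes=PORT_BYTES):
    """Peak L1 data cache throughput for a given number of loads and stores."""
    return (loads + stores) * port_bytes


before = l1_bytes_per_cycle(loads=1, stores=1)  # one load and one store
after = l1_bytes_per_cycle(loads=2, stores=1)   # two loads and one store
improvement = (after - before) / before          # fractional gain
```

With those assumptions, the totals come out to 32 and 48 bytes per cycle, a gain of 0.5, or the 50% that Intel quotes. Note that load bandwidth alone doubles while store bandwidth stays the same, which is why the combined figure rises by half rather than doubling.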