..:: Sandy Bridge Microarchitecture ::..
As you can see from the die overview image above, the first round of Sandy Bridge processors will max out at four physical cores, leaving the hexacore Gulftown Core i7-980X as the king for the next year. What's surprising, though, is that the entire Sandy Bridge die, integrated graphics included, measures a mere 225 mm². That's smaller than Gulftown and the other 45nm quad-core processors. Isn't technology grand?
Sandy Bridge will be offered in both dual-core and quad-core configurations. Hyper-Threading will be offered on a variety of products, but will be standard on the Core i7s. One interesting point is that the standard Core i5 products will not have Hyper-Threading, even though the Core i3 products will. This is likely meant to create a performance differentiator between the i5 and i7 processors. The Level 3 cache will roll in at 8MB for i7s, 6MB for i5s and 3MB for i3s. All processors will support 16 PCI Express 2.0 lanes and a dual-channel DDR3-1333 memory controller. Finally, there will be two versions of the integrated Intel HD Graphics. Only the Core i7s will feature HD Graphics 3000, while the Core i5s and i3s will have HD Graphics 2000. The main differences between the two are the maximum operating frequency and the shader count: 12 execution units on HD 3000 versus 6 on HD 2000. HD 3000 tops out at 1350MHz, with HD 2000 topping out at 1100MHz. The TDP for the standard Core i7 and Core i5 processors is 95W, while the Core i3s are rated for 65W. Intel will also be offering low-power processors that range from 65W down to 35W.
Now that we have a general overview of Intel’s Sandy Bridge offerings, let’s take a quick look at some of the interesting microarchitecture improvements made for Sandy Bridge. It’ll be quick and painless to keep you awake, I promise.
..:: Sandy Bridge Microarchitecture Continued… ::..
First off, we have the new Decoded Uop Cache. As incoming instructions are processed, they must first be decoded into micro-ops that the processor logic can utilize. Sandy Bridge caches these decoded instructions, and as new instructions roll in, it checks the cache to see if they have already been translated. If the cache contains the instruction's micro-ops, they are served directly to the remainder of the pipeline and the front-end decoding unit can be shut off. The result is significant power savings within the processor. According to Intel, the micro-op cache can hold roughly 1,500 micro-ops, is included as part of the L1 instruction cache, and achieves a hit rate of around 80%.
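In software terms, the idea is plain old memoization: pay the decode cost once, then serve repeat requests from a cache so the expensive unit can idle. Here's a minimal sketch of that behavior in Python; the class, the 1,500-entry capacity (from Intel's figure above), and the toy decoder are illustrative stand-ins, not Intel's actual hardware structures.

```python
# Software analogy of the decoded micro-op cache: check the cache before
# invoking the (expensive) decoder; on a hit, the decoder stays idle.

class UopCache:
    def __init__(self, capacity=1500):       # Intel cites roughly 1,500 micro-ops
        self.capacity = capacity
        self.cache = {}                      # instruction -> decoded micro-ops
        self.hits = 0
        self.lookups = 0

    def fetch(self, instruction, decode):
        """Return micro-ops, running the decoder only on a cache miss."""
        self.lookups += 1
        if instruction in self.cache:        # hit: front-end decode is skipped
            self.hits += 1
            return self.cache[instruction]
        uops = decode(instruction)           # miss: pay the full decode cost
        if len(self.cache) < self.capacity:
            self.cache[instruction] = uops
        return uops

def toy_decode(instr):
    # Stand-in for x86 decoding: split a mnemonic into fake micro-ops.
    return tuple(f"{instr}.u{i}" for i in range(2))

cache = UopCache()
stream = ["add", "mov", "add", "add", "mov"]  # loops re-run the same code
for instr in stream:
    cache.fetch(instr, toy_decode)

print(cache.hits, cache.lookups)              # prints "3 5": 3 hits in 5 lookups
```

Loopy code is exactly why the real cache pays off: the same handful of instructions execute over and over, so most fetches hit the cache and the decoders can be powered down.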
Next up, we have a rebuilt Branch Prediction Unit. The BPU is an area of the microarchitecture that is constantly under refinement, and the reasoning is simple: if the BPU fails to predict the correct branch, the entire pipeline has to be stalled and flushed. That is a big hit to both performance and power, which makes the BPU one of the most critical pieces of the microarchitecture. Intel's new BPU stores branch addresses and prediction history in a new, more compact way, allowing a longer prediction history to be kept in the same space. A longer history and more tracked branches quite simply mean that the BPU in Sandy Bridge has a significantly better chance of taking the correct branch.
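To see why a longer stored history helps, consider a classic textbook predictor: two-bit saturating counters indexed by the branch address plus a global history of recent outcomes. This is not Intel's actual (undisclosed) design, just a minimal sketch of the principle. A branch that strictly alternates taken/not-taken fools a history-less counter, but once the pattern fits inside the history register the predictor nails it every time.

```python
# Two-bit saturating-counter predictor indexed by (branch address, recent
# history) -- a textbook scheme used here only to illustrate why a longer
# stored history improves prediction accuracy.

class HistoryPredictor:
    def __init__(self, history_bits=4):
        self.history_bits = history_bits
        self.history = 0                      # last N taken/not-taken outcomes
        self.counters = {}                    # (pc, history) -> 2-bit counter

    def predict(self, pc):
        ctr = self.counters.get((pc, self.history), 1)  # start weakly not-taken
        return ctr >= 2                       # states 2 and 3 predict "taken"

    def update(self, pc, taken):
        key = (pc, self.history)
        ctr = self.counters.get(key, 1)
        ctr = min(3, ctr + 1) if taken else max(0, ctr - 1)
        self.counters[key] = ctr
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask

# A strictly alternating branch: taken, not-taken, taken, not-taken, ...
bp = HistoryPredictor(history_bits=4)
correct = 0
outcomes = [i % 2 == 0 for i in range(200)]
for i, taken in enumerate(outcomes):
    if bp.predict(pc=0x400) == taken and i >= 20:  # skip the warm-up phase
        correct += 1
    bp.update(pc=0x400, taken=taken)

print(correct)  # prints "180": every post-warm-up prediction is correct
```

After warm-up, the history register alternates between two values, each mapped to its own counter, so the "unpredictable" alternating branch becomes perfectly predictable. With zero history bits, every outcome would hammer the same counter and accuracy would collapse, which is exactly the effect a longer history buys at the hardware level.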