Friday, December 11, 2009

Intel 48-core cloud-datacenter-on-a-chip



The 45nm SCC has much in common with Polaris, which I described in detail in the article linked above. Like Polaris, the cores are arranged into "tiles" (24 of them), but each SCC tile contains two cores, two L2 caches (one per core), and one router that connects the tile to the rest of the SCC's packet-switched mesh network. Also like Polaris, the chip offers a degree of granularity for software-controlled dynamic power management by altering the voltage and frequency of the cores. With SCC, each two-core tile can run at its own frequency, and voltage can be scaled in groups of four tiles. Taken together, these options let Intel scale the power consumption of the chip from 25W up to 125W.
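
To make the domain hierarchy concrete, here's a minimal C sketch of the core-to-tile-to-voltage-island mapping described above: one frequency domain per two-core tile, one voltage domain per group of four tiles. The 6x4 mesh layout and the exact tile-to-island grouping are assumptions for illustration, not Intel's documented scheme.

```c
/* Sketch of the SCC's power-domain hierarchy: per-tile frequency,
 * per-four-tile voltage. Mesh layout and island grouping are assumed. */
#include <stdio.h>

#define CORES          48
#define CORES_PER_TILE  2
#define MESH_WIDTH      6   /* tiles per row (assumed) */

int tile_of(int core)     { return core / CORES_PER_TILE; }
int freq_domain(int core) { return tile_of(core); } /* one per tile */

/* Voltage scales in 2x2 blocks of tiles: four tiles per island. */
int volt_domain(int core)
{
    int tile = tile_of(core);
    int x = tile % MESH_WIDTH, y = tile / MESH_WIDTH;
    return (y / 2) * (MESH_WIDTH / 2) + (x / 2);
}

int main(void)
{
    for (int core = 0; core < CORES; core++)
        printf("core %2d -> tile %2d, freq domain %2d, voltage island %d\n",
               core, tile_of(core), freq_domain(core), volt_domain(core));
    return 0;
}
```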

The individual cores that make up SCC are considerably more substantial than those that made up Polaris. The Polaris cores contained some floating-point hardware suitable only for DSP-type applications. The SCC cores, in contrast, are full-blown x86 implementations, albeit incredibly simple ones.

If you divide the SCC's 1.3 billion transistors by 48 cores, then you get about 27 million transistors per core. But each core is actually a bit smaller than that, because each of the 24 tiles also contains networking hardware that looks like it takes up roughly 6.5 percent of the tile; subtract that, and it's more like 25 million transistors per core. This is a little over half the transistor count of Atom (47 million), and a little more than the transistor count of the AMD K7 (22 million). So we're not talking stellar single-threaded performance—but definitely enough to run an OS instance and do plenty of useful work.
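
For the curious, here is that back-of-the-envelope math spelled out as a tiny C program; the 6.5 percent router share is the estimate from the paragraph above.

```c
/* The per-core transistor arithmetic from the text, made explicit. */
#include <stdio.h>

int main(void)
{
    double total   = 1.3e9;           /* SCC transistor budget           */
    double percore = total / 48.0;    /* ~27 million per core, naively   */
    double router  = percore * 0.065; /* router's share of a core's slice */
    double core    = percore - router;/* ~25 million for the core itself  */

    printf("naive per-core budget: %.1fM\n", percore / 1e6);
    printf("minus router share:    %.1fM\n", core / 1e6);
    printf("for comparison: Atom 47M, AMD K7 22M\n");
    return 0;
}
```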

In fact, if Intel wanted to commercialize the SCC tomorrow, many datacenters would probably eat it up if the price were right. We've covered the physicalization trend extensively here at Ars, and the SCC would be the perfect platform for it. Indeed, the best way to use the SCC might be to run one virtualized OS instance per core, so that you could do dynamic power management by shuffling the VMs from core to core to keep hot spots from forming on the chip.
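
A toy C sketch of that VM-shuffling idea: each step, migrate the VM on the hottest core to the coolest one, so no single spot on the die stays hot. The temperatures here are simulated stand-ins; a real scheduler would read on-die thermal sensors, which this sketch doesn't model.

```c
/* Naive hot-spot avoidance: swap the VMs on the hottest and coolest cores. */
#include <stdio.h>

#define CORES 48

int main(void)
{
    double temp[CORES];
    int    vm_of_core[CORES];           /* which VM runs on which core */

    for (int i = 0; i < CORES; i++) {   /* made-up starting state */
        temp[i] = 40.0 + (i * 7) % 30;  /* degrees C, simulated   */
        vm_of_core[i] = i;
    }

    int hottest = 0, coolest = 0;
    for (int i = 1; i < CORES; i++) {
        if (temp[i] > temp[hottest]) hottest = i;
        if (temp[i] < temp[coolest]) coolest = i;
    }

    /* Swap the two VMs so the hot core's silicon gets a chance to cool. */
    int tmp = vm_of_core[hottest];
    vm_of_core[hottest] = vm_of_core[coolest];
    vm_of_core[coolest] = tmp;

    printf("migrated VM %d from core %d (%.0fC) to core %d (%.0fC)\n",
           vm_of_core[coolest], hottest, temp[hottest], coolest, temp[coolest]);
    return 0;
}
```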

The SCC runs a customized version of Linux, and it's likely that the part of the OS that Intel has most heavily modified is the network stack. A section of my previous Terascale coverage is worth reproducing in full, because it's relevant to this point:

[Intel technology strategist Thom] Sawicki said that part of the challenge that Intel sees with massively multicore computing lies in how to handle the IP network stack. If each core were to have its own IP address, then you wouldn't necessarily want all those cores to go through the entire network stack every time they want to talk to each other.

"Would it make sense to take a NIC/MAC-type functionality, where for an 80-core chip ten of those are an integrated NIC/MAC function, or am I better off trying to infuse some of the network functions and logic into each core?" asked Sawicki. "Where is the proper and best breakdown of what gets done today in a peripheral device vs. what part of the stack gets done in the CPU? We get to play with all these [options]," he said, "and that's the work under way in the lab. We'll start seeing over time what the answers are."


Because each of the SCC's cores can run its own OS instance, Intel and its partners can experiment with different networking arrangements on the software side and see what works best.
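
To illustrate one arrangement Sawicki alludes to, here's a C sketch in which every core gets its own IP address for off-chip traffic, while core-to-core messages short-circuit through a shared on-die buffer instead of traversing the full network stack. The address scheme and the copy-into-buffer mechanism are hypothetical; the article doesn't specify how Intel's modified Linux actually splits the stack.

```c
/* Hypothetical split: per-core IP addressing for the outside world,
 * a raw on-die buffer for core-to-core messages. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CORES    48
#define MSG_SIZE 64

static uint8_t on_die_buf[CORES][MSG_SIZE]; /* stand-in for on-chip message memory */

/* Hypothetical per-core address: 10.0.0.<core> */
void core_ip(int core, char *out, size_t n)
{
    snprintf(out, n, "10.0.0.%d", core);
}

/* Core-to-core send: skip the IP stack entirely, just copy on-die. */
void send_local(int dst_core, const void *msg, size_t len)
{
    memcpy(on_die_buf[dst_core], msg, len < MSG_SIZE ? len : MSG_SIZE);
}

int main(void)
{
    char addr[16];
    core_ip(7, addr, sizeof addr);
    printf("core 7 is reachable off-chip as %s\n", addr);

    send_local(7, "hello from core 0", 18);
    printf("core 7 received: %s\n", (char *)on_die_buf[7]);
    return 0;
}
```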

Speaking of partners, Intel is building 100 of these SCC prototypes to share with internal groups and external partners in order to further parallel programming research. Right now, some academics are making do with FPGA-based emulation for this sort of research, so having real many-core silicon will be a big step up.
