Modeling power distribution in SoCs is becoming increasingly important at each new node and in 3D-ICs, where power tolerances are much tighter and any mistake can cause functional failures.

At mature nodes, where there’s more metal, power problems continue to be rare. But at advanced nodes, where chips are running at higher frequencies and still consuming the same or greater power, much more current needs to feed into the chip in a much smaller area. Understanding where the power goes, how it gets there, and what can disrupt the flow of electrons is becoming a major challenge.

“Power density is increasing, and because of that voltage drop is increasing, as well,” said Rajat Chaudhry, product management director at Cadence. “Transistors have to operate at a lower voltage than 10 years ago, and that means there’s less room to play with how much drop we can have on the power network. We need to be much more accurate in our analysis to make sure the transistors are getting the right voltage so the chip operates at the right frequency.”

That has an impact on timing analysis, which assumes that the transistors will have an appropriate voltage available at their terminals. “For example, you might do the timing while assuming the voltage doesn’t drop below 0.8 volts, or that there’s a variation of maybe 20 or 30 millivolts around that,” Chaudhry said. “If the voltage on the transistor terminals drops a lot, or goes much higher than that, it can cause timing failures and functional failure of the chip.”

Others agree. “As transistors are being switched faster, at very low voltages, designs can’t afford to waste any voltage to drop,” said Marc Swinnen, director of product marketing at Ansys. “In the past, if 200 millivolts was lost to drop on 5 volts, nobody cared. It was still plenty for the transistors to switch properly. But when you have a fraction of a volt, you can’t afford to lose any of that to voltage drop along the way. While transistors are switching faster, metal layers have also gotten thinner and narrower, so the resistance of the metal layers has been going up significantly. Suddenly, more power has to be squeezed through longer wires that are thinner, so the voltage drop problem has become increasingly acute, and the power supply network must be analyzed to determine the voltage drop.”
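To put numbers on Swinnen’s point, the Python sketch below compares a fixed 200 mV drop against a 5 V supply and against a sub-1 V supply, and uses the textbook relation R = rho * L / (W * T) to show how shrinking a wire’s width and thickness raises its resistance. The 0.75 V supply, the bulk copper resistivity and the wire dimensions are illustrative assumptions, not figures from the article; real narrow wires are worse than bulk copper because of surface and grain-boundary scattering.

```python
RHO_CU_BULK = 1.7e-8  # ohm*m, approximate bulk copper resistivity (assumed)


def drop_fraction(drop_v: float, supply_v: float) -> float:
    """Fraction of the supply voltage lost to IR drop."""
    return drop_v / supply_v


def wire_resistance(length_m: float, width_m: float, thickness_m: float,
                    rho: float = RHO_CU_BULK) -> float:
    """Resistance of a rectangular wire: R = rho * L / (W * T)."""
    return rho * length_m / (width_m * thickness_m)


if __name__ == "__main__":
    for supply in (5.0, 0.75):
        print(f"200 mV on {supply} V = {drop_fraction(0.2, supply):.1%} of the supply")

    wide = wire_resistance(100e-6, 1e-6, 1e-6)         # 100 um long, 1 um x 1 um
    narrow = wire_resistance(100e-6, 0.5e-6, 0.5e-6)   # same length, half width and thickness
    print(f"halving width and thickness raises resistance {narrow / wide:.0f}x")
```

The same 200 mV that was 4% of a 5 V supply is more than a quarter of a 0.75 V supply, and halving both cross-section dimensions quadruples resistance, so the same current produces four times the drop.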

Modeling power distribution was not considered critical in the early days of chip design.

“The power supply consisted of power and ground rails, and the transistors connected between the power and ground,” Swinnen said. “Each row had a rail, and a double ring went around the chip that all the rails tied into: a power ring and a ground ring. Each rail tied into the appropriate one, and that was it. The power rails were big enough that it wasn’t a problem. Metals were thick and wide. Distances were short. Voltage drop was not an issue you had to deal with much.”

As chip sizes increased, the rails between the rings became so long that the power coming down the ring then had to travel a long way down the rail. The middle of the rail was far from the ring, so the voltage dropped along the rail and reached its minimum at the halfway point. Straps were added to alleviate this issue, but over the past decade power has emerged as an acute problem.
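The halfway-point minimum follows directly from Kirchhoff’s current law. For a rail of N equal segments (resistance r each) fed from both ends at V0, with every interior node drawing the same current i, the node voltages are V_k = V0 - (r*i/2)*k*(N - k), lowest at the midpoint; in the continuous limit the worst drop approaches I*R/8 for total rail current I and resistance R. The Python sketch below solves the same ladder numerically with the tridiagonal (Thomas) algorithm. The segment resistance, tap current and supply are illustrative values, and a strap is modeled, as a simplifying assumption, as an ideal extra feed point that splits the rail in half.

```python
def solve_tridiagonal(sub, diag, sup, rhs):
    """Thomas algorithm for a tridiagonal linear system.

    sub[k] multiplies x[k-1] (sub[0] unused); sup[k] multiplies x[k+1] (sup[-1] unused).
    """
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = sup[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for k in range(1, n):                      # forward elimination
        m = diag[k] - sub[k] * c[k - 1]
        c[k] = sup[k] / m
        d[k] = (rhs[k] - sub[k] * d[k - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for k in range(n - 2, -1, -1):             # back substitution
        x[k] = d[k] - c[k] * x[k + 1]
    return x


def rail_voltages(n_segments, r_seg, i_tap, v_supply):
    """Voltages along a rail fed at both ends, each interior node sinking i_tap amps."""
    n = n_segments - 1                         # number of interior (unknown) nodes
    sub = [0.0] + [-1.0] * (n - 1)
    sup = [-1.0] * (n - 1) + [0.0]
    diag = [2.0] * n
    # KCL at node k: (V[k-1] - V[k]) + (V[k+1] - V[k]) = r_seg * i_tap
    rhs = [-r_seg * i_tap] * n
    rhs[0] += v_supply                         # left end is tied to the ring
    rhs[-1] += v_supply                        # right end is tied to the ring
    return [v_supply] + solve_tridiagonal(sub, diag, sup, rhs) + [v_supply]


if __name__ == "__main__":
    v = rail_voltages(10, r_seg=0.5, i_tap=1e-3, v_supply=0.8)
    print(f"worst-case drop, ends only:  {1e3 * (0.8 - min(v)):.2f} mV")        # 6.25 mV
    # A strap at the midpoint turns one long rail into two rails of half the length.
    v_half = rail_voltages(5, r_seg=0.5, i_tap=1e-3, v_supply=0.8)
    print(f"worst-case drop, with strap: {1e3 * (0.8 - min(v_half)):.2f} mV")   # 1.50 mV
```

Because the drop grows with the square of the rail length between feed points, halving that length with a strap cuts the worst-case drop to roughly a quarter.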

“Power went from being a non-problem to a minor problem,” Swinnen said. “Now it’s a major problem, to the point that [power integrity analysis] is one of the critical sign-off tools. The primary technique for reducing power on the chip is to reduce the voltage, and because power is proportional to the square of the supply voltage, lowering the voltage has a big effect on power across the board. It’s a simple, direct way of reducing power, and voltages have continually gone lower, even to ultra-low voltages that barely scrape above half a volt.”
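The quadratic dependence comes from the standard dynamic-power relation P = alpha * C * V^2 * f (activity factor, switched capacitance, supply voltage, clock frequency). The Python sketch below, with purely illustrative parameter values, shows both sides of the trade-off this article describes: lowering the supply cuts dynamic power quadratically, but a chip that still consumes the same power at a lower voltage must draw more current, I = P / V, which raises the IR drop across any fixed grid resistance.

```python
def dynamic_power(alpha: float, cap_f: float, vdd: float, freq_hz: float) -> float:
    """Switching power P = alpha * C * V^2 * f, in watts."""
    return alpha * cap_f * vdd ** 2 * freq_hz


def supply_current(power_w: float, vdd: float) -> float:
    """Average current the power network must deliver, I = P / V."""
    return power_w / vdd


if __name__ == "__main__":
    # Illustrative values only: 10% activity, 100 nF of switched capacitance, 2 GHz.
    p_1v0 = dynamic_power(0.1, 100e-9, 1.0, 2e9)
    p_0v8 = dynamic_power(0.1, 100e-9, 0.8, 2e9)
    print(f"power at 0.8 V relative to 1.0 V: {p_0v8 / p_1v0:.2f}")   # 0.64
    # The same hypothetical 50 W budget at three supply voltages.
    for vdd in (1.0, 0.8, 0.6):
        print(f"{vdd} V -> {supply_current(50.0, vdd):.1f} A")
```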

Modeling power distribution on SoCs is critical for IR drop analysis, but it also is necessary for thermal and timing work.

“Historically, engineers have margined their static timing analysis with thermal distribution budgets and IR drop budgets,” said Javier DeLaCruz, distinguished engineer and senior director of system integration at Arm. “However, this additional timing margin means less performance, so more accurate thermal and voltage models are needed to leave less performance on the table.”
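One way to see why margin costs performance is the alpha-power-law delay model of Sakurai and Newton, in which gate delay scales roughly as V / (V - V_th)^alpha. The Python sketch below assumes alpha = 1.3 and V_th = 0.3 V purely for illustration, and compares the achievable clock frequency when timing is signed off with a pessimistic 50 mV IR-drop budget versus a tighter 20 mV budget that a more accurate voltage model might justify. It is a first-order estimate, not a substitute for characterized timing libraries.

```python
def relative_delay(vdd: float, vth: float = 0.3, alpha: float = 1.3) -> float:
    """Alpha-power-law gate delay, up to a constant factor: V / (V - Vth)^alpha."""
    if vdd <= vth:
        raise ValueError("supply must exceed the threshold voltage")
    return vdd / (vdd - vth) ** alpha


def fmax_gain(nominal_v: float, pessimistic_drop: float, accurate_drop: float) -> float:
    """Fractional frequency gained by signing off at a smaller IR-drop budget."""
    slow = relative_delay(nominal_v - pessimistic_drop)
    fast = relative_delay(nominal_v - accurate_drop)
    return slow / fast - 1.0   # frequency is proportional to 1 / delay


if __name__ == "__main__":
    gain = fmax_gain(0.8, pessimistic_drop=0.050, accurate_drop=0.020)
    print(f"frequency recovered by a tighter IR-drop budget: {gain:.1%}")
```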

Modeling on-chip power distribution is especially important for 3D-ICs, noted Sutirtha Kabir, R&D director at Synopsys. “The concern is that designers were already struggling to get power up to the transistors even for a single-die design, and now they’re hearing they have to push power through all these stacked die, and there will be voltage drops everywhere.”

3D-IC has been discussed for years, but it remains an emerging field because of the challenges of thermal dissipation, various types of noise, and complex floor-planning.

“Single-die design has been done for 20+ years,” Kabir said. “There’s a lot of experience and things to look back at. 3D-IC is still very new. It’s not that the design team has done five of these designs and they know exactly how to build it. Now, maybe for the first time, they’re worried that, ‘I’m now going to take this power delivery network early in my design, and it has to be done in the context of the whole 3D-IC. I can’t just say I’m going to design this for its own power. It has to be the power demand for this IC plus whatever is over there. So if I don’t do even a back-of-the-napkin, really decent calculation very upfront, and have a prototype power delivery network design, I can’t go back and fix this later on.’ If changes are made at a later stage, it’s going to impact something over there and something over here, and you’d have to go back and redesign the entire power network for the whole 3D-IC.”

Cadence’s Chaudhry agrees. “Previously, when you just had the traditional one chip in a package, the power used to come from the board, then through the package and onto the chip,” he said. “Now, there are multiple chips or chiplets packaged together, and sometimes they’re stacked on top of each other. In many cases, the power doesn’t just come through the package to the chip. It actually comes through the package, through one chip to the other chip. That adds another level of complexity to modeling the electrical characteristics of the power network. It adds to the size of the power network, and now you have multiple chips. That’s an area where more innovation will be required, and it already is reflected in industry work with foundries.”
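A back-of-the-napkin version of the calculation Kabir and Chaudhry describe fits in a few lines. In a stack fed from the package at the bottom, the vertical power path into each die must carry that die’s own current plus the current of every die above it, so drops accumulate on the way up. The Python sketch below treats each interface (TSV array plus bumps) as a single lumped resistance and ignores on-die grid drop; the per-die currents and resistances are hypothetical.

```python
from itertools import accumulate


def stack_supply_voltages(v_package, die_currents_a, interface_res_ohm):
    """Supply voltage arriving at each die in a vertical stack.

    die_currents_a[0] is the bottom die (closest to the package).
    interface_res_ohm[k] is the lumped resistance of the vertical path feeding
    die k (package-to-die for k = 0, die-to-die otherwise).
    """
    if len(die_currents_a) != len(interface_res_ohm):
        raise ValueError("need one interface resistance per die")
    # Current through interface k = current of die k plus every die above it.
    through = list(accumulate(reversed(die_currents_a)))[::-1]
    voltages, v = [], v_package
    for i_through, r in zip(through, interface_res_ohm):
        v -= i_through * r
        voltages.append(v)
    return voltages


if __name__ == "__main__":
    # Hypothetical three-die stack drawing 20 A, 10 A and 5 A, 0.5 mOhm per interface.
    vs = stack_supply_voltages(0.8, [20.0, 10.0, 5.0], [0.5e-3, 0.5e-3, 0.5e-3])
    for tier, v in enumerate(vs):
        print(f"die {tier}: {v * 1e3:.2f} mV")
```

In this example the top die sees 27.5 mV of cumulative drop even though it draws only 5 A, because the interfaces below it also carry the current of the dies between it and the package.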

Impact on reliability
Because the power distribution network carries current to all the transistors, that current causes electromigration over time. In effect, the steady flow of electrons pushes metal atoms along the wire, which can result in structural changes such as voids.

“The more unidirectional current there is, which is what happens in the power network, the bigger the electromigration issue,” Chaudhry said. “Analysis is required to make sure we don’t have defects related to electromigration. We need to measure how much current each wire in the power network will carry and, based on the current density in that wire, how long we expect it to function without some structural damage to the wire.”
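The first step Chaudhry describes, turning each wire’s current into a current density and comparing it with an allowed limit, is simple arithmetic: J = I / (W * T). The Python sketch below assumes a rectangular cross-section and a single hypothetical limit of 2e6 A/cm^2; real electromigration rules come from the foundry and vary with metal layer, width, temperature and whether the current is unidirectional or bidirectional. The wire names are placeholders.

```python
from dataclasses import dataclass

UM2_TO_CM2 = 1e-8  # 1 square micrometer = 1e-8 square centimeters


@dataclass
class Wire:
    name: str
    current_a: float      # average unidirectional current, amperes
    width_um: float
    thickness_um: float


def current_density(w: Wire) -> float:
    """Average current density in A/cm^2, assuming a rectangular cross-section."""
    return w.current_a / (w.width_um * w.thickness_um * UM2_TO_CM2)


def em_violations(wires, j_limit_a_per_cm2):
    """Names of wires whose current density exceeds the (hypothetical) limit."""
    return [w.name for w in wires if current_density(w) > j_limit_a_per_cm2]


if __name__ == "__main__":
    wires = [
        Wire("m1_rail", current_a=1e-3, width_um=0.1, thickness_um=0.1),
        Wire("m8_strap", current_a=10e-3, width_um=2.0, thickness_um=1.0),
    ]
    for w in wires:
        print(f"{w.name}: {current_density(w):.2e} A/cm^2")
    print("violations:", em_violations(wires, j_limit_a_per_cm2=2e6))
```

The thin lower-level rail carries ten times less current than the wide upper-level strap but runs at a far higher current density, which is why electromigration checks have to cover every wire rather than just the heavily loaded ones.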

In addition, the continued shrinking of features requires thinner metal layers, and at 7nm and below the higher resistance of these layers can cause a lot of localized drop due to simultaneous switching.

“When there are multiple cells together, switching simultaneously, they can cause very high drops for a very short duration. These drops can cause functional failures, so now we also have to model the switching,” Chaudhry said. “At 28nm or 40nm, we were more concerned about the general level of drop on the power grid. But now, the big part of the drop comes from the lower levels of metal, and in a localized manner. So we need to start modeling the localized switching of the transistors, and we need to basically model everywhere. At the same time, we need to understand at what time each cell switches and how it switches relative to the others, and this requires higher accuracy about when something switches. We also need to cover many more scenarios, because previously it was, ‘Let me figure out the average power consumption of this chip.’ But now I need to start modeling every local area of the chip, and the possible combinations of switching. And given that cells switch in different parts of the clock cycle, the cells that switch simultaneously are the ones that are going to cause the drop, and you have to be very careful. You need to model when they switch, too, so the timing becomes critical as well.”
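A toy model shows why switching times matter as much as switching counts. The Python sketch below represents each cell’s supply current as a triangular pulse centered on its switching time (a common simplification, assumed here) and sums the pulses for cells sharing one local region of the grid. The same four cells draw the same total charge in both scenarios, but the peak current, which drives the short, sharp local drop, is four times higher when they switch together. Pulse shapes and values are illustrative.

```python
def triangle(t: float, center: float, peak: float, width: float) -> float:
    """Triangular current pulse: `peak` at `center`, zero beyond +/- width/2."""
    return peak * max(0.0, 1.0 - abs(t - center) / (width / 2))


def local_current(cells, t_grid):
    """Total current of (switch_time, peak, width) cells at each time point."""
    return [sum(triangle(t, c, p, w) for c, p, w in cells) for t in t_grid]


if __name__ == "__main__":
    t_grid = range(0, 400)                  # 1 ps steps over a 400 ps window
    aligned = [(100, 2.0, 40)] * 4          # four cells switching at 100 ps, 2 mA peak each
    staggered = [(50, 2.0, 40), (150, 2.0, 40), (250, 2.0, 40), (350, 2.0, 40)]
    print("peak (aligned):  ", max(local_current(aligned, t_grid)), "mA")    # 8.0 mA
    print("peak (staggered):", max(local_current(staggered, t_grid)), "mA")  # 2.0 mA
```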

The complexity related to simultaneous switching and local transient effects became particularly difficult to manage at 7nm and below.

“5nm was the point when the industry realized it really needed to do something about it, and the way we address it is with vector-less modeling, where different switching scenarios are modeled,” he said. “The problem is there are infinite possibilities. Designers can only get maybe 10 or 15 vectors, but you have infinite possibilities. So then we have to come up with vector-less methods, whereby the tools give designers the ability to model many more switching scenarios. Vector-less methods are becoming more complex, and they’re allowing designers to model more of these switching scenarios.”
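The gap between a handful of simulation vectors and the full space of switching scenarios can be illustrated with a deliberately simple bound. The Python sketch below assumes, hypothetically, that activity analysis caps how many cells in a region can switch in the same window. A vector-less-style worst case then picks the highest-current cells up to that cap, while a vector-based check only sees whichever combinations its 15 sampled vectors happen to exercise. Commercial vector-less engines use far more sophisticated activity- and timing-aware methods; this only shows why a few vectors tend to underestimate the worst case.

```python
import random


def vector_based_peak(cell_currents, vectors):
    """Worst local current seen across a small set of simulated vectors."""
    return max(sum(cell_currents[i] for i in vec) for vec in vectors)


def vectorless_bound(cell_currents, max_simultaneous):
    """Worst case if any `max_simultaneous` cells may switch together."""
    return sum(sorted(cell_currents, reverse=True)[:max_simultaneous])


if __name__ == "__main__":
    rng = random.Random(7)                                    # fixed seed for repeatability
    currents = [rng.uniform(0.1, 2.0) for _ in range(200)]    # mA per cell, hypothetical
    k = 20                                                    # assumed simultaneous-switching cap
    vectors = [rng.sample(range(len(currents)), k) for _ in range(15)]
    print(f"worst of 15 vectors: {vector_based_peak(currents, vectors):.1f} mA")
    print(f"vector-less bound:   {vectorless_bound(currents, k):.1f} mA")
```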

Another problem is the size of the power distribution network. This is particularly problematic for AI chips, which tend to be extremely large, with an enormous number of nodes in the power distribution network. In fact, some have as many as 100 billion. To simulate that, the tools must be able to handle that capacity and still complete the analysis in a reasonable amount of time.

“If you look at a chip with 50 billion transistors, that means there are 50 billion ground and 50 billion power points to connect to,” said Swinnen. “In this power supply network, every little piece of wire has to be modeled as a resistor, so you end up with billions and billions of resistors. There are designs now with 60 billion to 100 billion nodes in that electrical model, which has to be reduced so it can be simulated.”
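One of the simplest reductions of such a resistor network is series merging: any internal node that draws no current and touches exactly two resistors can be eliminated by replacing the pair with their sum, which is exact for DC. The Python sketch below applies only that rule; production tools combine many more aggressive techniques, so treat it as an illustration of the idea rather than of any particular tool’s method. Node names and values are hypothetical.

```python
from collections import defaultdict


def merge_series(edges, keep):
    """Eliminate degree-2 internal nodes from a resistor network.

    edges: list of (node_a, node_b, ohms). keep: nodes that must survive,
    such as supply pads and nodes where cells draw current.
    """
    edges = list(edges)
    while True:
        incident = defaultdict(list)
        for idx, (a, b, _) in enumerate(edges):
            incident[a].append(idx)
            incident[b].append(idx)
        victim = next((n for n, idxs in incident.items()
                       if n not in keep and len(idxs) == 2), None)
        if victim is None:
            return edges
        i, j = incident[victim]
        (a1, b1, r1), (a2, b2, r2) = edges[i], edges[j]
        end1 = b1 if a1 == victim else a1   # far end of the first resistor
        end2 = b2 if a2 == victim else a2   # far end of the second resistor
        edges = [e for k, e in enumerate(edges) if k not in (i, j)]
        if end1 != end2:                    # a dangling loop carries no current; drop it
            edges.append((end1, end2, r1 + r2))


if __name__ == "__main__":
    # A rail split into four extracted pieces between pad P and cell tap T.
    rail = [("P", "n1", 0.1), ("n1", "n2", 0.2), ("n2", "n3", 0.3), ("n3", "T", 0.4)]
    print(merge_series(rail, keep={"P", "T"}))        # one resistor of about 1.0 ohm
    print(merge_series(rail, keep={"P", "n2", "T"}))  # n2 kept as a load point
```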

Advances in EDA simulators make that possible. They can simulate these designs to provide a point-by-point voltage map showing exactly where the current goes and what the voltage is at every point.
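At toy scale, the same kind of point-by-point voltage map can be produced with nodal analysis on a resistor mesh. The Python sketch below models a small square power grid with identical segment resistances, a constant current sink at every node and ideal supply pads at the four corners, then solves Kirchhoff’s current law with Gauss-Seidel iteration. The grid size, resistance, load and pad locations are illustrative assumptions; production solvers rely on sparse direct or preconditioned iterative methods to reach billions of nodes.

```python
def solve_grid(n, r_seg, i_load, v_pad, pads, tol=1e-12, max_iter=100_000):
    """Gauss-Seidel solution of an n x n resistor mesh.

    Every non-pad node sinks i_load amps; nodes (x, y) in `pads` are held at v_pad.
    Returns rows of node voltages, indexed as v[y][x].
    """
    v = [[v_pad] * n for _ in range(n)]
    for _ in range(max_iter):
        max_change = 0.0
        for y in range(n):
            for x in range(n):
                if (x, y) in pads:
                    continue
                nbrs = [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                        if 0 <= x + dx < n and 0 <= y + dy < n]
                # KCL: sum((V_nbr - V) / r) = i_load  ->  V = (sum(V_nbr) - r * i_load) / degree
                new = (sum(v[ny][nx] for nx, ny in nbrs) - r_seg * i_load) / len(nbrs)
                max_change = max(max_change, abs(new - v[y][x]))
                v[y][x] = new
        if max_change < tol:
            return v
    raise RuntimeError("Gauss-Seidel did not converge")


if __name__ == "__main__":
    n = 5
    corners = {(0, 0), (0, n - 1), (n - 1, 0), (n - 1, n - 1)}
    grid = solve_grid(n, r_seg=0.1, i_load=1e-3, v_pad=0.8, pads=corners)
    for row in grid:
        print(" ".join(f"{1e3 * (0.8 - v):5.2f}" for v in row))   # IR-drop map in mV
```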

“Electromigration comes along with this, and because electromigration is a reliability issue, you have to know the current flowing through all the wires,” Swinnen noted. “We just calculated that from the voltage drops, so we might as well do electromigration, too. And because electromigration is highly temperature-dependent, thermal-aware electromigration analysis is also needed.”
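The temperature dependence Swinnen mentions is usually captured with Black’s equation, MTTF = A * J^(-n) * exp(E_a / (k * T)). The Python sketch below computes lifetime relative to a reference condition, so the unknown prefactor A cancels. The current exponent n = 2 and activation energy E_a = 0.9 eV are assumed, textbook-style values for copper, not figures from the article or any foundry.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant, eV/K


def relative_mttf(j, temp_c, j_ref, temp_ref_c, n=2.0, ea_ev=0.9):
    """Black's-equation MTTF divided by the MTTF at a reference (j_ref, temp_ref_c)."""
    t, t_ref = temp_c + 273.15, temp_ref_c + 273.15
    return (j / j_ref) ** (-n) * math.exp(ea_ev / BOLTZMANN_EV * (1.0 / t - 1.0 / t_ref))


if __name__ == "__main__":
    # Same current density, but a local hotspot 20 C above the 105 C reference.
    print(f"hotspot lifetime vs reference: {relative_mttf(1.0, 125, 1.0, 105):.2f}x")
    # Same temperature, but current density doubled.
    print(f"2x current density lifetime:   {relative_mttf(2.0, 105, 1.0, 105):.2f}x")  # 0.25x
```

Under these assumptions, a 20 C hotspot shortens lifetime about as much as doubling the current density does, which is why electromigration checks need a temperature map rather than a single worst-case temperature.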

All of that typically is defined before place-and-route. “The placement assigns rows that are empty, and that’s where the cells are going to go,” Swinnen explained. “You put the power supply in, and then place the cells and the rest. The cells don’t just go anywhere. They fit on the structure that’s there. At the planning stage, because the design isn’t placed and routed yet, you analyze based on predictions. And at that point, it’s an optimization problem for the whole chip. We’re finding there are lots of degrees of freedom, like how wide do I make my wires? How many straps? What’s the pitch between the straps? You can experiment with a lot of aspects.”
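A crude version of that planning-stage exploration can be written as a sweep. The Python sketch below uses a hypothetical two-term drop model: the drop along a thin rail between neighboring straps (a rail fed from both ends with uniform load, about j*r*p^2/8 for strap pitch p) plus the drop along a strap carrying the current of its p-wide stripe (fed from both ends, I*R/8). It then picks the strap pitch and width that meet an assumed drop budget while consuming the least routing track, measured as width divided by pitch. Every constant is an illustrative assumption, and adding the two drops treats them as simple series contributions.

```python
from itertools import product

# Illustrative assumptions, not foundry or article data.
J_AREA = 1e-6        # load current density, A per square micrometer
ROW_H = 0.3          # standard-cell row height, um (each rail feeds one row)
R_RAIL = 1.0         # thin rail resistance, ohm per um
R_SHEET = 0.02       # strap metal sheet resistance, ohm per square
BLOCK_H = 1000.0     # strap length (block height), um


def estimated_drop(pitch_um: float, width_um: float) -> float:
    """Rail drop between straps plus drop along one strap, in volts."""
    rail = (J_AREA * ROW_H) * R_RAIL * pitch_um ** 2 / 8
    strap_current = J_AREA * pitch_um * BLOCK_H
    strap_res = R_SHEET * BLOCK_H / width_um
    return rail + strap_current * strap_res / 8


def best_strapping(pitches, widths, budget_v):
    """Feasible (pitch, width) with the lowest routing utilization, or None."""
    feasible = [(w / p, p, w) for p, w in product(pitches, widths)
                if estimated_drop(p, w) <= budget_v]
    if not feasible:
        return None
    _, p, w = min(feasible)   # ties on utilization go to the smaller pitch
    return p, w


if __name__ == "__main__":
    choice = best_strapping([20, 40, 60], [1, 2, 3, 4], budget_v=0.030)
    print("pitch, width (um):", choice)
```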

AI/ML-driven optimization tools can be useful here. They can take lots of variables, do some modeling on those variables, and produce a mathematical model of the impact of each one. A Monte Carlo simulation can then sample the possible combinations of these variables to identify how sensitive the result is to each of them. For example, how sensitive is the result to the strap pitch versus the wire width versus the size and number of vias? These tools can take a very complex, multi-dimensional optimization problem and crunch it down to an optimal solution.
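The Monte Carlo step can be sketched with nothing more than random sampling and a correlation coefficient. The Python code below samples strap pitch, wire width and via count uniformly from assumed ranges, evaluates a hypothetical surrogate drop model standing in for the fitted mathematical model the tools would build, and reports the Pearson correlation of each variable with the drop as a rough global sensitivity measure. The model, its coefficients and the ranges are all illustrative.

```python
import math
import random


def surrogate_drop(pitch_um, width_um, n_vias):
    """Hypothetical fitted model of worst-case IR drop, in millivolts."""
    return 2.5 * pitch_um / width_um + 40.0 / n_vias


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def sensitivities(samples=5000, seed=1):
    """Monte Carlo samples of the surrogate, correlated against each input."""
    rng = random.Random(seed)
    pitch = [rng.uniform(20, 60) for _ in range(samples)]
    width = [rng.uniform(1, 4) for _ in range(samples)]
    vias = [rng.randint(1, 8) for _ in range(samples)]
    drop = [surrogate_drop(p, w, v) for p, w, v in zip(pitch, width, vias)]
    return {name: pearson(xs, drop)
            for name, xs in (("pitch", pitch), ("width", width), ("vias", vias))}


if __name__ == "__main__":
    for name, rho in sensitivities().items():
        print(f"{name:>5}: {rho:+.2f}")
```

Correlation only captures monotonic, roughly linear influence; variance-based methods such as Sobol indices are a more rigorous choice when variables interact strongly.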

However, things aren’t always smooth sailing. “Let’s say you’ve done place-and-route and you’ve been refining your timing,” Swinnen said. “You’ve done a lot of work to get your timing to close. You do a voltage drop check, and you find some voltage drop issues. You need to fix them, but the fix is often very disruptive to your timing, so it’s always been difficult. How do I fix IR drop without disrupting my timing, because it comes late in the flow? The tendency has been to avoid IR drop by paying the price of over-dimensioning the power supply. But power supply analysis depends on the activity of the circuit. If nothing is switching, no power is being drawn. So just like in power analysis, activity is central to this. Where do you get your activity from? We have the same issues as we have with power analysis. So we have vectored activity, where the user provides a list of vectors, or there’s a vector-less approach, where we calculate it ourselves under the hood.”
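With vectored activity, the switching information comes from simulation. The Python sketch below shows the first step in its simplest form: given per-cycle values of a few signals (a stand-in, assumed here, for what a simulation dump would contain), it counts 0-to-1 transitions to get each signal’s activity factor and plugs that into P = alpha * C * V^2 * f. The signal names, capacitances, voltage and frequency are hypothetical.

```python
def activity_factors(trace):
    """Fraction of cycle boundaries at which each signal makes a 0 -> 1 transition.

    trace: dict mapping signal name -> list of 0/1 values, one per clock cycle.
    """
    factors = {}
    for name, values in trace.items():
        rises = sum(1 for prev, cur in zip(values, values[1:]) if prev == 0 and cur == 1)
        factors[name] = rises / (len(values) - 1)
    return factors


def switching_power(factors, cap_f, vdd, freq_hz):
    """Sum of alpha * C * V^2 * f over signals (cap_f maps name -> farads)."""
    return sum(a * cap_f[name] * vdd ** 2 * freq_hz for name, a in factors.items())


if __name__ == "__main__":
    trace = {
        "clk_en":   [0, 1, 0, 1, 0, 1, 0, 1, 0],   # rises at 4 of 8 cycle boundaries
        "data0":    [0, 0, 1, 1, 0, 0, 1, 1, 0],   # rises twice
        "idle_fsm": [1, 1, 1, 1, 1, 1, 1, 1, 1],   # never switches, so no switching power
    }
    alpha = activity_factors(trace)
    print(alpha)   # {'clk_en': 0.5, 'data0': 0.25, 'idle_fsm': 0.0}
    caps = {"clk_en": 5e-15, "data0": 2e-15, "idle_fsm": 3e-15}
    print(f"{switching_power(alpha, caps, vdd=0.75, freq_hz=2e9) * 1e6:.2f} uW")
```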

3D-ICs muddy the waters, and designers must account for through-silicon vias (TSVs), which run from the back to the front of the chip and take up significant real estate.

“You cannot do place-and-route, and you cannot place macros, where TSVs are,” said Kabir. “And TSVs are what carry the power from the back of the chip to the front. This means if I’ve finalized my power design, including my placement, I might not have room to do placement and routing later on. Then, somebody will come back to me later and say I have to punch TSVs through your macro, and that’s not going to work.”
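An early TSV budget is another calculation that fits on a napkin. The Python sketch below estimates how many power TSVs a die needs if each TSV may carry at most a fixed current, and how much silicon area they claim once a keep-out zone around each one is included. Currents are in milliamps to keep the division exact. The per-TSV current limit, TSV diameter and keep-out distance are hypothetical placeholders; real values come from the TSV process and its reliability rules.

```python
import math


def power_tsv_budget(total_current_ma, max_ma_per_tsv, tsv_diameter_um, keep_out_um):
    """Number of power TSVs and the area (mm^2) they block for placement.

    Each TSV is treated as blocking a square of side (diameter + 2 * keep_out).
    """
    count = math.ceil(total_current_ma / max_ma_per_tsv)
    side_um = tsv_diameter_um + 2 * keep_out_um
    blocked_mm2 = count * side_um ** 2 / 1e6   # 1 mm^2 = 1e6 um^2
    return count, blocked_mm2


if __name__ == "__main__":
    # Hypothetical: 30 A through the die, 50 mA per TSV, 5 um TSVs, 10 um keep-out.
    n, area = power_tsv_budget(30_000, 50, 5.0, 10.0)
    print(f"{n} power TSVs blocking about {area:.2f} mm^2")
```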

In addition to static power, you also have to worry about dynamic and switching power, he said. “Something on the PCB card may actually end up frying your chip. And if you don’t take that into account and do a system-level power integrity and signal integrity analysis, chances are good that your system will fail.”
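A standard way to frame that system-level check is the target-impedance method: the impedance of the power delivery network seen by the die must stay below Z_target = V_dd * (allowed ripple) / (transient current) across frequency. The Python sketch below models, under simplifying assumptions, the board-and-package path as a series resistance and inductance in parallel with on-die decoupling capacitance, then sweeps frequency to find the anti-resonance peak, an interaction between off-chip parasitics and on-chip capacitance that a chip-only analysis would miss. All component values are hypothetical.

```python
import math


def target_impedance(vdd, ripple_fraction, transient_current_a):
    """Maximum allowed PDN impedance, in ohms."""
    return vdd * ripple_fraction / transient_current_a


def pdn_impedance(freq_hz, r_path, l_path, c_die, esr_die):
    """|Z| seen by the die: (R + jwL) from board/package in parallel with on-die C."""
    w = 2 * math.pi * freq_hz
    z_path = r_path + 1j * w * l_path
    z_cap = esr_die + 1 / (1j * w * c_die)
    return abs(z_path * z_cap / (z_path + z_cap))


def impedance_peak(params, f_start=1e6, f_stop=10e9, points=2001):
    """Largest |Z| on a log-spaced sweep and the frequency where it occurs."""
    ratio = f_stop / f_start
    freqs = [f_start * ratio ** (k / (points - 1)) for k in range(points)]
    zs = [pdn_impedance(f, **params) for f in freqs]
    z_peak = max(zs)
    return z_peak, freqs[zs.index(z_peak)]


if __name__ == "__main__":
    z_t = target_impedance(0.8, 0.05, 10.0)   # 0.8 V, 5% ripple, 10 A step: about 4 mOhm
    params = dict(r_path=0.5e-3, l_path=20e-12, c_die=100e-9, esr_die=0.2e-3)
    z_peak, f_peak = impedance_peak(params)
    f_lc = 1 / (2 * math.pi * math.sqrt(params["l_path"] * params["c_die"]))
    print(f"target {z_t * 1e3:.1f} mOhm, peak {z_peak * 1e3:.0f} mOhm at {f_peak / 1e6:.0f} MHz "
          f"(LC resonance {f_lc / 1e6:.0f} MHz)")
```

In this example the impedance is well under target at low and high frequencies but exceeds it by far near the LC resonance, which is the kind of violation that only shows up when the board, package and die are analyzed together.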

Finally, all of this modeling must be done earlier in the design flow. “Previously, you’d design the chip, and this power distribution check was typically what we call sign-off, right before you tape out the chip. You make sure everything’s fine, you get a few errors, and you fix them,” Chaudhry said. “Now, because this problem is becoming so localized, if you don’t solve it really early on as part of the design, you could end up with tons of errors near the tape-out date, and you won’t have time to fix them.”

This is another area where the EDA ecosystem is innovating, building more power distribution analysis into the implementation tool flow.

The same is true for 3D-ICs. Early power modeling and analysis are critical, because once the die stack is set, it typically can’t be changed without a complete redesign.
