Nvidia Blackwell GB202 GPU Rumored to Feature 384-bit GDDR7

imaginary_num6er@alien.top · 1 year ago

Nvidia Blackwell GB202 GPU Rumored to Feature 384-bit GDDR7

Masters_1989@alien.top · 1 year ago

Couldn’t care less unless it’s priced much lower. (Same goes for lower-end SKUs.)

DevAnalyzeOperate@alien.top · 1 year ago

I honestly don’t know how well a 24gb 5090 will move, no matter how fast it is. I feel like the gamers will go for stuff like 4080 super, 4070 ti super, next gen AMD. For productivity users, there’s 3090, 4090, A6000.

Maybe I’m wrong and the card doesn’t need to be very good to sell because GPUs are so burning hot right now.

soggybiscuit93@alien.top · 1 year ago

For productivity users, there’s… A6000.

A6000 is a lot of money. For productivity users in, say, Blender, you can get the same 48GB of VRAM and more compute for a lower cost if you go with dual 4090s.

JuanElMinero@alien.top · 1 year ago

GDDR7 memory chips will be in production with either 2 or 3 GB sizes, which means 36GB of VRAM on 384-bit bus could be a possibility for next gen.

jigsaw1024@alien.top · 1 year ago

I could see Nvidia releasing both 24GB and 36GB cards at the top, but charging a large premium for the extra VRAM similar to how Apple does for extra RAM above base.

The BOM difference from 24GB to 36GB, even for a new product like GDDR7, should easily be less than $50, but I can see Nvidia adding $300+ for the 36GB model for the extra profit.

rorschach200@alien.top · 1 year ago

Why actually build the 36 GB one though? What gaming application will be able to take advantage of more than 24 for the lifetime of 5090? 5090 will be irrelevant by the time the next gen of consoles releases, and the current one has 16 GB for VRAM and system RAM combined. 24 is basically perfect for top end gaming card.

And 36 will be even more self-cannibalizing for the professional card market.

So it’s unnecessary, expensive, and cannibalizing. Not happening.

Flowerstar1@alien.top · 1 year ago

Gaming applications didn’t take advantage of the 24GB when it debuted on the 3090 and they still don’t on the 4090 now. That’s not what drives these decisions.

rorschach200@alien.top · 1 year ago

The bus width needed to be what it needed to be. That left 2 possibilities - 12 GB and 24 GB. The former was way low for 4090 to work in its target applications. 24 it became.

This is exactly what drives these decisions.

What do you think drives them?

ResponsibleJudge3172@alien.top · 1 year ago

Of course not, memory bandwidth matters much more. We always say this, and now people finally see proof with 4060ti

soggybiscuit93@alien.top · 1 year ago

36GB is certainly a possibility. VRAM demand is high across multiple markets. Currently you can get a 24GB 4090 or 48GB A6000 Ada. There’s certainly a possibility of seeing 36GB 5090 and 72GB A6000 Blackwell (B6000?)

FloorEntire7762@alien.top · 1 year ago

Don’t think so. The RTX Titan from 2018 is much faster than the PS5 GPU from 2020. I suppose the next-gen console GPU will get RTX 4070-level performance or slightly above. The PS4 had HD 7850 performance in 2013, so…

DevAnalyzeOperate@alien.top · 1 year ago

36gb of vram on the 384-bit bus would be fantastic, yet I’m somehow sceptical when Nvidia sells the 48gb A6000 for a $6800 MSRP. Even without benefits like nvlink, a 36gb card ought to cannibalise Nvidia’s productivity cards quite a lot. I don’t think Nvidia would actually be TOTALLY opposed to this if they could produce enough 5090’s to not sell out of them since it would help entrench Nvidia’s CUDA moat, but I don’t think Nvidia is going to be capable of pulling that off.

It’s not impossible we see 36gb 5090 and 72gb a7000 or whatever. I’m just not holding my breath especially when AMD doesn’t seem to have much in the pipeline to even compete with a 24gb model.

Dangerman1337@alien.top · 1 year ago

Can see 5090 30GB and 5090 Ti 36GB and then a 72GB Quadro.

ZaadKanon69@alien.top · 1 year ago

32GB 5090 and 24GB 5080 is the most realistic configuration.

Also expect both of them to have ridiculous prices. $1500+ for the 5080 and $2500 FE MSRP for the 5090 wouldn’t surprise me. AMD is skipping high-end for 1 generation so their competition will likely be a $1000 5070Ti. The 7900XTX or a refresh of it will be AMD’s flagship until RDNA5. They have their valid reasons for that but it’s very bad news for Nvidia customers, as much as they like to bash AMD.

Nvidia also wants to protect their way more expensive professional lineup so especially the 32GB 5090 will be priced to the moon.

lusuroculadestec@alien.top · 1 year ago

Rumors have a 5090 with GDDR7 and a 384-bit bus. Micron has GDDR7 modules on their roadmap as 2GB and 3GB. This means that the memory configurations for 2GB modules will be 24 or 48GB, and with 3GB modules it will be 36GB or 72GB.

32GB would imply it’s a 256 or 512-bit bus, neither of which are very likely for a xx90. I could see them maybe going as low as a 320-bit bus for 30GB. Even 33GB with a 352-bit bus is more likely.

The 5080 will be another thing, 24GB would imply a 256-bit bus with 3GB modules. Nvidia has been all over the map with the xx80 memory width, so it will be anyone’s guess. If they prioritize memory bandwidth and use a 320-bit bus, a 20GB card is most likely.

GDDR prevents having arbitrary memory sizes.
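
The capacities listed above follow from simple arithmetic, which can be sketched in Python. This is a rough illustration, assuming each GDDR chip sits on a 32-bit slice of the bus and that a clamshell layout (two chips per slice) doubles the capacity. The function name `vram_options` is made up for this example.

```python
def vram_options(bus_width_bits, module_gb, channel_bits=32):
    """Return (normal, clamshell) capacity in GB for a given bus width.

    Assumes one GDDR chip per 32-bit channel; clamshell mode puts two
    chips on each channel, doubling capacity without widening the bus.
    """
    chips = bus_width_bits // channel_bits
    normal = chips * module_gb
    return normal, normal * 2


for bus in (256, 320, 352, 384, 512):
    for module in (2, 3):
        normal, clamshell = vram_options(bus, module)
        print(f"{bus}-bit, {module}GB modules: {normal}GB or {clamshell}GB")
```

Under these assumptions a 384-bit bus gives 24/48GB with 2GB modules and 36/72GB with 3GB modules, while 32GB only falls out of a 256-bit (clamshell) or 512-bit bus.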

ZaadKanon69@alien.top · 1 year ago

A 20GB 5080, probably at an even higher MSRP than the 4080 due to a lack of competition, would be criminal… There was supposed to be a 20GB 3080 for crying out loud. And games will def go over 16GB before next next gen so 4080 owners will face a VRAM bottleneck and then their upgrade option is a $1500 5080 omg.

I heard the 512-bit rumor and thought Nvidia was FINALLY fixing their VRAM issue across their entire product stack… Sigh.

Keulapaska@alien.top · 1 year ago

32GB 5090 and 24GB 5080 is the most realistic configuration.

A 32GB 5090 would mean a 512-bit bus, which this rumor says it will not have, contrary to the original rumor. So it’ll either be 24GB or 36GB (48GB is also possible, but that’s probably not happening), as there will be 24Gb GDDR7 modules in addition to 16Gb ones. (Idk if mixing different-capacity memory modules is a thing, though, to get something like 30GB out of a 384-bit bus; I’m just guessing no.) Maybe they do both versions, or leave the 36GB as a potential 5090 Ti for later on, who knows.

ZaadKanon69@alien.top · 1 year ago

It’s practically guaranteed, there is no other realistic configuration.

I also expect the 32GB 5090 to launch at $2000-2500 MSRP and the 24GB 5080 at $1500+ because AMD is skipping a generation and the 5070Ti will likely match AMD’s offering at $999. The 7900XTX or a refresh of it will remain their top card until RDNA5.

Next gen Nvidia prices are going to be absolutely bonkers, worse than now. And people will buy them anyway… Especially because many have skipped the 4000 series hoping things would improve. With RDNA5 and more production capacity from new fabs prices will likely improve but that’s at least 3 years in the future.

redstern@alien.top · 1 year ago

Nvidia for Geforce 5000: You get 8GB, and you will deal with it. $2500 not please.

JuanElMinero@alien.top · 1 year ago

Am I reading those Cuda core projections right?

GA102 to AD102 increased by about 80%, but the jump from AD102 to GB202 is only slightly above 30%, aside from no large gains going to 3nm?

Might not turn out that impressive after all.

bubblesort33@alien.top · 1 year ago

I would also bet on it not being a significant upgrade. Performance per transistor in Ada vs Ampere actually went down, mostly because they spent so much of the transistor budget on cache. 170% more transistors for 65% more performance, if you assume the full potential AD102 die is 15% faster than a 4090. But even before AMD and Nvidia started playing with massive caches, you could always relatively accurately predict how fast a GPU would be based on transistor count.

Cache is not shrinking in die area with new nodes. If this thing has 128MB, and the rest of the die stays at 600mm², the area dedicated to logic would be smaller than AD102. Unless they start stacking cache, that is. Would not shock me if most or all of the L2 is on a 2nd layer.

I’ve heard 60% more logic density going from 5nm to 3nm, but who knows how optimistic those numbers are, as they are probably a best-case scenario. I can’t help but imagine a real GPU application would at maximum reach 40-50% more logic density, and I can’t imagine a performance uplift higher than that, unless they make it an 800mm² die, which I don’t believe.
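
As a quick sanity check on the perf-per-transistor point in this comment, here is a small Python sketch that uses only the figures quoted above (170% more transistors, 65% more performance). The helper name is hypothetical, and the 65% figure itself rests on the comment's assumption that a full AD102 die is 15% faster than a 4090.

```python
def perf_per_transistor_change(transistor_increase, perf_increase):
    """Relative change in performance per transistor between two chips.

    Both arguments are fractional increases, e.g. 1.70 for "170% more".
    """
    return (1 + perf_increase) / (1 + transistor_increase) - 1


# Figures quoted in the comment: 170% more transistors, 65% more performance.
change = perf_per_transistor_change(1.70, 0.65)
print(f"Perf per transistor: {change:+.0%}")
```

With those inputs, performance per transistor comes out roughly 39% lower for Ada than for Ampere.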

capn_hector@alien.top · 1 year ago

GA102 to AD102 increased by about 80%, but the jump from AD102 to GB202 is only slightly above 30%,

Maybe GB202 is not the top chip, and the top chip is named GB200.

I mean, you’d expect this die to be called GB102 based on the recent numbering scheme, right? Why jump to 202 right out of the gate? They haven’t done that in the past, AD100 is the compute die and AD102, 103, 104… are the gaming dies. In fact this has been extremely consistent all the way back to Pascal, even when there is a compute uarch variant that is different (and, GP100 is quite different from GP102 etc) it’s still called the 100.

But if there is another die above it, you’d call it GB100 (like Maxwell GM200, or Fermi GF100). Which is obviously already taken, GB100 is the compute die. So you bump the whole numbering series to 200, meaning the top gaming die is GB200.

There is also precedent for calling the biggest gaming die the x110, like GK110 or the Fermi GF110 (in the 500 series). But they haven’t done that in a long time, since Kepler. Probably because it ruins the “bigger number = smaller die” rule of thumb.

Of course it’s possible the 512b rumor was bullshit, or this one is bullshit. But it’s certainly an odd flavor of bullshit - if you were making something up, wouldn’t you make up something that made sense? Odd details like that potentially lend it credibility, because you’d call it GB102 if you were making it up. It will also be easy to corroborate across future rumors, if nobody ever mentions GB200-series chips again, then this was probably just bullshit, and vice versa. Just like Angstronomics and the RDNA3 leak, once he’d nailed the first product the N32/N33 information was highly credible.

scytheavatar@alien.top · 1 year ago

It is already leaked, GB200 is a chiplet design that will be exclusive for server customers. GB202 will be used for the 5090.

Qesa@alien.top · 1 year ago

Well the 512 rumour was kopite, and this is also kopite saying he misinterpreted a 128MB L2$ to mean 512

ZaadKanon69@alien.top · 1 year ago

It’s because AD102 is already a huge monolithic die, a little over 600mm², with 814mm² being the theoretical limit. In short: the bigger the die size, the lower the yields. There will never be a gaming GPU much bigger than 600mm² because then you’re looking at terrifying prices.
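
The link between die size and yield can be illustrated with the textbook Poisson yield model, in which the fraction of defect-free dies falls off exponentially with area. This is only a sketch: the defect density below is a placeholder picked for illustration, not a real figure for any foundry node, and the model ignores the fact that partly defective dies are often sold as cut-down SKUs.

```python
import math


def poisson_yield(die_area_mm2, defects_per_cm2):
    """Fraction of defect-free dies under a simple Poisson yield model."""
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-defects_per_cm2 * area_cm2)


# HYPOTHETICAL defect density, chosen only to illustrate the trend.
D0 = 0.1

for area in (300, 600, 814):
    print(f"{area} mm^2: {poisson_yield(area, D0):.0%} defect-free")
```

With this placeholder density, about 74% of 300mm² dies come out defect-free versus about 55% at 600mm² and 44% at 814mm², and fewer candidate dies fit on each wafer as well.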

A node shrink only helps so much, and you don’t want a GPU sucking 800 watts either. The wider memory bus also takes up extra space. The 5090 is still monolithic so a ~30% improvement in performance sounds plausible.

Even worse, AMD (understandably) is “skipping” RDNA4 high-end to both maximize AI production and give their engineers more time to get a chiplet design with multiple graphics chiplets working well.

RDNA5 will likely be high end again, but until then, the 7900XTX or a refresh of it will likely remain the fastest AMD card.

Which means next gen Nvidia pricing will go through the roof. We’re probably looking at a $999 16-20GB 5070(Ti) that matches AMD flagship performance so I wouldn’t be surprised if a 24GB 5080 will be priced at $1500-1750 MSRP as the gaming flagship and the hybrid 32GB 5090 $2500-3000 MSRP. Remember the 3090Ti launched at $2000 despite gaming competition from the $1000 6950XT.

… And people will buy them. RIP GPU prices for the next 3 years, at least.

Qesa@alien.top · 1 year ago

It’s highly likely to be a major architecture update, so core count alone won’t be a good indicator of performance.

Eitan189@alien.top · 1 year ago

It isn’t a major architecture update. Nvidia’s slides from Ampere’s release stated that the next two architectures after Ampere would be part of the same family.

Performance gains will be had by improving the RT & tensor cores, using an improved node, probably N4X, to facilitate clock speed increases at the same voltages, and by increasing the number of SMs across the product stack. The maturity of the 5nm process will allow Nvidia to use larger die than they could in Ada.

2GisColorless@alien.top · 1 year ago

lmao

rorschach200@alien.top · 1 year ago

by improving the RT & tensor cores

and HW support for DLSS features and CUDA as a programming platform.

It might be “a major architecture update” by the amount of work that Nvidia engineering will have to put in to pull off all the new features and RT/TC/DLSS/CUDA improvements without regressing PPA - that’s where the years of effort will be sunk - and possibly large improvements in perf in selected application categories and operating modes, but a very minor improvement in “perf per SM per clock” in no-DLSS rasterization on average.

JuanElMinero@alien.top · 1 year ago

True, completely forgot that there wasn’t a very large overhaul last gen.

ResponsibleJudge3172@alien.top · 1 year ago

‘Ampere Next’ referred to the datacenter lineup, which ended up being the biggest architectural change in datacenter GPUs since Volta vs GP100. And ‘Ampere Next Next’ referred to datacenter Blackwell, which is MCM, so again a big change.

Baalii@alien.top · 1 year ago

You should be looking at transistor amount if anything at all, “cuda cores” is only somewhat useful when looking at different products within the same generation.

ResponsibleJudge3172@alien.top · 1 year ago

Still very accurate if you know what to look for.

For example, the reason why Ampere vs Turing CUDA cores scale differently will let you predict how an Ampere GPU scales vs a Turing GPU.

It’s also why we knew how Ada would scale linearly except with 4090 that was nerfed to be more efficient

ResponsibleJudge3172@alien.top · 1 year ago

I guess people don’t dig into white papers to learn about how and why the architectures perform as they do

rorschach200@alien.top · 1 year ago

GA102 to AD102 increased by about 80%

without scaling DRAM bandwidth anywhere near as much, only partially compensating for that with a much bigger L2.

For 5090 on the other hand we might also have clock increase going (another 1.15x?), and proportional 1:1 (unlike Ampere -> Ada) DRAM bandwidth increase by a factor of 1.5 due to GDDR7 (no bus width increases necessary; 1.5 = 1.3 * 1.15), so this is 1.5x perf increase 4090 -> 5090, which has to be further multiplied by whatever u-architectural improvements might bring, like Qesa is saying.

Unlike Qesa, though, I’m personally not very optimistic regarding those u-architectural improvements being very major. To get from 1.5x that comes out of node speed increase and the node shrink subdued and downscaled by node cost increase, to recently rumored 1.7x one would need to get (1.7 / 1.5 = 1.13) 13% perf and perf/w improvement, which sounds just about realistic. I’m betting it’ll be even a little bit less, yielding more like 1.6x proper average, that 1.7x might have been the result of measuring very few apps or outright “up to 1.7x” with “up to” getting lost during the leak (if there was even a leak).

1.6x is absolutely huge, and no wonder nobody’s increasing the bus width: it’s unnecessary for yielding a great product and even more expensive now than it was on 5nm (DRAM controllers almost don’t shrink and are big).
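
The back-of-the-envelope numbers in this comment can be reproduced with a few lines of Python. All inputs are the rumored or speculative figures from the comment itself (about 30% more cores, a guessed 1.15x clock gain, a rumored 1.7x total), so treat the output as an illustration of the reasoning, not a prediction.

```python
sm_increase = 1.30      # ~30% more CUDA cores, AD102 -> GB202 (rumored)
clock_increase = 1.15   # speculative clock gain from the comment

compute_scaling = sm_increase * clock_increase
print(f"Compute scaling: {compute_scaling:.3f}x")

rumored_total = 1.70
uarch_needed = rumored_total / 1.5  # using the rounded 1.5x figure
print(f"Architectural gain needed for 1.7x: {uarch_needed - 1:.0%}")
```

The product is 1.495x, which the comment rounds to 1.5x, and closing the gap to the rumored 1.7x would take about 13% from architectural changes.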

ZaadKanon69@alien.top · 1 year ago

More memory bandwidth does not translate 1:1 to more performance. The GPU core is by far the most important. Even at 4K the current 1TB/s memory bandwidth is sufficient and overclocking the core is what gets you the most performance.

We’ve also seen that the 128-bit 4060Ti 16GB with its pitiful bandwidth can utilize its full 16GB VRAM without any issues at 1440P.

So if you’re trying to estimate performance gains, the core is where you should look for now, especially if Blackwell keeps the increased L2 cache (Ampere’s cache was measured in kilobytes, it was a radical change and it definitely worked well for AMD with RDNA2 too). Unless you’re doing 8K gaming the extra memory bandwidth will have minimal impact.

bubblesort33@alien.top · 1 year ago

compensating for that with a much bigger L2

If I understand this article and what kopite7kimi said correctly, it sounds like a 33% cache increase, which he assumed meant a 33% memory controller increase. So 128MB, which they derived 512 bit from originally. That’s not that huge of a jump of cache compared to the current 96 it seems to me.

GDDR7 is supposed to start at 32Gbps, but there are also some claims of 36Gbps. If you average the cache (33%) and memory speed (60%) increases, we’re talking maybe 45% more effective bandwidth.
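
For reference, peak memory bandwidth is the per-pin data rate times the bus width, divided by 8 to convert bits to bytes. The sketch below applies that to the GDDR7 speeds mentioned here on a 384-bit bus. The 21Gbps baseline for the 4090's GDDR6X is not from this comment; it is supplied for comparison and matches the roughly 1TB/s figure mentioned elsewhere in the thread.

```python
def bandwidth_gb_s(data_rate_gbps, bus_width_bits):
    """Peak memory bandwidth in GB/s: per-pin rate times pins, over 8 bits."""
    return data_rate_gbps * bus_width_bits / 8


current = bandwidth_gb_s(21, 384)  # 4090: 21 Gbps GDDR6X on 384-bit
for rate in (32, 36):
    new = bandwidth_gb_s(rate, 384)
    print(f"{rate} Gbps: {new:.0f} GB/s, {new / current - 1:+.0%} vs 4090")
```

That works out to 1536GB/s (+52%) at 32Gbps and 1728GB/s (+71%) at 36Gbps, so the 60% memory speed figure above sits between the two.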

ResponsibleJudge3172@alien.top · 1 year ago

It’s expected to be like Ampere. Ampere was a 17% increase in SMs (RTX 3090 Ti vs RTX Titan), but the SM itself was improved such that it yielded about 33% improvement per SM in ‘raster’ and massive improvements in occupancy for RT workloads. So the 3090 Ti ended up 46% faster in ‘raster’ vs the RTX Titan.

The TPC and GPC of Blackwell are rumored to be overhauled with a more hesitant rumor about the SM also being improved.

hackenclaw@alien.top · 1 year ago

It will be the same 24GB of VRAM, used only for the 5090. The 5080 is most likely going to use GB203.

I wonder whether Nvidia will repeat a 5060 with 8GB of VRAM or go back to a 192-bit bus.

dog-gone-@alien.top · 1 year ago

Wonder how expensive these cards will be. The cost is getting out of hand and after all these years, I am not excited about new GPUs.

Jeep-Eep@alien.top · 1 year ago

Any chance the next RDNA will sport GDDR7?

ZaadKanon69@alien.top · 1 year ago

Maybe, but it would be unnecessary because RDNA4 caps out at midrange so GDDR6 would suffice. AMD decided to cut RDNA4 high-end to produce more AI chips and earn more money, and give their engineers more time to get multiple GPU chiplets on 1 card working for RDNA5, which is where the real performance boom is at. RDNA3 still has 1 graphics die, only the memory controller and cache is on separate chiplets.

So the 7900XTX will remain AMD’s flagship until RDNA5.

There might be some kind of refresh of the 7900XT(X) with slightly better performance and efficiency, maybe those would use GDDR7 if possible and economical.

The good news for current owners is the 7900 cards have plenty of VRAM to last until RDNA5. The bad news is there will be no competition for the 5080 and 5090 so expect even higher MSRPs than the 4000 series. $2500 MSRP for a 32GB 5090 wouldn’t surprise me. And $1500 for the 5080, the “gaming flagship”.

If you were waiting for next gen hoping value would improve vs the 4000 series… I hope you have even more patience.

The moment I heard the news about RDNA4 high-end being scrapped and the monster chiplet design was moved to RDNA5, as well as the high AI demand and lack of production capacity I pulled the trigger on a 7900XT because next gen is going to be absolutely bonkers on the Nvidia side and nothing better will be released on the AMD side other than software improvements, maybe a refresh of the 7900 cards but that’s it. This card with 20GB VRAM will last me until RDNA5/RTX6000.

Jayz2cents also made a video a while ago voicing his opinion to buy a GPU now cause it’s only gonna get worse in the coming years. A situation arguably worse than the crypto boom, combined with a lack of competition for 80 and 90 series and Nvidia’s Apple approach… Bad news.

Intel won’t have a truly viable product for general gaming within this timeframe either. Even today their drivers are lightyears behind both AMD and Nvidia with performance all over the place depending on each individual game. And Intel too is making AI chips based on GPUs. The consumer GPUs are like a proof of concept.

Jeep-Eep@alien.top · 1 year ago

It might be a way to improve perf either straight up or by cutting wattage and even with the scrapped big models - assuming that a possible AI crash doesn’t have those tapeouts pulled from storage - they likely have RDNA 4’s GDDR7 controller taped out if the line was to use it.

ZaadKanon69@alien.top · 1 year ago

It would also increase cost, and Navi43/44 don’t need that extra performance. They will probably be slower or at best the same speed as a 7800XT. On the flipside they’ll be dirt cheap too, probably with a good chunk of VRAM, so really good low-midrange cards that actually work in games unlike Intel.

Unless AMD is already going for multi-graphics chiplets and they just slap on four Navi43 chiplets to create a flagship. That would be pretty epic and is an option still on the table, they already have a proper functional chiplet design for their AI cards. But I think we won’t see that until RDNA5.

Wfing@alien.top · 1 year ago

The current lineup doesn’t even use GDDR6X and it’s selling like shit. There’s just no way.

imaginary_num6er@alien.top · 1 year ago

I thought Nvidia has an exclusivity deal with Micron to ban AMD from using GDDR6X?

2GisColorless@alien.top · 1 year ago

Nah, it’s just expensive + power hungry