@MohammedSF I can see the point you are trying to make, and in general you are right, but some of the assumptions and figures you used to get there are inaccurate or wrong in places. I'll try to explain it better.
MohammedSF wrote: "If the performance of a single GTX 980 chokes when going down to PCIE 2.0 4x from 2.0 8x, but not from 16x to 8x, then the external throughput of the card should be in the 8GB/s range, ok"
Well, your logic sounds right, but that's not quite the case. Yes, the GTX 980 chokes when going down from 8x to 4x, but that doesn't mean it needs the full 8GB/s. You forgot an important property of the PCIe bus: LATENCY. When you widen the link, the time to move a given chunk of data drops regardless of whether you are pushing the full 8GB/s or just 500MB/s; in other words, sending 500MB over an 8GB/s link is slower than sending it over a 16GB/s link. PCI Express has plenty of other quirks we're not going to get into here (even the size of the "packets" you send matters), so you can't just assume the card needs the 8GB/s range. It may only need the 1GB/s range, yet still lose performance on the narrower bus because of latency.
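To make the latency point concrete, here's a toy calculation (my own sketch, not a PCIe simulator; the ~0.5GB/s-per-lane figure is the nominal PCIe 2.0 per-direction rate after encoding, and protocol overhead is ignored): the time to move one fixed-size chunk grows as the link narrows, even when the average traffic is nowhere near the link's peak.

```python
# Toy model: serialization time of one payload over PCIe 2.0 links of
# different widths. Assumes ~0.5 GB/s usable per lane per direction
# (nominal PCIe 2.0 figure after 8b/10b encoding); ignores protocol
# overhead, so these are illustrative numbers only.

PER_LANE_GB_S = 0.5  # GB/s per PCIe 2.0 lane, one direction

def transfer_time_us(payload_gb, lanes):
    """Microseconds to push one payload across a link of `lanes` width."""
    link_gb_s = lanes * PER_LANE_GB_S
    return payload_gb / link_gb_s * 1e6

# A small 512 KB chunk of per-frame data: far below peak bandwidth,
# yet each transfer takes twice as long every time the link is halved.
payload_gb = 512 / 1024 / 1024  # 512 KB expressed in GB
for lanes in (16, 8, 4):
    print(f"x{lanes:>2}: {transfer_time_us(payload_gb, lanes):7.1f} us")
```

Halving the lane count doubles the per-transfer time even though the payload never needs the link's peak bandwidth on average, which is the latency effect described above.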
MohammedSF wrote: "Also, that means PCIE 2.0 16x bus is even enough for 2 GTX 980 cards in 8x/8x SLI (8GB/s for each card), but what happens if you go tri-SLI?
On a PCIE 2.0 8x/8x/4x configuration the throughput is (8GB/s + 8GB/s + 4GB/s = 20GB/s), which is impossible since PCIE 2.0's max bandwidth is 16GB/s. That means either cards #1 & #2 are sharing 12GB/s + card #3 @ 4GB/s, or card #1 runs @ 8GB/s + cards #2 & #3 are sharing 6+2GB/s, or something...
In both cases you have lost performance and card #3 is choking!"
Actually, PCIe's max bandwidth isn't 16GB/s. The CPU's 16 lanes together give 16GB/s, but there are more than 16 independent lanes in the system: the CPU has 16 lanes, yes, but the southbridge offers independent lanes too. So no, the cards won't be sharing bandwidth; rather, the one on only 4 lanes will suffer from bandwidth/latency issues.
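A quick back-of-the-envelope check of the lane math (my sketch; the ~1GB/s-per-lane figure follows the thread's convention of counting PCIe 2.0 both directions together, and the chipset slot layout is a made-up example since it varies by board):

```python
# Each slot's bandwidth depends only on the lanes wired to it. Slots on
# the CPU's 16 lanes and slots hanging off the chipset draw from
# independent lane pools, so an 8x/8x/4x setup never has to squeeze
# under a single 16 GB/s cap.

PER_LANE_GB_S = 1.0  # thread's PCIe 2.0 figure, both directions combined

def slot_bandwidth(lanes):
    return lanes * PER_LANE_GB_S

slots = [("card #1 (CPU, x8)", 8),
         ("card #2 (CPU, x8)", 8),
         ("card #3 (chipset, x4)", 4)]  # hypothetical board layout

for name, lanes in slots:
    print(f"{name}: {slot_bandwidth(lanes):.0f} GB/s")
```

Card #3 still only gets 4GB/s, which is the real bottleneck, but not because the other two cards are eating its share.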
MohammedSF wrote: "Somewhere on your board, your latest i7 4790k is connected to the external bus @ 5GT/s (37.25GB/s) while somewhere else the memory controller connects the external bus to dual-channel DDR3-1600MHz RAM @ 25GB/s.
Can you see the bottleneck? 37.25GB/s vs. 25GB/s?!
The performance loss isn't that huge, right? You know why? Because a single 4790k processor doesn't benefit from such a bus while 2 or more definitely do."
I don't know where you got that figure, but the i7 4790k is only connected to the chipset over DMI at 5GT/s (a terrible Intel marketing number), which translates to around 5GB/s at most, not to the ~32GB/s that 5GT/s works out to on a full PCI Express x16 link. Do you see how companies reuse terms that the "tech authorities" around the internet associate with something good? lol, smart Intel engineers.
What they mean is that the DMI bus transfers data at 5GT/s per lane, and DMI has 4 lanes (differential pairs), which brings the total to 20GT/s. Sounds great, 20GT/s is always better! Because of how this bus works, that's exactly 20Gbit/s raw. But the bus uses an encoding scheme (which we're not going to get into now) with roughly 20% overhead, which brings the number down to 16Gbit/s; divide by 8 and you get 2GB/s per direction, multiply by 2 for both directions and you get 4GB/s. Oh well, it's a 4GB/s bus hehe, and not the 37.25GB/s everyone on the internet talks about. Smart, Intel, but not on me =D
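The DMI arithmetic above can be written out step by step (same numbers as the paragraph; the 20% figure is the 8b/10b encoding overhead used by PCIe 2.0-era links, 8 data bits per 10 bits on the wire):

```python
# DMI 2.0 bandwidth, worked out from the marketing "5 GT/s" number.
gt_per_lane = 5.0        # GT/s per lane (the advertised figure)
lanes = 4                # DMI 2.0 is 4 lanes wide
efficiency = 0.8         # 8b/10b encoding: 8 data bits per 10 bits sent

raw_gbit = gt_per_lane * lanes       # 20 Gbit/s on the wire
data_gbit = raw_gbit * efficiency    # 16 Gbit/s of actual payload
one_way_gB = data_gbit / 8           # 2 GB/s per direction
both_ways_gB = one_way_gB * 2        # 4 GB/s counting both directions

print(f"{one_way_gB} GB/s each way, {both_ways_gB} GB/s total")
```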
Anyway, the CPU is connected to the PCI Express lanes via a ~3-4GHz QPI bus (Intel publishes no figures for non-Xeon Haswell CPUs, another bad move), which translates to somewhere around ~25-30GB/s, and to the memory with a 25GB/s bus. So what you wrote about the bottleneck makes no sense, since we are talking about two independent mediums. And the last sentence also isn't true, since the 4790k is more than capable of making use of this bus: it's the same bus that passes data around inside the CPU, not just memory traffic. Anyway, I hope it is clearer now. If you read the PCI Express technical papers you will notice a lot of marketing nonsense in how it works and in the numbers companies provide.
Edit: btw, whether you use single- or dual-channel RAM you are still on a 25.6GB/s bus, not 12.5GB/s like you said; the difference is in something else.
Edit 2: I just realized that QPI in Haswell is still 3.2GHz and hasn't been upgraded like on the server Haswell CPUs. The 25.6GB/s memory bus is simply the same as QPI, lol: Intel says the memory bus is 25.6GB/s, and that's exactly the bandwidth of a 3.2GHz QPI link, which dates back to Nehalem days.
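That coincidence is easy to verify with the standard published formulas (my sketch; both figures count the aggregate bandwidth the way Intel quotes it, i.e. both directions / both channels):

```python
# 3.2 GHz QPI: double-pumped clock -> 6.4 GT/s, 2 bytes per transfer in
# each direction, both directions counted (Intel's convention).
qpi_gb_s = 3.2 * 2 * 2 * 2        # GHz * pumping * bytes * directions

# Dual-channel DDR3-1600: 1600 MT/s, 8 bytes (64 bits) per transfer per
# channel, two channels.
ddr3_gb_s = 1.6 * 8 * 2           # GT/s * bytes * channels

print(qpi_gb_s, ddr3_gb_s)        # both come out to 25.6 GB/s
```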