In November 2021, MSI interviewed Phison Chief Technology Officer Sebastien Jean to learn more about the future of SSDs for gaming and specifically about the work Phison is doing on its new M.2 PCIe Gen5 cards. The lively conversation covered everything from SSD controller design, performance improvements, and heat management to supply chain challenges and best use cases for Gen5, Gen6, and beyond.
We compiled some of the highlights in the following excerpt. Enjoy.
MSI: Let's start with the basics. What is a storage controller and what is its function inside an SSD?
Sebastien: SSDs are typically composed of three major parts: the controller, the cache (which is very fast memory), and the 3D NAND flash itself, which is where all the data is ultimately stored.
An SSD controller has several components to it. The first one is a processing element (Phison typically uses the ARM Cortex-R5). If you think about it, an SSD fundamentally is a pipeline processing unit. It's like an assembly line. When you receive a command, you first have to decode it, understand what it is, and then you start to make decisions. Each part of this assembly line looks at the command for a very short period of time: fractions of a microsecond. Depending upon the number of NAND dies that you have in the SSD, you could have anywhere from twenty to three or four hundred points of parallelism. When you get to really advanced enterprise SSDs, you need deep queues to keep them busy to reach their full one, two, or three million IOPS potential.
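The link between deep queues and IOPS mentioned above follows from Little's law (commands in flight = throughput x latency). The sketch below is illustrative only: the function name and the 100-microsecond latency are assumptions chosen for the example, not Phison figures.

```python
def required_queue_depth(target_iops: int, latency_us: int) -> int:
    """Little's law: average commands in flight = IOPS x latency.

    latency_us is an assumed average per-command latency in microseconds.
    Integer math keeps the result exact; the result is rounded up.
    """
    return (target_iops * latency_us + 999_999) // 1_000_000


# Hypothetical targets: 1M and 3M IOPS at an assumed 100 us average latency.
for iops in (1_000_000, 3_000_000):
    depth = required_queue_depth(iops, 100)
    print(iops, "IOPS needs about", depth, "commands in flight")
```

Under these assumed numbers, tripling the IOPS target triples the number of commands the host must keep outstanding, which is why high-end drives only reach their rated IOPS with deep queues.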
The next component of the controller is the ECC (error correction code) engine, which corrects errors on the NAND. NAND is affected by both heat and time, so bits can start to flip. Phison has developed a very strong ECC engine to correct all of these naturally occurring errors, which is a fundamental requirement in any SSD.
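To make the idea of correcting flipped bits concrete, here is a minimal sketch using a textbook Hamming(7,4) code, which repairs any single flipped bit in a 7-bit codeword. This is a teaching example only: the interview does not say which code Phison uses, and production SSD controllers rely on far stronger codes over much larger blocks. The function names are hypothetical.

```python
def encode(data):
    """Encode 4 data bits into a 7-bit Hamming(7,4) codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(codeword):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the bad bit, 0 if clean
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]]


stored = encode([1, 0, 1, 1])
stored[4] ^= 1  # simulate one bit flipping in the NAND
print(decode(stored))  # prints [1, 0, 1, 1]
```

The parity bits travel with the data, while the encode and decode logic lives elsewhere, which mirrors the split between what is stored on the flash and what the controller computes.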
MSI: So, the ECC is not on the flash memory itself?
Sebastien: The ECC bits that were encoded and attached to the data are on the flash memory, but the flash in and of itself does not have the ability to process them; that would make the flash too expensive. So, the ECC unit is built into the storage controller. The storage controller has SRAM, one or more CPU micro cores, and accelerators such as ECC, cryptography, and other things along those lines.
MSI: You mentioned using ARM R5 processors. If you upgraded to faster ones, would that help with performance?
Sebastien: Replacing the core with something substantially larger won't necessarily increase performance. Larger cores have larger silicon area, which makes the chip and the SSD as a whole more expensive. So, fundamentally, when you build an SSD, you want to look at it as a balancing act based on the workload or application it's serving. You can make an SSD with four very large cores, for example, or you can build one with many smaller cores. The advantage of having multiple processors in an SSD is that you're not running each process with its own sliver of time and doing a lot of scheduling. That may be fine for a CPU on a PC, but it's wasted overhead on an SSD. So, as I said earlier, we have multiple points of parallelism with 20 cores, each doing one little job, thus moving the work of servicing the command forward.
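The assembly-line argument can be put into rough numbers. The sketch below compares one core that runs every stage of a command in sequence with one small core per stage, where the steady-state rate is limited by the slowest stage. The stage names and timings are invented for illustration, and the model ignores hand-off cost between cores and assumes the pipeline is always kept full.

```python
def single_core_cmds_per_s(stage_ns):
    """One core runs every stage of a command back to back."""
    return 1_000_000_000 // sum(stage_ns)


def pipelined_cmds_per_s(stage_ns):
    """One small core per stage: the slowest stage sets the steady-state rate."""
    return 1_000_000_000 // max(stage_ns)


# Hypothetical per-stage service times in nanoseconds
# (decode, address lookup, NAND scheduling, completion).
stages = [200, 300, 250, 150]

print("single core:", single_core_cmds_per_s(stages), "commands/s")
print("pipelined:  ", pipelined_cmds_per_s(stages), "commands/s")
```

With these assumed timings, the pipelined arrangement sustains roughly three times the command rate without any core running faster, which is the kind of gain that comes from parallelism rather than from larger cores.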
MSI: We're talking today about one of your flagship models, the PCIe Gen4 going to 7 GB/s. Your controller is made on 12 nanometers. If you went to 10 or even 7 nanometers, can you get more throughput?
Sebastien: Yes and no. Phison makes SSD controllers for a specific bandwidth and IOPS target, so some of our medium-level controllers may be designed for 5 GB/s with more advanced ones designed for 7 GB/s. Going to a newer, faster process node is not necessarily the way to go. Again, it depends on the application and what we're trying to do. Fundamentally, it comes down to balance to deliver the best technology at a reasonable price.
But to answer your question, switching process nodes would do a couple of things for you. The chip would get smaller; however, we don't have a size constraint in this particular situation. It allows you to clock the internal frequencies a bit faster, but we don't need to go faster to meet the current performance targets. It does reduce your leakage. All silicon transistors leak a little bit of current, even when you're not toggling, so going to a smaller process node helps that considerably. The converse is that if you jump on the latest process node and you don't really need it, all you're doing is adding cost, which makes your SSD more expensive for no additional benefit. If you're making a CPU running at 2 GHz, which is often the case when using a small number of large, fast cores, then yes, you need the latest process node. With SSDs, advances aren't always about the latest process node. They're about smart design, such as tackling the problem through parallelism and slowing down your frequency. You can jump prematurely to the latest process node and not advance your technology.
MSI: Would M.2 SSD performance improve if you used DDR4 or DDR5 memory?
Sebastien: Phison makes both DRAM and DRAM-less SSDs, but the SSDs using DRAM try very hard not to use DDR memory. The fast path doesn't actually go through the DDR; it goes through the SRAM and then the data path. The DDR is used for logical-to-physical address translation, and the table bandwidth requirement is much smaller than the SSD bandwidth, so DDR4 or DDR5 doesn't make a big difference. For a long time, we were fine with DDR3. Our current models are using DDR4 and models we have in development are moving to DDR5. At a certain point, it makes sense to go with the dominant process node in order to keep costs down. If you stay too long on the old node and supply goes down, the prices go up.
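A back-of-the-envelope sketch shows why the translation table needs so little bandwidth. It assumes a 4 KiB mapping unit and 4-byte table entries, which are common textbook values and not figures given in the interview; the function names are hypothetical.

```python
MAP_UNIT_BYTES = 4096  # assumed mapping granularity (4 KiB)
ENTRY_BYTES = 4        # assumed size of one logical-to-physical entry


def l2p_table_bytes(capacity_bytes: int) -> int:
    """Size of a flat logical-to-physical table for a given drive capacity."""
    return capacity_bytes // MAP_UNIT_BYTES * ENTRY_BYTES


def l2p_lookup_bytes_per_s(data_bytes_per_s: int) -> int:
    """Table traffic needed to translate a sequential stream at the given rate."""
    return data_bytes_per_s // MAP_UNIT_BYTES * ENTRY_BYTES


print(l2p_table_bytes(2**40))                 # table size for a 1 TiB drive, in bytes
print(l2p_lookup_bytes_per_s(7_000_000_000))  # table traffic at 7 GB/s, in bytes/s
```

Under these assumptions a 1 TiB drive needs about 1 GiB of table, while streaming at 7 GB/s touches only a few megabytes of table entries per second, about a thousandth of the data rate. Random workloads and table updates add to that, but the gap remains large.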
MSI: The E16, your first PCIe Gen4 controller, was a big step forward. Then you introduced the E18. Was that the plan or was the E18 controller introduced to address technical challenges to really get those speeds out of the SSD?
Sebastien: That's an interesting question. Both the E16 and E18 were planned. What we were trying to do at the time was meet a time-to-market requirement. Gen4 was very much a question mark at the time. We're talking to the CPU vendors long before products come to market and they are always enabling new technologies that may or may not get turned on in their next products, even if it's potentially in the silicon. So, there were a lot of questions inside the industry whether or not Gen4 would be turned on. When we got positive confirmation that AMD (first to market with Gen4) had committed to Gen4, then we committed and brought the E16 to market to coincide with the platform release. Not only were we first to market with a Gen4 SSD, we were the only Gen4 SSD on the market in the client space for about 18 months. By the time our competitors launched with their first generation Gen4 products, we were releasing our second generation. So, yes, the E18 is saturating the interface in large part thanks to all the lessons we learned from the E16, which was our first ASIC in that space.
MSI: You're pretty much running into the boundaries of what Gen4 is capable of. Is there any room for improvement? For instance, are you working on areas such as IOPS performance?
Sebastien: You can always improve the IOPS, but that has a direct impact on ASIC cost. In the client space where the E18 sits, the current generation is where it needs to be. We could double IOPS but I'm not sure the platforms could take advantage of it. We're pretty much approaching the raw speed limit of what Gen4 can do, but there are other ways to optimize and maybe still get more out of it.
MSI: Does that mean there will be an E19?
Sebastien: There is an E19, but it is targeting the DRAM-less space, which is geared more toward value than performance. From a gamer perspective, I think what you're asking is whether there will be an E18 plus or super turbo. The short answer is no: Gen4 is saturated in terms of what is possible with the technology. But what we're going to continue to do with the E18 and Gen4 is percolate that technology down into other swim lanes. Because the E19 is targeting a lower cost point, it doesn't saturate the interface. So, the refresh will add a couple more GB/s and improve IOPS. In that space there is still room to grow with the current technology as we continue to move from 12 nanometers to our next process node, which is 7 nanometers.
MSI: Do you see DirectStorage accelerating the Gen4 market?
Sebastien: I think it accelerates computing in general, especially in the client space. Fundamentally, DirectStorage is a read-only interface that bypasses layers of NTFS. NTFS is the standard file system for Windows; it is focused on reliability and has been around for a very long time. It has many layers within it, and each layer adds a little bit of latency. What DirectStorage does is give you a direct pipeline to the SSD with as little processing as possible, so the OS adds as little latency as it can. Workloads that are read-only and need fast I/O, which are primarily games in the client space, will benefit from DirectStorage. That low-latency pipeline is essentially where compute is going for games, as well as for phones, tablets, and PCs.
MSI: How far in advance is Phison working on these new technologies?
Sebastien: The SSD itself takes between six and eighteen months depending on how much we're changing inside of it. But the technology that goes into enabling the next silicon process node usually takes two to three years before the ASIC actually becomes a product. So, we're well aware of Gen6 and we've started work on designing the low-level components that will enable a Gen6 SSD. But let me talk about why Gen5 is so interesting from a computational perspective.
DDR4 2133 is approximately 14 GB/s per channel and a gaming PC has anywhere from four to six channels. Now, with Gen5x4 we have a 14 GB/s SSD. It's true that there are faster DDR4s and DDR5 is now available on the next-generation board, but that's not really the point. It's that we can now say SSD storage and DRAM in the same breath, operating in the same space. An SSD will never replace DRAM because DRAM is optimized for 64-bit I/O; it's super low latency. But if you look at this from a caching perspective, the CPU has L1, L2, and L3 caches, while a Gen5 SSD can operate as an L4 cache at the same speed as DDR. So, now the natural granularity of the SSD aligns with the caching architecture of the CPU, and that's where things will start to change the way computers work in general.
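The comparison can be checked with simple peak-rate arithmetic. The sketch below uses a 64-bit (8-byte) DDR channel and the 128b/130b line encoding that PCIe uses from Gen3 through Gen5 at 32 GT/s per lane for Gen5. The function names are hypothetical, and both results are theoretical peaks before protocol and controller overhead, so sustained figures are lower.

```python
def ddr_channel_peak_bytes_per_s(mega_transfers_per_s: int, bus_bytes: int = 8) -> int:
    """Peak rate of one DDR channel: transfers per second x bus width in bytes."""
    return mega_transfers_per_s * 1_000_000 * bus_bytes


def pcie_peak_bytes_per_s(giga_transfers_per_s: int, lanes: int) -> float:
    """Peak PCIe payload rate after 128b/130b line encoding (Gen3 to Gen5)."""
    raw_bits_per_s = giga_transfers_per_s * 1e9 * lanes
    return raw_bits_per_s * (128 / 130) / 8


ddr4_2133 = ddr_channel_peak_bytes_per_s(2133)
gen5_x4 = pcie_peak_bytes_per_s(32, 4)

print(f"DDR4-2133, one channel: {ddr4_2133 / 1e9:.1f} GB/s peak")
print(f"PCIe Gen5 x4:           {gen5_x4 / 1e9:.1f} GB/s peak")
```

The two peaks land within about ten percent of each other, which supports the point that a Gen5 x4 link and a single DDR4 channel now operate in the same bandwidth range, even though their latencies remain very different.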
MSI: That's amazing. What do you see as the biggest challenge going forward with these new products?
Sebastien: As the speed continues to go up with each new generation, our challenge will continue to be managing the heat. But if you look at the bigger question of where PCs are going, there's an understanding in the client space that M.2 PCIe Gen5 is sort of hitting the limit of where it can go, and the actual interface or connector will become a bottleneck for future speeds. So new connectors are being developed, and they'll be available in the next few years. They will greatly increase both the signal integrity and the heat dissipation capability of the SSD through conduction to the motherboard.
MSI: What about supply chain issues? We're seeing shortages with graphics cards, especially, but with many other components and materials. How is that affecting Phison and the consistency of your products?
Sebastien: The various global shortages in the supply chain aren't really impacting Phison. An SSD has a handful of components that are particularly unique. There's the controller, possibly DRAM, NAND, and power management ICs (PMICs). Everything else is discrete (capacitors and resistors), and there is no shortage of those. For NAND and DRAM, Phison has long-term supply agreements and we pre-purchase them to always have an inventory. PMIC supply shortages actually are a problem for many manufacturers, but we make our own. When Phison develops a new controller for an SSD, we also design a PMIC at the same time and manufacture both.
We benefit from dealing in huge volumes. When Phison started about 20 years ago, we were this little company that popularized USB thumb drives. Today, we make every type of storage available, from medical, industrial, transportation, aerospace, gaming, workstations and enterprise solutions to embedded storage in phones, cars, and IoT. Because of that, we deal in really high volumes. So, as long as we continue forecasting well in advance and manage our inventory of critical components, we'll be just fine.
MSI: Will it always remain a balancing act or do you see the future of storage leaning toward an increased focus on capacity as opposed to speed and capacity?
Sebastien: Everything in the industry is moving toward higher density and higher speed. The client space is likely to remain balanced with premium products going up in both speed and density, sort of along the technology curve we've seen over the last few years. But Phison makes SSDs for all sorts of different specialized applications in a variety of markets. We have super high-density 16 terabyte QLC SSDs that are very popular in the high-performance computing market: people who do database analysis, AI/machine learning, genomics, and other big data scientific work. Then we have all-SLC SSDs that are really useful for people doing very write-intensive workloads or who require steady-state performance, so that burst and sustained performance are the same. In the consumer space, we will be focused on balance, skewed toward density and performance for gaming. For the value market and medium segment there will be tradeoffs between cost and what that particular workload needs, because you don't need crazy performance if all you're using is a browser.
MSI: For gaming it seems like most SSDs have settled on TLC memory with a cache of SLC. But if you want more capacity within those same physical die sizes, do you have to go to QLC?
Sebastien: Oh, no. We can make a TLC SSD in M.2 format up to 8 terabytes right now and 16 terabytes in U.2 format, but a 16 terabyte SSD is quite expensive.
MSI: Games continue to increase in size and you have a huge amount of HD textures and other data to download. Is it just a matter of time before we start to see those 8 and 16 terabyte TLC M.2s becoming more affordable?
Sebastien: Yeah, they will definitely become more affordable because the NAND vendors are always increasing the density of their die to bring the cost down. So, TLC will continue to have a life. But QLC is becoming very interesting in non-gaming applications. You can do gaming on a QLC SSD because the flow is primarily read and QLC is good at read. It's just not so great at write, which is considerably slower. But I would say that the densities will continue to go up. The prices will continue to go down. And the sweet spot for cost vs. density is moving up: the previous generation was one terabyte, the next generation will be two terabytes, and if you go forward three or four generations, it'll be four terabytes. So, for any application except gaming, you will probably see more QLC because what most people do with their drives is read data. You write the Windows OS once and you boot from it a gazillion times. Sure, there are patches and they go in, but not at that high of a rate.