Entropy Control Architectures for Next-Generation Supercomputers
Abstract
Progress in high-performance computing (HPC) fundamentally depends on effective thermal dissipation. At present this challenge is framed in terms of two distinct architectural components: a computational substrate and a heat-exchange substrate. These two components are typically regarded as performing conceptually different functions that are independent except for the optimization constraints imposed when they are implemented together. Next-generation computing architectures, by contrast, must be designed from the perspective of computing as a thermodynamic process that can be managed so as to actively control when and where heat is generated. The fundamental connection between computation and thermodynamics is not new. It dates back to Landauer’s principle of 1961 [1], which holds that a computational process is thermodynamically neutral unless information is destroyed. More specifically, what is thought of as the erasure of a bit is actually a process by which information is irreversibly converted to heat. According to Landauer’s principle, a fully reversible computational process will not produce heat, because no information is destroyed (if information were destroyed, then by definition the process could not be reversed). Reversible logic gates have been developed and studied [2, 3], particularly in recent years in the context of quantum computing, and are designed to physically retain information that would otherwise be lost. In other words, they store the ancillary information needed for a conventional logic operation to be undone. The limitation of reversible computing is, of course, that the amount of stored information tends to grow without bound during the execution of any nontrivial algorithm. What is important to notice is that this ancillary information can either be destroyed (i.e., converted to heat) or be retained, at the discretion of the system.
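As a minimal illustration (a sketch, not the paper's own construction), the Python below contrasts a conventional AND, which maps four possible input states onto two outputs and therefore discards information, with a reversible AND built from a Toffoli gate, one standard reversible gate, which keeps its inputs as ancillary bits. It also evaluates the well-known Landauer bound k_B T ln 2, the minimum heat released per erased bit. All function names are illustrative, and the chained AND is a deliberately naive scheme chosen to show how retained bits accumulate (two per gate here).

```python
import math
from typing import List, Tuple

# Boltzmann constant in J/K (exact by the 2019 SI definition).
K_B = 1.380649e-23


def landauer_bound(temperature_k: float) -> float:
    """Minimum heat in joules dissipated when one bit is erased at temperature_k."""
    return K_B * temperature_k * math.log(2)


def irreversible_and(a: int, b: int) -> int:
    """Conventional AND: 4 possible inputs collapse onto 2 outputs, so inputs are lost."""
    return a & b


def toffoli(a: int, b: int, c: int) -> Tuple[int, int, int]:
    """Toffoli (controlled-controlled-NOT) gate: flips c iff a and b are both 1.

    It is a bijection on 3-bit states and is its own inverse.
    """
    return a, b, c ^ (a & b)


def reversible_and(a: int, b: int) -> Tuple[int, int, int]:
    """AND computed reversibly: the 0-initialised target ends up holding a AND b,
    while a and b are kept as the ancillary bits needed to undo the operation."""
    return toffoli(a, b, 0)


def reversible_and_many(bits: List[int]) -> Tuple[int, List[int]]:
    """AND of several bits via chained Toffoli gates (a naive scheme).

    Returns the result and the bits each gate had to retain; the retained
    list grows by two bits per gate instead of being erased as heat.
    """
    acc = bits[0]
    retained: List[int] = []
    for b in bits[1:]:
        kept_acc, kept_b, acc = toffoli(acc, b, 0)
        retained.extend([kept_acc, kept_b])
    return acc, retained


if __name__ == "__main__":
    pairs = [(a, b) for a in (0, 1) for b in (0, 1)]
    print(len({irreversible_and(a, b) for a, b in pairs}))  # 2 distinct outputs
    states = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
    print(all(toffoli(*toffoli(*s)) == s for s in states))  # True: self-inverse
    print(reversible_and_many([1, 1, 0, 1]))  # (0, [1, 1, 1, 0, 0, 1])
    print(f"{landauer_bound(300.0):.3e} J per erased bit at 300 K")  # 2.871e-21 J
```

Erasing every retained bit would release at least len(retained) × k_B T ln 2 of heat; keeping them avoids that heat but consumes storage, which is exactly the trade-off the text describes.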
Beyond the simple choice of whether or not to erase, the ancillary information retained to allow reversibility can, at the discretion of the system, be transported like any other information from one physical memory location to another. In principle, therefore, information can be scheduled for erasure and then transported freely to physical locations where it is converted to heat (erased) on arrival, with minimal impact on the ongoing computation. To summarize, conventional computing hardware generates heat as a continuous by-product of logic-gate operation during program execution. The precise distribution of heat is therefore a function of the particular algorithm and is not generally knowable a priori; consequently, the heat-dissipation substrate must be designed under the assumption that pernicious heat accumulation (spikes) can occur anywhere across the physical area in which logic operations may be performed. This need to maintain separate computation and heat-sink functionality uniformly within the same physical space is why heat dissipation continues to limit the density and/or frequency of logic operations. An entropy-controlled computing architecture (ECCA) would allow the time and location at which heat is generated to be actively controlled during the computational process, rather than letting heat be generated indiscriminately at gate locations in the expectation that a separate physical system will absorb it. In the ECCA model, the ancillary bits conventionally thought of as being stored as part of the operation of each reversible logic gate are instead treated as bits of unneeded information that must be carried away, in a manner somewhat analogous to garbage collection. More specifically, an unneeded bit could be sent without erasure to a nearby unused memory location, or it could be sent to a location with available capacity to absorb the heat resulting from its erasure.
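The routing choice described above can be sketched as follows. This is a hypothetical model, not an ECCA design from the paper: `ErasureSite`, `WasteBitRouter`, and the per-site `heat_budget_bits` capacity are assumptions introduced for illustration, and heat is counted abstractly in erased bits rather than joules. The preference order is also an assumed policy: each waste bit goes to the erasure site with the most remaining capacity; if no site has capacity, it is parked without erasure in a free nearby slot; if neither is possible, the router reports that computation would have to stall.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ErasureSite:
    """Hypothetical location able to absorb the heat of erasing bits.

    heat_budget_bits is an assumed capacity: how many bits may be erased here
    before local heat accumulation becomes a concern.
    """
    name: str
    heat_budget_bits: int
    erased: int = 0

    @property
    def remaining(self) -> int:
        return self.heat_budget_bits - self.erased


@dataclass
class WasteBitRouter:
    """Hypothetical ECCA-style handler for unneeded (waste) bits."""
    free_slots: int            # nearby unused memory slots for parking bits unerased
    sites: List[ErasureSite]   # candidate erasure locations
    parked: int = 0

    def route(self) -> Optional[str]:
        """Dispose of one waste bit and report where it went.

        Prefers erasure at the site with the most remaining heat capacity,
        falls back to parking the bit without erasure, and returns None when
        neither is possible (the computation would have to stall).
        """
        site = max(self.sites, key=lambda s: s.remaining, default=None)
        if site is not None and site.remaining > 0:
            site.erased += 1
            return site.name
        if self.parked < self.free_slots:
            self.parked += 1
            return "parked"
        return None


if __name__ == "__main__":
    router = WasteBitRouter(free_slots=1, sites=[ErasureSite("A", 2), ErasureSite("B", 1)])
    print([router.route() for _ in range(5)])  # ['A', 'A', 'B', 'parked', None]
```

Choosing the site with the most remaining capacity is one simple way to spread erasure heat across locations; a realistic scheduler would also weigh transport distance and cost, which this sketch ignores.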
Continuing the analogy to garbage collection in memory management, ECCA processing of a given program could proceed without heat generation until storage for waste bits is exhausted, at which point the computation could be halted while the bits are erased in batch and the resulting heat is dissipated. Alternatively, waste bits could be incrementally transported for erasure at locations selected so that heat is generated uniformly across a physical area and can dissipate without localized accumulation. Batch-mode ECCA is potentially attractive for space-based applications in which heat distribution is easier to monitor than to control and the objective is to maximize the average rate of computation. For