High‐Efficiency Compute Server Clusters

A compute server is a catch‐all term covering the storage system, compute system, network system, and anything else that directly processes digital data. The cooling system of a data center, for example, would not be counted as part of the cost of doing compute work, but it would be included in the cost of running the data center.

Core components of a typical server comprise the motherboard (MB), CPU, memory, storage media, and PSU. Special use cases require additional components to streamline the compute process, such as GPUs, ASICs, and other add‐on cards. In today's IT equipment (ITE), most core components are standardized and optimized by individual vendors, except the MB and the daughter and add‐on cards.

The MB is one of the most critical components in a computer server. Anatomically, the MB is like a human body: the CPU, memory, and storage media are parts of the brain and nervous system, and the Ethernet port is the doorway used to communicate with others. The MB is a printed circuit board assembly (PCBA) and the foundation of a computer, connecting all the other components together. It distributes power and enables communication between components such as the CPU, RAM (random‐access memory), storage media, PSU, NIC (network interface card), and all other hardware. Unlike most desktop computers, an enterprise rack‐mount server MB does not follow a strict standard; hardware manufacturers use their creativity to design motherboards that fit customized rack‐mount chassis. Beyond the major components, many of the other I/O interfaces and function chips, such as PCIe (peripheral component interconnect express) slots, fan control pins, USB ports, and onboard sound and video chips, are modified or removed from the MB. MB design is driven by the server's use case and the limited space in the chassis, with the goal of maximizing efficiency.
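
To make the MB's connective role concrete, here is a minimal software‐level sketch that enumerates the components it ties together on the host it runs on. It relies on the third‐party psutil library (an assumption: it is not part of the standard library and must be installed separately), and the exact devices reported will vary from system to system.

    import psutil  # third-party; install with "pip install psutil"

    # CPU cores exposed through the MB's socket(s)
    print("CPU cores (physical/logical):",
          psutil.cpu_count(logical=False), "/", psutil.cpu_count())

    # RAM reachable over the MB's memory channels
    print("RAM total (GiB):", round(psutil.virtual_memory().total / 2**30, 1))

    # Storage media attached to the MB's storage controllers
    for part in psutil.disk_partitions():
        print("Disk:", part.device, "mounted at", part.mountpoint)

    # NICs wired to the MB, onboard or via PCIe
    for name in psutil.net_if_addrs():
        print("NIC:", name)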

While the CPU performs the actual work, memory is what feeds it. Memory subsystem technology has not kept pace with CPU performance gains and lags behind in advancing overall system performance. Nevertheless, newer memory products increase efficiency by improving performance per watt, much as CPUs do.
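
"Performance per watt" is straightforward to quantify. The sketch below compares two hypothetical DIMM generations; the bandwidth and power figures are invented for illustration, not vendor specifications.

    # Hypothetical DIMM figures, invented for illustration only
    dimms = {
        "older DIMM": {"bandwidth_gbs": 21.3, "power_w": 4.5},
        "newer DIMM": {"bandwidth_gbs": 38.4, "power_w": 5.0},
    }

    for name, d in dimms.items():
        print(f"{name}: {d['bandwidth_gbs'] / d['power_w']:.1f} GB/s per watt")

    # The newer part draws slightly more power but moves far more data,
    # so its performance per watt -- and hence its efficiency -- is higher.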

The NIC, also called the network interface controller, enables communication and data exchange with other computers.
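
As a minimal sketch of what the NIC enables, the snippet below exchanges a message over TCP. It runs against the loopback address for convenience; on a real network, the same traffic would leave and enter each machine through its NIC. The port number is an arbitrary placeholder.

    import socket

    # One process acting as both ends, for demonstration purposes
    server = socket.create_server(("127.0.0.1", 50007))   # placeholder port
    client = socket.create_connection(("127.0.0.1", 50007))
    conn, _addr = server.accept()

    client.sendall(b"hello over the network")
    print(conn.recv(1024))                                # b'hello over the network'

    client.close(); conn.close(); server.close()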

The PSU is the component that converts, regulates, and maintains the electricity delivered to a server from the data center's outlet and/or power bus line. While a PSU performs no data‐related functions, it is one of the most critical components of a computer system: voltage drop, ripple, and current flow all directly affect the stability, functionality, and life span of a server. A good PSU must be designed to minimize heat waste when converting AC (alternating current) to DC (direct current).
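
The heat waste of AC‐to‐DC conversion follows from simple arithmetic: whatever the PSU draws beyond the DC load is dissipated as heat. The efficiency figures below are merely examples spanning a mediocre unit and a high‐efficiency (80 PLUS Platinum‐class) one.

    def psu_waste_heat(dc_load_w: float, efficiency: float) -> float:
        """Watts dissipated as heat while converting AC to DC."""
        ac_input_w = dc_load_w / efficiency
        return ac_input_w - dc_load_w

    # Example: an 800 W DC load at two conversion efficiencies
    for eff in (0.94, 0.80):
        print(f"{eff:.0%} efficient: ~{psu_waste_heat(800, eff):.0f} W of waste heat")
    # ~51 W vs. 200 W -- heat the data center cooling must then remove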

The GPU is a specialized accelerator (or, as NVIDIA calls it, the “soul”) for applications such as gaming, data analytics, and AI. While a CPU with a core count approaching one hundred is considered a high‐end product, GPUs provide a much higher core count: the AMD EPYC 7702P has 64 cores, but the NVIDIA Quadro GV100 has 5,120 CUDA cores. Consequently, a GPU's power consumption is much higher than a CPU's, and it can be benchmarked with a speed tester such as GPU UserBenchmark.
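
A rough cores‐per‐watt comparison of the two parts named above is sketched below. The TDP figures (about 200 W for the EPYC 7702P and 250 W for the Quadro GV100) are approximate published values, and CPU cores and GPU CUDA cores are not directly comparable, so treat this purely as an illustration of scale.

    # Approximate published TDPs; CPU and GPU cores are very different,
    # so this only illustrates the scale difference.
    parts = {
        "AMD EPYC 7702P (CPU)":      {"cores": 64,   "tdp_w": 200},
        "NVIDIA Quadro GV100 (GPU)": {"cores": 5120, "tdp_w": 250},
    }

    for name, p in parts.items():
        print(f"{name}: {p['cores'] / p['tdp_w']:.2f} cores per watt")
    # ~0.32 vs. ~20.5 -- the GPU draws more power in absolute terms,
    # but packs far more (simpler) cores into each watt.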

An ASIC is an integrated circuit designed and customized for a particular purpose, such as data encryption/decryption or uniquely patterned data processing like high‐efficiency bitcoin mining. Today's data centers are moving to highly efficient SDDCs (software‐defined data centers) for two main reasons. First, a data center is not locked into one specific vendor. Second, ASICs can be used to accelerate tasks that would otherwise become system‐level bottlenecks. ASIC applications include server load balancing and security gateways; routing is another application that can benefit from ASIC performance. Compute servers combined with ASICs can optimize performance per watt, hitting a sweet spot of server efficiency.

GPU Computer Server

GPU‐based systems present extreme challenges for designers. Each GPU today has a TDP of up to 300 W, not counting the power needed by the CPU, so the total power consumption of a single 4U rack‐mounted server can reach approximately 3 kW. This means the GPUs generate considerable preheat for components placed downstream of them, so GPU placement must be creative. Additionally, the GPUs reside in PCIe slots that have to be in line with the airflow, because most enterprise GPU cards use passive cooling (a heat sink on the GPU). Spacing between the PCIe slots becomes a competing requirement, as insufficient space may cause system performance issues and different GPU cards have different heights.
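
Both the ~3 kW figure and the preheat problem can be sanity‐checked with simple arithmetic. The sketch below totals a hypothetical 8‐GPU configuration and estimates the temperature rise of the cooling air crossing the GPU zone from the standard Q = ṁ·cp·ΔT relation; the GPU count, CPU TDPs, and airflow are assumed values for illustration only.

    # Hypothetical 4U configuration; all values assumed for illustration
    gpu_count, gpu_tdp_w = 8, 300
    cpu_count, cpu_tdp_w = 2, 200
    other_w = 200                      # drives, memory, fans, conversion losses

    total_w = gpu_count * gpu_tdp_w + cpu_count * cpu_tdp_w + other_w
    print(f"System power: ~{total_w} W")           # ~3000 W, i.e. the ~3 kW cited

    # Preheat: temperature rise of air absorbing the GPU heat (Q = m_dot*cp*dT)
    airflow_cfm = 300.0                            # assumed airflow past the GPUs
    m_dot = 1.2 * airflow_cfm * 0.000471947        # kg/s; air ~1.2 kg/m^3, 1 CFM = 0.000471947 m^3/s
    dT = (gpu_count * gpu_tdp_w) / (m_dot * 1005.0)  # cp of air ~1005 J/(kg*K)
    print(f"Air temperature rise across the GPU zone: ~{dT:.0f} °C")
    # ~14 °C here: components downstream of the GPUs breathe noticeably hotter air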
