-
Introduction
-
Today's battlefield is shifting rapidly, and winning requires a thorough understanding of tactics. Combat tactics combine art and technology. This is especially true of high-performance military applications, which rely on edge computing equipment (GPU servers) with high-bandwidth data ingest components tied to intrusion detection and assessment cameras, radar, and LIDAR sensors; high-capacity, low-latency storage subsystems; and high-performance compute engines that can perform AI machine learning and inference tasks. Battlefield commanders also need quick access to complex, real-time data and the ability to share it, and they in turn push selected information all the way down to the front-line warfighter. Efficient and accurate performance is therefore essential.
In this way, IT operators are no longer burdened by numerous interfaces, monitors, and bulky servers. The various complex systems connected to one another should work in coordination, interpreting the received data and relaying the commander's orders. All data and assets can be managed from one location.
This approach also supports high hardware utilization, room for expansion, and efficient operation. On the battlefield, where a one-second difference can separate success from failure, the GPU server has become indispensable.
The AV1000 rugged liquid-cooled GPU server is built around 3rd Gen Intel® Xeon® Scalable processors. In a scenario where virtual machines are integrated with a C4ISR system, the 40-core processor allows CPU resources to be reallocated to one or more virtual machines. The AV1000 is also equipped with high-capacity DDR4 memory and leading-edge liquid-cooled thermal management. It combines impressive compute power, data-handling capabilities, and storage capacity (48TB NVMe in 2U) in a rugged solution that can withstand harsh conditions and demanding environments, including potholes, collisions, shock and vibration, and extreme temperatures that cause traditional systems to fail.
-
Reasons to Adopt Rugged Liquid Cooling: The New Era of HPC Thermal Solutions
-
In the past decade, the demands of cloud, IoT, AI, and edge applications have once again driven changes in IT technology. To keep up with the high-speed transmission of complex data, high-performance, high-power GPUs and CPUs are being required to increase energy efficiency and consolidate operations at the same time, and as a result the total power consumption of CPUs and GPUs is increasing dramatically. Reducing the energy consumption of the cooling system has therefore become an important area of study.
AI applications need a balance between server performance requirements and the mitigation of energy loss. The Green Computing revolution is well under way, and using less energy is key to making the world a better place.
Take one of the most popular GPU server configurations on the market as an example: a GPU server powered by dual Ice Lake Xeon SP processors and integrated with four NVIDIA A100 GPUs can have a TDP of up to 2kW. It needs an efficient thermal solution to dissipate the generated heat and avoid CPU and GPU throttling.
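As a rough illustration of how a figure near 2kW can arise, the sketch below simply adds up per-component power. The wattages and the function name are illustrative assumptions chosen for the arithmetic, not datasheet values for any particular CPU, GPU, or server.

```python
# Illustrative component power figures in watts. These are assumptions made
# for the sake of the arithmetic, not datasheet values for a specific product.
CPU_TDP_W = 270      # assumed per-socket CPU TDP
GPU_TDP_W = 300      # assumed per-card GPU TDP
OTHER_LOAD_W = 200   # assumed memory, storage, NIC, and board overhead


def server_heat_load_w(cpus: int, gpus: int) -> int:
    """Approximate heat load in watts for the given CPU and GPU counts."""
    return cpus * CPU_TDP_W + gpus * GPU_TDP_W + OTHER_LOAD_W


if __name__ == "__main__":
    total = server_heat_load_w(cpus=2, gpus=4)
    print(f"Estimated heat load: {total} W ({total / 1000:.2f} kW)")
```

Nearly all of the electrical power drawn by these components ends up as heat, which is why the thermal solution has to be sized for roughly the same figure.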
- Why Is a Liquid Cooling Solution Important?
-
To reduce latency and improve performance, CPUs, GPUs, and other on-board components are moving closer together, which increases physical density and temperature within the server. In addition, data center rack power densities continue to increase, generating a tremendous amount of heat. This heat must be removed to protect the equipment, so the benefits of liquid cooling are being explored as an important and necessary solution in data centers.
PUE (Power Usage Effectiveness) is a ratio that describes how efficiently a data center uses energy: the total energy consumed by the facility divided by the energy consumed by the IT equipment. It is often used as an indicator of the power consumed by the cooling system, and the ideal PUE is 1.0. Looking at the overall power consumption of a data center, the cooling system is the biggest energy consumer after the IT systems themselves, which is why it has such a big impact on PUE. Power usage is also one of the major expenses in data centers. A traditional air cooling system needs more energy to drive high-RPM motors, and its PUE can be as high as 1.6. With conduction liquid cooling, the PUE can be lowered to 1.1. Liquid cooling consumes less energy and dissipates heat more efficiently, which allows servers to operate at higher ambient temperatures.
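To make the PUE comparison concrete, the following minimal sketch computes the non-IT overhead implied by the two PUE values quoted above. The 100kW IT load and the function names are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """PUE = total facility power / IT equipment power."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw


def overhead_kw(it_load_kw: float, pue_value: float) -> float:
    """Power spent on everything other than IT equipment (mostly cooling)."""
    return it_load_kw * (pue_value - 1.0)


if __name__ == "__main__":
    it_load_kw = 100.0  # hypothetical IT load, for illustration only
    air = overhead_kw(it_load_kw, 1.6)      # air cooling PUE from the text
    liquid = overhead_kw(it_load_kw, 1.1)   # conduction liquid cooling PUE
    print(f"Air cooling overhead:    {air:.1f} kW")
    print(f"Liquid cooling overhead: {liquid:.1f} kW")
    print(f"Overhead saved:          {air - liquid:.1f} kW")
```

Under these assumptions, moving from a PUE of 1.6 to 1.1 cuts the overhead for the same IT load from about 60kW to about 10kW.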
-
Liquid Cooling Solutions Overview
-
1. Conduction
With a conduction cooling system, heat is captured at the interface between the copper tubing and the CPU and GPU. It is then carried through the copper tubing to the solid conduction plate mounted on the side of the server, where it is dissipated.
2. Direct to Chip (D2C)
A direct-to-chip system cools processors directly: liquid coolant is brought via tubes directly to the chip. Heat sinks are mounted directly onto the processors and GPUs, and they are connected to small tubing that carries liquid to and from each component.
3. Immersion
In an immersion system, the rack hardware is submerged in a tank of non-conductive, nonflammable dielectric fluid. Heat is transferred from the components into the fluid by convection as the fluid flows through the rack, without any additional active cooling systems or parts required.
- What is C.L.C.P?
-
Liquid cooling can be a good solution for dissipating heat. However, most liquid cooling solutions use a closed-loop design: Direct to Chip (D2C) with the pump and cold plate integrated in the system. Some people worry that a liquid cooling solution could be a disaster because of the potential risk of liquid leakage. Once leakage is prevented, however, liquid cooling becomes an excellent thermal solution.
To effectively reduce overall temperature and prevent liquid leakage, 7Starlake developed a unique heat exchanger that integrates a Conduction Liquid Cold Plate (C.L.C.P.) built with advanced "gun-drilled" construction, with 10 pipes (each pipe 5mm x 5mm x π x 500mm) to dissipate up to 4kW of heat from the computing system. A gun-drilled liquid cold plate is manufactured by drilling a series of deep holes into an aluminum plate to form the liquid flow path.
Gun-drilled cold plates are adaptable to any temperature range, satisfy flatness requirements, and avoid the risk of leakage.
The C.L.C.P. includes multi-channel cold liquid inlets and outlets, and the number of inlets and outlets can be adjusted flexibly on request. As coolant flows through the top sink, the liquid absorbs heat and carries it quickly away from the heat sources to the heat exchanger. By leveraging the strengths of both liquid cooling and air cooling, these features achieve higher rack density and efficiency, a comprehensive reduction in power use, and greater overclocking potential.
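The amount of coolant needed to carry away a given heat load follows from the steady-state heat balance Q = ṁ · c_p · ΔT. The sketch below applies it to the 4kW figure above. The coolant properties (plain water) and the 10 K temperature rise are assumptions for illustration only; they are not specifications of the C.L.C.P. or of the coolant it actually uses.

```python
# Assumed coolant properties (plain water near room temperature).
WATER_CP_J_PER_KG_K = 4186.0     # specific heat capacity
WATER_DENSITY_KG_PER_L = 1.0     # approximate density


def required_flow_l_per_min(heat_w: float, delta_t_k: float) -> float:
    """Coolant flow needed to absorb heat_w with a delta_t_k temperature rise.

    Uses the steady-state heat balance Q = m_dot * c_p * delta_T.
    """
    if delta_t_k <= 0:
        raise ValueError("temperature rise must be positive")
    mass_flow_kg_per_s = heat_w / (WATER_CP_J_PER_KG_K * delta_t_k)
    return mass_flow_kg_per_s / WATER_DENSITY_KG_PER_L * 60.0


if __name__ == "__main__":
    # 4 kW heat load from the text; the 10 K coolant rise is an assumption.
    flow = required_flow_l_per_min(heat_w=4000.0, delta_t_k=10.0)
    print(f"Required flow: {flow:.2f} L/min")
```

Halving the allowed temperature rise doubles the required flow, which is the basic trade-off between pump effort and coolant outlet temperature.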
- What Are the Advantages of C.L.C.P.?
-
1. Higher Level of Efficiency
The heatsink is mounted directly on the CPU or GPU and conducts heat away by direct contact. The heat is then transferred to the liquid cold plate, which transfers heat more effectively because of its high thermal conductivity and distributes it efficiently over the convection surface.
2. Longer Lifespan
If a computer becomes too hot, the heat can damage the hardware inside and shorten its lifespan, leading to irreparable damage and potential data loss. Conduction and liquid cooling keep the computer operating at a consistently cool temperature.
3. Cools High-Performance GPUs
These days, a high-end GPU can generate two or three times as much heat as a CPU. High-performance GPUs such as the RTX 6000 need an outstanding thermal solution. C.L.C.P. offers an efficient means of cooling that combats GPU throttling and maintains peak performance.
4. Vertical Rackmount with Heat Exchanger to Save Space
The HE4K is a heat exchanger designed to dissipate 4kW of heat, and it can be combined with the C.L.C.P. on the AV1000. Integrating the C.L.C.P. thermal solution of the AV1000 with the air cooling of the HE4K provides an excellent thermal solution for dissipating heat. Mounting the units vertically in the rack saves space and makes the equipment easier to organize.
-
AV1000 Breakthrough Performance in New AI Inference
-
1. Outstanding Performance with Powerful 3rd Gen Intel® Xeon® Scalable Processors
7Starlake's AV1000 AI Inference Rugged Server features 3rd Gen Intel® Xeon® Scalable processors. The main highlights of the Ice Lake-SP processors are support for PCIe Gen 4 and 8-channel DDR4 memory. They offer a balanced architecture with built-in acceleration and advanced security capabilities, designed through decades of innovation for the most in-demand workload requirements. The processors offer up to 40 cores and a range of frequency and power options, and deliver up to 40% better performance than the previous generation.
2. NVIDIA Quadro RTX 6000
The AV1000 supports 2 x NVIDIA Quadro RTX GPUs, which are built on the NVIDIA Turing architecture and the NVIDIA RTX platform. To meet the growing needs of large AI workloads, the passively cooled NVIDIA Quadro RTX 6000 graphics board features RT Cores and 576 Tensor Cores for real-time ray tracing, AI, and advanced graphics capabilities, which help improve the speed of machine learning applications. It can tackle graphics-intensive mixed workloads, complex design, photorealistic renders, and augmented and virtual environments at the edge. NVIDIA Quadro RTX is designed for enterprise data centers to deliver the highest possible application performance on a single server.
3. Support for 48TB NVMe
NVMe flash offers great benefits for specific AI use cases such as training a machine learning model. The datasets used to train a model can be huge, and paging out old training data and bringing in new data should be done as fast as possible to keep the GPUs from sitting idle.
NVMe is a low-latency protocol that delivers high performance and speed. The AV1000 is equipped with 1 x PCIe Gen 4.0 x4 and 1 x PCIe Gen 4.0 x8 to support 2 x NVMe SSDs with up to 48TB, which is expected to satisfy, to a large extent, the requirements of high-speed servers, gaming, graphics, and data centers.
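For a sense of scale, the sketch below estimates the theoretical per-direction bandwidth of the two PCIe Gen 4.0 links mentioned above. It uses the PCIe 4.0 signaling rate of 16 GT/s per lane with 128b/130b encoding. These are upper bounds before protocol overhead, not measured AV1000 or SSD throughput, and the 1TB dataset size and function names are hypothetical.

```python
PCIE4_GT_PER_S = 16e9            # PCIe 4.0 raw transfers per second, per lane
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding


def pcie4_bandwidth_bytes_per_s(lanes: int) -> float:
    """Theoretical one-direction bandwidth of a PCIe 4.0 link in bytes/s."""
    if lanes <= 0:
        raise ValueError("lane count must be positive")
    return lanes * PCIE4_GT_PER_S * ENCODING_EFFICIENCY / 8


def transfer_time_s(size_bytes: float, lanes: int) -> float:
    """Best-case time to move size_bytes over the link at line rate."""
    return size_bytes / pcie4_bandwidth_bytes_per_s(lanes)


if __name__ == "__main__":
    for lanes in (4, 8):
        gb_per_s = pcie4_bandwidth_bytes_per_s(lanes) / 1e9
        print(f"PCIe 4.0 x{lanes}: about {gb_per_s:.2f} GB/s per direction")
    # Hypothetical 1 TB training shard streamed over the x4 link.
    print(f"1 TB over x4: about {transfer_time_s(1e12, 4):.0f} s at line rate")
```

Real-world throughput will be lower because of protocol overhead and the limits of the SSD itself, but the calculation shows why link width matters when the goal is to keep GPUs fed with training data.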