AMD launched a new artificial-intelligence chip named MI325X

Advanced Micro Devices announced on Thursday that it plans to begin mass production of a new version of its artificial intelligence chip, the MI325X, in the fourth quarter, aiming to strengthen its position in a market dominated by Nvidia.

During an event in San Francisco, AMD CEO Lisa Su said the company plans to launch its next-generation MI350 series chips in the second half of 2025. These chips will have greater memory capacity and a new underlying architecture that AMD says will significantly boost performance over the earlier MI300X and MI250X chips.

Because AMD had disclosed most of these plans earlier this year, the announcements were largely expected and did little to inspire investors: AMD shares fell nearly 5% in afternoon trading. Some analysts attributed the decline to the lack of significant new cloud-computing customers for the chips.

Shares of competitor Nvidia rose by 1.5%, whereas Intel’s shares decreased by 1.6%.

Demand for AI processors from major tech companies such as Microsoft and Meta Platforms has far outstripped the supply Nvidia and AMD can provide, allowing both chipmakers to sell everything they can manufacture.

This surge in demand has driven a substantial rally in chip stocks over the past two years; AMD’s shares have risen about 30% since their most recent low in early August.

“There have not yet been any newly announced customers,” noted Summit Insights research analyst Kinngai Chan, who added that the stock had already increased in anticipation of “something new” before the event.

Separately, Santa Clara, California-based AMD said vendors such as Super Micro Computer would begin delivering its MI325X AI chip to customers in the first quarter of 2025. The design is aimed at competing with Nvidia’s Blackwell architecture.

The MI325X uses the same architecture as the MI300X, which AMD launched last year, but adds a new type of memory that AMD says will speed up AI processing.

AMD’s upcoming AI chips are expected to put further pressure on Intel, which has struggled to maintain a consistent AI chip strategy. Intel expects AI chip sales of more than $500 million in 2024.

NEW SERVER, PC CHIPS

At the event, Su also said AMD currently has no plans to use contract chip manufacturers other than Taiwan’s TSMC for the advanced manufacturing processes that are essential for building high-speed AI chips.

“We are eager to utilize more manufacturing capacity outside of Taiwan. We are actively utilizing TSMC’s facility in Arizona,” Su remarked.

AMD also introduced several networking chips designed to enhance data transfer between chips and systems in data centers.

AMD also launched an updated version of its server central processing unit (CPU) design. The chip family, previously codenamed Turin, includes a variant designed specifically to keep graphics processing units (GPUs) efficiently supplied with data, which speeds up AI processing.

The top-of-the-line chip has nearly 200 processing cores and is priced at $14,813. The entire line uses the Zen 5 architecture, which AMD says delivers speed improvements of up to 37% for advanced AI data processing.

Additionally, AMD unveiled three new PC chips developed for laptops, based on the Zen 5 architecture. These new chips are optimized for running AI applications and will support Microsoft’s Copilot+ software.

In July, AMD revised its AI chip revenue forecast for the year to $4.5 billion, up from the previous estimate of $4 billion. The demand for its MI300X chips has surged due to the excitement surrounding the development and implementation of generative AI technologies.

Analysts are projecting AMD’s data center revenue for this year to reach $12.83 billion, according to LSEG estimates. Meanwhile, Wall Street expects Nvidia’s data center revenue to hit $110.36 billion. Data center revenue serves as a proxy for the AI chips required to create and run AI applications.

Rising analyst earnings estimates have kept AMD’s and Nvidia’s valuations in check despite the rise in their share prices. Both companies trade at more than 33 times their estimated 12-month forward earnings, compared with 22.3 times for the benchmark S&P 500.

The Instinct MI325X, as the chip is known, is set to begin production by the end of 2024, according to Advanced Micro Devices, which announced the new product on Thursday. If developers and cloud companies view AMD’s AI chips as a close alternative to Nvidia’s offerings, it may put pressure on Nvidia’s pricing, which has maintained approximately 75% gross margins amid high demand for its GPUs over the past year.

Advanced generative AI technologies such as OpenAI’s ChatGPT require enormous data centers packed with GPUs to do the necessary processing, creating demand for more companies to supply AI chips.

In recent years, Nvidia has dominated the data center GPU market, with AMD typically in second place. Now AMD is trying to take share from its Silicon Valley rival, or at least capture a significant slice of a market it estimates will be worth $500 billion by 2028.

“AI demand has significantly increased and has actually surpassed expectations. It’s evident that investment rates continue to rise everywhere,” AMD CEO Lisa Su stated during the event.

At the event, AMD did not disclose any new major cloud or internet clients for its Instinct GPUs, though the company has previously mentioned that Meta and Microsoft purchase its AI GPUs and that OpenAI utilizes them for some applications. The company also withheld pricing details for the Instinct MI325X, which is usually sold as part of a complete server system.

With the launch of the MI325X, AMD is speeding up its product release schedule to introduce new chips on an annual basis to better compete with Nvidia and capitalize on the AI chip surge. This new AI chip serves as the successor to the MI300X, which began shipping late last year. AMD indicated that its chip for 2025 will be named MI350, and its chip for 2026 will be called MI400.

The introduction of the MI325X will place it in competition with Nvidia’s forthcoming Blackwell chips, which Nvidia has announced will start shipping in substantial quantities early next year.

A successful debut for AMD’s latest data center GPU could attract investors looking for additional companies poised to benefit from the AI surge. So far in 2024, AMD’s stock has risen by only 20%, while Nvidia’s has surged over 175%. Most industry forecasts suggest that Nvidia commands more than 90% of the data center AI chip market.

AMD’s main obstacle to gaining market share is that its rival’s chips are programmed with Nvidia’s own proprietary language, CUDA, which has become the standard for AI developers. This effectively locks developers into Nvidia’s ecosystem.

In response, AMD said this week that it has been improving its competing software, ROCm, so that AI developers can more easily move their AI models to AMD’s chips, which the company calls accelerators.

AMD has positioned its AI accelerators as particularly well suited to cases where AI models generate content or make predictions (inference), rather than cases where a model processes large amounts of data to improve itself (training). This is partly due to the advanced memory AMD uses on its chip, which, according to the company, lets it serve Meta’s Llama AI model more efficiently than some Nvidia chips.

“What you see is that the MI325 platform delivers up to 40% greater inference performance than the H200 on Llama 3.1,” said Su while referring to Meta’s large language AI model.

Also competing with Intel

While AI accelerators and GPUs have become the most scrutinized segment of the semiconductor sector, AMD’s primary business has revolved around central processors, or CPUs, which are fundamental to nearly every server globally.

AMD’s data center revenue in the June quarter more than doubled year-over-year to $2.8 billion, with AI chips representing only about $1 billion of that total, the company reported in July.

AMD accounts for about 34% of all spending on data center CPUs, according to the company. That is still less than Intel, which remains the market leader with its Xeon chip series. AMD aims to change that with its newly introduced line of CPUs, the 5th Gen EPYC, which was also unveiled on Thursday.

These chips come in various configurations, from an economical and energy-efficient 8-core chip priced at $527 to high-end 192-core, 500-watt processors intended for supercomputers costing $14,813 each.

The new CPUs are particularly effective at feeding data into AI workloads, according to AMD. Almost all GPUs need a CPU in the same system to boot the computer.

“Today’s AI is largely reliant on CPU capabilities, which is evident in data analytics and various similar applications,” Su stated.

With its latest chip, AMD aims to narrow the performance gap with Nvidia in AI processors. The Santa Clara company also outlined plans for its forthcoming MI350 chip, which is expected to ship in the second half of 2025 and is designed to compete directly with Nvidia’s new Blackwell system.

In a discussion with the Financial Times, AMD CEO Lisa Su articulated her goal for AMD to establish itself as the “end-to-end” leader in AI within the next ten years. “This is just the start of the AI race, not the conclusion,” she stated to the publication.

According to AMD’s website, the new MI325X accelerator has 153 billion transistors and is built on the CDNA3 GPU architecture using TSMC’s 5 nm and 6 nm FinFET process nodes. The chip has 19,456 stream processors and 1,216 matrix cores spread across 304 compute units. With a peak engine clock of 2,100 MHz, the MI325X reaches a peak of 2.61 PFLOPS in eight-bit precision (FP8) operations and 1.3 PFLOPS in half-precision (FP16) operations.
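The published figures hang together arithmetically, which a short Python sketch can show. It divides the chip-wide counts by the 304 compute units and backs out how many floating-point operations each compute unit would need to complete per clock to reach the quoted peaks. The constants are the article’s rounded numbers; the per-compute-unit breakdown is our own derivation, not an AMD specification.

```python
# MI325X figures as quoted in the article (rounded values from AMD's website).
COMPUTE_UNITS = 304
STREAM_PROCESSORS = 19_456
MATRIX_CORES = 1_216
PEAK_CLOCK_HZ = 2_100e6      # 2,100 MHz peak engine clock
PEAK_FP16_FLOPS = 1.3e15     # 1.3 PFLOPS, rounded
PEAK_FP8_FLOPS = 2.61e15     # 2.61 PFLOPS, rounded


def per_cu(total: int) -> float:
    """Split a chip-wide count evenly across the compute units."""
    return total / COMPUTE_UNITS


def flops_per_clock_per_cu(peak_flops: float) -> float:
    """Derived, not published: FLOPs each compute unit must finish per clock
    for the whole chip to reach `peak_flops` at the peak clock."""
    return peak_flops / (COMPUTE_UNITS * PEAK_CLOCK_HZ)


if __name__ == "__main__":
    print(f"stream processors per CU: {per_cu(STREAM_PROCESSORS):.0f}")
    print(f"matrix cores per CU:      {per_cu(MATRIX_CORES):.0f}")
    print(f"FP16 FLOPs/clock/CU:      {flops_per_clock_per_cu(PEAK_FP16_FLOPS):.0f}")
    print(f"FP8 FLOPs/clock/CU:       {flops_per_clock_per_cu(PEAK_FP8_FLOPS):.0f}")
    print(f"FP8/FP16 ratio:           {PEAK_FP8_FLOPS / PEAK_FP16_FLOPS:.2f}")
```

With these inputs the sketch yields 64 stream processors and 4 matrix cores per compute unit, roughly 2,000 FP16 and 4,000 FP8 operations per clock per compute unit, and an FP8-to-FP16 ratio of about 2, consistent with FP8 running at twice the FP16 rate. The small departures from round numbers come from the rounded PFLOPS figures.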

A small portion of Nvidia’s AI market share

The new chip arrives as Nvidia’s customers prepare to deploy its Blackwell chips this quarter. Microsoft has already become the first cloud service provider to offer Nvidia’s latest GB200 chips, which combine two B200 Blackwell GPUs with a “Grace” CPU for added performance.

Although AMD has positioned itself as Nvidia’s closest rival in the off-the-shelf AI chip market, it still trails in market share, according to the Financial Times. AMD forecasts $4.5 billion in AI chip sales for all of 2024, a fraction of the $26.3 billion Nvidia made from AI data center chips in the single quarter ending in July. Nevertheless, AMD has already won Microsoft and Meta as customers for its current MI300 generation of AI GPUs, and Amazon may follow.

The company’s renewed emphasis on AI marks a shift away from its historically PC-centric business built around consumer graphics cards, and Su remains optimistic about rising demand for AI data center GPUs. AMD estimates that the total addressable market for AI chips will reach $400 billion by 2027.

Technological Insights on AMD’s New AI Chips

AMD’s recent AI chip, the Instinct MI325X, marks a considerable technological leap designed to contest Nvidia’s supremacy in the AI chip arena. The MI325X showcases remarkable specifications, featuring 256GB of HBM3E memory and a bandwidth of 6 TB/s, surpassing Nvidia’s H200 chip in several critical aspects. AMD claims that the MI325X offers up to 40% greater inference performance on Meta’s Llama 3.1 AI model compared to Nvidia’s H200 chip. This performance enhancement is vital as AI models grow more intricate and necessitate increased computational capability.

Along with the MI325X, AMD has unveiled the forthcoming MI350 series, which is expected to debut in the latter half of 2025. The MI350 series is projected to provide a 35-fold enhancement in inference performance over the MI300X, featuring 288GB of HBM3E memory and 8 TB/s memory bandwidth. These advancements underline AMD’s dedication to advancing the performance of AI chips and establishing itself as a strong rival to Nvidia.

Strategic Alliances and Market Dynamics

AMD’s partnerships with major technology players such as Meta, Google, Oracle, and Microsoft are essential to its strategy against Nvidia. During the Advancing AI event, AMD CEO Lisa Su highlighted these collaborations, pointing out that Meta has leveraged over 1.5 million AMD EPYC CPUs and Instinct GPUs for initiatives like its Llama large language model. These alliances not only validate AMD’s technological expertise but also create opportunities for AMD to expand its foothold in the AI market.

The AI chip sector is projected to grow to $500 billion by 2028, and AMD is eager to seize a larger piece of this lucrative market. Nvidia currently holds over 90% of the data center AI chip market, but AMD’s assertive push with its new AI chips and strategic collaborations signals a clear intent to contest Nvidia’s lead. At the end of Q2 2024, AMD’s share of the server processor market with its EPYC chips reached a record high of 34%, momentum the company hopes to carry into AI chips.

Comparative Performance Metrics

When AMD’s Instinct MI325X is compared with Nvidia’s H200 chip, several key performance metrics stand out. The MI325X delivers 40% greater throughput and 30% lower latency on a 7-billion-parameter Mixtral model, and 20% lower latency on a 70-billion-parameter Llama 3.1 model. The MI325X is also reportedly 10% faster than the H200 at training a 7-billion-parameter Llama 2 model. These metrics highlight AMD’s ability to offer AI solutions that can rival Nvidia’s.

Moreover, AMD’s MI325X platform, which combines eight GPUs, provides 2TB of HBM3E memory and 48 TB/s of memory bandwidth, which AMD says is 80% more memory capacity and 30% more memory bandwidth than Nvidia’s H200 HGX platform. These improvements are essential for handling large AI workloads and reflect AMD’s commitment to high-performance solutions.
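The platform totals follow from the per-GPU figures given earlier in the article (256GB of HBM3E and 6 TB/s per MI325X). This is a minimal sketch, assuming the platform figure is simply eight times the per-GPU figure with no allowance for interconnect limits; the `Accelerator` class and `platform_totals` helper are illustrative names, not part of any AMD tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    name: str
    hbm_gb: float          # on-package HBM capacity per GPU, in GB
    bandwidth_tbps: float  # peak memory bandwidth per GPU, in TB/s


def platform_totals(gpu: Accelerator, count: int) -> tuple[float, float]:
    """Aggregate capacity (GB) and bandwidth (TB/s) for a multi-GPU board.

    Assumption: every GPU's local memory and bandwidth simply add up;
    interconnect overheads are ignored.
    """
    return gpu.hbm_gb * count, gpu.bandwidth_tbps * count


mi325x = Accelerator("MI325X", hbm_gb=256, bandwidth_tbps=6.0)
capacity_gb, bandwidth = platform_totals(mi325x, count=8)
# Decimal terabytes (1 TB = 1,000 GB), as in the marketing figure.
print(f"{capacity_gb:.0f} GB (~{capacity_gb / 1000:.0f} TB), {bandwidth:.0f} TB/s")
```

Multiplying out gives 2,048 GB (about 2 TB) and 48 TB/s, matching the platform numbers AMD quotes.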

As AI technologies like OpenAI’s ChatGPT continue to create a significant need for data center processing power, AMD recognizes an opportunity to capture a considerable share of this expanding market. The AI chip sector is expected to be valued at around $500 billion by 2028, indicating immense growth potential, and AMD is positioning itself to be a key player in this arena.

Lisa Su, CEO of AMD, emphasized the rapidly increasing need for AI technology, remarking, “AI demand has outstripped expectations, and investments are growing across the board.” Although AMD did not disclose any new major cloud partnerships at the launch event, it has previously announced collaborations with Meta and Microsoft for its AI chips, and OpenAI employs AMD’s products for certain applications.

The newly introduced MI325X chip is crafted to excel in scenarios where AI models are tasked with creating content or making predictions, largely due to its sophisticated memory capabilities. AMD claims that its chip surpasses Nvidia’s H200 GPU by up to 40% when executing Meta’s Llama 3.1 AI model, representing a notable edge for specific AI tasks.

While Nvidia maintains over 90% of the data center AI chip market, AMD’s latest chip and its ROCm software ecosystem strive to facilitate AI developers’ transition from Nvidia’s proprietary CUDA programming language. This approach could assist AMD in attracting developers and companies seeking alternatives to Nvidia’s hardware.

AMD’s approach includes a quicker product release strategy, intending to introduce new chips on an annual basis. Following the MI325X, AMD plans to launch the MI350 in 2025 and the MI400 in 2026 to keep up with Nvidia’s aggressive development pace, which includes the forthcoming Blackwell chips.

In addition to its AI-targeted GPUs, AMD is reinforcing its primary CPU business. The company unveiled its fifth-generation EPYC CPUs, designed for data centers and AI tasks. These processors range from budget-friendly 8-core versions to powerful 192-core models intended for supercomputers, allowing AMD to compete with Intel’s Xeon lineup.

With AI chips representing around $1 billion out of its $2.8 billion in data center sales during the June quarter, AMD continues to challenge both Nvidia and Intel in this rapidly changing market.

The chief executive of the US semiconductor company, Lisa Su, also revealed future plans to introduce next-generation AI chips. The upcoming MI350 series chips are anticipated to be launched in the second half of next year. These chips will feature enhanced memory and an innovative architecture expected to significantly improve performance compared to the current MI300X and MI250X models.

Despite these announcements, AMD’s shares fell by nearly 5%, with some analysts linking the decline to the absence of significant new cloud-computing clients for its AI chips. Conversely, Nvidia’s stock rose by 1.5%, while Intel, another major chip player, experienced a 1.6% decrease.

The rise in demand for AI processors, driven by large tech companies such as Microsoft and Meta Platforms, has significantly surpassed supply. Both Nvidia and AMD have profited from this increase, with AMD’s stock climbing approximately 30% since early August.

AMD confirmed that vendors, like Super Micro Computer, will begin delivering the MI325X AI chip to customers in Q1 2025. The MI325X utilizes the same architecture as the MI300X chip, released last year, but incorporates new memory designed to enhance AI processing speeds.

Additionally, the company rolled out several networking chips aimed at optimizing data transfer between chips and systems within data centers. AMD also introduced a new iteration of its server CPU design. Previously codenamed Turin, the new family of chips includes a model specially designed to optimize data flow to GPUs for enhanced AI processing.

AMD also launched three new laptop PC chips based on the Zen 5 architecture, optimized for AI uses, and designed to be compatible with Microsoft’s Copilot+ software.

AMD’s AI strategy

In August, AMD announced its intention to acquire ZT Systems in a deal worth $4.9 billion, involving both cash and stock. ZT Systems is a provider of AI and general-purpose compute infrastructure for hyperscale computing companies and specializes in supplying hyperscale server solutions for cloud applications. The company has a global manufacturing presence that extends across the US, EMEA, and APAC.

AMD’s new initiatives come at a time when the semiconductor sector is facing heightened demand due to the growth of AI technologies. The rise of generative AI and advanced technologies has put pressure on supply chains as firms ramp up production of AI-focused chips. This surge in demand for AI chips raises concerns about potential shortages.

A report from Bain and Company indicates that the AI-driven spike in demand for GPUs alone could lead to a 30% or more increase in total demand for certain upstream components by 2026. Despite initiatives like the US CHIPS Act, supply limitations and geopolitical tensions may impede the industry’s capacity to satisfy demand, particularly given the complexities involved in ramping up production for advanced AI chips.

Hyperscale server solutions provider ZT Systems will be acquired by AMD in a deal valued at $4.9bn

Advanced Micro Devices (AMD) has agreed to purchase ZT Systems, a provider of artificial intelligence (AI) and general-purpose computing infrastructure tailored for hyperscale computing firms, in a cash and stock agreement valued at $4.9 billion. This amount includes a contingent payout of up to $400 million, dependent on specific post-closing milestones.

“For nearly three decades, we have transformed our business to become a top provider of essential computing and storage infrastructure for the world’s leading cloud companies,” stated ZT Systems’ CEO, Frank Zhang. “AMD shares our vision regarding the crucial role our technology and staff play in designing and constructing the computing infrastructure that powers the largest data centers globally.”

ZT Systems 101

Located in New Jersey, ZT Systems specializes in providing hyperscale server solutions for cloud computing and AI, with a worldwide manufacturing presence that extends across the US, EMEA, and APAC regions. By acquiring ZT Systems, AMD aims to enhance its AI strategy to deliver leading AI training and inference solutions through innovation in silicon, software, and systems.

Furthermore, ZT Systems’ knowledge in designing and optimizing cloud computing solutions is anticipated to assist cloud and enterprise clients in accelerating the deployment of AMD-driven AI infrastructure at scale.

“ZT brings exceptional systems design and rack-scale solutions expertise that will considerably enhance our data center AI systems and customer support capabilities,” commented AMD’s chair and CEO, Lisa Su. “This acquisition also builds upon the investments we have made to fast-track our AI hardware and software roadmaps.

“Integrating our high-performance Instinct AI accelerator, EPYC CPU, and networking product lines with ZT Systems’ top-tier data center systems expertise will empower AMD to provide comprehensive data center AI infrastructure at scale in collaboration with our ecosystem of OEM and ODM partners.”

Following the conclusion of the deal, ZT Systems will become part of the AMD Data Center Solutions Business Group. According to the semiconductor firm, Zhang will oversee the manufacturing division, while ZT Systems president Doug Huang will manage the design and customer support teams.

Additionally, AMD intends to seek out a strategic partner to take over ZT Systems’ data center infrastructure manufacturing operations based in the US. Subject to regulatory approvals and other standard conditions, the transaction is anticipated to be finalized in the first half of 2025.

AMD vs. Nvidia

AMD’s acquisition of ZT Systems signifies a strategic move to bolster its AI capabilities. This decision comes in the wake of the company’s substantial progress in AI over the course of the year, which includes its $665 million purchase of Silo AI, a Finnish AI startup.

This acquisition is part of AMD’s broader strategy to improve its position against Nvidia. The company showcased its AI initiatives at Computex 2024, where AMD presented the Instinct MI325X accelerator and announced plans for the MI350 series, projected to launch in 2025.

These advancements are integral to AMD’s plans to close the competitive gap with Nvidia in the AI semiconductor industry. Moreover, AMD has not only intensified its internal research and development (R&D) activities but has also put over $1 billion into expanding its AI ecosystem and enhancing its AI software capabilities over the past year.

AMD’s CEO, Lisa Su, informed Wall Street analysts that interconnected server racks utilizing tens of thousands of GPUs for model training and inferencing are expected to become increasingly intricate over time. Consequently, customers will require a chip vendor capable of assisting them in designing systems and expediting production.

Presently, organizations usually take several quarters from the initial sampling of GPUs to deploying them within servers that handle production workloads, Su noted.

“The ZT team will assist AMD in scaling up rapidly,” Su mentioned during the conference call with analysts. “We can effectively conduct a substantial amount of development concurrently.”

ZT will aid AMD’s largest clients in developing their AI infrastructure. Simultaneously, the chip manufacturer will fine-tune its GPUs and CPUs for these systems, according to Su. Nevertheless, ZT will continue to create systems for entities looking to use silicon from rival companies.

“This initiative will not limit customer choice,” Su stated. “Some hyperscalers will seek distinct system design optimizations, and we will have the team available for that.”

AMD is significantly smaller in the AI accelerator market compared to Nvidia. Nvidia reported $22.6 billion in data center revenue for the quarter that concluded in April, with a considerable share derived from AI systems. AMD anticipates $4.5 billion in sales this year from its AI data center GPUs.

ZT also designs and produces non-AI CPU-based systems, suggesting that the acquisition could enhance AMD’s competitiveness against Intel in large organizations’ data centers, said Jack Gold, an analyst at J.Gold Associates. AMD could leverage ZT to promote its EPYC CPU in competition with Intel’s Xeon chip.

“With ZT providing non-AI solutions as well, this represents a direct challenge to Intel from AMD,” Gold commented on LinkedIn.

Analysts predict that demand for AI GPUs in large data centers will surpass demand for CPUs. AMD is rapidly launching AI accelerators to win share in a market that Su believes will grow from $45 billion last year to $400 billion by 2027.
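To put Su’s forecast in perspective, the growth rate it implies can be computed directly. This sketch assumes “last year” means 2023, so the $45 billion market grows to $400 billion over four years; `implied_cagr` is a hypothetical helper written for this illustration.

```python
def implied_cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that turns `start` into `end` over `years` years."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1 / years) - 1


# Assumption: "last year" is 2023, so the forecast spans 2023 to 2027.
START_MARKET_USD = 45e9
END_MARKET_USD = 400e9
YEARS = 2027 - 2023

rate = implied_cagr(START_MARKET_USD, END_MARKET_USD, YEARS)
print(f"implied CAGR: {rate:.1%}")
```

That works out to roughly 73% compound annual growth, sustained for four straight years.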

In December, AMD introduced the MI300 Series, marking its inaugural Instinct AI accelerator for hyperscale data centers. In 2026, the company intends to release the MI400 series aimed at large-scale AI training and inferencing. For programming GPUs that run large language models, AMD provides its ROCm open software stack consisting of tools, libraries, and APIs.

AMD plans to divest ZT’s hardware manufacturing division after finalizing the acquisition, Su indicated. ZT’s revenue was approximately $10 billion over the past year, primarily from its manufacturing division.

AMD expects to retain about 1,000 of the privately held company’s 2,500 employees and anticipates about $150 million in associated operating expenses. The chipmaker expects ZT to begin contributing to its revenue in 2026.

Post-acquisition, ZT will integrate into AMD’s data center business group. ZT CEO Frank Zhang will take charge of AMD’s manufacturing operations, while ZT President Doug Huang will lead AMD’s system design teams.

The ZT acquisition followed closely after AMD completed the $665 million purchase of Silo AI, a European lab focusing on AI services for autonomous vehicles, manufacturing, and smart cities.

The ZT deal is among AMD’s largest acquisitions. In 2022, AMD completed its $50 billion purchase of Xilinx, known for field-programmable gate arrays (FPGAs), chips that can be reconfigured after manufacturing. That same year, AMD also acquired Pensando for $1.9 billion; Pensando developed programmable data processing units that offload work from server CPUs.

Frank Zhang, ZT Systems’ founder and CEO, will continue to run the manufacturing division and honor commitments to existing customers after AMD completes the acquisition, which is expected early next year. In the meantime, Zhang will look for a buyer for the manufacturing operation, which employs about 1,500 people, because AMD does not want to compete with its own customers by building and selling servers. That stands in contrast to a certain other GPU maker we all know.

AMD has also been down this road before with microserver pioneer SeaMicro, which it acquired in March 2012 for $334 million under then-CEO Rory Read (remember him?), just as Lisa Su was moving from IBM Microelectronics to run AMD’s global business units. AMD shut SeaMicro down in April 2015, when it reset its server business after Su became president and CEO.

“Clearly, we have already started discussions with all our OEM and ODM partners,” says Forrest Norrod, general manager of AMD’s datacenter business, who previously ran Dell’s custom server business, in an interview with The Next Platform. “A reassuring factor is that all of these discussions have been very positive. People quickly understand the rationale behind our decision, and they recognize and appreciate that we have no plans to compete with them. We’re not going to do that, it’s not going to happen. I fully understand both businesses and there’s no confusion on my part.”

AMD wants to strengthen its systems architecture and engineering capabilities. AMD currently has about 500 system engineers, according to Norrod, while ZT Systems has 1,100 people doing this work. Because AMD designs systems to multiple standards rather than a single one, it needs a larger workforce to help design and develop future GPU-accelerated systems, which will be challenging to build; even so, AMD does not plan to get into production manufacturing.

It is unclear how much AMD will recoup by selling ZT Systems’ manufacturing business, but hiring 1,100 experienced system engineers would be prohibitively expensive, and might not be possible at all, by any route other than buying a specialized high-performance system maker such as ZT Systems.

It is also cheaper than buying Supermicro would have been, and probably brings in a similar number of system engineers.

Here is how Norrod explains it; we give the full quote to illustrate AMD’s reasoning for paying $4.9 billion for ZT Systems, which works out to $4.45 million per system engineer. (Some of that cost will be offset by selling the manufacturing side, of course.)
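The $4.45 million figure is simply the headline deal value divided by ZT’s 1,100 system engineers. The sketch below reproduces it and adds a hypothetical `divestiture_proceeds_usd` parameter to show how selling the manufacturing side would lower the effective cost; the article gives no estimate of those proceeds, so the parameter defaults to zero.

```python
def cost_per_engineer(
    deal_value_usd: float,
    engineers: int,
    divestiture_proceeds_usd: float = 0.0,  # hypothetical: no figure has been reported
) -> float:
    """Net acquisition cost per retained system engineer, in US dollars."""
    if engineers <= 0:
        raise ValueError("engineers must be positive")
    return (deal_value_usd - divestiture_proceeds_usd) / engineers


DEAL_VALUE_USD = 4.9e9     # headline price, including the up-to-$400M contingent payout
SYSTEM_ENGINEERS = 1_100   # ZT system engineers, per Norrod

headline = cost_per_engineer(DEAL_VALUE_USD, SYSTEM_ENGINEERS)
print(f"${headline / 1e6:.2f}M per system engineer")
```

Passing a positive value for `divestiture_proceeds_usd` lowers the per-engineer cost proportionally; every $1.1 billion recovered would cut it by $1 million.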

“We have been looking ahead at the roadmap and grasping the challenges of designing competitive systems that excel in performance and efficiency. With the rise of AI systems, it’s becoming increasingly obvious to everyone in the sector that this will lead to significant challenges in designing systems capable of operating at these power levels and signaling rates, given the complexity involved. Maintaining and managing these systems will be quite challenging.”

“There are numerous issues that need addressing, and the requirements to meet these challenges trace back to the very early stages of the silicon development process. We are acquainted with some of these challenges since they are typical in supercomputer design. However, when examining the developments within AI systems, the complexity is increasing rapidly, making it essential for us to have a sufficient number of world-class system design engineers involved right from the silicon definition stage. Thus, it became apparent that we needed to significantly improve our capabilities here.”

“Furthermore, as we enhance our capabilities, we want to remain true to AMD’s legacy of fostering open ecosystems and offering customer choices, rather than constricting them within proprietary confines. Consequently, this necessitates an even larger number of engineers. If you wish to create a single proprietary system for universal use, you require a certain staffing level. However, to develop open ecosystems that accommodate choice and variation entails greater complexity and requires additional system engineers to ensure timely market delivery and uphold high quality.”

This is largely about speeding time to market and strengthening system design and engineering. AMD has built impressive CPUs and now GPUs, but it must also create a complete networking stack and system boards that fit well into rackscale and cluster-wide system architectures and that are thoroughly tested and validated at scale. That is why Nvidia created its DGX line, and AMD acknowledges the need, but it will not build systems for customers or act as a prime contractor for HPC or AI clusters, something Intel tried without much success.

AMD’s acquisition of ZT Systems involves purchasing ZT’s rack-scale systems design and manufacturing assets for $4.9 billion, paid 75% in cash and 25% in stock. The transaction builds on the more than $1 billion AMD invested over the previous year to expand its AI ecosystem.
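For readers who want the arithmetic, a minimal sketch of how the stated 75% cash / 25% stock terms divide the $4.9 billion headline price. It assumes the percentages apply to the full headline figure and ignores any contingent or adjusted components of the consideration, which are not broken out here.

```python
# Split of the $4.9 billion headline price using the stated 75% cash / 25% stock terms.
# Assumption: the percentages apply to the full headline figure.
# All figures are in billions of US dollars.
DEAL_VALUE_B = 4.9
CASH_SHARE = 0.75
STOCK_SHARE = 0.25

cash_b = DEAL_VALUE_B * CASH_SHARE    # cash portion
stock_b = DEAL_VALUE_B * STOCK_SHARE  # stock portion

print(f"cash:  ${cash_b:.3f}B")   # cash:  $3.675B
print(f"stock: ${stock_b:.3f}B")  # stock: $1.225B
```

In other words, roughly $3.7 billion of the price is cash and roughly $1.2 billion is AMD stock.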

After the acquisition, ZT’s design work will focus primarily on systems built around AMD’s Instinct AI GPUs and EPYC CPUs, along with networking solutions from AMD and its partners. AMD plans to divest ZT’s manufacturing assets while retaining the systems design capabilities.

Frank Zhang, ZT’s founder and CEO, will lead the manufacturing business slated for sale, while ZT President Doug Huang will manage design and customer enablement, reporting to Forrest Norrod, who leads AMD’s Data Center Solutions Business Group.

The AMD board has approved the deal, which is expected to close in the first half of 2025, pending regulatory approvals. AMD expects the acquisition to be accretive on a non-GAAP basis by the end of 2025.

ZT Systems is engaged in the design, integration, manufacturing, and deployment of rack-level, hyperscale AI systems. It is rumored to generate $10 billion in annual revenue, mainly from its largest clients, AWS and Azure.

The company employs approximately 1,000 people in design and customer roles and another 1,000 in manufacturing. Founded in 1994 and based in Secaucus, New Jersey, ZT began by producing desktop PCs and pedestal servers, shifted its focus to data center servers in 2004, moved into rack-scale design and integration in 2010, and committed to hyperscale solutions in 2013. Today it ships “hundreds of thousands of servers annually.”

AMD’s acquisition of ZT positions the company for significant growth in the datacenter AI market. Sales of AMD’s Instinct GPUs have grown substantially, from $0 in the first half of 2023 to a projected $4.5 billion in 2024, driven by considerable investments in hardware and software. However, measured against AMD’s own forecast of a $400 billion market for AI accelerators and GPUs by 2027, the company needs catalysts to accelerate its growth and capture what I refer to as its “unfair” share.

Despite recent improvements, AMD faces two primary competitive hurdles in AI infrastructure: the limitations of its software and the scale and maturity of its systems. AMD has effectively addressed these issues for non-AI EPYC servers and PCs, but its AI rack solutions still need work. AMD could build these capabilities in-house, but doing so would take considerable time.

The company has previously completed three minor software acquisitions (Silo AI, Nod.ai, and Mipsology) to enhance its mid- and high-level software functionalities and support customer customization of LLMs. Furthermore, AMD has made significant progress with ROCm AI optimizations and compatibility with PyTorch and Hugging Face for both Instinct and EPYC. I anticipate that AMD will pursue additional software acquisitions in the future.

AMD could not have projected $4.5 billion in annual Instinct revenue without some systems capabilities, but what it has today is insufficient to carve out its fair share of the anticipated $400 billion market. AI infrastructure is no longer a chip-only contest; it now demands an integrated approach to systems and software. “Chip” manufacturers are now expected to supply complete rack solutions and software stacks to deliver continuous improvements in performance, efficiency, quality, and time-to-market. The ZT acquisition is strategically aimed at strengthening AMD’s capabilities “above the chip” and “below software” for AI server solutions.

I believe this acquisition, if executed with Lisa Su’s usual precision, will serve as the catalyst needed for AMD to drive remarkable revenue growth for both Instinct and EPYC at the head node, particularly with hyperscalers, tier-two CSPs, and some of the largest on-premises facilities for governmental, financial, energy, and pharmaceutical sectors.

I am optimistic about the cultural compatibility between the two companies. When I discussed the deal with Su, she highlighted the long-standing partnership between AMD and ZT. “Our team has collaborated with them for many years,” she stated. “They contributed to some of our initial EPYC designs and MI250 designs, and have been actively involved in MI300 designs. This has allowed us to become very familiar with them.”

This synergy extends to customer relationships as well. Su noted that Frank Zhang has focused on the datacenter and cloud markets for more than 15 years and that, rather than spreading itself too thin, ZT has concentrated on a select few crucial customers. While Su could not name any customers because ZT is privately held, she emphasized that “Every one of their customers is our customer.” So although integrating engineering teams from different companies typically presents challenges, it helps that both sides already serve the same clientele.


Avoiding Involvement in the Systems Manufacturing Sector

At the same time, I appreciate the decision to eventually divest ZT’s manufacturing, sales, and support functions, as these areas would significantly dilute AMD’s margins. For context, Supermicro’s net profit margins are in the mid-single digits, while AMD’s are around 25%. Relatedly, Su said AMD would not enter the systems market the way Nvidia has with DGX. I have mixed feelings about this, as DGX generates considerable revenue and profit for Nvidia and gives it a platform to promote an all-inclusive solution. Undoubtedly, hyperscalers and top-tier OEMs would prefer that AMD stay out of the systems business, but AMD needs some way to be compensated for not competing with its customers. So far, Nvidia does not appear to have suffered for competing with its own.
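To make the dilution point concrete, here is a minimal sketch of a revenue-weighted blended net margin. It assumes “mid-single digits” means 5% and uses purely hypothetical, equal revenue figures in arbitrary units; these are not AMD’s or ZT’s actual revenues, and `blended_margin` is an illustrative helper, not anything from the source.

```python
def blended_margin(segments):
    """Revenue-weighted net margin across (revenue, net_margin) pairs."""
    total_revenue = sum(rev for rev, _ in segments)
    total_profit = sum(rev * margin for rev, margin in segments)
    return total_profit / total_revenue

# Hypothetical segments with equal revenue (arbitrary units), not real company figures.
chip_business = (10.0, 0.25)     # ~25% net margin, the article's figure for AMD
systems_business = (10.0, 0.05)  # "mid-single digits", assumed here to be 5%

print(f"{blended_margin([chip_business]):.0%}")                    # 25%
print(f"{blended_margin([chip_business, systems_business]):.0%}")  # 15%
```

Under these assumptions, adding an equal-sized low-margin systems business pulls the combined net margin from 25% down to 15%, which is the kind of dilution that divesting manufacturing avoids.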

Su believes that customers value choice and customized solutions over a model that imposes a fixed combination of CPU, GPU, and networking in a set form factor for data centers. According to Su, the deal will let AMD offer customers exactly that. “We’re going to say, ‘I would welcome you to use my CPU and GPU along with our open networking standard, but actually, I will customize the system for you. Please let me know what your ideal data center would look like.’”

There’s another competitive aspect to consider. ZT Systems designs, manufactures, and deploys Nvidia systems, reportedly for AWS and Azure as well. That makes the deal’s implications for ZT’s design work on behalf of AMD’s main datacenter AI chip rivals, Nvidia and Intel, somewhat unclear. Once the deal closes, I would expect all design work for Nvidia and Intel to cease. AMD says manufacturing of competitors’ systems will continue, which makes sense if the manufacturing business is indeed separated and sold.

Although some may justifiably critique AMD in certain areas, pinpoint execution has become a defining characteristic of Su’s leadership. Meticulous execution is precisely what is required to turn this investment into a success for the company by boosting revenue and gaining market share. When compared to AMD’s acquisition of Xilinx a few years back, this one appears straightforward. This acquisition further solidifies the advantage that AMD and Nvidia have accumulated over their competitors in the realm of AI chips. I am confident that this purchase will be beneficial for the company and allow it to capture a larger portion of the projected $400 billion datacenter AI market by 2027.
