AI startup Cerebras has built a giant AI computer for Abu Dhabi's G42 with 27 million AI ‘cores’

cerebras-ceo-andrew-feldman-with-condor-galaxy

Cerebras co-founder and CEO Andrew Feldman, seen here standing over packing crates for the CS-2 systems prior to their installation at partner Colovore’s hosting facility in Santa Clara, Calif.

Photo: Rebecca Lewington/Cerebras Systems

The fervor surrounding AI “isn’t a Silicon Valley thing, it’s not even a U.S. thing, it’s now worldwide — it’s a global phenomenon,” according to Andrew Feldman, co-founder and CEO of AI computer startup Cerebras Systems.

In that spirit, Cerebras announced on Thursday that it had entered into a contract to build what it calls “the world’s largest supercomputer for AI,” called the Condor Galaxy, on behalf of its client, G42, a five-year-old investment firm based in Abu Dhabi, United Arab Emirates.

Also: GPT-4 is getting significantly dumber over time, according to one study

The machine is focused on “training” neural networks, the part of machine learning in which a neural network’s settings, its “parameters” or “weights,” are tuned until they are good enough for the second stage, making predictions, known as the “inference” stage.
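A minimal sketch of those two stages, using plain NumPy with a one-weight toy model (the model and data here are invented purely for illustration):

```python
import numpy as np

# Toy data following y = 3x; the "3" is what training must discover.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 6.0, 9.0, 12.0])

# Training: tune the single weight w by gradient descent on squared error.
w, lr = 0.0, 0.01
for _ in range(500):
    grad = np.mean(2 * (w * x - y) * x)  # derivative of mean((w*x - y)^2) w.r.t. w
    w -= lr * grad

# Inference: the tuned weight is frozen and used only to make predictions.
print(f"learned w = {w:.3f}")                  # ~3.000
print(f"prediction for x = 5: {w * 5.0:.2f}")  # ~15.00
```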

Condor Galaxy is the result, Feldman said, of months of collaboration between Cerebras and G42, and is the first major announcement of their strategic partnership.

The initial contract is worth more than $100 million to Cerebras, Feldman told ZDNET in an interview. That will eventually expand multiple times, to hundreds of millions of dollars in revenue, as Cerebras builds Condor Galaxy in stages.

Also: Before artificial intelligence, this other wave of technology is spreading rapidly

Condor Galaxy is named after a galaxy located 212 million light-years from Earth. In its initial configuration, dubbed CG-1, the machine consists of 32 of Cerebras’s special-purpose AI computers, the CS-2, whose chips, the “Wafer Scale Engine,” or WSE, collectively hold a total of 27 million computing cores, 41 terabytes of memory, and 194 terabits per second of bandwidth. They are supervised by 36,352 AMD EPYC x86 processor cores.
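As a sanity check, the headline core count follows directly from the WSE-2’s published spec of 850,000 cores per chip; the per-system figures below are simply the stated totals divided out:

```python
# Back-of-the-envelope check of the CG-1 headline figures.
systems = 32
cores_per_wse2 = 850_000  # Cerebras's published per-chip core count

print(f"{systems * cores_per_wse2:,} cores")  # 27,200,000 -> the "27 million" headline

total_memory_tb = 41          # terabytes, as stated for CG-1
total_bandwidth_tbps = 194    # terabits per second, as stated
print(f"~{total_memory_tb / systems:.2f} TB and "
      f"~{total_bandwidth_tbps / systems:.1f} Tb/s per CS-2 system")  # ~1.28 TB, ~6.1 Tb/s
```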

condor-galaxy-1-in-color-cropped

The 32 CS-2 machines networked together as CG-1.

Rebecca Lewington/Cerebras Systems

The machine runs at 2 exa-flops, which means it can process two quintillion (2 x 10^18) floating point operations per second.

The giant machine is the latest example of bigness from Cerebras, founded in 2016 by veteran semiconductor and networking entrepreneurs and innovators. The company stunned the world in 2019 with the unveiling of the WSE, the largest chip ever made, a chip that occupies nearly the entire surface of a 12-inch semiconductor wafer. It is the second-generation chip, the WSE-2, introduced in 2021, that powers the CS-2 machines.

Also: AI startup Cerebras celebrated the triumph of chips where others have tried and failed

The CS-2s in CG-1 are complemented by Cerebras’s special-purpose “fabric” switch, SwarmX, and its dedicated memory hub, MemoryX, which are used to lash the CS-2s together.

The claim to be the largest supercomputer for AI is somewhat hyperbolic, as there is no official registry of AI computer sizes. The common measure of supercomputers, the TOP500 list, maintained by Prometeus GmbH, ranks conventional supercomputers used for so-called high-performance computing.

Those machines aren’t comparable, Feldman said, because they operate at what’s called 64-bit precision, where each operand, the value the computer works on, is represented by sixty-four bits. The Cerebras system represents data in a simpler form called FP16, using only sixteen bits for each operand.
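The difference is easy to see with NumPy, and it is why FLOP counts at the two precisions don’t compare directly:

```python
import numpy as np

pi64 = np.float64(np.pi)  # 64-bit: ~15-16 significant decimal digits
pi16 = np.float16(np.pi)  # 16-bit: ~3 significant decimal digits

print(pi64)  # 3.141592653589793
print(pi16)  # 3.14 (the stored value is actually 3.140625)

# FP16 also has a far narrower range: it overflows just past 65,504.
print(np.finfo(np.float16).max)  # 65504.0
print(np.finfo(np.float64).max)  # ~1.8e308
print(np.float16(70000.0))       # inf (overflow)
```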

Among machines of the 64-bit precision class, Frontier, a supercomputer at the US Department of Energy’s Oak Ridge National Laboratory, is the world’s most powerful, running at 1.19 exa-flops. But it can’t be directly compared to CG-1 at 2 exa-flops, Feldman said.

Regardless, CG-1’s sheer compute is unlike that of most any computer on the planet. “Think of a single computer with more compute power than half a million Apple MacBooks working together to solve a single problem in real time,” Feldman offered.
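The comparison holds up as rough arithmetic, if one assumes a laptop is good for a few teraflops; the per-MacBook figure below is an assumed ballpark, not a number from Cerebras:

```python
# Rough sanity check of the "half a million MacBooks" comparison.
cg1_flops = 2e18       # 2 exaflops = 2 x 10^18 operations/second, as stated (at FP16)
macbook_flops = 4e12   # ~4 teraflops per laptop: an assumed ballpark figure

print(f"{cg1_flops / macbook_flops:,.0f} MacBooks")  # 500,000
```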

Also: This new technology could blow away GPT-4 and everything like it

The Condor Galaxy machine is physically located not in Abu Dhabi but at the Santa Clara, California, facilities of Colovore, a hosting provider that competes in the data-center market with the likes of Equinix. Cerebras announced a partnership with Colovore last November for a modular supercomputer called “Andromeda” to speed up large language models.

condor-press-deck-7-13-23-slide-15

CG-1 stats in phase 1

Cerebras Systems

condor-press-deck-7-13-23-slide-17

CG-1 stats in phase 2

Cerebras Systems

As part of the multi-year partnership, Condor Galaxy will progress through successive versions, ultimately reaching CG-9, Feldman said. Phase 2 of the partnership, expected in the fourth quarter of this year, will double the CG-1 footprint to 64 CS-2s, with a total of 54 million compute cores, 82 terabytes of memory, and 388 terabits per second of bandwidth. That will double the machine’s throughput to 4 exa-flops.

Putting it all together, in phase 4 of the partnership, to be delivered in the second half of 2024, Cerebras will knit together what it calls a “constellation” of nine interconnected systems, each running at 4 exa-flops, for a total of 36 exa-flops of capacity, at sites around the world, creating what it calls “the world’s largest interconnected AI supercomputer.”

“This is the first of the four-exa-flop machines we’re building for G42 in the US,” Feldman explained. “And then we’ll build six more around the world, for a total of nine interconnected machines at four exa-flops each, producing 36 exa-flops.”
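The roadmap arithmetic is straightforward: every phase-2 figure is double its CG-1 counterpart, and the constellation multiplies the 4-exa-flop building block by nine:

```python
# Condor Galaxy roadmap figures as stated in the announcement.
phase1 = {"systems": 32, "exaflops": 2, "cores_millions": 27.2,
          "memory_tb": 41, "bandwidth_tbps": 194}

# Phase 2 doubles the footprint, so every figure simply doubles.
phase2 = {key: value * 2 for key, value in phase1.items()}
print(phase2)  # 64 systems, 4 exaflops, 54.4M cores, 82 TB, 388 Tb/s

# The final "constellation": nine 4-exaflop machines at sites worldwide.
print(f"{9 * phase2['exaflops']} exaflops total")  # 36
```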

Also: Microsoft announces Azure AI trio at Inspire 2023

The machine marks the first time Cerebras will not only build a clustered computer system but also manage it for the customer. As a result, the partnership gives Cerebras multiple ways to make money.

The partnership will reach hundreds of millions of dollars in direct sales by Cerebras to G42, Feldman said, as the deal moves through its various stages.

“Not only is this deal bigger than what all the other startups have sold, combined, in their lifetimes, but it’s set to grow beyond the just-over a hundred million [dollars] it is now, to two or three times that,” he said, alluding to competing AI startups including SambaNova Systems and Graphcore.

Also, “together, we resell excess capacity through our cloud,” he said, meaning other Cerebras customers can lease time on CG-1 when it’s not in use by G42. The partnership “gives our cloud profoundly new scale, obviously,” he said, so that “we now have the opportunity to pursue dedicated AI supercomputers as a service.”

Also: Artificial intelligence and advanced applications are putting the current technological infrastructure to the test

This means that anyone who wants AI computing power in the cloud will be able to “jump onto one of the world’s biggest supercomputers for a day, a week, a month if you want.”

The ambitions for AI appear to be as big as the machine. “Over the next 60 days, we will be announcing some very, very interesting models that have been trained on CG-1,” Feldman said.

G42 is a global conglomerate, Feldman noted, with approximately 22,000 employees in twenty-five countries and nine operating companies under its umbrella. The company’s G42 Cloud subsidiary operates the largest regional cloud in the Middle East.

“The shared vision of G42 and Cerebras is that Condor Galaxy will be used to address society’s most pressing challenges in healthcare, energy, climate action and more,” said Talal Alkaissi, CEO of G42 Cloud, in prepared remarks.

Also: Nvidia sweeps the AI benchmarks, but Intel brings significant competition

M42, a joint venture between G42 and Abu Dhabi investment firm Mubadala Investment Co., is one of the largest genome-sequencing operations in the world.

“They are sort of pioneers in the use of artificial intelligence in healthcare applications in Europe and the Middle East,” Feldman noted of G42. The company has produced 300 AI publications in the past three years.

“They [G42] wanted someone who had experience building very large AI supercomputers, who had experience developing and deploying large AI models, and who had experience manipulating and managing very large data sets,” Feldman said. “And these are all things that we’ve kind of really honed in the last nine months.”

The CG-1 machines, Feldman pointed out, will be able to scale to ever-larger neural network models without incurring many times as much additional code.

“One of the key elements of the technology is that it enables customers like G42 and their customers to quickly take advantage of our machines,” Feldman said.

Plus: AI will change software development in huge ways

In a slide presentation, he outlined how a 1-billion-parameter neural network such as OpenAI’s GPT can be run on a single Nvidia GPU chip with 1,200 lines of code. But to scale the neural network up to a 40-billion-parameter model running across 28,415 Nvidia GPUs, the amount of code needed to deploy it balloons to nearly 30,000 lines, Feldman said.

For a CS-2 system, however, it is possible to run a 100-billion-parameter model with the same 1,200 lines of code.
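Cerebras did not publish the 1,200 lines in question, so the sketch below is purely schematic; the train function and config keys are invented to illustrate the shape of the claim, that model size becomes a configuration value while the launch code stays fixed:

```python
# Schematic illustration of the scaling claim -- NOT Cerebras's actual API.
# On a system that streams weights to one huge chip, model size is just a
# configuration value, so the launch code below never changes.

def train(config: dict) -> None:
    """Stand-in for a fixed launch script, identical for every model size."""
    print(f"training a {config['n_params']:,}-parameter model: "
          f"{config['n_layers']} layers x {config['d_model']} wide")

# Scaling from 1B to 100B parameters touches only the numbers, not the code.
for n_params, n_layers, d_model in [
    (1_000_000_000, 24, 2048),
    (40_000_000_000, 48, 8192),
    (100_000_000_000, 80, 10240),
]:
    train({"n_params": n_params, "n_layers": n_layers, "d_model": d_model})

# On a GPU cluster, by contrast, each jump in size would also require new
# sharding code: tensor and pipeline parallelism, checkpoint resharding, etc.
```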

condor-press-deck-7-13-23-slide-26

Cerebras claims it can scale to ever-larger neural network models with the same amount of code, versus the explosion of code required to lash together Nvidia’s GPUs.

Cerebras Systems

“If you want to put up a 40-billion-parameter, or a hundred-billion-parameter, or a 500-billion-parameter model, you use exactly the same 1,200 lines of code,” Feldman explained. “That’s really a key differentiator, is that you don’t have to do that,” meaning, write more code.

For Feldman, the scale of the latest creation represents not just bigness for its own sake, but an attempt to achieve qualitatively different results by moving from the largest chip to the largest clustered systems.

Also: MedPerf aims to speed up medical AI while keeping data private

“You know, when we started the company, you think you can help change the world by building cool computers,” Feldman mused. “And over the past seven years, we’ve built bigger and bigger computers, and some of the biggest.

“We’re now on a path to build these sort of unimaginably large computers, and it’s amazing, walking through the data center and seeing rack after rack of your equipment humming.”

