24GB of RAM in a smartphone? It’s not as crazy as you might think.

Rumors have been circulating for some time that smartphones packing a whopping 24GB of RAM will arrive within the next year. That's a huge amount by any measure; the most common RAM configuration on gaming PCs is a mere 16GB at the time of writing. 24GB of RAM sounds like a ridiculous amount, but not when it comes to AI.


AI is hungry for RAM

Google Pixel 6 Pro

If you’re looking to run any AI model on a smartphone, the first thing you need to know is that just about any model requires a lot of RAM. That’s why you need a lot of VRAM when working with applications like Stable Diffusion, and it applies to text-based models as well. Basically, these models will usually be loaded into RAM for the duration of the workload, and that’s a lot faster than executing from storage.

RAM is faster for a couple of reasons, but the two most important are that it has lower latency, since it’s closer to the CPU, and it has higher bandwidth. It’s necessary to load large language models (LLMs) into RAM because of these properties, but the question that usually follows is exactly how much RAM these models use.
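To put some rough numbers on the bandwidth point, here is a quick back-of-the-envelope sketch. The figures are my own ballpark assumptions (a 4GB set of quantized weights, roughly 50GB/s for LPDDR5-class RAM, roughly 2GB/s for UFS flash), not measured values, but they show why touching the weights for every generated token is tolerable from RAM and hopeless from storage.

```python
# Rough illustration (assumed ballpark figures, not measurements) of why LLM weights
# live in RAM: one full pass over the weights per generated token.

weights_gb = 4            # assumed size of a small quantized model
lpddr5_gb_per_s = 50      # ballpark LPDDR5/LPDDR5X bandwidth
ufs_gb_per_s = 2          # ballpark UFS 4.0 sequential read speed

print(f"One pass over the weights from RAM:     ~{weights_gb / lpddr5_gb_per_s * 1000:.0f} ms")   # ~80 ms
print(f"One pass over the weights from storage: ~{weights_gb / ufs_gb_per_s * 1000:.0f} ms")      # ~2000 ms
```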


There’s a lot to dig into when it comes to the LLMs currently in circulation, and one I recently played around with was Vicuna-7B. It’s a 7-billion-parameter LLM that can be deployed on an Android smartphone via MLC LLM, a universal app that aids in LLM deployment, and it takes about 6GB of RAM to interact with it on an Android smartphone. It’s obviously not as advanced as some other LLMs on the market right now, but it also runs entirely locally without the need for an internet connection. For context, GPT-4 is said to have 1.76 trillion parameters, and GPT-3 has 175 billion.
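A back-of-the-envelope estimate shows how those parameter counts translate into memory footprints. This is my own arithmetic, not an MLC LLM or OpenAI figure: weights take roughly parameter count times bytes per parameter, plus some runtime overhead, and the ~6GB observed on the phone also covers context and app overhead on top of the weights themselves.

```python
# Sketch: estimated memory footprint from parameter count (weights only, plus a
# rough 20% runtime overhead). Quantization choices are assumptions for illustration.

def estimated_footprint_gb(params_billions: float, bytes_per_param: float, overhead: float = 1.2) -> float:
    return params_billions * 1e9 * bytes_per_param * overhead / 1024**3

print(f"Vicuna-7B, ~4-bit weights:    ~{estimated_footprint_gb(7, 0.5):.1f} GB")     # fits in a phone
print(f"GPT-3, 175B at fp16:          ~{estimated_footprint_gb(175, 2.0):,.0f} GB")  # far beyond any phone
print(f"GPT-4 (rumored 1.76T), fp16:  ~{estimated_footprint_gb(1760, 2.0):,.0f} GB")
```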

Qualcomm and on-device AI

Qualcomm Snapdragon 8 Gen 2

While tons of companies are racing to create their own large language models (and the interfaces to interact with them), Qualcomm has focused on one key area: deployment. The cloud services companies rely on cost millions to run the most powerful chatbots, and OpenAI’s ChatGPT is said to cost the company up to $700,000 a day to operate. Any on-device deployment that leverages the user’s own resources can save a lot of money, especially if it’s widespread.

Qualcomm refers to this as “hybrid AI,” combining cloud and on-device resources and splitting the computation wherever it’s most appropriate. It won’t work for everything, but if Vicuna-7B were to power Google Assistant on people’s devices with the help of cloud services, you would in theory have all the benefits of an LLM running on-device with the added benefit of cloud-based data collection. That way, it would cost Google no more to run than the current Assistant does.
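Here is a toy sketch of what that split could look like in practice. The classes and the `is_complex()` heuristic are hypothetical placeholders I made up for illustration, not Qualcomm’s actual design; the point is simply that easy queries stay on the phone while heavy ones escalate to the cloud.

```python
# Hypothetical "hybrid AI" router: everyday prompts run on-device for free,
# complex prompts go to a bigger (and more expensive) cloud model.

class DeviceModel:
    def generate(self, prompt: str) -> str:
        return f"(on-device answer to {prompt!r})"   # private, no server cost, less capable

class CloudModel:
    def generate(self, prompt: str) -> str:
        return f"(cloud answer to {prompt!r})"       # more capable, costs the provider money

def is_complex(prompt: str) -> bool:
    # Stand-in heuristic: long, open-ended prompts get escalated to the cloud.
    return len(prompt.split()) > 40

def answer(prompt: str, device: DeviceModel, cloud: CloudModel) -> str:
    return cloud.generate(prompt) if is_complex(prompt) else device.generate(prompt)

print(answer("Set a timer for ten minutes", DeviceModel(), CloudModel()))
```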

This is just one way on-device AI gets around the cost problem companies are currently facing, and it’s where the additional hardware comes into play. Qualcomm has already demonstrated Stable Diffusion running on an Android smartphone, and since then the company has shown ControlNet running on an Android device as well. It has clearly been building hardware that can handle intense AI workloads for some time, and MLC LLM is a way to test that right now.

Vicuna 7B LLM running on an Android phone

In the screenshot above, note that I’m in airplane mode with Wi-Fi turned off, and it still works very well. It generates at about five tokens per second, where a token is roughly half a word. That works out to about 2.5 words per second, which is plenty fast for something like this. It doesn’t interact with the internet in its current state, but since it’s all open source, a company could take the work done by MLC LLM and the team behind the Vicuna-7B model and implement it in another useful context.
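The arithmetic behind those figures is simple enough to check; the 50-word reply below is just my own example length, not something from the test.

```python
# Sanity check: ~5 tokens/second at roughly half a word per token.

tokens_per_second = 5
words_per_token = 0.5   # rough rule of thumb for English text

words_per_second = tokens_per_second * words_per_token
print(f"{words_per_second} words per second")                          # 2.5
print(f"A 50-word reply takes ~{50 / words_per_second:.0f} seconds")   # ~20
```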

On-device generative AI applications

I spoke with Karl Whealton, senior director of product management at Qualcomm, who is responsible for CPU, DSP, benchmarking, and AI hardware. He told me all about the various applications of AI models running on Snapdragon chipsets and gave me an idea of what’s possible on them today. He tells me that the Snapdragon 8 Gen 2’s micro-tile inferencing is incredibly good with transformers, where a transformer is a model that can draw relationships in sequential data (like the words in a sentence) and can also learn context.
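For readers who want to see what “drawing relationships in sequential data” means concretely, here is a minimal NumPy sketch of scaled dot-product attention, the core operation inside a transformer. It illustrates the general idea only; it has nothing to do with Qualcomm’s micro-tile implementation, and the random vectors stand in for real word embeddings.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each position's output is a weighted mix of every position's value,
    letting the model relate any word in the sequence to any other."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                         # pairwise relevance between positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)          # softmax over the sequence
    return weights @ V

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                  # toy "sentence": 4 tokens, 8-dim stand-in embeddings
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q, K, V come from the same sequence
print(out.shape)                             # (4, 8): one context-aware vector per token
```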

To that end, I asked him about the RAM requirements currently being rumored, and he said that with a language model of any type or scale, you basically need to load it into RAM. He went on to say that if an OEM were to implement something like this in a more RAM-limited environment, he would expect them to use a smaller, perhaps more specialized language model in a smaller segment of RAM rather than simply running it from device storage, which would be brutally slow and a poor user experience.

An example of a specialized use case is one Qualcomm spoke about recently at the annual Computer Vision and Pattern Recognition (CVPR) conference: generative AI acting as a fitness coach for end users. A visually grounded LLM can analyze a video feed to see what a user is doing, determine whether they’re doing it wrong, feed the result to a language model that can put into words what the user is doing wrong, and then use a speech model to relay that information to the user.
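As a sketch of how those three stages chain together, here is a hedged mock-up of the pipeline. Every function name and the canned outputs are hypothetical stand-ins I invented; none of them come from Qualcomm, and a real system would be calling actual vision, language, and text-to-speech models at each step.

```python
# Hypothetical fitness-coach pipeline: vision model -> language model -> speech model.

def analyze_frames(video_frames) -> str:
    """Stand-in for a visually grounded model that describes what the user is doing."""
    return "squat with knees drifting past the toes"

def critique(description: str) -> str:
    """Stand-in for a language model that turns the observation into actionable feedback."""
    return f"I noticed a {description}. Try keeping your weight on your heels."

def speak(text: str) -> None:
    """Stand-in for a text-to-speech model that delivers the feedback out loud."""
    print(f"[coach says] {text}")

# Each stage feeds the next, and in principle all of them could run on the device.
speak(critique(analyze_frames(video_frames=[])))
```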


Of course, the other important factor in on-device AI is privacy. With these models, you’re very likely to share parts of your personal life when asking questions, and even just giving an AI access to your smartphone could worry people. Whealton tells me that anything that goes into the SoC is highly secure, and that’s “one reason” why doing it on device is so important to Qualcomm.

To that end, Qualcomm also announced that it was working with Meta to enable the company’s open-source Llama 2 LLM to run on Qualcomm devices, with it expected to be available for devices starting in 2024.

How 24GB of RAM can fit in a smartphone

OnePlus 12 front and back on dark background

Source: Smartprix

With recent leaks pointing to the upcoming OnePlus 12 packing up to 16GB of RAM, you might be wondering what happened to those 24GB of RAM rumors. The thing is, that doesn’t stop OnePlus from including on-device AI, and there’s a reason for that.

As Whealton pointed out to me, when you control the DRAM, there’s nothing stopping you from segmenting the RAM so the system can’t access all of it. In theory, OnePlus could offer 16GB of RAM for general use plus an additional 8GB of RAM that’s only used for AI. In that case, it wouldn’t make sense to advertise it as part of the total RAM figure, since it’s inaccessible to the rest of the system. What’s more, that amount of RAM would very likely remain static even in the 8GB or 12GB RAM configurations, since the needs of the AI won’t change.
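The arithmetic of such a carve-out is straightforward; the fixed 8GB reservation below simply mirrors the rumored scenario above and is not a confirmed OnePlus configuration.

```python
# Illustration of a static AI carve-out subtracted from what the rest of the system sees.

AI_CARVEOUT_GB = 8  # hypothetical fixed reservation, per the scenario above

for installed_gb in (16, 20, 24):
    visible_gb = installed_gb - AI_CARVEOUT_GB
    print(f"{installed_gb} GB installed -> {visible_gb} GB advertised and usable by apps")
```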

In other words, it’s not out of the question that the OnePlus 12 still has 24GB of RAM; it’s just that 8GB of it may not be accessible in the traditional sense. Leaks like these, arriving as early as they do, typically come from people who may be involved in the actual manufacturing of the device, so it could be that they worked with 24GB of RAM and weren’t aware that 8GB of it might be reserved for very specific purposes. This is pure conjecture on my part, though, and an attempt to make sense of the leaks in a way where both Digital Chat Station and OnLeaks could be right.

Either way, 24GB of RAM is a crazy amount in a smartphone, and with features like these being introduced, it has never been clearer that smartphones are just super-powerful computers that are only going to get more powerful.

Image Source : www.xda-developers.com
