Llama 4 News
Meta held a conference call last Wednesday, and while there were no major announcements, Mark Zuckerberg had some interesting things to say about the upcoming Llama 4 models:
We’re training the Llama 4 models on a cluster that is bigger than 100k H100s or bigger than anything that I’ve seen reported for what others are doing. I expect that the smaller Llama 4 models will be ready first, and they’ll be ready, we expect sometime early next year, and I think that they’re going to be a big deal on several fronts — new modalities, capabilities, stronger reasoning, and much faster. It seems pretty clear to me that open source will be the most cost-effective, customizable, trustworthy, performant, and easiest-to-use option that is available to developers, and I’m proud that Llama is leading the way on this.
I’m particularly interested in the “easiest-to-use” claim. I still haven’t successfully deployed a Llama Stack server, so I access Llama using HuggingFace’s Transformers library. If there were a way to perform inference with plain PyTorch, I’d use it in a heartbeat.
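For what it’s worth, here’s a minimal sketch of how I access Llama through Transformers. The model ID and prompt are just examples, and loading a Llama checkpoint assumes you’ve been granted access to the gated model on Hugging Face:

```python
import torch
from transformers import pipeline

# Example model ID -- substitute any Llama checkpoint you have access to.
model_id = "meta-llama/Llama-3.1-8B-Instruct"

# Transformers builds a text-generation pipeline that runs the model
# through PyTorch under the hood. device_map="auto" needs the
# accelerate package installed.
generator = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Chat-style input: a list of role/content messages.
messages = [{"role": "user", "content": "What is the capital of France?"}]
result = generator(messages, max_new_tokens=64)

# The pipeline returns the full conversation; the last message is the reply.
print(result[0]["generated_text"][-1]["content"])
```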
It still hurts to see that Nvidia’s CUDA has become the dominant language for GPU computing. I spent years writing a book and a blog about OpenCL, and I gave a handful of presentations on the topic. But now OpenCL is as dead as ActionScript, and that’s why Nvidia is in the Dow Jones Industrial Average instead of Intel.