AI baked into silicon: What hardwired models mean for the world
A startup just etched an AI model directly into silicon. Here's what that means – for business, for safety, and for the devices around us.
Something strange happened in late February 2026. A small Canadian startup called Taalas unveiled a chip that generates 17,000 tokens per second. For context: a fast human reader processes roughly 5 tokens per second. Nvidia’s H200 GPU manages around 230. The super-fast Groq chip produces around 600. Taalas claims to be much faster than that.
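To put those figures side by side, here is a quick back-of-the-envelope calculation. It uses only the token rates quoted above; the 1,000-token answer length is an assumption picked for illustration.

```python
# Token rates quoted in the text, in tokens per second.
RATES = {
    "human reader": 5,
    "Nvidia H200": 230,
    "Groq": 600,
    "Taalas": 17_000,
}


def speedup(rates, faster, slower):
    """How many times more tokens per second `faster` produces than `slower`."""
    return rates[faster] / rates[slower]


def seconds_for(rates, name, tokens):
    """Time needed to generate `tokens` tokens at the quoted rate."""
    return tokens / rates[name]


ANSWER_TOKENS = 1_000  # assumed answer length, for illustration only

for name in RATES:
    if name != "Taalas":
        print(f"Taalas vs {name}: {speedup(RATES, 'Taalas', name):.1f}x")

for name in RATES:
    print(f"{name}: {seconds_for(RATES, name, ANSWER_TOKENS):.2f} s per answer")
```

On these numbers, the claimed rate is roughly 74 times the H200 figure and roughly 28 times the Groq figure.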
The reason is radical: They didn’t build a chip that runs an AI model. They built a chip that is one.
This blog post explains what Taalas did, what it means for AI business, and what AI-on-a-chip could mean in a wider perspective.
Baking a brain into silicon
To understand what Taalas did, you need to back up a bit.
Large language models – powering chatbots, AI agents and more – consist of billions of numbers representing connections between artificial neurons. When the models are used, these numbers are multiplied and added according to certain patterns, to produce the next token – the AI equivalent of the molecules making up words, sentences and messages.
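As a rough illustration of "multiplied and added", here is a toy sketch of the very last step of producing a token. The three-word vocabulary, the weights and the input vector are all made up, and a real model stacks many such steps with billions of weights.

```python
def matvec(matrix, vector):
    """Multiply each row of weights with the input and add up the products."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def next_token(weights, hidden_state, vocabulary):
    """Score every word in the vocabulary and return the highest-scoring one."""
    scores = matvec(weights, hidden_state)
    best = max(range(len(scores)), key=scores.__getitem__)
    return vocabulary[best]


# Hypothetical toy values: one row of weights per word in the vocabulary.
vocabulary = ["cat", "dog", "fish"]
weights = [
    [0.2, -0.1],
    [0.7, 0.3],
    [-0.5, 0.9],
]
hidden_state = [1.0, 0.5]

print(next_token(weights, hidden_state, vocabulary))
```

Every one of those multiplications needs its weight to be available at the moment it is used, which is where the memory traffic comes from.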
Producing these tokens requires huge amounts of computation, combined with fetching the relevant ones and zeroes from memory. For every token. It consumes enormous amounts of energy and time, and this is why GPUs – computer chips built to do these operations quickly – have come into such high demand lately.
What Taalas has done is to take the model weights – the connections between the artificial neurons – and etch them into a computer chip. Since compute and storage are unified in the same physical structure, the data never has to travel anywhere. The computations are done by the weights themselves, not by a separate chip that has to read the weights. Electrons flow through the pattern, and the answer emerges.
The result is a chip that runs exactly one model – Llama 3.1 8B, in Taalas’s case – and runs it extraordinarily fast, at extraordinarily low power. No exotic cooling. No high-bandwidth memory stacks. Just silicon and physics.
To make this work, Taalas had to heavily compress the language model. Instead of using 8 bits (ones and zeroes), they use a mix of 3 and 6 bits for each weight – effectively rounding each number pretty harshly. Some quality is lost. But for many real-world tasks, it’s more than good enough – and what you get in return is speed and efficiency that nothing else can match.
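To get a feel for what "rounding harshly" does, here is a minimal sketch of a generic technique, symmetric uniform quantization. This post does not describe how Taalas actually compresses the weights, so the function and the sample weights below are assumptions for illustration, not their scheme.

```python
def quantize(weights, bits):
    """Round each weight to the nearest of a small set of evenly spaced levels.

    With `bits` bits there are 2**(bits - 1) - 1 levels on each side of zero,
    so 3 bits gives -3..3 and 6 bits gives -31..31 (times a shared scale).
    """
    levels = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / levels
    if scale == 0:
        return list(weights)
    return [round(w / scale) * scale for w in weights]


def mean_abs_error(original, rounded):
    """Average distance between the original and the rounded weights."""
    return sum(abs(a - b) for a, b in zip(original, rounded)) / len(original)


# Hypothetical sample weights.
weights = [0.91, -0.42, 0.07, -0.88, 0.33, 0.15, -0.61, 0.5]

for bits in (3, 6):
    error = mean_abs_error(weights, quantize(weights, bits))
    print(f"{bits} bits: mean rounding error {error:.4f}")
```

Fewer bits means fewer levels to round to, so the 3-bit version drifts further from the original weights than the 6-bit one. That drift is the quality you trade away for a smaller, faster chip.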
You can try it yourself at chatjimmy.ai. The speed is immediately apparent.
The perfect solution for AI wearables
Taalas’s HC1 chip was built by 24 engineers for $30 million in product costs. It runs the AI model at a fraction of the cost of GPU-based alternatives, using one-tenth of the power. That is the kind of gain you want if you are putting AI on wearable or always-on devices.
As if that wasn’t enough, they also have an elegant manufacturing trick: To produce a chip for another AI model, Taalas only needs to modify two of the approximately 100 metal layers in the chip. The rest of the chip stays identical. That means a new model can be deployed in silicon in roughly two months. Normally, custom silicon would take years.
This makes hardwired AI chips economically viable not just for hyperscalers, but potentially for any company that has settled on an AI model they want to use in high volumes. And it opens a very obvious strategic play: If you’re OpenAI building a wearable AI device with Jony Ive (or Meta iterating on AI glasses), on-device inference via hardwired chips means longer battery life, lower latency, and no dependency on a network connection.
Taalas is likely to be acquired. The only question is by whom.
Shipping wearable devices with a hard-coded AI model of course means that you can’t update the model later on. Consumers are stuck with the model included when they bought the gadget. For the producer, this is not a problem. It’s a reason for people to buy a new and better device the next year, and the year after that.
A world where AI is everywhere
When AI inference becomes cheap enough, small enough, and power-efficient enough, it stops being a cloud service and starts being an ingredient. 30–40 years ago, LCD displays were expensive. Now they’re used to mark up prices in the supermarket. AI-on-a-chip could have the same effect on language models.
One application is interfaces. Today, most devices – appliances, medical equipment, industrial machinery – communicate through menus, buttons, and cryptic error codes. That’s not because it’s the best way, but because natural language processing hasn’t been available to embed.
Remove those constraints, and the design space changes completely. A blood pressure monitor that discusses your results and asks follow-up questions. A hearing aid that doesn’t just amplify sound but interprets it. Factory equipment that explains its own failures in plain language to a technician. Agricultural sensors that reason locally about soil, weather history, and crop condition – without sending anything to the cloud. Maybe, just maybe, even stove controls that are intuitive.
The important shift isn’t that these devices become smarter. It’s that intelligence stops being a destination you visit (via a chatbot interface) and becomes infrastructure you inhabit.
This is also, incidentally, a privacy argument. On-chip inference means data never leaves the device. For medical applications, for government systems, for anyone who’s uncomfortable with sensitive conversations routing through a data center in another country – local, offline AI is a meaningful alternative.
Safety implications of frozen intelligence
Etching an AI model into a chip has some complicated implications for safety and security.
A hardwired AI chip shares many of the security properties of open-weight models – the kind anyone can download and run locally. No central oversight. No usage logging. No ability to revoke access. If the model has a flaw, a bias, or a dangerous capability, the manufacturer cannot push an update. The chip is frozen at the moment of manufacture.
But there are two important differences between open-weight models and AI-on-a-chip. The first is that open-weight models can be modified, for example with fine-tuning that tailors the model behaviour towards particular use cases – or removes guardrails. The current chip from Taalas actually has similar functionality, with on-chip memory that can insert tweaks to the model weights. Hacking that memory to remove guardrails, or to make other unauthorized changes, is more difficult than fine-tuning an open-weight model, but not impossible for an actor with resources. And once such a method exists, it will spread.
The second important difference is that chips can’t be copied the way software can. An open-weight model can be downloaded, duplicated, and distributed to millions of people. Anyone who wants to misuse an AI on a chip needs to have such a chip, physically, not just make a copy from someone who has one.
This difference matters most at the extremes. The smallest, cheapest chips – the ones that will proliferate most widely – are also the least capable. A low-capacity model baked into a cheap consumer device can probably be made to swear and be rude, but it’s unlikely to assist someone trying to synthesize a dangerous pathogen in any meaningful way.
A potentially more worrying case is disinformation. Even small and cheap chip-based models are likely to be capable of social manipulation and disinformation. A malicious actor could buy a few thousand cheap devices, hack them using the on-chip memory or by old-fashioned jailbreaks, and have the equivalent of an entire datacenter for disinformation campaigns. A datacenter that is cheap to run and maybe even portable.
This implies that it becomes crucial to have thorough safety testing before a model is etched into millions of chips, not after. In a low-regulation, high-proliferation scenario, that window is likely to close very fast. And, of course, there’s the problem that the methods for safety testing we have today are far from comprehensive.
A geopolitical footnote
AI-on-a-chip allows for much faster use of AI models. Will this change the balance in the AI race between the US and China?
There are certainly implications I have not yet seen, but I don’t think this technology changes the balance significantly.
Today, the US has the huge advantage of having the best chips for training and using AI models. The Taalas chips are created using manufacturing techniques that are a few generations old, which means that it is not unthinkable that China could produce AI-on-a-chip. These could, in theory, allow for scaling up the use of AI models in China. This would free up capacity in their existing datacenters to train new and better AI models.
But only in theory.
Investing in thousands or millions of chips with a hard-coded AI model is a sensible move if that model is useful for several years. Today, new frontier models are released every six months or so. The capacity for cranking out AI-on-a-chip would be better spent on creating chips that are more flexible.
AI-on-a-chip makes sense for creating consumer products, not pushing the frontier.
What comes next?
I’d like to finish up with some thoughts on what to keep an eye on.
Who acquires Taalas?
Meta, OpenAI, SpaceX, Apple and probably Google all have strong strategic reasons to own this technology. The acquirer’s model library becomes the default intelligence in a generation of devices.
(Anthropic doesn’t have as much reason to acquire Taalas, since they don’t aim for any consumer devices and have a pure AI-as-a-service business strategy. Taalas chips might become interesting for Anthropic if they decide to launch an AI model with long-term support – but that seems unlikely as long as we’re on an exponential.)
Strong AI models in ternary quantization
Taalas’s current chip uses 3 and 6 bits for each model weight. It is possible to compress models into a “ternary” format, where each model weight is represented by +1, 0 or –1. If such a model were etched into silicon, it would give two huge advantages. The first is that more model weights would fit on the chip, since each would require fewer transistors. The second is that calculations would speed up radically, since all multiplications would be reduced to addition or subtraction.
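The second advantage is easy to see in a small sketch. With weights limited to +1, 0 and -1, a weighted sum needs no multiplications at all. The function names and the sample values are made up for illustration.

```python
def ternary_dot(weights, inputs):
    """Weighted sum with ternary weights, using only addition and subtraction."""
    total = 0.0
    for w, x in zip(weights, inputs):
        if w == 1:
            total += x  # a weight of +1 adds the input
        elif w == -1:
            total -= x  # a weight of -1 subtracts it
        # a weight of 0 skips the input entirely
    return total


def ordinary_dot(weights, inputs):
    """The same weighted sum done the usual way, with multiplications."""
    return sum(w * x for w, x in zip(weights, inputs))


# Hypothetical sample values.
weights = [1, 0, -1, 1]
inputs = [0.5, 2.0, 1.5, -3.0]

print(ternary_dot(weights, inputs))
print(ordinary_dot(weights, inputs))
```

Both functions give the same result here. The ternary version just gets there with simpler operations, which in hardware translates into fewer transistors per weight.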
Truncating an existing AI model to a ternary format drastically reduces its performance, but if a model is trained from scratch in a ternary format, its capability is remarkably close to that of a model with high-precision weights. Such training could probably be done using so-called distillation, where a larger teacher model is used to train a smaller model.
This means that any frontier model provider could create a strong AI model in ternary format. If OpenAI, Google, Anthropic, or possibly SpaceX acquires Taalas, or starts creating AI-on-a-chip themselves, we should expect that the gap between “what fits on a chip” and “what’s genuinely useful” could close fast.
Devices around us are probably about to get a lot more capable. Not because they connect to something smarter, but because the intelligence moves in with them and becomes as common as LCD displays. That shift probably deserves more attention than it’s getting.



Another thought occurred to me two days later: AI-on-a-chip is a way to export AI power that gives the buyer some kind of strategic autonomy, without the seller giving away too much power.
Example: Country X buys one million chips with an LLM adapted to fit the country's society, culture and language. This gives strategic autonomy, since country X can rely on the chips for running the necessary infrastructure without being dependent on external suppliers. But the chips can't be used to run other models (for example models built for military purposes).
If AI capabilities continue to advance, the chips will age quickly. That would probably make them uninteresting for frontline commercial R&D. But they could still be very useful for managing basic administrative work for public services, assisting in health care, and generally keeping the basics of society running.