OpenAI and Broadcom unveil Jalapeño, a custom chip built for LLM inference

OpenAI and Broadcom unveiled Jalapeño on June 24, 2026 — OpenAI's first custom AI chip, purpose-built for LLM inference. It is the opening move in a multi-generation compute platform the two companies are building together, and it puts OpenAI formally in the same category as Google, Microsoft, and Amazon: AI companies that design their own silicon.

The physical chip was handed to OpenAI CEO Sam Altman and President Greg Brockman by Broadcom President and CEO Hock Tan and Charlie Kawwas, president of Broadcom's Semiconductor Solutions Group, in a ceremony that underscored how significant both companies consider the milestone.

Nine months from blank sheet to tape-out

The standout figure is the development timeline: nine months from initial design to tape-out, the point at which the finished design is sent to the foundry for manufacturing. That is unusually fast for a high-performance ASIC; a chip of this complexity typically takes two to three years. Broadcom, TSMC (fabrication), and Celestica (board, rack, and system design) delivered what the announcement describes as potentially the fastest ASIC development cycle ever achieved in high-performance advanced semiconductors.

Engineering samples of Jalapeño are already running ML workloads in OpenAI's labs — including GPT-5.3-Codex-Spark — at production target frequency and power.

Built for inference, not training

Jalapeño is an ASIC: a chip designed to do one thing exceptionally well rather than a general-purpose accelerator adapted to AI workloads. The target is inference — running trained models in response to user requests — not the training process itself.

The architecture directly addresses the bottleneck that limits GPU efficiency on LLM inference: memory bandwidth and data movement. Nvidia GPUs are tuned to serve a broad range of workloads; Jalapeño is optimized specifically for the compute-to-memory balance, networking efficiency, and scheduling patterns that large language models demand. According to OpenAI, the result is substantially better performance per watt on this workload than current state-of-the-art hardware.
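Why memory bandwidth rather than raw compute? During token-by-token decoding, every model weight has to be streamed from memory at each generation step, yet each weight contributes only about two floating-point operations per sequence in the batch. A simple roofline calculation makes that concrete. The Python sketch below assumes a dense model with 16-bit weights and ignores KV-cache and activation traffic; the function names and the peak-compute and bandwidth figures are hypothetical placeholders, not Jalapeño or Nvidia specifications.

```python
# Back-of-envelope roofline check: is LLM decoding compute-bound or memory-bound?
# All hardware numbers are HYPOTHETICAL placeholders, not Jalapeño or Nvidia specs.

def decode_step_intensity(batch_size: int, bytes_per_param: float = 2.0) -> float:
    """FLOPs per byte of weight traffic for one decode step of a dense model.

    Each parameter contributes ~2 FLOPs (multiply + add) per sequence in the batch,
    while the weights are streamed from memory once per step regardless of batch size.
    KV-cache and activation traffic are ignored for simplicity.
    """
    flops_per_param = 2.0 * batch_size
    return flops_per_param / bytes_per_param


def attainable_tflops(intensity: float, peak_tflops: float, mem_bw_tbps: float) -> float:
    """Classic roofline: min(peak compute, memory bandwidth * arithmetic intensity).

    With bandwidth in TB/s and intensity in FLOP/byte, the product is in TFLOP/s.
    """
    return min(peak_tflops, mem_bw_tbps * intensity)


if __name__ == "__main__":
    PEAK_TFLOPS = 1000.0  # hypothetical dense 16-bit peak compute, TFLOP/s
    MEM_BW_TBPS = 8.0     # hypothetical memory bandwidth, TB/s
    ridge = PEAK_TFLOPS / MEM_BW_TBPS  # intensity at which compute becomes the limit
    for batch in (1, 8, 64, 256):
        ai = decode_step_intensity(batch)
        perf = attainable_tflops(ai, PEAK_TFLOPS, MEM_BW_TBPS)
        bound = "memory" if ai < ridge else "compute"
        print(f"batch={batch:>3}  intensity={ai:6.1f} FLOP/B  "
              f"attainable={perf:7.1f} TFLOP/s  ({bound}-bound)")
```

With these placeholder figures the ridge point is 125 FLOPs per byte, so a single-sequence decode step reaches under 1% of peak compute and even a batch of 64 stays memory-bound. That gap between available compute and usable compute is what a chip balanced specifically for inference aims to narrow.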

The early performance claim: roughly 50% lower inference cost per token compared to current-generation Nvidia GPUs.
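The announcement gives only a relative figure. Cost per token folds together what an accelerator costs to run per hour (amortized hardware plus power) and how many tokens it generates in that hour, so a 50% reduction can come from either side. The Python sketch below illustrates that arithmetic; the function name and all dollar and throughput values are hypothetical placeholders, not OpenAI or Nvidia figures.

```python
# Cost-per-token arithmetic. All dollar and throughput values are HYPOTHETICAL
# placeholders; the source reports only the ~50% relative improvement.

def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Cost of generating 1M tokens on one accelerator.

    hourly_cost_usd: amortized hardware cost plus power for one accelerator-hour.
    tokens_per_second: sustained generation throughput of that accelerator.
    """
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    baseline = cost_per_million_tokens(hourly_cost_usd=4.00, tokens_per_second=2_000)
    scenarios = {
        "baseline": baseline,
        # Same throughput, half the hourly cost (e.g. cheaper silicon or lower power draw).
        "half the hourly cost": cost_per_million_tokens(2.00, 2_000),
        # Same hourly cost, twice the throughput (e.g. better memory-bandwidth utilization).
        "twice the throughput": cost_per_million_tokens(4.00, 4_000),
    }
    for name, cost in scenarios.items():
        print(f"{name:<22} ${cost:.3f} per 1M tokens ({cost / baseline:.0%} of baseline)")
```

Both scenarios land at half the baseline, which is why a per-token figure on its own does not reveal whether the savings come from cheaper hardware, lower power draw, or higher throughput.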

OpenAI's models helped design the chip

One detail that threads through the announcement: OpenAI used its own AI models to accelerate parts of the chip design and optimization process. The company has been applying its models to scientific and engineering work more aggressively over the past year; hardware design is now on that list.

The 10-gigawatt commitment

Jalapeño is described as the first chip in a multi-generation compute platform — not a one-off experiment. The broader deal between OpenAI and Broadcom calls for deploying OpenAI-designed accelerators at gigawatt scale in data centers built with Microsoft and other partners, with a commitment to reach 10 gigawatts of capacity through 2029.

Hock Tan framed the scale plainly: "Our collaboration with OpenAI represents a fundamental commitment to scaling the physical infrastructure required for the next decade of AI."

The initial deployment is targeted for the end of 2026, with the full production ramp in 2027 and expansion through 2028–2029.

The Nvidia angle

Nvidia has supplied the vast majority of AI compute since the large-language-model era began. OpenAI has been one of its largest customers. Jalapeño is explicitly designed for the fastest-growing segment of AI demand — inference — and early performance comparisons are made directly against Nvidia's Blackwell chips and Google's TPUs.

That does not mean OpenAI is immediately displacing Nvidia for training workloads. Jalapeño is inference-only, and training at frontier scale still runs on Nvidia hardware. But inference is where volume is growing fastest as AI products scale to millions of users, and controlling that cost at the chip level changes OpenAI's unit economics significantly.

What "full stack" means here

Observers have drawn comparisons to Apple's transition to custom silicon: a company that sells experiences at the top of the stack quietly takes control of the hardware underneath, gaining cost efficiency, integration, and a competitive moat that third-party hardware vendors cannot easily replicate.

OpenAI already builds the models and the products. With Jalapeño, it is now also in the business of chips, kernels, networking, scheduling, and deployment systems. The company calls this strategy "building the full stack."

What it means for professionals

For most people using ChatGPT or tools built on OpenAI's API, Jalapeño will be invisible; it runs underneath the interface. But its effects could compound: cheaper inference may translate into faster price cuts for API access, lower costs for high-volume products, and more aggressive deployment of inference-heavy features such as real-time reasoning and voice at scale.

The bigger implication is long-term. A major AI lab that designs its own chips is less exposed to the GPU allocation constraints faced by labs that depend on Nvidia, although it still relies on TSMC's fabrication capacity. If Jalapeño performs as advertised and the multi-generation roadmap stays on track, OpenAI's infrastructure position could shift meaningfully by 2028.

Sources: OpenAI · Broadcom · CNBC · Tom's Hardware · VentureBeat