There are two phases in an AI model's life: training (the one-time, hugely expensive process of building it) and inference (every time it's actually used to produce an answer). When you send a prompt and watch the reply stream back, that's inference. It's the part you pay for and wait on, over and over.

Most professionals never train anything — they only ever do inference, through a tool or an API.

Why it matters at your desk. Inference is where the two costs you feel — money and time — come from. Every answer consumes compute measured in tokens, which is why pricing is per-token and why a verbose request costs more than a tight one. It's also why speed varies: a reasoning model "thinking" longer is doing more inference, trading latency for quality. Launches like Opus 4.8 advertise faster, cheaper inference and a speed dial precisely because, at scale, inference cost is the bill. For a freelancer on metered API access through a tool like Claude Projects, that bill is yours.

What to watch for: "free" consumer tiers still cost the provider inference money, which is why they come with rate limits and usage caps — when you hit one, you're bumping into the real cost of running the model.