How to Validate an AI Product Idea Before Paying for OpenAI Tokens
Validate an AI product idea before paying €0.01 in inference costs — landing pages, fake doors, pre-sales. The pre-token validation playbook.
OpenAI does not care whether your AI product has buyers. Anthropic does not care. Google does not care. They charge by the token either way — and the meter starts the second your first user clicks "generate."
We've watched founders rack up a €1,400 inference bill in three weeks on a product that turned out to have no audience at all. The bill arrived. The traction did not. The model providers got paid; the founder got a lesson.
The fix is structural: validate the wedge with a landing page and paid ads BEFORE writing any prompt code, BEFORE paying any inference cost. €0 in tokens. €200 in ads. 14 days to a real signal. Token spend follows paying customers — never precedes them.
The inference bill that does not tell you anything
Here is the trap. You ship an AI product. Some traffic shows up — Product Hunt, a Reddit post, a tweet. People click "generate" out of curiosity. Your token meter ticks up.
At the end of the month, you have a €380 OpenAI invoice and 2,100 generations. It looks like usage. It looks like signal.
It is neither. It is curiosity-clicks plus your own debugging plus three friends. None of those people are going to pay you. The bill tells you exactly one thing: that the inference pipeline runs. It tells you nothing about whether anyone wants the product.
We've seen this pattern enough times to name it. Token spend is correlated with usage, not with demand. The two diverge most aggressively in the AI category, because curiosity-clicks are unusually high: "AI" in a hero headline pulls 2–3x the click rate of equivalent non-AI categories. Most of those clicks are tourists.
What to validate before tokens
The thing that has to clear before you write a line of LLM glue is the wedge. The wedge is three things stacked: a bounded audience, a bounded workflow, and a bounded outcome. If any of the three is fuzzy, the test will not save you.
A wedge that's tight on all three:
"For sales-ops teams at SaaS companies with 50–500 employees, this writes a 600-word account-research brief from a domain plus 12 input fields, in under 90 seconds, in a tone you can ship to your AE."
Bounded audience. Bounded workflow. Bounded outcome. Why pasting the task straight into ChatGPT doesn't solve it is implicit in the wedge: 12 fields, repeatable, reviewed for tone, embedded in a CRM workflow.
A wedge that won't survive the test: "AI for sales people." Three axes, three blanks. The market cannot tell you anything because you didn't ask anything specific.
The wedge has to convert on a landing page — paid traffic from the channel where that audience lives, costly CTA, AI-adjusted threshold — before any prompt is written. We covered the general framing in our piece on validating ChatGPT wrappers; this article is the AI-specific operations playbook on top of it.
The AI-adjusted playbook (curiosity-clicks change the math)
Standard pre-launch landing-page thresholds — 5% for a free CTA, 1.5% for a paid deposit — are not safe in the AI category. The audience clicks on anything labeled "AI" out of general interest, which inflates the top of funnel without inflating intent.
We adjust upward. The numbers we work to:
- Free waitlist / email capture: aim for 6%+ instead of the standard 5%
- Paid deposit (€5+ reservation): aim for 2%+ instead of the standard 1.5%
- B2B "book a 15-min demo" CTA: aim for 2.5%+ instead of 2%
- Design-partner slot (B2B, scarcity hook): aim for 3%+
The adjustment exists for one reason: in non-AI categories, a click usually implies the visitor recognised the problem. In the AI category, a click often only means the visitor recognised the letters "AI." The higher threshold filters tourists out of the signal.
If you only clear the standard threshold but miss the AI-adjusted one, treat that as a yellow light, not a green one. You may be converting curiosity, not intent.
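If you want the kill criterion written down before the test starts, the threshold logic above fits in a few lines. A minimal Python sketch, not a tool we ship: the CTA keys and the `classify_conversion` helper are hypothetical names, and the percentages are the ones listed above. The article gives no standard threshold for design-partner slots, so that CTA can only come back green or red.

```python
from typing import Optional

# Thresholds in basis points (1% = 100 bp), taken from the list above.
# Each entry: (standard threshold, AI-adjusted threshold).
# No standard threshold is given for design-partner slots, hence None.
THRESHOLDS_BP: dict[str, tuple[Optional[int], int]] = {
    "free_waitlist": (500, 600),    # 5% standard, 6%+ AI-adjusted
    "paid_deposit": (150, 200),     # 1.5% standard, 2%+ AI-adjusted
    "demo": (200, 250),             # 2% standard, 2.5%+ AI-adjusted
    "design_partner": (None, 300),  # 3%+ AI-adjusted only
}


def clears(conversions: int, visitors: int, threshold_bp: int) -> bool:
    """True if conversions / visitors >= threshold_bp / 10_000.

    Cross-multiplying keeps the comparison in integers, so a rate sitting
    exactly on a threshold is not misjudged by floating-point rounding.
    """
    return conversions * 10_000 >= threshold_bp * visitors


def classify_conversion(conversions: int, visitors: int, cta: str) -> str:
    """Return 'green', 'yellow' or 'red' for one landing-page test."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    standard_bp, ai_bp = THRESHOLDS_BP[cta]
    if clears(conversions, visitors, ai_bp):
        return "green"   # clears the AI-adjusted bar: likely intent
    if standard_bp is not None and clears(conversions, visitors, standard_bp):
        return "yellow"  # clears the standard bar only: may be curiosity
    return "red"         # misses both: pivot the wedge or kill the idea


if __name__ == "__main__":
    print(classify_conversion(7, 184, "demo"))          # 3.8% against a 2.5% bar
    print(classify_conversion(3, 184, "paid_deposit"))  # 1.6%: between 1.5% and 2%
```

Fixing the table before launch matters more than the code itself: once the numbers are written down, a 1.6% deposit rate cannot quietly be reinterpreted as a win.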
Fake-door techniques that work in AI specifically
The AI category lends itself to a few fake-door techniques that work well because the visitor expects a frontier-tech vibe — they're primed to accept "join the early access" or "we're onboarding the first cohort."
The "Try this AI feature" button. The hero has a button that says Try it or Generate. The visitor clicks. They land on a modal that says "We're building this. Join the waitlist for early access — we're onboarding the first 50 design partners next month." Capture rate from button-click to email is itself a signal: above 40% is strong, below 20% is weak. The button click told you the headline worked; the modal capture tells you the offer worked.
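The two numbers in this fake-door funnel answer different questions, so it helps to compute them separately. A small Python sketch, assuming you log visitors, button clicks and captured emails yourself; `headline_click_rate` and `offer_signal` are hypothetical helpers, and the 40% / 20% cut-offs are the rule of thumb above. No verdict is given for rates in between, so the sketch reports those as inconclusive rather than guessing.

```python
def headline_click_rate(visitors: int, button_clicks: int) -> float:
    """Share of visitors who clicked the hero button: did the headline work?"""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return button_clicks / visitors


def modal_capture_rate(button_clicks: int, emails: int) -> float:
    """Share of button clicks that ended with an email in the waitlist modal."""
    if button_clicks <= 0:
        raise ValueError("button_clicks must be positive")
    if not 0 <= emails <= button_clicks:
        raise ValueError("emails must be between 0 and button_clicks")
    return emails / button_clicks


def offer_signal(button_clicks: int, emails: int) -> str:
    """Read the modal capture rate against the 40% / 20% rule of thumb."""
    rate = modal_capture_rate(button_clicks, emails)
    if rate > 0.40:
        return "strong"
    if rate < 0.20:
        return "weak"
    return "inconclusive"  # the rule of thumb says nothing about 20-40%


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 visitors, 120 button clicks, 54 emails.
    print(f"headline click rate: {headline_click_rate(1000, 120):.1%}")
    print(f"offer signal: {offer_signal(120, 54)}")
```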
The mock-screenshot landing page. A high-fidelity image of the product UI doing the thing. Not a video, not a demo — a single high-resolution screenshot embedded in the landing page like the product is shipped. Founders we've worked with run this with the explicit copy "preview" or "coming soon — reserve a spot." The screenshot answers the "what does it actually do" question without you having to write any prompt code at all.
The Wizard-of-Oz demo video. A 30-second screencast of you, the founder, manually running the workflow and producing the output. You generate the brief in ChatGPT, paste it into a Notion page, screenshot it. The video shows the output, not the engineering. Visitors don't know, and don't need to know, that the "product" in the demo is you with three browser tabs open. What they're responding to is the outcome, which is exactly what you want them to respond to.
Each of these techniques is honest in spirit: you're not promising a product that exists, you're promising a product you'll build if enough people want it. The waitlist or deposit makes that clear. None of them require a single token of inference.
When to actually start spending on tokens
Two gates. Both have to clear.
Gate 1: the wedge converts above the AI-adjusted threshold on a paid-traffic test. €150–€200 in ads, the right channel, the costly CTA. Real strangers, real money on the line. If it doesn't clear, you don't spend on tokens. You pivot the wedge or kill the idea.
Gate 2: you fulfil the first 5–10 requests manually. This is the under-discussed gate. Even when the wedge converts, the AI piece may not actually deliver the outcome you promised. The only way to find out without burning thousands in inference is to do the work by hand for the first cohort — ChatGPT in one tab, your editing brain in another, output delivered as a Google Doc.
If the manual fulfilment takes you 25 minutes per request and the customer is ecstatic, you have a real product to automate. If the manual fulfilment takes you 2 hours and the output is mediocre even with you in the loop, the LLM-only version will be worse, not better. Token spend won't fix that.
Only after both gates clear — wedge converts, manual fulfilment delivers — is it worth wiring up the real inference pipeline. At that point you know the offer works AND the technology can deliver, AND you have 5–10 paid customers funding the build.
The cost trajectory
The numbers we run to look like this:
- Validation phase (14 days): €0 in tokens. €150–€200 in ads. €0–€20 in tools (landing page builder, email capture). The output is conversion data + qualitative comments + a kill / proceed decision.
- Manual fulfilment phase (next 30 days): €50–€100 in tokens. This is you running ChatGPT or Claude by hand for the first 5–10 paying customers. The only token spend comes from your own hand-run sessions. The output is a fulfilment validation: can the AI actually deliver?
- Build phase (only after the first two clear): €500+ in tokens, then ramping with usage. By this point you have paying customers funding the inference. Token spend follows revenue rather than preceding it.
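Adding up the first two phases gives a ceiling on what you spend before any build work. A quick Python check using only the ranges above; the `Phase` type is our own illustration, and it assumes the manual fulfilment phase carries no ad or tool spend, since the list names only tokens for it. The build phase is open-ended, so it is left out of the sum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    tokens_eur: tuple[int, int]  # (low, high) estimate in euros
    ads_eur: tuple[int, int]
    tools_eur: tuple[int, int]

    def total(self) -> tuple[int, int]:
        """Low and high total for this phase across all cost lines."""
        low = self.tokens_eur[0] + self.ads_eur[0] + self.tools_eur[0]
        high = self.tokens_eur[1] + self.ads_eur[1] + self.tools_eur[1]
        return low, high


# Ranges copied from the list above. Assumption: the manual fulfilment
# phase has no ad or tool costs, because only token spend is listed for it.
PRE_BUILD = [
    Phase("validation (14 days)", tokens_eur=(0, 0), ads_eur=(150, 200), tools_eur=(0, 20)),
    Phase("manual fulfilment (30 days)", tokens_eur=(50, 100), ads_eur=(0, 0), tools_eur=(0, 0)),
]


def pre_build_range(phases: list[Phase]) -> tuple[int, int]:
    """Sum the low and high ends of every phase before the build."""
    lows, highs = zip(*(p.total() for p in phases))
    return sum(lows), sum(highs)


if __name__ == "__main__":
    low, high = pre_build_range(PRE_BUILD)
    print(f"Spend before the build: EUR {low}-{high}")
```

Under those assumptions the range comes out at €200–€320, and only €50–€100 of it is tokens.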
The shape of that trajectory is the entire point. Most AI founders run it in reverse: €500+ in tokens during build, €100 in ads at launch as an afterthought, €0 spent on validating the wedge. Then they wonder why the meter is running and the customers aren't. Inverting the order changes the economics.
A worked example: AI for sales-ops account research
A team we worked with last quarter wanted to build an AI tool that produces account-research briefs for sales-ops teams. Plausible. The two co-founders had spent four years in revenue operations at a B2B SaaS. They were ready to spend three months and a few thousand euros on prompt engineering.
Instead, they ran the playbook.
Wedge: "For sales-ops teams at SaaS companies, this turns 12 input fields into a publication-ready 600-word account-research brief in 90 seconds."
Landing page: a single hero, three benefits, a mock screenshot of the brief output, a CTA: "Book a 15-min demo — first 10 design partners get 6 months at €0."
Traffic: €217 on LinkedIn ads, targeting "Sales Operations" titles at SaaS companies with 50–500 employees in Western Europe.
Result over 11 days: 184 clicks, 7 demos booked, 4 paid design-partner slots taken at €99 each. Conversion to demo: 3.8% (7 of 184). Conversion to payment: 2.2% (4 of 184). Both clear the AI-adjusted thresholds for their CTA type: 2.5% for a demo booking, 2% for a paid commitment.
Token spend during validation: €0. No prompt code written.
Then they hit Gate 2. The 4 paying customers got their first briefs fulfilled manually: ChatGPT (GPT-4) plus the founders editing for tone and accuracy, delivered as a Google Doc within 24 hours. Total token cost across the four manual fulfilments: €38. Each brief took the founders 35 minutes, about half the time the customers said it took them internally, and three of the four customers asked when they could submit their second account.
Only at that point did the team start writing the actual product. They knew the wedge converted. They knew the LLM could deliver the output with a human in the loop. They had €396 in revenue funding the first month of inference.
Total spent before the build started: €217 in ads + €38 in tokens + €0 in product engineering = €255. Compare that to the original plan: three months of prompt work with zero customer feedback.
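The worked-example figures are easy to re-derive, which is a useful habit before you trust any dashboard. A short Python check with the numbers copied from the paragraphs above; the two threshold constants are the AI-adjusted bars for a demo booking and a paid deposit from earlier in the article.

```python
# Figures copied from the sales-ops worked example.
ad_spend_eur = 217
clicks = 184
demos_booked = 7
paid_slots = 4
price_per_slot_eur = 99
manual_token_cost_eur = 38

# AI-adjusted thresholds: B2B demo CTA and paid deposit.
DEMO_THRESHOLD = 0.025
PAID_THRESHOLD = 0.02

demo_rate = demos_booked / clicks
paid_rate = paid_slots / clicks
revenue_eur = paid_slots * price_per_slot_eur
# Product engineering spend was EUR 0, so only ads and tokens count.
spent_before_build_eur = ad_spend_eur + manual_token_cost_eur

print(f"demo conversion:    {demo_rate:.1%} (bar {DEMO_THRESHOLD:.1%})")
print(f"paid conversion:    {paid_rate:.1%} (bar {PAID_THRESHOLD:.1%})")
print(f"revenue:            EUR {revenue_eur}")
print(f"spent before build: EUR {spent_before_build_eur}")
```

Note that the €396 in design-partner revenue already exceeds the €255 spent to earn it, before a single line of prompt code exists.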
Why this order matters more in AI than anywhere else
Inference is one of very few categories where the cost of the product is variable, ongoing, and pegged to a third party's pricing. A regular SaaS that fails to find buyers pays roughly the same hosting bill either way. An AI product that fails to find buyers still pays for every curious click. The downside is asymmetric.
Combine that with two other AI-specific facts: the curiosity-click problem inflates top-of-funnel noise, and inference getting 30–100x cheaper since 2023 has flooded the category with undifferentiated wrappers. Both effects make "just ship and see" more expensive in expectation than it's ever been.
The pre-token playbook isn't a clever optimisation. It's the only sequence that doesn't pay the model providers to learn what you could have learned for €200.
How LemonPage fits
LemonPage exists for this exact loop. AI-specific landing pages, the costly-CTA patterns above, paid traffic to the right channel, the AI-adjusted thresholds saved next to the test, and a kill criterion you commit to before you launch. We built it because the friction of running this manually — Webflow + LinkedIn campaigns + analytics + a separate kill-criterion spreadsheet — is exactly the friction that lets AI founders skip validation and ship into the void.
Related reading: are ChatGPT wrappers still viable in 2026 · 11 new businesses that only became possible because AI got cheap · how to validate a startup idea in 2026.
The token meter starts the second your first user clicks generate. Make sure that user wants what you're selling before they click.