Building a car recognition application (pt. 1)


1 July 2026 · Piotr Duda, Damian Kołakowski, Wojciech Kedzierski

Implementing a car recognition app used to mean collecting images, labeling them, and training a model before you could ship anything. Now you point a phone at a car, send the photo to a multimodal LLM (Claude, Gemini, or GPT), and have a working prototype in an afternoon.

That speed is real, and so is the catch. Every scan is a billed API call, the price and the model are set by someone else, and when the provider has a bad day your app has one too. It is the cheapest way to start but an expensive way to grow. This post builds such a prototype and instruments it, so you can see what the convenience actually costs. In the next parts we will train a specialized model in an effort to replace the remote one.

CarScanner is our open-source iOS experimentation app. Point it at a car and it returns the make, model, color, approximate year, and a bounding box around the vehicle. It calls Gemini directly or through OpenRouter, both currently pointing at Gemini 2.5 Flash. OpenRouter is there so you can later swap in any model it supports without touching the Swift code.

The first version took an afternoon. Here is most of the interesting code (GeminiClient.swift):

```swift
func recognizeCar(image: UIImage) async throws -> CarRecognition {
    // Encode the photo as a base64 JPEG for the request body.
    let base64 = image.jpegData(compressionQuality: 0.8)!
        .base64EncodedString()

    let response = try await gemini.generateContent(
        systemInstruction: carRecognitionPrompt,
        parts: [.data(mimetype: "image/jpeg", base64)]
    )

    // The model replies with JSON text; decode it into the result type.
    return try JSONDecoder().decode(
        CarRecognition.self,
        from: response.text!.data(using: .utf8)!
    )
}
```

Gemini owns the model, the training data, and the inference pipeline. Your side of it is a prompt and a decoder.

What the API returns

The structured response includes make, model, color, year_range, body_style, and a confidence score from 0 to 1.

```json
{
  "make": "Toyota",
  "model": "Camry",
  "color": "Silver",
  "year_range": "2018-2021",
  "body_style": "Sedan",
  "confidence": 0.94
}
```

Every call is captured as a WildEdge inference event, so here is a real one, payload and all:
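On the Swift side, the JSON shape shown above maps onto a small `Codable` type. The sketch below is one way the decoding step could look without force-unwrapping. The field list simply mirrors the sample response, and `RecognitionError` and `decodeCarRecognition` are hypothetical names, so the actual `CarRecognition` type in CarScanner may differ.

```swift
import Foundation

// Hypothetical mirror of the sample JSON response; the real
// CarRecognition type in the app may carry more fields.
struct CarRecognition: Codable, Equatable {
    let make: String
    let model: String
    let color: String
    let yearRange: String   // decoded from "year_range"
    let bodyStyle: String   // decoded from "body_style"
    let confidence: Double  // self-reported by the model, 0 to 1
}

// Hypothetical error type for a missing or empty model reply.
enum RecognitionError: Error {
    case emptyResponse
}

// Decodes the model's text reply, throwing instead of crashing
// when the reply is missing or is not the expected JSON.
func decodeCarRecognition(from text: String?) throws -> CarRecognition {
    guard let text = text, !text.isEmpty else {
        throw RecognitionError.emptyResponse
    }
    let decoder = JSONDecoder()
    // Map snake_case JSON keys onto camelCase Swift properties.
    decoder.keyDecodingStrategy = .convertFromSnakeCase
    return try decoder.decode(CarRecognition.self, from: Data(text.utf8))
}
```

Throwing on an empty reply keeps a provider hiccup from turning into a crash, which matters once the app depends on a remote model for every scan.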

One caveat on that last field: confidence here is not a calibrated probability. It is a number you asked the model to produce, so treat it as the model's opinion of itself, not a measurement. For a prototype, that is plenty. The generalist copes with partial occlusion, bad lighting, odd angles, and models it was never specifically tuned for. You get broad coverage for free.

What the numbers look like after an afternoon of scanning

Before writing another line of product code, we instrumented the API calls with WildEdge. Every call becomes one inference event, carrying latency, confidence, cost, model version, and the full response (GeminiClient.swift):

```swift
import WildEdgeSDK

let gemini = WEGeminiClient(
    apiKey: apiKey,
    dsn: "https://...",
    modelName: "gemini-2.5-flash"
)
```

That swap is the whole instrumentation step. Every chart in this section is rendered straight from the resulting inference events in the WildEdge dashboard, not plotted by hand. After a couple dozen scans in a parking lot (23 inferences in one afternoon session), the WildEdge dashboard gave us our first real read on what the prototype actually does: 2.6s average latency, a 3.5s p95, a 0% error rate, and confidence that barely moves.

Latency

Average round-trip was 2.6 seconds, and p95 was 3.5 seconds, with the p95 trend wandering between 2.7 and 3.6 seconds across the session. For an app where you point your phone and expect an instant answer, 2.6 seconds already feels slow, and a 3.5-second tail is long enough that you start to wonder whether it crashed. A representative single scan took 2,368 ms end to end. These numbers are effectively a floor, measured under gigabit Wi-Fi with full signal. On a congested cell network, in a parking garage, or anywhere with a weak signal, real-world latency only goes up from here.
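For readers who want to reproduce the two headline latency figures from their own timings, here is a minimal sketch of a mean and a p95 over a list of round-trip times. It uses the nearest-rank percentile definition, which is an assumption: the post does not say which percentile method the WildEdge dashboard uses, and `LatencySummary`, `percentile`, and `summarize` are hypothetical names.

```swift
import Foundation

// Hypothetical summary type for a batch of round-trip timings.
struct LatencySummary {
    let mean: Double
    let p95: Double
}

// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
func percentile(_ samples: [Double], _ p: Double) -> Double? {
    guard !samples.isEmpty, p > 0, p <= 100 else { return nil }
    let sorted = samples.sorted()
    let rank = Int((p * Double(sorted.count) / 100).rounded(.up))
    return sorted[max(rank, 1) - 1]
}

// Returns nil for an empty batch rather than dividing by zero.
func summarize(_ latenciesMs: [Double]) -> LatencySummary? {
    guard let p95 = percentile(latenciesMs, 95) else { return nil }
    let mean = latenciesMs.reduce(0, +) / Double(latenciesMs.count)
    return LatencySummary(mean: mean, p95: p95)
}
```

With only a couple dozen samples, nearest-rank p95 is simply one of the slowest one or two scans, so a single stalled request can move it noticeably.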

Confidence

Self-reported confidence sat around 0.95 and barely moved. The top-1 trend held between 0.95 and 0.97 across the whole session, and WildEdge's confidence-drift index (PSI) came back at 0.0000, no drift at all. That flatness is the story. The model returns a high number on almost everything, including the harder cases (partial cars, odd angles, models it was never tuned for), which is exactly why the confidence field is the model's opinion of itself rather than a measurement. A verdict from the person holding the phone is worth more than any of these numbers.
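To make the PSI figure concrete, here is a sketch of the population stability index in its common textbook form: bin a baseline and a current set of confidence scores, then sum (actual share minus expected share) times the natural log of their ratio over all bins. The post does not describe how WildEdge bins scores or handles empty bins, so the equal-width bins on [0, 1], the `epsilon` floor, and the function names here are all assumptions.

```swift
import Foundation

// Counts values into equal-width bins over [0, 1].
// Out-of-range values are clamped into the first or last bin.
func histogram(_ values: [Double], bins: Int) -> [Int] {
    var counts = [Int](repeating: 0, count: bins)
    for value in values {
        let clamped = min(max(value, 0), 1)
        let index = min(Int(clamped * Double(bins)), bins - 1)
        counts[index] += 1
    }
    return counts
}

// Population stability index between two histograms that share
// the same bins. Empty bins are floored at epsilon so the
// logarithm stays finite (an assumption, not WildEdge's rule).
func psi(expected: [Int], actual: [Int], epsilon: Double = 1e-6) -> Double? {
    guard expected.count == actual.count, !expected.isEmpty else { return nil }
    let expectedTotal = Double(expected.reduce(0, +))
    let actualTotal = Double(actual.reduce(0, +))
    guard expectedTotal > 0, actualTotal > 0 else { return nil }

    var sum = 0.0
    for (e, a) in zip(expected, actual) {
        let pe = max(Double(e) / expectedTotal, epsilon)
        let pa = max(Double(a) / actualTotal, epsilon)
        sum += (pa - pe) * log(pa / pe)
    }
    return sum
}
```

When every score lands in the same bin in both windows, each term is zero and the index is exactly 0, which is consistent with a confidence signal that never moves.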

Coverage

The generalist recognized a genuinely mixed bag with no tuning: a BMW M4, a Lamborghini Huracán, a Ford Expedition, a Nissan GT-R, a Ferrari LaFerrari, and a run of Fiats (Punto, 126, Cinquecento) all came back labeled in a single afternoon. That spread is the "broad coverage for free" that makes a generalist such a good first guess.

Cost...
