Clinical AI Research · Jan 2026 – Present
I trained a model that looked great on every metric — then discovered the metrics were lying.
A medical imaging project that became a lesson in why you never trust a loss curve until you've looked at the data with your own eyes.
That gap between what the metrics said and what my eyes told me is what this project is really about.
The moment I stopped trusting my numbers
Virtual contrast-enhanced CT is a genuinely useful idea: roughly 15% of patients can’t receive iodine contrast (kidney disease, allergy), so a model that synthesizes a contrast scan from a plain one could spare them a second, contrast-loaded acquisition. The clinical team handed me paired plain/contrast scans and asked me to train a synthesis model.
I did. The loss curves were beautiful. SSIM climbed past 0.88. By every standard metric, the model was learning to turn a plain scan into a contrast one.
Then I opened a few predictions next to the ground truth — and the anatomy didn’t match. The model wasn’t synthesizing contrast into the right slice. It was painting contrast onto the wrong part of the body and still scoring well, because the metric was comparing the wrong things.
What I actually found: the slices were shifted
The paired scans weren’t aligned. A plain CT and a contrast CT of the same patient are two separate acquisitions, and they can start at different positions along the body. In this dataset the offset was 150–190 mm, so “slice 47 of the plain scan” and “slice 47 of the contrast scan” were not the same anatomical slice. Across 17 patients (55,639 slices in total), the model had learned to map one to the other anyway. SSIM rewarded it, because SSIM measures structural similarity between the prediction and whatever target it is given, not whether the two show the same organ.
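To make the index mismatch concrete, here is a minimal sketch of how a slice index maps to a physical position. The function names, the 2.5 mm slice spacing, the 170 mm offset and its sign are all illustrative assumptions, not values from the dataset (only the 150–190 mm range is).

```python
def slice_z_mm(first_slice_z_mm: float, spacing_mm: float, index: int) -> float:
    """Physical z position (mm) of slice `index`, given where the scan starts."""
    return first_slice_z_mm + index * spacing_mm


def matching_index(index_a: int, first_z_a: float, spacing_a: float,
                   first_z_b: float, spacing_b: float) -> int:
    """Index in scan B that lies closest to the physical position of
    slice `index_a` in scan A. May fall outside scan B's actual range."""
    z = slice_z_mm(first_z_a, spacing_a, index_a)
    return round((z - first_z_b) / spacing_b)


# Hypothetical numbers: the plain scan starts at z = 0 mm, the contrast
# scan starts 170 mm away, and both use 2.5 mm slice spacing.
plain_start, contrast_start, spacing = 0.0, -170.0, 2.5

# Same index, different anatomy: the two "slice 47"s are 170 mm apart.
gap = slice_z_mm(contrast_start, spacing, 47) - slice_z_mm(plain_start, spacing, 47)

# The contrast slice that actually matches plain slice 47.
partner = matching_index(47, plain_start, spacing, contrast_start, spacing)
```

Under these assumed numbers, the correct partner for plain slice 47 is contrast slice 115, not contrast slice 47. Pairing by index alone silently trains on mismatched anatomy.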
The alignment algorithm
This is the part I built independently. Given two scans of the same patient, I align them before they ever reach the model:
- Resample both volumes to a common voxel grid, so coordinates mean the same thing in both.
- Slide one volume along the z-axis and compute normalized cross-correlation at each offset; the offset with the highest correlation is taken as the alignment.
- Gate 1 drops any pair whose best overlap is still too low (some scans simply don’t share enough anatomy).
- Gate 2 drops pairs that align mathematically but not anatomically.
Twenty candidate pairs went in; fifteen came out verified. Those fifteen are the only ones I’ll train on.
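The first three steps can be sketched roughly as below. This is an illustration under stated assumptions, not the project's code: the function names, the 1 mm target spacing, the `min_overlap` value and the 0.5 score threshold are all hypothetical, and the sketch assumes the two volumes already share the same in-plane grid, so only z is resampled. Gate 2 is not sketched because its criteria are not spelled out here.

```python
import numpy as np


def resample_z(volume, spacing_mm, target_mm=1.0):
    """Linearly resample a (z, y, x) volume along z to `target_mm` spacing."""
    n_in = volume.shape[0]
    n_out = int(np.floor((n_in - 1) * spacing_mm / target_mm)) + 1
    pos = np.arange(n_out) * target_mm / spacing_mm  # fractional input indices
    lo = np.clip(np.floor(pos).astype(int), 0, n_in - 1)
    hi = np.clip(lo + 1, 0, n_in - 1)
    w = (pos - lo)[:, None, None]
    return (1 - w) * volume[lo] + w * volume[hi]


def ncc(a, b):
    """Normalized cross-correlation of two equally shaped arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def best_z_offset(fixed, moving, min_overlap=10):
    """Slide `moving` along z; return (offset, score) with the highest NCC.

    An offset `off` means moving[k] is compared with fixed[k + off].
    Offsets leaving fewer than `min_overlap` shared slices are skipped.
    """
    nf, nm = fixed.shape[0], moving.shape[0]
    best_off, best_score = 0, -1.0
    for off in range(-(nm - min_overlap), nf - min_overlap + 1):
        f0, f1 = max(0, off), min(nf, off + nm)  # overlap in fixed indices
        if f1 - f0 < min_overlap:
            continue
        score = ncc(fixed[f0:f1], moving[f0 - off:f1 - off])
        if score > best_score:
            best_off, best_score = off, score
    return best_off, best_score


def passes_gate1(score, threshold=0.5):
    """Hypothetical Gate 1: reject pairs whose best score is still too low."""
    return score >= threshold
```

A typical call would resample both volumes with `resample_z`, pass them to `best_z_offset`, and keep the pair only if `passes_gate1` accepts the score. With 1 mm target spacing, the returned offset in slices is also the offset in millimetres.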
Where things stand
This is an active project, and I want to be honest about its limits:
- There is no true held-out test set yet: the validation slices currently come from the same patients as the training slices, which the retrain fixes.
- The model is being retrained from scratch on the 15 aligned pairs; the earlier numbers describe a model I no longer trust.
- Deployment is pending clinical review; the underlying data is under hospital NDA, so no public repo.
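For the held-out issue in the first point above, one common remedy is to split by patient rather than by slice, so no patient contributes to both sets. The sketch below is a generic illustration; the function name, the 20% validation fraction and the patient IDs are hypothetical, and it is not necessarily how the retrain does it.

```python
import random


def split_by_patient(slice_patient_ids, val_fraction=0.2, seed=0):
    """Split slice indices so that each patient lands in exactly one set.

    `slice_patient_ids[i]` is the patient ID of slice i.
    Returns (train_indices, val_indices).
    """
    patients = sorted(set(slice_patient_ids))
    random.Random(seed).shuffle(patients)
    n_val = max(1, round(len(patients) * val_fraction))
    val_patients = set(patients[:n_val])
    train_idx = [i for i, p in enumerate(slice_patient_ids) if p not in val_patients]
    val_idx = [i for i, p in enumerate(slice_patient_ids) if p in val_patients]
    return train_idx, val_idx


# Hypothetical example: seven slices from five patients.
ids = ["p1", "p1", "p2", "p3", "p3", "p4", "p5"]
train_idx, val_idx = split_by_patient(ids)
```

Because the split is made over patient IDs, adjacent slices from one scan can never end up on both sides, which is the leak a slice-level random split would introduce.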
The win here isn’t a finished model. It’s that I caught a systematic error that made every metric lie — and built the fix before anyone downstream got misled.
Results
- 150–190 mm hidden scan-range offset I found
- 20→15 pairs after my alignment filter
- 55,639 slices · 17 patients