Virtual Contrast-Enhanced CT Synthesis

The moment I stopped trusting my numbers

Virtual contrast-enhanced CT is a genuinely useful idea: roughly 15% of patients can’t receive iodinated contrast (kidney disease, allergy), so a model that synthesizes a contrast scan from a plain one could spare them a second, contrast-loaded acquisition. The clinical team handed me paired plain/contrast scans and asked me to train a synthesis model.

I did. The loss curves were beautiful. SSIM climbed past 0.88. By every standard metric, the model was learning to turn a plain scan into a contrast one.

Then I opened a few predictions next to the ground truth, and the anatomy didn’t match. The model wasn’t synthesizing contrast into the right slice. It was painting contrast onto the wrong part of the body and still scoring well, because each prediction was being scored against a target slice from a different anatomical location than its input.

What I actually found: the slices were shifted

Fig 3. Paired CT slices before and after alignment. Before alignment, the model compares two different anatomical locations; after, it compares the same slice.

The paired scans weren’t aligned. A plain CT and a contrast CT of the same patient are two separate acquisitions, and they need not start at the same position along the body. In this dataset the starting offset was 150–190 mm, so “slice 47 of the plain scan” and “slice 47 of the contrast scan” were not the same anatomical slice. Across 17 patients (55,639 slices in total), the model had learned to map one to the other anyway, and SSIM rewarded it: SSIM measures structural similarity between a prediction and whatever target it is given, not whether that target shows the same anatomy as the input.
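To make the index mismatch concrete, here is a minimal sketch that converts slice indices to physical z-positions. The origins, spacing, and slice counts are made-up illustration values, not numbers from the dataset, and `slice_z_positions` is a hypothetical helper. It assumes uniform slice spacing and that each scan's z-origin is known (for DICOM data this would typically come from the Image Position (Patient) attribute).

```python
import numpy as np

def slice_z_positions(z_origin_mm, slice_spacing_mm, n_slices):
    """Physical z-coordinate (mm) of every slice, assuming uniform spacing."""
    return z_origin_mm + slice_spacing_mm * np.arange(n_slices)

# Hypothetical geometry: the two scans start 170 mm apart along the body.
plain_z = slice_z_positions(z_origin_mm=-400.0, slice_spacing_mm=1.0, n_slices=300)
contrast_z = slice_z_positions(z_origin_mm=-230.0, slice_spacing_mm=1.0, n_slices=300)

# Same index, different anatomy: the physical gap between "slice 47" in each scan.
idx = 47
gap_mm = contrast_z[idx] - plain_z[idx]

# The plain slice that actually sits at the position of contrast slice 47.
matched_plain_idx = int(np.argmin(np.abs(plain_z - contrast_z[idx])))

print(f"gap at index {idx}: {gap_mm:.0f} mm")
print(f"contrast slice {idx} corresponds to plain slice {matched_plain_idx}")
```

With these assumed values, pairing by index compares locations 170 mm apart, while pairing by physical position matches contrast slice 47 to plain slice 217.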

The alignment algorithm

This is the part I built independently. Given two scans of the same patient, I align them before they ever reach the model:

Fig 2. The four-stage alignment pipeline I wrote: resample, find the best z-shift by normalized cross-correlation, then apply two gating checks.

Resample both volumes to a common voxel grid, so coordinates mean the same thing in both.
Slide one volume along the z-axis and compute normalized cross-correlation at each offset; the offset with the highest correlation is taken as the candidate alignment.
Gate 1 drops any pair whose best overlap is still too low (some scans simply don’t share enough anatomy).
Gate 2 drops pairs that align mathematically but not anatomically.
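The z-shift search and the first gate can be sketched as follows. This is an illustrative sketch, not the project's actual code: it assumes both volumes are NumPy arrays of shape (z, y, x) that have already been resampled to a common voxel grid, it searches integer slice shifts only, and the names `ncc`, `best_z_shift`, `passes_gate1`, the `min_overlap` parameter, and the 0.5 threshold are all hypothetical. Gate 2 is not sketched because its criterion is not specified above.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally shaped arrays, in [-1, 1]."""
    a = a.astype(np.float64).ravel()   # astype copies, so inputs are untouched
    b = b.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def best_z_shift(fixed, moving, max_shift, min_overlap=8):
    """Find the integer shift s that maximizes NCC between fixed[k + s] and moving[k].

    Returns (shift, score). Shifts leaving fewer than `min_overlap`
    overlapping slices are skipped.
    """
    best_shift, best_score = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        f_lo, m_lo = max(s, 0), max(-s, 0)          # start of overlap in each volume
        n = min(fixed.shape[0] - f_lo, moving.shape[0] - m_lo)
        if n < min_overlap:
            continue
        score = ncc(fixed[f_lo:f_lo + n], moving[m_lo:m_lo + n])
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift, best_score

def passes_gate1(score, threshold=0.5):
    """Gate 1 (assumed form): reject pairs whose best correlation is too low."""
    return score >= threshold

# Synthetic check: two windows cut from the same random "body", 12 slices apart.
rng = np.random.default_rng(0)
body = rng.normal(size=(60, 8, 8))
fixed, moving = body[0:40], body[12:52]

shift, score = best_z_shift(fixed, moving, max_shift=20)
print(shift, round(score, 3), passes_gate1(score))
```

On this synthetic pair the search recovers the known 12-slice offset. Real scans will differ in intensity because of the contrast agent, which is one reason a correlation-based score is a more natural fit here than a raw intensity difference.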

Twenty candidate pairs went in; fifteen came out verified. Those fifteen are the only ones I’ll train on.

Where things stand

This is an active project, and I want to be honest about its limits:

There is no true held-out test set yet: validation slices currently come from the same patients as training slices, which the retrain fixes.
The model is being retrained from scratch on the 15 aligned pairs; the earlier numbers describe a model I no longer trust.
Deployment is pending clinical review; the underlying data is under hospital NDA, so no public repo.
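For the held-out split mentioned above, one common way to keep a patient from appearing on both sides is to split on patient ID rather than on slices. The sketch below is a generic illustration under that assumption; the patient IDs, record layout, and `split_by_patient` helper are hypothetical and do not describe the project's actual data format.

```python
import random

def split_by_patient(patient_ids, val_fraction=0.2, seed=0):
    """Partition unique patient IDs into disjoint train and validation sets."""
    unique = sorted(set(patient_ids))        # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(unique)
    n_val = max(1, round(len(unique) * val_fraction))
    return set(unique[n_val:]), set(unique[:n_val])

# Hypothetical slice records: (patient_id, slice_index).
records = [(f"P{p:02d}", k) for p in range(10) for k in range(3)]

train_ids, val_ids = split_by_patient([pid for pid, _ in records])

# Every slice follows its patient, so no patient contributes to both sets.
train_records = [r for r in records if r[0] in train_ids]
val_records = [r for r in records if r[0] in val_ids]

print(len(train_ids), len(val_ids), len(train_records), len(val_records))
```

Splitting at the patient level means a validation score reflects anatomy the model has never seen, rather than neighbouring slices of a training patient.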

The win here isn’t a finished model. It’s that I caught a systematic error that made every metric lie — and built the fix before anyone downstream got misled.