Training guideline

Super-resolution GANs can be temperamental, and the same is true of inpainting and text-conditioned generators. This guideline distils lessons learned from training across remote sensing, medical imaging, microscopy, consumer photography, and the new tasks entering GAN-Engine. Use it as a playbook to stabilise and accelerate your own experiments.

1. Start with reconstruction losses

Before introducing the discriminator, pretrain the generator using reconstruction objectives only (L1/L2/SSIM). Set Training.pretrain_g_only: true and adjust g_pretrain_steps based on your dataset size. For high-noise domains (CT, SAR), extend the pretraining phase to ensure the generator captures signal statistics.
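
As a rough sketch of what this phase looks like in a vanilla training loop (PyTorch, the generator, the dataloader's (lr_img, hr_img) pairs, and the step count are all illustrative assumptions here, not GAN-Engine's actual API):

    import torch
    import torch.nn.functional as F

    def pretrain_generator(generator, dataloader, g_pretrain_steps=20_000, lr=1e-4):
        """Phase 1: reconstruction-only pretraining, no discriminator involved."""
        opt = torch.optim.Adam(generator.parameters(), lr=lr)
        step = 0
        while step < g_pretrain_steps:
            for lr_img, hr_img in dataloader:
                sr = generator(lr_img)
                loss = F.l1_loss(sr, hr_img)  # swap in L2 or (1 - SSIM) as needed
                opt.zero_grad()
                loss.backward()
                opt.step()
                step += 1
                if step >= g_pretrain_steps:
                    break
        return generator

Once the reconstruction loss plateaus, switch the discriminator on and begin the adversarial ramp described in section 3.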

2. Normalise thoughtfully

  • Compute per-channel statistics that reflect your modality. For CT, clip Hounsfield units; for multispectral data, respect known reflectance ranges (a sketch follows this list).
  • Align LR and HR normalisation to avoid scale drift.
  • When statistics drift between training and deployment, recompute them or enable adaptive histogram matching.
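
A minimal sketch of computing and applying those statistics (numpy; the HU window, array shapes, and function names are illustrative assumptions):

    import numpy as np

    def channel_stats(volumes, hu_min=-1000.0, hu_max=400.0):
        """Per-channel mean/std after clipping to a fixed HU window.

        volumes: array of shape (N, C, H, W). Apply the SAME stats to the
        LR and HR streams so the two stay on a common scale.
        """
        clipped = np.clip(volumes, hu_min, hu_max)
        mean = clipped.mean(axis=(0, 2, 3))
        std = clipped.std(axis=(0, 2, 3)) + 1e-8  # avoid divide-by-zero
        return mean, std

    def normalise(x, mean, std):
        return (x - mean[None, :, None, None]) / std[None, :, None, None]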

3. Ramp in adversarial pressure

Adversarial losses are powerful but destabilising. Use Training.Losses.adv_warmup to increase adv_loss_beta gradually; cosine ramps are typically smoother than linear ones. Watch for discriminator loss oscillations: if they explode, lower adv_loss_beta or slow the ramp.
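
A cosine ramp takes only a few lines; here is a sketch (the step count and beta_max are placeholders, not recommended settings):

    import math

    def adv_beta(step, warmup_steps=10_000, beta_max=5e-3):
        """Cosine ramp of the adversarial weight: 0 at step 0, beta_max after warmup."""
        if step >= warmup_steps:
            return beta_max
        # Half-cosine: slow start, slow finish, no sudden jump in adversarial pressure
        return beta_max * 0.5 * (1.0 - math.cos(math.pi * step / warmup_steps))

    # Usage inside the training loop:
    # total_loss = recon_loss + adv_beta(step) * adv_loss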

4. Choose perceptual channels carefully

Perceptual networks (VGG/LPIPS) were trained on RGB. When working with medical or hyperspectral data:

  • Bands are mapped automatically onto the network's three RGB channels, no matter how many input bands you have; verify that the chosen bands are actually representative of your modality (see the sketch after this list).
  • Consider training your own feature extractor on modality-specific data and plugging it in via the registry.
  • Balance perceptual weights with structural metrics like SSIM or SAM to avoid hallucinating features.
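
If you do roll your own, the core idea is just an explicit band-to-RGB mapping in front of a frozen feature network. A sketch assuming PyTorch and torchvision (the band indices and layer cut are illustrative):

    import torch.nn as nn
    from torchvision.models import vgg19, VGG19_Weights

    class BandPerceptualLoss(nn.Module):
        """Select three bands to stand in for RGB, then compare frozen VGG features.

        Pick band_idx deliberately for your modality (e.g. red/green/blue-like
        bands in multispectral data) rather than relying on a default mapping.
        Note: VGG expects ImageNet-normalised inputs, so normalise accordingly.
        """
        def __init__(self, band_idx=(3, 2, 1), layer=16):
            super().__init__()
            self.band_idx = list(band_idx)
            self.features = vgg19(weights=VGG19_Weights.DEFAULT).features[:layer].eval()
            for p in self.features.parameters():
                p.requires_grad_(False)

        def forward(self, sr, hr):
            sr3, hr3 = sr[:, self.band_idx], hr[:, self.band_idx]
            return nn.functional.l1_loss(self.features(sr3), self.features(hr3))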

5. Monitor gradient statistics

  • Discriminator gradients – If they vanish, increase adv_loss_beta or reduce EMA smoothing.
  • Gradient clipping – Adjust Training.Stability.gradient_clip_val to keep updates within a safe range (a sketch follows this list).
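
In a plain PyTorch loop the clip value doubles as a free diagnostic, because the clipping call returns the pre-clip norm. A sketch (the helper and its default are illustrative; Training.Stability.gradient_clip_val presumably maps onto max_norm):

    import torch

    def backward_with_clipping(loss, model, optimizer, gradient_clip_val=1.0):
        """Backprop, log the pre-clip gradient norm, then clip and step."""
        optimizer.zero_grad()
        loss.backward()
        # clip_grad_norm_ returns the TOTAL norm before clipping: log it to
        # spot vanishing (near zero) or exploding (>> clip value) gradients.
        grad_norm = torch.nn.utils.clip_grad_norm_(
            model.parameters(), max_norm=gradient_clip_val
        )
        optimizer.step()
        return grad_norm.item()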

6. Use EMA for evaluation

EMA checkpoints often deliver the best validation visuals. Enable Training.ema with decay 0.995–0.9999. Swap between raw and EMA weights during validation to understand their trade-offs.
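
For reference, the EMA update itself is a per-parameter lerp; a minimal sketch (initialise the shadow model with copy.deepcopy and call this after every optimiser step):

    import torch

    @torch.no_grad()
    def update_ema(ema_model, model, decay=0.999):
        """ema = decay * ema + (1 - decay) * current, parameter by parameter."""
        for ema_p, p in zip(ema_model.parameters(), model.parameters()):
            ema_p.lerp_(p, 1.0 - decay)
        # Buffers (e.g. BatchNorm running stats) are usually copied outright
        for ema_b, b in zip(ema_model.buffers(), model.buffers()):
            ema_b.copy_(b)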

7. Match augmentations to deployment

Augmentations should mirror the variability seen in deployment environments: if production images arrive with compression artefacts, motion blur, or sensor noise, train with matching transforms rather than generic ones.

8. Evaluate with domain metrics

Beyond PSNR/SSIM, compute modality-aware metrics:

  • Medical – Structural similarity on organ masks, segmentation overlap, clinical scoring.
  • Remote sensing – Spectral angle mapper (see the sketch below), vegetation index consistency, change detection accuracy.
  • Microscopy – F1 scores on downstream segmentation or object detection tasks.
  • Consumer – User studies, NR-IQA scores, or downstream detection accuracy.

If these metrics are not already implemented, or you want to add new ones, feel free to open a PR.
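
As a starting point, spectral angle mapper is only a few lines; a numpy sketch (assumes channel-first arrays):

    import numpy as np

    def spectral_angle_mapper(pred, target, eps=1e-8):
        """Mean spectral angle in radians between per-pixel spectra.

        pred, target: arrays of shape (C, H, W); lower is better.
        """
        dot = (pred * target).sum(axis=0)
        norms = np.linalg.norm(pred, axis=0) * np.linalg.norm(target, axis=0)
        cos = np.clip(dot / (norms + eps), -1.0, 1.0)
        return float(np.arccos(cos).mean())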

9. Plan conditioning strategies

For inpainting, supply explicit mask channels and ensure your dataset loader pads masked regions consistently. For conditional or text-to-image tasks, cache tokenised prompts and consider freezing the text encoder for the early training phases to avoid catastrophic forgetting. Monitor conditioning losses alongside adversarial terms.
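
For the inpainting case, a common pattern is to zero the masked pixels and append the mask as an extra channel, so the generator can distinguish holes from genuinely dark regions; a sketch (the 1 = known / 0 = missing convention is an assumption, just keep it consistent between training and inference):

    import torch

    def prepare_inpainting_input(image, mask):
        """image: (B, C, H, W); mask: (B, 1, H, W), 1 = known, 0 = missing."""
        return torch.cat([image * mask, mask], dim=1)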


Following these guidelines will help you train robust, high-quality GAN models—whether you're restoring resolution, filling in missing regions, or generating imagery from scratch.