
AI Image Forensics 101: Pixel-Level Analysis for Synthetic Media

May 22, 2026 · 13 min read

When a 2026-vintage AI image fools the human eye, what's left to detect with? The answer is forensics — the math-level analysis of an image's pixels, frequency content, and structural fingerprints. Modern detection APIs combine multiple forensic techniques into a single classifier; understanding what those techniques are gives you the vocabulary to evaluate detectors and to know when their verdicts are trustworthy.

This guide is for engineers, researchers, and technically-inclined trust-and-safety practitioners who want to understand what's happening under the hood of an AI-detection API. We'll cover the four most important forensic techniques in current production detectors, the math behind why they work, what kinds of attacks defeat each one, and how a defense-in-depth detector combines them.

If you're more interested in the practical detection workflow than the underlying theory, our pillar guide on detecting AI-generated images is the right starting point. This piece complements that one with the technical depth.

Why forensics works at all

The intuition: real photographs and AI-generated images, even when visually indistinguishable, have different physical origins. A real photo is the result of light hitting a sensor, undergoing demosaicing, going through a camera's image signal processor, getting compressed by the file format. A generated image is the result of a neural network sampling pixel values from a learned distribution, often passed through a few post-processing steps.

These different origins leave different statistical signatures. Even when the human visual system can't see the difference, the differences are real and measurable in:

  • The frequency-domain representation of the image
  • The local noise distribution
  • The compression artifact patterns
  • The cross-channel correlations between RGB

A trained classifier can pick up on these differences with high accuracy, even on images that fool human reviewers. The four techniques below are the foundation of how modern detectors work.

The detection hierarchy, cheaper and faster at the top, more confident at the bottom:

  1. Frequency-domain analysis — Fourier signatures distinguish models
  2. Noise residual analysis — sensor noise vs synthetic noise
  3. Demosaicing fingerprint — Bayer CFA pattern only real photos have
  4. Compression history — single-pass JPEG vs multi-pass
  5. Combined classifier — all signals combined into a calibrated probability

Technique 1: Frequency-domain analysis

Every image can be transformed from its spatial representation (a grid of pixels) to its frequency representation (how much of the image is at each frequency, in each direction). The Fourier transform is the standard tool; the discrete cosine transform (DCT) is a related transform used in JPEG.

What the frequency domain reveals:

Real photographs have characteristic frequency spectra. Most of their energy is in low-to-mid frequencies (the smooth gradients and large objects); high frequencies are present but usually with specific patterns related to:

  • Camera sensor noise (random, broadband)
  • Demosaicing patterns (a grid-like pattern at specific frequencies tied to the Bayer color filter array)
  • JPEG compression (energy reduction at high frequencies, with characteristic block boundaries)

AI-generated images have different spectra. Diffusion models in particular tend to over-represent some frequencies and under-represent others compared to natural photos. The exact pattern varies by architecture and sampling procedure (DDPM vs DDIM sampling, pixel-space vs latent diffusion), training data, and post-processing.

A classifier trained on the 2D frequency spectra of real and generated images can distinguish them with 90%+ accuracy on benchmark sets. The signal is robust to many image transformations — re-saving in a different format, mild color correction — but is degraded by aggressive compression or downsampling.
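To make the feature concrete, here is a minimal numpy sketch of the radially averaged log-power spectrum, the 1D frequency profile that spectral classifiers commonly consume. The function name and bin count are our own choices; this illustrates the idea, not any particular production detector.

```python
import numpy as np

def radial_power_profile(img, n_bins=64):
    """Radially averaged log-power spectrum of a grayscale image.

    Real photos show a smooth power-law falloff with frequency;
    generated images often deviate in the mid-to-high bins.
    """
    # 2D FFT, shifted so the zero-frequency (DC) term sits at the center
    spec = np.fft.fftshift(np.fft.fft2(img))
    log_power = np.log1p(np.abs(spec) ** 2)

    # Distance of every coefficient from the DC component
    h, w = img.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h / 2, xx - w / 2)

    # Average log-power within concentric frequency rings
    edges = np.linspace(0, r.max(), n_bins + 1)
    ring = np.digitize(r.ravel(), edges) - 1
    sums = np.bincount(ring, weights=log_power.ravel(), minlength=n_bins)
    counts = np.bincount(ring, minlength=n_bins)
    return sums[:n_bins] / np.maximum(counts[:n_bins], 1)
```

A classifier then consumes these profiles (or the full 2D spectrum) as part of its feature vector.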

This was the first technique deployed in production AI detectors (around 2022) and remains a core component of all modern ones.

Technique 2: Noise residual analysis

Real cameras produce images with characteristic noise. The noise has a known structure:

  • Photon shot noise — proportional to sqrt(intensity), fundamental physics of light
  • Read noise — additive, sensor-specific, varies by ISO setting
  • Pattern noise — fixed-pattern noise from sensor manufacturing variations
  • Color filter array noise — distinct noise statistics in each color channel due to demosaicing
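The sqrt(intensity) scaling of shot noise is just Poisson statistics: the variance of a photon count equals its mean. It's easy to verify numerically (a quick sanity check, not detector code):

```python
import numpy as np

# Shot noise is Poisson-distributed: variance of the photon count
# equals its mean, so the standard deviation scales as sqrt(intensity).
rng = np.random.default_rng(0)
for mean_photons in (100, 400):
    counts = rng.poisson(mean_photons, size=100_000)
    # std ~= sqrt(mean): ~10 at 100 photons, ~20 at 400
    assert abs(counts.std() - np.sqrt(mean_photons)) < 0.5
```

This intensity dependence is exactly what flat synthetic noise lacks, which is the signature this technique exploits.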

For any given camera and ISO setting, you can extract the noise residual — the difference between the image and its denoised version. The noise residual has predictable statistical properties: standard deviation that scales with intensity in a known way, characteristic spatial correlation, distinct profiles in R/G/B channels.

AI-generated images have noise residuals too — generative models don't produce images with literally zero noise. But the noise structure is different:

  • AI noise is often more uniform across the image (real noise depends on local intensity)
  • Cross-channel noise correlation is different (real noise has specific correlations from demosaicing; generated noise typically does not)
  • The noise spatial pattern is unstructured in a way that real sensor noise isn't

Detection technique: extract the noise residual, compute its statistical properties (per-channel variance, intensity-dependence, spatial correlation), and feed those features to a classifier. Real photos cluster in one region of the feature space; generated images cluster in another.
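A minimal sketch of that extraction, using scipy's median filter as a stand-in denoiser. The feature names and the 3x3 window are our own illustrative choices:

```python
import numpy as np
from scipy.ndimage import median_filter

def noise_residual_features(img):
    """Crude noise-residual features for one RGB image in [0, 1].

    Residual = image minus a median-filtered copy. Real sensor noise
    scales with local intensity; flat synthetic noise does not.
    """
    denoised = median_filter(img, size=(3, 3, 1))
    residual = img - denoised

    feats = {}
    # Per-channel residual spread (demosaicing gives R/G/B distinct profiles)
    for c, name in enumerate("rgb"):
        feats[f"std_{name}"] = residual[..., c].std()

    # Intensity dependence: correlation between local brightness and
    # local residual magnitude (shot noise pushes this positive)
    luma = img.mean(axis=2).ravel()
    mag = np.abs(residual).mean(axis=2).ravel()
    feats["intensity_corr"] = float(np.corrcoef(luma, mag)[0, 1])
    return feats
```

A real pipeline would use a stronger denoiser and many more statistics, but the shape of the computation is the same: denoise, subtract, summarize, classify.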

This technique is robust to: format changes, mild editing, color corrections. Vulnerable to: aggressive denoising, deliberate noise injection, downsampling.

Technique 3: Demosaicing artifacts

This is the most physics-grounded technique and the hardest for generative models to defeat.

Real cameras use a Bayer color filter array (CFA): each pixel on the sensor only captures one color channel (red, green, or blue), arranged in a specific pattern. The full RGB image is produced by demosaicing — interpolating the missing color channels at each pixel from the neighbors.

Demosaicing leaves a fingerprint. The exact algorithm used by each camera manufacturer (and sometimes each model) produces characteristic patterns at specific frequencies and orientations. These patterns:

  • Are consistent across the entire image
  • Have specific signatures detectable in the frequency domain (specifically, a grid-like pattern at the Nyquist frequency in certain directions)
  • Persist through moderate JPEG compression
  • Are absent from AI-generated images, which never went through a demosaicing process

The technique: analyze the second-order statistics of an image's color channels for the patterns expected from common demosaicing algorithms. If the demosaicing fingerprint is present and consistent, the image is likely a real photograph (or one carefully edited from a real photograph). If it's absent or inconsistent, the image is likely synthetic.
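As an illustration of the idea, the sketch below scores one specific trace: the 2x2-periodic residual variance that simple bilinear demosaicing leaves in the green channel. It assumes a single Bayer layout and a single interpolation model; production detectors test several of each.

```python
import numpy as np

def cfa_periodicity_score(green):
    """Score the 2x2-periodic variance pattern that bilinear
    demosaicing of a Bayer CFA leaves in the green channel.

    High scores suggest a demosaicing history; near zero suggests
    none. Assumes one CFA layout and one interpolation algorithm.
    """
    # High-pass residual: subtract the 4-neighbor average so only
    # fine structure remains
    neighbor_mean = (green[:-2, 1:-1] + green[2:, 1:-1]
                     + green[1:-1, :-2] + green[1:-1, 2:]) / 4.0
    resid = green[1:-1, 1:-1] - neighbor_mean

    # Residual variance at each of the four positions of a 2x2 tile:
    # interpolated pixels sit closer to their neighbor average than
    # genuinely sampled ones, so the spread across positions is the signal
    v = np.array([resid[i::2, j::2].var()
                  for i in range(2) for j in range(2)])
    return float(v.max() / max(v.min(), 1e-12) - 1.0)
```

On an image with no demosaicing history, the four tile-position variances are nearly equal and the score stays near zero; on a demosaiced channel, the interpolated positions stand out sharply.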

This technique is one of the strongest signals available because:

  • The fingerprint is hard to fake without actually running the image through a virtual sensor and demosaicing pipeline
  • It's robust to many AI-detection-defeating attacks (recompression, color correction)
  • It distinguishes real photos from generated even when the generative model has been deliberately trained to mimic photographic style

It's not perfect:

  • Heavy denoising can degrade the fingerprint
  • Aggressive downsampling can destroy it
  • Some sophisticated generation pipelines do post-process generated content through a virtual demosaicing pass to add this fingerprint

But it remains a high-value signal in modern detection.

Technique 4: Compression artifact analysis

JPEG and other lossy compressors leave specific artifacts. The most well-known:

  • 8x8 block boundaries — JPEG processes images in 8x8 blocks; transitions between blocks have characteristic discontinuities
  • DCT coefficient quantization — the loss of high-frequency detail follows specific patterns
  • Chroma subsampling — colors are typically stored at lower resolution than luminance, leaving characteristic patterns

The forensic technique: check whether the image's compression history is consistent with its claimed origin. If the image was generated as PNG and then saved as JPEG once, it has a specific compression signature. If it was photographed, JPEG'd by the camera, transferred, JPEG'd again by the upload pipeline, and JPEG'd a third time by the platform, it has a different signature with three rounds of quantization.
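A toy version of one such check is a blockiness ratio: compare pixel-difference energy at 8-pixel column boundaries against everywhere else. This is an illustrative sketch; production pipelines also examine DCT coefficient histograms for double-quantization traces.

```python
import numpy as np

def jpeg_blockiness(gray):
    """Ratio of pixel-difference energy at 8x8 block boundaries to
    energy elsewhere. Values well above 1 indicate JPEG block
    artifacts; repeated compression tends to push the ratio higher.
    """
    col_diff = np.abs(np.diff(gray, axis=1))  # horizontal gradients
    boundary = col_diff[:, 7::8].mean()       # differences straddling block edges
    keep = np.ones(col_diff.shape[1], dtype=bool)
    keep[7::8] = False
    interior = col_diff[:, keep].mean()
    return boundary / max(interior, 1e-12)
```

An uncompressed image scores near 1; a block-structured image scores far above it. The same idea extends to rows and to estimating how many quantization rounds an image has been through.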

For AI-detection specifically:

  • Single-pass JPEG is consistent with: a photographic original or an AI-generated image saved once. Most AI-generated images circulating in the wild have this signature.
  • Multi-pass JPEG is consistent with: a real photo that's gone through multiple platforms. Most real photos circulating in the wild have this signature.
  • Mismatched compression — single-pass JPEG with content claiming to come from a heavy-edit-history platform — is a yellow flag.

By itself this is a weak signal (an AI image saved through a real-style pipeline could mimic the multi-pass signature). Combined with frequency analysis and noise residuals, it adds discriminative power.

How a modern detector combines techniques

A production AI-detection model typically:

  1. Extracts a feature vector from the input image including:

  • Output of a CNN trained on image classification (general visual features)
  • Frequency-domain spectrum features
  • Noise residual statistics
  • Demosaicing fingerprint detection output
  • Compression artifact analysis
  • Image metadata signals (resolution, compression history, color space)

  2. Feeds the combined feature vector into a final classifier (often a transformer or a deep MLP) that outputs a calibrated probability of AI generation.
  3. Optionally produces auxiliary outputs: model attribution (which generator), region heatmap (which parts of the image are most suspicious), confidence calibration metrics.
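In its simplest form, the "combined feature vector" step is just concatenation followed by a learned mapping to a probability. The sketch below uses a single logistic layer with placeholder weights; a production classifier is a trained transformer or MLP, and its calibration is learned rather than assumed.

```python
import numpy as np

def fuse_and_score(feature_groups, weights, bias):
    """Concatenate per-technique feature vectors and map them to a
    probability with a logistic layer. The weights and bias stand in
    for a trained model.
    """
    x = np.concatenate([np.asarray(f, dtype=float).ravel()
                        for f in feature_groups])
    logit = float(x @ weights + bias)
    return 1.0 / (1.0 + np.exp(-logit))

# With all-zero weights the classifier is maximally uncertain:
spectral, noise, cfa = [0.8, 0.1], [0.3], [0.05]
p = fuse_and_score([spectral, noise, cfa], np.zeros(4), 0.0)
# p == 0.5
```

The value of the fusion stage is that weak, partially independent signals (compression history alone is weak, for example) reinforce each other in the combined feature space.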

The training data typically combines:

  • Real photographs from a wide variety of sources (professional cameras, smartphones, edited, unedited, compressed, multi-platform-relayed)
  • AI-generated images from a wide variety of generators (Midjourney v6/v7, Flux variants, DALL-E versions, Stable Diffusion fine-tunes, Sora-image, more obscure models)
  • Hybrid images (inpainted, outpainted, style-transferred)
  • Adversarially-processed examples (compressed, noised, color-shifted, downsampled)

Total training data for a state-of-the-art detector is typically 10M+ images, balanced to roughly equal real and synthetic with broad sub-distribution coverage.

This is what makes detection work at scale. No single technique is sufficient; the combination is.

Adversarial attacks against forensic detectors

Sophisticated bad actors try to defeat detectors. The common attacks:

Recompression. Save the AI-generated image as JPEG, decompress, save again, repeat. This degrades frequency-domain signatures and adds compression-history complexity.

Defense: modern detectors are trained on multi-pass-JPEG'd AI content; they recognize this attack pattern. Effective in 2022; less effective now.

Downsampling and re-upsampling. Reduce image resolution then increase back to original. This destroys high-frequency information (where many forensic signals live).

Defense: detectors trained on downsampled images perform reasonably well, but accuracy drops by 5-10 percentage points. This is genuinely effective and remains a partial defeat for many detectors.

Mild noise injection. Add a small amount of Gaussian noise to the image to mask the demosaicing fingerprint and noise residual signatures.

Defense: detectors trained on noised examples are partially robust. The accuracy hit is typically 3-7 percentage points; the AI image still classifies correctly most of the time but with lower confidence.

Compression-aware adversarial generation. Train the generative model with a JPEG compressor in the loop, so it produces images that have natural-looking compression signatures.

Defense: this is a known threat. Detectors are training against compression-aware generation but it remains an effective attack against detectors that haven't been updated for it.

Style-transfer through real-photo aesthetics. Pass the AI image through a style-transfer or photographic processing pipeline that adds real-photo aesthetic markers (lens-shaped bokeh, film grain, specific color profiles).

Defense: detectors trained on style-transferred AI content can recognize this. The attack adds latency and cost to the bad actor's pipeline; sophisticated attackers will absorb that cost.

Hybrid generation (real photo + AI inpaint). Take a real photo and only use AI to modify part of it. The image retains demosaicing fingerprints from the real-photo regions; only the inpainted region has AI signatures.

Defense: modern detectors return region heatmaps showing which regions are suspicious. This catches partial-AI cases that single-image-level detectors miss. Detection accuracy on hybrid content is significantly lower than on full-AI content but improving.

The arms race is permanent. The current state in 2026 is that good detectors win most of the time, including against sophisticated attacks, but no single forensic technique is sufficient and no detector is invulnerable.

What this means for buying a detection API

Practical implications when evaluating a detection vendor:

Ask about their forensic technique stack. "What signals does your detector use?" A vendor that says "deep learning model trained on millions of images" and offers no further detail is likely relying on CNN feature extraction alone, with no explicit forensics. A vendor that lists frequency analysis, noise residuals, demosaicing fingerprints, compression analysis, and a learned classifier on top is more thorough.

Ask about adversarial robustness. "How do you handle recompression / downsampling / noise injection?" The vendor should have specific answers, including benchmark numbers under each adversarial transform.

Ask about update cadence. New generators ship monthly; new attacks emerge regularly. A vendor with a quarterly retraining cycle is keeping up; a vendor with annual updates is falling behind.

Ask about model attribution. A detector that can identify which generator produced the image is doing more sophisticated forensic analysis than one that just produces a binary verdict. Specific generator attribution (Midjourney v7 vs Flux Pro) is hard and indicates depth.

Ask about inpainting / partial-AI detection. A detector that returns region heatmaps and handles hybrid content is significantly more sophisticated than one that only returns image-level verdicts. For 2026 use cases (where deliberate fraud often involves inpainting real photos), region-level output is increasingly important.

We've covered evaluation methodology in our accuracy guide and a vendor comparison in our Sightengine vs Hive vs us comparison — both worth reading if forensic depth is a key criterion.

Adversarial attacks vs defenses, in summary:

  • Recompression — detectors trained on multi-pass JPEG
  • Downsampling — 5-10 point accuracy hit, a real attack
  • Mild noise — adversarially-trained models hold up
  • Compression-aware generation — generators trained with JPEG in the loop
  • Style transfer — real-photo aesthetics layered on top
  • Hybrid AI+real — inpainting, where region heatmaps help

A peek under the hood at our API

For full technical disclosure, our AI Image Detector API combines:

  • A vision-transformer-based feature extractor pretrained on natural images and fine-tuned for forensic classification
  • Explicit frequency-domain spectral analysis fed as auxiliary features
  • Noise residual extraction with intensity-conditional features
  • Demosaicing fingerprint detection with multi-Bayer-pattern matching
  • Compression-history analysis (number of JPEG passes, quality factor estimation)
  • Model-attribution head trained on labeled outputs from 25+ specific generators
  • Region heatmap via gradient-class-activation mapping for partial-AI detection

The model is retrained every 6-8 weeks with new generator outputs as they ship. Adversarial training includes recompression, downsampling, noise injection, and several other transforms. Performance numbers (per-model accuracy, FPR at threshold, adversarial robustness) are documented in our accuracy guide and shared with Pro and Enterprise customers in detail.

The free tier (500 scans/month) gives you the same model that production customers use. You can run your own test set against it without committing to a paid plan.

Frequently asked questions

Are these forensic techniques publicly documented?

Yes — most are documented in academic literature dating back to 2007 (early steganography and image forensics work) through current AI-specific detection research. Common search terms: "image forensics", "PRNU sensor noise", "JPEG forensics", "AI image detection". The CVPR, ICCV, and IEEE TIFS proceedings are good source venues.

Could I implement these myself?

For a research project or a non-production tool, yes. The basic frequency-domain and noise-residual techniques are documented and implementable in Python with numpy and scikit-image over a weekend. Demosaicing fingerprint detection and full classifier training are more involved but tractable for a small team. Production-grade accuracy requires substantial training data and continuous retraining; that's where vendor APIs add value.

Why doesn't every detector do all of these?

Compute cost and latency. Each forensic technique adds processing time and infrastructure. The cheapest detectors (sub-cent per scan) typically use only the CNN feature extraction. The most sophisticated detectors (a few cents per scan) include the full stack. Pricing reflects this — there's no free lunch.

Are forensic techniques effective on video?

Yes, and video adds powerful additional signals: temporal consistency (does the demosaicing fingerprint stay the same across frames?), optical flow (does motion match physical possibility?), and biological signal extraction (rPPG-based heart-rate detection in face regions). Video deepfake detection benefits from longer per-clip processing time and produces higher accuracy than single-image detection. Our deepfake detection guide covers this.

Do these techniques work on illustrations and 3D renders?

Less well. Most forensic techniques are calibrated for photographic content. AI-generated illustrations and 3D renders share statistical features with human-created versions of the same content type (no demosaicing fingerprint either way, frequency spectra differ from photos). Detection on non-photographic content is an active research area; current accuracy is significantly lower than on photographic content. If you need detection on this content type specifically, ask vendors directly about their performance.

What's the future of image forensics?

The arms race continues. Generators are getting better at producing forensically-realistic outputs; detectors are getting better at finding even the subtle signatures that remain. The long-term trajectory is probably:

  • Detection-only accuracy ceiling: 95-98% on adversarial images
  • Augmented with provenance (C2PA): much higher confidence when manifest is present
  • Augmented with behavioral and contextual signals: the "is this image plausibly real in this context" question goes beyond the pixels

Defense in depth — combining forensics with provenance and context — is the long-term winning strategy. Pure pixel-level analysis hits a ceiling that more sophisticated generators will eventually defeat.


AI image forensics is the math underneath modern detection APIs. Frequency analysis, noise residuals, demosaicing fingerprints, and compression analysis combine into classifiers that catch most AI-generated content even when human reviewers can't. The arms race against sophisticated adversarial attacks is permanent; the current state in 2026 is favorable to detectors when the full stack is deployed.

If you want to see what these forensic techniques produce on your own images, our free API tier returns the per-technique signals alongside the final verdict so you can see the breakdown. Sub-100ms latency, calibrated confidence scores, region heatmaps, model attribution. 500 scans per month, no credit card.

Try the AI Image Detector API

500 free scans per month. No credit card. Sub-100ms detection with model attribution and region heatmaps.

Get an API key →