Synthetic Data Enables Human-Grade Microtubule Analysis with Foundation Models for Segmentation

Abstract

Studying microtubules (MTs) and their mechanical properties is central to understanding intracellular transport, cell division, and drug action. Despite this importance, experts still spend many hours manually segmenting these filamentous structures. The suitability of state-of-the-art methods for this task cannot be systematically assessed because large-scale labeled datasets are missing. We address this gap by introducing the synthetic dataset SynthMT, produced by tuning a novel image generation pipeline on real-world interference reflection microscopy (IRM) frames of in vitro reconstituted MTs, without requiring human annotations. In our benchmark, we evaluate nine fully automated methods for MT analysis in both zero-shot and hyperparameter optimization (HPO)-based few-shot settings. Across both settings, classical algorithms and current foundation models still struggle to achieve the accuracy required for biological downstream analysis on in vitro MT IRM images that humans perceive as visually simple. A notable exception is the recently introduced SAM3 model. After HPO on only ten random SynthMT images, its text-prompted version SAM3Text achieves near-perfect and, in some cases, super-human performance on unseen real data. This indicates that fully automated MT segmentation has become feasible when method configuration is effectively guided by synthetic data.

Key Contributions

🔧 Zero-annotation pipeline: Got MT images, but no annotations? Our open-source pipeline turns them into realistic synthetic training data with perfect ground-truth masks — no manual labeling needed!
📦 Ready-to-use benchmark (SynthMT): We release SynthMT, a synthetic dataset with instance masks for MTs, judged by domain experts for biological plausibility.
📊 Honest model comparison: We benchmark 9 classical and foundation models on SynthMT — zero-shot and with HPO — so you know what actually works.
🚀 Finally, it works! SAM3Text + simple HPO on just 10 synthetic images from our pipeline → human-grade segmentation on real data. Fully automated MT analysis is here! ⭐

Benchmark Results

📊 What we measure

SKIoU (Skeleton IoU) measures how well predicted segmentations match ground-truth microtubule shapes — the core metric for filament segmentation.
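
As a rough illustration of the idea, and not the paper's exact definition, the sketch below computes a plain IoU between two skeletons given as sets of (row, col) pixel coordinates. The function name `skeleton_iou`, the pixel-set representation, and the convention that two empty skeletons score 1.0 are assumptions made for this example; the published SKIoU may add instance matching or a pixel tolerance.

```python
def skeleton_iou(pred_pixels, gt_pixels):
    """Plain IoU between two skeletons given as iterables of (row, col) pixels."""
    pred, gt = set(pred_pixels), set(gt_pixels)
    union = pred | gt
    if not union:
        return 1.0  # assumption: two empty skeletons count as a perfect match
    return len(pred & gt) / len(union)


# Hypothetical toy data: a horizontal ground-truth filament and a prediction
# that misses its left end and overshoots by one pixel on the right.
gt = [(5, c) for c in range(10)]        # columns 0..9, 10 pixels
pred = [(5, c) for c in range(4, 11)]   # columns 4..10, 6 shared with gt
score = skeleton_iou(pred, gt)          # 6 shared pixels / 11 in the union
```

Working on one-pixel-wide skeletons rather than full masks keeps the score sensitive to the filament's path instead of its apparent width, which is presumably why a skeleton-based IoU suits thin structures like MTs.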

Count measures the absolute difference between the number of detected filaments and the number of filaments in the ground truth.

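Length KL and Curvature KL capture how well the model preserves biologically meaningful properties: each is the KL divergence between the predicted and ground-truth distributions of filament length or curvature. Lower values mean the predictions match the ground-truth MT distributions better.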
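
The count and distribution metrics described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the benchmark's implementation: the function names, the bin edges, the `eps` smoothing of empty bins, and the direction KL(ground truth || prediction) are all choices made for this example.

```python
import math


def count_error(pred_counts, gt_counts):
    """Mean absolute difference in filament count, one entry per image."""
    if not gt_counts or len(pred_counts) != len(gt_counts):
        raise ValueError("need one predicted and one ground-truth count per image")
    return sum(abs(p - g) for p, g in zip(pred_counts, gt_counts)) / len(gt_counts)


def histogram(values, edges, eps=1e-6):
    """Normalized, smoothed histogram over the bins defined by `edges`.

    Values outside [edges[0], edges[-1]] fall into the first or last bin.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = 0
        # advance while v lies at or beyond the right edge of bin idx
        while idx < len(counts) - 1 and v >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1
    smoothed = [c + eps for c in counts]  # eps avoids log(0) for empty bins
    total = sum(smoothed)
    return [s / total for s in smoothed]


def kl_divergence(gt_values, pred_values, edges):
    """KL(ground truth || prediction) between binned property distributions."""
    p = histogram(gt_values, edges)
    q = histogram(pred_values, edges)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical filament lengths in pixels and hypothetical bin edges.
edges = [0, 50, 100, 150, 200]
gt_lengths = [30, 60, 70, 120, 180]
same = kl_divergence(gt_lengths, gt_lengths, edges)
shifted = kl_divergence(gt_lengths, [160, 170, 180, 190, 195], edges)
mean_count_gap = count_error([20, 26], [23, 24])
```

Identical distributions give a divergence of zero, while a prediction that places every filament in the longest bin is penalized heavily. The same `kl_divergence` call applies to curvature values once suitable bin edges are chosen.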

🔬 Key takeaways

Foundation models outperform baselines: All foundation models beat the traditional FIESTA baseline, and microscopy-specific models often outperform general ones on biological tasks.
HPO transfers to real data: Optimizing hyperparameters on just 10 synthetic images significantly improves performance on real data for some models, especially SAM3Text.
Segmentation predicts downstream success: Better segmentation usually leads to better biological analysis, though some models can match distributions even with lower segmentation accuracy.
SAM3Text enables automation: With HPO, it achieves human-grade performance on real data, proving that fully automated MT analysis is feasible.
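
To make the HPO step concrete, here is a minimal sketch of tuning a method on a handful of labeled synthetic images and keeping the best configuration for later use on real data. It uses plain random search, which is an assumption; the paper's optimizer, search space, and objective may differ. `random_search`, `toy_segment`, `toy_score`, and the `threshold` parameter are hypothetical names introduced only for illustration.

```python
import random


def random_search(segment, score, images, masks, space, n_trials=50, seed=0):
    """Return the best (config, mean score) found by random search.

    segment(image, **config) -> prediction   (hypothetical model wrapper)
    score(prediction, mask)  -> float        (higher is better, e.g. an IoU)
    space maps each hyperparameter name to a list of candidate values.
    """
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in space.items()}
        mean = sum(score(segment(img, **config), m)
                   for img, m in zip(images, masks)) / len(images)
        if mean > best_score:
            best_config, best_score = config, mean
    return best_config, best_score


# Toy stand-in so the sketch runs: "images" are lists of intensities and the
# "model" is a threshold. A real run would wrap SAM3Text or another method.
def toy_segment(image, threshold):
    return [int(v > threshold) for v in image]


def toy_score(pred, mask):
    inter = sum(p and m for p, m in zip(pred, mask))
    union = sum(p or m for p, m in zip(pred, mask))
    return inter / union if union else 1.0


images = [[0.1, 0.4, 0.8, 0.9]] * 10   # ten synthetic "images"
masks = [[0, 0, 1, 1]] * 10            # their ground-truth masks
space = {"threshold": [0.0, 0.25, 0.5, 0.75]}
best_config, best_score = random_search(toy_segment, toy_score, images, masks, space)
```

Because the synthetic images come with exact masks, the objective can be evaluated without any human annotation; the selected `best_config` is then applied unchanged to real images.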

💡 Full results with all models and metrics are available in the paper.

`SynthMT`

Model	SKIoU ↑	Length KL ↓	Curvature KL ↓
Traditional Baselines
`FIESTA`	0.12	5.03	0.997
`FIESTA` + HPO	0.24	3.74	0.706
Pretrained Domain-Specific Models
`TARDIS`	0.45	0.56	0.019
`TARDIS` + HPO	0.48	0.41	0.031
SAM-based Models
`µSAM`	0.02	0.88	0.130
`µSAM` + HPO	0.66	1.24	0.132
`CellSAM`	0.56	0.19	0.021
`CellSAM` + HPO	0.59	0.21	0.031
`Cellpose-SAM`	0.27	0.12	0.019
`Cellpose-SAM` + HPO	0.65	0.12	0.012
Foundation Models
`SAM`	0.37	3.90	0.700
`SAM` + HPO	0.16	5.45	0.912
`SAM3Text`	0.85	0.07	0.063
`SAM3Text` + HPO 🏆	0.93	0.02	0.069

Unseen Real Data

Model	Count (n/img)	Length KL ↓	Curvature KL ↓
SAM-based Models
`CellSAM`	21.29	0.07	0.09
`CellSAM` + HPO	16.73	0.08	0.14
`Cellpose-SAM`	12.27	0.05	0.18
`Cellpose-SAM` + HPO	13.00	0.07	0.09
Foundation Models
`SAM`	28.55	0.79	0.16
`SAM` + HPO	108.26	1.39	0.15
`SAM3Text`	14.98	0.21	0.25
`SAM3Text` + HPO	25.08	0.09	0.14
Inter-annotator	25.29	0.09	0.11
Ground Truth	23.91	0	0