We present the Segment Anything (SA) project: a new task, model, and dataset for promptable image segmentation, advancing foundation models in computer vision. The Segment Anything Model (SAM) processes diverse prompts (points, boxes, text) through a ViT-based image encoder and a real-time mask decoder (~50 ms per prompt), resolving ambiguous prompts via multi-mask outputs. Trained on SA-1B (1.1B masks from 11M licensed images collected via a scalable data engine), SAM achieves mask quality on par with professional annotation: 94% of sampled masks exceed 90% IoU with their professionally corrected versions. Zero-shot evaluations across 23 datasets demonstrate strong performance: SAM surpasses RITM in human-rated point-based segmentation, reaches 0.768 ODS for edge detection on BSDS500, and attains 59.3 AR@1000 for object proposals on LVIS. SAM also composes flexibly into larger systems for tasks such as text-guided segmentation. We release the model and dataset to catalyze research on vision foundation models.
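To make the promptable interface concrete, below is a minimal sketch of point-prompted segmentation using the publicly released segment_anything package. The checkpoint filename and image path are illustrative assumptions (the checkpoint must be downloaded separately); multimask_output=True requests the ambiguity-resolving multi-mask output described above.

```python
# Minimal sketch of point-prompted segmentation with the released
# segment_anything package (https://github.com/facebookresearch/segment-anything).
# The checkpoint filename and image path are placeholder assumptions.
import numpy as np
import cv2
from segment_anything import SamPredictor, sam_model_registry

# Build SAM (ViT-H image encoder + prompt encoder + mask decoder).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Embed the image once; subsequent prompts reuse this embedding,
# which is what makes per-prompt decoding fast.
image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single foreground click as a point prompt (label 1 = foreground).
point_coords = np.array([[500, 375]])
point_labels = np.array([1])

# multimask_output=True returns several candidate masks, so an
# ambiguous click (e.g., part vs. whole object) still yields a valid mask.
masks, scores, logits = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=True,
)

# Keep the candidate the model scores highest.
best_mask = masks[np.argmax(scores)]
print(best_mask.shape, scores)
```

Note the design split this sketch reflects: the heavy image embedding is computed once per image, while the lightweight decoder runs per prompt, enabling interactive use.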
Инноватика-2025 (Innovatika-2025): Proceedings of the XXI International School-Conference of Students, Postgraduates, and Young Scientists, April 28-30, 2025, Tomsk, Russia. Tomsk, 2025. Pp. 255-262.