Improving rare-class detection in deep-sea imagery via generative augmentation with Stable Diffusion
Research Abstract & Technology Focus
Megabenthos play a critical role in maintaining deep-sea ecosystem stability, so detecting them accurately matters for deep-sea conservation. However, the high cost of deep-sea exploration and the long-tailed distribution of available datasets leave rare species with very few samples, which limits detection performance for deep-sea benthos. To address this challenge, we propose a data augmentation framework based on Stable Diffusion (SD) and ControlNet. Specifically, we fine-tune a pretrained SD model using Low-Rank Adaptation (LoRA) to synthesize images of rare benthos, and use ControlNet to composite the generated targets into deep-sea backgrounds with controllable layouts and automatic bounding-box annotation. We constructed two megabenthos datasets, one collected with an optically tethered underwater vehicle (OTV) and one with an autonomous underwater vehicle (AUV), covering 16 biological categories; data augmentation was applied to the 7 rare species with the fewest samples. The generated images achieved a Fréchet Inception Distance (FID) of 117.11 and an Inception Score (IS) of 4.97. When the synthetic images were combined with real data for RT-DETR training, the augmentation strategy increased AP50-95 and AP50 on the OTV dataset to 45.2% and 75.2%, improvements of 3.7% and 6.1% over the baseline. Similarly, on the AUV dataset, it increased AP50-95 and AP50 to 36.8% and 64.7%, gains of 2.2% and 4.2% over the baseline. Gains were especially pronounced for tail classes, with AP50-95 increasing by 23.6% and 21.9% for Octopus and Bryozoa on the OTV dataset, and by 15.1% and 14.6% for Bryozoa and Hydrozoa on the AUV dataset. Moreover, the proposed approach outperforms traditional augmentation methods by 1.6% and 0.8% in AP50-95 on the OTV and AUV datasets, respectively, indicating its utility for improving detection in deep-sea megabenthic surveys.
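The abstract does not spell out how the automatic bounding-box annotation is produced. As a rough illustration only, the sketch below shows how a label could be derived from a controllable layout: if the position and size of each composited target are known in advance, the box follows directly from the layout. This is plain geometry, not the paper's ControlNet pipeline; the `Placement` class, the function names, and the normalized (cx, cy, w, h) output format are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Hypothetical layout entry: where one synthetic target is placed."""
    label: str
    x: int       # left edge of the target in the background, in pixels
    y: int       # top edge of the target in the background, in pixels
    width: int   # target width in pixels
    height: int  # target height in pixels


def to_normalized_box(p, bg_width, bg_height):
    """Return (label, cx, cy, w, h) scaled to [0, 1].

    The box is clipped to the background so that a target hanging over
    the image border still yields a valid annotation.
    """
    x0 = max(0, p.x)
    y0 = max(0, p.y)
    x1 = min(bg_width, p.x + p.width)
    y1 = min(bg_height, p.y + p.height)
    if x1 <= x0 or y1 <= y0:
        raise ValueError("placement lies entirely outside the background")
    cx = (x0 + x1) / 2 / bg_width
    cy = (y0 + y1) / 2 / bg_height
    return (p.label, cx, cy, (x1 - x0) / bg_width, (y1 - y0) / bg_height)


def annotate(layout, bg_width, bg_height):
    """Turn a list of placements into one annotation tuple per target."""
    return [to_normalized_box(p, bg_width, bg_height) for p in layout]


if __name__ == "__main__":
    # Assumed example: two rare-class targets on a 1000 x 500 background.
    layout = [
        Placement("Octopus", 100, 50, 200, 100),
        Placement("Bryozoa", 900, 400, 200, 200),  # overhangs the border
    ]
    for box in annotate(layout, 1000, 500):
        print(box)
```

Because the labels come from the layout rather than from a detector run on the finished image, no manual annotation pass is needed for the synthetic samples, which is presumably the appeal of layout-controlled compositing for rare classes.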