A Practical Guide to Dynamic Object Detection and Segmentation with YOLOv9 + SAM

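Background

This article walks through a practical way to combine YOLOv9 and SAM for dynamic object detection and segmentation, with step-by-step instructions and code. By pairing YOLOv9's efficient detection with SAM's zero-shot segmentation, we build a custom object detection model on the RF100 Construction-Safety-2 dataset and illustrate how the approach applies to fields such as autonomous driving and medical imaging.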

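About YOLOv9

YOLOv9 (You Only Look Once) is a high-performance object detection model built on Programmable Gradient Information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN), which together improve detection accuracy and runtime efficiency. Its results on the MS COCO dataset show that it is well suited to real-time object detection.

Key features of YOLOv9:

Efficiency: the PGI and GELAN design balances speed and accuracy

Flexibility: available in several model sizes, from compact to large

Innovation: addresses information loss in deep networks by preserving essential features across layers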

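About SAM

SAM (Segment Anything Model) is an image segmentation model trained on Segment Anything 1-Billion (SA-1B), the largest segmentation dataset at the time of its release. It performs zero-shot segmentation driven by simple prompts such as points or boxes. Its main strengths are:

Generality: it can be applied to new images without task-specific expertise, heavy training compute, or large amounts of annotated data

Flexibility: its training set, SA-1B, contains more than 1 billion segmentation masks, so the model covers a very wide range of objects

Cross-domain use: it performs well in AR/VR, medical imaging, scientific research, and other fields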

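Dataset

This article uses the Construction-Safety-2 subset of Roboflow's RF100 benchmark as training and validation data. RF100 aims to establish a standardized open-source benchmark for object detection, with an emphasis on generality and accessibility, and it is a rich resource for AI researchers and application developers.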

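Implementation steps

Environment setup

Account and resources: a Google account and Google Colab, which offers free GPU resources (up to 16 GB)

GPU check: confirm that a GPU is attached and ready before running the models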
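The GPU check can be scripted. The sketch below is a minimal example that assumes an NVIDIA GPU whose driver ships the `nvidia-smi` tool, as is the case on Colab GPU runtimes; the helper name `gpu_status` is hypothetical and not part of any library.

```python
import shutil
import subprocess
from typing import Optional


def gpu_status() -> Optional[str]:
    """Return the text printed by `nvidia-smi`, or None if no NVIDIA GPU is visible."""
    # Assumption: the NVIDIA driver exposes the `nvidia-smi` command on PATH.
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else None


status = gpu_status()
print(status if status else "No GPU detected; switch to a GPU runtime before continuing.")
```

If the function returns None in Colab, the runtime type usually needs to be changed to one with a GPU before the later steps will run at a reasonable speed.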

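Model setup

Clone the repository: clone the YOLOv9 repository with Git and change into the project directory

Install dependencies: install the packages listed in requirements.txt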
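The two setup steps can be sketched as a small script. The repository URL below is an assumption (the public YOLOv9 repository on GitHub), since the article does not name it, and the helper names `setup_commands` and `run_setup` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: the public YOLOv9 repository; the article does not give the URL.
REPO_URL = "https://github.com/WongKinYiu/yolov9.git"


def setup_commands(home):
    """Return (command, working_directory) pairs for cloning and installing."""
    repo_dir = Path(home) / "yolov9"
    return [
        (["git", "clone", REPO_URL, str(repo_dir)], str(home)),
        ([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"], str(repo_dir)),
    ]


def run_setup(home):
    """Run each setup command in order, stopping at the first failure."""
    for cmd, cwd in setup_commands(home):
        subprocess.run(cmd, cwd=cwd, check=True)


# Print the plan without executing it; call run_setup("/content") to actually run it.
for cmd, cwd in setup_commands("/content"):
    print(f"(in {cwd}) {' '.join(cmd)}")
```

Keeping the command list separate from its execution makes it easy to inspect what will run before anything touches the network.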

Downloading model weights

Create the weights directory: mkdir -p {HOME}/weights

Download pretrained weights: download the YOLOv9 and GELAN model weights from GitHub
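A minimal sketch of the download step follows. The release URL and the file names `yolov9-c.pt` and `gelan-c.pt` are assumptions based on the public YOLOv9 repository, and should be checked against its releases page; `plan_downloads` and `download_weights` are hypothetical helper names.

```python
import urllib.request
from pathlib import Path

# Assumptions: release location and file names; verify on the repository's releases page.
RELEASE_BASE = "https://github.com/WongKinYiu/yolov9/releases/download/v0.1"
WEIGHT_FILES = ["yolov9-c.pt", "gelan-c.pt"]


def plan_downloads(weights_dir):
    """Map each local target path to the URL it would be fetched from."""
    root = Path(weights_dir)
    return {root / name: f"{RELEASE_BASE}/{name}" for name in WEIGHT_FILES}


def download_weights(weights_dir):
    """Create the directory and download any weight file not already present."""
    Path(weights_dir).mkdir(parents=True, exist_ok=True)
    for target, url in plan_downloads(weights_dir).items():
        if not target.exists():
            urllib.request.urlretrieve(url, str(target))


# Show what would be downloaded; call download_weights(...) to fetch the files.
for target, url in plan_downloads("/content/weights").items():
    print(f"{url} -> {target}")
```

Skipping files that already exist avoids repeating large downloads when a notebook cell is re-run.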

Preparing images

Create the data directory: mkdir -p {HOME}/data

Download a sample image: fetch a test image with wget

Running detection

Run the detection script: run detect.py to detect objects, and set the path where the results are saved
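The detection command can be assembled programmatically, as in the sketch below. The flags shown are assumed to match the options of the repository's `detect.py`; `--save-txt` and `--save-conf` matter here because the later parsing code reads one label file per image with the confidence as the sixth field. The function name `build_detect_command` and the example paths are hypothetical.

```python
import sys


def build_detect_command(weights, source, conf_thres=0.25, device="0", name="exp"):
    """Build the argument list for a detect.py run that also writes label files."""
    return [
        sys.executable, "detect.py",
        "--weights", str(weights),
        "--source", str(source),
        "--conf-thres", str(conf_thres),
        "--device", str(device),
        "--name", name,       # results are expected under runs/detect/<name>
        "--save-txt",         # write one .txt label file per image
        "--save-conf",        # append the confidence score to each label line
    ]


cmd = build_detect_command("weights/yolov9-c.pt", "data/image9.jpeg", conf_thres=0.1)
print(" ".join(cmd))
```

To execute it, pass the list to `subprocess.run(cmd, cwd=<path to the cloned repository>, check=True)`.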

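Running segmentation

Initialize the SAM model: load the pretrained SAM checkpoint through the model registry and wrap it in a predictor

Load the image: read the image with OpenCV and prepare it for segmentation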

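Visualizing results

Color mapping: assign a unique color to each class

Show the detections: overlay the YOLOv9 detections and the SAM segmentation masks on the image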
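The article's own code picks a random color per class, which can change between runs and occasionally produce similar colors. As an alternative sketch, not the article's method, the helper below spaces hues evenly so each class gets a distinct, reproducible RGBA color; `make_color_map` is a hypothetical name.

```python
import colorsys


def make_color_map(class_ids, alpha=0.6):
    """Assign each distinct class id an RGBA tuple with evenly spaced hues."""
    unique_ids = sorted(set(class_ids))
    n = max(len(unique_ids), 1)
    color_map = {}
    for i, class_id in enumerate(unique_ids):
        # Hue i/n on the color wheel; fixed saturation and brightness.
        r, g, b = colorsys.hsv_to_rgb(i / n, 0.8, 1.0)
        color_map[class_id] = (r, g, b, alpha)
    return color_map


color_map = make_color_map([0, 0, 2, 7])
for class_id, rgba in color_map.items():
    print(class_id, tuple(round(c, 2) for c in rgba))
```

The RGBA tuples can be passed directly to matplotlib, with the alpha channel controlling how transparent the mask overlay is.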

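Extracting masks

Build the aggregate mask: merge the individual segmentation masks into a single mask

Show the final image: apply the aggregate mask to the original image and display the result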
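The aggregation and compositing steps can be illustrated on a tiny synthetic image, without running SAM. This is a sketch that assumes each mask is a 2-D array with the same height and width as the image; `aggregate_masks` and `apply_on_white` are hypothetical helper names.

```python
import numpy as np


def aggregate_masks(masks, shape):
    """Union a list of 2-D masks into one boolean mask of the given shape."""
    combined = np.zeros(shape, dtype=bool)
    for mask in masks:
        combined |= mask.astype(bool)
    return combined


def apply_on_white(image, mask):
    """Keep image pixels where mask is True and paint everything else white."""
    white = np.full_like(image, 255)
    return np.where(mask[..., np.newaxis], image, white)


# Synthetic 4x4 gray image with two single-pixel masks.
image = np.full((4, 4, 3), 100, dtype=np.uint8)
mask_a = np.zeros((4, 4), dtype=bool)
mask_a[0, 0] = True
mask_b = np.zeros((4, 4), dtype=bool)
mask_b[3, 3] = True

combined = aggregate_masks([mask_a, mask_b], image.shape[:2])
result = apply_on_white(image, combined)
print(combined.sum(), result[0, 0], result[1, 1])
```

Using a boolean union keeps overlapping masks from being counted twice, and `np.where` preserves the image's `uint8` dtype so the result can be displayed directly.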

Code example

# Extract the detection results
import cv2

# Image path
image_path = '/content/drive/MyDrive/data/image9.jpeg'

# Read the image to get its dimensions
image = cv2.imread(image_path)
image_height, image_width, _ = image.shape

detections_path = '/content/yolov9/runs/detect/exp/labels/image9.txt'
bboxes = []
class_ids = []
conf_scores = []

# Read the detections from the label file.
# Each line is: class_id cx cy w h confidence (coordinates normalized to [0, 1]).
with open(detections_path, 'r') as file:
    for line in file:
        components = line.split()
        class_id = int(components[0])
        confidence = float(components[5])
        cx, cy, w, h = [float(x) for x in components[1:5]]
        # Convert to pixel coordinates
        cx *= image_width
        cy *= image_height
        w *= image_width
        h *= image_height
        # Convert center/size to corner coordinates
        xmin = cx - w / 2
        ymin = cy - h / 2
        xmax = cx + w / 2
        ymax = cy + h / 2
        bboxes.append((xmin, ymin, xmax, ymax))
        class_ids.append(class_id)
        conf_scores.append(confidence)

# Initialize the SAM model
from segment_anything import sam_model_registry, SamPredictor

sam_checkpoint = "/content/yolov9/sam_vit_h_4b8939.pth"
model_type = "vit_h"
sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
predictor = SamPredictor(sam)

# Load the image for segmentation (SAM expects RGB, OpenCV reads BGR)
image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Visualize the segmentation results
from matplotlib import pyplot as plt
import numpy as np
import yaml

with open('/content/yolov9/data/coco.yaml', 'r') as file:
    coco_data = yaml.safe_load(file)
    class_names = coco_data['names']

# Color map: one random RGBA color per class
color_map = {}
for class_id in class_ids:
    color_map[class_id] = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)

# Helper functions
def show_mask(mask, ax, color):
    h, w = mask.shape[-2:]
    mask_image = mask.reshape(h, w, 1) * np.array(color).reshape(1, 1, -1)
    ax.imshow(mask_image)

def show_box(box, label, conf_score, color, ax):
    x0, y0 = box[0], box[1]
    w, h = box[2] - box[0], box[3] - box[1]
    rect = plt.Rectangle((x0, y0), w, h, edgecolor=color, facecolor='none', lw=2)
    ax.add_patch(rect)
    label_offset = 10
    label_text = f'{label} {conf_score:.2f}'
    ax.text(x0, y0 - label_offset, label_text, color='black', fontsize=10, va='top', ha='left',
            bbox=dict(facecolor=color, alpha=0.7, edgecolor='none', boxstyle='square,pad=0.4'))

plt.figure(figsize=(10, 10))
ax = plt.gca()
plt.imshow(image)

# Draw a mask and a labeled box for every detection
for class_id, bbox, conf in zip(class_ids, bboxes, conf_scores):
    class_name = class_names[class_id]
    color = color_map[class_id]
    input_box = np.array(bbox)
    masks, _, _ = predictor.predict(
        point_coords=None,
        point_labels=None,
        box=input_box,
        multimask_output=False
    )
    show_mask(masks[0], ax, color=color)
    show_box(bbox, class_name, conf, color, ax)

plt.axis('off')
plt.show()

# Build the final image: segmented objects on a white background
aggregate_mask = np.zeros(image.shape[:2], dtype=np.uint8)
for bbox in bboxes:
    input_box = np.array(bbox).reshape(1, 4)
    masks, _, _ = predictor.predict(
        point_coords=None,
        point_labels=None,
        box=input_box,
        multimask_output=False
    )
    aggregate_mask = np.where(masks[0] > 0.5, 1, aggregate_mask)

binary_mask = np.where(aggregate_mask == 1, 1, 0)
white_background = np.ones_like(image) * 255
new_image = white_background * (1 - binary_mask[..., np.newaxis]) + image * binary_mask[..., np.newaxis]

plt.figure(figsize=(10, 10))
plt.imshow(new_image.astype(np.uint8))
plt.axis('off')
plt.show()
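The label-parsing step in the code above can be factored into a small function, which makes the coordinate conversion easy to check on a known input. This is a sketch under the same assumption as the original code, namely that each label line holds `class_id cx cy w h` with normalized coordinates, optionally followed by a confidence score. The function name `parse_yolo_label` is hypothetical, and clamping the box to the image bounds is an addition that the original code does not perform.

```python
def parse_yolo_label(line, image_width, image_height):
    """Convert one YOLO label line to (class_id, (xmin, ymin, xmax, ymax), confidence)."""
    parts = line.split()
    class_id = int(parts[0])
    cx, cy, w, h = (float(v) for v in parts[1:5])
    # The confidence field is only present when detection ran with --save-conf.
    confidence = float(parts[5]) if len(parts) > 5 else None
    # Scale normalized center/size to pixel corners, clamped to the image bounds.
    xmin = max((cx - w / 2) * image_width, 0.0)
    ymin = max((cy - h / 2) * image_height, 0.0)
    xmax = min((cx + w / 2) * image_width, float(image_width))
    ymax = min((cy + h / 2) * image_height, float(image_height))
    return class_id, (xmin, ymin, xmax, ymax), confidence


class_id, box, conf = parse_yolo_label("0 0.5 0.5 0.25 0.5 0.9", 640, 480)
print(class_id, box, conf)  # 0 (240.0, 120.0, 400.0, 360.0) 0.9
```

A box centered in a 640x480 image with a quarter of its width and half of its height spans x from 240 to 400 and y from 120 to 360, which matches the printed result.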

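Summary

Following the steps above, we built a dynamic object detection and segmentation pipeline based on YOLOv9 + SAM. The pipeline combines accurate detection with fine-grained segmentation masks, and it generalizes and extends readily to many practical scenarios.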

Reposted from: http://resfk.baihongyu.com/
