CelebA Object Detection (YoloV7)

This example demonstrates how to integrate YoloV7 into the Tensorleap system. The architecture we use for this example is the YoloV7-tiny model, trained on the full CelebA dataset with a single class (faces).

The starting point for this example is a set of trained PyTorch model weights (.pt) produced with the YoloV7 repository.

Dataset Script

To use the CelebA dataset for object detection, we arranged the CelebA dataset into images and labels folders and created the corresponding .txt label files according to the YOLOv7 spec.

In the following entries we provide an in-depth description of the main components of our dataset, followed by the complete dataset script.

Key components

Before going into each component in depth, two things are important to note:

  • The GT function converts a YOLO-format label [class,X,Y,W,H] into [X,Y,W,H,class]

  • The input function returns a channels-last image [H,W,3] (both conventions are illustrated in the sketch below)
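
To make these two conventions concrete, here is a minimal sketch using made-up numbers: it reorders a hypothetical YOLO-format GT row into the expected order, and shows the channels-last image shape.

import numpy as np

# Hypothetical YOLO-format GT row: [class, X, Y, W, H], all normalized to [0, 1]
yolo_row = np.array([0, 0.512, 0.431, 0.265, 0.388])

# The GT function moves the class index to the end: [X, Y, W, H, class]
gt_row = np.concatenate([yolo_row[1:], yolo_row[:1]])
print(gt_row)  # -> [0.512 0.431 0.265 0.388 0.   ]

# The input function returns a channels-last image of shape [H, W, 3]
image = np.zeros((640, 640, 3))
print(image.shape)  # -> (640, 640, 3)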

YoloV7 utils

YOLOv7 requires a decoder (so we can view the images), a custom loss (composed of objectness, class, and IoU losses), and a specific grid definition to map the predictions to the priors. A complete description of these elements and their configuration can be found in the helpers section.

Here, we set up the YOLO utils with a YOLO-tiny config. This includes the loss config (overlap threshold, maximum matches, weights), the decoder config (NMS and confidence thresholds, top_k, and the maximum number of bounding boxes to plot), and the grid config (feature-map sizes, strides, and image size).

from code_loader.helpers.detection.yolo.decoder import Decoder
from code_loader.helpers.detection.yolo.utils import scale_loc_prediction, reshape_output_list
from code_loader.helpers.detection.yolo.grid import Grid
from code_loader.helpers.detection.yolo.loss import YoloLoss
from code_loader.helpers.detection.utils import xywh_to_xyxy_format, xyxy_to_xywh_format, jaccard

# -------------------------------------OD Functions ----------------------------------- #
CATEGORIES = ['face']  # class names
BACKGROUND_LABEL = 1 
MAX_BB_PER_IMAGE = 30
CLASSES = 1
IMAGE_SIZE = (640, 640)
FEATURE_MAPS = ((80, 80), (40, 40), (20, 20))
BOX_SIZES = (((10, 13), (16, 30), (33, 23)),
             ((30, 61), (62, 45), (59, 119)),
             ((116, 90), (156, 198), (373, 326)))  # tiny fd
NUM_FEATURES = len(FEATURE_MAPS)
NUM_PRIORS = len(BOX_SIZES[0]) * len(BOX_SIZES)  # 3 * 3
OFFSET = 0
STRIDES = (8, 16, 32)
CONF_THRESH = 0.35
NMS_THRESH = 0.65
OVERLAP_THRESH = 0.0625  # 1/16
BOXES_GENERATOR = Grid(image_size=IMAGE_SIZE, feature_maps=FEATURE_MAPS, box_sizes=BOX_SIZES,
                       strides=STRIDES, offset=OFFSET)
DEFAULT_BOXES = BOXES_GENERATOR.generate_anchors()
LOSS_FN = YoloLoss(num_classes=CLASSES, overlap_thresh=OVERLAP_THRESH,
                   default_boxes=DEFAULT_BOXES, background_label=BACKGROUND_LABEL,
                   from_logits=False, weights=[4.0, 1.0, 0.4], max_match_per_gt=10)
DECODER = Decoder(CLASSES,
                  background_label=BACKGROUND_LABEL,
                  top_k=20,
                  conf_thresh=CONF_THRESH,
                  nms_thresh=NMS_THRESH,
                  max_bb_per_layer=MAX_BB_PER_IMAGE,
                  max_bb=MAX_BB_PER_IMAGE)
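
As a rough sanity check on the grid config, the total number of priors should equal the number of grid cells across all feature maps times the number of box sizes per level. A minimal sketch, assuming generate_anchors() returns one row per prior (it may instead return a per-feature-map list, in which case the check should be adjusted):

# Expected prior count: sum of (H * W) over feature maps, times priors per level
expected_priors = sum(h * w for h, w in FEATURE_MAPS) * len(BOX_SIZES[0])
print(expected_priors)  # (80*80 + 40*40 + 20*20) * 3 = 25200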

Preprocess

The following method downloads our input text files from the public cloud, reads them, and parses the first NUM_SAMPLES entries from each file.

def subset_images_list() -> List[PreprocessResponse]:
    lists_base_path = Path('celebA/celeba_full/input_lists')
    lists_names = ["train.txt", "val.txt", "test.txt"]
    NUM_SAMPLES = 100
    lists_full_path = [lists_base_path / subset for subset in lists_names]
    lists_files = [_download(str(f)) for f in lists_full_path]
    subset_image_pths = [None]*3
    subset_labels_pths = [None]*3
    for i in range(len(lists_names)):
        with open(lists_files[i], 'r') as f:
            subset_image_pths[i] = f.read().strip().splitlines()
            subset_labels_pths[i] = transform_image_list_to_labels(subset_image_pths[i])
    subset_image_pths = [sub_pth[:NUM_SAMPLES] for sub_pth in subset_image_pths]
    subset_labels_pths = [sub_pth[:NUM_SAMPLES] for sub_pth in subset_labels_pths]
    responses = [PreprocessResponse(length=len(img_pth), data={'img_path': img_pth, 'label_path': lab_pth})
                 for img_pth, lab_pth in zip(subset_image_pths, subset_labels_pths)]
    return responses
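
A brief usage sketch, assuming PreprocessResponse exposes the length and data fields it was constructed with; the three responses follow the order of lists_names (train, val, test):

train_resp, val_resp, test_resp = subset_images_list()
print(train_resp.length)                # up to NUM_SAMPLES (100 here)
print(train_resp.data['img_path'][0])   # path of the first training image
print(train_resp.data['label_path'][0]) # path of its matching label file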

Input Images

This method downloads the images from our cloud, loads them, and then resizes them to a specific IMAGE_SIZE.

def input_image(idx: int, data: PreprocessResponse) -> NDArray[float]:
    """
    Returns a BGR image normalized and padded
    """
    data = data.data
    filepath = data['img_path'][idx]
    fpath = _download(filepath)
    # resize to IMAGE_SIZE and rescale pixel values to [0, 1]
    image = np.array(Image.open(fpath).resize((IMAGE_SIZE[1], IMAGE_SIZE[0]), Image.BILINEAR)) / 255.
    return image

Ground Truth

This method reads the YOLO-format label files and returns a [X,Y,W,H,class_idx] encoded ground truth, with up to MAX_BB_PER_IMAGE GT instances per image.

def get_bb(idx: int, data: PreprocessResponse) -> NDArray[np.double]:
    """
    returns an array shaped (MAX_BB_PER_IMAGE, 5) where the channel idx is [X,Y,W,H] normalized to [0,1]
    """
    data = data.data
    filepath = data['label_path'][idx]
    fpath = _download(filepath)
    with open(fpath, 'r') as f:
        gt_list = [x.split() for x in f.read().strip().splitlines()]
    bboxes = np.zeros([MAX_BB_PER_IMAGE, 5])
    max_anns = min(MAX_BB_PER_IMAGE, len(gt_list))
    for i, ann in enumerate(gt_list[:max_anns]):
        bboxes[i, :4] = np.array(ann[1:]).astype(float)
        bboxes[i, 4] = np.array(ann[0]).astype(float)
    bboxes[max_anns:, 4] = BACKGROUND_LABEL
    return bboxes

Complete Dataset Script

The complete Face Detection dataset script can be found here.

Exporting an ONNX Model

After the PyTorch training is finished, an ONNX model should be exported using the YOLOv7 export script. YOLOv7 has multiple export options, but the one that allows the easiest integration with the Tensorleap system is exporting the model with the decoder but without NMS.

To export the PyTorch model to ONNX, execute the following command:

python export.py --weights WEIGHTS_PATH --grid --simplify --img-size 640 640 --max-wh 640

Where WEIGHTS_PATH is the path to the trained .pt weights and 640 640 is the resolution of the input images.
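
Optionally, the exported model can be sanity-checked before uploading it. The sketch below assumes onnxruntime is installed and uses a hypothetical face_detector.onnx path; it runs a dummy 640x640 input through the network and prints the output shapes:

import numpy as np
import onnxruntime as ort

# Hypothetical path to the exported model
session = ort.InferenceSession("face_detector.onnx")
input_name = session.get_inputs()[0].name
print(session.get_inputs()[0].shape)  # expected: [1, 3, 640, 640]

# Dummy NCHW input at the export resolution
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)
outputs = session.run(None, {input_name: dummy})
for out in outputs:
    print(out.shape)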

Example ONNX model

Our exported YoloV7 model can be found here.

Model Integration

Following the import model guide, we can now upload the ONNX model to the platform.

Removing the last node

This model has a redundant node added to it at upload time; it should be removed.

Setting up the model

To set up the model, we first need to move the dataset node from the left-most part of the model to the right.

We should then select the YOLO parsed dataset on the dataset node and connect several nodes:

  • The GT visualizer (visualize GT BB)

  • Prediction visualizer (visualize prediction BB)

  • Image visualizer (visualize input)

  • Custom Loss

Don't forget to choose the loss from the dropdown menu after adding the loss node.

  • Optimizer

  • Metrics

After connecting these nodes, you should save the model (by overriding the current version).

At this point, both the dataset and the model are integrated into the platform. You can now run evaluation and training.
