YOLO

Utils for the YOLO object detection task

This package contains multiple helpers and methods for the YOLO object detection task:

YoloLoss

YoloLoss is an implementation of the loss used by the YOLOv7 repository.

We expose an interface to the loss that can be configured using the following properties:

from code_loader.helpers.detection.yolo.loss import YoloLoss

LOSS_FN = YoloLoss(num_classes: int, default_boxes: List[NDArray[np.int32]],
                   overlap_thresh: float, background_label: int,
                   features: List[Tuple[int, int]] = [],
                   anchors: Optional[NDArray[np.int32]] = None,
                   from_logits: bool = True, weights: List[float] = [4.0, 1.0, 0.4],
                   max_match_per_gt: int = 10,
                   image_size: Union[Tuple[int, int], int] = (640, 640),
                   cls_w: float = 0.3, obj_w: float = 0.7, box_w: float = 0.05,
                   yolo_match: bool = False):
Args

num_classes

The number of classes in the dataset

default_boxes

A list of NDArrays representing the model's grid; see Grid.generate_anchors()

overlap_thresh

The Matcher overlap threshold that determines what constitutes a match. The YOLO default is 0.0625

background_label

If no background label was used during training, this should be set to NUM_CLASSES+1

features

Only required if yolo_match is True. The sizes of the prediction heads your model uses: [[H1,W1],[H2,W2],...]

anchors

Only required if yolo_match is True. The anchors used in your model

from_logits

True if the model was exported without a sigmoid. False if the model was exported as recommended by us

weights

The weights used to scale the object loss

max_match_per_gt

The maximum number of prior matches per GT. The YOLOv7 default is 10

image_size

The size of your images.

cls_w

Classification loss weight

obj_w

The object loss weight

box_w

The regression loss weight

yolo_match

When yolo_match is True, we use the same Matcher as YOLOv7. When yolo_match is False, we use a slightly faster matcher that approximates the YOLOv7 matcher. A construction sketch follows.
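
A minimal construction sketch for a 640x640 YOLOv7-style model with three detection heads. The grid settings and anchor sizes below are illustrative values (in the YOLOv7 YAML format), not defaults shipped with the package:

from code_loader.helpers.detection.yolo.grid import Grid
from code_loader.helpers.detection.yolo.loss import YoloLoss

NUM_CLASSES = 80

# Build the priors grid first (see the Grid section below).
grid = Grid(image_size=(640, 640),
            feature_maps=((80, 80), (40, 40), (20, 20)),
            box_sizes=((12, 16, 19, 36, 40, 28),       # assumed per-head anchor sizes
                       (36, 75, 76, 55, 72, 146),
                       (142, 110, 192, 243, 459, 401)),
            strides=(8, 16, 32),
            offset=0)

LOSS_FN = YoloLoss(num_classes=NUM_CLASSES,
                   default_boxes=grid.generate_anchors(),
                   overlap_thresh=0.0625,              # the YOLO default noted above
                   background_label=NUM_CLASSES + 1,   # no background label used in training
                   from_logits=True,                   # model exported without a sigmoid
                   image_size=(640, 640))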

This loss has a __call__ method that computes the YOLO loss:

iou_loss, obj_loss, class_loss = LOSS_FN(y_true: tf.Tensor,
                                         y_pred: Tuple[List[tf.Tensor], List[tf.Tensor]])
Args

y_true

The ground truth encoded into a shape of [MAX_BB,5]. The 5 channels represent [X,Y,W,H,class]

y_pred

A tuple (loc, class) composed of:

- loc: a list the size of the number of heads. Each element is of size [Batch,#BB,4]; the channels represent [X,Y,W,H]
- class: a list the size of the number of heads. Each element is of size [Batch,#BB,#classes+1]

This returns the three losses of the YOLO repo (IOU loss, object loss, classification loss)
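
A sketch of a single loss evaluation on dummy tensors, reusing LOSS_FN and NUM_CLASSES from the sketch above. The batch dimension on y_true, the MAX_BB value, and the per-head box counts are assumptions (three anchors on 80x80, 40x40 and 20x20 grids):

import tensorflow as tf

BATCH, MAX_BB = 8, 50
# Dummy ground truth: MAX_BB boxes of [X, Y, W, H, class] per sample.
y_true = tf.zeros((BATCH, MAX_BB, 5))

# One (loc, class) entry per head.
loc = [tf.zeros((BATCH, n, 4)) for n in (19200, 4800, 1200)]
cls = [tf.zeros((BATCH, n, NUM_CLASSES + 1)) for n in (19200, 4800, 1200)]

iou_loss, obj_loss, class_loss = LOSS_FN(y_true, (loc, cls))

The three returned terms can then be combined into a scalar training objective as needed.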

Decoder

Since we recommend exporting the model without the NMS and top-k components, we need a Decoder model that can serve as a head to filter only the most confident bounding boxes.

We expose an interface for our default decoder:

from code_loader.helpers.detection.yolo.decoder import Decoder

DECODER = Decoder(num_classes: int, background_label: int, top_k: int,
                  conf_thresh: float, nms_thresh: float,
                  max_bb_per_layer: int = 20, max_bb: int = 20)
Args

num_classes

The number of classes in the dataset

background_label

If no background label was used during training, this should be set to NUM_CLASSES+1

top_k

The number of BB kept by the top_k selection, per layer and per class

conf_thresh

A threshold for the confidence. BB with confidence lower than this will not be returned by the decoder

nms_thresh

The NMS threshold for the IOU-overlap calculation; see the IOU suppression parameter of TensorFlow's non_max_suppression for more details.

max_bb_per_layer

The maximum amount of BB selected per layer

max_bb

The maximum amount of BB selected overall
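
A construction sketch; the threshold values here are arbitrary illustrative choices, not package defaults:

from code_loader.helpers.detection.yolo.decoder import Decoder

NUM_CLASSES = 80
DECODER = Decoder(num_classes=NUM_CLASSES,
                  background_label=NUM_CLASSES + 1,  # no background label used in training
                  top_k=20,
                  conf_thresh=0.4,                   # drop BB below this confidence
                  nms_thresh=0.5,                    # IOU threshold used by the NMS
                  max_bb_per_layer=20,
                  max_bb=20)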

This decoder has a __call__ function that returns a list of the selected bounding boxes:

bounding_boxes = DECODER(loc_data: List[tf.Tensor], conf_data: List[tf.Tensor],
                         prior_data: List[NDArray[np.float32]],
                         from_logits: bool = True, decoded: bool = False)
Args

loc_data

A list the size of the number of heads. Each element is of size [Batch,#BB,4]. The channels represent [X,Y,W,H]

conf_data

A list the size of the number of heads. Each element is of size [Batch,#BB,#classes+1]

prior_data

A list of NDArrays representing the model's grid; see Grid.generate_anchors()

from_logits

True if the model was exported without a sigmoid. False if the model was exported as recommended by us

decoded

True if the model was exported with a decoder, as recommended by us. False otherwise (i.e. the predictions are still relative to anchors and are not in image coordinates)
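
A usage sketch, reusing DECODER from above together with raw head outputs shaped as in the YoloLoss section; loc, cls and DEFAULT_BOXES are assumed to already exist:

bounding_boxes = DECODER(loc_data=loc,             # per-head [Batch, #BB, 4]
                         conf_data=cls,            # per-head [Batch, #BB, #classes+1]
                         prior_data=DEFAULT_BOXES, # from Grid.generate_anchors()
                         from_logits=True,         # exported without a sigmoid
                         decoded=False)            # predictions still anchor-relative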

Grid

This class represents the YOLO priors grid.

from code_loader.helpers.detection.yolo.grid import Grid

BOXES_GENERATOR = Grid(image_size: Tuple[int, int], feature_maps: Tuple[Tuple[int, int], ...],
                       box_sizes: Tuple[Tuple[float, ...], ...], strides: Tuple[int, ...],
                       offset: int)
Args

image_size

The image size used for inference

feature_maps

The shapes of the model heads: ((H1,W1),(H2,W2),...)

box_sizes

The shape of the anchors as set in the YOLOv7 YAML

strides

The strides that connect the head size to the image size (IMAGE_SIZE/HEAD_SIZE)

offset

Set to 0 if the grid starts from (0,0), as expected by the YOLO repo

This class has a generate_anchors() method that creates the grid used by the loss and decoder.

DEFAULT_BOXES = BOXES_GENERATOR.generate_anchors()

DEFAULT_BOXES is of type List[NDArray[np.float32]]. Each entry has shape (#head-BB,4), representing the coordinates of the bounding boxes for the corresponding head.
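
A concrete sketch mirroring the grid used in the YoloLoss example above; the anchor sizes are assumed YOLOv7-style values, and the shape comment assumes three anchors per cell:

from code_loader.helpers.detection.yolo.grid import Grid

BOXES_GENERATOR = Grid(image_size=(640, 640),
                       feature_maps=((80, 80), (40, 40), (20, 20)),
                       box_sizes=((12, 16, 19, 36, 40, 28),
                                  (36, 75, 76, 55, 72, 146),
                                  (142, 110, 192, 243, 459, 401)),
                       strides=(8, 16, 32),        # 640/80, 640/40, 640/20
                       offset=0)

DEFAULT_BOXES = BOXES_GENERATOR.generate_anchors()
# With 3 anchors per cell, DEFAULT_BOXES[0].shape would be (80*80*3, 4) == (19200, 4).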
