YOLO
Utils for the Yolo Object Detection task
This package contains helpers and methods for the YOLO object detection task:
YoloLoss
YoloLoss is an implementation of the loss used by the Yolov7 repository.
The loss can be configured with the following arguments:
Args | Description |
---|---|
num_classes | The number of classes in the dataset |
default_boxes | A list of NDArray representing the model's grid. See Grid.generate_anchors() |
overlap_thresh | The Matcher overlap threshold that sets what constitutes a match. YOLO default is 0.0625 |
background_label | If no background_label was used during training, set this to NUM_CLASSES+1 |
features | Only required if yolo_match is True. The sizes of the prediction heads your model uses: [[H1,W1],[H2,W2],...] |
anchors | Only required if yolo_match is True. The anchors used in your model |
from_logits | True if the model was exported without a sigmoid. False if the model was exported as recommended by us |
weights | The weights used to scale the object loss |
max_match_per_gt | The number of prior matches per GT. Yolov7 default is 10 |
image_size | The size of your images |
cls_w | Classification loss weight |
obj_w | The object loss weight |
box_w | The regression loss weight |
yolo_match | When yolo_match is True, we use the same Matcher as YoloV7. When yolo_match is False, we use a slightly faster matcher that approximates the YoloV7 matcher. |
This loss has a `__call__` method that computes the YOLO loss:
Args | Description |
---|---|
y_true | The ground truth encoded into a shape of [MAX_BB,5]. The 5 channels represent [X,Y,W,H,class] |
y_pred | A tuple (loc,class). loc is a list with one element per head, each of size [Batch,#BB,4], with channels [X,Y,W,H]. class is a list with one element per head, each of size [Batch,#BB,#classes+1] |
The call returns the three losses of the YOLO repo: IOU loss, object loss, and classification loss.
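To make the expected input shapes concrete, here is a minimal NumPy sketch that packs ground truth into a [MAX_BB,5] array and builds a placeholder y_pred tuple. The helper name `encode_ground_truth`, the constants, and the padding convention (zero boxes tagged with the background label) are assumptions made for illustration; they are not part of this package's documented API.

```python
import numpy as np

MAX_BB = 20                          # hypothetical maximum number of boxes per image
NUM_CLASSES = 3                      # hypothetical number of classes
BACKGROUND_LABEL = NUM_CLASSES + 1   # follows the background_label row above


def encode_ground_truth(boxes, labels, max_bb=MAX_BB, pad_label=BACKGROUND_LABEL):
    """Pack boxes [X, Y, W, H] and class ids into a [max_bb, 5] float32 array.

    Unused rows are zero boxes tagged with pad_label. This padding scheme is
    an assumption of the sketch, not a documented convention.
    """
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    labels = np.asarray(labels, dtype=np.float32).reshape(-1)
    if boxes.shape[0] != labels.shape[0]:
        raise ValueError("boxes and labels must have the same length")
    n = min(boxes.shape[0], max_bb)  # extra boxes beyond max_bb are dropped
    y_true = np.zeros((max_bb, 5), dtype=np.float32)
    y_true[:, 4] = pad_label
    y_true[:n, :4] = boxes[:n]
    y_true[:n, 4] = labels[:n]
    return y_true


y_true = encode_ground_truth([[0.5, 0.5, 0.2, 0.3], [0.1, 0.2, 0.05, 0.1]], [0, 2])

# Placeholder predictions: batch of 1, two heads with 4 and 1 boxes.
batch, bb_per_head = 1, [4, 1]
loc = [np.zeros((batch, n, 4), dtype=np.float32) for n in bb_per_head]
cls = [np.zeros((batch, n, NUM_CLASSES + 1), dtype=np.float32) for n in bb_per_head]
y_pred = (loc, cls)
```

Both `loc` and `cls` hold one array per head, matching the y_pred description in the table.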
Decoder
Since we recommend exporting the model without the `NMS` and `top-k` components, we need a Decoder model that can serve as a head and keep only the most confident bounding boxes.
We expose an interface for our default decoder:
Args | Description |
---|---|
num_classes | The number of classes in the dataset |
background_label | If no background_label was used during training, set this to NUM_CLASSES+1 |
top_k | The number of BB kept by the top_k step, applied per layer and per class |
conf_thresh | The confidence threshold. BB with confidence lower than this are not returned by the decoder |
nms_thresh | The NMS threshold for the IOU-overlap calculation. See the IOU threshold parameter of TensorFlow's non_max_suppression for more details. |
max_bb_per_layer | The maximum number of BB selected per layer |
max_bb | The maximum number of BB selected overall |
This decoder has a `__call__` function that returns a list of the selected bounding boxes:
Args | Description |
---|---|
loc_data | A list the size of the number of heads. Each element is of size [Batch,#BB,4]. The channels represent [X,Y,W,H] |
conf_data | A list the size of the number of heads. Each element is of size [Batch,#BB,#classes+1] |
prior_data | A list of NDArray representing the model's grid. See Grid.generate_anchors() |
from_logits | True if the model was exported without a sigmoid. False if the model was exported as recommended by us |
decoded | True if the model was exported with a decoder, as recommended by us. False otherwise (i.e. the predictions are still relative to anchors and are not in image coordinates) |
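The filtering that conf_thresh, top_k and nms_thresh describe can be sketched in plain NumPy for a single class of a single image. This is an illustrative greedy NMS, not the package's Decoder: the function names are hypothetical, [X,Y,W,H] is assumed to be center-based, and the per-layer and overall caps (max_bb_per_layer, max_bb) are left out.

```python
import numpy as np


def xywh_to_corners(boxes):
    """Convert [X, Y, W, H] (assumed center-based) to [x1, y1, x2, y2]."""
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    half = boxes[:, 2:4] / 2.0
    return np.concatenate([boxes[:, :2] - half, boxes[:, :2] + half], axis=1)


def iou_one_to_many(box, others):
    """IoU between one corner-format box and an array of corner-format boxes."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[2], others[:, 2])
    y2 = np.minimum(box[3], others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / np.maximum(area + areas - inter, 1e-9)


def filter_boxes(boxes_xywh, scores, conf_thresh=0.25, nms_thresh=0.5, top_k=100):
    """Return the indices of the boxes kept for one class of one image."""
    scores = np.asarray(scores, dtype=np.float32)
    corners = xywh_to_corners(boxes_xywh)
    # 1. Drop boxes below the confidence threshold.
    candidates = np.flatnonzero(scores >= conf_thresh)
    # 2. Keep the top_k most confident candidates, highest score first.
    order = candidates[np.argsort(-scores[candidates], kind="stable")][:top_k]
    # 3. Greedy NMS: keep the best box, drop the ones overlapping it too much.
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        overlaps = iou_one_to_many(corners[best], corners[rest])
        order = rest[overlaps <= nms_thresh]
    return keep


boxes = [[0.5, 0.5, 0.2, 0.2], [0.51, 0.5, 0.2, 0.2],
         [0.2, 0.2, 0.1, 0.1], [0.8, 0.8, 0.1, 0.1]]
scores = [0.9, 0.8, 0.7, 0.1]
kept = filter_boxes(boxes, scores)
```

In this example the second box overlaps the first almost entirely and is suppressed, and the last box falls below the confidence threshold, so only the first and third boxes remain.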
Grid
This class represents the YOLO priors grid.
Args | Description |
---|---|
image_size | The image size we use for inference |
feature_maps | The shapes of the model heads: ((H1,W1),(H2,W2),...) |
box_sizes | The shape of the anchors as set in the Yolov7 YAML |
strides | The strides that relate each head size to the image size (IMAGE_SIZE/HEAD_SIZE) |
offset | 0 if the grid starts from (0,0) as expected by the YOLO repo |
This class has a `generate_anchors()` method that creates the grid used by the loss and decoder.
DEFAULT_BOXES is of type List[NDArray[np.float32]]. Each entry is an array of size (#head-BB,4) holding the coordinates of the bounding boxes located in the corresponding head.
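As a rough illustration of what such a grid looks like, the sketch below builds one float32 array of priors per head from an image size, head shapes and anchor sizes. It is not the package's `generate_anchors()` implementation: the function name, the [cx, cy, w, h] pixel layout, the row ordering and the example values are all assumptions made for this sketch.

```python
import numpy as np


def generate_anchor_grid(image_size, feature_maps, box_sizes, offset=0.0):
    """Return a list with one (H*W*A, 4) float32 array of priors per head.

    Rows are [cx, cy, w, h] in pixels, ordered by y, then x, then anchor.
    Both the layout and the ordering are assumptions of this sketch.
    """
    img_h, img_w = image_size
    grids = []
    for (h, w), sizes in zip(feature_maps, box_sizes):
        sizes = np.asarray(sizes, dtype=np.float32).reshape(-1, 2)  # (A, 2) as (w, h)
        num_anchors = sizes.shape[0]
        # Stride = IMAGE_SIZE / HEAD_SIZE, kept separately for x and y.
        stride = np.array([img_w / w, img_h / h], dtype=np.float32)
        ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
        cells = np.stack([xs, ys], axis=-1).reshape(-1, 1, 2)       # (H*W, 1, 2)
        centers = np.broadcast_to((cells + offset) * stride, (h * w, num_anchors, 2))
        wh = np.broadcast_to(sizes, (h * w, num_anchors, 2))
        head = np.concatenate([centers, wh], axis=-1).reshape(-1, 4)
        grids.append(head.astype(np.float32))
    return grids


# Illustrative values only: a 640x640 input with three heads and three anchors each.
grids = generate_anchor_grid(
    image_size=(640, 640),
    feature_maps=((80, 80), (40, 40), (20, 20)),
    box_sizes=([12, 16, 19, 36, 40, 28],
               [36, 75, 76, 55, 72, 146],
               [142, 110, 192, 243, 459, 401]),
)
```

Each head contributes H*W*A rows, so the three heads here hold 19200, 4800 and 1200 priors respectively.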