How to extract and visualize feature values for an arbitrary layer during inference with YOLOv7?

Question:

In my case, I would like to extract and visualize the feature outputs of layers 102, 103, and 104 in the following part of cfg/training/yolov7.yaml.

# yolov7 head
head:
  [[-1, 1, SPPCSPC, [512]], # 51
  
   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [37, 1, Conv, [256, 1, 1]], # route backbone P4
   [[-1, -2], 1, Concat, [1]],
   
   [-1, 1, Conv, [256, 1, 1]],
   [-2, 1, Conv, [256, 1, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [256, 1, 1]], # 63
   
   [-1, 1, Conv, [128, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [24, 1, Conv, [128, 1, 1]], # route backbone P3
   [[-1, -2], 1, Concat, [1]],
   
   [-1, 1, Conv, [128, 1, 1]],
   [-2, 1, Conv, [128, 1, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [128, 1, 1]], # 75
      
   [-1, 1, MP, []],
   [-1, 1, Conv, [128, 1, 1]],
   [-3, 1, Conv, [128, 1, 1]],
   [-1, 1, Conv, [128, 3, 2]],
   [[-1, -3, 63], 1, Concat, [1]],
   
   [-1, 1, Conv, [256, 1, 1]],
   [-2, 1, Conv, [256, 1, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [256, 1, 1]], # 88
      
   [-1, 1, MP, []],
   [-1, 1, Conv, [256, 1, 1]],
   [-3, 1, Conv, [256, 1, 1]],
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, -3, 51], 1, Concat, [1]],
   
   [-1, 1, Conv, [512, 1, 1]],
   [-2, 1, Conv, [512, 1, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [512, 1, 1]], # 101
   
   [75, 1, RepConv, [256, 3, 1]],   #extract
   [88, 1, RepConv, [512, 3, 1]],   #extract
   [101, 1, RepConv, [1024, 3, 1]], #extract

   [[102,103,104], 1, IDetect, [nc, anchors]],   # Detect(P3, P4, P5)
  ]

Also, the following is the result of printing the model (the dashed line marks layers 3 through 101, omitted for brevity).

Model(
  (model): Sequential(
    (0): Conv(
      (conv): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (act): SiLU(inplace=True)
    )
    (1): Conv(
      (conv): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (act): SiLU(inplace=True)
    )
    (2): Conv(
      (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (act): SiLU(inplace=True)
    )
----------------------------------------------------
    (102): RepConv(
      (act): SiLU(inplace=True)
      (rbr_reparam): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) # extract
    )
    (103): RepConv(
      (act): SiLU(inplace=True)
      (rbr_reparam): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) # extract
    )
    (104): RepConv(
      (act): SiLU(inplace=True)
      (rbr_reparam): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) # extract
    )
    (105): IDetect(
      (m): ModuleList(
        (0): Conv2d(256, 21, kernel_size=(1, 1), stride=(1, 1))
        (1): Conv2d(512, 21, kernel_size=(1, 1), stride=(1, 1))
        (2): Conv2d(1024, 21, kernel_size=(1, 1), stride=(1, 1))
      )
      (ia): ModuleList(
        (0): ImplicitA()
        (1): ImplicitA()
        (2): ImplicitA()
      )
      (im): ModuleList(
        (0): ImplicitM()
        (1): ImplicitM()
        (2): ImplicitM()
      )
    )
  )
)

However, I would like to be able to extract the features of an arbitrary layer if possible, as I may need features of layers other than these.

How can I do this?

I tried to implement the extraction and visualization in the Model class in models/yolo.py, following https://github.com/ultralytics/yolov5/issues/3089, but could not figure out which code to edit or how.
I tried the same with the IDetect class, but could not figure that out either.

Asked By: neg


Answers:

You can register a forward hook on the layer(s) in question. Per the PyTorch documentation, "The hook will be called every time after forward() has computed an output."

Essentially, the forward hook function stores the output of the layer's forward call in a variable that outlives that call, so you can reference it later.

(More precisely, the hook function here is a closure: it captures a reference to a dictionary defined in the enclosing scope, so the stored output persists after the layer's forward call returns. See the PyTorch docs on register_forward_hook for more on this.)

In any case, the forward hook function needs the following signature:

hook(module, input, output) -> None or modified output

So, a trivial example would be:

def make_hook(key):
    def hook(model, input, output):
        intermediate_output[key] = output.detach()
    return hook

The outer function itself returns a function, since the argument to register_forward_hook must be a function with the above signature.

Then we can add the forward hook to any module with:

model.<layer_name>.register_forward_hook(make_hook("example_key"))
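
In a model like the one printed above, where the layers live in an indexed nn.Sequential under model.model, you can also address a layer by its index rather than an attribute name. A sketch, assuming the structure shown in the question:

for idx in (102, 103, 104):
    model.model[idx].register_forward_hook(make_hook(idx))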

So, in summary, your code would look something like:

def make_hook(key):
    def hook(model, input, output):
        intermediate_output[key] = output.detach()
    return hook

# define model
model = Yolo5() # I know this is wrong but you didn't include the actual model in your question so this is just an example
intermediate_output = {}

# register a hook on as many layers as you want
model.conv4.register_forward_hook(make_hook("conv4")) # same here, I made these layer names up
model.maxpool8.register_forward_hook(make_hook("maxpool8"))

# dummy input
inp = torch.rand(1, 3, 1080, 1920)

# forward pass
model(inp)

# reference intermediate_output
intermediate_output["conv4"] # holds the output of that layer after the forward pass

Do note that because forward hooks "add global state" to the module, the PyTorch docs suggest using this feature only temporarily for debugging, not as a persistent solution. For a longer-term solution, you could modify the forward pass of the main model architecture to store these intermediate outputs and return all of them at the end.
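
As a rough illustration of that longer-term idea, here is a toy sequential model whose forward pass returns selected intermediates alongside the final output. This is only a sketch: YOLOv7's real forward pass routes each layer's input according to the from column of the YAML, so you would adapt the same pattern inside the Model class's forward pass in models/yolo.py rather than copy this literally.

import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self, keep=(1, 3)):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Conv2d(3, 16, 3, padding=1),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),
            nn.Conv2d(32, 64, 3, stride=2, padding=1),
            nn.Conv2d(64, 64, 3, padding=1),
        ])
        self.keep = set(keep)  # indices of the layers whose outputs to keep

    def forward(self, x):
        features = {}
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if i in self.keep:
                features[i] = x  # store the intermediate output
        return x, features       # return final output and intermediates together

out, feats = ToyModel()(torch.rand(1, 3, 64, 64))
print({k: v.shape for k, v in feats.items()})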

Answered By: DerekG

Thanks to @DerekG for helping me figure this out!
The following is the code in yolov7/detect.py after the fix.
The ------ lines indicate omitted code.

-------------------------------------------------------------
from utils.plots import plot_one_box, plot_ts_feature_maps # Add plot_ts_feature_maps method
-------------------------------------------------------------
def detect(save_img=False):
-------------------------------------------------------------
    # Load model
    model = attempt_load(weights, map_location=device)  # load FP32 model
    ---------------------------------------------------------------------
    # Set Dataloader
    vid_path, vid_writer = None, None
    if webcam:
        view_img = check_imshow()
        cudnn.benchmark = True  # set True to speed up constant image size inference
        dataset = LoadStreams(source, img_size=imgsz, stride=stride)
    else:
        dataset = LoadImages(source, img_size=imgsz, stride=stride)
    --------------------------------------------------------------------------
    for path, img, im0s, vid_cap in dataset:
        img = torch.from_numpy(img).to(device)
        img = img.half() if half else img.float()  # uint8 to fp16/32
        img /= 255.0  # 0 - 255 to 0.0 - 1.0
        if img.ndimension() == 3:
            img = img.unsqueeze(0)
        ------------------------------------------------------------------
        # Start of postscript

        def make_hook(key):
            def hook(model, input, output):
                intermediate_output[key] = output.detach()
            return hook

        layer_num = 104 # index of the intermediate layer to extract
        intermediate_output = {}
        # Note: this registers a new hook on every loop iteration; see the
        # note after this snippet for a version that registers it once.
        model.model[layer_num].register_forward_hook(make_hook(layer_num))

        # forward pass (the hook fires during this call)
        model(img)

        # print the feature map shape
        feature_maps = intermediate_output[layer_num]
        print(feature_maps.shape)

        # Visualize the feature maps of the intermediate layer
        plot_ts_feature_maps(feature_maps)

        # End of postscript

        t2 = time_synchronized()
        ------------------------------------------------------------------
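
Note that the postscript above registers a new hook on each pass through the dataset loop, so hooks accumulate on the layer. register_forward_hook returns a handle; a tidier arrangement of the same snippet (a sketch, keeping the rest of detect.py unchanged) is:

# register the hook once, before iterating over the dataset
layer_num = 104
intermediate_output = {}
handle = model.model[layer_num].register_forward_hook(make_hook(layer_num))

for path, img, im0s, vid_cap in dataset:
    # ... preprocessing as in detect.py ...
    model(img)  # the hook fires during this forward pass
    feature_maps = intermediate_output[layer_num]
    plot_ts_feature_maps(feature_maps)

handle.remove()  # detach the hook when you no longer need it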

Also, the following was added to yolov7/utils/plots.py.
Torchshow is a module for visualizing tensors. Here is the official GitHub: https://github.com/xwying/torchshow

-------------------------------------------------------------------------
# Add module
import torchshow as ts
-------------------------------------------------------------------------
# Add plot_ts_feature_maps function at the bottom
def plot_ts_feature_maps(feature_maps):
    import matplotlib
    matplotlib.use('TkAgg')  # interactive backend so the figure window opens
    feature_maps = feature_maps.to(torch.float32)  # cast from FP16 (half-precision inference) for plotting
    ts.show(feature_maps[0])  # visualize the channels of the first image in the batch
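
If you prefer not to depend on torchshow, a minimal matplotlib-only alternative could look like the following sketch (the function name and grid layout are arbitrary choices, not part of the repository):

import math
import matplotlib.pyplot as plt

def plot_feature_map_grid(feature_maps, max_channels=16):
    """Tile the first `max_channels` channels of an (N, C, H, W) tensor."""
    fmap = feature_maps[0].float().cpu()  # first image in the batch
    n = min(max_channels, fmap.shape[0])
    cols = math.ceil(math.sqrt(n))
    rows = math.ceil(n / cols)
    fig, axes = plt.subplots(rows, cols, figsize=(2 * cols, 2 * rows), squeeze=False)
    for i, ax in enumerate(axes.flat):
        ax.axis('off')
        if i < n:
            ax.imshow(fmap[i].numpy(), cmap='viridis')  # one channel per panel
    plt.tight_layout()
    plt.show()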

As a test, to extract the first four feature maps of the second layer, I changed layer_num = 1 in detect.py and the call in plots.py to ts.show(feature_maps[0][:4]), then ran the following command.

python detect.py --weights yolov7.pt --source inference/images/horses.jpg --device 0 --no-trace

The inference results and feature maps were then output as follows.

[image: inference results]
[image: feature map]

Answered By: neg