FPGA Image Signal Processor - Examples - Custom Accelerator

Introduction

This wiki provides an example of how to create a custom accelerator by combining modules from the V4L2 FPGA project and the FPGA ISP project in order to create more complex pipelines. Vivado HLS provides a strong framework for creating modules with a data flow programming style and pipelining, which boosts the accelerator throughput. With this framework, adding new elements to the pipeline only increases latency; the maximum frame rate remains that of the slowest element.

You are also highly encouraged to add your own modules to the pipeline, which makes it easier to attach custom peripherals to the FPGA that either receive data or drive actuators. Some examples:

Cameras connected directly to the FPGA on custom hardware in purely industrial applications.
Custom devices with sensors and actuators, where some actions are taken according to the image: color histogram under a threshold, illumination adjustment, production line quality monitoring, and defect detection.
Artificial Intelligence on the edge with low power consumption.

In this wiki, you will find some useful hints to describe (or code) your own accelerator with V4L2 FPGA and FPGA ISP.

Work flow

V4L2 FPGA offers a wrapper to make any accelerator V4L2 compliant, generating a pair of video devices (capture/output) accessible to the user by means of GStreamer or any other application which can use V4L2. The accelerator development workflow is composed of four basic steps:

1. Write a cpp file with the accelerator source code.

2. Write a header file with the declaration of the function defined in the source code.

3. Write another cpp file that connects the accelerator function to the wrapper.

4. Write a Makefile to integrate the custom accelerator into the build system.

There are also some optional steps to create a testbench for debugging the accelerator. The flow is the same as for a normal Vivado HLS project.

A project skeleton is provided in the following lines.

Source code

The source code is mainly composed of the headers of the modules used in the project and a function to be called from the wrapper. You can also have some internal functions, which are declared static since they shouldn't be accessible from outside.

 1 /*
 2  * My custom accelerator
 3  * Author: John Doe
 4  *
 5  * This accelerator is a skeleton
 6  */
 7 
 8 /* Provides the pixel formats in FOURCC format */
 9 #include <core/rr_formats.hpp>
10 /* Provides the data types for streams, RAM ports, counters, custom registers */
11 #include <core/rr_types.hpp>
12 /* Your includes here*/
13 #include "custom_accelerator.hpp"
14 
15 /* Pixels Per Clock - ARGB32 -> 2 */
16 const int kPPC = 2; 
17 
18 void my_custom_function(RR_stream_port& out_stream, RR_stream_port& in_stream,
19                         RR_dim_counter_type width, RR_dim_counter_type height,
20                         RR_format_type in_format) {
21 
22   /* Counters for current row and column */
23   RR_dim_counter_type row;
24   RR_dim_counter_type col;
25 
26   /* Data packages used by the streams */
27   RR_stream_t input, output;
28 
29 my_custom_function_rows_loop:
30   for (row = 0; row < height; row++) {
31 #pragma HLS PIPELINE
32   my_custom_functions_cols_loop:
33     for (col = 0; col < width; col += kPPC) {
34 #pragma HLS LOOP_FLATTEN off
35       /* Query a package from the input stream */
36       input = in_stream.read();
37 
38       /* Copy the flags */
39       output.data = input.data; /* Current data <uint64_t>*/
40       output.last = input.last; /* Last pixel flag <bool> */
41       output.keep = input.keep; /* Valid pixels <uint8_t> */
42 
43       /* Push the package to the output stream */
44       out_stream.write(output);
45 
46       /* Early stop - To control if there is an interruption */
47       if (output.last) {
48         return;
49       }
50     }
51   }
52 }

In this skeleton, the accelerator only handles symmetric data: the output image has the same format as the input. Line 13 includes the header file where the accelerator function is declared. Line 27 declares the packages sent back and forth through the streams. Each package is a structure with the following members:

typedef struct {
  ap_uint<64> data; /* Holds the pixel package */
  ap_uint<1> last;  /* Holds the flag used to indicate the end of stream */
  ap_uint<8> keep;  /* Holds the flags to enable or disable specific bytes */
} RR_stream_t;

You can convert the member data to uint64_t if needed, but it is highly recommended to use the hardware data types.
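
As a plain C++ illustration of that conversion, the sketch below treats the data member as a uint64_t and packs eight GRAY8 pixels into it. The helper names pack_gray8 and extract_gray8 are hypothetical, and placing the first pixel in the least significant byte is an assumption; check the byte order your modules actually use.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

/* A 64-bit package holds eight 8-bit pixels */
const int kPixelsPerWord = 8;

/* Hypothetical helper: packs eight GRAY8 pixels into one 64-bit word.
 * Assumption: pixel 0 is stored in the least significant byte. */
uint64_t pack_gray8(const std::array<uint8_t, 8>& pixels) {
  uint64_t word = 0;
  for (int i = 0; i < kPixelsPerWord; ++i) {
    word |= static_cast<uint64_t>(pixels[i]) << (8 * i);
  }
  return word;
}

/* Hypothetical helper: reads back the pixel stored at the given index */
uint8_t extract_gray8(uint64_t word, int index) {
  assert(index >= 0 && index < kPixelsPerWord);
  return static_cast<uint8_t>((word >> (8 * index)) & 0xFF);
}
```

With this byte order, packing the pixels 1 to 8 gives the word 0x0807060504030201.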

The labels in lines 29 and 32 are intended for debugging purposes, since the tool uses them to report the resources and latency taken by those for loops. The pragmas within the code are for optimization purposes. #pragma HLS PIPELINE splits the loop body into stages so that a new iteration can start before the previous one finishes, which maximizes the throughput. #pragma HLS LOOP_FLATTEN off is optional; it tells the tool that the for loops should not be flattened into a single loop. In this case, flattening is not recommended because the loops are not perfect loops (with a fixed number of iterations).

From lines 35 to 49, the logic of the accelerator takes place: it gets one pixel package from the input stream, performs some operations, and pushes a new package into the output stream. The term pixel package reflects that a package can contain more than one pixel. The data member is 64 bits wide and, in the case of the GRAY8 format, it holds up to 8 pixels in a single transfer.
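
The loop structure can be tried on a desktop compiler before synthesis. The following is a minimal software sketch, not the real wrapper types: Package mirrors RR_stream_t with standard integer types, and StreamModel is a hypothetical stand-in that replaces the hardware stream port with a std::queue.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

/* Software stand-in for RR_stream_t, using standard integer types */
struct Package {
  uint64_t data; /* Pixel package */
  bool last;     /* End-of-stream flag */
  uint8_t keep;  /* Byte-enable flags */
};

/* Hypothetical stand-in for a stream port; the real port is a hardware FIFO */
using StreamModel = std::queue<Package>;

/* Pixels Per Clock - ARGB32 -> 2 */
const int kPPC = 2;

/* Same loop structure as the skeleton, without the HLS pragmas.
 * Returns the number of packages forwarded to the output. */
int pass_through(StreamModel& out_stream, StreamModel& in_stream, int width,
                 int height) {
  int forwarded = 0;
  for (int row = 0; row < height; ++row) {
    for (int col = 0; col < width; col += kPPC) {
      /* A hardware read would block here instead of failing */
      assert(!in_stream.empty());
      Package input = in_stream.front();
      in_stream.pop();

      out_stream.push(input);
      ++forwarded;

      /* Early stop when the end of the stream is flagged */
      if (input.last) {
        return forwarded;
      }
    }
  }
  return forwarded;
}
```

For a 4x2 ARGB32 image, the model forwards four packages, two per row. If the last flag arrives earlier, the function returns immediately and leaves the remaining packages in the input queue.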

Header code

The header file just contains the function declaration:

1 #pragma once
2 
3 #include "core/rr_types.hpp"
4 
5 void my_custom_function(RR_stream_port& out_stream, RR_stream_port& in_stream,
6                         RR_dim_counter_type width, RR_dim_counter_type height,
7                         RR_format_type in_format);

The first line is a header guard.

Wrapper code

The wrapper is composed of basically three components:

1. Input image format capability

2. Output image format capability

3. The wrapper module

The first two components give V4L2 information about the device capabilities, such as supported formats and image dimensions. The third one works as the top module of the design, which calls the custom function described above.

  1 /* Include the wrapper files */
  2 #include "core/rr_formats.hpp"
  3 #include "core/rr_types.hpp"
  4 #include "core/rr_wrapper.hpp"
  5 
  6 /* Include the custom accelerator*/
  7 #include "custom_accelerator.hpp"
  8 
  9 /* Accelerator-specific defaults */
 10 const RR_format_type kDefaultOutputFormat = RR_RGB_ARGB32_FMT;
 11 const RR_format_type kDefaultInputFormat = RR_RGB_ARGB32_FMT;
 12 /**
 13  * Set the accelerator input image properties
 14  * This sets the properties of the input image and capabilities of the
 15  * accelerator. The input image should comply with the properties
 16  * established here.
 17  */
 18 RR_SET_INPUT_PROPERTIES {
 19   /* Set the dimension limits. By default: min: 8x8, max: 4096x2160 */
 20   SET_WIDTH_MIN(RR_MIN_WIDTH_RESOLUTION);
 21   SET_WIDTH_MAX(RR_MAX_WIDTH_RESOLUTION);
 22 
 23   SET_HEIGHT_MIN(RR_MIN_HEIGHT_RESOLUTION);
 24   SET_HEIGHT_MAX(RR_MAX_HEIGHT_RESOLUTION);
 25 
 26   /* Set the number of formats, which will be specified below */
 27   SET_NUM_FORMATS(1);
 28 
 29   /* Set the supported formats. In this case, RGB32 */
 30   ADD_SUPPORTED_FORMAT(0, kDefaultOutputFormat);
 31 }
 32 
 33 /**
 34  * Set the accelerator output image properties
 35  * This sets the properties of the output image and capabilities of the
 36  * accelerator. The output image will comply with the properties
 37  * established here. The accelerator will not send an image with an unsupported
 38  * property.
 39  */
 40 RR_SET_OUTPUT_PROPERTIES {
 41   /* Set the dimension limits. By default: min: 8x8, max:4096x2160 */
 42   SET_WIDTH_MIN(RR_MIN_WIDTH_RESOLUTION);
 43   SET_WIDTH_MAX(RR_MAX_WIDTH_RESOLUTION);
 44 
 45   SET_HEIGHT_MIN(RR_MIN_HEIGHT_RESOLUTION);
 46   SET_HEIGHT_MAX(RR_MAX_HEIGHT_RESOLUTION);
 47 
 48   /* Set the number of formats, which will be specified below */
 49   SET_NUM_FORMATS(1);
 50 
 51   /* Set the supported formats. The output will be a RGB32 */
 52   ADD_SUPPORTED_FORMAT(0, kDefaultOutputFormat);
 53 }
 54 
 55 /**
 56  * Main module - wrapper
 57  */
 58 RR_MODULE(out_stream, in_stream) {
 59   /*
 60    * Read the properties
 61    * These properties need to be validated afterwards
 62    */
 63   RR_dim_counter_type width = GET_INPUT_WIDTH;
 64   RR_dim_counter_type height = GET_INPUT_HEIGHT;
 65   RR_format_type format_in = GET_INPUT_FORMAT;
 66   RR_format_type format_out = GET_OUTPUT_FORMAT;
 67 
 68   /*
 69    * Validate and set defaults
 70    * In case of an invalid value, override it and set a default value.
 71    */
 72   if (width < RR_MIN_WIDTH_RESOLUTION || width > RR_MAX_WIDTH_RESOLUTION) {
 73     width = RR_DEFAULT_WIDTH_RESOLUTION;
 74     SET_INPUT_WIDTH(RR_DEFAULT_WIDTH_RESOLUTION);
 75   }
 76   if (height < RR_MIN_HEIGHT_RESOLUTION || height > RR_MAX_HEIGHT_RESOLUTION) {
 77     height = RR_DEFAULT_HEIGHT_RESOLUTION;
 78     SET_INPUT_HEIGHT(RR_DEFAULT_HEIGHT_RESOLUTION);
 79   }
 80   if (format_in != kDefaultInputFormat) {
 81     format_in = kDefaultInputFormat;
 82     SET_INPUT_FORMAT(kDefaultInputFormat);
 83   }
 84   if (format_out != kDefaultOutputFormat) {
 85     format_out = kDefaultOutputFormat;
 86     SET_OUTPUT_FORMAT(kDefaultOutputFormat);
 87   }
 88 
 89   /* Mirror properties to the output. The image preserves its dimensions */
 90   SET_OUTPUT_WIDTH(width);
 91   SET_OUTPUT_HEIGHT(height);
 92 
 93   /* The accelerator type is a filter, since we have input/output streams */
 94   SET_ACCELERATOR_TYPE(RR_ACCELERATOR_FILTER);
 95 
 96   /*
 97    * For register set operation only
 98    * This avoids the accelerator to get stuck during registers initialization
 99    */
100   if (in_stream.empty()) {
101     return;
102   }
103 
104   /* Call your custom accelerator, which will be composed by a daisy chain */
105   my_custom_function(out_stream, in_stream, width, height, format_in);
106 }

The setup sections in lines 18-31 and 40-53 are likely to be the same in many accelerators that support only one format. If more formats are needed, specify the number of supported formats with SET_NUM_FORMATS and list them with ADD_SUPPORTED_FORMAT(INDEX, FORMAT_FOURCC). The index starts from 0.

The module macro RR_MODULE(out_stream, in_stream) is perhaps the most important part for guaranteeing compatibility. It contains several macros, which can be found in core/rr_wrapper.hpp, to query information from the accelerator registers, such as the current height, width, and format. All the validation of the properties shall be done here. The values are queried in lines 63-66. You can declare the variables as auto to avoid handling the types. Lines 72-91 verify the values and overwrite them when needed. To overwrite a value, use the SET_INPUT_X and SET_OUTPUT_X macros defined in the wrapper header.

Line 94 specifies the kind of accelerator. It can be a filter, a frame grabber, or a sink, depending on the application. It indicates whether the input/output is valid or not. Lines 100-102 guard against executing the accelerator without valid data, which is usually the case when setting the accelerator up, and prevent it from getting stuck waiting for a package.

Finally, line 105 invokes the function described by the user.
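
The validate-and-default pattern of lines 72-79 can be sketched in plain C++ as follows. The limits come from the comments in the wrapper (min 8x8, max 4096x2160); the default values and the function name validate_or_default are hypothetical, since the real defaults are provided by the framework.

```cpp
#include <cassert>

/* Limits taken from the wrapper comments: min 8x8, max 4096x2160 */
const int kMinWidth = 8;
const int kMaxWidth = 4096;
const int kMinHeight = 8;
const int kMaxHeight = 2160;

/* Hypothetical defaults, for illustration only */
const int kDefaultWidth = 1920;
const int kDefaultHeight = 1080;

/* Returns value when it lies within [min, max]; otherwise returns the
 * fallback, mirroring the override performed in the wrapper. */
int validate_or_default(int value, int min, int max, int fallback) {
  assert(min <= fallback && fallback <= max);
  return (value < min || value > max) ? fallback : value;
}
```

For example, validate_or_default(5000, kMinWidth, kMaxWidth, kDefaultWidth) falls back to the default width, while a width of 640 is returned unchanged.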

Makefile

The most generic Makefile, placed under the default fpga-isp location, is:

1 # Add your sources below
2 ACCELERATOR_SRCS = $(wildcard *.[ch]pp)
3 
4 # Define your project settings below
5 COMPONENT_NAME = rr_custom_isp_example
6 PROJECT_NAME = Custom ISP example (RidgeRun FPGA ISP)
7 
8 include ../../../../../Rules.mk
9 include ../../core/Xilinx.mk

You may also need to include modules that are not templated. To do so, append their sources with ACCELERATOR_SRCS += $(wildcard ../my_module/*.[ch]pp).

The final step is to run make in the accelerator's location to synthesize it.

Importing other modules

Importing a module is quite similar to importing a function in C++: add the header and use the function. Also, if it is not a template function, its source code must be added for synthesis, as is normally done for software compilation. There is one more trick, though: using a pragma to establish a data flow. In summary, it takes three steps:

1. Add the header file and use the function

2. Add the source code to the build system

3. Implement a data flow

Adding the header

It is like including any header in C++. You can see the list of available modules in Modules. By default, the build system uses the platform directory (src/hdl/$PLATFORM/) as the include directory. This means that the V4L2 FPGA files are available as:

core
convolution
demo

On the other hand, the modules which belong to FPGA ISP will be accessible through the fpga-isp folder:

fpga-isp/convolution
fpga-isp/fft
fpga-isp/utils
fpga-isp/gtu
etc

For example, to include the FFT:

 1 #include <complex>
 2 #include <cstdint>
 3 
 4 /* Include header */
 5 #include <fpga-isp/fft/fft_1d.hpp>
 6 
 7 /* Inside of the top function */
 8 const int FFT_SIZE = 64;
 9 const int MODE = 1; /* For forward FFT */
10 rr::isp::fft_1d_complex<FFT_SIZE, float>(in, out, MODE);

For more information about the FFT, please visit FPGA_Image_Signal_Processor/Modules/Fast_Fourier_Transform_1D.

Add the source code to the build system

This is done by editing the Makefile:

 1 # Add your sources below
 2 ACCELERATOR_SRCS = $(wildcard *.[ch]pp)
 3 # Add module sources
 4 ACCELERATOR_SRCS += $(wildcard ../gtu/*.[ch]pp)
 5 
 6 
 7 # Define your project settings below
 8 COMPONENT_NAME = rr_custom_isp_example
 9 PROJECT_NAME = Custom ISP example (RidgeRun FPGA ISP)
10 
11 include ../../../../../Rules.mk
12 include ../../core/Xilinx.mk

Line 4 tells the build system to add the GTU files, provided the custom accelerator follows the common structure: src/hdl/$PLATFORM/fpga-isp/my_accelerator/. Note that most header and source files end in .hpp and .cpp. The extensions .hh and .cc are reserved for testbench code.

Implement the data flow

Executing the accelerator as a data flow is the best, and the only enabled, mode for streams. To use it, declare the stream ports with #pragma HLS STREAM and place a data flow region within the top function of your custom accelerator with #pragma HLS DATAFLOW. The data flow only applies when the daisy chain has more than one module.

To declare a stream variable:

1 static RR_stream_port my_stream;
2 #pragma HLS STREAM variable = my_stream dim = 1 depth = 1

The static keyword allows the FIFO to persist in the RTL co-simulation. Also, it will limit the execution of the code. Be careful with streams or multiple modules, since the simulation is done in pure C++.

To create the dataflow:

 1 #pragma HLS DATAFLOW
 2   /* In stream is the input stream port */
 3   debayer(my_stream_1, in_stream, width, height, in_format);
 4   /* Split the channels for convolution on each channel */
 5   split<ARGB32>(splitted_a, splitted_r, splitted_g, splitted_b, my_stream_1,
 6                 width, height);
 7   /* Apply blur filter. A per-channel convolution is needed on color images */
 8   convolution<kKernelDiameter, kKernelRadius>(convoluted_a, splitted_a,
 9                                               kGaussKernel, width, height);
10   convolution<kKernelDiameter, kKernelRadius>(convoluted_r, splitted_r,
11                                               kGaussKernel, width, height);
12   convolution<kKernelDiameter, kKernelRadius>(convoluted_g, splitted_g,
13                                               kGaussKernel, width, height);
14   convolution<kKernelDiameter, kKernelRadius>(convoluted_b, splitted_b,
15                                               kGaussKernel, width, height);
16   /* Join channels back */
17   merge<ARGB32>(out_stream, convoluted_a, convoluted_r, convoluted_g,
18                 convoluted_b, width, height);

Notice that the stream ports are generally the first arguments of each module, with the outputs usually listed first. Also, the output of one module is the input of the following one. For more information, we encourage you to visit HLS Dataflow.
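
The chaining convention can be modeled in software as shown below. This is only a sketch: StreamModel is a hypothetical std::queue stand-in for a stream port, and stage_double and stage_increment are made-up stages, not FPGA ISP modules. Each stage takes its output first and its input second, and the output of the first stage is the input of the second.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

/* Hypothetical stand-in for a stream port */
using StreamModel = std::queue<uint64_t>;

/* Made-up stage: doubles every value. Output first, input second. */
void stage_double(StreamModel& out_stream, StreamModel& in_stream) {
  while (!in_stream.empty()) {
    out_stream.push(in_stream.front() * 2);
    in_stream.pop();
  }
}

/* Made-up stage: adds one to every value */
void stage_increment(StreamModel& out_stream, StreamModel& in_stream) {
  while (!in_stream.empty()) {
    out_stream.push(in_stream.front() + 1);
    in_stream.pop();
  }
}

/* Daisy chain: the intermediate stream connects both stages */
void daisy_chain(StreamModel& out_stream, StreamModel& in_stream) {
  StreamModel intermediate;
  stage_double(intermediate, in_stream);
  stage_increment(out_stream, intermediate);
  /* Everything produced by the first stage was consumed by the second */
  assert(intermediate.empty());
}
```

In this pure C++ model the stages run one after another, so the intermediate queue temporarily holds every value. In hardware, the data flow region lets the stages overlap, which is why a small FIFO depth can be enough there.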

Example

You can combine the modules listed above to create a high-throughput ISP pipeline, performing operations on the whole image or by single channels. Let’s go for an example.

The Daisy Chain example accelerator is composed of the following modules:

Figure 1. Daisy Chain example diagram

The whole accelerator can be coded as follows:

File: custom_isp.cpp: Source code

#include "custom_isp.hpp"

#include "convolution/convolution.hpp" /* Provides lightweight convolution<>() */
#include "core/rr_types.hpp"
#include "fpga-isp/awb/awb.hpp"                  /* Provides awb() */
#include "fpga-isp/debayer/debayer.hpp"          /* Provides debayer() */
#include "fpga-isp/utils/channel_operations.hpp" /* Provides merge() and split() */
#include "fpga-isp/utils/formats.hpp"            /* Provides ARGB32 */

/**
 * @brief Daisy chaining FPGA ISP modules example
 * @details This is an example about how to interconnect the FPGA ISP modules
 * to get a more complex and powerful ISP pipeline.
 *
 * It is important to recall that a module is the function which is imported by
 * a RR_MODULE instance. The RR_MODULE as a whole produces an accelerator.
 * An accelerator, therefore, can be a group of modules.
*/
void custom_isp_example(RR_stream_port& out_stream, RR_stream_port& in_stream,
                        RR_dim_counter_type width, RR_dim_counter_type height,
                        RR_format_type in_format) {
  using namespace rr::isp;     /* Provides debayer, awb */
  using namespace rr::formats; /* Provides ARGB32 */
  using namespace rr::utils;   /* Provides split, merge */

  /* These constants are required by this module */
  const int kKernelDiameter = 3;
  const int kKernelRadius = 1;

  /*
   * Declare all the streams needed by this module. The dimensionality is one
   * dimension given we are transmitting the pixels in bunches of 64 bits
   * (defined by the bus width). The depth is set as the number of registers of
   * the FIFO. For this case, we only need to store one element. It works
   * similar to a queue in GStreamer.
  */
  static RR_stream_port demosaiced;
#pragma HLS STREAM variable = demosaiced dim = 1 depth = 1
  static RR_stream_port splitted_a, splitted_r, splitted_g, splitted_b;
#pragma HLS STREAM variable = splitted_a dim = 1 depth = 1
#pragma HLS STREAM variable = splitted_r dim = 1 depth = 1
#pragma HLS STREAM variable = splitted_g dim = 1 depth = 1
#pragma HLS STREAM variable = splitted_b dim = 1 depth = 1
  static RR_stream_port convoluted_a, convoluted_r, convoluted_g, convoluted_b;
#pragma HLS STREAM variable = convoluted_a dim = 1 depth = 1
#pragma HLS STREAM variable = convoluted_r dim = 1 depth = 1
#pragma HLS STREAM variable = convoluted_g dim = 1 depth = 1
#pragma HLS STREAM variable = convoluted_b dim = 1 depth = 1
  static RR_stream_port argb_joined;
#pragma HLS STREAM variable = argb_joined dim = 1 depth = 1

  /* Gaussian Blur kernel. This kernel is used by the convolution stage */
  kernel_type kGaussKernel[kKernelDiameter][kKernelDiameter] = {
      {1024, 2048, 1024}, {2048, 4096, 2048}, {1024, 2048, 1024}};
#pragma HLS ARRAY_PARTITION variable = kGaussKernel complete dim = 0

#pragma HLS DATAFLOW
  /* Convert from BAYER to ARGB */
  debayer(demosaiced, in_stream, width, height, in_format);
  /* Split the channels for convolution on each channel */
  split<ARGB32>(splitted_a, splitted_r, splitted_g, splitted_b, demosaiced,
                width, height);
  /* Apply blur filter. A per-channel convolution is needed on color images */
  convolution<kKernelDiameter, kKernelRadius>(convoluted_a, splitted_a,
                                              kGaussKernel, width, height);
  convolution<kKernelDiameter, kKernelRadius>(convoluted_r, splitted_r,
                                              kGaussKernel, width, height);
  convolution<kKernelDiameter, kKernelRadius>(convoluted_g, splitted_g,
                                              kGaussKernel, width, height);
  convolution<kKernelDiameter, kKernelRadius>(convoluted_b, splitted_b,
                                              kGaussKernel, width, height);
  /* Join channels back */
  merge<ARGB32>(argb_joined, convoluted_a, convoluted_r, convoluted_g,
                convoluted_b, width, height);
  /* Auto White Balancing */
  awb(out_stream, argb_joined, width, height);

  /* That's all */
}

File custom_isp.hpp: Header file

#pragma once

#include "core/rr_types.hpp"

void custom_isp_example(RR_stream_port& out_stream, RR_stream_port& in_stream,
                        RR_dim_counter_type width, RR_dim_counter_type height,
                        RR_format_type in_format);

File custom_isp_core.cpp: Wrapper file

#include "core/rr_formats.hpp"
#include "core/rr_wrapper.hpp"

#include "custom_isp.hpp"

/* Accelerator-specific defaults */
const RR_format_type kDefaultOutputFormat = RR_RGB_ARGB32_FMT;
const RR_format_type kDefaultInputFormat = RR_BAYER_SRGGB8_FMT;
/* Other parameters such as width/height are defaulted by the wrapper */

/**
 * Set the accelerator input image properties
 * This sets the properties of the input image and capabilities of the
 * accelerator. The input image should comply with the properties
 * established here.
 */
RR_SET_INPUT_PROPERTIES {
  /* Set the dimension limits. By default: min: 8x8, max:4096x2160 */
  SET_WIDTH_MIN(RR_MIN_WIDTH_RESOLUTION);
  SET_WIDTH_MAX(RR_MAX_WIDTH_RESOLUTION);

  SET_HEIGHT_MIN(RR_MIN_HEIGHT_RESOLUTION);
  SET_HEIGHT_MAX(RR_MAX_HEIGHT_RESOLUTION);

  /* Set the number of formats, which will be specified below */
  SET_NUM_FORMATS(4);

  /* Set the supported formats. In this case, all the Bayer 8 */
  ADD_SUPPORTED_FORMAT(0, RR_BAYER_SRGGB8_FMT);
  ADD_SUPPORTED_FORMAT(1, RR_BAYER_SBGGR8_FMT);
  ADD_SUPPORTED_FORMAT(2, RR_BAYER_SGBRG8_FMT);
  ADD_SUPPORTED_FORMAT(3, RR_BAYER_SGRBG8_FMT);
}

/**
 * Set the accelerator output image properties
 * This sets the properties of the output image and capabilities of the
 * accelerator. The output image will comply with the properties
 * established here. The accelerator will not send an image with an unsupported
 * property.
 */
RR_SET_OUTPUT_PROPERTIES {
  /* Set the dimension limits. By default: min: 8x8, max:4096x2160 */
  SET_WIDTH_MIN(RR_MIN_WIDTH_RESOLUTION);
  SET_WIDTH_MAX(RR_MAX_WIDTH_RESOLUTION);

  SET_HEIGHT_MIN(RR_MIN_HEIGHT_RESOLUTION);
  SET_HEIGHT_MAX(RR_MAX_HEIGHT_RESOLUTION);

  /* Set the number of formats, which will be specified below */
  SET_NUM_FORMATS(1);
  /* Set the supported formats. The output will be a debayered image */
  ADD_SUPPORTED_FORMAT(0, kDefaultOutputFormat);
}

/**
 * Main module - wrapper
 */
RR_MODULE(out_stream, in_stream) {
  /*
   * Read the properties
   * These properties need to be validated afterwards
   */
  RR_dim_counter_type width = GET_INPUT_WIDTH;
  RR_dim_counter_type height = GET_INPUT_HEIGHT;
  RR_format_type format_in = GET_INPUT_FORMAT;
  RR_format_type format_out = GET_OUTPUT_FORMAT;

  /*
   * Validate and set defaults
   * In case of an invalid value, override it and set a default value.
   *
   * Use the macros with const values for avoiding WAR conditions and
   * unexpected behaviours.
   */
  if (width < RR_MIN_WIDTH_RESOLUTION || width > RR_MAX_WIDTH_RESOLUTION) {
    width = RR_DEFAULT_WIDTH_RESOLUTION;
    SET_INPUT_WIDTH(RR_DEFAULT_WIDTH_RESOLUTION);
  }
  if (height < RR_MIN_HEIGHT_RESOLUTION || height > RR_MAX_HEIGHT_RESOLUTION) {
    height = RR_DEFAULT_HEIGHT_RESOLUTION;
    SET_INPUT_HEIGHT(RR_DEFAULT_HEIGHT_RESOLUTION);
  }
  if (format_in != RR_BAYER_SBGGR8_FMT && format_in != RR_BAYER_SRGGB8_FMT &&
      format_in != RR_BAYER_SGRBG8_FMT && format_in != RR_BAYER_SGBRG8_FMT) {
    format_in = kDefaultInputFormat;
    SET_INPUT_FORMAT(kDefaultInputFormat);
  }
  if (format_out != kDefaultOutputFormat) {
    format_out = kDefaultOutputFormat;
    SET_OUTPUT_FORMAT(kDefaultOutputFormat);
  }

  /* Mirror properties to the output. The image preserves its dimensionality */
  SET_OUTPUT_WIDTH(width);
  SET_OUTPUT_HEIGHT(height);

  /* The accelerator type is a filter, since we have input/output streams */
  SET_ACCELERATOR_TYPE(RR_ACCELERATOR_FILTER);

  /*
   * For register set operation only
   * This avoids the accelerator to get stuck during registers initialisation
   */
  if (in_stream.empty()) {
    return;
  }

  /* Call your custom accelerator, which will be composed by a daisy chain */
  custom_isp_example(out_stream, in_stream, width, height, format_in);
}

Makefile: Build system

# Add your sources below
ACCELERATOR_SRCS = $(wildcard *.[ch]pp)
ACCELERATOR_SRCS += $(wildcard ../awb/awb.cpp)
ACCELERATOR_SRCS += $(wildcard ../debayer/debayer.cpp)

# Define your project settings below
COMPONENT_NAME = rr_custom_isp_example
PROJECT_NAME = Custom ISP example (RidgeRun FPGA ISP)

include ../../../../../Rules.mk
include ../../core/Xilinx.mk

Disclaimer: This code is only for illustration purposes and it is not guaranteed to work. You can find the complete version of this example when purchasing the product. It will include the custom_isp_example accelerator with this example.
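
The coefficients of kGaussKernel in custom_isp.cpp add up to 16384, which is 2^14. A plausible reading, and only an assumption about how the convolution module normalizes its result, is that the kernel is stored in fixed point with 14 fractional bits. The sketch below applies the kernel to a single 3x3 window under that assumption; the Window type and the helper functions are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

const int kKernelDiameter = 3;
/* Assumption: 1.0 is represented as 2^14 in the kernel coefficients */
const int kFractionBits = 14;

/* Same coefficients as kGaussKernel in the example */
const int32_t kGaussKernel[kKernelDiameter][kKernelDiameter] = {
    {1024, 2048, 1024}, {2048, 4096, 2048}, {1024, 2048, 1024}};

/* Hypothetical type for a 3x3 neighborhood of 8-bit pixels */
using Window = std::array<std::array<uint8_t, 3>, 3>;

/* Adds up all the kernel coefficients */
int32_t kernel_sum() {
  int32_t sum = 0;
  for (int r = 0; r < kKernelDiameter; ++r) {
    for (int c = 0; c < kKernelDiameter; ++c) {
      sum += kGaussKernel[r][c];
    }
  }
  return sum;
}

/* Weighted sum of the window, scaled back by the fractional bits */
uint8_t convolve_window(const Window& window) {
  int32_t acc = 0;
  for (int r = 0; r < kKernelDiameter; ++r) {
    for (int c = 0; c < kKernelDiameter; ++c) {
      acc += kGaussKernel[r][c] * window[r][c];
    }
  }
  const int32_t result = acc >> kFractionBits;
  assert(result >= 0 && result <= 255);
  return static_cast<uint8_t>(result);
}

/* Builds a window where every pixel has the same value */
Window flat_window(uint8_t value) {
  Window window;
  for (auto& row : window) {
    row.fill(value);
  }
  return window;
}
```

Under this assumption, a flat region keeps its value after the blur (a window filled with 100 yields 100), which is the behavior expected from a normalized Gaussian kernel.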

About the throughput

Throughput is a good way to demonstrate the power of FPGA ISP. Thanks to #pragma HLS DATAFLOW, all the modules can execute concurrently. This means that, as soon as one module has computed a pixel, the following module can start operating on that pixel. This execution can also be seen as a pipelining approach.

Figure 2 - Concurrent execution model

Figure 2 shows how each stage of the custom accelerator executes over time (clock cycles are drawn for illustrative purposes). The throughput of the result is determined by the slowest stage of the pipeline. As a real reference, the slowest part is the debayer process, which gives up to 30 frames per second at 1080p. To increase the throughput of the accelerator, the slowest stage has to be optimized. Adding new stages to the pipeline won’t affect the throughput as long as they are not slower than the current slowest stage; they only add latency. Thus, the debayer determines the throughput of the whole accelerator, which in the end also runs at 30 frames per second at 1080p.
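
This relationship can be written down as a small model: the frame rate of the chain is the minimum over its stages, while the latency is the sum. The Stage structure and all the numbers used with it are hypothetical; only the 30 frames per second of the debayer comes from the text above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

/* Hypothetical description of one pipeline stage */
struct Stage {
  double max_fps;      /* Frame rate the stage sustains on its own */
  long latency_cycles; /* Cycles before its first output is available */
};

/* The slowest stage bounds the frame rate of the whole chain */
double pipeline_fps(const std::vector<Stage>& stages) {
  assert(!stages.empty());
  double fps = stages.front().max_fps;
  for (const Stage& stage : stages) {
    fps = std::min(fps, stage.max_fps);
  }
  return fps;
}

/* Every stage adds its own latency to the chain */
long pipeline_latency(const std::vector<Stage>& stages) {
  long total = 0;
  for (const Stage& stage : stages) {
    total += stage.latency_cycles;
  }
  return total;
}
```

With a 30 fps debayer followed by faster stages, pipeline_fps stays at 30 no matter how many faster stages are appended, whereas pipeline_latency grows with each one.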

Another feature to highlight is the pixel-package model, where each module encapsulates pixels in 64-bit transfers. For 8-bit formats, 8 pixels are transferred per clock cycle, whereas for ARGB32 the transfer rate is 2 pixels per clock cycle.
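
The arithmetic behind these figures is sketched below. The function names are hypothetical, and the model is idealized: it assumes exactly one 64-bit transfer per clock cycle and ignores stalls or any per-line overhead.

```cpp
#include <cassert>
#include <cstdint>

/* Width of one transfer, as described in the text */
const int kBusWidthBits = 64;

/* Number of pixels that fit in one 64-bit transfer */
int pixels_per_transfer(int bits_per_pixel) {
  assert(bits_per_pixel > 0 && kBusWidthBits % bits_per_pixel == 0);
  return kBusWidthBits / bits_per_pixel;
}

/* Transfers needed to move one frame, rounding up partial packages */
int64_t transfers_per_frame(int width, int height, int bits_per_pixel) {
  const int ppt = pixels_per_transfer(bits_per_pixel);
  const int64_t pixels = static_cast<int64_t>(width) * height;
  return (pixels + ppt - 1) / ppt;
}

/* Transfers per second needed to sustain a given frame rate */
int64_t transfers_per_second(int width, int height, int bits_per_pixel,
                             int fps) {
  return transfers_per_frame(width, height, bits_per_pixel) * fps;
}
```

For a 1920x1080 ARGB32 frame this gives 1,036,800 transfers per frame, or 31,104,000 transfers per second at 30 frames per second. Under the one-transfer-per-cycle idealization, that is a lower bound on the clock rate the stream needs.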
