DeepStream Reference Designs - Project Architecture - High Level Design

From RidgeRun Developer Connection
Revision as of 12:09, 17 May 2022 by Felizondo






The following diagram shows a high-level overview of the system architecture:

Deepstream Reference Designs Architecture


Main Framework

This subsystem encapsulates all the modules that constitute the general framework of the project. Each framework component is independent of any specific technology or application, so these modules can be reused regardless of the context in which the reference design is deployed. This core infrastructure is responsible for driving the application state and logic and can be reused in a wide variety of AI-powered applications; its modules need no modification to implement a new application. The framework includes the following modules:

Camera Capture

This subsystem is responsible for managing the entities in charge of receiving the input data, which can be a live video stream or a video file. Due to the flexibility of the design, this module does not depend on the specific camera that captures the information, nor on the transmission protocol used to deliver the data. Regardless of the custom module used, Camera Capture manages the flow of information by interacting with entities called media. Each media abstracts any specific implementation behind its interface and provides basic control operations: create, start, stop, and delete.

Camera Capture provides customization options when creating media entities. As long as the behavior established by the interface is respected, the media can be based on different multimedia frameworks and libraries, such as GStreamer or OpenCV.
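As a sketch of the shape such a media interface might take, the following shows a base class with the create, start, stop, and delete operations described above, plus a toy file-backed implementation. The class and method names (`Media`, `FileMedia`) are illustrative, not the project's actual API:

```python
from abc import ABC, abstractmethod

class Media(ABC):
    """Abstracts one video input, regardless of the underlying framework."""

    @abstractmethod
    def create(self, uri):
        """Allocate resources for the given stream URI."""

    @abstractmethod
    def start(self):
        """Begin delivering frames."""

    @abstractmethod
    def stop(self):
        """Stop delivering frames."""

    @abstractmethod
    def delete(self):
        """Release all resources."""

class FileMedia(Media):
    """Toy implementation backed by a video file path."""

    def __init__(self):
        self.uri = None
        self.running = False

    def create(self, uri):
        self.uri = uri

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def delete(self):
        self.uri = None

media = FileMedia()
media.create("parking_lot.mp4")
media.start()
print(media.running)  # True
```

Because Camera Capture only talks to the `Media` interface, a GStreamer-based or OpenCV-based media can be swapped in without changing the framework code.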

Camera Capture is also responsible for managing the errors that may appear during data transmission. If a failure is detected during the transmission process, the module executes the actions necessary to keep the system operating stably.

AI Manager

This module receives the data coming from Camera Capture and executes the AI operations needed to extract useful information from it. As part of its responsibilities, the AI Manager interacts with entities called engines, which abstract the specific implementations of AI video analysis used to infer business rules, depending on the current context of the application. Each engine provides methods to manage the flow of processed information through basic operations: create, start, stop, and delete.

The implementation of the engines is based on the DeepStream SDK, which provides a set of tools to perform AI operations during streaming analysis. This DeepStream implementation is part of the project's framework, so the module is reusable and independent of any custom applications that need to use its services.

Although the AI Manager performs AI operations, it is not responsible for executing actions based on the analyzed information; its output is sent to the Action Dispatcher component. The AI Manager does, however, monitor the processing flow: the engines report on the streams through a component called the Inference Listener, and a component called the Inference Parser knows how to parse that information so it can be delivered correctly to the modules that need it. Both the Inference Listener and the Inference Parser expose well-defined interfaces that abstract their implementations, without depending on any specific technology or protocol.
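The listener/parser hand-off described above could be sketched as follows. The class names, the raw metadata format (a list of label/confidence pairs), and the callback wiring are all assumptions for illustration, not the project's real interfaces:

```python
class InferenceParser:
    """Converts engine-specific metadata into the application's format."""

    def parse(self, raw_metadata):
        # Hypothetical raw format: a list of (label, confidence) tuples
        return [{"label": label, "confidence": conf}
                for label, conf in raw_metadata]

class InferenceListener:
    """Receives raw inference metadata from an engine and forwards the
    parsed result to whichever module registered for it."""

    def __init__(self, parser, on_inference):
        self.parser = parser
        self.on_inference = on_inference

    def receive(self, raw_metadata):
        self.on_inference(self.parser.parse(raw_metadata))

results = []
listener = InferenceListener(InferenceParser(), results.append)
listener.receive([("car", 0.92), ("plate", 0.87)])
print(results[0][0]["label"])  # car
```

Swapping the parser or the destination callback changes how metadata is interpreted and delivered without touching the engine code, which is the decoupling the section describes.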

Action Dispatcher

This block executes user-defined actions on the media entities, based on the inference information received from the internal engine and on the inference model used. It relies on triggers, which determine whether the actions should be executed according to user-provided policies that act as filters. In other words, this block manages the results of the policies evaluated by the triggers and, depending on those results, performs the actions defined by the user.

Trigger

This component checks the inference information against policies, which act as filters. If the inference information complies with the policies, the trigger executes its actions. Policies, actions, and triggers are configured by the user on the application side, allowing custom configuration of the data to be processed. A trigger is composed of groups of policies and actions, so different stream sources can have different behavior.
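A minimal sketch of the trigger mechanism, assuming policies are predicates over an inference result and actions are callables (the `Trigger` class and the example policies below are hypothetical, not the project's API):

```python
class Trigger:
    """Groups policies (filters) with the actions to run when all pass."""

    def __init__(self, policies, actions):
        self.policies = policies
        self.actions = actions

    def evaluate(self, inference):
        # Actions fire only if every policy accepts the inference result
        if all(policy(inference) for policy in self.policies):
            for action in self.actions:
                action(inference)

# Hypothetical policies: fire only on confident "person" detections
is_person = lambda inf: inf["label"] == "person"
is_confident = lambda inf: inf["confidence"] > 0.8

fired = []
trigger = Trigger(policies=[is_person, is_confident],
                  actions=[lambda inf: fired.append(inf["label"])])

trigger.evaluate({"label": "person", "confidence": 0.95})  # actions run
trigger.evaluate({"label": "person", "confidence": 0.40})  # filtered out
print(fired)  # ['person']
```

Because each trigger carries its own policy/action groups, two stream sources can be wired to different triggers and behave differently, as the section notes.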

Config Parser

This module builds the configuration that the application will use by loading the project configuration information. The user is responsible for providing the necessary information, such as policies, actions, triggers, and source stream information, depending on the desired application behavior.
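To make the idea concrete, here is a sketch of loading and minimally validating such a configuration. The JSON layout and the key names (`streams`, `triggers`) are invented for illustration; the actual project may use a different format and schema:

```python
import json

# Hypothetical configuration contents: stream sources plus trigger
# definitions that reference policies and actions by name.
CONFIG_TEXT = """
{
  "streams": [{"id": "cam0", "uri": "rtsp://example.local/stream"}],
  "triggers": [
    {"policies": ["is_person"], "actions": ["log_event"]}
  ]
}
"""

def load_config(text):
    """Parse the configuration text and check the required sections."""
    config = json.loads(text)
    for key in ("streams", "triggers"):
        if key not in config:
            raise ValueError(f"missing '{key}' section")
    return config

config = load_config(CONFIG_TEXT)
print(config["streams"][0]["id"])  # cam0
```

The Config Parser would then hand the parsed structure to the framework, which builds the corresponding media, trigger, policy, and action objects.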

Decoupling Interfaces

Interface modules are responsible for establishing the connection between framework components and custom application modules. The existence of these boundaries is important to avoid mixing application-specific business logic with common infrastructure. In addition, this design allows the project to decouple and rearrange components, without affecting its functionality.

Custom Application

As seen in the diagram, custom application blocks represent any type of implementation or specific technology that can be included in the system design. The diagram presents some examples, however, the DeepStream Reference Designs project is not limited to those particular modules. The possibility of extending the design and incorporating specific business rules for each application is what gives this project a high degree of flexibility.

The following sections explain which modules can include a custom implementation with user-provided code, along with examples of the kinds of technologies that can be used to build these components.

Custom Media

This component represents the entity in charge of transporting the data received from the camera. The goal is for the Camera Capture module to be able to use any media instance, regardless of how the video frames are processed. For example, a media could receive data through the RTSP protocol, which can receive and control both audio and video streams in a synchronized manner and in real time. This type of communication requires establishing a connection between the host server, in charge of transmitting the information, and the media that captures it. In a parking system, for instance, this type of camera is a popular choice due to its flexibility to integrate with CCTV systems.

In addition, if specific cameras with well-defined hardware interfaces are used, the user can add a compatible custom media. Examples include GigE Vision, used for video transmission in high-performance industrial cameras for applications that require strict monitoring, such as the aerospace industry, and the MIPI Camera Serial Interface (CSI), widely used in embedded systems for communication between digital cameras and target processors to run tasks on the edge. As long as the media module interface is respected, many specific implementations can be used to process the video frames without affecting the behavior of the general Camera Capture module.
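One lightweight way to picture a custom media is as a builder of a GStreamer launch description for its source type. The classes below only construct pipeline strings (launching them would require the GStreamer runtime); the element chains use standard GStreamer and NVIDIA elements, but the class names and exact pipelines are assumptions, not the project's implementation:

```python
class RtspMedia:
    """Hypothetical media for an RTSP network camera."""

    def __init__(self, uri):
        self.uri = uri

    def pipeline_description(self):
        # Depayload and decode an H.264 RTSP stream into raw video
        return (f"rtspsrc location={self.uri} ! rtph264depay ! h264parse ! "
                "avdec_h264 ! videoconvert ! appsink name=sink")

class CsiMedia:
    """Hypothetical media for a MIPI CSI camera on an embedded board."""

    def __init__(self, sensor_id=0):
        self.sensor_id = sensor_id

    def pipeline_description(self):
        # NVIDIA Argus capture source typical of Jetson platforms
        return (f"nvarguscamerasrc sensor-id={self.sensor_id} ! "
                "nvvidconv ! appsink name=sink")

print(RtspMedia("rtsp://camera.local/stream").pipeline_description())
```

Camera Capture would treat both the same way through the media interface; only the pipeline each one builds differs.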

Custom Inference Listener

The inference listener component is responsible for transmitting the inference metadata obtained at the output of the neural networks used in the DeepStream pipeline, after the inference process runs according to the specific targets of the application. In a parking lot application, the inference metadata contains information about each detected vehicle and its license plate. In a security system for a shopping center, the inference results contain essential data about the people detected around the areas of interest.

Therefore, this component has the task of transmitting the metadata as it is obtained in real time. To achieve this, the user can rely on message brokers, which already have well-defined implementations and interfaces to transmit data through the publisher-subscriber pattern. An example of this type of component is RabbitMQ, which allows communication through a local server and provides plugins that can be integrated with the DeepStream framework. However, since DeepStream is based on the GStreamer framework, the user is also free to create a custom inference listener element that can be added to the media pipelines and is capable of obtaining the inferred information.
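The publisher-subscriber pattern mentioned above can be sketched with a minimal in-process bus standing in for a real broker such as RabbitMQ (the `MessageBus` class, topic name, and message shape are illustrative assumptions):

```python
class MessageBus:
    """Minimal in-process publisher-subscriber bus. A real deployment
    would use a broker such as RabbitMQ or an MQTT server instead."""

    def __init__(self):
        self.subscribers = {}

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers.get(topic, []):
            callback(message)

bus = MessageBus()
received = []
bus.subscribe("inference", received.append)

# A listener attached to the DeepStream pipeline would publish here
# each time new metadata is produced.
bus.publish("inference", {"label": "vehicle", "plate": "ABC123"})
print(received[0]["plate"])  # ABC123
```

The custom inference listener plays the publisher role; the Action Dispatcher (or any other interested module) subscribes, so neither side depends on the other's implementation.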

Custom Inference Parser

Custom Policy

Custom Action

Custom Config

For instance, take a parking lot application. In that scenario, you could use RTSP cameras, perform inference over the captured images, and receive the predictions through the MQTT message broker. These notifications trigger the Parking Lot Business Rules, which evaluate the current system state and decide whether the different actions should be executed.

Now imagine you wanted to implement a shoplifting detection system. This time a GigE camera is used; however, no modification is needed in the Camera Capture module, since it is protected by the interface. Predictions are made using a different model and are then sent to the Shoplifting Business Rules. As you can see, there are many possibilities to leverage this reference design as a starting point for your specific project.
