DeepStream Reference Designs - Project Architecture - High Level Design

From RidgeRun Developer Connection


Previous: Project Architecture Index Next: Reference Designs




After reading this page, you will have a general understanding of the project's main functionality and how it is composed from a design point of view.


The following diagram shows a high-level overview of the system architecture:

Deepstream Reference Designs Architecture

Main Framework

This subsystem encapsulates all the modules that constitute the general framework of the project. Each framework component is independent of any specific technology or application, so these modules can be reused regardless of the context in which the reference design is deployed. This core infrastructure is responsible for driving the application state and logic and can be reused across a wide variety of AI-powered applications; these modules need no modification to implement a new application. The framework includes the following modules:

Camera Capture

This subsystem is responsible for managing the entities in charge of receiving the input data, which can be a live video stream or a video file. Thanks to the flexibility of the design, this module does not depend on the specific camera that captures the information, nor on the transmission protocol used to deliver the data. Regardless of the custom module used, Camera Capture manages the flow of information by interacting with entities called media. Through its interface, each media abstracts any specific implementation and provides basic control operations such as create, start, stop, and delete.
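The media interface described above might look like the following minimal Python sketch. The class and method names are illustrative assumptions, not the project's actual API:

```python
from abc import ABC, abstractmethod


class Media(ABC):
    """Abstracts a single video input (live stream or file) -- illustrative."""

    def __init__(self, uri):
        self.uri = uri
        self.running = False

    @abstractmethod
    def create(self):
        """Allocate the underlying capture resources."""

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def delete(self):
        """Release the underlying capture resources."""
        self.running = False


class FileMedia(Media):
    """Trivial media backed by a local video file (hypothetical example)."""

    def create(self):
        self.created = True


camera = FileMedia("/tmp/sample.mp4")
camera.create()
camera.start()
```

Because Camera Capture only talks to the `Media` interface, it never needs to know whether the frames come from a file, an RTSP stream, or a hardware camera.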

Camera Capture provides customization options when creating media entities. As long as the behavior established by the interface is respected, a media can be based on different multimedia frameworks and libraries, such as GStreamer or OpenCV.

Camera Capture is also responsible for handling errors that may appear during data transmission. If a failure is detected, the module executes the actions necessary to keep the system operating stably.

AI Manager

This module receives the data coming from Camera Capture and executes the AI operations needed to extract useful information from it. As part of its responsibilities, the AI Manager interacts with entities called engines, which abstract specific implementations of AI video analysis used to infer business rules, depending on the current application context. Each engine provides methods to manage the flow of processed information through basic operations such as create, start, stop, and delete.
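The engine lifecycle and the manager that drives it could be sketched as below. All names are assumptions for illustration; a real engine would wrap a DeepStream pipeline rather than a stub:

```python
from abc import ABC, abstractmethod


class Engine(ABC):
    """Abstracts an inference pipeline (e.g. DeepStream-based) -- illustrative."""

    def __init__(self):
        self.active = False

    @abstractmethod
    def create(self):
        """Build the underlying inference pipeline."""

    def start(self):
        self.active = True

    def stop(self):
        self.active = False

    def delete(self):
        self.active = False


class AIManager:
    """Drives the lifecycle of every registered engine."""

    def __init__(self):
        self._engines = {}

    def add_engine(self, name, engine):
        engine.create()
        self._engines[name] = engine

    def start_all(self):
        for engine in self._engines.values():
            engine.start()


class DummyEngine(Engine):
    """Stand-in engine used only to exercise the manager."""

    def create(self):
        self.created = True


manager = AIManager()
manager.add_engine("parking", DummyEngine())
manager.start_all()
```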

The implementation of the engines is based on the DeepStream SDK, which provides a set of tools to perform AI operations during stream analysis. This DeepStream implementation is part of the project's framework, so the module is reusable and independent of any custom application that uses its services.

Although the AI Manager coordinates AI operations, it is not responsible for executing actions based on the analyzed information; the output of this module is sent to the Action Dispatcher component. The AI Manager does, however, monitor the processing flow: the engines deliver their stream results through a component called the Inference Listener, and a component called the Inference Parser knows how to parse that information and deliver it in the correct form to the modules that need it. Both the Inference Listener and the Inference Parser expose well-defined interfaces that abstract their implementations, without depending on any specific technology or protocol.
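The relationship between the two components might be sketched as follows. The callback-based listener and the payload format are assumptions chosen for illustration:

```python
class InferenceListener:
    """Receives raw inference metadata and forwards it to subscribers."""

    def __init__(self):
        self._callbacks = []

    def subscribe(self, callback):
        self._callbacks.append(callback)

    def on_inference(self, raw_metadata):
        # Called whenever the pipeline emits new metadata.
        for callback in self._callbacks:
            callback(raw_metadata)


class InferenceParser:
    """Turns raw metadata into the structure downstream modules expect."""

    def parse(self, raw_metadata):
        # Assumed payload shape for this sketch: "label:confidence".
        label, confidence = raw_metadata.split(":")
        return {"label": label, "confidence": float(confidence)}


results = []
listener = InferenceListener()
parser = InferenceParser()
listener.subscribe(lambda raw: results.append(parser.parse(raw)))
listener.on_inference("car:0.92")
```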

Action Dispatcher

This block is in charge of executing user-defined actions on the media entities, based on the inference information received from the internal engine and the inference model used. It relies on triggers that determine whether the actions should be executed, based on user-provided policies that act as filters. In other words, this block manages the results of the policies evaluated by the triggers and, depending on those results, performs the actions defined by the user.
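A minimal sketch of that control flow, using a stub trigger and hypothetical names, could look like this:

```python
class ActionDispatcher:
    """Evaluates every registered trigger against incoming inference results."""

    def __init__(self, triggers):
        self._triggers = triggers

    def execute(self, inference_info):
        for trigger in self._triggers:
            trigger.evaluate(inference_info)


class Trigger:
    """Minimal stub: one policy (filter) paired with one action."""

    def __init__(self, policy, action):
        self._policy, self._action = policy, action

    def evaluate(self, info):
        if self._policy(info):
            self._action(info)


events = []
dispatcher = ActionDispatcher([
    Trigger(policy=lambda info: info["label"] == "car",
            action=lambda info: events.append(("picture", info["label"]))),
])
dispatcher.execute({"label": "car"})     # policy passes, action runs
dispatcher.execute({"label": "person"})  # policy fails, nothing happens
```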

Trigger

This component is responsible for checking the inference information received. If the information complies with its policies, the trigger executes its actions. Policies, actions, and triggers are set up by the user in the application layer, allowing custom configuration of the data to be processed. A trigger is composed of groups of policies and actions, so each stream source can have a different behavior.

Config Parser

This module is responsible for building the configuration that the application will use by loading the project configuration information. The user is in charge of providing the necessary information, such as policies, actions, triggers, and stream source details, depending on the desired application behavior.
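A file-based variant of this module could be sketched as below; the JSON format and the required key names are assumptions for illustration only:

```python
import json


class ConfigParser:
    """Builds the application configuration from a JSON document."""

    REQUIRED_KEYS = ("streams", "triggers")  # illustrative schema

    def parse(self, text):
        config = json.loads(text)
        missing = [key for key in self.REQUIRED_KEYS if key not in config]
        if missing:
            raise ValueError(f"missing configuration keys: {missing}")
        return config


raw = '{"streams": [{"id": "cam0", "uri": "rtsp://example/stream"}], "triggers": []}'
config = ConfigParser().parse(raw)
```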

Decoupling Interfaces

Interface modules are responsible for establishing the connection between framework components and custom application modules. The existence of these boundaries is important to avoid mixing application-specific business logic with common infrastructure. In addition, this design allows the project to decouple and rearrange components, without affecting its functionality.

Custom Application

As seen in the diagram, custom application blocks represent any type of implementation or specific technology that can be included in the system design. The diagram shows some examples; however, the DeepStream Reference Designs project is not limited to those particular modules. The ability to extend the design and incorporate specific business rules for each application is what gives this project its high degree of flexibility.

The following sections explain which modules can include a custom, user-provided implementation, along with examples of the kinds of technologies that can be used to build these components.

Custom Media

This component represents the entity in charge of transporting the data received from the camera. The goal is for the Camera Capture module to be able to use any media instance, regardless of how the video frames are processed. For example, a media could receive data through the RTSP protocol, which can receive and control both audio and video streams in a synchronized manner and in real time. This type of communication requires establishing a connection between the host server, in charge of transmitting the information, and the media that captures it. In a parking system, for instance, this type of camera is a popular choice due to the ease with which it integrates with CCTV systems.

In addition, if specific cameras with well-defined hardware interfaces are used, the user can add a compatible custom media. Examples include GigE Vision, used for video transmission in high-performance industrial cameras found in applications that require strict monitoring, such as the aerospace industry, and the MIPI Camera Serial Interface (CSI), an interface widely used in embedded systems for communication between digital cameras and target processors running tasks on the edge. As long as the media module interface is respected, many specific implementations can be used to process the video frames without affecting the behavior of the general Camera Capture module.
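As a rough illustration, different custom media could wrap different GStreamer capture pipelines behind the same interface. The launch strings below are simplified sketches; the exact element names, caps, and properties depend on the platform and camera:

```python
# Simplified GStreamer launch strings a custom media might wrap.
# Properties and caps are illustrative; real pipelines need tuning.
PIPELINES = {
    "rtsp": ("rtspsrc location=rtsp://camera-host/stream ! "
             "rtph264depay ! h264parse ! avdec_h264 ! videoconvert ! appsink"),
    "csi": ("nvarguscamerasrc ! "
            "video/x-raw(memory:NVMM),width=1920,height=1080 ! "
            "nvvidconv ! appsink"),
}


def pipeline_for(media_kind):
    """Return the capture pipeline description for the requested media kind."""
    return PIPELINES[media_kind]
```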

Custom Deep Learning Models

It should be mentioned that the inference logic is also encapsulated in a separate, independent module, called the engine, with a well-established interface. This module bases its operation on the DeepStream SDK and allows, within its configuration, the use of different inference models according to the application being developed. So, for instance, in a parking lot system, you could use a cascade of three different networks:

  • A car detector
  • A license plate detector
  • An OCR (optical character recognition) system

This configuration will vary from application to application. A shoplifting detection system will probably implement a person detector along with a behavior analysis model. A speed limit enforcer will likely use a car detector and a tracker. A neuromarketing-powered billboard will use a face detector and a gaze tracker. As you can see, keeping the inference logic in an independent module allows you to highly customize your deep learning pipeline without modifying the rest of the architecture.

Custom Inference Listener

The Inference Listener component is responsible for transmitting the metadata obtained at the output of the neural networks used in the DeepStream pipeline. In a parking lot application, the inference metadata contains information about each detected vehicle and its license plate. If the application is a security system in a shopping center, the inference results contain essential data about the people detected around the areas of interest.

Therefore, this component has the task of transmitting the metadata that is being obtained in real time. To achieve this goal, the user can make use of message brokers, which already have well-defined implementations and interfaces to transmit data through the publisher-subscriber pattern. An example of this type of component is RabbitMQ, which allows communication through a local server and can be integrated with the DeepStream framework using plugins such as Gst-nvmsgbroker. However, since DeepStream is based on the GStreamer framework, the user is free to create a custom inference listener element that can be added to the media pipelines, or to use an existing component.
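The publisher-subscriber flow can be sketched without any external broker; here a stdlib queue stands in for RabbitMQ, and the payloads are invented for illustration:

```python
import queue
import threading


class InProcessBroker:
    """In-process stand-in for a message broker such as RabbitMQ.

    Illustrative only: a real deployment would publish through a broker
    client or a DeepStream element like Gst-nvmsgbroker.
    """

    def __init__(self):
        self._queue = queue.Queue()

    def publish(self, payload):
        self._queue.put(payload)

    def consume(self, handler, count):
        # Block until `count` payloads have been handled.
        for _ in range(count):
            handler(self._queue.get(timeout=1))


received = []
broker = InProcessBroker()
consumer = threading.Thread(target=broker.consume, args=(received.append, 2))
consumer.start()
broker.publish('{"plate": "ABC123"}')
broker.publish('{"plate": "XYZ789"}')
consumer.join()
```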

Custom Inference Parser

As its name indicates, this element is in charge of parsing the information received by the Inference Listener. The metadata may be encoded in a particular format, such as JSON; however, the amount of information, the name of each field, and the format itself may vary depending on the component used to build the payloads from the received metadata. One component that can be used to build payloads is the Gst-nvmsgconv plugin provided by DeepStream, which offers different configurations for the type of payload it builds. Each user is free to build a custom module that parses the received data according to the transmission format used.
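A custom parser for a JSON payload could be as simple as the sketch below. The field names here are invented for illustration; the real schema depends on the payload-building component and its configuration:

```python
import json


def parse_payload(payload):
    """Extract the fields downstream modules need from a JSON payload.

    The schema used here ("object" -> "label"/"confidence") is an
    assumption for this sketch, not the actual Gst-nvmsgconv schema.
    """
    message = json.loads(payload)
    detection = message["object"]
    return {"label": detection["label"], "confidence": detection["confidence"]}


payload = '{"object": {"label": "car", "confidence": 0.87}}'
result = parse_payload(payload)
```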

Custom Policy

Policies represent the business rules that make your application unique. They take the predictions made by the inference process and make informed decisions based on them. The specific implementation of each decision is completely left to the user. The important point is that you can add or remove as many policies as you want without affecting the general behavior of the system.

In the parking lot example, you can implement several policies to process the predictions. For instance, to minimize the probability of an erroneous license plate read, you can implement a low-pass filtering technique that only reports a successful read after N matching sequential predictions. Other examples: a car detected at the entrance, the exit, or any stall is reported only once; a vehicle moving between stalls is flagged as suspicious; and so on.
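The low-pass filter policy mentioned above can be sketched in a few lines (class and method names are illustrative):

```python
class PlateLowPassPolicy:
    """Reports a plate only after N identical consecutive reads."""

    def __init__(self, required_matches):
        self._required = required_matches
        self._last_plate = None
        self._count = 0

    def update(self, plate):
        """Feed one raw read; return the plate once it is stable, else None."""
        if plate == self._last_plate:
            self._count += 1
        else:
            self._last_plate, self._count = plate, 1
        if self._count == self._required:
            return plate
        return None


policy = PlateLowPassPolicy(required_matches=3)
# A noisy OCR read ("ABC128") resets the counter, so only the plate seen
# three times in a row is reported.
reads = ["ABC123", "ABC128", "ABC123", "ABC123", "ABC123"]
reports = [p for p in (policy.update(r) for r in reads) if p]
```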

Take shoplifting detection, for example. The business rules receive a raw prediction containing a list of persons, their locations, and a predicted behavior for each of them (relaxed, nervous, mad, etc.). It would be insufficient and quite inaccurate to mark every person identified as nervous as a shoplifter. Instead, one can create business rules that combine other information to improve the performance of the system, such as "the person is classified as nervous and there are no other persons nearby" or "the person is classified as nervous and is located near items that are popular targets for shoplifters".

As you can see, business rules take predictions as inputs and maintain a running state of the system to perform different tasks accordingly.

Custom Action

This type of module represents the actions that are executed once the business rules established by the policies are activated. As with the previous component, the user has complete freedom over what types of actions to implement and what tools to use for their development.

It is important to mention that different policies will trigger different actions. In a parking lot example, you can have several policy-action relationships like:

  • If a car enters the parking lot, a picture is taken
  • If a car is detected in any zone, the event is logged to a database
  • If a car is detected in any zone, the event is displayed to the user in a dashboard
  • If a car is flagged as suspicious, a video recording is triggered

Most of these can be recycled for other applications. Again, keeping actions decoupled in independent modules behind the action interface allows very custom actions without interfering with the overall architecture.
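Actions that share a common interface can be swapped or combined freely; the sketch below uses in-memory stubs in place of a real database or camera (all names are illustrative):

```python
class Action:
    """Base class for user-defined actions (illustrative interface)."""

    def execute(self, event):
        raise NotImplementedError


class LogToDatabase(Action):
    """Stub that records events in memory instead of a real database."""

    def __init__(self):
        self.rows = []

    def execute(self, event):
        self.rows.append(event)


class TakePicture(Action):
    """Stub that counts snapshots instead of driving a camera."""

    def __init__(self):
        self.snapshots = 0

    def execute(self, event):
        self.snapshots += 1


logger, camera_action = LogToDatabase(), TakePicture()
for action in (logger, camera_action):
    action.execute({"label": "car", "zone": "entrance"})
```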

Custom Config

The Config module represents the way the general configuration of the project is loaded prior to execution. The user can decide whether the configuration is obtained from a file, the command line, or over the network. Each custom implementation must define the parameters it considers necessary, such as URLs, media identifiers, policies, actions, paths, and any other required parameters.
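A hypothetical file-based configuration might look like the fragment below. Every key name and value is an assumption chosen for illustration, not the project's actual schema:

```json
{
  "streams": [
    {"id": "cam0", "uri": "rtsp://camera-host/stream"}
  ],
  "triggers": [
    {
      "stream": "cam0",
      "policies": ["car_detected_at_entrance"],
      "actions": ["take_picture", "log_to_database"]
    }
  ]
}
```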



Previous: Project Architecture Index Next: Reference Designs