Online Training

Can we answer online user queries that involve multimedia events and contain both seen and unseen subscriptions? The answer is probably “Online Training” of models, under the constraints of low response time and high accuracy.

An example of generalized multimedia event processing that handles seen and unseen subscriptions, specifically for object detection, is shown in Figure 1. Since Deep Neural Networks (DNNs) achieve high performance in image recognition, it is desirable to use them to identify image-based events (such as objects) in smart cities. However, minimizing response time while training such neural networks online poses many challenges for multimedia event processing systems. We consider the following two scenarios:

Case 1: Classifier for subscription available

This case covers subscriptions (like car, dog, bus) that are already known to the multimedia event processing system and whose classifiers are already present in the model. Here the response time depends only on the testing time; no training time is incurred.

Case 2: Classifier for subscription not available

This scenario covers subscriptions (like person, truck, traffic_light) that are unknown to the system and for which no classifiers are available. However, based on the similarity of a new subscription to the existing base classifiers, we can further divide this case into:

  1. Subscriptions require classifiers similar to base classifiers: For an unknown subscription such as “truck”, a classifier can be constructed from the existing “bus” classifier using domain-adaptation-based online training.
  2. Subscriptions require classifiers completely different from base classifiers: In this scenario, we assume that no base classifier is similar to the incoming subscription, so the response time must include the cost of training online from scratch.
Figure 1: Generalized Multimedia Event Processing Scenario
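
The case split above can be sketched as a small routing function. This is only an illustrative sketch, not the system's implementation: the names `Route`, `route_subscription`, and `toy_similarity`, the similarity scores, and the threshold value are all assumptions introduced here.

```python
from enum import Enum
from typing import Callable, Iterable, Optional, Tuple


class Route(Enum):
    TEST_ONLY = "case 1: classifier available"
    ADAPT = "case 2.1: adapt from a similar base classifier"
    FROM_SCRATCH = "case 2.2: train online from scratch"


def route_subscription(
    keyword: str,
    base_classifiers: Iterable[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> Tuple[Route, Optional[str]]:
    """Return the route and, where relevant, the base classifier to use."""
    bases = list(base_classifiers)
    if keyword in bases:
        # Seen subscription: only testing time contributes to response time.
        return Route.TEST_ONLY, keyword
    # Unseen subscription: look for the most similar base classifier.
    best = max(bases, key=lambda b: similarity(keyword, b), default=None)
    if best is not None and similarity(keyword, best) >= threshold:
        return Route.ADAPT, best
    return Route.FROM_SCRATCH, None


# Hypothetical similarity scores, for illustration only.
TOY_SCORES = {("truck", "bus"): 0.5, ("truck", "car"): 0.33}


def toy_similarity(a: str, b: str) -> float:
    """Symmetric lookup in the toy score table; unknown pairs score 0."""
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a), 0.0))


if __name__ == "__main__":
    bases = ["car", "dog", "bus"]
    for kw in ("car", "truck", "traffic_light"):
        print(kw, route_subscription(kw, bases, toy_similarity, threshold=0.4))
```

With these toy scores, “car” is routed to testing only, “truck” is adapted from “bus”, and “traffic_light” falls through to training from scratch.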

Case 1: Classifier for subscription available

On arrival of a seen subscription, the proposed system [1, 2] first finds commonalities among subscriptions using keywords (for subscription-covering-based optimization), then identifies classifiers according to the subscribed keyword, applies object detection, and finally notifies the user when an image event matches the subscription (please refer to Multimedia Publish Subscribe for more detail). Base classifiers for such seen subscriptions are trained offline on object detection datasets (Pascal VOC and Microsoft COCO), which contain relatively few classes (20 and 80, respectively) but yield high accuracy.
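
As a rough illustration of the keyword-based step, the sketch below groups subscriptions that share a keyword so that each keyword is matched once per image event, then builds the notifications. It is one simple reading of the idea under stated assumptions, not the covering algorithm of [1, 2]; the function names and the `(user, keyword)` subscription shape are hypothetical, and the detected labels are assumed to come from an object detector that is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def group_by_keyword(subscriptions: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each subscribed keyword to the set of users who asked for it."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for user, keyword in subscriptions:
        groups[keyword].add(user)
    return dict(groups)


def match_event(
    detected_labels: Iterable[str], groups: Dict[str, Set[str]]
) -> List[Tuple[str, str]]:
    """Return sorted (user, keyword) notifications for one image event."""
    notifications = []
    # Deduplicate labels so a user is notified once per keyword per event.
    for label in set(detected_labels):
        for user in groups.get(label, ()):
            notifications.append((user, label))
    return sorted(notifications)


if __name__ == "__main__":
    subs = [("alice", "car"), ("bob", "car"), ("carol", "dog")]
    groups = group_by_keyword(subs)
    # Labels an object detector might return for one image (assumed input).
    print(match_event(["car", "car", "person"], groups))
```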

Case 2: Classifier for subscription not available

To process unseen subscriptions, we build on the previous multimedia event processing model and remove its dependence on the availability of trained classifiers by using the online-training-based adaptive framework [3] shown in Figure 2. The proposed approach determines whether a similar classifier is available to adapt from, i.e. whether domain adaptation is possible, or whether we need to train from scratch. In the first case, we use Transfer Learning (TL) to train a classifier online for the unseen class by adapting either from pre-trained models or from similar classifiers. In the second case, we train a classifier online from scratch in real time, tuning hyperparameters according to strategies categorized by response time.

Figure 2: Adaptive Multimedia Event Processing

(a) Subscriptions require classifiers similar to base classifiers:

A functional model [4] has been designed for domain-adaptive multimedia event processing (shown in Figure 3). It consists of the components discussed below:

Event Matcher analyzes user subscriptions (such as bus, car, dog) and image events. It is responsible for detecting the conditions specified by the user query in image events and for preparing the notifications to be forwarded to users.

Training and Testing Decision Model is designed to analyze the available classifiers and make the testing or training decision accordingly. It evaluates the relationship between existing classifiers and a new/unknown subscription and chooses the transfer learning technique.

Classifier Construction Model performs the training of classifiers for subscribed classes and updates the classifier in the shared resources once the allowed response time has elapsed. Two transfer learning options are used for classifier construction: fine-tuning and freezing layers. In the first approach, we fine-tune a pre-trained model (presently ImageNet-based) using backpropagation with target-domain labels until the validation loss starts to increase. In the second approach, we use a previously trained classifier to instantiate the network of another classifier required for a similar subscription concept. Here we freeze the backbone (convolutional and pooling layers) of the neural network and train only the top fully connected (dense) layers: the frozen backbone is not updated during backpropagation, and only the top layers are retrained. The decision to construct a classifier for “bus” either from pre-trained models (by fine-tuning) or from the “car” classifier (by freezing) is made by comparing subscription relatedness (the path operator of WordNet) against a threshold.
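
The threshold decision can be illustrated with the WordNet path measure, which scores two concepts as 1 / (1 + length of the shortest path between them in the hypernym hierarchy). The sketch below computes that measure on a tiny hand-written taxonomy so that it runs without WordNet itself. The taxonomy, the threshold of 0.3, and the rule that higher relatedness selects freezing are assumptions made for illustration, not values taken from [4].

```python
from typing import Dict, Optional

# Hypothetical, heavily simplified hypernym tree (child -> parent).
# Real WordNet paths differ; this is for illustration only.
PARENT: Dict[str, Optional[str]] = {
    "car": "motor_vehicle",
    "truck": "motor_vehicle",
    "bus": "motor_vehicle",
    "motor_vehicle": "vehicle",
    "bicycle": "vehicle",
    "vehicle": "entity",
    "dog": "animal",
    "animal": "entity",
    "entity": None,
}


def ancestors(node: Optional[str]) -> Dict[str, int]:
    """Map the node and each of its ancestors to their distance from the node."""
    chain: Dict[str, int] = {}
    depth = 0
    while node is not None:
        chain[node] = depth
        node = PARENT.get(node)
        depth += 1
    return chain


def path_similarity(a: str, b: str) -> float:
    """Path measure: 1 / (1 + shortest path length between a and b)."""
    up_a = ancestors(a)
    node: Optional[str] = b
    steps = 0
    while node is not None:
        if node in up_a:  # first common ancestor on the way up from b
            return 1.0 / (1 + up_a[node] + steps)
        node = PARENT.get(node)
        steps += 1
    return 0.0  # no common ancestor in this toy tree


def choose_strategy(new_class: str, base_class: str, threshold: float = 0.3) -> str:
    """Assumed rule: related enough -> freeze from base, else fine-tune."""
    if path_similarity(new_class, base_class) >= threshold:
        return "freeze"      # reuse the base classifier, train top layers only
    return "fine-tune"       # start from the generic pre-trained model


if __name__ == "__main__":
    print(path_similarity("bus", "car"), choose_strategy("bus", "car"))
    print(path_similarity("bus", "dog"), choose_strategy("bus", "dog"))
```

In this toy tree “bus” and “car” are two edges apart (similarity 1/3), so the sketch picks freezing from the “car” classifier, while “bus” and “dog” are five edges apart (similarity 1/6) and fall back to fine-tuning.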

Figure 3: Online Training based Multimedia Event Processing

In the Training Data Construction model, if a subscriber subscribes to a class that is not present in any of the smaller object detection datasets (Pascal VOC and Microsoft COCO), a classifier can be constructed by fetching data from datasets with more classes (ImageNet and OID) using online tools like ImageNet-Utils and OIDv4_ToolKit. Another common approach to online training data construction is to use search engines like “Google Images” or the “Bing Image Search API” to search for class names and download images.

Feature Extraction of Multimedia Events is responsible for detecting objects in image events using current deep-neural-network-based object detection models and for incorporating new classifiers. Here we use image classification models as the backbone network of the object detection models.

Shared Resources component consists of existing image processing modules and training datasets. We use You Only Look Once (YOLO), Single Shot MultiBox Detector (SSD), and RetinaNet (dense object detection based on focal loss) as object detection models. We have some base classifiers trained offline on the established Pascal VOC dataset, which are used to construct further classifiers through domain adaptation.

(b) Subscriptions require classifiers completely different from base classifiers:

Lastly, to support unseen subscriptions that are completely different from seen or familiar classifiers, we proposed an online-training-based model that can provide reasonable accuracy in a short training time even when training from scratch [5]. In this case, we optimized the online training model with a hyperparameter-tuning-based technique that analyzes the accuracy-time trade-off of object detection models and configures the learning rate, batch size, and number of epochs using response-time-based strategies: (1) Minimum Accuracy and Minimum Response Time, (2) Optimal Accuracy and Optimal Response Time, and (3) Maximum Accuracy and Maximum Response Time, to meet such dynamic seen/unseen subscription constraints.
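
One way to picture the three strategies is as named hyperparameter presets selected by the response-time budget. The sketch below is hypothetical throughout: the numeric values are placeholders rather than the tuned settings of [5], and choosing the most accurate preset that fits the budget is an assumed selection rule.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TrainingConfig:
    learning_rate: float
    batch_size: int
    epochs: int


# Placeholder presets for the three strategies (values are NOT from [5]).
STRATEGIES: Dict[str, TrainingConfig] = {
    "min": TrainingConfig(learning_rate=1e-2, batch_size=64, epochs=2),
    "optimal": TrainingConfig(learning_rate=1e-3, batch_size=32, epochs=10),
    "max": TrainingConfig(learning_rate=1e-4, batch_size=16, epochs=50),
}


def pick_strategy(budget_seconds: float, estimated_seconds: Dict[str, float]) -> str:
    """Pick the most accurate preset whose estimated training time fits."""
    for name in ("max", "optimal", "min"):
        if estimated_seconds[name] <= budget_seconds:
            return name
    # Nothing fits: fall back to the fastest preset.
    return "min"


if __name__ == "__main__":
    # Assumed per-strategy training-time estimates, in seconds.
    estimates = {"min": 60.0, "optimal": 600.0, "max": 3600.0}
    name = pick_strategy(900.0, estimates)
    print(name, STRATEGIES[name])
```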


  1. Asra Aslam and Edward Curry. “Towards a generalized approach for deep neural network based event processing for the internet of multimedia things.” IEEE Access 6 (2018): 25573-25587.
  2. Asra Aslam, Souleiman Hasan, and Edward Curry. “Challenges with image event processing: Poster.” Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems. 2017.
  3. Asra Aslam. “Object Detection for Unseen Domains while Reducing Response Time using Knowledge Transfer in Multimedia Event Processing.” Accepted for the Proceedings of the 2020 ACM on International Conference on Multimedia Retrieval (ICMR).
  4. Asra Aslam and Edward Curry. “Reducing Response Time for Multimedia Event Processing using Domain Adaptation.” Accepted for the Proceedings of the 2020 ACM on International Conference on Multimedia Retrieval (ICMR).
  5. Asra Aslam and Edward Curry. “Investigating Response Time and Accuracy in Online Classifier Learning for Multimedia Publish-Subscribe Systems.” Under Review, Springer Multimedia Tools and Applications.