A summary of the common downstream tasks used to evaluate video representation learning for long, instructional videos.

What is Representation Learning?

Representation learning is an area of research that focuses on how to learn compact, numerical representations for different sources of signal, most often video, text, audio, and images. The goal of this research is to use these representations for other tasks, such as querying for information. A well-known example is video search on YouTube: a user provides text keywords, and YouTube returns the set of videos most similar to those words.
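To make the querying idea concrete, here is a minimal sketch of retrieval over learned representations. It assumes the text query and the videos have already been embedded into the same vector space; the three-dimensional embeddings, the video names, and the helper `rank_videos` are hypothetical and only serve the illustration. Cosine similarity is one common choice of similarity measure, not necessarily what any particular search engine uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_videos(query_embedding, video_embeddings):
    """Return video names sorted from most to least similar to the query."""
    scores = {
        name: cosine_similarity(query_embedding, embedding)
        for name, embedding in video_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical embeddings; a real model would produce hundreds of dimensions.
video_embeddings = {
    "cooking_tutorial": [0.9, 0.1, 0.0],
    "car_repair": [0.1, 0.9, 0.2],
    "guitar_lesson": [0.0, 0.2, 0.9],
}
query_embedding = [0.8, 0.2, 0.1]  # stands in for an embedded text query

ranking = rank_videos(query_embedding, video_embeddings)
print(ranking)
```

The better the learned representation, the more often the top-ranked items are the ones a person would consider relevant, which is why retrieval is a natural way to evaluate it.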

In computer vision literature, representations are learned by training a deep learning model to embed (or transform) raw input to a…

This story is a summary of the information you need to understand Covid-19 and how it affects you. It is accumulated from multiple sources as a means of fact checking.


  • Wear a mask, anything is better than nothing
  • Immunity is not necessarily guaranteed
  • Rapid Tests indicate most contagious
  • Be wary of long-term effects

Prevention and Detection

The human Covid-19 is primarily spread by person-to-person contact through respiratory droplets generated by breathing, sneezing, coughing, etc. (see La Rosa et al. 2020 and ECDC). …

One of the distinctive differences between information in a single image and information in a video is the temporal element. This has led deep learning model architectures to incorporate 3D processing so that they can also capture temporal information. This article summarizes the architectural changes from images to video through the I3D model.
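One way to picture the step from 2D to 3D processing is the filter "inflation" described in the I3D paper: a 2D kernel is repeated along a new time axis and rescaled by the number of repeats. The sketch below uses plain Python lists and a single 3x3 patch; the function names and the example kernel are illustrative assumptions, not code from the paper.

```python
import math


def inflate_2d_kernel(kernel_2d, time_depth):
    """Repeat a 2D kernel time_depth times along time, scaling by 1/time_depth."""
    return [
        [[weight / time_depth for weight in row] for row in kernel_2d]
        for _ in range(time_depth)
    ]


def response_2d(patch, kernel_2d):
    """Response of a 2D kernel at one spatial location of an image."""
    return sum(
        p * w
        for patch_row, kernel_row in zip(patch, kernel_2d)
        for p, w in zip(patch_row, kernel_row)
    )


def response_3d(clip, kernel_3d):
    """Response of a 3D kernel at one location of a clip (a list of frames)."""
    return sum(
        response_2d(frame, kernel_slice)
        for frame, kernel_slice in zip(clip, kernel_3d)
    )


# Illustrative values: a horizontal edge filter and one 3x3 image patch.
edge_kernel = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]
frame_patch = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]

# A "static video": the same frame repeated three times.
static_clip = [frame_patch] * 3
kernel_3d = inflate_2d_kernel(edge_kernel, time_depth=3)

image_response = response_2d(frame_patch, edge_kernel)
video_response = response_3d(static_clip, kernel_3d)
print(image_response, video_response)
```

On a clip with no motion, the inflated 3D kernel responds (up to floating point error) exactly as the 2D kernel does on a single frame, so weights learned on images remain a sensible starting point for video.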


Figure 1. The training process for the two-stream I3D on Kinetics Dataset. Image by author, adapted from Carreira and Zisserman (2017) [1].

The I3D model was presented by researchers from DeepMind and the University of Oxford in a paper called “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset” [1]. The paper compares previous approaches to the problem of action recognition in videos while additionally…

This article describes some of the state-of-the-art methods for depth prediction in image sequences captured by vehicles, which help in the development of new autonomous driving models without the use of extra cameras or sensors.

As mentioned in my previous article “How does Autonomous Driving Work? An Intro into SLAM”, many sensors are used to capture information while a vehicle is driving. The measurements captured include velocity, position, depth, thermal readings, and more. These measurements are fed into a feedback system that trains and utilizes motion models for the vehicle to abide by. This article focuses on the prediction of depth, which is often captured by a LiDAR sensor. A LiDAR sensor measures the distance to an object by emitting a laser and measuring the reflected light with a sensor. However, a LiDAR…

This paper is a review of research in quantum image processing (QIP), storage, and retrieval. It discusses current issues with silicon-based computing when processing big data for machine learning tasks such as image recognition, and how quantum computation can address these challenges. First, this paper introduces the challenges Moore’s law presents to traditional computer processors, along with an introduction to quantum computers and how they address those challenges. The paper then describes how quantum computation evolved into the field of image processing, followed by a discussion of the advantages and disadvantages found in current research and applications.

I. Introduction

Authors from Google extend prior research using state-of-the-art convolutional approaches to handle objects of varying scale in images [1], beating state-of-the-art models on semantic-segmentation benchmarks.

From Chen, L.-C., Papandreou, G., Schroff, F., & Adam, H., 2017 [1]


One of the challenges in segmenting objects in images using deep convolutional neural networks (DCNNs) is that as the input feature map grows smaller from traversing through the network, information about objects of a smaller scale can be lost.
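A quick back-of-the-envelope calculation shows why small objects suffer. The sketch below assumes a hypothetical network with five stride-2 stages and "same"-style padding, so each stage halves the spatial size (rounding up); the input size and object width are made-up numbers for illustration.

```python
import math


def feature_map_size(input_size, strides):
    """Spatial size after a sequence of strided layers, assuming 'same' padding."""
    size = input_size
    for stride in strides:
        size = math.ceil(size / stride)
    return size


def cells_covered(object_pixels, strides):
    """How many feature-map cells an object of the given pixel width spans."""
    total_stride = math.prod(strides)
    return object_pixels / total_stride


# Hypothetical setup: 512-pixel-wide input, five downsampling stages.
strides = [2, 2, 2, 2, 2]
final_size = feature_map_size(512, strides)
small_object_cells = cells_covered(20, strides)

print(final_size)          # width of the final feature map
print(small_object_cells)  # a 20-pixel object spans less than one cell
```

Under these assumptions the final feature map is only 16 cells wide, and a 20-pixel object occupies a fraction of a single cell, which is the loss of small-scale information the paragraph above describes.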

SLAM is the process by which a robot or vehicle builds a global map of its current environment and uses this map to navigate or deduce its location at any point in time [1–3].

SLAM is commonly used in autonomous navigation, especially to assist navigation in areas where global positioning systems (GPS) fail or in previously unseen areas. In this article, we will refer to the robot or vehicle as an ‘entity’. The entity that uses this process has a feedback system in which sensors obtain measurements of the external world around it in real time, and the process analyzes these measurements to map the local environment and make decisions based on this analysis.


SLAM is a type of temporal model in which the goal is to infer a sequence of states…


Cyber attacks are on the rise. I do not need to provide much proof of that, as it is in the news almost every day! Cyber security vendors do their best to protect organizations’ machines, but there are always gaps that require human intervention and resolution. Organizations need cyber professionals who are ready to respond to cyber attacks in both prevention and resolution.

From Team Accelerite (2017).

Current approaches to training rely heavily on subject matter experts (SMEs) or Red Teams to provide a challenging and evolving adversary for realistic…


A regularizer is commonly used in machine learning to constrain a model’s capacity to certain bounds, based either on a statistical norm or on prior hypotheses. This adds a preference for one solution over another in the model’s hypothesis space, or the set of functions that the learning algorithm is allowed to select as being the solution [1]. The primary aim of this method is to improve the generalizability of a model, that is, its performance on previously unseen data. Using a regularizer improves generalizability because it reduces overfitting of the model to the training data.
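The "preference for one solution over another" can be seen in a tiny example. The sketch below uses an L2 (squared norm) penalty, which is one common regularizer chosen here as an assumption; the toy dataset, the two candidate weight vectors, and the penalty strength are all made up for illustration.

```python
def predict(weights, features):
    """Linear model: weighted sum of the input features."""
    return sum(w * x for w, x in zip(weights, features))


def mean_squared_error(weights, data):
    """Average squared difference between predictions and targets."""
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)


def l2_penalty(weights):
    """Squared L2 norm of the weights."""
    return sum(w * w for w in weights)


def regularized_loss(weights, data, strength):
    """Data-fit term plus the weighted L2 penalty."""
    return mean_squared_error(weights, data) + strength * l2_penalty(weights)


# Toy data with two identical features, so many weight vectors fit perfectly.
data = [([1.0, 1.0], 2.0), ([2.0, 2.0], 4.0)]
balanced = [1.0, 1.0]  # spreads the weight across both features
lopsided = [2.0, 0.0]  # puts all the weight on one feature

for name, weights in [("balanced", balanced), ("lopsided", lopsided)]:
    fit = mean_squared_error(weights, data)
    total = regularized_loss(weights, data, strength=0.1)
    print(name, fit, total)
```

Both candidates fit the training data equally well, so the unregularized loss cannot choose between them. The penalty breaks the tie in favor of the smaller-norm solution.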

The most common practice…

Graph neural networks (GNNs) have emerged as an interesting approach to a variety of problems, most prominently in chemistry and molecular biology. An example of their impact in this field is DeepChem, a Python library that makes use of GNNs. But how exactly do they work?

What are GNNs?

Typical machine learning applications pre-process graph representations into a vector of real values, which in turn loses information about the graph structure. GNNs are a combination of an information diffusion mechanism and neural networks, representing a set of transition functions and a set of output functions. The information diffusion…

Madeline Schiappa

PhD Student in the UCF Center for Research in Computer Vision https://www.linkedin.com/in/madelineschiappa/
