Life is a game, take it seriously

The Deep Learning Not That Smart List

In AI, Computer Vision, deep learning, Machine Learning, Paper Talk on May 27, 2019 at 12:00 pm

by Li Yang Ku (Gooly)

Deep learning is one of the most successful scientific stories in modern history, attracting billions of dollars of investment in half a decade. However, there is always another side to the story, where people discover the less magical parts of deep learning. This post is about a few pieces of research (quite a few published this year) showing that deep learning might not be as smart as you think (most of the time the authors also come up with a way to fix the problem, since it used to be forbidden to accept a paper without a deep learning improvement.) This is just a short list; please comment below on other papers that also belong.

a) Szegedy, Christian, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. “Intriguing properties of neural networks.”, ICLR 2014

The first non-magical discovery in deep learning has to go to the finding of adversarial examples. It was discovered that adding certain imperceptible perturbations to an image can cause a deep network to produce mysterious misclassifications. Although technically the first publication of this discovery belongs to "Evasion Attacks against Machine Learning at Test Time" by Battista Biggio et al., published in September 2013 at ECML PKDD, the paper that really caught people's attention is this one, posted on arXiv in December 2013 and published at ICLR 2014. In addition to having bigger names on the author list, this paper also shows adversarial examples on more colorful images that clearly demonstrate the problem (see image below.) Since this discovery, there has been a continuous battle between the camp that tries to strengthen defenses against attacks and the camp that tries to break them (such as "Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples" by Athalye et al.), leading to a recent ICLR 2019 paper, "Are adversarial examples inevitable?" by Shafahi et al., that asks from a theoretical standpoint whether a deep network can ever be free of adversarial examples.
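To make the idea concrete, here is a minimal NumPy sketch in the spirit of the fast-gradient-sign trick from the follow-up Goodfellow et al. paper ("Explaining and Harnessing Adversarial Examples"); the "network" is just a toy logistic-regression model, and the weights and input are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained network: logistic regression on
# 100-dimensional "images" with pixel values in [0, 1].
w = rng.normal(size=100)
x = rng.uniform(size=100)

def logit(v):
    return w @ v

p_clean = 1.0 / (1.0 + np.exp(-logit(x)))

# Gradient-sign perturbation: nudge every pixel by eps in the
# direction that pushes the input against the current prediction.
# The gradient of the logit with respect to the input is simply w.
eps = 0.1
direction = 1.0 if p_clean > 0.5 else -1.0
x_adv = np.clip(x - direction * eps * np.sign(w), 0.0, 1.0)
p_adv = 1.0 / (1.0 + np.exp(-logit(x_adv)))

# Each pixel moved by at most 0.1, yet the confidence shifts sharply:
# in high dimensions, many tiny aligned changes add up.
print(p_clean, p_adv)
```

Real attacks do the same thing against a deep network's loss gradient, which is why the perturbation stays invisible per pixel but still flips the prediction.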

b) Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. “Deep Image Prior.” CVPR 2018

This is not a paper intended to expose flaws of deep learning; in fact, its result is one of the most magical deep learning results I've seen. The authors showed that deep networks are able to fill in cropped-out regions of images in a very reasonable way (see image below; left input, right output.) However, it also unveils some less magical parts of deep learning. Deep learning's success was mostly advertised as learning from data, and it was claimed to work better than traditionally engineered visual features because it learns from large amounts of data. This work, however, uses no data and no pre-trained weights. It shows that convolution and the specific layered network architecture (which may be the outcome of millions of grad student hours of trial and error) played a significant role in the success. In other words, we are still engineering visual features, just in a more subtle way. It also raises the question of what made deep learning so successful: is it really the learning? Or is it that thousands of grad students tried all kinds of architectures, loss functions, and training procedures, and some combinations turned out to be great?
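The spirit of the result can be sketched with a deliberately tiny stand-in. Instead of the paper's deep CNN, let the "architecture" be a fixed linear upsampling of a low-resolution parameter grid, fit only to the observed samples of a 1-D signal. Everything here (the signal, the mask, the upsampler) is made up; the point is that the missing chunk gets filled in by the structure alone, with no training data and no pre-trained weights:

```python
import numpy as np

# Target "image": a smooth 1-D signal with a chunk cropped out.
n = 64
x = np.sin(np.linspace(0, 3 * np.pi, n))
mask = np.ones(n, bool)
mask[24:40] = False           # the cropped-out region to inpaint

# "Architecture as prior": the output is a fixed linear upsampling
# of k low-resolution parameters -- a structural bias toward smooth
# outputs, standing in for the CNN's bias in the paper.
k = 8
U = np.zeros((n, k))          # linear-interpolation upsampling matrix
pos = np.linspace(0, k - 1, n)
lo = np.floor(pos).astype(int).clip(0, k - 2)
frac = pos - lo
U[np.arange(n), lo] = 1 - frac
U[np.arange(n), lo + 1] = frac

# Fit the parameters to the OBSERVED samples only (least squares).
theta, *_ = np.linalg.lstsq(U[mask], x[mask], rcond=None)
recon = U @ theta             # the missing chunk is filled in smoothly

err = np.abs(recon[~mask] - x[~mask]).max()
print("max error in the inpainted region:", err)
```

The reconstruction never saw the masked samples; they come out reasonable only because the "architecture" can't express anything but smooth signals, which is exactly the paper's point about convolutional structure.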

c) Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann, and Wieland Brendel. “ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness.” ICLR 2019.

It was widely accepted in the deep learning community that CNNs recognize objects by combining lower-level filters that represent features such as edges into more complex shapes layer by layer. In this recent work, the authors noticed that, contrary to what the community believes, existing deep learning models seem to have a strong bias toward textures. For example, a cat with elephant texture is often recognized as an elephant. Instead of learning what a cat looks like, CNNs seem to take a shortcut and just try to recognize cat fur. You can find a detailed blog post about this work here.

d) Wieland Brendel, and Matthias Bethge. “Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet.” ICLR 2019.

This is a paper from the same group as the previous one. Based on the same observations, it claims that CNNs are not that different from bag-of-features approaches that classify based on local features. The authors created a network that only looks at local patches in an image, without high-level spatial information, and were able to achieve pretty good results on ImageNet. They further shuffled features in an image and found that existing deep learning models seem insensitive to these changes. Again, CNNs seem to be taking shortcuts by classifying based on local features alone. More on this work can be found in this post.
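The bag-of-local-features idea reduces to a very short sketch: classify each patch independently, average the evidence, and never look at where the patches are. The patch classifier below is a random linear map on made-up data, just to show the bookkeeping; the punchline is that shuffling the patches cannot change the prediction:

```python
import numpy as np

rng = np.random.default_rng(0)

def patches(img, p=4):
    """Split an image into non-overlapping p x p patches."""
    h, w = img.shape
    return [img[i:i + p, j:j + p].ravel()
            for i in range(0, h, p) for j in range(0, w, p)]

# Toy patch-level classifier: random linear logits over 3 classes.
W = rng.normal(size=(3, 16))

def bag_of_features_predict(img):
    """Average per-patch logits -- no spatial information used."""
    logits = np.mean([W @ v for v in patches(img)], axis=0)
    return int(np.argmax(logits))

img = rng.normal(size=(16, 16))

# Shuffle the patches: a spatial model would care; this one cannot.
ps = patches(img)
rng.shuffle(ps)
shuffled = np.block([[ps[4 * i + j].reshape(4, 4) for j in range(4)]
                     for i in range(4)])

print(bag_of_features_predict(img), bag_of_features_predict(shuffled))
```

The paper's surprise is that a network built this way still does well on ImageNet, and that regular CNNs behave suspiciously like it.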

e) Azulay, Aharon, and Yair Weiss. “Why do deep convolutional networks generalize so poorly to small image transformations?.” rejected by ICLR 2019.

This paper discovered that modern deep networks may fail to recognize images shifted by a single pixel, but it got rejected because reviewers didn't quite buy the experiments or the explanation. (The authors made the big mistake of not providing an improved deep network in the paper.) The paper showed that when an image is shifted slightly, or when a sequence of frames from a video is given to a modern deep network, jaggedness appears in the detection results (see example below, where the posterior probability of recognizing the polar bear varies a lot frame by frame.) The authors further created a dataset from ImageNet with the same images embedded at a random location in a larger frame, and showed that performance dropped about 30% when the embedding frame is twice the width of the original image. This work shows that despite modern networks getting close to human performance on ImageNet classification, they might not generalize to the real world as well as we hoped.
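The paper attributes much of this jaggedness to subsampling (strided convolution and pooling), which breaks shift equivariance. That effect can be isolated in a few lines of NumPy on a made-up 1-D "feature map", along with the classical signal-processing remedy of blurring (anti-aliasing) before subsampling:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 1-D stand-in for a feature map.
x = rng.normal(size=64)

def subsample(v):
    return v[::2]                       # stride-2, as in pooling layers

# Shift the input by ONE sample: stride-2 subsampling now returns the
# other half of the samples, so the output changes substantially.
diff_raw = np.abs(subsample(x) - subsample(np.roll(x, 1))).mean()

# Anti-aliasing: blur before subsampling, so a one-sample shift
# changes the output far less.
def blur_then_subsample(v):
    return np.convolve(v, [0.25, 0.5, 0.25], "same")[::2]

diff_blur = np.abs(blur_then_subsample(x)
                   - blur_then_subsample(np.roll(x, 1))).mean()

print(diff_raw, diff_blur)
```

Stacking several strided layers, as modern CNNs do, compounds this instability, which is consistent with the frame-by-frame wobble the authors observed.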

f) Nalisnick, Eric, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, and Balaji Lakshminarayanan. “Do Deep Generative Models Know What They Don’t Know?.” ICLR 2019

This work from DeepMind looks into the problem that, when tested on data with a distribution different from training, a deep neural network can give wrong results with high confidence. For example, in the paper "Multiplicative Normalizing Flows for Variational Bayesian Neural Networks" by Louizos and Welling, it was discovered that on the MNIST dataset a trained network can be highly confident but wrong when the input digit is tilted. This makes deploying deep learning to critical tasks quite problematic. Deep generative models were thought to be a solution to such problems: since they also model the distribution of the samples, they should be able to reject an anomaly that does not belong to the same distribution as the training samples. However, the authors' short answer to the title question is no; even for very distinct datasets, such as digits versus images of horses and trucks, anomalies cannot be identified, and in many cases the model even wrongly assigns higher confidence to them than to samples that do come from the training distribution. The authors therefore "urge caution when using these models with out-of-training-distribution inputs or in unprotected user-facing systems."
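The failure is not even specific to deep models. A toy Gaussian density, fit perfectly to its training distribution, already assigns an out-of-distribution point higher likelihood than any typical training sample (the paper's models are flows, VAEs, and PixelCNNs, but the high-dimensional geometry below is the same flavor of problem; the numbers here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100                         # dimensionality, e.g. flattened images

# "Training" distribution: standard normal in d dimensions. The
# density model below is exact, not a bad fit.
def log_density(x):
    return -0.5 * (x @ x) - 0.5 * d * np.log(2 * np.pi)

in_dist = rng.normal(size=(1000, d))    # typical training samples
ood = np.zeros(d)                       # an all-zeros "image", unlike
                                        # anything the model was fit to

ll_in = np.array([log_density(x) for x in in_dist])
ll_ood = log_density(ood)

# The OOD point gets a HIGHER likelihood than every single training
# sample: a likelihood threshold would happily accept it.
print(ll_ood, ll_in.max())
```

In high dimensions, typical samples live on a thin shell away from the density peak, so "high likelihood" and "looks like the training data" come apart; the paper documents the analogous effect in real deep generative models.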


Revisiting Behavior-Based Robotics

In AI, Robotics on February 28, 2019 at 9:10 pm

by Li Yang Ku (Gooly)

The maker of the well-known Baxter robot, Rethink Robotics, closed its doors last October. The Baxter robot, although not perfect, plays an important role in robot history. Its low price tag ($22,000 instead of $100,000) and human-safe features (it won't be able to kill grad students) made these robots among the most common in the robotics research community. Unfortunately, that was not enough to survive in the market.

Many of you may have heard of Rodney Brooks, the founder and CTO of Rethink Robotics, who was also the director of MIT's Computer Science & Artificial Intelligence Laboratory (CSAIL) and one of the founders of iRobot. To me, however, it is behavior-based robotics that best describes him. In this post, I am going to revisit Rodney Brooks' research on behavior-based robotics and explain why it was a big deal back then.

To fully understand behavior-based robotics, we have to go back in time and look at what was happening in the research world before Rodney Brooks started advocating for behavior-based robotics in the 80s. This was right around the time of the early AI winter and before the collapse of the expert system industry. An expert system stores a huge knowledge base of logical rules describing facts about the world, entered by experts; at query time, an inference engine tries to find a solution based on the given rules. It is not hard to imagine that the robots designed at that time were also based on this kind of thinking. Shakey, the famous robot built by Stanford Research Institute in the late 60s, used logic to solve tasks based on a symbolic model of the environment. Despite its national fame, Shakey was designed for an experimental environment consisting of big blocks and, as you might know, it was not the technology breakthrough that led to household robots.

In the late 70s, Rodney Brooks was especially frustrated with these symbolic approaches that try to model the world in detail. Computers were not fast at that time, and trying to estimate a world model with uncertainty is even more time consuming. During a trip in which he was stuck in Thailand, Rodney observed that insects seem to be much more capable than his robots despite having tiny nervous systems. The realization was that there is no need to model the world, because the world is always there; the robot can always sense the world and use it as its own model. This simple idea is basically the core concept of behavior-based robotics.

Rodney went on to propose the subsumption architecture, which is composed of different layers of state machines, in which higher layers subsume lower layers to create more complicated behaviors. Brooks claimed that this approach is radically different from traditional approaches that follow the sense-model-plan-act framework. The subsumption architecture is capable of reacting to the world in real time, since the lower layers can produce outputs directly. Instead of executing actions in a pre-planned sequence, the next action can simply be activated by new observations of the world. Rodney argued that this new approach has a very different decomposition compared to the traditional sequential information flow: in the subsumption architecture, each layer itself connects sensing to action. Higher layers may rely on lower layers, but they do not call lower layers as subroutines. Several robots were built based on this architecture, including the robot Allen, which can move to a goal while avoiding obstacles; the robot Herbert, which can pick up soda cans; the insect-like robot Genghis; and more.
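A minimal sketch of the idea (illustrative, not Brooks' actual implementation; the robot, sensors, and actions below are hypothetical): each layer maps sensing directly to an action, a layer with no opinion defers, and a higher layer's output takes precedence over the layers it subsumes. Nothing builds a world model and nothing plans:

```python
def avoid(sensors):
    """Middle layer: turn away from nearby obstacles."""
    if sensors["obstacle_distance"] < 1.0:
        return "turn_left"
    return None                     # no opinion -> defer

def wander(sensors):
    """Bottom layer: by default, keep moving."""
    return "go_forward"

def seek_goal(sensors):
    """Top layer: head toward the goal when the path is clear."""
    if sensors["goal_visible"] and sensors["obstacle_distance"] >= 1.0:
        return "turn_toward_goal"
    return None

# Highest layer first; the first layer with an opinion wins, i.e. it
# subsumes the outputs of the layers beneath it. Crucially, every
# layer reads the *world* (sensors) directly -- there is no shared
# model and no plan.
LAYERS = [seek_goal, avoid, wander]

def act(sensors):
    for layer in LAYERS:
        action = layer(sensors)
        if action is not None:
            return action

print(act({"obstacle_distance": 0.5, "goal_visible": True}))  # turn_left
```

The real architecture wires layers as asynchronous state machines with explicit suppression and inhibition of signals rather than a priority loop, but the decomposition is the same: competence by competence, each one sensing and acting on its own.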

These works were quite influential and provided a very different perspective on how to approach AI. Unlike other robots at that time, robots under the subsumption architecture could react in real time in a human environment. Rodney went on to promote this concept and published a series of papers (with some of the best titles), such as "Planning is just a way of avoiding figuring out what to do next" and "Elephants don't play chess." Two crucial ideas were emphasized in these papers: 1) Situatedness: the robots should not deal with abstract descriptions, but with the environment that directly influences them; and 2) Embodiment: the robots should experience the world directly, so that their actions have immediate feedback on their own sensations. These are the central ideas that led to behavior-based solutions.

Today, computers are much faster, and robots are now capable of running the good old-fashioned sense-model-plan-act sequence close to, if not yet in, real time. Model-heavy approaches such as physics-based methods are among the most popular topics, and planning algorithms are ubiquitous among robot arms and self-driving cars. So is behavior-based robotics still relevant in 2019? Some of the concepts still exist in many robots, but in a more hybrid fashion, such as having a lower-level loop that lets the robot react quickly underneath a high-level AI planning layer. Although behavior-based robotics is not mentioned as often nowadays, I am pretty sure we will revisit it when the sense-model-plan-act approach fails again.


  • Brooks, Rodney A. “New approaches to robotics.” Science 253, no. 5025 (1991): 1227-1232.
  • Brooks, Rodney A. “Elephants don’t play chess.” Robotics and autonomous systems 6, no. 1-2 (1990): 3-15.
  • Brooks, Rodney A. “Planning is just a way of avoiding figuring out what to do next.” (1987).
  • Talking Robots Podcast with Rodney Brooks
  • Wikipedia: subsumption architecture

Paper Picks: IROS 2018

In AI, deep learning, Paper Talk, Robotics on December 30, 2018 at 4:18 pm

By Li Yang Ku (Gooly)

I was at IROS in Madrid this October presenting some fan manipulation work I did earlier (see video below), and the King of Spain also attended (see figure above.) When even the King is talking about deep learning, you know what the hype trend in robotics is. Madrid is a fabulous city, so I was only able to pick a few papers below to share.


a) Roberto Lampariello, Hrishik Mishra, Nassir Oumer, Phillip Schmidt, Marco De Stefano, Alin Albu-Schaffer, “Tracking Control for the Grasping of a Tumbling Satellite with a Free-Floating Robot”

This is work done by folks at DLR (the German Aerospace Center). The goal is to grasp a tumbling satellite with a robotic arm on another free-floating satellite. As you can tell, this is a challenging task, and this work presents progress that extends a series of previous efforts by different space agencies. Research on related grasping tasks can be roughly classified into feedback control methods, which solve a regulation control problem, and optimal control approaches, which compute a feasible optimal trajectory in an open-loop fashion. In this work, the authors propose a system that combines both feedback and optimal control. This is achieved by using a motion planner, generated off-line with all relevant constraints, to provide visual servoing with a reference trajectory. Servoing will deviate from the original plan, but the gross motion is maintained to avoid violating motion constraints (such as singularities.) The approach is tested in a zero-gravity facility. If you haven't seen one of these zero-gravity devices, they are quite common among space agencies and are used to turn off gravity (see figure above.)

b) Josh Tobin, Lukas Biewald , Rocky Duan , Marcin Andrychowicz, Ankur Handa, Vikash Kumar, Bob McGrew, Alex Ray, Jonas Schneider, Peter Welinder, Wojciech Zaremba, Pieter Abbeel, “Domain Randomization and Generative Models for Robotic Grasping.”

This is work done (mostly) at OpenAI that tries to tackle grasping with deep learning. Previous works on grasping with deep learning are usually trained on at most thousands of unique objects, which is relatively small compared to datasets for image classification such as ImageNet. In this work, a new data generation pipeline is proposed that cuts up meshes and combines them randomly in simulation. With this approach the authors generated a million unrealistic training objects and showed that they can be used to learn grasping on realistic objects with accuracy similar to the state of the art. The proposed architecture is shown above: α is a convolutional neural network, β is an autoregressive model that generates n different grasps (n = 20), and γ is another neural network, trained separately, that evaluates each grasp using the likelihood of success computed by the autoregressive model plus another observation from the in-hand camera. The autoregressive model is an interesting choice, which the authors claim is advantageous because it can directly compute the likelihood of its samples.
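Why an autoregressive model gives exact sample likelihoods is easy to show in miniature: factor the grasp distribution as p(g) = p(g1) p(g2|g1) p(g3|g1,g2) over discretized coordinates, and the likelihood of any sampled grasp is just the product of the conditionals you sampled from. In the paper those conditionals come from a neural network; the random lookup tables below are a made-up stand-in, purely to show the bookkeeping:

```python
import numpy as np

rng = np.random.default_rng(0)

K, D = 8, 3                    # 8 bins per coordinate, 3 coordinates
                               # (say x, y, wrist angle -- hypothetical)

def conditional(prefix):
    """p(next coordinate | previous coordinates) -- a made-up table,
    deterministic in the prefix so repeated queries agree."""
    local = np.random.default_rng(hash(tuple(prefix)) % (2**32))
    p = local.random(K)
    return p / p.sum()

def sample_grasp():
    """Sample a grasp AND return its exact likelihood."""
    grasp, likelihood = [], 1.0
    for _ in range(D):
        p = conditional(grasp)
        i = int(rng.choice(K, p=p))
        grasp.append(i)
        likelihood *= p[i]     # exact, by the chain rule
    return grasp, likelihood

grasps = [sample_grasp() for _ in range(20)]   # n = 20 as in the paper
best = max(grasps, key=lambda gl: gl[1])       # rank candidates by likelihood
print(best)
```

A GAN or VAE sampler would give you the 20 candidates but not their exact probabilities, which is the advantage the authors point to when ranking grasps.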

c) Barrett Ames, Allison Thackston, George Konidaris, “Learning Symbolic Representations for Planning with Parameterized Skills.”

This is a planning work (by folks I know) that combines parameterized motor skills with higher-level planning. At each state the robot needs to select both an action and how to parameterize it. This work introduces a discrete abstract representation for this kind of planning and demonstrates it on Angry Birds and a coffee-making task (see figure above.) The authors showed that the approach is capable of generating a state representation that requires very few symbols (here symbols are used to describe preconditions and state estimates), therefore allowing an off-the-shelf probabilistic planner to plan faster. Only 16 symbols are needed for the Angry Birds task (not the real Angry Birds, a simpler version), and a plan can be found in 4.5 ms. One observation is that the only parameter settings that need to be represented by a symbol are the ones that maximize the probability of reaching the next state on the path to the goal.