Some recent artificial neural networks (ANNs) claim to model aspects of primate neural and human performance data. Their success in object recognition, however, depends on exploiting low-level features to solve visual tasks in a way that humans do not. As a result, out-of-distribution or adversarial input is often challenging for ANNs. Humans instead learn abstract patterns and are mostly unaffected by many extreme image distortions. We introduce a set of novel image transforms inspired by neurophysiological findings and evaluate humans and ANNs on an object recognition task. We show that machines outperform humans on certain transforms, yet struggle to perform on par with humans on others that humans find easy. We quantify the accuracy differences between humans and machines and derive a difficulty ranking of our transforms from the human data. We also suggest how certain characteristics of human visual processing can be adapted to improve ANN performance on the transforms that are difficult for machines.

[BioCyb] [NeuroVision@CVPR] [Facilitating Robust Representations]

Below are a few examples of applying Extreme Image Transformations to an image.

Extreme Image Transformations
Extreme Image Transformations applied to an Imagenette image of the category Golf Ball: (a) non-transformed baseline image; (b) Full Random Shuffle with probability 0.5; (c) Grid Shuffle with grid size 40×40; (d) Within Grid Shuffle with block size 40×40 and probability 0.5; (e) Local Structure Shuffle with block size 80×80 and probability 0.5; (f) Segmentation Within Shuffle with 16 segments and probability 1.0; (g) Segmentation Displacement Shuffle with 64 segments; (h) Color Flatten.
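
To make the parameters in the caption concrete, below is a minimal NumPy sketch of three of the shuffles plus Color Flatten, following one plausible reading of the caption: Full Random Shuffle permutes a randomly chosen fraction p of pixel positions across the whole image, Grid Shuffle permutes fixed-size tiles while leaving each tile intact, Within Grid Shuffle permutes pixels only inside each tile, and Color Flatten splits the channels and flattens each to 1D row-wise. Function names and exact semantics are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def full_random_shuffle(img, p, rng=None):
    """Shuffle a randomly chosen fraction p of pixel positions across the image."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    flat = img.reshape(h * w, -1)          # one row per pixel, works for gray or RGB
    idx = rng.choice(h * w, size=int(p * h * w), replace=False)  # pixels to move
    out = flat.copy()
    out[idx] = flat[rng.permutation(idx)]  # permute the selected pixels among themselves
    return out.reshape(img.shape)

def grid_shuffle(img, block, rng=None):
    """Split the image into block x block tiles and permute tile positions."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    gh, gw = h // block, w // block
    img = img[:gh * block, :gw * block]    # drop any remainder for simplicity
    tiles = [img[r * block:(r + 1) * block, c * block:(c + 1) * block]
             for r in range(gh) for c in range(gw)]
    order = rng.permutation(len(tiles))
    rows = [np.concatenate([tiles[order[r * gw + c]] for c in range(gw)], axis=1)
            for r in range(gh)]
    return np.concatenate(rows, axis=0)

def within_grid_shuffle(img, block, p, rng=None):
    """Shuffle pixels with probability p inside each tile; tile positions stay fixed."""
    rng = np.random.default_rng() if rng is None else rng
    out = img.copy()
    h, w = img.shape[:2]
    for r in range(0, h - h % block, block):
        for c in range(0, w - w % block, block):
            out[r:r + block, c:c + block] = full_random_shuffle(
                out[r:r + block, c:c + block], p, rng)
    return out

def color_flatten(img):
    """One reading of Color Flatten: split channels, flatten each to 1D row-wise.
    (Assumption: the paper's exact output layout may differ.)"""
    return [img[..., ch].ravel() for ch in range(img.shape[-1])]

if __name__ == "__main__":
    img = np.random.randint(0, 256, (160, 160, 3), dtype=np.uint8)  # stand-in image
    shuffled = within_grid_shuffle(grid_shuffle(img, 40), 40, 0.5)
```

Note that the segmentation-based shuffles (panels f and g) would additionally require a segmentation step (e.g., superpixels) to define the regions within or across which pixels are displaced, so they are omitted from this sketch.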