Hearing is like seeing for our brains and for machines


There is an array of neural network machine learning approaches that are far more than just “deep.” At a time when neural networks are increasingly popular for advancing voice technologies and AI, it’s interesting that many of today’s approaches were originally developed for image or video processing.

One of those methods, the convolutional neural network (CNN), makes it easy to see why image-processing neural nets bear a striking resemblance to the way our brains process audio stimuli. CNNs, in other words, nicely illustrate that our auditory and visual processes are connected in more ways than one.

What you need to know about CNNs

As human beings, we recognize a face or an object regardless of where it appears in our visual field (or in a picture). When you try to model that capability in a machine, by teaching it to search for visual features (edges or curves at a lower level of a neural network, eyes and ears at a higher level in the face-recognition example), you typically do so locally, since the relevant pixels sit close to one another. In human visual perception, this is mirrored by the fact that a cluster of neurons focuses on a small receptive field, which is only a part of the much larger visual field.
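To make the idea of local receptive fields and position-independent feature detection concrete, here is a minimal sketch in PyTorch. It is not the article's implementation; the library choice, layer sizes, and input shapes are illustrative assumptions. The point it shows is that the same small convolutional filters slide over the whole input, so the network detects a feature wherever it occurs, and the same architecture works whether the 2D grid is an image or a spectrogram of audio.

```python
# Illustrative sketch only (assumes PyTorch); sizes and shapes are arbitrary.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Each 3x3 kernel is a small "receptive field": it looks only at a
        # local patch, but the same weights slide across the entire input,
        # so a feature (an edge, or a frequency sweep) is detected
        # regardless of where it appears.
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # lower level: edges / local energy patterns
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # higher level: combinations of those patterns
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x)
        h = h.mean(dim=(2, 3))  # global average pool over the whole "field"
        return self.classifier(h)

# The same network accepts a one-channel image or a one-channel spectrogram:
# both are just 2D grids in which nearby values are related.
image = torch.randn(1, 1, 64, 64)         # e.g. a grayscale image
spectrogram = torch.randn(1, 1, 128, 44)  # e.g. mel bins x time frames
model = TinyCNN()
print(model(image).shape, model(spectrogram).shape)  # both -> (1, 10)
```

Because the final pooling step averages over whatever spatial extent remains, the sketch handles inputs of different sizes, which is one reason the same convolutional machinery transfers so readily from vision to audio.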

Read the source article at TechCrunch