In the mathematical theory of artificial neural networks, the universal approximation theorem states that a feed-forward network with a single hidden layer containing a finite number of neurons (that is, a multilayer perceptron) can, under mild assumptions on the activation function, approximate continuous functions on compact subsets of R^n. In short, the theorem says that simple neural networks can represent a wide variety of interesting functions when given appropriate parameters; it does not, however, touch upon the algorithmic learnability of those parameters.
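The theorem can be illustrated with a minimal sketch (this example is an assumption for illustration, not taken from the video): a single hidden layer of sigmoid neurons with randomly chosen hidden weights, where only the linear output layer is fit by least squares, already approximates a smooth target function closely.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Target: a continuous function on a compact set, here sin(x) on [-pi, pi].
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x)

# One hidden layer with a finite number of sigmoid neurons.
n_hidden = 50
W = rng.normal(scale=3.0, size=(1, n_hidden))   # random hidden weights
b = rng.normal(scale=3.0, size=n_hidden)        # random hidden biases
H = sigmoid(x @ W + b)                          # hidden-layer activations

# Linear output unit: solve least squares for the output weights c.
c, *_ = np.linalg.lstsq(H, y, rcond=None)
y_hat = H @ c

max_err = np.max(np.abs(y_hat - y))
print(f"max absolute error: {max_err:.4f}")
```

Note that fixing the hidden weights sidesteps the learnability question entirely: the theorem guarantees good parameters exist, and here a simple linear solve happens to find adequate ones.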
One of the first versions of the universal approximation theorem, for sigmoid activation functions, was proved by George Cybenko in 1989.
In the same context, Hornik showed in 1991 that it is not the specific choice of activation function but rather the multilayer feed-forward architecture itself that gives neural networks the potential to be universal approximators. The output units are always assumed to be linear. For notational convenience, only the single-output case is presented; the general case follows easily from it.
This video by Michael Nielsen is about that theorem. In simple terms, as mentioned above, it says that given any continuous function, no matter how complicated, it is always possible to find an artificial neural network that approximates the function as closely as you would like. To follow the proof, the viewer is assumed to be familiar with the basics of artificial neural networks; if not, the video provides a few links covering them.
The video also walks viewers through the basics of how an artificial neuron works, in particular by plotting a graph, and from there presents a simple, intuitive sketch of the proof of the universal approximation theorem; caveats that didn't make it into the video are covered in the links below.
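The core intuition of that proof sketch can be shown in a few lines (a hypothetical illustration, assuming the standard sigmoid construction): a sigmoid neuron with a very large input weight behaves almost like a step function, and two opposing steps form a "bump" of chosen height over an interval. Summing enough such bumps lets the network trace out any continuous function.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# With a very large weight w, sigmoid(w * (x - s)) is nearly a step at x = s.
def step(x, s, w=1000.0):
    return sigmoid(w * (x - s))

# Two opposing steps make a "bump" of height h on [a, b] -- the basic
# building block used to piece together an approximation of any function.
def bump(x, a, b, h):
    return h * (step(x, a) - step(x, b))

x = np.linspace(0.0, 1.0, 1001)
y = bump(x, 0.3, 0.6, 2.0)

print(y[450])  # x = 0.45, inside the bump: close to 2.0
print(y[100])  # x = 0.10, outside the bump: close to 0.0
```

Each bump costs only two hidden neurons, which is why the construction stays within a single hidden layer.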
Given below are a few links to books, the full proof, and more to help viewers:
For an introduction to artificial neural networks, see Chapter 1 of the free online book by Michael Nielsen: Click Here
For the full proof of the theorem, go to Chapter 4: Click Here
A good series of videos on neural networks is by 3Blue1Brown: Click Here
This video was made as part of a larger project on media for mathematics: cognitivemedium
The Universal Approximation Theorem for neural networks
Video Source: Michael Nielsen