Neural Networks
Credit: 3Blue1Brown

Machine learning (ML) is a branch of artificial intelligence (AI), the field that develops technologies enabling machines to simulate human behaviour. ML is the subfield of AI that studies computer algorithms designed to operate and self-improve based on data and pattern recognition. There are three main categories of ML algorithms:

  • Supervised learning: the algorithm is presented with a data set containing a set of inputs and their associated outputs or labels. The task is to achieve an accurate mapping between the two while being predictive on new inputs. This category can be further subdivided into classification and regression algorithm types. In the former, the output is a discrete function or simply a category, while in the latter the output is a real continuous function of the inputs.
  • Unsupervised learning: the algorithm is presented with unlabelled data. The task is to describe the data's hidden structure and correlations. One example is clustering, where the algorithm groups the data according to the patterns it can detect.
  • Reinforcement learning: the algorithm is given a desired performance within a specific context or set of constraints. The task is to self-tune by trial and error, maximising the reward and minimising the penalty so as to approach the desired output.

Since the late 20th century, ML has been heavily applied to various problems in particle physics, such as simulations, real-time analysis and triggering, object reconstruction, identification and calibration, image recognition based on raw data in LHC analyses, approximation of matrix elements, classification of Standard Model events and, finally, determination of non-perturbative objects in QCD.

One of the most widely used ML algorithms in supervised learning, not only in particle physics, is the artificial neural network (ANN). This algorithm was initially inspired by the way biological neural networks operate in the human brain, particularly with respect to features such as receiving inputs (dendrites), transmitting signals (axon), connectivity (synapses) and processing of information (soma). A more important similarity is the ability of an ANN to continuously update its parameters based on the data it trains on and the patterns it encounters.

One of the most important properties of NNs is captured by the universal approximation theorem, which states that a feed-forward NN with a single hidden layer, given enough neurons and a suitable non-linear activation function, can approximate any continuous function on a closed and bounded range of inputs to arbitrary accuracy.
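For reference, one common formal statement of the theorem can be sketched as follows. The notation is introduced here and is not taken from the text: K is a closed and bounded (compact) subset of R^n, f is a continuous function on K, σ is a continuous non-polynomial activation function, and N is the number of neurons in the single hidden layer.

```latex
\forall\, \varepsilon > 0 \;\; \exists\, N \in \mathbb{N},\;
v_i, b_i \in \mathbb{R},\; \mathbf{w}_i \in \mathbb{R}^n :
\quad
\sup_{\mathbf{x} \in K}
\left| f(\mathbf{x}) - \sum_{i=1}^{N} v_i\,
\sigma\!\left(\mathbf{w}_i \cdot \mathbf{x} + b_i\right) \right| < \varepsilon
```

Note that the statement guarantees that suitable parameters exist, but says nothing about how large N must be or how to find the parameters in practice.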

Forward propagation is the mode in which an ANN evaluates its outputs for a given input by evaluating its layers in succession. Backward propagation is the mode in which the NN learns, by propagating the error between its output and the data back into the parameters. It relies on computing the gradient of a loss function w.r.t. the weights and biases of the NN through repeated application of the chain rule. This gradient is then used by a minimisation algorithm that adopts a gradient-descent strategy.
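The two modes can be illustrated with a minimal C++ sketch. Everything in it is an assumption made for illustration and is not the NNAD interface: the names TinyNet, Grad, forward, backward, gradient_step and loss are hypothetical, the network has one input, one hidden layer of tanh units and one linear output, and the loss is half the squared error on a single data point.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical toy network: 1 input -> H tanh hidden units -> 1 linear output.
struct TinyNet {
  std::vector<double> w1, b1, w2;  // hidden weights, hidden biases, output weights
  double b2;                       // output bias
};

// Gradient of the loss w.r.t. every parameter, with the same layout as TinyNet.
struct Grad {
  std::vector<double> w1, b1, w2;
  double b2;
};

// Forward propagation: evaluate the layers in succession.
// The hidden activations are returned in `a` because the backward pass reuses them.
double forward(const TinyNet& net, double x, std::vector<double>& a) {
  assert(net.w1.size() == net.b1.size() && net.w1.size() == net.w2.size());
  a.resize(net.w1.size());
  double y = net.b2;
  for (std::size_t j = 0; j < net.w1.size(); ++j) {
    a[j] = std::tanh(net.w1[j] * x + net.b1[j]);
    y += net.w2[j] * a[j];
  }
  return y;
}

// Loss on one data point (x, t): L = 0.5 * (y - t)^2.
double loss(const TinyNet& net, double x, double t) {
  std::vector<double> a;
  const double d = forward(net, x, a) - t;
  return 0.5 * d * d;
}

// Backward propagation: apply the chain rule from the output back to each parameter.
Grad backward(const TinyNet& net, double x, double t) {
  std::vector<double> a;
  const double y = forward(net, x, a);
  const double dL_dy = y - t;  // derivative of the loss w.r.t. the output
  const std::size_t H = net.w1.size();
  Grad g{};
  g.w1.assign(H, 0.0);
  g.b1.assign(H, 0.0);
  g.w2.assign(H, 0.0);
  g.b2 = dL_dy;  // dy/db2 = 1
  for (std::size_t j = 0; j < H; ++j) {
    g.w2[j] = dL_dy * a[j];
    // Chain rule through tanh: d tanh(z)/dz = 1 - tanh(z)^2.
    const double dL_dz = dL_dy * net.w2[j] * (1.0 - a[j] * a[j]);
    g.b1[j] = dL_dz;
    g.w1[j] = dL_dz * x;
  }
  return g;
}

// One gradient-descent update with learning rate `lr`.
void gradient_step(TinyNet& net, const Grad& g, double lr) {
  for (std::size_t j = 0; j < net.w1.size(); ++j) {
    net.w1[j] -= lr * g.w1[j];
    net.b1[j] -= lr * g.b1[j];
    net.w2[j] -= lr * g.w2[j];
  }
  net.b2 -= lr * g.b2;
}
```

A typical use would be to call backward on a data point, pass the result to gradient_step with a small learning rate, and repeat; for a sufficiently small step the value returned by loss should decrease. A real minimiser would normally accumulate the gradient over many data points and choose the step size more carefully.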

In NNAD we present a C++ implementation of the analytic derivative of a feed-forward neural network of arbitrary architecture with respect to its free parameters, a procedure known as back-propagation. We dubbed this code NNAD (Neural Network Analytic Derivatives) and interfaced it with the widely used ceres-solver minimiser to fit neural networks to pseudodata in two different least-squares problems (see NNAD-Interface). The first is a direct fit of Legendre polynomials. The second is a somewhat more involved minimisation problem in which the function to be fitted appears inside an integral. Finally, using a consistent framework, we assess the efficiency of our analytic derivative formula compared with the numerical and automatic differentiation provided by ceres-solver. We thus demonstrate the advantage of using NNAD in problems involving both deep and shallow neural networks.
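To give a feel for the comparison between analytic and numerical derivatives, the following standalone C++ sketch computes the gradient of a least-squares objective in both ways. It uses neither NNAD nor ceres-solver, and all names in it (legendre_p2, network, chi2, chi2_gradient_analytic, chi2_gradient_numerical) as well as the flat parameter layout are hypothetical choices made for illustration. The pseudodata are assumed to be noiseless values of the Legendre polynomial P_2, and the network is assumed to have a single hidden layer of tanh units.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Legendre polynomial P_2(x) = (3x^2 - 1)/2, used to generate noiseless pseudodata.
double legendre_p2(double x) { return 0.5 * (3.0 * x * x - 1.0); }

// Hypothetical flat parameter layout for a 1 -> H -> 1 network with tanh hidden units:
// p = [w1_0..w1_{H-1}, b1_0..b1_{H-1}, w2_0..w2_{H-1}, b2].
double network(const std::vector<double>& p, double x) {
  assert(p.size() >= 4 && (p.size() - 1) % 3 == 0);
  const std::size_t H = (p.size() - 1) / 3;
  double y = p[3 * H];
  for (std::size_t j = 0; j < H; ++j)
    y += p[2 * H + j] * std::tanh(p[j] * x + p[H + j]);
  return y;
}

// Least-squares objective: chi2 = sum_i (N(x_i) - P_2(x_i))^2.
double chi2(const std::vector<double>& p, const std::vector<double>& xs) {
  double s = 0.0;
  for (double x : xs) {
    const double r = network(p, x) - legendre_p2(x);
    s += r * r;
  }
  return s;
}

// Analytic gradient of chi2, obtained by applying the chain rule by hand.
std::vector<double> chi2_gradient_analytic(const std::vector<double>& p,
                                           const std::vector<double>& xs) {
  const std::size_t H = (p.size() - 1) / 3;
  std::vector<double> g(p.size(), 0.0);
  for (double x : xs) {
    const double r2 = 2.0 * (network(p, x) - legendre_p2(x));  // d(r^2)/dN
    g[3 * H] += r2;                                            // dN/db2 = 1
    for (std::size_t j = 0; j < H; ++j) {
      const double a = std::tanh(p[j] * x + p[H + j]);
      const double dz = r2 * p[2 * H + j] * (1.0 - a * a);
      g[j] += dz * x;          // w.r.t. hidden weight
      g[H + j] += dz;          // w.r.t. hidden bias
      g[2 * H + j] += r2 * a;  // w.r.t. output weight
    }
  }
  return g;
}

// Numerical gradient by central finite differences with step h.
std::vector<double> chi2_gradient_numerical(const std::vector<double>& p,
                                            const std::vector<double>& xs,
                                            double h) {
  std::vector<double> g(p.size());
  std::vector<double> q = p;
  for (std::size_t i = 0; i < p.size(); ++i) {
    q[i] = p[i] + h;
    const double up = chi2(q, xs);
    q[i] = p[i] - h;
    const double down = chi2(q, xs);
    q[i] = p[i];  // restore before moving to the next parameter
    g[i] = (up - down) / (2.0 * h);
  }
  return g;
}
```

In this sketch the numerical gradient needs two evaluations of chi2 per parameter, so its cost grows with the number of parameters, and its accuracy depends on the choice of h. The analytic gradient is obtained in a single pass over the data and is exact up to rounding, which is the kind of advantage the comparison above is meant to quantify.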