Neural Networks Demystified?

Two years ago, I built a feedforward backpropagation network, based on a one-page flowchart and coded in C. It had one hidden layer. It used 34 years of CYHZ pressure data and nothing else: one third to train, one third to detect overtraining and stop, one third to test. The objective was to make 12-hour pressure predictions. I compared its predictions to persistence and extrapolation, and measured error as the standard deviation of the prediction errors.
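
For reference, the two baselines are trivial: persistence carries the current pressure forward unchanged, and extrapolation extends the latest trend for 12 hours. A minimal sketch, assuming 3-hourly observations (the spacing is my assumption here):

    #include <stdio.h>

    /* Persistence: pressure 12 hours from now equals pressure now.
     * Extrapolation: continue the latest trend for 12 more hours.
     * dt_hours is the spacing between the two observations used. */
    double persistence(double p_now)
    {
        return p_now;
    }

    double extrapolation(double p_now, double p_prev, double dt_hours)
    {
        double slope = (p_now - p_prev) / dt_hours;   /* mb per hour */
        return p_now + 12.0 * slope;
    }

    int main(void)
    {
        double p_prev = 1012.4, p_now = 1010.1;       /* made-up values, mb */
        printf("persistence:   %.1f mb\n", persistence(p_now));
        printf("extrapolation: %.1f mb\n", extrapolation(p_now, p_prev, 3.0));
        return 0;
    }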

I gave the system 12 predictors: 10 dp/dt values over distinct periods preceding t0, plus time of day and time of year. I did not try to narrow down the predictor set. (I suppose that might have led to better results.)
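
Concretely, the input vector was assembled something like the sketch below; the ten period lengths shown are illustrative guesses, not the actual ones:

    #define N_INPUTS 12

    /* Build the 12 predictors: 10 pressure tendencies (dp/dt) over
     * periods of different lengths ending at t0, plus time of day and
     * time of year.  Assumes hourly pressure values in p[]; the period
     * lengths in back[] are illustrative only. */
    void build_inputs(const double *p, int t0, int hour, int day_of_year,
                      double x[N_INPUTS])
    {
        static const int back[10] = {3, 6, 9, 12, 18, 24, 36, 48, 72, 96};
        for (int i = 0; i < 10; i++)
            x[i] = (p[t0] - p[t0 - back[i]]) / back[i];  /* mb per hour */
        x[10] = hour / 24.0;           /* time of day, scaled to [0,1) */
        x[11] = day_of_year / 365.0;   /* time of year, scaled to [0,1) */
    }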

I tested an increasing number of hidden nodes and found that accuracy leveled off at 8, so the architecture was 12-8-1. This is interesting, because there are conflicting rules of thumb about how many hidden nodes to use.
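
For concreteness, the forward pass of a 12-8-1 network is only a few lines. This sketch assumes sigmoid hidden units and a linear output unit, a common setup for a regression target:

    #include <math.h>

    #define NI 12  /* inputs */
    #define NH 8   /* hidden nodes */

    static double sigmoid(double z) { return 1.0 / (1.0 + exp(-z)); }

    /* Forward pass of a 12-8-1 feedforward net.
     * w1[j][i]: input-to-hidden weights (index NI holds the bias),
     * w2[j]:    hidden-to-output weights (index NH holds the bias). */
    double forward(const double x[NI],
                   const double w1[NH][NI + 1],
                   const double w2[NH + 1])
    {
        double h[NH];
        for (int j = 0; j < NH; j++) {
            double z = w1[j][NI];                /* bias */
            for (int i = 0; i < NI; i++)
                z += w1[j][i] * x[i];
            h[j] = sigmoid(z);
        }
        double y = w2[NH];                       /* bias; linear output */
        for (int j = 0; j < NH; j++)
            y += w2[j] * h[j];
        return y;                                /* predicted pressure */
    }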

I tested decreasing values of the learning coefficient. The trick is to make it small enough, but not too small: small enough that the network doesn't zero in on the wrong values too soon, but not so small that the iterative improvements take forever to level off.

The "momentum method" greatly accelerated learning.

Results:

     Method           s.d. error (mb)
     ------           ---------------
     extrapolation    7.6
     persistence      6.2
     neural network   4.6

It impressed me that a neural network, put together in such an ad hoc manner and using only the pressure data itself, could do so well.

However, the graphed n.n. predictions looked unrealistic: jagged. Someone issuing predictions like that would be labeled unstable.

A colleague suggested ARMA would be better than persistence. All I know about ARMA is that it stands for Autoregressive Moving Average.
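
For what it's worth, the textbook definition: an ARMA(p,q) model predicts the next value from a weighted sum of p past values (the autoregressive part) and q past forecast errors (the moving-average part), roughly:

     x(t) = c + phi_1*x(t-1) + ... + phi_p*x(t-p)
              + e(t) + theta_1*e(t-1) + ... + theta_q*e(t-q)

Persistence is the degenerate case x(t) = x(t-1), so a fitted ARMA model should, in principle, do at least as well.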

I suppose I could have made the n.n. better by using better predictors, but I guessed that the 12-hour s.d. error of NWP is around 2 mb (?), so why bother.

It was difficult for me to convince myself that I had built "the optimal neural network" for the available data, and there was even less chance I'd convince anyone else. How can you be sure you haven't fallen into a local minimum?

And how can you even envision a local minimum? I've seen elegant 3-D charts of error surfaces, but a weather network's error surface has far more than three dimensions, and is hard to see.

So I abandoned neural networks, hoping that eventually objective means of building optimal networks would be developed.

Besides which, the AI focus in my thesis had to narrow, and neural networks are barely AI.