Deep neural systems based on Transformer Architecture (TA, also called multi-head attention models) have revolutionized natural language processing (NLP). TA systems were originally designed for sequence-to-sequence problems, such as translating English text to German. TA systems can also handle sequence-to-value problems, such as sentiment analysis.

I came across an interesting example in the Keras library documentation that used Transformer Architecture to perform time series classification. This is a sequence-to-value problem where the sequence data is numeric rather than word-tokens in a sentence.

Specifically, the example program created a binary classifier for the Ford time series data. The Ford A dataset has 3601 training items and 1320 test items. Each data item has 500 time series values between about -5.0 and +5.0 that represent a measurement of engine noise. Each of the 500 measurement values was captured at an evenly spaced interval (perhaps every 10 milliseconds). Each time series item is classified as -1 (no engine symptom) or +1 (engine symptom).
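For reference, the Keras example reads the Ford A data as tab-separated text, where the first column of each row is the class label and the remaining 500 columns are the measurements. A minimal loading sketch along those lines is below; the mirror URL is the one I recall the Keras example using, so verify it before relying on it.

import numpy as np

def readucr(filename):
    # each row: class label, then 500 tab-separated time series values
    data = np.loadtxt(filename, delimiter="\t")
    y = data[:, 0]
    x = data[:, 1:]
    return x, y.astype(int)

# URL is from my recollection of the Keras example -- verify before running
root_url = "https://raw.githubusercontent.com/hfawaz/cd-diagram/master/FordA/"
x_train, y_train = readucr(root_url + "FordA_TRAIN.tsv")
x_test, y_test = readucr(root_url + "FordA_TEST.tsv")

print(x_train.shape)  # expect (3601, 500)
print(x_test.shape)   # expect (1320, 500)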

Note: I tracked down the source research paper for the Ford time series data but I don't remember the details. The important idea is that there is numeric time series data and each series has a class label to predict. This is not at all the same as a time series regression problem where each time series is unlabeled and the goal is to predict the next numeric value in the series.



The first class -1 item and the first class +1 item from the time series data. The demo program converts the -1 labels to 0 labels.
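The label fix-up and the input reshaping take only a few lines. A sketch, assuming the x_train, y_train, x_test, y_test arrays from the loading code above:

# map the -1 class to 0 so labels are in {0, 1}
y_train[y_train == -1] = 0
y_test[y_test == -1] = 0

# add a trailing features-per-time-step dimension: (n, 500) -> (n, 500, 1)
x_train = x_train.reshape((x_train.shape[0], x_train.shape[1], 1))
x_test = x_test.reshape((x_test.shape[0], x_test.shape[1], 1))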


Whenever I find an interesting code example that I want to explore, my first step is to refactor the example. This forces me to examine every line of code. So that's what I did.



I ran the demo for only 10 epochs, which took about 6,000 seconds, roughly 100 minutes, or about an hour and 40 minutes. Running longer would improve the accuracy on the test data to about 95%.
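The training call itself is ordinary Keras. Roughly what I ran looks like the sketch below, assuming a model object like the one sketched later in this post; the learning rate and batch size are illustrative values, not necessarily the exact ones in the demo.

from tensorflow import keras

model.compile(
    loss="sparse_categorical_crossentropy",
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),
    metrics=["sparse_categorical_accuracy"],
)

model.fit(
    x_train, y_train,
    validation_split=0.2,
    epochs=10,       # I stopped at 10; the Keras example trains much longer
    batch_size=64,
)

model.evaluate(x_test, y_test, verbose=1)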



The model summary. TA systems are not simple.


As expected, the demo code is extremely complicated. Relative to the number of lines of code, Transformer Architecture systems are by far the most complex software systems I work with.
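To give a sense of where the complexity lives, the heart of the Keras example is a stack of encoder blocks, each combining multi-head self-attention, layer normalization, residual connections, and a small position-wise feed-forward network, followed by pooling and a dense classification head. The sketch below is my simplified recollection of that structure, not the exact demo code; the layer sizes are illustrative.

from tensorflow import keras
from tensorflow.keras import layers

def transformer_encoder(inputs, head_size, num_heads, ff_dim, dropout=0.0):
    # multi-head self-attention sub-layer with a residual connection
    x = layers.MultiHeadAttention(
        key_dim=head_size, num_heads=num_heads, dropout=dropout)(inputs, inputs)
    x = layers.Dropout(dropout)(x)
    x = layers.LayerNormalization(epsilon=1e-6)(x)
    res = x + inputs

    # position-wise feed-forward sub-layer, also with a residual connection
    x = layers.Conv1D(filters=ff_dim, kernel_size=1, activation="relu")(res)
    x = layers.Dropout(dropout)(x)
    x = layers.Conv1D(filters=inputs.shape[-1], kernel_size=1)(x)
    x = layers.LayerNormalization(epsilon=1e-6)(x)
    return x + res

def build_model(input_shape, num_blocks=4, head_size=256, num_heads=4, ff_dim=4):
    inputs = keras.Input(shape=input_shape)   # (500, 1) for the Ford A data
    x = inputs
    for _ in range(num_blocks):
        x = transformer_encoder(x, head_size, num_heads, ff_dim, dropout=0.25)
    # treat the 500 time steps as channels and average over the single feature
    # axis, producing one 500-value vector per item
    x = layers.GlobalAveragePooling1D(data_format="channels_first")(x)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(0.4)(x)
    outputs = layers.Dense(2, activation="softmax")(x)  # two classes: 0 and 1
    return keras.Model(inputs, outputs)

model = build_model(input_shape=(500, 1))
model.summary()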

I fiddled with the demo TA example for several hours. How, or even "if", this exploration will eventually pay off is not clear. But that's what doing research is all about. If I get some time, my next step will be to refactor the Keras code to PyTorch.



Three covers for "A Princess of Mars", by Edgar Rice Burroughs. The book is one of the most influential in the history of science fiction. The story first appeared in serialized form in "All-Story Magazine" in 1912, and was compiled to book form in 1917. Over time, different artists produced similar but clearly different styles. Left: By artist Frank Schoonover (1917). Center: By Robert Abbett (1963). Right: By Gino D'Achille (1973).


Code below. Long.