I wrote an article titled "Differential Evolution Optimization" in the September 2021 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2021/09/07/differential-evolution-optimization.aspx.

The most common type of optimization for neural network training is some form of stochastic gradient descent (SGD). SGD has many variations such as Adam (adaptive moment estimation) and Adagrad (adaptive gradient). All SGD-based optimization algorithms use the Calculus derivative (gradient) of an error function. But there are alternative optimization techniques that don't use gradients. Examples include bio-inspired techniques such as genetic algorithms and particle swarm optimization, and geometry-inspired techniques such as Nelder-Mead and spiral dynamics.

My article explains how to implement a bio-inspired optimization technique called differential evolution optimization (DEO).

An evolutionary algorithm is any algorithm that loosely mimics biological evolutionary mechanisms such as mating, chromosome crossover, mutation and natural selection. Standard evolutionary algorithms can be implemented using dozens of specific techniques. Differential evolution is a special type of evolutionary algorithm that has a relatively well-defined structure:

  create a population of possible solutions
  loop
    for-each possible solution
      pick three other random solutions
      combine the three to create a mutation
      combine curr solution with mutation = candidate
      if candidate is better than curr solution then
        replace curr solution with candidate
      end-if
    end-for
  end-loop
  return best solution found
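
To make the pseudo-code concrete, here's a minimal Python sketch of differential evolution applied to minimizing the sphere function f(x) = x0^2 + x1^2 + . . (minimum value 0.0 at the origin). This is not the code from the article -- the diff_evo() function name, the population size of 10, the mutation factor F = 0.5, the crossover probability CR = 0.7, and the [-5, +5] initialization range are all illustrative choices.

  import numpy as np

  def sphere(x):
    # error function to minimize: sum of the squared components
    return np.sum(x * x)

  def diff_evo(err_func, dim, pop_size=10, F=0.5, CR=0.7,
               max_gen=100, seed=1):
    rng = np.random.default_rng(seed)
    # create a population of possible solutions
    pop = rng.uniform(-5.0, 5.0, size=(pop_size, dim))
    errs = np.array([err_func(p) for p in pop])
    for gen in range(max_gen):
      for i in range(pop_size):
        # pick three other random solutions a, b, c
        others = [j for j in range(pop_size) if j != i]
        a, b, c = pop[rng.choice(others, size=3, replace=False)]
        # combine the three to create a mutation
        mutation = a + F * (b - c)
        # combine curr solution with mutation = candidate (crossover)
        cross = rng.random(dim) < CR
        cross[rng.integers(dim)] = True  # keep at least one mutated component
        candidate = np.where(cross, mutation, pop[i])
        # if candidate is better than curr solution, replace curr solution
        cand_err = err_func(candidate)
        if cand_err < errs[i]:
          pop[i] = candidate
          errs[i] = cand_err
    best = np.argmin(errs)  # return best solution found
    return pop[best], errs[best]

  best_x, best_err = diff_evo(sphere, dim=4)
  print("best solution = ", best_x, " error = ", best_err)

The greedy selection step -- a candidate replaces the current solution only if it has lower error -- means the best error in the population never gets worse from one generation to the next.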

The "differential" term in "differential evolution" is somewhat misleading. Differential evolution does not use Calculus derivatives. The "differential" refers to a specific part of the algorithm where three possible solutions are combined to create a mutation, based on the difference between two of the possible solutions.

Differential evolution optimization was originally designed for use on electrical engineering problems. But DEO has received increasing interest as a possible technique for training deep neural networks. The biggest disadvantage of DEO is performance: DEO typically takes much longer to train a deep neural network than standard SGD optimization techniques. However, DEO is not subject to the SGD vanishing gradient problem. At some point in the future, it's quite possible that advances in computing power (perhaps through quantum computing) will make differential evolution optimization a viable alternative to SGD training techniques.

There are quite a few interesting science fiction movies that involve alien DNA altering evolution. Here are three, all from 1995. Left: In "Species", scientists use instructions sent by aliens to splice alien DNA with human DNA. The result is not so good. Center: In "Mosquito", an alien spacecraft crashes in a forest. A regular mosquito ingests some alien DNA and . . . the result is not so good for campers in the area. Right: In "Village of the Damned", 10 women are mysteriously impregnated by alien DNA. The resulting 10 children don't turn out to be very friendly.