The Wasserstein distance has many different variations. In its simplest form, the Wasserstein distance function measures the distance between two discrete probability distributions. For example, if:
double[] P = new double[] { 0.6, 0.1, 0.1, 0.1, 0.1 };
double[] Q1 = new double[] { 0.1, 0.1, 0.6, 0.1, 0.1 };
double[] Q2 = new double[] { 0.1, 0.1, 0.1, 0.1, 0.6 };

then:

Wasserstein(P, Q1) = 1.00
Wasserstein(P, Q2) = 2.00
Conceptually, if P is considered to be piles of dirt and Q is considered to be holes, then Wasserstein(P, Q) is the minimum amount of work (amount of dirt times distance moved) needed to transfer all dirt to the holes. Or you can think of Wasserstein as the effort required to transform P into Q.
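To make the dirt-and-holes picture concrete, here is a minimal sketch that scores one hand-written transport plan. The TransportPlanDemo class and its PlanWork method are hypothetical names of mine, not part of the demo program, and the sketch assumes neighboring cells are one unit apart, so the distance of a move is just the difference between the two indices.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the demo program.
static class TransportPlanDemo
{
  // Move k carries amount[k] of dirt from cell from[k] to cell to[k].
  // Work for the whole plan = sum of amount * |from - to|,
  // assuming adjacent cells are one unit apart.
  public static double PlanWork(int[] from, int[] to, double[] amount)
  {
    if (from.Length != to.Length || from.Length != amount.Length)
      throw new ArgumentException("plan arrays must have equal length");

    double total = 0.0;
    for (int k = 0; k < from.Length; ++k)
      total += amount[k] * Math.Abs(from[k] - to[k]);
    return total;
  }
}
```

For P and Q1, cell 0 has 0.5 too much dirt and cell 2 has a 0.5 deficit. Moving that surplus directly, PlanWork(new int[] { 0 }, new int[] { 2 }, new double[] { 0.5 }), costs 0.5 * 2 = 1.0. A wasteful plan that first hauls the same dirt to cell 4 and then back to cell 2 costs 0.5 * 4 + 0.5 * 2 = 3.0. The Wasserstein distance is the cost of the cheapest valid plan.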
I use the Python language for most of my machine learning projects, but sometimes I use the C# language. I coded up a quick demo of a highly simplified Wasserstein distance function:
using System;

namespace Wasserstein
{
  class Program
  {
    static void Main(string[] args)
    {
      Console.WriteLine("\nBegin demo \n");

      double[] P = new double[] { 0.6, 0.1, 0.1, 0.1, 0.1 };
      double[] Q1 = new double[] { 0.1, 0.1, 0.6, 0.1, 0.1 };
      double[] Q2 = new double[] { 0.1, 0.1, 0.1, 0.1, 0.6 };

      double wass_p_q1 = MyWasserstein(P, Q1);
      double wass_p_q2 = MyWasserstein(P, Q2);

      Console.WriteLine("Wasserstein(P, Q1) = " + wass_p_q1.ToString("F4"));
      Console.WriteLine("Wasserstein(P, Q2) = " + wass_p_q2.ToString("F4"));

      Console.WriteLine("\nEnd demo ");
      Console.ReadLine();
    } // Main

    // index of the first cell that still holds a positive amount, or -1
    static int FirstNonZero(double[] vec)
    {
      int dim = vec.Length;
      for (int i = 0; i < dim; ++i)
        if (vec[i] > 0.0)
          return i;
      return -1;
    }

    // move as much dirt as possible from dirt[di] into holes[hi];
    // returns the work done (amount moved times distance)
    static double MoveDirt(double[] dirt, int di, double[] holes, int hi)
    {
      double flow = 0.0;
      int dist = 0;
      if (dirt[di] <= holes[hi])
      {
        // all the dirt in this pile fits into the hole
        flow = dirt[di];
        dirt[di] = 0.0;
        holes[hi] -= flow;
      }
      else if (dirt[di] > holes[hi])
      {
        // the hole fills up; some dirt stays in the pile
        flow = holes[hi];
        dirt[di] -= flow;
        holes[hi] = 0.0;
      }
      dist = Math.Abs(di - hi);
      return flow * dist;
    }

    static double MyWasserstein(double[] p, double[] q)
    {
      // work on copies so the caller's arrays are not modified
      double[] dirt = (double[])p.Clone();
      double[] holes = (double[])q.Clone();
      double totalWork = 0.0;
      while (true)
      {
        int fromIdx = FirstNonZero(dirt);
        int toIdx = FirstNonZero(holes);
        if (fromIdx == -1 || toIdx == -1)
          break;
        double work = MoveDirt(dirt, fromIdx, holes, toIdx);
        totalWork += work;
      }
      return totalWork;
    }
  } // Program
} // ns
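As a sanity check on the greedy loop above, the one-dimensional case with unit spacing also has a well-known closed form: the distance equals the sum of the absolute differences between the two running (cumulative) sums. The sketch below is my own cross-check, not part of the original demo. CdfWasserstein is a hypothetical name, and the code assumes both arrays have the same length and the same total.

```csharp
using System;

// Hypothetical cross-check; not part of the demo program.
static class CdfWasserstein
{
  // Assumes p and q have the same length and the same total (e.g. both sum to 1),
  // and that neighboring cells are one unit apart.
  public static double Distance(double[] p, double[] q)
  {
    if (p.Length != q.Length)
      throw new ArgumentException("p and q must have the same length");

    double cumP = 0.0;   // running sum of p[0..i]
    double cumQ = 0.0;   // running sum of q[0..i]
    double total = 0.0;

    // |cumP - cumQ| is the amount of dirt that has to cross the
    // boundary between cell i and cell i+1, so the last cell is skipped.
    for (int i = 0; i < p.Length - 1; ++i)
    {
      cumP += p[i];
      cumQ += q[i];
      total += Math.Abs(cumP - cumQ);
    }
    return total;
  }
}
```

Calling CdfWasserstein.Distance(P, Q1) and CdfWasserstein.Distance(P, Q2) with the demo arrays should agree with the greedy results of 1.0 and 2.0, up to floating-point rounding.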
There are many more complex variations of Wasserstein. My C# Wasserstein demo works only with discrete probability distributions where each data item is a single-valued probability, and where the distance between two cells is simply the difference between their indices.
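Because the demo quietly assumes its inputs are valid discrete distributions, a guard along the following lines could be run before calling MyWasserstein. This is only a sketch: DistributionCheck, IsDistribution and the default tolerance of 1.0e-8 are my own assumptions, not part of the demo.

```csharp
using System;

// Hypothetical input check; not part of the demo program.
static class DistributionCheck
{
  // Returns true if every value is a non-negative number and the
  // values sum to 1 within the given tolerance (an assumed default).
  public static bool IsDistribution(double[] v, double tol = 1.0e-8)
  {
    if (v == null || v.Length == 0)
      return false;

    double sum = 0.0;
    foreach (double x in v)
    {
      if (double.IsNaN(x) || x < 0.0)
        return false;
      sum += x;
    }
    return Math.Abs(sum - 1.0) <= tol;
  }
}
```

The tolerance is there because values like 0.1 are not exactly representable as doubles, so a sum that should be 1 can be off by a tiny amount.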
Python is the programming language of choice for most machine learning systems. But C# is often the language of choice for business-related systems, so it's nice to be able to implement ML functions in C# when needed rather than trying to glue a Python ML system to a C# business system.
Ray Cummings (1887-1957) was one of the early pioneers of science fiction. Decades ago, the conceptual distance between science fiction and reality was much greater than today. For example, in the early 1900s when the first airplanes could barely fly, stories about space travel must have seemed impossible. Today, sending a probe to Mars is almost commonplace.