Modeling Simple Genetic Algorithms

Michael D. Vose, The University of Tennessee, 1996

"Evolutionary Computation", Volume 3, Number 4

Summary by Chaim Linhart (chaim@math.tau.ac.il) 8/8/98

 

Abstract

The infinite- and finite-population models of the simple genetic algorithm are extended and unified. The result incorporates both transient and asymptotic GA behavior. This leads to an interpretation of genetic search that partially explains population trajectories. In particular, the asymptotic behavior of the large-population simple genetic algorithm is analyzed.


Introduction

In earlier work, Vose described the evolutionary path taken by a genetic algorithm (GA) with infinite population using deterministic equations. However, real GAs are based on finite populations and are stochastic. This paper gives an exact Markov-chain model for finite-population GAs, making the interdependency between the finite- and infinite-population models explicit.

Populations of a GA correspond to points on a surface, which is described by some fitness function. As time passes and new generations are formed, the populations move, hopefully converging to the global optimum of the fitness function.

Vose proved (1991) that for large populations the evolutionary path of a finite-population GA follows very closely, with large probability, and for a long period of time that path predicted by the infinite population model. Thus, transient behavior of a finite-population GA is determined by the fixed points of the infinite model and their basins of attraction. When the mutation rate is greater than zero, a finite-population GA typically forms an ergodic Markov chain, visiting every state infinitely often. The interesting asymptotic question is: Where is it most likely to be?


Random Heuristic Search

An instance of random heuristic search (RHS) can be thought of as an initial collection of elements P0 chosen from some search space W , together with a stochastic transition rule t , which from Pi will produce another collection Pi+1. In other words, t will be iterated to produce a sequence of generations.

Let n be the cardinality of W and identify the integer i with the ith element of W . Define the simplex to be the set of population descriptors:

L = { p=<p0,...,pn-1> : 1tp=1, " j pj³ 0 }.

An element p of L corresponds to a population according to the rule:

pi = the proportion of i contained in the population.

The cardinality of each population is a constant r, called the population size. Given r, a population descriptor p unambiguously determines a population.

Given the current population P, the next population Q=t (P) results from r independent, identically distributed random choices. Let G:L ® L be a function that given the current population vector p produces a new vector q whose ith component is the probability that i is the result of a random choice. Thus, G(p) is the probability vector that specifies the distribution from which the aggregate of r choices forms the subsequent generation Q.

An instance of RHS is focused if G is continuously differentiable and for every pÎ L the sequence p, G(p), G2(p), ... converges to some fixed point x. Clearly, G(x)=x.

Most simple GAs fit into the framework of RHS. The genetic operators, together with the method of selecting the elements of the current generation to breed, determine the heuristic G. Most GAs are conjectured (if not proven) to be focused and possess the rest of the characteristics assumed in this paper.


The Finite Population Model

According to the law of large numbers, when the population size is infinite, the next generation matches the expectation (with probability 1), i.e. q=G(p). Finite models are more complex.

Identifying a population P with the descriptor p that represents it, the possible populations of size r correspond to a set Sr of points in the simplex L (Sr becomes dense in L as r® ¥ ). An exact model of RHS is provided by the following Markov chain. The states are the elements of Sr, and the transition probabilities are given by the matrix Q, where:

(this formula is a trivial interpretation of G)

Using Stirling’s theorem yields:

 

.

Lemma 1: For any UÌ L :

Lemma 2: If G maps L into its interior, there exist positive constants b and d (depending on G) such that:

Intuitively, a u,v is the distance between v (the state we reached) and G(u) (the expected state, which is the state we would have reached with an infinite population). In other words, this is the "price" we pay for moving from u to v.


The Fixed Point Graph

We shall now introduce a Markov chain Cr, which approximately models the behavior of RHS. Let r = x0,...,xk be a path of length k in L . Define its cost:

Let the stable fixed points of G in L be F = {w 0, ..., w w} (the paper also defines unstable fixed points, but we will not discuss them here, since they do not affect the main result). The fixed point graph is the complete directed graph Á on vertices F with edge i® j (for i¹ j) having weight:

This weight is the minimal "price" we have to pay in order to move from w i to a state in w j's basin of attraction. In other words, this is the amount of "fuel" we need in order to escape w i and reach the "gravitational field" of w j (from where we will be pulled towards w j). This is an approximation of the minimal cost of a path from w i to w j.

A tributary is a tree containing every vertex and such that all edges point toward the root. Figure 1 shows a sample tributary on F. Let Tk be the set of tributaries rooted at k, and for tÎ Tk let its cost |t| be the sum of its edge weights.

Figure 1: An example of a tributary rooted at w 5 (in blue); the fixed points are shown in red, and their basins of attraction are sketched in black.

Theorem: Let A be a transition matrix for a Markov chain Cr with states F, and define its corresponding graph to have edge i® j weighted by -ln Ai,j . A steady-state solution x, i.e. one for which: xTA = xT, has components:

.

 

Let:

.

If Á has a unique minimum-cost tributary, rooted at k’, then it follows from the theorem that the steady-state distribution of Cr converges as r® ¥ to point mass at w k’.


Asymptotic Approximation (main result)

Further analysis shows that Cr captures the asymptotic behavior of RHS as the population size r® ¥ , in the sense that only fixed points of G can have non-vanishing probability. Therefore, for large populations, RHS will with large probability be asymptotically near the fixed point w k’ of G that corresponds to the minimum-cost tributary of Á .


Conclusions

Under some assumptions (some of which have been proved for GA’s with proportional selection, crossover, and positive mutation), it is shown that:

Short-term behavior of G is determined by the local basin in which the initial population finds itself.

Long-term behavior is determined by that local fixed point having the largest basin (i.e., minimum spanning intree) - with probability close to 1, G will be asymptotically near that fixed point.


My Criticism

 

 

 

 

Chaim Linhart

chaim@math.tau.ac.il
August 1998.