In this paper we propose 10,000,000 algorithms: the (semi-)automatic design of (heuristic) algorithms
John Woodward, University of Stirling
With: Edmund Burke, Gabriela Ochoa, Jerry Swan, Matt Hyde, Graham Kendall, Ender Ozcan, Jingpeng Li, Libin Hong
[email protected]
02/28/2020

Conceptual Overview
A combinatorial problem (e.g. the Travelling Salesman Problem) could in principle be solved by exhaustive search. A Genetic Algorithm is a heuristic that searches permutations (tours) directly: its output is a single Travelling Salesman tour for a single instance, and is NOT EXECUTABLE. Genetic Programming instead searches code fragments (e.g. the bodies of for-loops): its output is a TSP algorithm, EXECUTABLE on MANY INSTANCES. Give a man a fish and he will eat for a day; teach a man to fish and he will eat for a lifetime.

Plan: From Evolution to Automatic Design
1. The generate-and-test method
2. Natural (and artificial) evolution
3. Genetic Algorithms and Genetic Programming
4. Motivations (conceptual and theoretical)
5. One (or two) examples of automatic generation
6. Genetic Algorithms (meta-learning, mutation operators, representation, problem classes, experiments)
7. Bin packing
8. Wrap-up: closing comments to the jury
9. Questions (during AND after), please :)
(Now is a good time to say if you are in the wrong room.)

Generate and Test (Search)
A feedback loop between generating and testing, performed by humans and by computers. Examples: manufacturing (e.g. cars); evolution (survival of the fittest). "The best way to have a good idea is to have lots of ideas" (Pauling).

Further examples: computer code (bugs + specifications), medicines, scientific hypotheses, proofs.

Fit For Purpose Adaptation?
1. Evolution generates organisms adapted to a particular environment (desert/arctic).
2. Ferrari vs. tractor (race track or farm).
3. Similarly, we should design metaheuristics for a particular problem class.
4. "In this paper we propose a new mutation operator": but what is it for?

(Natural / Artificial) Evolutionary Process
1. Variation (difference)
2. Selection (evaluation)
3. Inheritance ((a)sexual reproduction)
Mutation/variation: new genes are introduced into the gene pool. Inheritance: offspring have a similar genotype

(phenotype): near-PERFECT CODE copying. Selection: survival of the fittest (e.g. peppered moth evolution).

(Un)Successful Evolutionary Stories
Successful:
1. DNA takes 40 minutes to copy, yet bacteria reproduce every 20 minutes.
2. Ants know the way back to the nest.
3. 3D noise location.
Unsuccessful:
1. Bulldogs.
2. Giant squid brains.
3. Wheels?

Meta-heuristics / Computational Search
Three example search spaces:
- Permutations {123, 132, 213, 231, 312, 321}: e.g. test-case prioritisation (ORDERING).
- Bit strings {00, 01, 10, 11}: e.g. test-case minimisation (SUBSET SELECTION).
- Programs (incR1, decR2, clrR3, ...): e.g. robot control, regression, classification.
Each is a set of candidate solutions. Permutations: O(n!); bit strings: O(2^n); programs: O(?). For small n we can search exhaustively; for large n we must sample with a metaheuristic/search algorithm (combinatorial explosion: a trade-off of sample size vs. time). GA

and GP are biologically motivated meta-heuristics.

Genetic Algorithms (Bit-strings)
Breed a population of BIT STRINGS. Representation/encoding of a combinatorial problem: TRUE/FALSE assignments in Boolean satisfiability (CNF), knapsack (weight/cost), subset-sum.
1. Assign a fitness value to each bit string, e.g. Fit(01010010000) = 99.99. (Population: 01110010110, 01011110110, 01011111110, 01010010000, ...)
2. Select the better/best individuals, with higher probability, from the population to be promoted to the next generation.
3. Mutate each individual, e.g. one-point mutation: 01110010110 -> 01110011110.

(Linear) Genetic Programming (Programs)
Breed a population of PROGRAMS, e.g. programs for classification and regression (inputs such as weight and height; one output). Example individuals: Inc R1, R2=R4*R4; Rpt R1, Clr R4, Cpy R2 R4; R4=Rnd, R5=Rnd, If R1
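As a sketch of the loop just described: a minimal, illustrative GA on bit strings, not the exact implementation used in the experiments later. `onemax` is a stand-in fitness function, and truncation selection is used here purely for brevity.

```python
import random

def one_point_mutation(bits):
    # Flip exactly one randomly chosen bit.
    i = random.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]

def ga(fitness, n_bits, pop_size=20, generations=50, seed=0):
    random.seed(seed)
    pop = [[random.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)        # 1. evaluate
        parents = pop[:pop_size // 2]              # 2. select the better half
        pop = parents + [one_point_mutation(p) for p in parents]  # 3. mutate
    return max(pop, key=fitness)

onemax = sum  # fitness: count of 1-bits
best = ga(onemax, n_bits=16)
```

With truncation selection and elitism the best fitness never decreases; the experiments later instead use fitness-proportional, steady-state selection.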

1. Assign a fitness value to each program, e.g. Fit(Rpt R1, Clr R4) = 99.99.
2. Select the better/best programs, with higher probability, from the population to be promoted to the next generation; discard the worst.
3. Mutate each individual, e.g. one-point mutation: Rpt R1 R2, Clr R4, Cpy R2 R4 -> Rpt R1 R2, Clr R4, Stp.

Theoretical Motivation 1
A search algorithm a samples a search space under an objective function f, giving a performance P(a, f): the algorithm visits x1, x2, x3, ... and observes y1, y2, y3, ...
1. A search space contains the set of all possible solutions.
2. An objective function determines the quality of a solution.
3. A search algorithm determines the sampling order (i.e. it enumerates the space, without replacement); it is (approximately) a permutation.
4. The performance measure P(a, f) depends only on y1, y2, y3, ...
5. Aim: find a solution with a near-optimal objective value using a search algorithm.
ANY QUESTIONS BEFORE THE NEXT SLIDE?

Theoretical Motivation 2
Again we have a search algorithm a, a search space, and

an objective function f, for each of several permuted set-ups.
P(A, F) = P(σ(A), σ⁻¹(F)) and P(a, f) = P(σ(a), σ⁻¹(f)). P is a performance measure (based only on output values); σ and σ⁻¹ are a permutation and its inverse permutation; A and F are probability distributions over algorithms and functions; F is a problem class.
ASSUMPTIONS and IMPLICATIONS: algorithm a applied to function σ⁻¹(f), and algorithm σ(a) applied to function f, behave precisely identically.

One Man, One Algorithm
1. Researchers design heuristics by hand and test them on problem instances or arbitrary benchmarks off the internet.
2. They present results at conferences and publish in journals: "In this talk/paper we propose a new algorithm...".
3. What is the target problem class they have in mind?
Human 1 designs Heuristic Algorithm 1; Human 2 designs Heuristic Algorithm 2; Human 3 designs

Heuristic Algorithm 3.

One Man, Many Algorithms
Within the space of all programs (which also contains bubble sort, MS Word, Facebook, random number generators, the Linux operating system, ...) an algorithmic framework carves out a search space of GA mutations: Mutation 1, Mutation 2, ..., Mutation 10,000.
1. The challenge is defining an

algorithmic framework (set) that includes useful algorithms and excludes others.
2. Let Genetic Programming select the best algorithm for the problem class at hand. Context!!! Let the data speak for itself, without imposing our assumptions.

Meta and Base Learning
1. At the base level we are learning about a specific function.
2. At the meta level we are learning about the problem class: the mutation-operator designer sits at the meta level, and the GA with a function to optimize sits at the base level.
3. We are just doing generate-and-test on top of generate-and-test.
4. What is being passed along each blue arrow (in the diagram)?

5. Training/testing and validation, versus a conventional GA.

Compare Signatures (Input-Output)
Genetic Algorithm: (B^n -> R) -> B^n. The input is an objective function mapping bit-strings of length n to a real value; the output is a (near-optimal) bit-string, i.e. the solution to the problem instance.
Genetic Algorithm designer: [(B^n -> R)] -> ((B^n -> R) -> B^n). The input is a list of objective functions mapping bit-strings of length n to real values (i.e. sample problem instances from the problem class); the output is a (near-optimal) mutation operator for a GA, i.e. the solution method (algorithm) for the problem class.
We are raising the level of generality at which we operate. Give a man a fish and he will eat for a day; teach a man to fish and he will eat for a lifetime.

Two Examples of Mutation Operators
One-point mutation flips ONE single bit in the genome (bit-string). (This generalizes from 1-point to n-point mutation.)

Uniform mutation flips ALL bits, each with a small probability p. No matter how we vary p, it will never be one-point mutation. Let's invent some more!!! NO, let's build a general method instead.
(BEFORE/AFTER bit-string illustrations: one-point mutation changes exactly one position; uniform mutation may change several.)

Off-the-Shelf Genetic Programming to Tailor-Make Genetic Algorithms for a Problem Class
Meta-level: Genetic Programming (iterated hill-climbing) searches the space of mutation operators; each candidate operator receives a fitness value
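These two hand-designed operators are small enough to write down directly (an illustrative sketch, with bit strings as Python lists):

```python
import random

def one_point_mutation(bits, rng=random):
    # Flip exactly ONE randomly chosen bit.
    out = list(bits)
    i = rng.randrange(len(out))
    out[i] = 1 - out[i]
    return out

def uniform_mutation(bits, p=0.05, rng=random):
    # Flip EACH bit independently with a small probability p.
    return [1 - b if rng.random() < p else b for b in bits]
```

No setting of p makes uniform mutation equal to one-point mutation: one-point changes exactly one bit, while uniform changes a Binomial(n, p) number of bits.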

from running the base-level Genetic Algorithm with that operator. The mutation-operator search space contains novel mutation heuristics alongside the commonly used ones (one-point and uniform mutation).

Building a Space of Mutation Operators
An example program: Inc 0; Dec 1; Add 1,2,3; If 4,5,6; Inc -1; Dec -2, executed under a program counter (pc) over WORKING REGISTERS and INPUT-OUTPUT REGISTERS.
A program is a list of instructions and arguments. A register is a set of addressable memory (R0, ..., R4). Negative register addresses mean indirection. A program can only affect the IO registers indirectly.

Bits are represented as +1 (TRUE) and -1 (FALSE): the +/- sign of an output register is the bit value. The bit-string is inserted into the IO registers, and extracted from them afterwards.

Arithmetic Instructions
These instructions perform arithmetic operations on the registers:
Add: Ri <- Rj + Rk
Inc: Ri <- Ri + 1
Dec: Ri <- Ri - 1
Ivt: Ri <- -1 * Ri
Clr: Ri <- 0
Rnd: Ri <- Random([-1, +1]) // mutation rate
Set: Ri <- value
Nop: no operation (identity)

Control-Flow Instructions
These instructions control flow (NOT arithmetic): branching and iterative imperatives. Note that this set is not Turing complete!
If: if (Ri > Rj) then pc <- pc + |Rk| (why the absolute value?)
IfRand: if (Ri < 100 * random[0, +1]) then pc <- pc + Rj // allows us to build mutation probabilities. WHY?
Rpt: repeat the next |Rj| instructions |Ri| times
Stp: terminate
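A minimal interpreter for a register machine of this flavour. The slides do not pin down every semantic detail (indirection, IfRand probabilities, nesting of Rpt), so this is a best-guess sketch in which Rpt runs the following block as a non-nested sub-program and the repeat/length arguments are literals:

```python
def run(program, regs):
    """Execute a list of (opcode, *args) instructions over integer registers.
    Hedged sketch: semantics are inferred from the slides, not the paper."""
    pc = 0
    while pc < len(program):
        op, *a = program[pc]
        if op == "Stp":                 # terminate
            break
        elif op == "Inc":
            regs[a[0]] += 1
        elif op == "Dec":
            regs[a[0]] -= 1
        elif op == "Clr":
            regs[a[0]] = 0
        elif op == "Ivt":
            regs[a[0]] = -regs[a[0]]    # invert sign (+1/-1 bit values)
        elif op == "Add":
            regs[a[0]] = regs[a[1]] + regs[a[2]]
        elif op == "Set":
            regs[a[0]] = a[1]
        elif op == "Rpt":               # repeat the next k instructions n times
            n, k = a
            block = program[pc + 1 : pc + 1 + k]
            for _ in range(n):
                run(block, regs)
            pc += k                     # skip the block we just ran
        # "Nop" (and anything unrecognised) falls through
        pc += 1
    return regs

# Toy program: repeat [Inc R0; R1 <- R1 + R0] three times, then stop.
prog = [("Rpt", 3, 2), ("Inc", 0), ("Add", 1, 1, 0), ("Stp",)]
regs = run(prog, [0, 0, 0, 0])
```

Running the toy program leaves R0 = 3 and R1 = 1 + 2 + 3 = 6.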

Expressing Mutation Operators
Both standard operators can be expressed as register-machine programs, padded out with Nops over the 17-line program (lines 0-16):
UNIFORM mutation (flips all bits with a fixed probability; 4 instructions): Rpt, 33, 18; IfRand, 3, 6; Ivt, 3; Stp.
ONE-POINT mutation (flips a single bit; 6 instructions): as above, with Inc, 3 inserted between the Rpt and the IfRand.
Why insert the Nops? We let GP start from these seed programs and mutate them.

Parameter settings for the (meta-level) Genetic Programming register machine:
Parameter: Value
restart hill-climbing: 100
hill-climbing iterations: 5
mutation rate: 3
program length: 17
input-output register size: 33 or 65
working register size: 5
seeded with: uniform mutation
fitness: best in run, averaged over 20 runs
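The meta-level search is restart hill-climbing with the settings above. A generic sketch follows, maximising a toy numeric objective rather than evolving register-machine programs; the `random_solution` and `neighbour` arguments are placeholders for program generation and program mutation:

```python
import random

def restart_hill_climb(score, random_solution, neighbour,
                       restarts=100, iterations=5, seed=0):
    # Meta-level search: many short hill-climbs from random starts,
    # keeping the best solution seen over all restarts.
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(restarts):
        current = random_solution(rng)
        current_score = score(current)
        for _ in range(iterations):
            cand = neighbour(current, rng)
            cand_score = score(cand)
            if cand_score >= current_score:
                current, current_score = cand, cand_score
        if current_score > best_score:
            best, best_score = current, current_score
    return best, best_score

# Toy stand-ins: maximise -(x - 42)^2 over integers.
best, s = restart_hill_climb(
    score=lambda x: -(x - 42) ** 2,
    random_solution=lambda rng: rng.randint(0, 100),
    neighbour=lambda x, rng: x + rng.choice([-1, 1]),
)
```

In the real system, `score` would run the base-level GA with the candidate mutation operator on sampled training instances and return the (averaged) best fitness.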

Note that these parameters are not optimized.

Parameter settings for the (base-level) Genetic Algorithm:
Parameter: Value
population size: 100
iterations: 1000
bit-string length: 32 or 64
generational model: steady-state
selection method: fitness-proportional
fitness function: see next slide
mutation: register machine (NOT one-point/uniform)
These parameters are not optimized, except for the mutation operator (an algorithmic PARAMETER).

7 Problem Instances
Seven real-valued functions, which we convert into discrete binary optimisation problems for a GA:
1. x
2. sin^2(x/4 - 16)
3. (x - 4) * (x - 12)
4. (x * x - 10 * cos(x))

5. sin(pi * x/64 - 4) * cos(pi * x/64 - 12)
6. sin(pi * cos(pi * x/64 - 12) / 4)
7. 1/(1 + x/64)

Function Optimization Problem Classes
1. To test the method we use binary function classes.
2. We generate a normally distributed value t = 0.7 + 0.5 * N(0, 1), truncated to the range [-1, +1].
3. We linearly interpolate t from [-1, +1] to an integer in the range [0, 2^numbits - 1], and convert this into a target bit-string t.
4. To calculate the fitness of an arbitrary bit-string x, the Hamming distance between x and the target bit-string t is computed (giving a value in the range [0, numbits]). This value is then fed into one of the 7 functions.

Results: 32-bit problems (problem-class means and standard deviations)

problem       Uniform   One-point   RM-mutation
p1 mean       30.82     30.96       31.11
p1 std-dev    0.17      0.14        0.16
p2 mean       951       959.7       984.9
p2 std-dev    9.3       10.7        10.8
p3 mean       506.7     512.2       528.9
p3 std-dev    7.5       6.2         6.4
p4 mean       945.8     954.9       978
p4 std-dev    8.1       8.1         7.2
p5 mean       0.262     0.26        0.298
p5 std-dev    0.009     0.013       0.012
p6 mean       0.432     0.434       0.462
p6 std-dev    0.006     0.006       0.004
p7 mean       0.889     0.89        0.901
p7 std-dev    0.002     0.003       0.002

Results: 64-bit problems (problem-class means and standard deviations)

problem       Uniform   One-point   RM-mutation
p1 mean       55.31     56.08       56.47
p1 std-dev    0.33      0.29        0.33
p2 mean       3064      3141        3168
p2 std-dev    33        35          33
p3 mean       2229      2294        2314
p3 std-dev    31        28          27
p4 mean       3065      3130        3193
p4 std-dev    36        24          28
p5 mean       0.839     0.846       0.861
p5 std-dev    0.012     0.01        0.012
p6 mean       0.643     0.643       0.663
p6 std-dev    0.004     0.004       0.003
p7 mean       0.752     0.7529      0.7684
p7 std-dev    0.0028    0.004       0.0031

p-values (t-test) for the 32- and 64-bit functions on the 7 problem classes

class   32-bit Uniform   32-bit One-point   64-bit Uniform   64-bit One-point
p1      1.98E-08         0.0005683          1.64E-19         1.02E-05
p2      1.21E-18         1.08E-12           1.63E-17         0.00353
p3      1.57E-17         1.65E-14           3.49E-16         0.00722
p4      4.74E-23         1.22E-16           2.35E-21         9.01E-13
p5      9.62E-17         1.67E-15           4.80E-09         4.23E-06
p6      2.54E-27         4.14E-24           3.31E-24         3.64E-28
p7      1.34E-24         3.00E-18           1.45E-28         5.14E-23

Rebuttal to Reviewers' Comments
1. Did we test the new mutation operators against the standard operators (one-point and uniform mutation) on different problem classes? NO: each mutation operator is designed (evolved) specifically for its class of problem. Do you want to try on my tailor-made trousers? (Though I think I should run this now, to demonstrate the point.)
2. Are we taking the training stage into account?

NO, we are just comparing mutation operators in the testing phase. Anyway, how could we meaningfully compare brain power (manual design) against processor power (evolution)? I think.

Impact of Genetic Algorithms
Genetic Algorithms are among the most referenced academic works:
http://citeseerx.ist.psu.edu/stats/authors?all=true : D. Goldberg, 16589
"Genetic Algorithms in Search, Optimization, and Machine Learning", D. Goldberg, Reading, MA, Addison-Wesley, 1989: 47236 citations
http://www2.in.tu-clausthal.de/~dix/may2006_allcited.php : 64. D. Goldberg: 7610.21
http://www.cs.ucla.edu/~palsberg/h-number.html : h-index 84, David E. Goldberg (UIUC)

Additions to Genetic Programming
1. The final program is part human-constrained (the for-loop) and part machine-generated (the body of the for-loop).
2. In GP the initial population is typically created randomly. Here we (can) initialize the population with already-known good solutions (which also confirms that we can express those solutions): improving rather than evolving from scratch, standing on the shoulders of giants. Like genetically modified crops, we start from existing crops.
3. We evolve on problem classes (samples of problem instances drawn from a problem class), not on single instances.
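As a concrete example of sampling a problem class, the construction used in the function-optimisation experiments earlier (t = 0.7 + 0.5 * N(0, 1), truncated to [-1, +1], mapped to a hidden target bit-string, with fitness a function of the Hamming distance to the target) can be sketched as follows; details such as rounding and bit order are assumptions:

```python
import random

def sample_instance(numbits, f, rng=random):
    # Draw one problem instance from the class: a hidden target bit-string.
    t = 0.7 + 0.5 * rng.gauss(0, 1)
    t = max(-1.0, min(1.0, t))                            # truncate to [-1, +1]
    target_int = round((t + 1) / 2 * (2 ** numbits - 1))  # map to [0, 2^n - 1]
    target = [(target_int >> i) & 1 for i in range(numbits)]

    def fitness(x):
        # Hamming distance to the hidden target, fed into the class function f.
        hamming = sum(a != b for a, b in zip(x, target))
        return f(hamming)

    return fitness

# Problem class p1 (f(d) = d) over 32 bits:
fit = sample_instance(32, lambda d: d, random.Random(1))
```

The fitness function is all a GA ever sees; drawing many such `fitness` functions gives a training sample from the class.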

Problem Classes Do Occur
1. Travelling Salesman: the distribution of cities differs between countries (e.g. the USA is roughly square, Japan is long and narrow).
2. Bin Packing & Knapsack Problem: the items are drawn from some probability distribution.
3. Problem classes do occur in the real world.
4. The next slides demonstrate problem classes and scalability with on-line bin packing.

On-line Bin Packing
A sequence of pieces is to be packed into as few bins (containers) as possible. Bin size is 150 units; piece sizes are uniformly distributed in the range 20-100. This differs from the off-line bin packing problem, where the whole set of pieces is available for inspection at the start. The best-fit heuristic puts the current piece in the bin where it fits best (leaving least slack); it has the property that it never opens a new bin unless forced to.

Genetic Programming applied to on-line bin packing
It is not immediately obvious how to apply Genetic Programming to combinatorial problems (see our previous paper). The evolved GP tree is applied to each bin in turn, and the current piece is put in the bin that receives the maximum score. Terminals supplied to GP: capacity C, fullness F, size S. The initial representation {C, F, S} contains an irrelevant variable: only the remaining space matters, so it is replaced with {E, S}, where emptiness E = C - F. We can possibly reduce this to one variable!!
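The best-fit rule above, in the {E, S} representation with score 1/(E - S), can be sketched as follows (an illustrative implementation; an exact fit, E = S, is treated as the best possible score, and bins that cannot hold the piece are never scored):

```python
def best_fit_score(emptiness, size):
    # Best fit = 1/(E - S): least slack scores highest; exact fit is best.
    slack = emptiness - size
    return float("inf") if slack == 0 else 1.0 / slack

def pack(pieces, capacity=150, score=best_fit_score):
    bins = []  # each bin is represented by its current fullness F
    for s in pieces:
        fitting = [i for i, f in enumerate(bins) if capacity - f >= s]
        if fitting:
            # Place the piece in the arg-max bin under the scoring function.
            i = max(fitting, key=lambda i: score(capacity - bins[i], s))
            bins[i] += s
        else:
            bins.append(s)  # open a new bin only when forced to
    return bins

used = pack([100, 60, 40, 50, 90], capacity=150)
```

An evolved heuristic simply drops in for `best_fit_score`: the GP tree over E and S is evaluated per bin, and the arg-max bin receives the piece.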

How the Heuristics Are Applied
(Figure: a worked example in which an expression over %, +, C, S and F is evaluated once per open bin, giving scores such as -15, -3.75, 3, 4.29 and 1.88 for bins at different fullness; the current piece goes into the bin with the maximum score.)

Robustness of Heuristics
(Figure: regions of the heuristic space giving all legal results vs. some illegal results.)

The Best-Fit Heuristic
Best fit = 1/(E - S). Pieces of size S which fit well into the remaining space E score highly. Applied to the open bins, best fit produces a set of points on a score surface over (emptiness, piece size), and the bin corresponding to the maximum score is picked. (Figure: the best-fit score surface, emptiness 0-150, piece size 2-100.)

Our Best Evolved Heuristic (pieces 20 to 70)
(Figure: the evolved heuristic's score surface, emptiness 0-150, piece size 20-70.) It has a similar shape to best fit, but curls up in one corner. Note that this plot is rotated relative to the previous slide.

Compared with Best Fit
(Figure: the amount by which the evolved heuristics beat best fit, plotted against the number of pieces packed so far, up to 100,000, for heuristics evolved on 100, 250 and 500 pieces; averaged over 30 heuristics on 20 problem instances.) Performance does not deteriorate, and the larger the training problem size, the better the bins are packed.

Compared with Best Fit (zoom of the previous slide)
(Figure: the same comparison over the first 400 pieces.) The heuristic seems to learn the number of pieces in the problem; an analogy is sprinters accelerating towards the end of a race. The break-even point is approximately half of the training problem size. If there is a gap of size 30 and a piece of size 20, it can be better to wait for a better piece to come along about 10 items later (with a similar effect at the upper bound?).

A Brief History (Example Applications)
1. Image recognition: Mark Roberts
2. Travelling Salesman Problem: Robert Keller
3. Boolean satisfiability: Fukunaga; Bader-El-Den
4. Data mining: Gisele L. Pappa, Alex A. Freitas
5. Decision trees: Gisele L. Pappa et al.
6. Selection heuristics: Woodward & Swan
7. Bin packing in 1, 2 and 3 dimensions (on- and off-line): Edmund Burke et al. & Riccardo Poli et al.

A Paradigm Shift?
Measured in algorithms investigated per unit time. The new approach: one person proposes a family of algorithms and tests them in the context of a problem class. The conventional approach:

One person proposes one algorithm and tests it in isolation. The conventional approach pays the human cost (subject to INFLATION); the new approach pays the machine cost (subject to MOORE'S LAW). Previously one person proposed one algorithm; now one person proposes a set of algorithms. This is analogous to the industrial revolution's shift from hand-made to machine-made: Automatic Design.

Conclusions
1. Algorithms are reusable; solutions aren't (e.g. a single TSP tour).
2. We can automatically design algorithms that consistently outperform human-designed algorithms (in various domains).
3. Heuristics are trained to fit a problem class, so they are designed in context (like evolution). Let's close the feedback loop! Problem instances live in classes.
4. We can design algorithms on small problem instances and scale them up, applying them to large problem instances (TSP; cf. a child learning multiplication on small numbers).