Summary of Genetic Algorithms for QWOP Gait Evolution

Summary Genetic Algorithms for QWOP Gait Evolution arxiv.org

6,716 words - PDF document - View PDF document

One Line

Genetic algorithms and keystrokes were used to optimize gaits for QWOP, with the help of dynamic mutation and the cellular model to enhance performance.

Slides

Slide Presentation (10 slides)

Copy slides outline Copy embed code Download as Word

Genetic Algorithms for QWOP Gait Evolution

Source: arxiv.org - PDF - 6,716 words - view

Introduction to QWOP Gait Evolution

• QWOP is a browser-based game controlled by 'Q', 'W', 'O', and 'P' keys

• Difficulty and unintuitive gameplay

• Goal: Optimize gaits using genetic algorithms

Programmatically Playing QWOP with Totter

• Totter program uses selenium library to control mouse and keyboard

• Translates genotype into phenotype (sequence of key presses)

• Fitness evaluation based on distance run

Testing Different Representations for Gaits

• Keystroke sequences found to be the most effective representation

• Other representations (keyup-keydown, bitmask) did not perform as well

Visual: Comparison of different representations

Importance of Seeding the Initial Population

• Seeding with better-than-random individuals improves performance

• Pool size of 500 sufficient for good results

Visual: Comparison of seeded vs. random initial populations

Investigating Dynamic Parameter Control Mechanisms

• Dynamic mutation has a positive effect on performance

• Dynamic replacement does not yield better results

Visual: Graph showing performance with dynamic mutation

Comparing Different Types of Genetic Algorithms

• Cellular model consistently yields stable and fast gaits

• Steady-state model also performs well but has higher variability

Visual: Comparison of different genetic algorithm types

Evolution of Gaits vs. Human Players

• Evolved gaits similar to those used by human players

• Did not reach proficiency of best human players

Visual: Comparison of evolved gaits vs. human player gaits

Improving Efficiency of Genetic Algorithms

• Using surrogate fitness function or faster evaluation times can improve efficiency

• Acknowledgment of limitations in fitness evaluation time

Visual: Efficiency improvement potential

Key Takeaways

• Genetic algorithms optimize gaits for QWOP

• Keystroke sequences are the most effective representation

• Seeding with better-than-random individuals improves performance

• Dynamic mutation has a positive effect on performance

• Cellular model consistently yields stable and fast gaits

• Evolved gaits did not reach the level of best human players

• Efficiency can be improved with surrogate fitness function or faster evaluation times

• Genetic algorithms offer insights for evolving gaits in QWOP

Key Points

Genetic algorithms are used to discover effective gaits for the game QWOP.
The authors developed a program called Totter to play the game programmatically.
Different representations for the gaits were tested, with keystroke sequences being the most effective.
Seeding the initial population with better-than-random individuals improved the performance of the genetic algorithms.
Dynamic parameter control mechanisms were investigated, with dynamic mutation having a positive effect on performance.
The cellular model of genetic algorithms consistently yielded stable and fast gaits.
The evolved gaits did not reach the level of proficiency achieved by the best human players.
Using a surrogate fitness function or faster evaluation times could improve the efficiency of the genetic algorithms.

Summaries

24 word summary

Genetic algorithms were used to find effective gaits for QWOP. Keystrokes were the most effective representation. Dynamic mutation and the cellular model improved performance.

77 word summary

This paper explores using genetic algorithms to discover effective gaits for the game QWOP. The authors developed a program called Totter to play the game programmatically. Different representations for the gaits were tested, with keystrokes being the most effective. Seeding with better-than-random individuals improved performance. Dynamic mutation had a positive effect. The cellular model consistently yielded stable and fast gaits. The authors suggest that dynamic sensory feedback would improve control. Limitations in fitness evaluation time were acknowledged.

163 word summary

This paper explores the use of genetic algorithms to discover effective gaits for the game QWOP. The authors developed a program called Totter to play the game programmatically, using the selenium library to control the mouse and keyboard. Different representations for the gaits were tested, and the keystroke representation was found to be the most effective. Seeding the initial population with better-than-random individuals greatly improved the performance of the genetic algorithms. Dynamic parameter control mechanisms were investigated, with dynamic mutation having a positive effect on performance. The cellular model consistently yielded stable and fast gaits. The authors were able to evolve gaits similar to those used by human players, but not at the same proficiency level. They suggest that dynamic sensory feedback would improve control. Limitations due to fitness evaluation time were acknowledged, and suggestions for improvement were made. Overall, this paper demonstrates the effectiveness of genetic algorithms for evolving gaits in QWOP and provides insights into various strategies and representations to use.

421 word summary

This paper explores the use of genetic algorithms to discover effective gaits for the game QWOP. QWOP is a difficult browser-based game where the player controls an Olympic sprinter using the 'Q', 'W', 'O', and 'P' keys.

The authors developed a program called Totter to play the game programmatically. Totter uses the selenium library to control the mouse and keyboard. The program translates the genotype of an individual into a phenotype, which is a sequence of key presses, holds, and releases. The fitness of each individual is evaluated by running their phenotype in the game and measuring the distance run.

Different representations for the gaits were tested, including keystroke sequences, keyup-keydown sequences, bitmask sequences, and bitmask-duration sequences. The keystroke representation, which encodes a sequence of key presses, was found to be the most effective in evolving stable and fast gaits. The other representations, allowing for more complex control patterns, did not perform as well.

The authors experimented with different initialization strategies and found that seeding the initial population with better-than-random individuals greatly improved the performance of the genetic algorithms. A seeding pool size of 500 was sufficient to achieve good results.

Dynamic parameter control mechanisms were investigated. Varying the mutation rate and selection pressure over time encouraged exploration in the early stages and convergence in the later stages. Dynamic mutation had a positive effect on performance, but dynamic replacement did not yield better results compared to static parameters.

Different types of genetic algorithms were compared, including steady-state, generational, and cellular models. The cellular model consistently yielded stable and fast gaits with low variability across trials. The steady-state model also performed well but had higher variability in fitness across trials.

The authors were able to evolve gaits similar to those used by human players of QWOP. However, the gaits evolved by the genetic algorithms did not reach the level of proficiency achieved by the best human players. The authors suggest that a system with dynamic sensory feedback would have a better chance of achieving human-like control of the runner.

The authors acknowledge limitations in their experiments due to the time required for fitness evaluation. They suggest that using a surrogate fitness function or building a simulation of QWOP with faster evaluation times could greatly improve the efficiency of the genetic algorithms.

In conclusion, this paper demonstrates the effectiveness of genetic algorithms for evolving gaits in the game QWOP. The authors provide insights into the best representations, initialization strategies, dynamic parameter control mechanisms, and types of genetic algorithms to use for this problem.

483 word summary

In this paper, the authors explore the use of genetic algorithms to automatically discover effective gaits for the game QWOP. QWOP is a browser-based game where the player controls an Olympic sprinter using the 'Q', 'W', 'O', and 'P' keys. The goal is to advance the runner to the end of the 100-meter race as quickly as possible. The game is known for its difficulty and unintuitive gameplay.

To play the game programmatically, the authors developed a program called Totter, which uses the selenium library to control the mouse and keyboard. The program translates the genotype of an individual into a phenotype, which is a sequence of key presses, holds, and releases. The fitness of each individual is evaluated by running their phenotype in the game and measuring the distance run.

The authors tested different representations for the gaits, including keystroke sequences, keyup-keydown sequences, bitmask sequences, and bitmask-duration sequences. They found that the keystroke representation, which simply encodes a sequence of key presses, was the most effective in evolving stable and fast gaits. The other representations, which allowed for more complex control patterns, did not perform as well.

The authors also experimented with different initialization strategies. They found that seeding the initial population with better-than-random individuals greatly improved the performance of the genetic algorithms. They tested different sizes of seeding pools and found that a pool size of 500 was sufficient to achieve good results.

In addition, the authors investigated dynamic parameter control mechanisms. They varied the mutation rate and selection pressure over time to encourage exploration in the early stages of evolution and convergence in the later stages. They found that dynamic mutation had some positive effect on performance, but dynamic replacement did not yield better results compared to static parameters.

Finally, the authors compared different types of genetic algorithms, including steady-state, generational, and cellular models. They found that the cellular model consistently yielded stable and fast gaits with low variability across trials. The steady-state model also performed well, but had higher variability in fitness across trials.

Overall, the authors were able to evolve gaits that were similar to those used by human players of QWOP. However, the gaits evolved by the genetic algorithms did not reach the level of proficiency achieved by the best human players. The authors suggest that a system with dynamic sensory feedback would have a better chance of achieving human-like control of the runner.

The authors acknowledge that their experiments were limited by the time required for fitness evaluation. They suggest that using a surrogate fitness function or building a simulation of QWOP with faster evaluation times could greatly improve the efficiency of the genetic algorithms.

Raw indexed text (42,699 chars / 6,716 words / 955 lines)

Genetic Algorithms for Evolution of QWOP Gaits

Zachary Jones Mohammad Al-Saad Ankush Vavishta

Institute for Artificial Intelligence

University of Georgia

[email protected] Department of Computer Science

University of Georgia

[email protected] Department of Computer Science

University of Georgia

[email protected]

I. I NTRODUCTION

QWOP is a browser-based, 2-dimensional flash game in

which the player controls an Olympic sprinter competing

in a simulated 100-meter race. The goal of the game is to

advance the runner to the end of the 100-meter race as quickly

as possible using the ‘Q’, ‘W’, ‘O’, and ‘P’ keys, which

control the muscles in the sprinter’s legs. Despite the game’s

simple controls and straightforward goal, it is renowned for

its difficulty and unintuitive gameplay.

In this paper, we attempt to automatically discover effective

QWOP gaits. We describe a programmatic interface developed

to play the game, and we introduce several variants of a genetic

algorithm tailored to solve this problem. We present experi-

mental results on the effectiveness of various representations,

initialization strategies, evolution paradigms, and parameter

control mechanisms.

II. B ACKGROUND

A. QWOP

QWOP is a browser-based, 2-dimensional, free-to-play flash

game developed by Bennett Foddy [1]. The player controls an

Olympic sprinter competing in a simulated 100-meter race,

and the goal of the game is to advance the runner to the end

of the race as quickly as possible without falling over. The

game is won when the simulated sprinter reaches or exceeds

the 100-meter finish line. The game is considered a loss if the

sprinter’s upper body touches the ground at any time before

the finish line.

Players control the simulated sprinter using the ‘Q’, ‘W’,

‘O’, and ‘P’ keys. Pressing and holding a particular key causes

the muscles one part of the sprinter’s legs to contract. The

‘Q’ and ‘W’ keys control the contraction of the the sprinter’s

left and right thigh muscles, while the ‘O’ and ‘P’ keys

control the contraction of the left and right hamstring and

calf muscles. The game allows multiple keys to be pressed at

the same time. Precise timing and careful coordination of key

presses are required to successfully advance the runner without

falling - a task that has proven to be difficult and unintuitive

for human players. Shortly after its development, QWOP

achieved widespread infamy for its intentionally frustrating

control system and unpredictable ragdoll physics engine [2]

[3].

Achieving a realistic bipedal gait is often quickly abandoned

by new players. Most human players tend to resort to finding

a repeatable pattern of inputs sufficient enough to scoot the

sprinter to the finish line by balancing on one knee and kicking

the free foot forward [4]. Nevertheless, some determined

players have achieved incredible proficiency at the game and

found fast, stable bipedal gaits. The current world record is

held by Roshan Ramachandra, who completed the game in 67

seconds [5].

B. Genetic Algorithms

Genetic Algorithms are a family of search algorithms

loosely inspired by the process of biological evolution. A

genetic algorithm (GA) maintains an artificial population of

encoded candidate solutions. Each individual in the population

is evaluated by a fitness function. The encoded solution is

referred to as the individual’s genotype, and the decoded

solution is referred to as the individual’s phenotype.

A genetic algorithm begins by initializing a population

with N individuals having randomly generated genotypes. The

algorithm then proceeds through successive generations. Each

generation begins by selecting two or more parents from the

population. Parents with better fitness are more likely to be

selected. After parents are selected, a crossover operation is

applied to pairs of parents with probability P c . The crossover

operation generates n new candidate solutions referred to as

children. Then a mutation operator is applied to each child

with some probability P m . The generation ends by selecting

selecting n members of the population for replacement with

the new children. The GA iterates the genetic operations

through generations until it reaches some terminating con-

dition, such as a sufficiently optimized solution or a fixed

number of generations.

GAs have been shown to be robust problem solvers, partic-

ularly for optimization problems where the fitness landscape

might have several local optima. In this paper, we use three

varieties of genetic algorithm - a generational model, a cellular

model, and a steady-state model.

In a generational GA, the entire population is replaced

by a new population each iteration. In this type of GA, no

replacement mechanism need be specified, since all children

survive to replace all parents. Compared to the other varieties,

generational GAs may converge more slowly, but they often

maintain diversity in the population for a longer period of time.

This allows generational GAs to easily escape local optima.

By contrast, steady-state GAs replace only a few members

of the population at a time. The parenthood selection mecha-nism typically selects a small number of highly-fit individuals

to reproduce and generate a small number of children. In some

versions, the children replace the weakest members of the

population. Steady-state GAs often exhibit faster convergence

rates than their generational counterparts since strong individ-

uals quickly take over the population. However, it is more

difficult for steady-state GAs to escape local optima.

Cellular GAs are a specialized version of generational GAs.

In a cellular GA, the population is organized along a 2-

dimensional grid. During the crossover phase, each individual

selects a mate from the set of adjacent individuals. This me-

chanic simulates isolation by distance within the population.

Discoveries of high-fitness individuals take time to propagate

across the grid. This helps to prevent premature convergence

and promotes niches of specialized subpopulations which

cover a wider variety of optima [6].

The fitness landscape for QWOP is difficult to explore using

most algorithmic optimization techniques. The solution space

of QWOP gaits is enormous, and only a small fraction of the

solution space results in gaits that do not immediately fail.

Due to the game’s sensitive physics engine, small changes

in control sequence can produce disastrous results. Genetic

algorithms pose a promising technique for the automatic

discovery of QWOP gaits given their ability to explore large

and jagged fitness landscapes.

III. R ELATED W ORK

Much research has explored discovery and optimization of

robotic gaits using genetic algorithms. Quadruped and hexapod

robots are particularly common research platforms due to

their physical stability and wide availability. Clune et al. [7]

created a genetic algorithm which evolves neural networks for

quadruped gaits. Gallagher et al. [8] used a similar approach to

evolve gaits for a hexapod robot. Many other researchers have

successfully employed genetic algorithms to evolve many-

legged, neurocontrolled robots [9] [10] [11].

Rather than evolving weights for neural networks, many

researchers tackle the problem more directly. Parker and

Tarimo [12] introduced the “Cyclic Genetic Algorithm” (CGA)

and showed that it can effectively discover semi-optimal gaits

for quadruped robots within 2000 generations. Their CGA

encodes a solution as a sequence of control inputs to be sent

to the robot, and the inputs are looped until evaluation has

completed. Hornby et al. [13] successfully employed genetic

algorithms to automate the discovery of walking and running

gaits on consumer-grade quadruped robots and included some

of the resultant gaits in Sony’s AIBO robotic dog.

Genetic algorithms have also been applied to bipedal robots.

The problem of evolving successful bipedal gaits can be

particularly tricky compared to the quadruped or hexapod

case. As Ray et al. [4] note, “[W]alking with only two legs

necessarily involves periods of time during which the robot is

supported entirely by only one leg. Thus, bipedal gaits require

more careful balance than gaits in which multiple legs are

supporting the robot at any given moment.”

Wolff and Nordin [14] ran an online genetic programming

algorithm to optimize gaits for stability and straightness on a

humanoid robot. Their algorithm refined gaits using on-board

visual feedback. Zhang et al. [15] discovered stable biped gaits

for flat ground and for stair-climbing using a genetic algorithm

to generate trajectories for the hip and ankle joints. Cheng et

al. [16] designed a GA capable of optimizing bipedal gaits

for stability, speed, or inclined-surface walking. Many other

researchers have successfully evolved human-like gaits using

evolutionary computing [17] [18] [19] [20].

The problem of evolving gaits for the simulated runner in

QWOP is significantly easier than most of the gait evolution

problems encountered in the literature. First, the QWOP runner

interacts with a purely simulated 2-dimensional environment.

QWOP gaits need not contend with unexpected obstacles,

elevation shifts, or unexpected bumps in the walking surface.

Second, the QWOP runner has only four degrees of freedom.

The search space for QWOP gaits is much smaller than that

of 3D bipedal or quadrupedal robots, which typically have 8

- 12 degrees of freedom.

At the time of this paper’s writing, only one publication

exists which considers the specific problem of evolving gaits

for QWOP. Ray et al. [4] compare two GAs for QWOP.

The first is a simple genetic algorithm (SGA) [21] which

represents solutions as sequences of key presses, holds, and

releases. The second is a cellular genetic algorithm [22] which

represents solutions as sequences of characters from a 16-

letter alphabet. Each letter in the alphabet corresponded to

a certain configuration of keys to be held and released. In

both cases, the authors used two-point crossover and uniform

mutation. Fitness was assigned based on the speed that the

simulated runner achieved. Experimental results indicated that

the cellular genetic algorithm outperformed the simple genetic

algorithm. The cellular genetic algorithm was able to evolve a

stable knee-scooting gait similar to the strategy employed by

many human players.

IV. M ETHODS

A. QWOP Interface

Individuals in our genetic populations are assigned fitness

based on their performance in QWOP. Unfortunately, QWOP

is a closed-source, browser-based game. It does not expose an

API which could be used to programmatically play the game.

Instead, we have designed and implemented a Python program

capable of automatically playing QWOP by controlling the

mouse and keyboard. We call our software “Totter” [23].

To evaluate an individual’s fitness, Totter first uses the

selenium [24] library to open a web browser and navigate to

Bennett Foddy’s website, where QWOP is hosted. The QWOP

window is then resized and centered on the user’s primary

monitor. Totter starts the game by programmatically moving

the mouse to the center of the browser window and clicking

to give it keyboard focus.

Totter then translates the individual’s genotype into a phe-

notype, which is a subroutine that executes a sequence of

key presses, holds, and releases. The individual’s phenotype is(a) The simulated runner on the starting line

(b) The “Game Over” screen after the runner falls

Fig. 1: Screenshots from QWOP

executed in a loop using the pyautogui [25] library. Execution

terminates when one of the following conditions is met:

1) The runner falls to the ground before the 100m mark

and loses the game

2) The runner reaches the 100m mark and wins the game

3) The time limit expires

Termination conditions 1 and 2 are detected using a color-

sampling algorithm. QWOP’s “game over” screen, depicted

in Figure 1b, contains a very different distribution of colors

compared to the game’s normal background. Totter samples

the color distribution of the window every 0.25 seconds, and

terminates execution if the distribution matches the game over

screen.

The third termination condition serves a dual purpose. First,

it is possible to enter a sequence of inputs which causes

the simulated runner to neither fall nor advance. If fitness

evaluation were not time-limited, then such a sequence could

stall the program indefinitely. Second, it is useful to limit

the maximize time for a fitness evaluation. Evaluating an

individual’s fitness is the bottleneck for performance in our

algorithms. Fitness evaluations can take several minutes if a

particularly slow but stable gait is discovered. Imposing a time

limit allows evolution to continue at a reasonable pace.

After a termination condition is detected, Totter captures a

small window of the screen which contains a description of the

runner’s performance. Our program uses the Tesseract Optical

Character Recognition [26] library to convert the image into

text data. Totter extracts the distance run and reports this value

to the individual. The individual then updates its fitness. This

process is repeated for each individual that a genetic algorithm

needs to evaluate.

B. Representation

QWOP gaits can be described by a sequence of key presses,

holds, and releases. We explored four different representa-

tions for QWOP gaits, each of which are described below.

Regardless of representation, Totter forces key presses to be

at least 50 milliseconds apart. This ensures that a computer-

controlled player does not execute a key sequence that would

be impossible for a human player. The experiment described

in section V-A 5.1 is designed to assess the performance of

each representation.

1) Representation 1: Keystroke sequence: The simplest

possible representation encodes a gait as a sequence of keys

to be pressed. Each individual is a variable-length string from

the alphabet { Q, W, O, P}. Each character in the sequence

indicates that the corresponding key should be pressed for 150

milliseconds, then released. We refer to this representation as

the “Keystroke” representation. An example individual is show

in Figure 2.

[W, W, P, Q, P, W, O, O, P, Q, W]

Fig. 2: Example runner using Representation 1

The individual in Figure 2 would be translated as follows:

Press W and hold for 150 ms, release W, press W and hold

for 150 ms, release W, press P and hold for 150 ms, release P,

press Q and hold for 150 ms, . . . and so on. This representation

has two major benefits: it rarely contains sensitive patterns

that could be disrupted by crossover and mutation, and it is

highly interpretable. However, it is impossible to represent

sequences with multiple buttons pressed at the same time, and

it is impossible to vary the length of the button holds.

2) Representation 2: Keyup-Keydown Sequence: The sec-

ond representation expands on the search space by both

allowing variable lengths of key presses and allowing multiple

keys to be pressed at the same time. This representation

encodes gaits as sequences of characters from the alphabet {

Q, W, O, P, q, w, o, p, + }. Under this representation, a capital

letter means that the corresponding key should be pressed, a

lowercase letter means that the corresponding key should be

released, and a ‘+’ symbol indicates that the current configura-

tion should be maintained for another 150 milliseconds. A 150

millisecond delay is inserted between interpretation of each

character in the sequence. This representation is very similarTABLE I: Mappings from alphabet to bitmask

Alphabet

to “Encoding 1” considered by Ray et al [4]. We refer to it as

the “Keyup-Keydown” representation. An example individual

is shown in Figure 3.

[P, p, +, W, Q, O, w, +, W, q, o, P, +, +, q, o, O, w, O, W, o]

Fig. 3: Example runner using Representation 2

The individual in Figure 3 would be translated as follows:

Press and hold P, release P, wait 150 ms, press and hold

W, press and hold Q, press and hold O, release W, . . .

and so on. Like representation 1, this representation is highly

interpretable. Representation 2 allows individuals to produce

more complicated patterns of keystrokes than representation

1, but it forces the GA to search a larger space of possible

solutions. Another downside of this representation is that

the control state at any given point on an individual using

this encoding is highly dependent on its context within the

sequence. Crossover and mutation operations are likely to

produce “non-coding DNA” - patterns of inputs that cause

no change in the current control state.

3) Representation 3: Bitmask Sequence: The third repre-

sentation encodes a gait as a variable-length string of charac-

ters from the alphabet { A, B, C, . . . , P}. Each character in

the alphabet is mapped to a binary mask that sets the input

state for all four keys. Table I shows the mapping of each

letter in the alphabet to its corresponding bitmask.

Similarly to representations 1 and 2, each character in the

sequence indicates that the corresponding control state should

be held for 150 milliseconds. We refer to this representation as

the “Bitmask” representation. An example individual is show

in Figure 4.

[B, D, A, M, P, D, B, M, H, F, G, E, I, J, M, P]

Fig. 4: Example runner using Representation 3

The individual in Figure 4 would be translated as follows:

press and hold Q, W, and O, then continue to hold Q and

W while releasing O, then press and hold O and P, then

release Q and W, then release O and P, . . . and so on. This

representation allows individuals to press and hold multiple

keys at the same time, but it does not allow for variable-length

holds of a specific control sequence. Unlike representation 2,

the Bitmask representation is not prone to heavy disruption

by crossover and mutation operators, and will almost never

produce non-coding DNA.

Alphabet

4) Representation 4: Bitmask-Duration Sequence: The final

representation we consider is a simple modification of repre-

sentation 3 which allows control states to be held for varying

lengths of time. It encodes gaits as sequences of (X, duration)

pairs, where X is one of the characters from the alphabet in

table I, and duration is an integer in the range [50, ∞). An

entry in the sequence indicates that control state X should be

held for duration milliseconds. Figure 5 shows an example

individual encoded with this representation

[L, 117], [F, 433], [A, 350], [E, 440], [O, 468], [J, 349], [E, 76]

Fig. 5: Example runner using Representation 4

The individual in Figure 5 would be translated as follows:

Press and hold W for 117 ms, release W, press and hold Q

and O for 433 ms, press and hold W and P and hold for

350 ms, ... and so on. Encoding a gait as (bitmask, duration)

pairs allows the GA to attain the vast representational power of

the “Keyup-Keydown” representation, without allowing for the

potential of non-coding DNA. We refer to this representation

as “Bitmask-Duration” representation.

C. Fitness Function

A naive approach to fitness assignment is to maximize dis-

tance achieved by the virtual sprinter. Such a fitness landscape

selects for maximally stable individuals but ignores speed.

This tends to produce runners who are very stable but also very

slow. Another naive approach is to maximize running speed.

But such an objective disproportionately allocates fitness to

runners who take one lunging step at the beginning of the

race and instantly crash.

To prevent over-rewarding falling lunges, other researchers

[4] assigned a fitness of zero to any runner who crashes. Our

approach is slightly different. We use the following piece-wise

function of distance ran and minutes elapsed:

F (distance, minutes) =

(

distance,

distance +

distance

minutes ,

distance < 10

distance ≥ 10

This function awards speed only for runners who are stable

enough to advance past the 10-meter mark without falling.

Applying the speed bonus in units of meters per minute

ensures that speed and distance are on roughly the same scale.

Unless otherwise noted, this is the fitness function used for all

experiments in this paper.(a) Single-point “cut-and-splice” crossover

(b) Two-Point crossover

Fig. 6: Crossover operations

D. Initialization

We use two different initialization strategies in this paper.

The first is random initialization. For representations 1, 2, and

3, an individual is assigned a random genome by first sampling

an integer k ∈ [20, 40], then choosing k letters from the

representation’s alphabet. Representation 4 follows the above

process, then randomly assigns a duration for each character

by sampling from the normal distribution N (150, 25).

During experimentation, we noticed that nearly all

randomly-generated QWOP gaits immediately fail. What’s

more, an overwhelming majority of the children created by

the crossover of unfit parents also immediately fail. Thus an

initial population with no stable individuals sometimes fails

to improve within a reasonable number of generations. Many

GAs that used randomly generated initial populations were

unable to evolve an individual that advanced more than one

meter before falling.

To solve this problem, we introduced a second initialization

strategy. We seed the population by first generating a large

pool of M randomly-initialized individuals. Each individual in

the pool is assigned a temporary fitness equal to the number of

meters that individual is able to run within a 20-second time

limit. The fittest N individuals are selected to be the initial

population for the genetic algorithm. We refer to this strategy

as “Seeding M,N initialization”.

E. Selection and Replacement

To select parents, we used tournament selection with tour-

naments with size equal to 4 1 of the population size. When

selecting individuals for replacement, we randomly chose one

of the worst five members of the population. This greedy

approach was chosen for its simplicity, elitism, and quick

convergence.

F. Crossover

Except where otherwise noted, crossover was applied with

a 90% probability. In the cases where crossover was not

performed, the two parents were carried forward as if they

were children. If crossover was to be performed, we applied

either single-point “cut-and-splice” crossover or two-point

crossover with equal probability. Figure 6 shows a visual

representation of these two operations.

G. Mutation

Except where otherwise noted, mutation was applied to each

child’s genome with a 15% probability. When mutation was

to be performed, we applied one of the following operations,

selected with equal probability:

1) Uniform replacement: change one of the characters in

the sequence to another randomly selected character

from the relevant alphabet

2) Insertion: randomly select a character from the relevant

alphabet and insert it at a randomly-selected position in

the genome

3) Swap: randomly select two characters in the sequence

and swap their position

4) Deletion: remove a randomly-selected character from the

sequence

For representation 4, we also applied a Gaussian creep

mutation to the duration of a randomly selected (bitmask,

duration) pair from the sequence.

V. E XPERIMENTS

The experiments we describe are designed to gauge the

effectiveness of differing representations, initialization strate-

gies, dynamic parameter control mechanisms, and various

types of genetic algorithm. Because genetic algorithms are

stochastic in nature, each experiment consists of several trials.

Each trial starts with a freshly initialized population and runs

until reaching 1000 fitness evaluations. In each trial, fitness

evaluation for a each individual was subject to a time limit

of 45 seconds. Unless otherwise noted, each trial uses the

crossover and mutation operators described in Sections 4.6

and 4.7 with crossover probability fixed at 90% and mutation

probability fixed at 15%.

The bottleneck in our algorithmic pipeline is fitness eval-

uation. The average time required to run a single trial for

1000 fitness evaluations was nearly six and a half hours.

Furthermore, trials cannot be run in parallel because Totter

requires control of the keyboard and mouse while running each

trial. Because of the intense amount of time required for each

trial, experiments were limited to five trials each. As such, the

results we obtain should be viewed as preliminary and subject

to more rigorous statistical significance testing in the future.

For each experiment, we report the mean best fitness (MBF),

standard deviation in MBF (α M BF ), and mean average fitnessFig. 7: Fitness improvement for various representations Fig. 8: Fitness improvement with varying M

(MAF) achieved by each experimental configuration across all

trials. Figures 7 - 10 plot MBF versus the number of fitness

evaluations. The vertical lines in Figures 7 - 10 correspond to

standard deviation in MBF across trials. The length of each

vertical line equals the standard deviation in MBF measured

at that particular generation. The shape of each improvement curve is notable. The

Keystroke GA improved quickly but “saturated” after 600

fitness evaluations. The Bitmask GA improved at a slower

rate but did not saturate. It may be the case that the Bitmask

GA would attain a higher fitness than the Keystroke GA if it

were allowed to run for more than 1000 generations.

A. Representations

The first experiment tests the effectiveness of each repre-

sentation. We vary representation while holding all other GA

parameters constant. In each trial, we evolved populations of

size 25 using a steady-state genetic algorithm. The population

was initialized using Seeding 500,25 initialization as described

in Section IV-D. Figure 7 and Table II summarize the results

of Experiment 1.

TABLE II: Results for Experiment 1

Representation

Keystroke

Keyup-Keydown

Bitmask

Bitmask-Duration

MBF

43.90

1.0

34.92

1.32

MAF

36.29

0.93

26.78

0.88

α M BF

1.83

0.0

20.42

0.72

The GAs using Keyup-Keydown representation and

Bitmask-Duration representation fail to evolve solutions which

are better than the best randomly-generated solutions. Both

of these representations expand the search space to allow

for variable-length holding of control states. We suspect that

the GAs used for this experiment were unable to explore

this expanded search space within 1000 fitness evaluations.

These representations may do better when combined with

more greedy operators or when initialized with a larger seeding

pool.

By contrast, the GAs using Keystroke representation and

Bitmask representation evolved populations that achieve re-

spectable speed and stability. The Keystroke representation

more reliably produced good runners, achieving both the

highest MBF and MAF. From these results we can surmise

that the ability to press multiple keys at once is relatively

unimportant. The representation that achieved the best fitness

is the one which most simplifies the search space.

B. Initial Population Seeding

Our second experiment tests the effectiveness of the

Seeding M,N initialization mechanism described in Section

IV-D. We compare the results of random initialization with the

results obtained using Seeding M,N initialization with the fol-

lowing values of M : 50, 250, 500, 1000. This experiment ran

steady-state genetic algorithms using Bitmask representation.

Figure 8 and Table III summarize the results of Experiment 2.

TABLE III: Results for Experiment 2

Initialization Strategy

Random

Seeded 50,25

Seeded 250,25

Seeded 500,25

Seeded 1000,25

MBF

1.04

1.02

1.0

34.92

25.14

MAF

0.88

0.90

0.94

26.78

18.24

α M BF

0.09

0.04

0.01

20.42

13.47

These results demonstrate the importance of having better-

than-random individuals in the initial population. When the

population is randomly initialized, or when the population was

seeded from a small pool of random individuals, the GA failed

to improve upon the fitness of the initial population. When the

GA was seeded from a pool of 500 individuals, it was able

to make significant improvements to the best members of the

initial population.

Increasing the size of the seeding pool past 500 seems to

result in diminishing returns. The GA which drew from a

seeding pool of size 1000 often got stuck in a local maximum

and failed to achieve a fitness higher than the GA which used a

smaller seeding pool. It may be the case that greedy selection

in the initialization phase inhibits the GA’s ability to evolve

innovative new solutions.TABLE IV: Results for Experiment 3

Genetic Algorithm

Static

Dynamic M

Dynamic R

Dynamic R-M

Fig. 9: Fitness improvement for the GAs in experiment 3

C. Dynamic Parameter Control

In all experiments considered thus far, parameters such

as the crossover rate, mutation rate, tournament size, and

replacement operator have been kept constant throughout the

run. Experiment 3 tests the effect of dynamically varying

mutation rate and selection pressure with time.

First, we modify the steady-state GA by treating the muta-

tion rate as a function of time. Generally speaking, a have

a high probability of mutation during the early stages of

evolution helps encourage exploration of the fitness landscape.

In the late stages of evolution, a low mutation rate can help

encourage convergence around the best solutions found so

far. We use the following function to set the mutation rate:

m(t) = 1 − (0.85 ∗ T t ), where t is the current generation and

T is the maximum number of generations.

Second, we modify selection pressure as a function of time.

In previous experiments, we have randomly replaced one of

the five worst members of the population. In experiment 3, we

replace this mechanism with a probabilistic, time-dependent

replacement strategy. Let F̄ be the total fitness of the popu-

lation. For an individual x with fitness f x , its probability of

being selected for replacement is P r (x) = (1 − f F̄ x ) α(t) , where

α(t) = 0.5 + 1.5 T t . This replacement strategy is similar to

inverse roulette-wheel selection, except that probabilities are

scaled by a time-dependent factor. Early in the search, α(t)

is less than one, causing selection pressure to be less steep.

α(t) gradually increases as time advances, causing selection

pressure to slowly rise.

We test these dynamic mechanisms by comparing among

the following four genetic algorithms:

• Steady-state GA with static parameters (Static)

• Steady-state GA with dynamic mutation only (Dynamic

• Steady-state GA with dynamic replacement only (Dy-

namic R)

• Steady-state GA using both dynamic replacement and

dynamic mutation (Dynamic R-M)

In each case, we used a population of size 25, Bitmask

representation, and Seeding 500,25 initialization. Figure 9 and

Table IV summarize the results of Experiment 3.

MBF

34.92

31.73

1.4

1.0

MAF

26.78

25.50

0.51

0.34

α M BF

20.42

14.04

0.01

0.02

Interestingly, the genetic algorithms that used the dynamic

replacement mechanism both failed to evolve any better-

than-random individuals. It may be that shallow selection

pressure at the beginning of the search is harmful. Most initial

populations have only one or two individuals who are better-

than-random. If those individuals are replaced early, then the

GA can fail to find any effective runners. This seems to be

what has happened for the GAs who use dynamic replacement.

Dynamic mutation, on the other hand, does not seem to

significantly help or hinder search progress. The GA only

dynamic mutation improved more quickly than the GA with

static parameters, but as the mutation rate declined, perfor-

mance improvement significantly slowed and the static GA

took over. Nevertheless, it seems that the dynamic mutation

GA was able to more effectively explore the search space in

early generations.

D. Comparison of Genetic Algorithm Types

Experiment 4 tests the effectiveness of the various types

of genetic algorithm described in Section II-B. The results of

Experiment 2 suggest that seeding the initial population helps

kickstart evolution but may stall the GA in a local optimum.

To control for this possibility, we run trials with each of the

following configurations:

• Steady-state GA, random initialization (SS-R)

• Steady-state GA, Seeding 500,25 initialization (SS-S)

• Generational GA, random initialization (Gen-R)

• Generational GA, Seeding 500,25 initialization (Gen-S)

• Cellular GA, random initialization (Cell-R)

• Cellular GA, Seeding 500,25 initialization (Cell-S)

In each case, we used a population of size 25 and the

Bitmask representation. Figure 10 and Table V summarize the

results of Experiment 4.

Fig. 10: Fitness improvement for various types of GATABLE V: Results for Experiment 4

GA Strategy

SS-R

SS-S

Gen-R

Gen-S

Cell-R

Cell-S

MBF

1.04

34.92

1.0

19.57

1.0

36.02

MAF

0.88

26.78

0.87

14.67

0.44

7.48

α M BF

0.09

20.42

0.01

22.43

0.02

8.95

The cellular genetic algorithm most consistently yielded

stable and fast gaits. Although MBF is similar between the

steady-state and cellular algorithms, the cellular GA achieved a

much lower standard deviation across trials. This phenomenon

can likely be attributed to the increased diversity maintenance

provided by the cellular paradigm. In the steady-state algo-

rithm, a good individual quickly takes over the population.

Thus, if the steady-state GA finds a local optimum early, it

may converge to that optimum and never escape. By contrast,

the discovery of a good individual takes many generations

to propagate across the grid in the cellular GA. Notably, the

MAF for the steady-state GA is quite high compared to the

cellular GA. This indicates that our hypothesis is correct: by

the final generation, the best individuals have taken over the

steady-state algorithm, but the cellular algorithm continues to

explore the search space.

These results also confirm the findings of Experiment 2.

Seeding the population with better-than-random individuals

is important regardless of the genetic algorithm used. When

random initialization was used, no GA managed to evolve an

individual which was better than-random.

VI. C ONCLUSION

This project experimented with several variants of genetic

algorithm for the task of finding effective gaits for the game of

QWOP. Our experiments suggest that steady-state and cellular

genetic algorithms work well for this problem, particularly

when quick convergence is desirable, but most algorithms need

a population that already contains a few better-than-random

individuals if evolution is to proceed successfully.

Many of the genetic algorithms we tested were able to

evolve very stable gaits that, given enough time, would cer-

tainly complete the game by reaching the 100-meter mark.

The best algorithms managed to find gaits that were both

very stable and relatively fast. The gaits discovered by our

algorithms are similar to the strategy that most novice- and

intermediate-level human players employ. The gait begins

by finding a stable configuration for the runner, usually by

dropping to one knee or by spreading the runner’s legs. The

runner maintains this position while kicking forward with his

frong leg to generate forward momentum. A short video of the

best runners we discovered is available for viewing using the

following link: TODO: youtube video. The Python source code

used to evolve runners and generate the figures in this paper is

available at the following link: https://github.com/zachdj/totter.

Although our algorithm finds relatively good solutions, it

does not find gaits which approach the aptitude of the best

human players. Humans who compete for the world record

employ a realistic sprinting gait that requires careful balance,

impeccable timing, and improvisation. Human-like control of

the runner is unlikely to be achieved by a genetic algorithm

similar to any considered in this paper. A system that can

respond dynamically to sensory feedback would have a better

chance of success than a simple sequence of inputs that is

continuously looped.

Our work could be expanded in several interesting ways.

First, the experiments in this paper only scratch the surface

of the multitude of genetic algorithm variants which could

be brought to bear on this problem. Our experiments also

considered only one type of crossover and mutation operator. It

may be the case that domain-specific operators would exhibit

better performance. As noted by Ray et al. [4], the GA would

almost certainly benefit from a fitness function that provided

more information about the current state of the runner, such

the configuration of his legs, arms, and body relative to the

ground at any point in time.

As previously noted, the immense time required for fitness

evaluation limits both the number of experiments that can be

considered and the number of generations which a GA can

be allowed to run. To find near-optimal gaits, a GA would

likely need months of real-time execution. The temporal costs

of evolution could be dramatically lowered by employing

a surrogate fitness function. It may be possible to build a

simulation of QWOP that mimics the real game closely but

gives the programmer more fine-tuned control over the in-

game clock. This would allow several thousand times more

fitness evaluations per second.

Considering QWOP’s unpredictable ragdoll physics engine

and deviously challenging control scheme, the gaits achieved

by the algorithms in this paper are satisfactory. Our results

can be considered a proof-of-concept in applying evolutionary

computing to challenging games that require careful timing of

controls.

R EFERENCES

[1] Bennett Foddy, ““qwop” [flash game],” 2018. [Online]. Available:

http://www.foddy.net/Athletics.html

[2] Liam

Boylan-Pett,

“This

may

the

hardest

run-

ning

video

game

ever

made,”

2018.

[On-

line].

Available:

https://www.runnersworld.com/news/a20853551/

this-may-be-the-hardest-running-video-game-ever-made/

[3] Clayton Purdom, “Qwop turned failure into comedy and found

viral immortality,” 2018. [Online]. Available: https://www.avclub.com/

qwop-turned-failure-into-comedy-and-found-viral-immorta-1826394272

[4] S. Ray, V. S. Gordon, and L. Vaucher, “Evolving qwop gaits,” in Pro-

ceedings of the 2014 Annual Conference on Genetic and Evolutionary

Computation. ACM, 2014, pp. 823–830.

[5] Roshan Ramachandra, “Fastest 100m run, qwop (flash

game),”

2018.

[Online].

Available:

https://web.archive.org/

web/20130315155736/http://challengers.guinnessworldrecords.com/

challenges/160-fastest-100m-run-qwop-flash-game

[6] H. Muhlenbein and T. Mahnig, “A comparison of stochastic local search

and population based search,” in Proceedings of the 2002 Congress on

Evolutionary Computation. CEC’02 (Cat. No. 02TH8600), vol. 1. IEEE,

2002, pp. 255–260.

[7] J. Clune, B. E. Beckmann, C. Ofria, and R. T. Pennock, “Evolving

coordinated quadruped gaits with the hyperneat generative encoding,”

in 2009 IEEE Congress on Evolutionary Computation. IEEE, 2009,

pp. 2764–2771.[8] J. C. Gallagher, R. D. Beer, K. S. Espenschied, and R. D. Quinn,

“Application of evolved locomotion controllers to a hexapod robot,”

Robotics and Autonomous Systems, vol. 19, no. 1, pp. 95–103, 1996.

[9] V. K. Valsalam and R. Miikkulainen, “Modular neuroevolution for

multilegged locomotion,” in Proceedings of the 10th Annual Conference

on Genetic and Evolutionary Computation. ACM, 2008, pp. 265–272.

[10] R. A. Téllez, C. Angulo, and D. E. Pardo, “Evolving the walking

behaviour of a 12 dof quadruped using a distributed neural architec-

ture,” in International Workshop on Biologically Inspired Approaches to

Advanced Information Technology. Springer, 2006, pp. 5–19.

[11] M. A. Lewis, A. H. Fagg, A. Solidum et al., “Genetic programming

approach to the construction of a neural network for control of a walking

robot,” in IEEE International Conference on Robotics and Automation.

IEEE, 1992, pp. 2618–2623.

[12] G. B. Parker and W. T. Tarimo, “Using cyclic genetic algorithms to

learn gaits for an actual quadruped robot,” in 2011 IEEE International

Conference on Systems, Man, and Cybernetics. IEEE, 2011, pp. 1938–

1943.

[13] G. S. Hornby, S. Takamura, T. Yamamoto, and M. Fujita, “Autonomous

evolution of dynamic gaits with two quadruped robots,” IEEE transac-

tions on Robotics, vol. 21, no. 3, pp. 402–410, 2005.

[14] K. Wolff and P. Nordin, “Evolution of efficient gait with an autonomous

biped robot using visual feedback,” in Proceedings of the Mechatronics

Conference. University of Twente, Enschede, the Netherlands, vol. 10,

no. 1.822. Citeseer, 2002, p. 5922.

[15] R. Zhang, P. Vadakkepat, and C.-M. Chew, “An evolutionary algorithm

for trajectory based gait generation of biped robot,” in Proceedings of

the International Conference on Computational Intelligence, Robotics

and Autonomous Systems, vol. 144. Singapore, 2003.

[16] M.-Y. Cheng and C.-S. Lin, “Genetic algorithm for control design of

biped locomotion,” Journal of Robotic Systems, vol. 14, no. 5, pp. 365–

373, 1997.

[17] L. Yang, C.-M. Chew, A. N. Poo, and T. Zielinska, “Adjustable bipedal

gait generation using genetic algorithm optimized fourier series formula-

tion,” in 2006 IEEE/RSJ International Conference on Intelligent Robots

and Systems. IEEE, 2006, pp. 4435–4440.

[18] G. Dip, V. Prahlad, and P. D. Kien, “Genetic algorithm-based optimal

bipedal walking gait synthesis considering tradeoff between stability

margin and speed,” Robotica, vol. 27, no. 3, pp. 355–365, 2009.

[19] T. Arakawa and T. Fukuda, “Natural motion trajectory generation

of biped locomotion robot using genetic algorithm through energy

optimization,” in 1996 IEEE International Conference on Systems,

Man and Cybernetics. Information Intelligence and Systems (Cat. No.

96CH35929), vol. 2. IEEE, 1996, pp. 1495–1500.

[20] G. Capi, S.-i. Kaneko, K. Mitobe, L. Barolli, and Y. Nasu, “Optimal

trajectory generation for a prismatic joint biped robot using genetic

algorithms,” Robotics and Autonomous Systems, vol. 38, no. 2, pp. 119–

128, 2002.

[21] J. H. Holland, Adaptation in natural and artificial systems: an intro-

ductory analysis with applications to biology, control, and artificial

intelligence. MIT press, 1992.

[22] V. S. Gordon and D. Whitley, “Serial and parallel genetic algorithms as

function optimizers,” in ICGA, 1993, pp. 177–183.

[23] Zach Jones, “Totter: Genetic algorithms for qwop,” 2018. [Online].

Available: https://github.com/zachdj/totter

[24] Selenium HQ, “Selenium browser automation tool,” 2018. [Online].

Available: https://www.seleniumhq.org/

[25] Al Sweigart, “Pyautogui,” 2018. [Online]. Available: https://pyautogui.

readthedocs.io/en/latest/

[26] Stefan Weil, Egor Pugin, Z Denop, and Ray Smith, “Tesseract:

Optical character recognition,” 2018. [Online]. Available: https:

//github.com/tesseract-ocr/tesseract