Erik's blog: Taking Copilot to difficult terrain

09 June 2023

Seeing all the experiments with GitHub Copilot around me, I decided to take Copilot to some difficult terrain (pun intended, see below). A lot of the positive experience is reported with very common programming languages, JavaScript and Python especially, and with code related to web applications. But how would Copilot fare with a less common language and code that involves more complicated data structures? To find out, I turned to Crellinor, my genetic programming / artificial life simulator written in Rust, which I have talked about before, and I set out to fix some TODOs in the Terrain class.

Warm-up

Before trying something complicated, I used Copilot to write a function to square the argument passed in. I typed pub fn square and asked Copilot to do the rest.

pub fn square  ➤➤  <T>(x: T) -> T
    where T: Copy + Mul<Output = T>  {
    x * x
}

Here and in the following examples ➤➤ marks the point where I invoked Copilot. In this case it wrote the function exactly like it needed to be.
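For anyone who wants to try the completion outside the project, here is a self-contained version. The only addition is the `use std::ops::Mul;` import, which the snippet above relies on but does not show; where that import lives in the real codebase is an assumption.

```rust
use std::ops::Mul;

/// Squares any value whose type can be copied and multiplied by itself.
pub fn square<T>(x: T) -> T
where
    T: Copy + Mul<Output = T>,
{
    x * x
}
```

Because the function is generic, `square(3)` and `square(1.5_f64)` both compile without further changes.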

Verdict: Somewhat uncommon language, textbook problem, and Copilot shines.

Writing an entire test

Next up were two methods that I knew didn’t have unit tests. Copilot is often said to be helpful when writing tests for existing code. So, let’s see what happens.

The first is a method to return a random position in the terrain. (The terrain is a square with square fields.)

pub fn rand_pos(&self, rng: &mut RNG) -> (u32, u32) {
    (rng.next_u32(self.size), rng.next_u32(self.size))
}

The method returns the position as a tuple of two integers of type u32. Following good practice, the function takes a random number generator (RNG) as an argument, so that the RNG can be stubbed in tests in order to avoid non-deterministic behaviour.
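The real RNG type isn't shown in this post, so the following is only a minimal sketch of what a stubbable generator of this kind could look like. The names `RNG::new`, `set_next_values`, `stubbed_seq`, `next_u32` and `Terrain::with_size` are taken from this post; the field types, the xorshift fallback and the fixed internal seed are assumptions made for illustration.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the simulator's RNG (not the real implementation).
pub struct RNG {
    state: u64,
    /// Values to hand out before falling back to pseudo-random numbers.
    stubbed_seq: Option<VecDeque<u32>>,
}

impl RNG {
    /// The constructor takes no argument; the internal seed is an arbitrary non-zero constant.
    pub fn new() -> RNG {
        RNG { state: 0x9E37_79B9_7F4A_7C15, stubbed_seq: None }
    }

    /// Queues values that the next calls to `next_u32` return in order.
    pub fn set_next_values(&mut self, values: &[u32]) {
        self.stubbed_seq = Some(values.iter().copied().collect());
    }

    /// Returns a stubbed value if one is queued, otherwise a value in `0..limit`.
    /// Assumes `limit > 0`.
    pub fn next_u32(&mut self, limit: u32) -> u32 {
        if let Some(v) = self.stubbed_seq.as_mut().and_then(|seq| seq.pop_front()) {
            return v;
        }
        // xorshift64 step, standing in for whatever the real generator does
        self.state ^= self.state << 13;
        self.state ^= self.state >> 7;
        self.state ^= self.state << 17;
        ((self.state >> 32) % u64::from(limit)) as u32
    }
}

/// Hypothetical, heavily reduced terrain: only the size is modelled.
pub struct Terrain {
    size: u32,
}

impl Terrain {
    pub fn with_size(size: u32) -> Terrain {
        Terrain { size }
    }

    /// Same body as the method under test: two draws, one per coordinate.
    pub fn rand_pos(&self, rng: &mut RNG) -> (u32, u32) {
        (rng.next_u32(self.size), rng.next_u32(self.size))
    }
}
```

With this shape, a test can queue two values and knows exactly which position `rand_pos` will return, while production code never calls `set_next_values` and gets pseudo-random positions.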

First attempt at generating the full test

#[test]
fn rand_pos_returns_random_positions()  ➤➤  {
    let t = Terrain::with_size(10);
    let mut rng = RNG::new(123);
    let mut seen = vec![false; 100];
    for _ in 0..100 {
        let p = t.rand_pos(&mut rng);
        seen[t.pos_to_idx(p)] = true;
    }
    assert!(seen.iter().all(|&b| b));
}

The first line is okay. It creates a 10x10 terrain. The next line makes sense, too: we need a random number generator. But there is a small problem already: the RNG’s constructor doesn’t take an argument. There’s a different method to pass a seed, but new definitely should not be given an argument. This results in a compile-time error, a problem that is easy for a programmer to spot and fix.

The next few lines are interesting. First off, a vector is created. Its size is 100, which matches the overall number of fields in the terrain. Not sure whether that’s a coincidence. Then, the code runs 100 iterations of a loop. Each time it asks for a random position and uses a method from the codebase to convert the two coordinates to an index. (The pos_to_idx method simply returns x * size + y.) The generated test code then sets the corresponding entry in the vector to true, basically remembering which positions the method under test has returned.
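The index calculation explains why a vector of 100 entries fits a 10x10 terrain. Here is a small sketch of that mapping, written as a free function with an explicit `size` parameter; in the codebase it is a method on the terrain, so this signature is an assumption.

```rust
/// Maps a position to a flat index using the formula from the post: x * size + y.
/// Hypothetical free-function form; the real code is a method on the terrain.
pub fn pos_to_idx(size: u32, (x, y): (u32, u32)) -> usize {
    (x * size + y) as usize
}
```

For a terrain of size 10, the positions (0, 0) to (9, 9) map to the indexes 0 to 99, one index per field.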

At the end it uses all to assert that all elements in the array were set to true or, in other words, that each position was returned. Given that it doesn’t stub the random values, there’s practically no chance that the real random number generator will actually return all 100 positions in 100 tries.
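To put a number on "practically no chance": if the 100 draws are independent and uniform over the 100 fields (an assumption about the generator), the test only passes when every draw hits a field that hasn't been seen yet, which has probability 100!/100^100. The function name below is made up for this sketch.

```rust
/// Probability that `n` independent, uniform draws from `n` fields hit every
/// field exactly once, i.e. n! / n^n, computed as a running product.
pub fn prob_all_fields_hit(n: u32) -> f64 {
    (1..=n).map(|k| f64::from(k) / f64::from(n)).product()
}
```

For n = 100 this comes out at roughly 9 × 10^-43, so the generated test would fail on essentially every run.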

Verdict: Copilot picks up sensible pieces from the codebase. It does something with intent. But what it does doesn’t make sense as a test.

Second attempt

Copilot can provide alternative suggestions. In this case there was one alternative:

#[test]
fn rand_pos_returns_random_positions()  ➤➤  {
    let t = Terrain::with_size(10);
    // this test is not deterministic, but it should pass most of the time
}

Same first line but then… I can only assume the following: Copilot notices that a random number generator is used, and the training set for Copilot contains many badly written unit tests for code with random numbers, which have comments like the one Copilot generated here. There was no further code generated, which means that the comment is actually wrong, but that’s beside the point, I guess.

Verdict: Garbage in, garbage out?

Providing a bit more context manually

Getting Copilot to write the full test was clearly too ambitious, so I started providing more context. I also added a comment to instruct Copilot what to do next, namely to set up the random number generator to stub two values, knowing that the method under test will consume two values.

#[test]
fn rand_pos_returns_random_positions() {
    let t = Terrain::with_size(10);
    let mut rng = RNG::new();
    // set two stubbed values in RNG and assert that position matches those values
    ➤➤
    rng.stubbed_values = vec![0.1, 0.2];
}

Unfortunately, the types get mixed up: the suggested values are floating point numbers rather than integers. Copilot also tries to set the values on a field directly, which has two further problems: the field’s name is actually stubbed_seq and not stubbed_values, and the values are wrapped in an Option. Even if the name were right and the values were integers, the code still wouldn’t compile.

Knowing that the method to stub values is called set_next_values, I tried the following:

    // set next values in RNG for random position.
    ➤➤
    rng.set_next(0.0);
}

A bit closer, but the method name is wrong, and it still passes a floating point number.

Providing even more context manually

As I couldn’t get Copilot to set up the stubbed values, I wrote that line myself, too, and then asked it to complete the rest.

#[test]
fn rand_pos_returns_random_positions() {
    let t = Terrain::with_size(10);
    let mut rng = RNG::new();
    rng.set_next_values(&[2, 3]);  ➤➤
    assert_eq!((2, 3), t.rand_pos(&mut rng));
}

This is exactly the assert I would have written, too. We’re stubbing 2 and 3 as the next “random” values, knowing that the method under test will take these two random numbers to create the random position. Copilot gets this completely right. I write three lines; it writes the fourth.

Verdict: It seems Copilot can “understand” complex code, but it can’t generate complex code (or the test name and the hints were too vague).

Writing a test for a more complicated method

Next up was another untested method. This one builds on the one discussed so far. It tries to find a random field in the terrain that is not occupied by a creature.

pub fn rand_free_pos(&self, rng: &mut RNG) -> Option<(u32, u32)> {
    for _ in 0..20 {
        let p = self.rand_pos(rng);
        if self.creature_at(p).is_none() {
            return Some(p);
        }
    }
    None
}

The code simply makes 20 attempts. Each time it creates a random position and checks whether there is a creature at the position. If there’s no creature it returns the position. If after 20 attempts it hasn’t succeeded the method returns None to indicate that there is no free position.

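Of course, there are “more correct” ways to implement this, but in the real simulation fewer than 10% of the fields are occupied, which means that this simplistic implementation works well enough: for the method to give up, 20 independent attempts would all have to land on occupied fields, and at that occupancy the probability of that is below 10^-20.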
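For comparison, one such “more correct” variant would collect every free field first and then pick one of them, so it only returns None when the terrain really is full. This is a sketch, not code from Crellinor: the function name is invented, the set of occupied positions replaces the creature lookup, and `pick(n)` is an assumed stand-in for a random draw from `0..n`.

```rust
use std::collections::HashSet;

/// Hypothetical exhaustive variant: gather all free positions, then choose one.
/// `pick(n)` is assumed to return a value in `0..n`.
pub fn rand_free_pos_exhaustive(
    size: u32,
    occupied: &HashSet<(u32, u32)>,
    mut pick: impl FnMut(u32) -> u32,
) -> Option<(u32, u32)> {
    // Walk the whole terrain once and keep the positions without a creature.
    let free: Vec<(u32, u32)> = (0..size)
        .flat_map(|x| (0..size).map(move |y| (x, y)))
        .filter(|p| !occupied.contains(p))
        .collect();
    if free.is_empty() {
        return None;
    }
    let i = pick(free.len() as u32) as usize;
    Some(free[i])
}
```

The price is a full scan of the terrain on every call, which is why the 20-attempt version is a reasonable trade-off on a sparsely populated terrain.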

Generating the full test

After the experience with the previous method I wasn’t too hopeful but I just had to try:

#[test]
fn rand_free_pos_returns_none_if_all_occupied  ➤➤  () {
    let params = Params::for_testing();
    let mut t = Terrain::with_size(10);
    t.set_creature_at(Some(Creature::new(vec![NOP], &params)), (1, 0));
    t.set_creature_at(Some(Creature::new(vec![NOP], &params)), (2, 1));
    t.set_creature_at(Some(Creature::new(vec![NOP], &params)), (1, 2));
    t.set_creature_at(Some(Creature::new(vec![NOP], &params)), (0, 1));
    let mut rng = RNG::new();
    rng.set_next_values(&[2, 3]);
    assert_eq!(None, t.rand_free_pos(&mut rng));
}

This surprised me. The structure of the test makes a lot of sense, and it contains all the elements required.

That said, it doesn’t work as it should. In detail: In the first two lines it creates the terrain and an instance of the parameters object. It even uses the for_testing() constructor for the parameters. It then creates four creatures (with correct arguments!) and places them onto the terrain at different and valid positions. After these lines we have a 10x10 terrain with four creatures on it. Next, it creates the RNG and uses the method from the previous test to provide stubbed values for one position.

Then it asserts that rand_free_pos returns None, which will obviously fail. The method under test will first consume the stubbed values and check position (2, 3), which is unoccupied, and therefore will return it.

Verdict: Near perfect structure. Could be turned into a very sensible test with a few tweaks.

Generating the test in smaller steps

Now I wanted to see how Copilot would fare with more guidance.

#[test]
fn rand_free_pos_returns_none_if_all_occupied() {
    let params = Params::for_testing();
    let mut t = Terrain::with_size(2);

I liked the idea of using four creatures to fill up the terrain, but to fill up the terrain with four creatures the terrain must be 2x2.

    ➤➤
    t.set_creature_at(Some(Creature::new(vec![NOP], &params)), (0, 0));

Asking for a completion resulted in only the line shown above. It creates a creature and places it at a valid position.

    ➤➤
    t.set_creature_at(Some(Creature::new(vec![NOP], &params)), (1, 0));

Next up it creates another creature at another valid position.

    ➤➤
    t.set_creature_at(Some(Creature::new(vec![NOP], &params)), (0, 1));

And another creature at another valid position.

    ➤➤
    t.set_creature_at(Some(Creature::new(vec![NOP], &params)), (1, 1));
    let mut rng = RNG::new();
    rng.set_next_values(&[0, 0, 0, 0]);
    assert_eq!(None, t.rand_free_pos(&mut rng));
}

This surprised me again. Breaking from the pattern, Copilot not only generates the code to create a creature at the fourth and last valid position, it completes the test in a meaningful way. This test will pass, and it does roughly what it should.

There’s only one wrinkle: the RNG is stubbed with values for just two positions, and both of them are (0, 0). The method under test will consume those values and check (0, 0) twice. It will then use 18 truly random positions. Because all positions are occupied, the result is None either way, and chances are that all four positions are checked.

Ideally, I would have stubbed the RNG values as follows, to make sure that all four positions are definitely tried, but that was an easy change to make.

    rng.set_next_values(&[0, 0, 0, 1, 1, 0, 1, 1]);
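To see which positions a given stub sequence actually probes, the method under test can be modelled in isolation. The sketch below is a simplified model, not the project code: a set of occupied positions replaces the creature lookup, `next(limit)` stands in for `RNG::next_u32`, and the helper `replay` (a made-up name) returns 0 once the stubbed values run out, whereas the real generator would switch to random numbers.

```rust
use std::collections::HashSet;

/// Simplified model of `rand_free_pos` that also records the probed positions.
pub fn rand_free_pos_traced(
    size: u32,
    occupied: &HashSet<(u32, u32)>,
    mut next: impl FnMut(u32) -> u32,
) -> (Option<(u32, u32)>, Vec<(u32, u32)>) {
    let mut probed = Vec::new();
    for _ in 0..20 {
        // Two draws per attempt, first x and then y, as in `rand_pos`.
        let x = next(size);
        let y = next(size);
        let p = (x, y);
        probed.push(p);
        if !occupied.contains(&p) {
            return (Some(p), probed);
        }
    }
    (None, probed)
}

/// Hypothetical helper: replays `values` in order, then keeps returning 0.
pub fn replay(values: Vec<u32>) -> impl FnMut(u32) -> u32 {
    let mut it = values.into_iter();
    move |_limit| it.next().unwrap_or(0)
}
```

On a fully occupied 2x2 terrain, replaying `[0, 0, 0, 1, 1, 0, 1, 1]` probes (0, 0), (0, 1), (1, 0) and (1, 1) in the first four attempts and ends with None. The same model also shows why the earlier 10x10 test fails: with creatures at (1, 0), (2, 1), (1, 2) and (0, 1) and the stubbed values 2 and 3, the first probe is the free position (2, 3), which is returned immediately.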

Verdict: Definitely helpful.

Summary

This is an excerpt of a longer session that exemplifies my experience, which was a bit of a rollercoaster ride. Some things just worked, others failed miserably, and sometimes I was truly surprised. Overall, though, even for a not-so-common programming language like Rust, and with a codebase that uses more complicated data structures, I found Copilot helpful.