Posted in Coding
Before trying something complicated, I used Copilot to write a function to square the argument passed in. I typed pub fn square and asked Copilot to do the rest.
Here and in the following examples, ➤➤ marks the point where I invoked Copilot. In this case it wrote the function exactly as it needed to be.
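The completed function can be sketched roughly like this; the source shows only the signature pub fn square, so the argument and return types are my assumptions:

```rust
// Hypothetical reconstruction: the post shows only `pub fn square`;
// the i32 argument and return type are assumptions.
pub fn square(x: i32) -> i32 {
    x * x
}
```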
Verdict: Somewhat uncommon language, textbook problem, and Copilot shines.
Writing an entire test
Next up were two methods that I knew didn’t have unit tests. Copilot is often said to be helpful when writing tests for existing code. So, let’s see what happens.
The first is a method to return a random position in the terrain. (The terrain is a square with square fields.)
The method returns the position as a tuple of two integers of type u32. Following good practice, the function takes a random number generator (RNG) as an argument, so that the RNG can be stubbed in tests to avoid non-deterministic behaviour.
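A minimal sketch of such a method, assuming names like Terrain, Rng, and random_pos (the real codebase's API may differ):

```rust
// Sketch of the method under test; `Terrain`, `Rng`, and `random_pos`
// are assumed names, not necessarily those of the real codebase.
pub trait Rng {
    /// Returns a value in 0..upper.
    fn next_u32(&mut self, upper: u32) -> u32;
}

pub struct Terrain {
    size: u32,
}

impl Terrain {
    pub fn new(size: u32) -> Self {
        Terrain { size }
    }

    /// Returns a random position as a tuple of two u32 coordinates.
    /// The RNG is passed in so that tests can stub it.
    pub fn random_pos(&self, rng: &mut impl Rng) -> (u32, u32) {
        (rng.next_u32(self.size), rng.next_u32(self.size))
    }
}
```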
First attempt at generating the full test
The first line is okay. It creates a 10x10 terrain. The next line makes sense, too. We need a random number generator. But there is a small problem already: the RNG's constructor doesn't take an argument. There's a different method to pass a seed, but the new method definitely should not be given an argument. This will result in a compile-time error, and the problem is easy for a programmer to spot and fix.
The next few lines are interesting. First off, an array (vector) is created. Its size is 100, which matches the overall number of fields in the terrain. Not sure whether that's a coincidence. Then, the code runs 100 iterations of a loop. Each time it asks for a random position and uses a method from the codebase to convert the two coordinates to an index. (The pos_to_index function simply returns x * size + y.) The generated test code then sets the corresponding entry in the array to true, basically remembering which positions the method under test has returned.
At the end it uses all to assert that all elements in the array were set to true or, in other words, that each position was returned. Given that it doesn't stub the random values, there's practically no chance that the real random number generator will actually return all 100 positions in 100 tries.
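To make concrete why this assertion is bound to fail, here is a sketch of the generated test's logic (names assumed), with a small linear congruential generator standing in for the real, unstubbed RNG:

```rust
// Sketch of the logic of the test Copilot generated (names assumed).
// A tiny LCG (Knuth's MMIX constants) stands in for the real RNG.
struct Lcg(u64);

impl Lcg {
    fn next_u32(&mut self, upper: u32) -> u32 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ((self.0 >> 33) % upper as u64) as u32
    }
}

// The conversion helper from the codebase, as described in the post.
fn pos_to_index(x: u32, y: u32, size: u32) -> u32 {
    x * size + y
}

/// Returns true only if 100 random draws covered all 100 fields of a
/// 10x10 terrain, which is what the generated test asserted.
fn all_positions_returned() -> bool {
    let size: u32 = 10;
    let mut rng = Lcg(42);
    let mut seen = vec![false; (size * size) as usize];
    for _ in 0..100 {
        let (x, y) = (rng.next_u32(size), rng.next_u32(size));
        seen[pos_to_index(x, y, size) as usize] = true;
    }
    seen.iter().all(|&s| s)
}
```

With 100 draws over 100 fields, the chance of hitting every field is 100!/100^100, which is practically zero.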
Verdict: Copilot picks up sensible pieces from the codebase. It does something with intent. But what it does doesn't make sense as a test.
Copilot can provide alternative suggestions. In this case there was one alternative:
Same first line but then… I can only assume the following: Copilot notices that a random number generator is used, and the training set for Copilot contains many badly written unit tests for code with random numbers, which have comments like the one Copilot generated here. There was no further code generated, which means that the comment is actually wrong, but that’s beside the point, I guess.
Verdict: Garbage in, garbage out?
Providing a bit more context manually
Now, getting Copilot to write the full test was clearly too ambitious. So I started by providing more context. I also added a comment to instruct Copilot what to do next, namely to set up the random number generator to stub two values, knowing that the method under test will consume two values.
Unfortunately, somehow the types get mixed up, and it tries to set the values on a field directly. There are two problems with this: the field's name is actually stubbed_seq and not stubbed_values, and the values are wrapped in an Option, which means that even if the name were right and the values were integers, the code still wouldn't compile.
Knowing that the method to stub values is called set_next_values, I tried the following:
A bit closer, but the method name is wrong, and it still passes a floating-point number.
Providing even more context manually
As I couldn’t get Copilot to set up the stubbed values, I wrote that line myself, too, and then asked it to complete the rest.
This is exactly the assert I would have written, too. We’re stubbing 2 and 3 as the next “random” values, knowing that the method under test will take these two random numbers to create the random pos. Copilot gets this completely right. I write three lines; it writes the fourth.
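Completed, the test can be sketched like this. The shapes of Terrain and the stub RNG follow the post's description (a stubbed_seq field wrapped in an Option, a set_next_values method), but the exact signatures are my assumptions:

```rust
// Hedged reconstruction of the finished test and its fixtures.
pub trait Rng {
    fn next_u32(&mut self, upper: u32) -> u32;
}

pub struct StubRng {
    // As described: the stubbed values are wrapped in an Option.
    stubbed_seq: Option<Vec<u32>>,
}

impl StubRng {
    pub fn new() -> Self {
        StubRng { stubbed_seq: None }
    }

    pub fn set_next_values(&mut self, values: &[u32]) {
        self.stubbed_seq = Some(values.to_vec());
    }
}

impl Rng for StubRng {
    fn next_u32(&mut self, _upper: u32) -> u32 {
        // Serve the stubbed values in order.
        self.stubbed_seq
            .as_mut()
            .expect("no stubbed values set")
            .remove(0)
    }
}

pub struct Terrain {
    size: u32,
}

impl Terrain {
    pub fn new(size: u32) -> Self {
        Terrain { size }
    }

    pub fn random_pos(&self, rng: &mut impl Rng) -> (u32, u32) {
        (rng.next_u32(self.size), rng.next_u32(self.size))
    }
}

// Three lines of setup by hand; the assert is the line Copilot completed.
fn returns_stubbed_position() {
    let terrain = Terrain::new(10);
    let mut rng = StubRng::new();
    rng.set_next_values(&[2, 3]);
    assert_eq!(terrain.random_pos(&mut rng), (2, 3));
}
```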
Verdict: It seems Copilot can “understand” complex code, but it can’t generate complex code (or the test name and the hints were too vague).
Writing a test for a more complicated method
Next up was another untested method. This one builds on the one discussed so far. It tries to find a random field in the terrain that is not occupied by a creature.
The code simply makes 20 attempts. Each time it creates a random position and checks whether there is a creature at that position. If there's no creature, it returns the position. If after 20 attempts it hasn't succeeded, the method returns None to indicate that there is no free position.
Of course, there are "more correct" ways to implement this, but in the real simulation fewer than 10% of the fields are occupied, which means that this simplistic implementation works well enough.
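A minimal sketch of this method, assuming the terrain tracks occupied fields in a set (the real data structures likely differ):

```rust
use std::collections::HashSet;

pub trait Rng {
    fn next_u32(&mut self, upper: u32) -> u32;
}

// Assumed representation: occupied fields stored in a set; the real
// codebase presumably stores creatures, not just positions.
pub struct Terrain {
    size: u32,
    occupied: HashSet<(u32, u32)>,
}

impl Terrain {
    pub fn new(size: u32) -> Self {
        Terrain { size, occupied: HashSet::new() }
    }

    pub fn place_creature(&mut self, pos: (u32, u32)) {
        self.occupied.insert(pos);
    }

    pub fn random_pos(&self, rng: &mut impl Rng) -> (u32, u32) {
        (rng.next_u32(self.size), rng.next_u32(self.size))
    }

    /// Makes 20 attempts to find a field without a creature;
    /// returns None if every attempt hits an occupied field.
    pub fn random_free_pos(&self, rng: &mut impl Rng) -> Option<(u32, u32)> {
        for _ in 0..20 {
            let pos = self.random_pos(rng);
            if !self.occupied.contains(&pos) {
                return Some(pos);
            }
        }
        None
    }
}
```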
Generating the full test
After the experience with the previous method I wasn't too hopeful, but I just had to try:
This surprised me. The structure of the test makes a lot of sense, and it contains all the elements required.
That said, it doesn't work as it should. In detail: in the first two lines it creates the terrain and an instance of the parameters object. It even uses the for_testing() constructor for the parameters. It then creates four creatures (with correct arguments!) and places them onto the terrain at different and valid positions. After these lines we have a 10x10 terrain with four creatures on it. Next, it creates the RNG and uses the method from the previous test to provide stubbed values for one position.
Then it asserts that the result is None, which will obviously fail. The method under test will first consume the stubbed values and check position (2, 3), which is unoccupied and will therefore be returned.
Verdict: Near perfect structure. Could be turned into a very sensible test with a few tweaks.
Generating the test in smaller steps
Now I wanted to see how Copilot would fare with more guidance.
I liked the idea of using four creatures to fill up the terrain, but to fill up the terrain with four creatures the terrain must be 2x2.
Asking for a completion resulted in only the line shown above. It creates a creature and places it at a valid position.
Next up it creates another creature at another valid position.
And another creature at another valid position.
This surprised me again. Breaking from the pattern, Copilot not only generates the code to create a creature at the fourth and last valid position, it completes the test in a meaningful way. This test will pass, and it does roughly what it should.
There’s only one wrinkle: the RNG is stubbed with values for two positions, (0, 0) twice. The method under test will consume and check those, and will then use 18 truly random positions. Either way, because all positions are occupied, this doesn’t matter, and chances are that all four positions are checked.
Ideally, I would have stubbed the RNG values as follows, to make sure that all four positions are definitely tried, but that was an easy change to make.
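The tweak can be sketched as follows, stubbing one coordinate pair per field so that all four positions of the 2x2 terrain are definitely tried. (Terrain, StubRng, and set_next_values are assumed names following the post's description; the real stub presumably falls back to a true RNG once the stubbed values run out, while this sketch falls back to 0 to stay deterministic.)

```rust
use std::collections::HashSet;

pub trait Rng {
    fn next_u32(&mut self, upper: u32) -> u32;
}

// Assumed shape of the stub RNG.
pub struct StubRng {
    stubbed_seq: Option<Vec<u32>>,
}

impl StubRng {
    pub fn new() -> Self {
        StubRng { stubbed_seq: None }
    }

    pub fn set_next_values(&mut self, values: &[u32]) {
        self.stubbed_seq = Some(values.to_vec());
    }
}

impl Rng for StubRng {
    fn next_u32(&mut self, _upper: u32) -> u32 {
        match self.stubbed_seq.as_mut() {
            Some(seq) if !seq.is_empty() => seq.remove(0),
            _ => 0, // deterministic stand-in for the real RNG fallback
        }
    }
}

// Assumed terrain representation: occupied fields stored in a set.
pub struct Terrain {
    size: u32,
    occupied: HashSet<(u32, u32)>,
}

impl Terrain {
    pub fn new(size: u32) -> Self {
        Terrain { size, occupied: HashSet::new() }
    }

    pub fn place_creature(&mut self, pos: (u32, u32)) {
        self.occupied.insert(pos);
    }

    pub fn random_pos(&self, rng: &mut impl Rng) -> (u32, u32) {
        (rng.next_u32(self.size), rng.next_u32(self.size))
    }

    pub fn random_free_pos(&self, rng: &mut impl Rng) -> Option<(u32, u32)> {
        for _ in 0..20 {
            let pos = self.random_pos(rng);
            if !self.occupied.contains(&pos) {
                return Some(pos);
            }
        }
        None
    }
}

fn returns_none_when_terrain_is_full() {
    let mut terrain = Terrain::new(2);
    for pos in [(0, 0), (0, 1), (1, 0), (1, 1)] {
        terrain.place_creature(pos);
    }
    let mut rng = StubRng::new();
    // One (x, y) pair per field: all four positions are definitely tried.
    rng.set_next_values(&[0, 0, 0, 1, 1, 0, 1, 1]);
    assert_eq!(terrain.random_free_pos(&mut rng), None);
}
```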
Verdict: Definitely helpful.
This is an excerpt of a longer session that exemplifies my experience, which was a bit of a rollercoaster ride. Some things just worked, others failed miserably, and sometimes I was truly surprised. Overall, though, even for a not-so-common programming language like Rust and a codebase that uses more complicated data structures, I found Copilot helpful.