Pragmatic Unit Testing for Scientific Codes

It is widely accepted that unit tests are a useful tool to ensure high quality and maintainable software. Some of the main benefits are:

Increasing robustness of code
Improving ease and efficiency of later reworks
Improving modularity and well-defined interfaces

However, doing unit tests on scientific code is not always so simple, nor is it so popular. In my experience of the countless unit testing examples available for non-scientific applications, the unit to be tested is usually something with obvious right and wrong answers to a human. This is quite common for codes that are designed to automate rather than simulate.

A simulation code often numerically analyses something that it would be impractical for a human to calculate themself in any kind of reasonable time. Take for example a unit code that finds the root of a numerical function; it is perhaps not obvious to say what the root is given a certain set of inputs, although there's a good chance the scientist or engineer will know an incorrect result when they see one. This kind of behaviour leads scientists or engineers to say things like "scientific code is not suitable for unit testing" or "unit testing brings so little value in checking for good results I'd rather spend my time on validation".

However, I believe that it is essential for a modern scientific code to be well unit tested for the reasons stated in the first paragraph. The question really is "what can we do to bring more value to scientific unit testing given that it's tricky"!

I'm going to suggest now two types of test we can do on a unit of scientific code, which if well implemented can give us high probability that it is working correctly. The two types of test are:

Special Cases
Trends

I will use as an example a function that calculates the gravitational force between two bodies given their masses and the distance between them (Newton's law of universal gravitation is the obvious implementation but as a tester we are not interested in the implementation, only the results it gives).

Special Cases

Here, we want to test the return value of the function in certain special cases where the result is evident (at least to someone who understands the science of the code they're writing!). In the case, of a gravitational force, there are at least two clear special cases:

Test that when either of the masses is zero, the force is also zero.
Test what happens when the distance is zero. Depending on your case, this might either be to throw an exception or error or to return infinity.

You may also want to test that a certain exception is thrown when the distance is negative.

A temptation can be to also test the result in a few non-special cases (e.g. by making up two masses and a distance) by calculating it yourself manually. I am advocating here that you do not do this. This is mainly because I don't believe it's very effective at catching incorrect implementation. For example if you have got the equation or methodology wrong then testing your function by manually calculating according to the wrong function or methodology is going to lead to a false positive. We want our tests to be independent of the implementation. A better approach is to implement trend testing.

Trend Testing

Once we have tested our function for a few special cases, we now need to test that it follows the expected form in the domain between these special values. For example, in the case of the gravitational force function I would test the following:

Result is linear (and monotonically increasing) as a function of each input mass.
Result monotonically decreases with distance.
Result asymptotically tends towards zero as the distance goes large.

By numerical evaluation, you cannot test these trends for the full infinitesimal resolution of floating point values, but testing for a few values is probably going to give you reasonable certainty. For example, to test the effect of mass you could test that doubling the mass doubles the force, or even set up a loop to test that the mass keeps doubling or tripling or whatever factor of your choosing.

Re-implementing trend tests for every unit that you need to test could be quite time consuming. For this reason, Bitbloom wrote our own library PhySpecs in C++ to provide reusable and robust trend measures to make trend testing quick and effective. This includes tests for the following trends and properties:

Polynomial order and coefficients (of which linearity is a special case)
Monotonicity
Asymptotes (including singularities)

Contact me at philip@bitbloom.tech if you would like to find out more about our PhySpecs library.

Conclusion

Hopefully I have managed to demonstrate that testing of scientific functions can be quick and powerful in testing the building blocks of which a complex scientific code is built. Of course, these methods don't provide 100% certainty in guaranteeing correct results, but they get us a lot closer than not doing these unit tests and will reduce the time spent finding and fixing bugs in the light of validation or other higher-level tests.

Philip Bradstock

Co-founder, CSO

Philip BradstockFebruary 2, 2021blog, unit testing, scientific code