Introduction#

What Is Numerical Software#

ENIAC, the first electronic digital computer, was invented to quickly obtain artillery range tables. The history of the computers started with solving numerical problems. Taking advantage of the speed of light, the machines are used for a huge amount of calculation that cannot be done otherwise. Admittedly, computers are now much more widely used for its versatility for information technology, but its use for mathematics and science remains equally if not more important.

The general flow of making software to solve the mathematical and scientific problems is to translate the mathematics to numerical methods, and the numerical methods to computer code. The former two disciplines are much more established than the latter. But we do not really have universal laws to guide the code develop, for which software engineering is required to reliably produce trustworthy results. What we have are the “best practices”, which are very different from how science works. This is the setting for numerical software, which is also known as technical software, scientific software, engineering software, etc.

Numerical software always has an application domain attached, and cannot be handled solely in computer science. But it cannot exist without computer science, since it uses computers anyway. Naturally it is cross-discipline and requires knowledge and skills in two or more fields from the practitioners. Some famous open-source scientific software systems:

Infrastructure: NumPy
Data analytics: Pandas, Arrow, PyTables
Linear algebra: BLAS, LAPACK
Geometry: boost.geometry
Visualization: VTK, Matplotlib
Machine learning: PyTorch

Despite the versatility, numerical software shares common traits:

Not visually pleasant, oftentimes no graphical user interface
Knowledge-intensive, unintuitive to code
Computation-intensive, often incorporating parallelism, distributed computing, and special hardware

Why Develop Numerical Software#

Numerical software is developed to solve problems that are impracticable without it. The solution allows us to study the impracticable problems, e.g., those in the fields of fluid dynamics and astrophysics. The software may significantly reduce the cost to solution, so that industrial products can be derived from the solution, e.g., machine learning, visualization, communication, etc.

Note

Like developing any software, the true driver must be identified so that the system can be properly specified.

A pattern is usually found in the development of numerical software:

Observation
Generalize to a theory in math
Obtain analytical solutions for simple setup
Get stuck with complex setup
Numerical analysis comes to rescue
… a lot of code development …
Release a software package

Hybrid Architecture#

A hybrid architecture is employed to achieve the highest-possible performance and the most flexible computation that are simultaneously required by numerical software. The system is composed of a fast, low-level computing engine and an easier-to-use, high-level scripting layer. It is usually developed as a platform, working like a library that provides data structures and helpers for problem solving. The users will use a scripting engine it provides to build applications. Assembly is allowed in the low-level computing engine to utilize every drop of hardware: multi-core, multi-threading, cache, vector processing, etc.

Note

Performance and flexibility are usually mutually exclusive. The hybrid architecture achieves both in the same time by sacrificing ease of use.

As such, numerical software rarely provides decent graphical user interface (GUI). Using numerical software is like using a library, rather than a off-the-shelf software product. Even if you do purchase a commercial software package for numerical calculation, the full capabilities need to be accessed by scripting.

A general description of the architecture is like the following layers, from high-level to low-level:

External result
- This is presented in a non-technical way to people outside the problem-solving team. They can be stakeholders for business or general public. The result has to be generated in some way, which may or may not be included in the numerical software we make.
Problem presentation: physics, math, or equations
- Users use the software or associated tools to present the technical result.
Scripting or configuration
- Users follow the example scripts to configure the problems to solve. Configuration files may also be used.
Library interface
- This defines the application programming interface (API) for the numerical software. Scripts should not touch anything below this layer.
Library structure
- This is where we architect the software. Good book-keeping code is here to separate the interface and the computing kernel. Data structures are designed at this layer to make sure no time is wasted in copying or converting data.
Computing kernel
- This is the place the does the heavy-lifting, and where we do most of the optimization.

Pattern 1: Research Code#

For a research code, the boundary between external result, problem presentation, and scripting, and that between library interface, library structure, and computing kernel, may be less clear. The architecture is usually like:

Problem presentation: high-level description, physics, and scripting / code configuration
Library implementation

But sometimes if we don’t pay attention to architecture, there may be no boundary between anything.

Pattern 2: Full-Fledged Application#

For a commercial grade package, each of the layers will include more sub-layers. It is a challenge to prevent those layers or sub-layers from interweaving. From users’ point of view, the sophistication appears in the problem presentation and the scripting layers. Developers, on the other hand, take care of everything below problem presentation, so that users can focus on problem solving.

Pattern 3: Scripting for Modularization#

At this point, it should be clear that the scripting layer is the key glue in the system architecture. The high-level users, who use the code for problem solving, wouldn’t want to spend time in the low-level implementation. Instead, they will specify the performance of the API exposed in the scripting layer. The performance may be about the quality of result and runtime (including memory).

The scripting layer can separate the programming work between the high-level problem presentation and the low-level library implementation. A scripting language is usually dynamically typed, while for speed, the low-level implementation language uses static typing system. In the dynamic scripting language, unit-testing is required for robustness. In a statically typed language like C++, the compiler and static analyzers are very good at detecting errors before runtime. But the great job done by the compiler makes it clumsy to use C++ to quickly write highly flexible code for problem presentation.

It is tempting to invent one programming language to rule them all. That approach needs to convince both the high-level problem solvers and the low-level implementers to give up the tools they are familiar with. The new language will also need to provide two distinct styles for both use cases. It will be quite challenging, and before anyone succeeds with the one-language approach, we still need to live with a world of hybrid systems.

Numerical Software = C++ + Python#

The key to a successful numerical software system is make it uncompromisingly fast and extremely flexible. It should be flexible enough so that users, i.e., scientists and engineers, can easily write lengthy programs to control everything. It should be noted that, although the users program in the system, they by no means know about computer science.

Not all programming languages can meet the expectation. To this point, the most suitable scripting language is Python, and the most suitable low-level language may be C++. The choice of C++ can be controversial, but considering the support it received from the industry, it’s probably difficult to find another language of higher acceptance. Our purpose here is to introduce the skills for developing numerical software, not to analyze programming languages. We will focus on C++ and Python.

More Reasons to Use Python

Python provides a better way to describe the physical or mathematical problem.
Python can easily build an even higher-level application, using GUI, scripting, or both.
Is there alternative for C++? No. For Python? Yes. But Python is the easiest choice for its versatility and simplicity.
A numerical software developer sees through the abstraction stack:
- The highest-level application is presented as a Python script.
- The Python script drives the number-crunching C++ library.
- C++ is the syntactic sugar for the machine code.