RFC: a convention for error handling #25

@shailesh1729

This RFC proposes a convention for structuring methods in SciRust
that caters to the conflicting needs of efficiency, ease of use,
and effective error handling.

For the impatient:

// Efficient access without bounds checks
unsafe fn get_unchecked(&self, r: usize, c: usize) -> T;
// Safe access with bounds checks; returns an error if the index is invalid
fn get_checked(&self, r: usize, c: usize) -> Result<T, Error>;
// User-friendly version; panics in case of error
fn get(&self, r: usize, c: usize) -> T;

// Efficient modification without bounds checks
unsafe fn set_unchecked(&mut self, r: usize, c: usize, value: T);

// Safe modification with bounds checks
fn set(&mut self, r: usize, c: usize, value: T);

Detailed discussion

The audience of SciRust can be divided into
two usage scenarios.

  • A script style usage, where the objective is to quickly
    do some numerical experiment, get the results and analyze them.
  • A library development usage, where more professional libraries
    would be built on top of fundamental building blocks provided
    by SciRust (these may be other modules shipped in SciRust itself).

While the first usage scenario is important for getting new users hooked
on the library, the second is just as important for justifying
why Rust should be used for scientific software development instead
of other scientific computing platforms.

In the context of these two usage scenarios, the design of SciRust has three conflicting goals:

  • Ease of use
  • Efficiency
  • Well-managed error handling

While ease of use is important for script-style usage,
efficiency and well-managed error handling are important
for serious software development on top of the core components
provided by SciRust.

We will consider the example of a get(r,c) method
on a matrix object to discuss these conflicting goals.
Please note that get is just a representative method
for this discussion. The design ideas can be applied in
many different parts of SciRust once accepted.

If get is called in a loop, the code around it can usually
ensure that every access stays within the boundary of the
matrix. In that case, bounds checking within the implementation
of get is just extra overhead.
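
As a rough sketch of this situation, assume a hypothetical column-major Matrix struct (the struct, its fields, and the sum_all helper are illustrative names, not SciRust's actual API). The loop ranges already guarantee valid indices, so an unchecked getter avoids repeating the same test on every access:

```rust
/// Hypothetical dense matrix in column-major order (illustration only).
pub struct Matrix {
    rows: usize,
    cols: usize,
    data: Vec<f64>,
}

impl Matrix {
    /// Builds a `rows x cols` matrix with every element set to `value`.
    pub fn filled(rows: usize, cols: usize, value: f64) -> Matrix {
        Matrix { rows, cols, data: vec![value; rows * cols] }
    }

    /// Reads element (r, c) without any bounds check.
    ///
    /// # Safety
    /// The caller must guarantee `r < self.rows` and `c < self.cols`.
    pub unsafe fn get_unchecked(&self, r: usize, c: usize) -> f64 {
        // SAFETY: the caller's guarantee implies c * rows + r < data.len().
        unsafe { *self.data.get_unchecked(c * self.rows + r) }
    }
}

/// Sums all elements; the loop ranges keep every index in bounds.
pub fn sum_all(m: &Matrix) -> f64 {
    let mut total = 0.0;
    for c in 0..m.cols {
        for r in 0..m.rows {
            // SAFETY: r < m.rows and c < m.cols by construction of the ranges.
            total += unsafe { m.get_unchecked(r, c) };
        }
    }
    total
}
```

The unsafe block in sum_all marks the one place where the caller takes over responsibility for the index check.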

While skipping the bounds check is good for writing efficient software,
it can lead to a number of memory-related bugs and goes
against the fundamental philosophy of Rust (safety first).
If the bounds are checked instead, there are two options for error handling:

  • Returning either Option<T> or Result<T, Error>.
  • Using the panic mechanism.

Option<T> or Result<T, Error> gives users
fine-grained control over what to do when an error occurs.
This is certainly the Rusty way of doing things. At the
same time, both of these return types make user code
more complicated. One has to add extra calls to .unwrap()
even when one is sure that the function is not going to fail.
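
The trade-off can be sketched as follows. The Error enum, the Matrix struct, and the helper functions here are assumptions made for illustration; the RFC itself only names the return type Result<T, Error>:

```rust
/// Hypothetical error type standing in for the RFC's `Error`.
#[derive(Debug, PartialEq)]
pub enum Error {
    IndexOutOfBounds,
}

/// Hypothetical dense matrix in column-major order (illustration only).
pub struct Matrix {
    rows: usize,
    cols: usize,
    data: Vec<f64>,
}

impl Matrix {
    /// Wraps column-major `data`; panics if its length is not `rows * cols`.
    pub fn from_column_major(rows: usize, cols: usize, data: Vec<f64>) -> Matrix {
        assert_eq!(data.len(), rows * cols, "data length must equal rows * cols");
        Matrix { rows, cols, data }
    }

    /// Bounds-checked access that reports failure through `Result`.
    pub fn get_checked(&self, r: usize, c: usize) -> Result<f64, Error> {
        if r < self.rows && c < self.cols {
            Ok(self.data[c * self.rows + r])
        } else {
            Err(Error::IndexOutOfBounds)
        }
    }
}

/// Fine-grained control: the caller decides what an invalid index means.
pub fn get_or_zero(m: &Matrix, r: usize, c: usize) -> f64 {
    match m.get_checked(r, c) {
        Ok(value) => value,
        Err(Error::IndexOutOfBounds) => 0.0,
    }
}

/// The cost: even when the indices are known to be valid (a matrix that is
/// at least 2 x 2), every call still needs an `.unwrap()`.
pub fn trace_2x2(m: &Matrix) -> f64 {
    m.get_checked(0, 0).unwrap() + m.get_checked(1, 1).unwrap()
}
```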

Users of scientific computing tend to prefer an environment
where they can get more work done with less effort. This is
one reason for the success of specialized environments like
MATLAB. Open source environments like Python (NumPy, SciPy)
try to achieve something similar.

While SciRust doesn't intend to compete at the level of
simplicity provided by MATLAB/Python environments, it does
intend to make an extra effort wherever possible to address
the ease-of-use goal.
In this context, the return type of a getter should
be just the value type T. This can be achieved
safely by panicking if the access boundary
conditions are not met.

The discussion above suggests up to three possible ways of
implementing methods like get.

  • An unchecked (and unsafe) version for high-efficiency code
    where the calling code is responsible for ensuring that
    the necessary requirements for correct execution of the
    method are being met.
  • A safe version which returns either Option<T> or
    Result<T, Error> which can be used for professional
    software development where the calling code has full control
    over error handling.
  • Another safe version which panics in case of error but provides
    an API which is simpler to use for writing short scientific
    computing scripts.

Proposed convention

We propose that a method for which these variations
need to be supported should follow the convention defined below:

  • A method_unchecked version should provide the basic implementation
    of the method. It should assume that the necessary conditions
    for successful execution of the method are already
    ensured by the calling code. The unchecked version of the method
    MUST be marked unsafe. This makes it explicit that the calling
    code is responsible for ensuring the right conditions
    for calling the unchecked method.
  • A method_checked version should be implemented on top of
    the method_unchecked method. The checked version should
    check all the requirements for calling the method safely.
    The return type should be either Option<T> or
    Result<T, Error>. If the required conditions for
    calling the method are not met, None or an error
    should be returned. If the required conditions are met,
    method_unchecked should be called to get the result,
    which is then wrapped inside Option or Result.
  • A method version should be built on top of the method_checked version.
    It should simply attempt to unwrap
    the value returned by method_checked and return it as T.
    If method_checked returns an error or None, this version
    should panic.
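
The layering described above might look like the following minimal sketch. The generic Matrix<T> struct, its column-major layout, the filled constructor, and the Error enum are assumptions for illustration and are not taken from the SciRust code base; only the method names and signatures follow the RFC.

```rust
/// Hypothetical error type standing in for the RFC's `Error`.
#[derive(Debug, PartialEq)]
pub enum Error {
    IndexOutOfBounds,
}

/// Hypothetical dense matrix in column-major order (illustration only).
pub struct Matrix<T> {
    rows: usize,
    cols: usize,
    data: Vec<T>,
}

impl<T: Copy> Matrix<T> {
    /// Builds a `rows x cols` matrix with every element set to `value`.
    pub fn filled(rows: usize, cols: usize, value: T) -> Self {
        Matrix { rows, cols, data: vec![value; rows * cols] }
    }

    /// Layer 1: basic implementation, no checks.
    ///
    /// # Safety
    /// The caller must guarantee `r < self.rows` and `c < self.cols`.
    pub unsafe fn get_unchecked(&self, r: usize, c: usize) -> T {
        // SAFETY: the caller's guarantee implies the offset is in bounds.
        unsafe { *self.data.get_unchecked(c * self.rows + r) }
    }

    /// Layer 2: verifies the preconditions, then delegates to layer 1.
    pub fn get_checked(&self, r: usize, c: usize) -> Result<T, Error> {
        if r < self.rows && c < self.cols {
            // SAFETY: both indices were checked just above.
            Ok(unsafe { self.get_unchecked(r, c) })
        } else {
            Err(Error::IndexOutOfBounds)
        }
    }

    /// Layer 3: unwraps the checked result and panics on error.
    pub fn get(&self, r: usize, c: usize) -> T {
        self.get_checked(r, c).expect("matrix index out of bounds")
    }

    /// Unchecked modification.
    ///
    /// # Safety
    /// The caller must guarantee `r < self.rows` and `c < self.cols`.
    pub unsafe fn set_unchecked(&mut self, r: usize, c: usize, value: T) {
        let offset = c * self.rows + r;
        // SAFETY: the caller's guarantee implies the offset is in bounds.
        unsafe { *self.data.get_unchecked_mut(offset) = value };
    }

    /// Safe modification; panicking on an invalid index is an assumption
    /// here, since the RFC only says this version performs a bounds check.
    pub fn set(&mut self, r: usize, c: usize, value: T) {
        assert!(r < self.rows && c < self.cols, "matrix index out of bounds");
        // SAFETY: both indices were checked by the assertion above.
        unsafe { self.set_unchecked(r, c, value) };
    }
}
```

In this arrangement the bounds test is written once per operation, and each safe layer adds only the error-reporting policy on top of the unchecked core.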

The first two versions are suitable for professional development,
where most of the time we need a safe API but at times
we need an unsafe API for an efficient implementation.
The third version is suitable for the script-style usage scenario.

The convention has been illustrated in the three versions of
get at the beginning of this document.

API bloat

This convention is expected to lead to some API bloat.
However, if it is applied consistently across the library,
the API should remain easy to follow (both from the perspective
of users of the library and from the perspective of developers
of the library).
