121 changes: 17 additions & 104 deletions README.md
@@ -4,6 +4,16 @@

File-checking made simple


## Documentation

See the [Read The Docs page](https://checksit.readthedocs.io/en/latest) for more
details on how to install and run checksit.

Visit the [JASMIN help page](https://help.jasmin.ac.uk/docs/software-on-jasmin/community-software-checksit/)
for guidance on how to use checksit on JASMIN.


## Installation

Create a venv, then install, either directly from GitHub:
@@ -20,113 +30,16 @@ pip install .

## Usage

A brief description of how to use checksit is given here. For more detail, visit the [documentation site](https://checksit.readthedocs.io/en/latest).

checksit comprises four key components: [check](#checksit-check), [describe](#checksit-describe), [show-specs](#checksit-show-specs), and [summary](#checksit-summary).

A brief description of how to use checksit is given here. For more detail, visit the
[documentation site](https://checksit.readthedocs.io/en/latest).

## checksit check
### checksit check

Check file against a template.

### Basic Usage
To check a file:

```
checksit check /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc
```
* Checks format of file.
* checksit searches its template cache for a similar file to compare against


### Main Features

#### Define template
```
checksit check --template=template-cache/rls_rcp85_land-cpm_uk_2.2km_01_day_19801201-19811130.cdl /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc
```
* Use the `--template` flag to specify the template to check against
* The template can be in template-cache or any file the user has access to
* Note: cdl files are a text representation of a netCDF file, as produced by running `ncdump -h` on the netCDF file


#### Map variable names
```
checksit check -m cltAnom=cloud_area_fraction /gws/nopw/j04/cmip6_prep_vol1/ukcp18/data/land-prob/v20211110/uk/25km/rcp85/sample/b8110/30y/cltAnom/mon/v20211110/cltAnom_rcp85_land-prob_uk_25km_sample_b8110_30y_mon_20091201-20991130.nc
```
* Allows mapping of variable name, for the case that the name of a variable is different between the file to be checked and the template
* Format - `-m <template variable name>=<file variable name>`
* Multiple mappings should be comma separated
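
As an illustration of the mapping format described above, the following is a minimal sketch of how a comma-separated `-m` value could be split into pairs. The function name `parse_mappings` is hypothetical and is not taken from the checksit source; it only mirrors the `<template variable name>=<file variable name>` format given in the bullets.

```python
def parse_mappings(value: str) -> dict[str, str]:
    """Split 'a=b,c=d' into {template_variable_name: file_variable_name}.

    Hypothetical helper for illustration; not part of checksit.
    """
    mappings: dict[str, str] = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing comma
        template_name, sep, file_name = pair.partition("=")
        if not sep or not template_name or not file_name:
            raise ValueError(
                f"expected <template variable name>=<file variable name>, got {pair!r}"
            )
        mappings[template_name] = file_name
    return mappings


if __name__ == "__main__":
    print(parse_mappings("cltAnom=cloud_area_fraction"))
```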


#### Ignore attributes
```
checksit check --ignore-attrs=global_attributes:time_coverage_start,global_attributes:time_coverage_end,global_attributes:tracking_id /neodc/esacci/sea_ice/data/sea_ice_thickness/L3C/envisat/v2.0/SH/2012/ESACCI-SEAICE-L3C-SITHICK-RA2_ENVISAT-SH50KMEASE2-201202-fv2.0.nc
```
* Define attributes to ignore in checking


#### Define additional rules for checking
```
checksit check --rules=global_attributes:id=rule-func:match-file-name:lowercase:no-extension /neodc/esacci/sea_ice/data/sea_ice_thickness/L3C/envisat/v2.0/SH/2012/ESACCI-SEAICE-L3C-SITHICK-RA2_ENVISAT-SH50KMEASE2-201202-fv2.0.nc
```
* Check items against defined rules
* Format - `<what to check>=<rule type>:<function/check>[:<extras>[:<extras>...]]`
* Four options for `<rule type>`:
  * `rule-func` - check item against a defined function, 4 options:
    * `match-file-name` - item must be the same as the file name, allowing for formatting through `<extras>` - `lowercase`, `uppercase`, `no-extension` - example: `global_attributes:id=rule-func:match-file-name:lowercase:no-extension`
    * `match-one-of` - item must be the same as one of the `<extras>` given. Multiple options should be separated by a `|` and surrounded by double quotation marks - example: `global_attributes:project=rule-func:match-one-of:"ukcp18|ukcp09"`
    * `match-one-or-more-of` - item must be the same as one or more of the `<extras>` given. Multiple options should be separated by a `|` and surrounded by double quotation marks - example: `global_attributes:contact=rule-func:match-one-or-more-of:"ukcpproject@metoffice.gov.uk|UKCP Team|MOHC"`
    * `string-of-length` - item must have exactly the length given in `<extras>`, or at least that length if `+` is given at the end of `<extras>` - example: `global_attributes:project=rule-func:string-of-length:10,global_attributes:contact=rule-func:string-of-length:100+`
  * `type-rule` - check item is of the type defined in `<extras>` - example: `transverse_mercator:false_northing=type-rule:integer`
  * `regex` - check item for regular expression match - example: `global_attributes:project=regex:ukcp18`
  * `regex-rule` - check item matches a pre-defined regex rule, the name of which is given in `<extras>`
    * current options are `integer`, `valid-email`, `valid-url`, `valid-url-or-na`, `match:vN.M`, `datetime`, `datetime-or-na`, `number`
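
To make the rule format above more concrete, here is a minimal sketch of how a single rule string might be broken into its parts. The names `Rule` and `parse_rule` are hypothetical and this is not checksit's own parser. It assumes one rule per string (no comma-separated lists), and it keeps everything after the rule type as a single value for `type-rule`, `regex`, and `regex-rule`, since names such as `match:vN.M` contain a colon themselves.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    target: str        # what to check, e.g. "global_attributes:id"
    rule_type: str     # one of the four documented rule types
    check: str         # function name, type name, regex, or regex-rule name
    extras: list[str]  # extra arguments (only split out for rule-func)


RULE_TYPES = {"rule-func", "type-rule", "regex", "regex-rule"}


def parse_rule(text: str) -> Rule:
    """Parse '<what to check>=<rule type>:<function/check>[:<extras>...]'.

    Hypothetical helper for illustration; not part of checksit.
    """
    target, sep, spec = text.partition("=")
    if not sep or not target:
        raise ValueError(f"missing '<what to check>=' in {text!r}")
    rule_type, sep, rest = spec.partition(":")
    if rule_type not in RULE_TYPES or not sep or not rest:
        raise ValueError(f"unrecognised rule specification {spec!r}")
    if rule_type != "rule-func":
        # Keep the remainder whole: it may legitimately contain ':'.
        return Rule(target, rule_type, rest, [])
    check, *extras = rest.split(":")
    # Drop the double quotes that surround '|'-separated options.
    return Rule(target, rule_type, check, [e.strip('"') for e in extras])


if __name__ == "__main__":
    print(parse_rule(
        "global_attributes:id=rule-func:match-file-name:lowercase:no-extension"
    ))
```

Splitting on the first `=` keeps targets such as `transverse_mercator:false_northing` intact, because the colon in the target appears before the `=`.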


### Additional Options

#### specs
```
checksit check --specs=ceda-base /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc
```
* Checks file against a given specification. For more info, see [checksit show-specs](#checksit-show-specs)


#### auto-cache
```
checksit check --auto-cache --template=/badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/08/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_08_day_20671201-20681130.nc /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc
```
* Create a cache of the given template to add to checksit's template_cache


#### verbose
```
checksit check --verbose /group_workspaces/jasmin2/ukcp18/incoming-astephen/ukcordex-example/tasmax_rcp85_land-rcm_uk_12km_EC-EARTH_r12i1p1_HIRHAM5_day_19801201-19901130.nc
```
* Print additional information



## checksit describe

```
checksit describe
```
* Prints docstring of rules that can be used in `checksit check --rules`
* Individual rules can be printed out, e.g. `checksit describe match-one-of`



## checksit show-specs

```
checksit show-specs <spec-id>
```
* Prints out specs for a given spec-id, e.g. `ceda-base`
* spec-ids are saved in checksit/specs/groups



## checksit summary

* Summarises output from a number of log files created through `checksit check`
* Attempts to find best checks to use for this file, and then runs checks.
* A specific template can be defined using the `-t/--template` flag, or specific specs
can be defined using the `-s/--specs` flag (see docs for info).
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -7,7 +7,7 @@
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

project = 'checksit'
copyright = '2025, Ag Stephens, Hugo Ricketts, Joshua Hampton'
copyright = '2026, Ag Stephens, Hugo Ricketts, Joshua Hampton'
author = 'Ag Stephens, Hugo Ricketts, Joshua Hampton'
release = '0.1'

4 changes: 4 additions & 0 deletions docs/source/index.rst
@@ -9,6 +9,9 @@ On a basic level a user can point the checksit tool at a given file and it will
some basic checks based on some matches that it will try to perform.
Other options include specifying the particular checks to run or to compare with known 'good' files.

.. note::
   For a quick overview of how to run checksit on JASMIN, follow the link in the sidebar.

.. toctree::
:hidden:
:maxdepth: 1
@@ -36,3 +39,4 @@ Other options include specifying the particular checks to run or to compare with
:hidden:

GitHub <https://github.com/cedadev/checksit>
checksit on JASMIN <https://help.jasmin.ac.uk/docs/software-on-jasmin/community-software-checksit/>
16 changes: 10 additions & 6 deletions docs/source/install.rst
@@ -5,22 +5,26 @@ Source
------

It is recommended to create a fresh Python virtual environment for installing
``checksit``, which can be installed directly from GitHub:
``checksit``:

.. code-block::
.. code-block:: bash

python -m venv checksit-venv
source checksit-venv/bin/activate

Then, ``checksit`` can be installed from the source code on GitHub:

.. code-block:: bash

pip install git+https://github.com/cedadev/checksit.git

or by cloning the repository and installing that:

.. code-block::
.. code-block:: bash

git clone https://github.com/cedadev/checksit.git
cd checksit
pip install .

----

Other installation methods might be added later.


82 changes: 55 additions & 27 deletions docs/source/specifics.rst
@@ -3,50 +3,58 @@ File specific actions

``checksit`` has some specific actions depending on the file given.

NCAS-GENERAL
------------
NCAS Data
---------

Files that are designed to the NCAS-GENERAL standard are recognised by ``checksit``\ , and specs
referring to the correct version of the standard are automatically searched for and used by
``checksit``\ , with specs to include checking file name format, global attributes, dimensions
and variables for the used deployment mode and data product. For example, for a file with data
from an automatic weather station (\ ``ncas-aws-10``\ ) using version 2.0.0 of the standard,
If the ``checksit check`` command is given a file and no template or specs are
specified, then ``checksit`` will try to identify if the file is meant to comply with
one of the NCAS standards (NCAS-General, NCAS-Radar or NCAS-Image). ``checksit`` will
designate a file as an "NCAS standard" file if one of the following conditions is met:

.. code-block::
* The file contains the global attribute "Conventions" and the value of this attribute
contains "NCAS-" (case insensitive match).
* The file contains the "XMP-photoshop:Instructions" metadata tag and the value of this
tag contains "National Centre for Atmospheric Science" (case insensitive match).
* The name of the file starts with "ncas-" (case sensitive match).

checksit check ncas-aws-10_iao_20231117_surface-met_v1.0.nc
If any of these conditions match, then ``checksit`` will try to identify which NCAS
standard the file is meant to comply with.
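
The three conditions above can be sketched as a single predicate. This is only an illustration under stated assumptions: the function name `looks_like_ncas_file` and its parameters are hypothetical, the metadata values are assumed to have been read from the file already, and the file name test is assumed to apply to the base name rather than the full path.

```python
import os
from typing import Optional


def looks_like_ncas_file(
    file_name: str,
    conventions: Optional[str] = None,
    xmp_instructions: Optional[str] = None,
) -> bool:
    """Return True if any of the three documented conditions is met.

    Hypothetical helper for illustration; not checksit's implementation.
    """
    # Global attribute "Conventions" contains "NCAS-" (case insensitive).
    if conventions is not None and "ncas-" in conventions.lower():
        return True
    # "XMP-photoshop:Instructions" tag contains the organisation name
    # (case insensitive).
    if (
        xmp_instructions is not None
        and "national centre for atmospheric science" in xmp_instructions.lower()
    ):
        return True
    # File name starts with "ncas-" (case sensitive). Using the base name
    # is an assumption made for this sketch.
    return os.path.basename(file_name).startswith("ncas-")


if __name__ == "__main__":
    print(looks_like_ncas_file("ncas-aws-10_iao_20231117_surface-met_v1.0.nc"))
```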

is the same as

.. code-block::
NCAS-General
^^^^^^^^^^^^

checksit check -t off -s ncas-amof-2.0.0/amof-file-name,ncas-amof-2.0.0/amof-common-land,ncas-amof-2.0.0/amof-surface-met,ncas-amof-2.0.0/amof-global-attrs ncas-aws-10_iao_20231117_surface-met_v1.0.nc
If the name of the file ends with ``.nc``, and the file contains the global attribute
"Conventions" with a value that contains one of "NCAS-General", "NCAS-AMOF", or
"NCAS-AMF" (case insensitive match), then the file is designated as an NCAS-General
file. ``checksit`` then determines which specs are needed to perform the correct
checks, including checking file name format, global attributes, dimensions, and
variables used for the deployment mode and data product.

NCAS-IMAGE
----------
For example, for a file with data from an automatic weather station
(\ ``ncas-aws-10``\ ) using version 2.0.0 of the standard,

The NCAS-IMAGE standard is also identified by ``checksit``\ , and the appropriate specs can be
found to check both global tags and photo or plot specific tags, i.e.
.. code-block:: bash

.. code-block::

checksit check ncas-cam-9_cao_20231117_photo_v1.0.nc
checksit check ncas-aws-10_iao_20231117_surface-met_v1.0.nc

is the same as

.. code-block::
.. code-block:: bash

checksit check -t off -s ncas-amof-2.0.0/amof-file-name,ncas-amof-2.0.0/amof-common-land,ncas-amof-2.0.0/amof-surface-met,ncas-amof-2.0.0/amof-global-attrs ncas-aws-10_iao_20231117_surface-met_v1.0.nc

checksit check -t off -s ncas-image-1.0.0/amof-image-global-attrs,ncas-image-1.0.0/amof-photo ncas-cam-9_cao_20231117_photo_v1.0.nc

NCAS-Radar
----------
^^^^^^^^^^

The NCAS-Radar standard is also recognised by ``checksit``\ , with the correct specs identified and
used if no template or spec options are specified. Unlike the NCAS-GENERAL and NCAS-IMAGE standards,
NCAS-Radar does not have specific data product specs, instead there are a number of different spec
files covering different areas of the standard. These spec files are:
If the file name ends with ``.nc``, and the file contains the global attribute
"Conventions" with a value that contains "NCAS-Radar" (case insensitive match), then
the file is identified as an NCAS-Radar file. ``checksit`` checks the file against a
number of spec files, each covering a different area of the standard. These spec files
are:

.. code-block::
.. code-block:: bash

coordinate-variables
dimensions
@@ -59,3 +67,23 @@ files covering different areas of the standard. These spec files are:
radar-parameters
sensor-pointing-variables
sweep-variables


NCAS-Image
^^^^^^^^^^

If the name of the file ends with one of ``.png``, ``.jpg``, or ``.jpeg`` (case insensitive
match), and the file contains the "XMP-photoshop:Instructions" metadata tag with a
value that contains "National Centre for Atmospheric Science" (case insensitive match),
then the file is identified as an NCAS-Image file. The appropriate specs are then found
to check both global tags and photo or plot specific tags. For example,

.. code-block:: bash

checksit check ncas-cam-9_cao_20231117_photo_v1.0.nc

is the same as

.. code-block:: bash

checksit check -t off -s ncas-image-1.0.0/amof-image-global-attrs,ncas-image-1.0.0/amof-photo ncas-cam-9_cao_20231117_photo_v1.0.nc
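
Taken together, the identification rules for the three standards can be sketched as one function. This is a rough illustration, not checksit's actual code: the name `identify_ncas_standard` is hypothetical, the metadata values are assumed to be supplied by the caller, the ``.nc`` suffix test is assumed to be case sensitive (the text does not say), and the order in which the standards are tested is an assumption.

```python
from typing import Optional

NCAS_GENERAL_TAGS = ("ncas-general", "ncas-amof", "ncas-amf")
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg")
NCAS_NAME = "national centre for atmospheric science"


def identify_ncas_standard(
    file_name: str,
    conventions: Optional[str] = None,
    xmp_instructions: Optional[str] = None,
) -> Optional[str]:
    """Return the name of the matching NCAS standard, or None.

    Hypothetical helper for illustration; not checksit's implementation.
    """
    conv = (conventions or "").lower()
    instructions = (xmp_instructions or "").lower()

    if file_name.endswith(".nc"):
        # "Conventions" names NCAS-General, NCAS-AMOF or NCAS-AMF.
        if any(tag in conv for tag in NCAS_GENERAL_TAGS):
            return "NCAS-General"
        # "Conventions" names NCAS-Radar.
        if "ncas-radar" in conv:
            return "NCAS-Radar"

    # Image suffix (case insensitive) plus the XMP instructions tag.
    if file_name.lower().endswith(IMAGE_SUFFIXES) and NCAS_NAME in instructions:
        return "NCAS-Image"

    return None


if __name__ == "__main__":
    print(identify_ncas_standard(
        "ncas-aws-10_iao_20231117_surface-met_v1.0.nc",
        conventions="CF-1.6, NCAS-AMF-2.0.0",
    ))
```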