
macmarrum/rumar


Rumar

A file-backup utility

For each file, creates a directory named after the original file, containing tarred copies of the file, optionally compressed.

A file is archived only if it has changed, i.e. its modification time is later than that of the last archive and its size (or checksum) differs.

The directory containing tar files is placed in a mirrored directory tree. Each backup is a separate tar file.

How to use it

  1. Install Python (at least 3.10)
  2. Download rumar.py
  3. Download rumar.toml to the same directory as rumar.py
  4. Edit rumar.toml and adapt it to your needs – see settings details
  5. Open a console/terminal (e.g. PowerShell) and change to the directory containing rumar.py
  6. If your installed Python version is below 3.11, run python -m pip install tomli to install the module tomli
  7. Run python rumar.py list-profiles → you should see your profile name(s) printed in the console
  8. Run python rumar.py create --profile "My Documents" to create a backup using the profile "My Documents"
  9. Optionally, add the create command to Task Scheduler or cron, to be run at an interval (e.g. each day/night)

How to sweep old backups

  1. Run python rumar.py sweep --profile "My Documents" --dry-run and verify the files to be removed
  2. Run python rumar.py sweep --profile "My Documents" to remove old backups
  3. Optionally, add the sweep command to Task Scheduler or cron, to be run at an interval (e.g. each day/night)

Note: when --dry-run is used, rumar.py counts the backup files and selects those to be removed based on settings, but no files are actually deleted.

Settings

Unless specified by --toml path/to/your/settings.toml, settings are loaded from rumar.toml in the same directory as rumar.py, or from rumar/rumar.toml inside $XDG_CONFIG_HOME ($HOME/.config if not set) on POSIX, or inside %APPDATA% on NT (Windows).

Settings example

rumar.toml

# schema version
version = 2
# settings common for all profiles
backup_base_dir = 'C:\Users\Mac\Backup'

# settings for individual profiles - override any common ones

["My Documents"]
source_dir = 'C:\Users\Mac\Documents'
excluded_top_dirs = ['My Music', 'My Pictures', 'My Videos']
excluded_files_as_glob = ['desktop.ini', 'Thumbs.db']

[Desktop]
source_dir = 'C:\Users\Mac\Desktop'
excluded_files_as_glob = ['desktop.ini', '*.exe', '*.msi']

["# this profile's name starts with a hash, therefore it will be ignored"]
source_dir = "this setting won't be loaded"

For Python 3.13 or higher

# schema version
version = 3
# settings common for all profiles
backup_base_dir = 'C:\Users\Mac\Backup'

# settings for individual profiles - override any common ones

["My Documents"]
source_dir = 'C:\Users\Mac\Documents'
excluded_files = ['My Music\**', 'My Pictures\**', 'My Videos\**', '**\desktop.ini', '**\Thumbs.db']

[Desktop]
source_dir = 'C:\Users\Mac\Desktop'
excluded_files = ['**\desktop.ini', '**\*.exe', '**\*.msi']

["# this profile's name starts with a hash, therefore it will be ignored"]
source_dir = "this setting won't be loaded"

Settings details

Each profile whose name starts with a hash # is ignored when rumar.toml is loaded.
version indicates the schema version – currently 3.
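How the common settings, the profile tables and the hash rule fit together can be sketched with tomllib (tomli on Python 3.10). This is an illustration under assumptions: treating every top-level non-table key except version as a common setting, and deriving backup_dir as {backup_base_dir}/{profile} when it is unset, mirror the descriptions in this README rather than rumar's source.

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # python -m pip install tomli


def load_profiles(toml_text: str) -> dict[str, dict]:
    """Hypothetical helper: merge common settings into each active profile."""
    data = tomllib.loads(toml_text)
    # Top-level non-table values are the common settings
    common = {k: v for k, v in data.items() if not isinstance(v, dict) and k != 'version'}
    profiles = {}
    for name, settings in data.items():
        if not isinstance(settings, dict) or name.startswith('#'):
            continue  # not a profile, or a profile disabled with a leading hash
        merged = {**common, **settings}  # profile settings override common ones
        # backup_dir defaults to {backup_base_dir}/{profile} unless set explicitly
        if 'backup_dir' not in merged and 'backup_base_dir' in merged:
            merged['backup_dir'] = f"{merged['backup_base_dir']}/{name}"
        profiles[name] = merged
    return profiles
```

To read a file rather than a string, open it in binary mode and use tomllib.load(f) instead of tomllib.loads(text).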

  • backup_base_dir: str     used by: create, sweep
    path to the base directory used for backup; usually set in the global space, common for all profiles
    ⓘ note: backup directory for each profile, i.e. backup_dir, is constructed as {backup_base_dir}/{profile}, unless backup_dir is set, which takes precedence
  • backup_dir: str = None     used by: create, extract, sweep
    path to the backup directory used for the profile
    ⚠️ caution: usually left unset; if so, its value defaults to {backup_base_dir}/{profile}
  • archive_format: Literal['tar', 'tar.gz', 'tar.bz2', 'tar.xz', 'tar.zst'] = 'tar.zst'     used by: create, sweep
    format of archive files to be created
    'tar.zst' requires Python 3.14 or higher, or the backports.zstd package
  • compression_level: int = 3     used by: create
    0 to 9 for 'tar.gz', 'tar.bz2', 'tar.xz'
    0 to 22 for 'tar.zst'
  • no_compression_suffixes_default: str = '7z,zip,zipx,jar,rar,tgz,gz,tbz,bz2,xz,zst,zstd,xlsx,docx,pptx,ods,odt,odp,odg,odb,epub,mobi,cbz,png,jpg,gif,mp4,mov,avi,mp3,m4a,aac,ogg,ogv,opus,flac,kdbx'     used by: create
    comma-separated string of the default lower-case suffixes for which to use no compression
  • no_compression_suffixes: str = ''     used by: create
    extra lower-case suffixes in addition to no_compression_suffixes_default
  • tar_format: Literal[0, 1, 2] = 1 (tarfile.GNU_FORMAT)     used by: create
    see also https://docs.python.org/3/library/tarfile.html#supported-tar-formats and https://www.gnu.org/software/tar/manual/html_section/Formats.html
  • source_dir: str     used by: create, extract
    path to the directory which is to be archived
  • included_files: list[str]     used by: create, sweep
    ⚠️ caution: uses PurePath.full_match(...), which is available on Python 3.13 or higher
    a list of glob patterns, also known as shell-style wildcards, i.e. ** * ? [seq] [!seq]; ** means zero or more segments, * means a single segment or a part of a segment (as in My*)
    if present, only the matching files will be considered, together with included_files_as_regex, included_files_as_glob, included_top_dirs, included_dirs_as_regex
    the paths/globs can be absolute or relative to source_dir (or backup_dir in case of sweep), e.g. C:\My Documents\*.txt, my-file-in-source-dir.log
    absolute paths start with a root (/ or {drive}:\)
    on Windows, glob-pattern matching is case-insensitive, and both \ and / can be used as separators
    see also https://docs.python.org/3.13/library/pathlib.html#pathlib-pattern-language
  • excluded_files: list[str]     used by: create, sweep
    ⚠️ caution: uses PurePath.full_match(...), which is available on Python 3.13 or higher
    the matching files will be ignored, together with excluded_files_as_regex, excluded_files_as_glob, excluded_top_dirs, excluded_dirs_as_regex
    see also included_files
  • included_top_dirs: list[str]     used by: create, sweep
    ❌ deprecated: use included_files instead, if on Python 3.13 or higher, e.g. ['top dir 1/**',]
    a list of top-directory paths
    if present, only the files from the directories and their descendant subdirs will be considered, together with included_dirs_as_regex, included_files, included_files_as_regex, included_files_as_glob
    the paths can be relative to source_dir or absolute, but always under source_dir (or backup_dir in case of sweep)
    absolute paths start with a root (/ or {drive}:\)
  • excluded_top_dirs: list[str]     used by: create, sweep
    ❌ deprecated: use excluded_files instead, if on Python 3.13 or higher, e.g. ['top dir 3/**',]
    the files from the directories and their subdirs will be ignored, together with excluded_dirs_as_regex, excluded_files, excluded_files_as_regex, excluded_files_as_glob
    see also included_top_dirs
  • included_dirs_as_regex: list[str]     used by: create, sweep
    a list of regex patterns (each to be passed to re.compile)
    if present, only the files from the matching directories will be considered, together with included_top_dirs, included_files, included_files_as_regex, included_files_as_glob
    / must be used as the path separator, also on Windows
    the patterns are matched (using re.search) against a path relative to source_dir (or backup_dir in case of sweep)
    the first segment in the relative path to match against also starts with a slash
    e.g. ['/B$',] will match each directory named B, at any level; ['^/B$',] will match only {source_dir}/B (or {backup_dir}/B in case of sweep)
    regex-pattern matching is case-sensitive – use (?i) at each pattern's beginning for case-insensitive matching, e.g. ['(?i)/b$',]
    see also https://docs.python.org/3/library/re.html
  • excluded_dirs_as_regex: list[str]     used by: create, sweep
    the files from the matching directories will be ignored, together with excluded_top_dirs, excluded_files, excluded_files_as_regex, excluded_files_as_glob
    see also included_dirs_as_regex
  • included_files_as_glob: list[str]     used by: create, sweep
    ❌ deprecated: use included_files instead, if on Python 3.13 or higher
    a list of glob patterns, also known as shell-style wildcards, i.e. * ? [seq] [!seq]
    if present, only the matching files will be considered, together with included_files, included_files_as_regex, included_top_dirs, included_dirs_as_regex
    the paths/globs can be partial, relative to source_dir or absolute, but always under source_dir (or backup_dir in case of sweep)
    unlike with glob patterns used in included_files, here matching is done from the right if the pattern is relative, e.g. ['B\b1.txt',] will match C:\A\B\b1.txt and C:\B\b1.txt
    ⚠️ caution: a leading path separator indicates an absolute path, but on Windows you also need a drive letter, e.g. ['\A\a1.txt'] will never match; use ['C:\A\a1.txt'] instead
    on Windows, glob-pattern matching is case-insensitive, and both \ and / can be used as separators
    see also https://docs.python.org/3/library/fnmatch.html and https://en.wikipedia.org/wiki/Glob_(programming)
  • excluded_files_as_glob: list[str]     used by: create, sweep
    ❌ deprecated: use excluded_files instead, if on Python 3.13 or higher
    the matching files will be ignored, together with excluded_files, excluded_files_as_regex, excluded_top_dirs, excluded_dirs_as_regex
    see also included_files_as_glob
  • included_files_as_regex: list[str]     used by: create, sweep
    if present, only the matching files will be considered, together with included_files, included_files_as_glob, included_top_dirs, included_dirs_as_regex
    see also included_dirs_as_regex
  • excluded_files_as_regex: list[str]     used by: create, sweep
    the matching files will be ignored, together with excluded_files, excluded_files_as_glob, excluded_top_dirs, excluded_dirs_as_regex
    see also included_dirs_as_regex
  • checksum_comparison_if_same_size: bool = False     used by: create
    when False, a file is considered changed if its mtime is later than the latest backup's mtime and its size changed
    when True, BLAKE2b checksum is calculated to determine if the file changed despite having the same size
    mtime := last modification time
    see also https://en.wikipedia.org/wiki/File_verification
  • file_deduplication: bool = False     used by: create
    when True, an attempt is made to find and skip duplicates
    a duplicate is a file with the same suffix, the same size and part of the same name; suffix and name are compared case-insensitively
  • min_age_in_days_of_backups_to_sweep: int = 2     used by: sweep
    only the backups which are older than the specified number of days are considered for removal
  • number_of_backups_per_day_to_keep: int = 2     used by: sweep
    for each file, the specified number of backups per day is kept, if available
    more backups per day might be kept to satisfy number_of_backups_per_week_to_keep and/or number_of_backups_per_month_to_keep
    oldest backups are removed first
  • number_of_backups_per_week_to_keep: int = 14     used by: sweep
    for each file, the specified number of backups per week is kept, if available
    more backups per week might be kept to satisfy number_of_backups_per_day_to_keep and/or number_of_backups_per_month_to_keep
    oldest backups are removed first
  • number_of_backups_per_month_to_keep: int = 60     used by: sweep
    for each file, the specified number of backups per month is kept, if available
    more backups per month might be kept to satisfy number_of_backups_per_day_to_keep and/or number_of_backups_per_week_to_keep
    oldest backups are removed first
  • commands_using_filters: list[str] = ['create']     used by: create, sweep
    determines which commands can use the filters specified in the included_* and excluded_* settings
    by default, filters are used only by create, i.e. sweep considers all created backups (no filter is applied)
    a filter for sweep could be used to e.g. never remove backups from the first day of a month:
    excluded_files = ['**/[0-9][0-9][0-9][0-9]-[0-9][0-9]-01_*.tar*'] or
    excluded_files_as_regex = ['/\d\d\d\d-\d\d-01_\d\d,\d\d,\d\d(\.\d{6})?[+-]\d\d,\d\d~\d+(~.+)?\.tar(\.(gz|bz2|xz|zst))?$']
    it's best to put this setting in a separate profile, i.e. a copy made for sweep;
    otherwise create will also apply the filter and look for such files to exclude
  • db_path: str = {backup_base_dir}/rumar.sqlite
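The regex settings above (included_dirs_as_regex, excluded_dirs_as_regex and the *_files_as_regex pair) match with re.search against a '/'-separated path relative to source_dir (or backup_dir for sweep) that starts with a slash. The sketch below builds such a path and checks the documented examples, including the first-of-month pattern from commands_using_filters. The helper names are hypothetical, and the backup file names in the usage are made up to fit that pattern.

```python
import re
from pathlib import PurePath, PureWindowsPath


def rel_for_regex(path: PurePath, base: PurePath) -> str:
    """Hypothetical helper: path relative to base, '/'-separated, with a leading slash."""
    return '/' + path.relative_to(base).as_posix()


def matches_any(rel: str, patterns: list[str]) -> bool:
    # re.search: a pattern may match anywhere unless anchored with ^ or $
    return any(re.search(p, rel) for p in patterns)


# excluded_files_as_regex example from commands_using_filters, as a raw string
FIRST_OF_MONTH = (r'/\d\d\d\d-\d\d-01_\d\d,\d\d,\d\d(\.\d{6})?[+-]\d\d,\d\d~\d+(~.+)?'
                  r'\.tar(\.(gz|bz2|xz|zst))?$')

rel = rel_for_regex(PureWindowsPath(r'C:\Src\A\B'), PureWindowsPath(r'C:\Src'))
print(rel, matches_any(rel, ['/B$']), matches_any(rel, ['^/B$']))  # → /A/B True False
```

Because the leading slash is part of the string, '^/B$' can only match a top-level directory B, while '/B$' matches a directory named B at any depth.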

Settings schema version 3 vs 2

Version 3 has the additional settings included_files and excluded_files. They rely on PurePath.full_match(...), which was added in Python 3.13.
The new settings remove the need for the following ones:

  • included_top_dirs
  • excluded_top_dirs
  • included_files_as_glob
  • excluded_files_as_glob

Also, backup_base_dir_for_profile was renamed to backup_dir.
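The practical difference between the old glob settings and the new ones can be seen with pathlib itself. PurePath.match matches a relative pattern from the right, much like the behaviour described for *_as_glob, while PurePath.full_match (Python 3.13+) requires the pattern to cover the whole path and lets ** span segments. Whether rumar calls PurePath.match for the deprecated settings is an assumption; the two helpers below are hypothetical wrappers.

```python
import sys
from pathlib import PureWindowsPath


def right_anchored(path: str, pattern: str) -> bool:
    # PurePath.match: a relative pattern is matched from the right; '**' is not recursive
    return PureWindowsPath(path).match(pattern)


def whole_path(path: str, pattern: str) -> bool:
    # PurePath.full_match (Python 3.13+): the pattern must cover the whole path
    return PureWindowsPath(path).full_match(pattern)


print(right_anchored(r'C:\A\B\b1.txt', r'B\b1.txt'))  # → True
print(right_anchored(r'C:\B\b1.txt', r'B\b1.txt'))    # → True
if sys.version_info >= (3, 13):
    print(whole_path(r'C:\A\B\b1.txt', r'B\b1.txt'))       # → False
    print(whole_path(r'My Music\a\b.mp3', r'My Music\**'))  # → True
```

PureWindowsPath is used so the Windows-style examples behave the same on any OS, including case-insensitive matching.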

Settings schema version 2 vs 1

Version 1 contained sha256_comparison_if_same_size.
In version 2 it's checksum_comparison_if_same_size.

Logging settings

Logging is controlled by settings located in rumar/rumar.logging.toml inside $XDG_CONFIG_HOME ($HOME/.config if not set) on POSIX, or inside %APPDATA% on NT (Windows). You can copy the settings below to your own file and modify them as needed.

By default, rumar.log is created in the current directory (where rumar.py is executed). This can be changed by setting filename=/path/to/rumar.log.
To disable the creation of rumar.log, put a hash # in front of "to_file", in [loggers.rumar].

version = 1

[formatters.f1]
format = "{levelShort} {asctime}: {funcName:24} {msg}"
style = "{"
validate = true

[handlers.to_console]
class = "logging.StreamHandler"
formatter = "f1"
#level = "DEBUG_14"

[handlers.to_file]
class = "logging.FileHandler"
filename = "rumar.log"
encoding = "UTF-8"
formatter = "f1"
#level = "DEBUG_14"

[loggers.rumar]
handlers = [
    "to_console",
    "to_file",
]
level = "DEBUG_14"

More information: https://docs.python.org/3/library/logging.config.html#logging-config-dictschema
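Once parsed, the TOML above is an ordinary logging.config.dictConfig dictionary. The sketch below loads a copy without the to_file handler, so no rumar.log is written. Two things the settings rely on are not standard logging features and are filled in here as assumptions: DEBUG_14 is registered as a custom level with the value 14, and the levelShort field is supplied by a hypothetical record factory that uses the first letter of the level name.

```python
import logging
import logging.config

try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib

# Assumption: DEBUG_14 is a custom level with the numeric value 14
DEBUG_14 = 14
logging.addLevelName(DEBUG_14, 'DEBUG_14')

# Hypothetical: provide the levelShort field used by the f1 format
_default_factory = logging.getLogRecordFactory()


def _record_factory(*args, **kwargs):
    record = _default_factory(*args, **kwargs)
    record.levelShort = record.levelname[:1]
    return record


logging.setLogRecordFactory(_record_factory)

# The settings above without the to_file handler
LOGGING_TOML = """
version = 1

[formatters.f1]
format = "{levelShort} {asctime}: {funcName:24} {msg}"
style = "{"
validate = true

[handlers.to_console]
class = "logging.StreamHandler"
formatter = "f1"

[loggers.rumar]
handlers = ["to_console"]
level = "DEBUG_14"
"""

logging.config.dictConfig(tomllib.loads(LOGGING_TOML))
logging.getLogger('rumar').log(DEBUG_14, 'logging configured')
```

The level name must be registered before dictConfig runs, because dictConfig resolves the string "DEBUG_14" to a number when it configures the logger.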


Copyright © 2023-2025 macmarrum
SPDX-License-Identifier: GPL-3.0-or-later
