v0.2.0 #210
wwoytenko
announced in
Announcements
Greenmask 0.2.0
This is one of the biggest releases since Greenmask was founded. We've been in close contact with our users, gathering
feedback, and working hard to make Greenmask more flexible, reliable, and user-friendly.
This major release introduces exciting new features such as database subsetting, pgzip support, restoration in
topological order, and refactored transformers, significantly enhancing Greenmask's flexibility to better meet business
needs. It also includes several fixes and improvements.
Preface
This release is a major milestone that significantly expands Greenmask's functionality, transforming it into a simple,
extensible, and reliable solution for database security, data anonymization, and everyday operations. Our goal is to
create a core system that can serve as a foundation for comprehensive dynamic staging environments and robust data
security.
Notable changes
PostgreSQL 17 support - revised ported library to support PostgreSQL 17
Database Subset - a new feature that lets you define a subset of the database
and scale down the dump size (#110). It is
robust, multipurpose, and especially useful for testing and development environments. It supports:
for the FK reference with NULL values to include them in the subset.
FK in Greenmask that will be used for subset dependencies graph. The virtual reference can be defined for a column
or an expression, allowing you to get the value from JSON and similar.
circular dependencies in the subset by generating a recursive query. The query is generated with integrity checks
of the subset ensuring that the data gathered from circular dependencies is consistent.
and examples.
recursive query for the SCC whether it is a single cycle or multiple cycles, making the subset system universal
for any database schema.
a virtual reference for a table with polymorphic references using the polymorphic_exprs attribute and use Greenmask to generate a subset for such tables.
pgzip support for faster compression and decompression - setting --pgzip can speed up the dump and restoration processes through parallel compression. In some tests, it shows up to 5x faster dump and restore operations.
Restoration in topological order - This flag ensures
that dependent tables are not restored until the tables they depend on have been restored. This is useful when you
want to be notified of errors as early as possible without waiting for the entire table to be restored.
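The idea of restoring in topological order can be illustrated with a small sketch. This is not Greenmask's implementation; the table names and the `dependencies` mapping are hypothetical, and the ordering is delegated to Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the set of tables it references via FK.
dependencies = {
    "orders": {"customers"},
    "order_items": {"orders", "products"},
    "customers": set(),
    "products": set(),
}


def restore_order(deps):
    """Return tables ordered so that each table comes after the tables it depends on."""
    return list(TopologicalSorter(deps).static_order())


order = restore_order(dependencies)
print(order)
```

With this ordering, a referenced table such as `customers` is always handled before `orders`. Note that `static_order()` raises `graphlib.CycleError` for cyclic references, so a real schema with cycles would need separate handling.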
Insert format restoration - For a flexible restoration process, Greenmask now supports data restoration in the INSERT format. It generates the insert statements based on COPY records from the dump. You do not need to re-dump your data to use this feature; it can be defined in the restore command. The list of new features related to the INSERT format:
INSERT statements with the ON CONFLICT DO NOTHING clause if the flag --on-conflict-do-nothing is set.
certain errors and continue inserting subsequent rows from the dump.
want to insert data periodically from another source, this can be used together with the database subset and
transformations to catch up the target database.
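As a rough illustration of how an INSERT statement can be derived from a COPY record, here is a minimal sketch. It is an assumption-laden simplification, not Greenmask's code: the function name `copy_line_to_insert` is hypothetical, only the `\N` NULL marker of the COPY text format is handled (backslash escapes are ignored), and identifiers are not quoted.

```python
def copy_line_to_insert(table, columns, copy_line, on_conflict_do_nothing=False):
    """Build an INSERT statement from one tab-separated COPY text-format line."""
    values = []
    for field in copy_line.rstrip("\n").split("\t"):
        if field == r"\N":
            # COPY text format represents NULL as \N
            values.append("NULL")
        else:
            # Quote as a SQL string literal, doubling embedded single quotes
            values.append("'" + field.replace("'", "''") + "'")
    stmt = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({', '.join(values)})"
    if on_conflict_do_nothing:
        stmt += " ON CONFLICT DO NOTHING"
    return stmt + ";"


stmt = copy_line_to_insert(
    "public.users", ["id", "name", "email"], "1\tO'Brien\t\\N\n",
    on_conflict_do_nothing=True,
)
print(stmt)
```

A production implementation would more likely pass values as query parameters instead of building string literals; the string form is used here only to make the generated statement visible.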
Restore data batching (#173) -
By default, the COPY protocol returns the error only on transaction commit. To override this behavior, use the
--batch-size flag to specify the number of rows to insert in a single batch during the COPY command. This is useful
when you want to control the transaction size and commit.
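The batching idea can be sketched independently of any database driver. The helper below is hypothetical and only shows how a stream of rows might be split into fixed-size batches, so that each batch can be sent and checked for errors on its own.

```python
from itertools import islice


def batches(rows, batch_size):
    """Yield consecutive lists of at most batch_size rows from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


result = list(batches(range(5), 2))
print(result)
```

Smaller batches surface errors sooner at the cost of more round trips; larger batches do the opposite.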
Introduced keep_null parameter for RandomPerson transformer.
Introduced dynamic parameters in the transformers and predefined cast functions accessible via cast_to. These functions cover frequent operations such as UnixTimestampToDate and IntToBool.
The transformation logic has been significantly refactored, making transformers more customizable and flexible than before.
Introduced transformation engines
random - generates transformer values based on pseudo-random algorithms.
hash - generates transformer values using hash functions. Currently, it utilizes sha3 hash functions, which are secure but perform slowly. In the stable release, there will be an option to choose between sha3 and SipHash.
Introduced static parameters value template
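To make the difference between the random and hash engines concrete, here is a minimal sketch of a deterministic, hash-based value generator. It is an illustration only: the function `hash_int`, its parameters, and the way the digest is mapped to a range are assumptions, not Greenmask's algorithm.

```python
import hashlib


def hash_int(value: bytes, salt: bytes, low: int, high: int) -> int:
    """Map an input value to an integer in [low, high], deterministically for a given salt."""
    digest = hashlib.sha3_256(salt + value).digest()
    span = high - low + 1
    # Use the first 8 bytes of the digest as an unsigned integer (slight modulo bias is ignored)
    return low + int.from_bytes(digest[:8], "big") % span


a = hash_int(b"alice@example.com", b"my-salt", 1, 1000)
b = hash_int(b"alice@example.com", b"my-salt", 1, 1000)
print(a, b)
```

The same input and salt always produce the same output, which is what allows a hash-based engine to keep transformed values consistent across tables and across repeated dumps.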
Dumps retention management - Introduced retention
parameters (#201) for the delete command. Introduced two new
statuses: failed and in progress. A dump is considered failed if it lacks a "done" heartbeat or
if the last heartbeat timestamp exceeds 30 minutes. The delete command now supports the following retention
parameters:
--dry-run: Runs the deletion operation in test mode with verbose output, without actually deleting anything.
--before-date 2024-08-27T23:50:54+00:00: Deletes dumps older than the specified date. The date must be provided in RFC3339Nano format, for example: 2021-01-01T00:00:00Z.
--retain-recent 10: Retains the N most recent dumps, where N is specified by the user.
--retain-for 1w2d3h4m5s6ms7us8ns: Retains dumps for the specified duration. The format supports weeks (w), days (d), hours (h), minutes (m), seconds (s), milliseconds (ms), microseconds (us), and nanoseconds (ns).
--prune-failed: Prunes (removes) all dumps that have failed.
--prune-unsafe: Prunes dumps with "unknown-or-failed" statuses. This option only works in conjunction with --prune-failed.
Docker image mirroring into the GitHub Container Registry
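The --retain-for duration format listed above (for example 1w2d3h4m5s6ms7us8ns) can be parsed with a short routine. The sketch below is a hypothetical helper written for illustration, not the parser Greenmask uses; it returns the duration in nanoseconds and rejects unknown or malformed input.

```python
import re

# Nanoseconds per supported unit
_UNIT_NS = {
    "w": 7 * 24 * 3600 * 10**9,
    "d": 24 * 3600 * 10**9,
    "h": 3600 * 10**9,
    "m": 60 * 10**9,
    "s": 10**9,
    "ms": 10**6,
    "us": 10**3,
    "ns": 1,
}
# Two-letter units are listed first so "6ms" is not read as "6m" followed by "s"
_TOKEN = re.compile(r"(\d+)(ms|us|ns|w|d|h|m|s)")


def parse_retain_for(text: str) -> int:
    """Parse a duration such as '1w2d3h' and return the total in nanoseconds."""
    pos = 0
    total = 0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:
            raise ValueError(f"unexpected input at position {pos}: {text!r}")
        total += int(match.group(1)) * _UNIT_NS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"invalid duration: {text!r}")
    return total


total_ns = parse_retain_for("1w2d3h4m5s6ms7us8ns")
print(total_ns)
```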
Core
Parametrizer interface, now implemented for both dynamic and static parameters.
Driver initialization logic.
Driver.
Parametrizer interface.
TransformationContext, as the first step towards enabling new feature transformation conditions (Feature: conditional transform #34).
static mode ensures performance remains high. Using only the necessary transformation features helps keep
transformation time predictable.
Transformers
RandomEmail - Introduces a new transformer that
supports both random and deterministic engines. It allows for flexible email value generation; you can use column
values in the template and choose to keep the original domain or select any from the
domains parameter.
NoiseDate, NoiseFloat, NoiseInt -
These transformers support both random and deterministic engines, offering dynamic mode parameters that control the
noise thresholds within the min and max range. Unlike previous implementations, which used a single ratio parameter, the new release features
min_ratio and max_ratio parameters to define noise values more precisely. Utilizing the
hash engine in these transformers enhances security by complicating statistical analysis for attackers, especially when the same salt is used consistently over long periods.
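One plausible reading of the min_ratio and max_ratio parameters is sketched below for a float value. The exact semantics in Greenmask may differ; the function `noise_float`, the random choice of sign, and the clamping to optional low/high thresholds are all assumptions made for illustration.

```python
import random


def noise_float(value, min_ratio, max_ratio, low=None, high=None, rng=random):
    """Shift value by a random fraction between min_ratio and max_ratio of its magnitude."""
    ratio = rng.uniform(min_ratio, max_ratio)
    sign = rng.choice((-1, 1))
    result = value + sign * abs(value) * ratio
    # Optional thresholds keep the noised value inside [low, high]
    if low is not None:
        result = max(result, low)
    if high is not None:
        result = min(result, high)
    return result


rng = random.Random(42)  # seeded generator so the sketch is repeatable
samples = [noise_float(100.0, 0.05, 0.2, rng=rng) for _ in range(100)]
print(min(samples), max(samples))
```

Compared with a single ratio, a lower bound guarantees that every value moves by at least min_ratio, so no output stays suspiciously close to the original.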
NoiseNumeric - A newly implemented transformer, sharing features with NoiseInt and NoiseFloat, but specifically designed for numeric values (large integers or floats). It provides a decimal parameter to handle values with fractions.
RandomChoice - Now supports the hash engine
RandomDate, RandomFloat, RandomInt - Now enhanced with hash engine support. Threshold parameters min and max have been updated to support dynamic mode, allowing for more flexible configurations.
RandomNumeric - A new transformer specifically designed for numeric types (large integers or floats), sharing similar features with RandomInt and RandomFloat, but tailored for handling huge numeric values.
RandomString - Now supports hash engine mode
RandomUnixTimestamp - This new transformer generates Unix timestamps with selectable units (second, millisecond, microsecond, nanosecond). Similar in function to RandomDate, it supports the hash engine and dynamic parameters for min and max thresholds, with the ability to override these units using min_unit and max_unit parameters.
RandomUuid - Added hash engine support
RandomPerson - Implemented a new transformer that replaces RandomName, RandomLastName, RandomFirstName, RandomFirstNameMale, RandomFirstNameFemale, RandomTitleMale, and RandomTitleFemale. This new transformer offers enhanced customizability while providing functionality similar to the previous versions. It generates personal data such as FirstName, LastName, and Title, based on the provided gender parameter, which now supports dynamic mode. Future minor versions will allow for overriding the default names database.
Added tsModify - a new template function for modifying time.Time objects
Introduced a new RandomIp transformer capable of generating a random IP address based on the specified netmask.
Added a new RandomMac transformer for generating random MAC addresses.
Deleted transformers include RandomMacAddress, RandomIPv4, RandomIPv6, RandomUnixTime, RandomTitleMale, RandomTitleFemale, RandomFirstName, RandomFirstNameMale, RandomFirstNameFemale, RandomLastName, and RandomName due to the introduction of more flexible and unified options.
Fixes and improvements
validate command with the --table flag, which had the wrong order of the table name representation {{ table_name }}.{{ schema }} instead of {{ schema }}.{{ table_name }}.
Row.SetColumn out of range validation.
restoreWorker panic caused when the worker received an error from pgx.
handling in the restore command.
jobs now start a transaction for each table restoration and commit it after the table restoration is done.
--exit-on-error works incorrectly in the restore command. Now, the --exit-on-error flag works correctly with the data section.
validate command.
latest to exclude specific keywords.
in the RandomPerson transformer.
parameters such as --exclude-table, --table, etc.
buffer limit in the Email transformer.
columns_type_override did not work.
just ignored instead of throwing an error.
min and max parameter values were ignored in transformers NoiseDate, NoiseNumeric, NoiseFloat, NoiseInt, RandomNumeric, RandomFloat, and RandomInt.
newline and semicolon. Now backward pg_dump call pg_restore 1724504511561 --file 1724504511561.sql is backward compatible and works as expected.
generated column.
Full Changelog: v0.1.14...v0.2.0
Contributors
Special thanks
Links
Feel free to reach out to us if you have any questions or need assistance:
This discussion was created from the release v0.2.0.