Commit 0a71bd8

Merge pull request #12060 from TillJan/12005-Feature-Flag
documentation: split feature flags table to subsections
2 parents ac7e86c + 49bd83a commit 0a71bd8

8 files changed, with 170 additions and 91 deletions.

doc/sphinx-guides/source/admin/big-data-administration.rst

Lines changed: 6 additions & 6 deletions
@@ -206,7 +206,7 @@ Challenges:
206206
Users will need to be made aware of these limitations and the possibilities for managing them (e.g. by aggregating multiple files in a single, larger file, or storing smaller files in the base-store via the normal Dataverse upload UI).
207207
- There is currently `a bug <https://github.com/gdcc/dataverse-globus/issues/2>`_ that won't allow users to transfer files from/to endpoints where they do not have permission to list the overall file tree (i.e. an institution manages <endpoint>/institution_name but the user only has access to <endpoint>/institution_name/my_dir.)
208208
Until that is fixed, a work-around is to first transfer data to an endpoint without this restriction.
209-
- An alternative, experimental implementation of Globus polling of ongoing upload transfers was added in v6.4. This framework does not rely on the instance staying up continuously for the duration of the transfer and saves the state information about Globus upload requests in the database. While it is now the recommended option, it is not enabled by default. See the ``globus-use-experimental-async-framework`` feature flag (see :ref:`feature-flags`) and the JVM option :ref:`dataverse.files.globus-monitoring-server`.
209+
- An alternative, experimental implementation of Globus polling of ongoing upload transfers was added in v6.4. This framework does not rely on the instance staying up continuously for the duration of the transfer and saves the state information about Globus upload requests in the database. While it is now the recommended option, it is not enabled by default. See the :ref:`dataverse.feature.globus-use-experimental-async-framework` feature flag and the JVM option :ref:`dataverse.files.globus-monitoring-server`.
210210

211211
More details of the setup required to enable Globus is described in the `Community Dataverse-Globus Setup and Configuration document <https://docs.google.com/document/d/1mwY3IVv8_wTspQC0d4ddFrD2deqwr-V5iAGHgOy4Ch8/edit?usp=sharing>`_ and the references therein.
212212

@@ -280,11 +280,11 @@ Scaling-related Configuration
280280
There are a broad range of options (that are not turned on by default) for improving how well Solr indexing and searching scales and for handling more files per dataset. Some of these are useful for all installations while others are related to specific use cases, or are mostly for emergency use (e.g. disabling facets).
281281
(see :ref:`database-settings`, :ref:`jvm-options`, and :ref:`feature-flags` for more details):
282282

283-
- dataverse.feature.add-publicobject-solr-field=true - specifically marks unrestricted content as public in Solr. See :ref:`feature-flags`.
284-
- dataverse.feature.avoid-expensive-solr-join=true - this tells Dataverse to use the feature above to speed up searches. See :ref:`feature-flags`.
285-
- dataverse.feature.reduce-solr-deletes=true - when Solr entries are being updated, this avoids an unnecessary step (deletion of existing entries) for entries that are being replaced. See :ref:`feature-flags`.
286-
- dataverse.feature.disable-dataset-thumbnail-autoselect=true - by default, Dataverse scans through all files in a dataset to find one that can be used as a thumbnail, which is expensive for many files. This disables that behavior to improve performance. See :ref:`feature-flags`.
287-
- dataverse.feature.only-update-datacite-when-needed=true - reduces the load on DataCite and reduces Dataverse failures related to that load, which is important when using file PIDs on Datasets with many files. See :ref:`feature-flags`.
283+
- :ref:`dataverse.feature.add-publicobject-solr-field` =true - specifically marks unrestricted content as public in Solr.
284+
- :ref:`dataverse.feature.avoid-expensive-solr-join` =true - this tells Dataverse to use the feature above to speed up searches.
285+
- :ref:`dataverse.feature.reduce-solr-deletes` =true - when Solr entries are being updated, this avoids an unnecessary step (deletion of existing entries) for entries that are being replaced.
286+
- :ref:`dataverse.feature.disable-dataset-thumbnail-autoselect` =true - by default, Dataverse scans through all files in a dataset to find one that can be used as a thumbnail, which is expensive for many files. This disables that behavior to improve performance.
287+
- :ref:`dataverse.feature.only-update-datacite-when-needed` =true - reduces the load on DataCite and reduces Dataverse failures related to that load, which is important when using file PIDs on Datasets with many files.
288288
- :ref:`dataverse.solr.min-files-to-use-proxy` =<X> - improve performance/lower memory requirements when indexing datasets with many files, suggested value is in the range 200 to 500
289289
- :ref:`dataverse.solr.concurrency.max-async-indexes` =<X> - limits the number of index operations running in parallel. The default is 4, larger values may improve performance (if the Solr instance is appropriately sized)
290290
- :ref:`:SolrFullTextIndexing` - false improves performance at the expense of not indexing file contents
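
The flags in this hunk are written as ``name=true`` settings. As a minimal sketch of how such a name could be turned into a JVM option or an environment variable name, the helper below assumes that these settings are read through MicroProfile Config, whose environment-variable rule replaces every character that is not alphanumeric or an underscore with an underscore and then upper-cases the result. The function names ``jvm_option`` and ``env_var_name`` are hypothetical and are not part of Dataverse.

```python
import re


def jvm_option(flag: str, value: str = "true") -> str:
    """Format a setting as a -D style JVM option string."""
    return f"-D{flag}={value}"


def env_var_name(flag: str) -> str:
    """Map a setting name to an environment variable name.

    Assumption: the MicroProfile Config mapping applies, i.e. every
    character that is not alphanumeric or '_' becomes '_', then the
    whole name is upper-cased.
    """
    return re.sub(r"[^A-Za-z0-9_]", "_", flag).upper()


if __name__ == "__main__":
    # Flag names copied from the list in this hunk.
    flags = [
        "dataverse.feature.add-publicobject-solr-field",
        "dataverse.feature.avoid-expensive-solr-join",
        "dataverse.feature.reduce-solr-deletes",
    ]
    for flag in flags:
        print(jvm_option(flag), "|", env_var_name(flag))
```

Whether a given installation sets these values through JVM options or environment variables is a deployment choice; the linked feature flag entries remain the authoritative description.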

doc/sphinx-guides/source/api/native-api.rst

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1709,7 +1709,7 @@ The fully expanded example above (without environment variables) looks like this
17091709
17101710
The CSV response has column headers mirroring the JSON entries. They are internationalized (when internationalization is configured).
17111711

1712-
Note: This feature requires the "role-assignment-history" feature flag to be enabled (see :ref:`feature-flags`).
1712+
Note: This feature requires the :ref:`dataverse.feature.role-assignment-history` feature flag to be enabled.
17131713

17141714
Datasets
17151715
--------
@@ -3305,7 +3305,7 @@ The fully expanded example above (without environment variables) looks like this
33053305
curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/datasets/:persistentId/returnToAuthor?persistentId=doi:10.5072/FK2/J8SJZB" -H "Content-type: application/json" -d @reason-for-return.json
33063306
33073307
The review process can sometimes resemble a tennis match, with the authors submitting and resubmitting the dataset over and over until the curators are satisfied. Each time the curators send a "reason for return" via API, that reason is sent by email and is persisted into the database, stored at the dataset version level.
3308-
Note the reason is required, unless the `disable-return-to-author-reason` feature flag has been set (see :ref:`feature-flags`). Reason is a free text field and could be as simple as "The author would like to modify his dataset", "Files are missing", "Nothing to report" or "A curation report with comments and suggestions/instructions will follow in another email" that suits your situation.
3308+
Note the reason is required, unless the :ref:`dataverse.feature.disable-return-to-author-reason` feature flag has been set. Reason is a free text field and could be as simple as "The author would like to modify his dataset", "Files are missing", "Nothing to report" or "A curation report with comments and suggestions/instructions will follow in another email" that suits your situation.
33093309

33103310
The :ref:`send-feedback-admin` Admin only API call may be useful as a way to move the conversation to email. However, note that these emails go to contacts (versus authors) and there is no database record of the email contents. (:ref:`dataverse.mail.cc-support-on-contact-email` will send a copy of these emails to the support email address which would provide a record.)
33113311
The :ref:`send-feedback` API call may be useful as a way to move the conversation to email. However, note that these emails go to contacts (versus authors) and there is no database record of the email contents. (:ref:`dataverse.mail.cc-support-on-contact-email` will send a copy of these emails to the support email address which would provide a record.)
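
The return-to-author call shown as a curl command in this hunk can also be sketched in Python. The example below only builds the request and does not send it. The URL path, the ``X-Dataverse-key`` header and the ``Content-type`` header are taken from the curl command; the JSON key ``reasonForReturn`` is an assumption about the contents of ``reason-for-return.json``, and the function name ``build_return_to_author_request`` is hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_return_to_author_request(server_url, api_token, persistent_id, reason):
    """Build (but do not send) a POST request for the returnToAuthor endpoint."""
    # Percent-encode the persistent identifier for use in the query string.
    query = urllib.parse.urlencode({"persistentId": persistent_id})
    url = f"{server_url}/api/datasets/:persistentId/returnToAuthor?{query}"
    # Assumption: the reason is carried under the key "reasonForReturn".
    body = json.dumps({"reasonForReturn": reason}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "X-Dataverse-key": api_token,
            "Content-type": "application/json",
        },
    )


if __name__ == "__main__":
    request = build_return_to_author_request(
        "https://demo.dataverse.org",
        "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
        "doi:10.5072/FK2/J8SJZB",
        "Files are missing",
    )
    print(request.get_method(), request.full_url)
```

Passing the returned object to ``urllib.request.urlopen`` would send it. Per the added line above, the reason may only be left out when the ``dataverse.feature.disable-return-to-author-reason`` flag has been set.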
@@ -4454,7 +4454,7 @@ The fully expanded example above (without environment variables) looks like this
44544454
44554455
The CSV response has column headers mirroring the JSON entries. They are internationalized (when internationalization is configured).
44564456
4457-
Note: This feature requires the "role-assignment-history" feature flag to be enabled (see :ref:`feature-flags`).
4457+
Note: This feature requires the :ref:`dataverse.feature.role-assignment-history` feature flag to be enabled.
44584458
44594459
Dataset Files Role Assignment History
44604460
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -4511,7 +4511,7 @@ The fully expanded example above (without environment variables) looks like this
45114511
45124512
The CSV response for this call is the same as for the /api/datasets/{id}/assignments/history call above with the exception that definedOn will be a comma separated list of one or more file ids.
45134513
4514-
Note: This feature requires the "role-assignment-history" feature flag to be enabled (see :ref:`feature-flags`).
4514+
Note: This feature requires the :ref:`dataverse.feature.role-assignment-history` feature flag to be enabled.
45154515
45164516
Update Dataset License
45174517
~~~~~~~~~~~~~~~~~~~~~~
@@ -7009,7 +7009,7 @@ To create a harvesting client you must supply a JSON file that describes the con
70097009
70107010
The following optional fields are supported:
70117011
7012-
- ``sourceName``: When ``index-harvested-metadata-source`` is enabled (see :ref:`feature-flags`), sourceName will override the nickname in the Metadata Source facet. It can be used to group the content from many harvesting clients under the same name.
7012+
- ``sourceName``: When the :ref:`dataverse.feature.index-harvested-metadata-source` feature flag is enabled, sourceName will override the nickname in the Metadata Source facet. It can be used to group the content from many harvesting clients under the same name.
70137013
- ``archiveDescription``: What the name suggests. If not supplied, will default to "This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data."
70147014
- ``set``: The OAI set on the remote server. If not supplied, will default to none, i.e., "harvest everything". (Note: see the note below on using sets when harvesting from DataCite; this is new as of v6.6).
70157015
- ``style``: Defaults to "default" - a generic OAI archive. (Make sure to use "dataverse" when configuring harvesting from another Dataverse installation).
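
The defaults described for the optional fields visible in this hunk can be illustrated with a short sketch. It is client-side illustration only: the server applies these defaults itself, the hunk may be cut off before further optional fields, and the function name ``with_documented_defaults`` is hypothetical. Representing "no set" as ``None`` is also an assumption made for the example.

```python
DEFAULT_ARCHIVE_DESCRIPTION = (
    "This Dataset is harvested from our partners. Clicking the link will "
    "take you directly to the archival source of the data."
)


def with_documented_defaults(client: dict) -> dict:
    """Return a copy of the client description with documented defaults filled in."""
    resolved = dict(client)  # leave the caller's dict untouched
    resolved.setdefault("archiveDescription", DEFAULT_ARCHIVE_DESCRIPTION)
    resolved.setdefault("set", None)  # no set, i.e. "harvest everything"
    resolved.setdefault("style", "default")  # generic OAI archive
    return resolved


if __name__ == "__main__":
    # "sourceName" groups content from several clients under one facet label
    # when the index-harvested-metadata-source feature flag is enabled.
    client = {"sourceName": "Partner Archives", "style": "dataverse"}
    for key, value in with_documented_defaults(client).items():
        print(f"{key}: {value}")
```

Values supplied by the caller always win over the defaults, which mirrors the "if not supplied" wording in the list above.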

doc/sphinx-guides/source/developers/big-data-support.rst

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -198,4 +198,4 @@ An overview of the control and data transfer interactions between components was
198198

199199
See also :ref:`Globus settings <:GlobusSettings>` and :ref:`globus-stores`.
200200

201-
An alternative, experimental implementation of Globus polling of ongoing upload transfers has been added in v6.4. This framework does not rely on the instance staying up continuously for the duration of the transfer and saves the state information about Globus upload requests in the database. Due to its experimental nature it is not enabled by default. See the ``globus-use-experimental-async-framework`` feature flag (see :ref:`feature-flags`) and the JVM option :ref:`dataverse.files.globus-monitoring-server`.
201+
An alternative, experimental implementation of Globus polling of ongoing upload transfers has been added in v6.4. This framework does not rely on the instance staying up continuously for the duration of the transfer and saves the state information about Globus upload requests in the database. Due to its experimental nature it is not enabled by default. See the :ref:`dataverse.feature.globus-use-experimental-async-framework` feature flag and the JVM option :ref:`dataverse.files.globus-monitoring-server`.

doc/sphinx-guides/source/developers/configuration.rst

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -122,5 +122,4 @@ convenient usage of it anywhere in the codebase. When adding a flag, please add
122122
status, add some Javadocs about the flagged feature and add a ``@since`` tag to make it easier to identify when a flag
123123
has been introduced.
124124

125-
We want to maintain a list of all :ref:`feature flags <feature-flags>` in the :ref:`configuration guide <feature-flags>`,
126-
please add yours to the list.
125+
Add the feature flag to the list at :ref:`feature flags <feature-flags>`.

doc/sphinx-guides/source/developers/globus-api.rst

Lines changed: 1 addition & 1 deletion
@@ -185,7 +185,7 @@ As the transfer can take significant time and the API call is asynchronous, the
185185

186186
Once the transfer completes, Dataverse will remove the write permission for the principal.
187187

188-
An alternative, experimental implementation of Globus polling of ongoing upload transfers has been added in v6.4. This new framework does not rely on the instance staying up continuously for the duration of the transfer and saves the state information about Globus upload requests in the database. Due to its experimental nature it is not enabled by default. See the ``globus-use-experimental-async-framework`` feature flag (see :ref:`feature-flags`) and the JVM option :ref:`dataverse.files.globus-monitoring-server`.
188+
An alternative, experimental implementation of Globus polling of ongoing upload transfers has been added in v6.4. This new framework does not rely on the instance staying up continuously for the duration of the transfer and saves the state information about Globus upload requests in the database. Due to its experimental nature it is not enabled by default. See the :ref:`dataverse.feature.globus-use-experimental-async-framework` feature flag and the JVM option :ref:`dataverse.files.globus-monitoring-server`.
189189

190190
Note that when using a managed endpoint that uses the Globus S3 Connector, the checksum should be correct as Dataverse can validate it. For file-based endpoints, the checksum should be included if available but Dataverse cannot verify it.
191191

doc/sphinx-guides/source/developers/performance.rst

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -120,8 +120,8 @@ While in the past Solr performance hasn't been much of a concern, in recent year
120120

121121
We are tracking performance problems in `#10469 <https://github.com/IQSS/dataverse/issues/10469>`_.
122122

123-
In a meeting with a Solr expert on 2024-05-10 we were advised to avoid joins as much as possible. (It was acknowledged that many Solr users make use of joins because they have to, like we do, to keep some documents private.) Toward that end we have added two feature flags called ``avoid-expensive-solr-join`` and ``add-publicobject-solr-field`` as explained under :ref:`feature-flags`. It was confirmed experimentally that performing the join on all the public objects (published collections, datasets and files), i.e., the bulk of the content in the search index, was indeed very expensive, especially on a large instance the size of the IQSS prod. archive, especially under indexing load. We confirmed that it was in fact unnecessary and were able to replace it with a boolean field directly in the indexed documents, which is achieved by the two feature flags above. However, as of writing this, this mechanism should still be considered experimental.
124-
Another flag, ``reduce-solr-deletes``, avoids deleting solr documents for files in a dataset prior to sending updates. It also eliminates several causes of orphan permission documents. This is expected to improve indexing performance to some extent and is a step towards avoiding unnecessary updates (i.e. when a doc would not change).
123+
In a meeting with a Solr expert on 2024-05-10 we were advised to avoid joins as much as possible. (It was acknowledged that many Solr users make use of joins because they have to, like we do, to keep some documents private.) Toward that end we have added two feature flags called :ref:`dataverse.feature.avoid-expensive-solr-join` and :ref:`dataverse.feature.add-publicobject-solr-field`. It was confirmed experimentally that performing the join on all the public objects (published collections, datasets and files), i.e., the bulk of the content in the search index, was indeed very expensive, especially on a large instance the size of the IQSS prod. archive, especially under indexing load. We confirmed that it was in fact unnecessary and were able to replace it with a boolean field directly in the indexed documents, which is achieved by the two feature flags above. However, as of writing this, this mechanism should still be considered experimental.
124+
Another feature flag, :ref:`dataverse.feature.reduce-solr-deletes`, avoids deleting Solr documents for files in a dataset prior to sending updates. It also eliminates several causes of orphan permission documents. This is expected to improve indexing performance to some extent and is a step towards avoiding unnecessary updates (i.e. when a doc would not change).
125125

126126
Datasets with Large Numbers of Files or Versions
127127
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
