40 changes: 40 additions & 0 deletions dataset/example02/README.md
@@ -0,0 +1,40 @@
---
SPDX-FileType: DOCUMENTATION
SPDX-License-Identifier: CC-BY-4.0
---

# Dataset example 2 - Image dataset

## Description

This example illustrates an SBOM for a labeled image dataset of human faces
used to train models that recognize facial expressions.

The SBOM demonstrates Dataset-profile properties for **image datasets**,
including data collection, preprocessing, known bias, and privacy sensitivity.
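Because the files are JSON-LD, the Dataset-profile properties named above can be read as plain JSON keys on the `dataset_DatasetPackage` element of the `@graph` array. The following is a minimal Python sketch under that assumption (no JSON-LD expansion and no SPDX validation). `load_document`, `find_dataset_packages`, `dataset_summary` and `DATASET_KEYS` are hypothetical helper names and are not part of any SPDX library:

```python
import json
from typing import Any

# Hypothetical list of the Dataset-profile keys highlighted by this example
# (SPDX 3.0 JSON-LD compact names, as they appear in the files).
DATASET_KEYS = (
    "dataset_datasetType",
    "dataset_dataCollectionProcess",
    "dataset_dataPreprocessing",
    "dataset_knownBias",
    "dataset_hasSensitivePersonalInformation",
)


def load_document(path: str) -> dict[str, Any]:
    """Parse an SPDX 3 JSON-LD file as plain JSON (no JSON-LD processing)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def find_dataset_packages(doc: dict[str, Any]) -> list[dict[str, Any]]:
    """Return every dataset_DatasetPackage element in the document's @graph."""
    return [el for el in doc.get("@graph", []) if el.get("type") == "dataset_DatasetPackage"]


def dataset_summary(pkg: dict[str, Any]) -> dict[str, Any]:
    """Keep only the highlighted Dataset-profile properties present on a package."""
    return {key: pkg[key] for key in DATASET_KEYS if key in pkg}
```

For instance, passing `load_document("spdx3.0/example02.spdx3.json")` through `find_dataset_packages` should yield a single package. Its summary includes the `image` type, the collection process, the preprocessing steps, the known biases, and the `yes` sensitivity flag.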

## Profile conformance

`core`, `dataset`

## SPDX files

| Version | File |
| ------- | ---- |
| SPDX 3.0 | [spdx3.0/example02.spdx3.json](./spdx3.0/example02.spdx3.json) |
| SPDX 3.1 (draft) | [spdx3.1/example02.spdx3.json-draft](./spdx3.1/example02.spdx3.json-draft) |

[![A diagram of Dataset example 2 - Image dataset.](./example02.spdx3.png "A diagram of Dataset example 2 - Image dataset.")](./example02.spdx3.png)

## Key properties demonstrated

| Property | Notes |
| -------- | ----- |
| `/Dataset/confidentialityLevel` | `clear` - freely distributable (with license) |
| `/Dataset/dataCollectionProcess` | How the images were sourced and labeled |
| `/Dataset/dataPreprocessing` | Steps applied to prepare images before use |
| `/Dataset/datasetAvailability` | `registration` - access requires registration |
| `/Dataset/datasetSize` | `21474836480` bytes (20 GiB) - deprecated in SPDX 3.1, use `/Software/artifactSize` |
| `/Dataset/datasetType` | `image` |
| `/Dataset/hasSensitivePersonalInformation` | `yes` - dataset contains images of people |
| `/Dataset/intendedUse` | Training/evaluation use cases - deprecated in SPDX 3.1, use `/Core/intendedUse` |
| `/Dataset/knownBias` | Skin-tone, age, and posed-expression imbalances documented |
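At the JSON level, the two deprecations in the table are key renames on the dataset package: the value stays the same and only the property moves. The following is a minimal Python sketch of that step, assuming the element is handled as a plain dictionary. It deliberately ignores every other SPDX 3.1 change, such as `@context`, `specVersion` and the document IDs, which also differ between the two files here. `RENAMES`, `migrate_dataset_package` and `to_gib` are hypothetical names and are not part of any SPDX tooling:

```python
import copy
from typing import Any

# Hypothetical rename map taken from this example (SPDX 3.0 key -> SPDX 3.1 key).
RENAMES = {
    "dataset_datasetSize": "software_artifactSize",  # same unit: bytes
    "dataset_intendedUse": "intendedUse",  # moved to the Core profile
}


def migrate_dataset_package(pkg: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of an SPDX 3.0 dataset_DatasetPackage with deprecated keys renamed.

    Only the renames in RENAMES are applied; the input dict is not modified.
    """
    migrated = copy.deepcopy(pkg)
    for old, new in RENAMES.items():
        if old in migrated:
            # Refuse to silently overwrite a differing value already stored under the new key.
            if new in migrated and migrated[new] != migrated[old]:
                raise ValueError(f"conflicting values for {old!r} and {new!r}")
            migrated[new] = migrated.pop(old)
    return migrated


def to_gib(size_bytes: int) -> float:
    """Convert a byte count to binary gigabytes (GiB, 2**30 bytes)."""
    return size_bytes / 2**30
```

Applied to the SPDX 3.0 package, the size value carries over unchanged, and `to_gib(21474836480)` returns `20.0`. The dataset is exactly 20 GiB, or about 21.5 GB in decimal units.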
Binary file added dataset/example02/example02.spdx3.png
116 changes: 116 additions & 0 deletions dataset/example02/spdx3.0/example02.spdx3.json
@@ -0,0 +1,116 @@
{
"@context": "https://spdx.org/rdf/3.0.1/spdx-context.jsonld",
"@graph": [
{
"type": "CreationInfo",
"@id": "_:creationinfo",
"specVersion": "3.0.1",
"createdBy": [
"https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002001#Organization1"
],
"created": "2024-07-01T00:00:00Z"
},
{
"type": "Organization",
"spdxId": "https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002001#Organization1",
"creationInfo": "_:creationinfo",
"name": "Computer Vision Research Lab"
},
{
"type": "SpdxDocument",
"spdxId": "https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002001",
"creationInfo": "_:creationinfo",
"profileConformance": [
"core",
"dataset"
],
"rootElement": [
"https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002001#SBOM1"
]
},
{
"type": "software_Sbom",
"spdxId": "https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002001#SBOM1",
"creationInfo": "_:creationinfo",
"profileConformance": [
"core",
"dataset"
],
"rootElement": [
"https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002001#DatasetPackage1"
],
"software_sbomType": [
"analyzed"
]
},
{
"type": "dataset_DatasetPackage",
"spdxId": "https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002001#DatasetPackage1",
"creationInfo": "_:creationinfo",
"name": "FaceExpress-50K",
"summary": "Labeled image dataset of facial expressions for emotion recognition model training.",
"description": "50,000 photographs of human faces, each labeled with one of seven expression classes: the six basic emotions (angry, disgust, fear, happy, sad, surprise) plus neutral. Images were collected both under controlled laboratory conditions and in the wild. Intended for training and evaluating facial expression recognition models.",
"software_packageVersion": "1.2.0",
"software_primaryPurpose": "data",
"software_downloadLocation": "https://example.com/datasets/faceexpress-50k",
"software_copyrightText": "Copyright 2024 Computer Vision Research Lab",
"builtTime": "2024-01-15T00:00:00Z",
"releaseTime": "2024-07-01T00:00:00Z",
"dataset_datasetType": [
"image"
],
"dataset_datasetSize": 21474836480,
"dataset_datasetAvailability": "registration",
"dataset_confidentialityLevel": "clear",
"dataset_dataCollectionProcess": "Images sourced from three channels: (1) laboratory sessions with consenting adult volunteers photographed under 4 different lighting conditions; (2) licensed stock photographs from commercial providers; (3) Creative Commons-licensed images from Flickr. Each image independently labeled by 3 annotators; majority vote used as ground truth. Annotators were trained on Paul Ekman's basic emotion categories.",
"dataset_dataPreprocessing": [
"Face detection and alignment using MTCNN",
"Crop to 224x224 pixels centered on detected face bounding box",
"Images with face detection confidence below 0.9 excluded",
"Grayscale images excluded; remaining images converted to RGB"
],
"dataset_knownBias": [
"Demographic imbalance: approximately 62% of images depict people with perceived lighter skin tones",
"Laboratory-sourced images may overrepresent posed rather than spontaneous expressions",
"Age distribution skewed toward the 20-35 age group (58% of images)"
],
"dataset_hasSensitivePersonalInformation": "yes",
"dataset_intendedUse": "Training and evaluation of facial expression recognition and emotion detection models. Not intended for direct deployment in consumer applications without additional validation.",
"verifiedUsing": [
{
"type": "Hash",
"algorithm": "sha256",
"hashValue": "a7f3b92e1cd458f9026e3a4b57c8d1e0f2a9b4c6d8e0f1a2b3c4d5e6f7a8b9c0"
}
],
"comment": "SPDX 3.0 NOTE: 'dataset_datasetSize' (value: 21474836480 bytes) is DEPRECATED in SPDX 3.1 - use 'software_artifactSize' instead (same unit: bytes). Also, 'dataset_intendedUse' is DEPRECATED in SPDX 3.1 - use Core-level 'intendedUse' property instead."
},
{
"type": "Relationship",
"spdxId": "https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002001#Relationship1",
"creationInfo": "_:creationinfo",
"relationshipType": "hasDeclaredLicense",
"from": "https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002001#DatasetPackage1",
"to": [
"https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002001#LicenseExpression1"
]
},
{
"type": "Relationship",
"spdxId": "https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002001#Relationship2",
"creationInfo": "_:creationinfo",
"relationshipType": "hasConcludedLicense",
"from": "https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002001#DatasetPackage1",
"to": [
"https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002001#LicenseExpression1"
]
},
{
"type": "simplelicensing_LicenseExpression",
"spdxId": "https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002001#LicenseExpression1",
"creationInfo": "_:creationinfo",
"simplelicensing_licenseExpression": "LicenseRef-FaceExpress-NonCommercial-Research",
"comment": "Non-commercial research use only. Redistribution requires written permission. See https://example.com/datasets/faceexpress-50k/license"
}
]
}
115 changes: 115 additions & 0 deletions dataset/example02/spdx3.1/example02.spdx3.json-draft
@@ -0,0 +1,115 @@
{
"@context": "https://spdx.org/rdf/3.1/spdx-context.jsonld",
"@graph": [
{
"type": "CreationInfo",
"@id": "_:creationinfo",
"specVersion": "3.1",
"createdBy": [
"https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002002#Organization1"
],
"created": "2025-01-01T00:00:00Z"
},
{
"type": "Organization",
"spdxId": "https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002002#Organization1",
"creationInfo": "_:creationinfo",
"name": "Computer Vision Research Lab"
},
{
"type": "SpdxDocument",
"spdxId": "https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002002",
"creationInfo": "_:creationinfo",
"profileConformance": [
"core",
"dataset"
],
"rootElement": [
"https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002002#SBOM1"
]
},
{
"type": "software_Sbom",
"spdxId": "https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002002#SBOM1",
"creationInfo": "_:creationinfo",
"profileConformance": [
"core",
"dataset"
],
"rootElement": [
"https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002002#DatasetPackage1"
],
"software_sbomType": [
"analyzed"
]
},
{
"type": "dataset_DatasetPackage",
"spdxId": "https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002002#DatasetPackage1",
"creationInfo": "_:creationinfo",
"name": "FaceExpress-50K",
"summary": "Labeled image dataset of facial expressions for emotion recognition model training.",
"description": "50,000 photographs of human faces, each labeled with one of seven expression classes: the six basic emotions (angry, disgust, fear, happy, sad, surprise) plus neutral. Images were collected both under controlled laboratory conditions and in the wild. Intended for training and evaluating facial expression recognition models.",
"software_packageVersion": "1.2.0",
"software_primaryPurpose": "data",
"software_downloadLocation": "https://example.com/datasets/faceexpress-50k",
"software_copyrightText": "Copyright 2024 Computer Vision Research Lab",
"software_artifactSize": 21474836480,
"builtTime": "2024-01-15T00:00:00Z",
"releaseTime": "2024-07-01T00:00:00Z",
"intendedUse": "Training and evaluation of facial expression recognition and emotion detection models. Not intended for direct deployment in consumer applications without additional validation.",
"dataset_datasetType": [
"image"
],
"dataset_datasetAvailability": "registration",
"dataset_confidentialityLevel": "clear",
"dataset_dataCollectionProcess": "Images sourced from three channels: (1) laboratory sessions with consenting adult volunteers photographed under 4 different lighting conditions; (2) licensed stock photographs from commercial providers; (3) Creative Commons-licensed images from Flickr. Each image independently labeled by 3 annotators; majority vote used as ground truth. Annotators were trained on Paul Ekman's basic emotion categories.",
"dataset_dataPreprocessing": [
"Face detection and alignment using MTCNN",
"Crop to 224x224 pixels centered on detected face bounding box",
"Images with face detection confidence below 0.9 excluded",
"Grayscale images excluded; remaining images converted to RGB"
],
"dataset_knownBias": [
"Demographic imbalance: approximately 62% of images depict people with perceived lighter skin tones",
"Laboratory-sourced images may overrepresent posed rather than spontaneous expressions",
"Age distribution skewed toward the 20-35 age group (58% of images)"
],
"dataset_hasSensitivePersonalInformation": "yes",
"verifiedUsing": [
{
"type": "Hash",
"algorithm": "sha256",
"hashValue": "a7f3b92e1cd458f9026e3a4b57c8d1e0f2a9b4c6d8e0f1a2b3c4d5e6f7a8b9c0"
}
],
"comment": "SPDX 3.1 CHANGES: (1) 'dataset_datasetSize' (SPDX 3.0; deprecated in SPDX 3.1) replaced by 'software_artifactSize' with the same value and unit (21474836480 bytes, 20 GiB). (2) 'dataset_intendedUse' (SPDX 3.0; deprecated in SPDX 3.1) replaced by the Core-level 'intendedUse' property. Compare with the SPDX 3.0 version in spdx3.0/example02.spdx3.json."
},
{
"type": "Relationship",
"spdxId": "https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002002#Relationship1",
"creationInfo": "_:creationinfo",
"relationshipType": "hasDeclaredLicense",
"from": "https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002002#DatasetPackage1",
"to": [
"https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002002#LicenseExpression1"
]
},
{
"type": "Relationship",
"spdxId": "https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002002#Relationship2",
"creationInfo": "_:creationinfo",
"relationshipType": "hasConcludedLicense",
"from": "https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002002#DatasetPackage1",
"to": [
"https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002002#LicenseExpression1"
]
},
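{
"type": "simplelicensing_LicenseExpression",
"spdxId": "https://spdx.org/spdxdocs/dataset-example02-b1b2c3d4-e5f6-7890-abcd-000000002002#LicenseExpression1",
"creationInfo": "_:creationinfo",
"simplelicensing_licenseExpression": "LicenseRef-FaceExpress-NonCommercial-Research",
"comment": "Non-commercial research use only. Redistribution requires written permission. See https://example.com/datasets/faceexpress-50k/license"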
}
]
}