tidymodels.org

This repo is the source of https://www.tidymodels.org, and this readme tells you how it all works.

  • If you spot any small problems with the website, please feel empowered to fix them directly with a PR.

  • If you see any larger problems, an issue is probably better: that way we can discuss the problem before you commit any time to it.

This repo (and resulting website) is licensed as CC BY-SA.

Requirements to preview the site locally

R packages

When updating the site, the goal is to use the most recent CRAN versions of the modeling/data analysis packages.

  1. Get a local copy of the website source.

    • Users of devtools/usethis can do:
      usethis::create_from_github("tidymodels/tidymodels.org")
      Note that usethis::create_from_github() works best when it can find a GitHub personal access token and usethis (git2r, really) is configured correctly for your preferred transport protocol (SSH vs HTTPS). See the usethis setup advice.
    • Otherwise, use your favorite method to fork and clone or download the repo as a ZIP file and unpack.
  2. Start R in your new tidymodels.org/ directory.

  3. To install the required packages, run the code within

    installs.R
    

    This file will also install the Keras Python libraries and environments.

  4. Restart R.

  5. You should now be able to render the site in all the usual ways for quarto by calling quarto render.

Quarto

We use the latest release version of quarto. You can install and manage different versions with qvm.

The website is deployed to GitHub Pages via the publish.yml workflow.

Structure

The source of the website is a collection of .qmd files stored in the folders in this repository. These files are then rendered as a Quarto HTML website.

  • packages/: this is a top-level page on the site rendered from a single .qmd file.

  • start/: these files make up a 5-part tutorial series to help users get started with tidymodels. Each article is a .qmd file as a page bundle, meaning that each article is in its own folder along with accompanying images, data, and rendered figures.

  • learn/: these files make up the articles presented in the learn section. This section is nested, meaning that inside this section, there are actually 4 subsections: models, statistics, work, develop. Each article is a .qmd file.

  • help/: this is a top-level page on the site rendered from a single .qmd file.

  • contribute/: this is a top-level page on the site rendered from a single .qmd file.

  • books/: these files make up the books page, linked from resource stickies. To add a new book, create a new folder with a new .qmd file inside named index.qmd. An image file of the cover should be added in the same folder, named cover.*.

  • find/: these files make up the find page, linked from the top navbar and resource stickies. Each of these pages is a .qmd file. The CSV data files in this directory are generated by scripts in make_function_lists/.

  • make_function_lists/: scripts that generate the CSV reference lists for the find pages. See Generating function lists below.

Quarto profiles

This repo uses two Quarto profiles to split behavior between local and CI rendering:

  • _quarto-local.yml (default): used when rendering locally. Defines post-render scripts such as post-render.R and post-render-downlit.R.
  • _quarto-production.yml: used in CI via QUARTO_PROFILE: production in publish.yml. Also runs post-render-downlit.R so code linking applies to all HTML files including frozen pages.

When adding a script that should only run locally, add it to _quarto-local.yml. If it should run in CI, add it to _quarto-production.yml and ensure the workflow installs the needed dependencies.
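To preview locally what the production profile does, the profile can be selected through the same QUARTO_PROFILE variable that publish.yml sets. The helper below is only a sketch: render_with_profile is a hypothetical name, not a script in this repo, and it assumes quarto is on your PATH.

```shell
# Render the site under a named Quarto profile ("local" or "production").
# render_with_profile is a hypothetical helper, not part of this repository.
render_with_profile() {
  profile="$1"
  if ! command -v quarto >/dev/null 2>&1; then
    echo "quarto not found on PATH" >&2
    return 127
  fi
  # QUARTO_PROFILE is the variable publish.yml sets in CI.
  QUARTO_PROFILE="$profile" quarto render
}

# Usage (commented out so that sourcing this snippet does not start a render):
# render_with_profile production
```

Calling it with local should behave like a plain quarto render, since local is the default profile.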

Code linking

R functions in code blocks are hyperlinked to their documentation via the downlit package, enabled with code-link: true in _quarto.yml.

Because library(tidymodels) is not automatically expanded by downlit (unlike library(tidyverse)), post-render-downlit.R explicitly seeds the package list via tidymodels::tidymodels_packages() so functions like step_*, tune(), etc. are linked correctly.

Package metadata

Every .qmd file that contains R code declares its package dependencies in the YAML front matter using the r-packages field:

r-packages:
  - tidymodels
  - ranger
  - kableExtra

Convention: list only packages that are not already members of the tidymodels meta-package. The full list of tidymodels members can be checked with tidymodels::tidymodels_packages(). For example, dplyr, ggplot2, modeldata, tune, and rlang are all covered by listing tidymodels and should not be listed separately.

This metadata is the foundation for tooling that can:

  • install exactly the packages needed for a given page
  • selectively re-render only pages affected by a package release

Pure prose pages (no R code chunks) do not need this field.
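The listing convention can be sketched as a small membership check. The member list below is a hypothetical, partial stand-in that contains only the packages named above; the authoritative list comes from tidymodels::tidymodels_packages() in R, and needs_listing is an illustrative name.

```shell
# Hypothetical, partial member list: only the packages the text names as covered.
# The authoritative list comes from tidymodels::tidymodels_packages() in R.
TIDYMODELS_MEMBERS="dplyr ggplot2 modeldata tune rlang"

# Succeeds (exit 0) when a package must be listed separately under r-packages.
needs_listing() {
  case " $TIDYMODELS_MEMBERS " in
    *" $1 "*) return 1 ;;  # already covered by listing tidymodels
    *) return 0 ;;
  esac
}

for pkg in ranger dplyr kableExtra tune; do
  if needs_listing "$pkg"; then
    echo "$pkg: list under r-packages"
  else
    echo "$pkg: covered by tidymodels"
  fi
done
```

With this stand-in list, ranger and kableExtra are reported as needing their own entry, while dplyr and tune are reported as covered.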

Workflow

  • To add a new post to learn/, add a new folder with an index.qmd file in it and adapt the YAML header from an existing post. If new packages are required to run this post, then add them to the packages object in installs.R and to the r-packages field in the new post's YAML front matter.

  • To preview the site, render it locally with the latest quarto release version.

  • The site is currently rendered locally (macOS), not in CI. Rendered outputs are committed to the repo (the freeze cache in _freeze/ and the .md files kept via keep-md: true), and those files are what get deployed. Always include them in your PR.

  • Rendering in CI via a PR comment: If you'd prefer not to render locally, comment /render on your open PR. A GitHub Actions workflow (render-pr.yml) will detect which .qmd files changed, install the needed packages, render those pages, and commit the output back to your branch. It posts a comment when done (or links to the failed run on error). Only repo owners, org members, and collaborators can trigger this.

  • Note on platform differences: As the automated nightly re-render (check-cran-releases.yml) matures, pages will increasingly be rendered on Linux (Ubuntu) rather than macOS. The first time a page is re-rendered in CI you may see numerical differences in the output, because floating point results can vary slightly between platforms due to differences in BLAS/LAPACK libraries and other system-level factors. These differences are expected and not a sign of a bug, but should be reviewed before merging the automated PR.

  • keep-md: true is set in _quarto.yml so that rendered .md files are committed alongside the source. This makes it possible to review in a PR whether code produced different results than before.

  • To do a complete rerender, run the re-render.R script.

Rerender

We try to do a rerender after a release of a main package.

  • Make sure that all_packages.R is up to date.

  • Run the installs.R script. Make sure to check that dev versions aren't present.

  • Run the re-render.R script.

Selective re-render

To re-render only the pages affected by one or more package updates, use re-render-package.R:

Rscript re-render-package.R ranger
Rscript re-render-package.R ranger glmnet   # union of affected pages, deduped
Rscript re-render-package.R tidymodels      # all pages that use tidymodels
Rscript re-render-package.R --all           # every page on the site

This reads package_map.json to find affected pages, clears their freeze cache, and re-renders them.
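The lookup step, a deduplicated union of pages across the requested packages, can be sketched as follows. This is not the code in re-render-package.R: the layout of package_map.json is not shown here, so the sketch substitutes a made-up plain-text map of "package page" pairs, and affected_pages is a hypothetical name.

```shell
# Stand-in for package_map.json: one "package page" pair per line.
# The real file's layout is not documented here; these entries are made up.
PACKAGE_MAP='ranger learn/models/a/index.qmd
ranger start/b/index.qmd
glmnet start/b/index.qmd
glmnet learn/models/c/index.qmd'

# Print each page that depends on any of the given packages, exactly once.
affected_pages() {
  for pkg in "$@"; do
    printf '%s\n' "$PACKAGE_MAP" | awk -v p="$pkg" '$1 == p { print $2 }'
  done | sort -u
}

affected_pages ranger glmnet
```

The call prints three pages; start/b/index.qmd appears once even though both packages map to it.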

Supporting files

  • package_map.json: maps each package to the pages that depend on it. Regenerate after changing any r-packages: field:

    Rscript make_package_map.R
  • _versions.json: records the installed package versions at the time of the last render. Update after any re-render:

    Rscript make_versions.R

Automated re-renders via GitHub Actions

The check-cran-releases.yml workflow runs on weekdays at 4am Pacific time. It compares current CRAN versions against _versions.json and, if any packages have updated, automatically:

  1. Installs only the packages needed for the affected pages (via install_for_packages.R, which uses the shared install_packages.R helper)
  2. Re-renders the affected pages
  3. Updates _versions.json and package_map.json
  4. Opens a pull request for review, including the old and new versions of each updated package

If any page fails to render, an issue is opened instead of a PR, with a link to the failed workflow run. The _versions.json and package_map.json are not updated on failure, so the workflow will retry on the next run.
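The version comparison at the heart of this workflow can be illustrated with a minimal sketch. It is not the workflow's actual code: the recorded and current versions are invented literals standing in for _versions.json and a CRAN query, and updated_packages is a hypothetical name.

```shell
# Hypothetical snapshots as "package version" lines. The real workflow reads
# _versions.json and queries CRAN; both are replaced by invented literals here.
RECORDED='glmnet 1.0.0
ranger 1.0.0
yardstick 1.0.0'
CURRENT='glmnet 1.0.0
ranger 1.1.0
yardstick 1.0.0'

# Print "package old -> new" for every package whose version string differs.
# $1 is the recorded snapshot, $2 the current one; "--" separates them for awk.
updated_packages() {
  { printf '%s\n' "$1"; printf '%s\n' '--'; printf '%s\n' "$2"; } |
    awk '$1 == "--" { second = 1; next }
         !second { old[$1] = $2; next }
         ($1 in old) && old[$1] != $2 { print $1, old[$1], "->", $2 }'
}

updated_packages "$RECORDED" "$CURRENT"
```

An empty result corresponds to the case where nothing has updated. Any non-empty result gives both the list of packages to re-render for and the old and new versions to report in the pull request.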

You can also trigger it manually from the GitHub Actions UI, or with the gh CLI:

# Normal version check
gh workflow run check-cran-releases.yml

# Force re-render for specific packages
gh workflow run check-cran-releases.yml -f packages="ranger glmnet"

# Re-render every page
gh workflow run check-cran-releases.yml -f packages="--all"

Generating function lists

The find/ pages display searchable tables of functions, models, and recipe steps. The data for these tables comes from CSV files generated by scripts in make_function_lists/.

When to regenerate

Regenerate the function lists when:

  • New packages are added to the tidymodels ecosystem
  • After major CRAN releases of tidymodels packages
  • When new models or recipe steps are added

How to run

To regenerate all function lists:

Rscript make_function_lists/run_all.R

To force a fresh run (ignoring cache):

Rscript make_function_lists/run_all.R --fresh

To run individual generators:

Rscript make_function_lists/broom.R
Rscript make_function_lists/recipes.R
Rscript make_function_lists/tidymodels.R
Rscript make_function_lists/parsnip.R
Rscript make_function_lists/sparse.R
Rscript make_function_lists/tidyclust.R
