This repo is the source of https://www.tidymodels.org, and this readme tells you how it all works.
- If you spot any small problems with the website, please feel empowered to fix them directly with a PR.
- If you see any larger problems, an issue is probably better: that way we can discuss the problem before you commit any time to it.
This repo (and resulting website) is licensed as CC BY-SA.
When updating the site, the goal is to use the most recent CRAN versions of the modeling/data analysis packages.
- Get a local copy of the website source.
  - Users of devtools/usethis can do:

    usethis::create_from_github("tidymodels/tidymodels.org")

    Note that usethis::create_from_github() works best when it can find a GitHub personal access token and usethis (git2r, really) is configured correctly for your preferred transport protocol (SSH vs HTTPS). See the usethis setup advice.
  - Otherwise, use your favorite method to fork and clone, or download the repo as a ZIP file and unpack it.
- Start R in your new tidymodels.org/ directory.
- To install the required packages, run the code within installs.R. This file will also install the keras Python libraries and environments.
- Restart R.
- You should now be able to render the site in all the usual ways for quarto by calling quarto render.
We use the latest release version of quarto. You can install and manage different versions with qvm.
The website is deployed to GitHub Pages via the publish.yml workflow.
The source of the website is a collection of .qmd files stored in the folders in this repository. The site is then rendered as a Quarto HTML website.
- packages/: this is a top-level page on the site rendered from a single .qmd file.
- start/: these files make up a 5-part tutorial series to help users get started with tidymodels. Each article is a .qmd file as a page bundle, meaning that each article is in its own folder along with accompanying images, data, and rendered figures.
- learn/: these files make up the articles presented in the learn section. This section is nested, meaning that inside this section, there are actually 4 subsections: models, statistics, work, develop. Each article is a .qmd file.
- help/: this is a top-level page on the site rendered from a single .qmd file.
- contribute/: this is a top-level page on the site rendered from a single .qmd file.
- books/: these files make up the books page, linked from resource stickies. To add a new book, create a new folder with a new .qmd file inside named index.qmd. An image file of the cover should be added in the same folder, named cover.*.
- find/: these files make up the find page, linked from the top navbar and resource stickies. Each of these pages is a .qmd file. The CSV data files in this directory are generated by scripts in make_function_lists/.
- make_function_lists/: scripts that generate the CSV reference lists for the find pages. See Generating function lists below.
This repo uses two Quarto profiles to split behavior between local and CI rendering:
- _quarto-local.yml (default): used when rendering locally. Defines post-render scripts such as post-render.R and post-render-downlit.R.
- _quarto-production.yml: used in CI via QUARTO_PROFILE: production in publish.yml. Also runs post-render-downlit.R so code linking applies to all HTML files including frozen pages.
When adding a script that should only run locally, add it to _quarto-local.yml. If it should run in CI, add it to _quarto-production.yml and ensure the workflow installs the needed dependencies.
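To reproduce the CI configuration on your own machine, you can select a profile explicitly. The sketch below is only an illustration: render_site is a hypothetical helper that does not exist in this repo, and QUARTO is a hypothetical override for pointing at a specific quarto binary (for example one installed through qvm). QUARTO_PROFILE is the same variable that publish.yml sets.

```shell
# Hypothetical helper (not part of this repo): render the site, optionally
# forcing a Quarto profile through the QUARTO_PROFILE environment variable.
# QUARTO is an illustrative override for the quarto binary to call.
render_site() {
  if [ "$#" -gt 0 ]; then
    # Same mechanism publish.yml uses: QUARTO_PROFILE selects the profile.
    QUARTO_PROFILE="$1" ${QUARTO:-quarto} render
  else
    # No argument: profile selection is left to Quarto (the project default,
    # unless QUARTO_PROFILE is already set in your environment).
    ${QUARTO:-quarto} render
  fi
}

# Usage:
#   render_site              # default (local) profile
#   render_site production   # same profile as CI
```

Rendering with the production profile locally skips the local-only post-render scripts, so it is mainly useful for checking that a page behaves the same way it will in CI.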
R functions in code blocks are hyperlinked to their documentation via the downlit package, enabled with code-link: true in _quarto.yml.
Because library(tidymodels) is not automatically expanded by downlit (unlike library(tidyverse)), post-render-downlit.R explicitly seeds the package list via tidymodels::tidymodels_packages() so functions like step_*, tune(), etc. are linked correctly.
Every .qmd file that contains R code declares its package dependencies in the YAML front matter using the r-packages field:
r-packages:
- tidymodels
- ranger
- kableExtra

Convention: list only packages that are not already members of the tidymodels meta-package. The full list of tidymodels members can be checked with tidymodels::tidymodels_packages(). For example, dplyr, ggplot2, modeldata, tune, and rlang are all covered by listing tidymodels and should not be listed separately.
This metadata is the foundation for tooling that can:
- install exactly the packages needed for a given page
- selectively re-render only pages affected by a package release
Pure prose pages (no R code chunks) do not need this field.
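The convention for r-packages can be checked mechanically. The following is a minimal sketch, not a script that ships with this repo: redundant_packages is a hypothetical helper, and it assumes you have a plain-text file with one tidymodels member per line (one way to produce it would be Rscript -e 'writeLines(tidymodels::tidymodels_packages())' > members.txt).

```shell
# Hypothetical helper (not part of this repo).
# Reads the packages listed in an r-packages field from stdin, one per line,
# and prints those that are already tidymodels members and therefore should
# not be listed separately. "tidymodels" itself is always allowed.
redundant_packages() {
  members_file="$1"
  # -x: match whole lines, -F: fixed strings, -f: read patterns from a file.
  # "|| true" keeps the exit status at 0 when nothing is redundant.
  grep -xF -f "$members_file" | grep -vxF 'tidymodels' || true
}

# Usage, with the members mentioned in this README as a stand-in list:
#   printf '%s\n' tidymodels dplyr ggplot2 modeldata tune rlang > members.txt
#   printf '%s\n' tidymodels ranger dplyr kableExtra | redundant_packages members.txt
```

With the stand-in list above, the only package reported is dplyr, since ranger and kableExtra are not tidymodels members.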
- To add a new post to learn/, add a new folder with an index.qmd file in it and adapt the YAML header from an existing post. If new packages are required to run this post, add them to the packages object in installs.R and to the r-packages field in the new post's YAML front matter.
- To preview the site, render it locally with the latest quarto release version.
- The site is currently rendered locally (macOS), not in CI. Rendered outputs are committed to the repo (the freeze cache, _freeze/, and the .md files kept via keep-md: true), and those files are what gets deployed. Always include them in your PR.
- Rendering in CI via a PR comment: If you'd prefer not to render locally, comment /render on your open PR. A GitHub Actions workflow (render-pr.yml) will detect which .qmd files changed, install the needed packages, render those pages, and commit the output back to your branch. It posts a comment when done (or links to the failed run on error). Only repo owners, org members, and collaborators can trigger this.
- Note on platform differences: As the automated nightly re-render (check-cran-releases.yml) matures, pages will increasingly be rendered on Linux (Ubuntu) rather than macOS. The first time a page is re-rendered in CI you may see numerical differences in the output: floating point results can vary slightly between platforms due to differences in BLAS/LAPACK libraries and other system-level factors. These differences are expected and not a sign of a bug, but should be reviewed before merging the automated PR.
- keep-md: true is set in _quarto.yml so that rendered .md files are committed alongside the source. This makes it possible to review in a PR whether code produced different results than before.
- To do a complete rerender, run the re-render.R script.
We try to do a rerender after a release of a main package.
- Make sure that all_packages.R is up to date.
- Run the installs.R script. Make sure to check that dev versions aren't present.
- Run the re-render.R script.
To re-render only the pages affected by one or more package updates, use re-render-package.R:
Rscript re-render-package.R ranger
Rscript re-render-package.R ranger glmnet # union of affected pages, deduped
Rscript re-render-package.R tidymodels # all pages that use tidymodels
Rscript re-render-package.R --all # every page on the site

This reads package_map.json to find affected pages, clears their freeze cache, and re-renders them.
- package_map.json: maps each package to the pages that depend on it. Regenerate it after changing any r-packages: field by running Rscript make_package_map.R.
- _versions.json: records the installed package versions at the time of the last render. Update it after any re-render by running Rscript make_versions.R.
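Because _versions.json should be refreshed after every re-render, it can be convenient to chain the two commands. The sketch below shows one possible way to do that; run and rerender_and_record are hypothetical helper names, and DRY_RUN is an illustrative switch for previewing the commands. Only the two Rscript invocations come from this README.

```shell
# Hypothetical helpers (not part of this repo).

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Re-render the pages affected by the given packages, then record the
# installed package versions. The second step only runs if the first succeeds.
rerender_and_record() {
  if [ "$#" -eq 0 ]; then
    echo "usage: rerender_and_record PACKAGE..." >&2
    return 1
  fi
  run Rscript re-render-package.R "$@" &&
    run Rscript make_versions.R
}

# Usage:
#   DRY_RUN=1; rerender_and_record ranger glmnet   # preview the commands
#   DRY_RUN=0; rerender_and_record ranger glmnet   # actually run them
```

If you also changed an r-packages: field, regenerate package_map.json first so the re-render picks up the right pages.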
The check-cran-releases.yml workflow runs on weekdays at 4am Pacific time. It compares current CRAN versions against _versions.json and, if any packages have updated, automatically:
- Installs only the packages needed for the affected pages (via install_for_packages.R, which uses the shared install_packages.R helper)
- Re-renders the affected pages
- Updates _versions.json and package_map.json
- Opens a pull request for review, including the old and new versions of each updated package
If any page fails to render, an issue is opened instead of a PR, with a link to the failed workflow run. The _versions.json and package_map.json files are not updated on failure, so the workflow will retry on the next run.
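The version comparison at the heart of this workflow can be pictured with a small sketch. This is not the workflow's actual code, and the real layout of _versions.json is not shown in this README: the example assumes both the recorded and the current CRAN versions have already been flattened to plain-text files with one "package version" pair per line. changed_packages is a hypothetical helper, and the version numbers in the usage comment are made up.

```shell
# Hypothetical helper (not part of this repo).
# changed_packages RECORDED CURRENT
# Both files hold one "package version" pair per line. Prints
# "package old new" for every package whose version string differs.
changed_packages() {
  awk '
    # First file: remember the recorded version of each package.
    NR == FNR { old[$1] = $2; next }
    # Second file: report packages whose version string changed.
    # Appending "" forces a string comparison, so 4.1 and 4.10 differ.
    ($1 in old) && (old[$1] "") != ($2 "") { print $1, old[$1], $2 }
  ' "$1" "$2"
}

# Usage (illustrative, made-up versions):
#   printf '%s\n' 'ranger 0.16.0' 'dials 1.3.0' > recorded.txt
#   printf '%s\n' 'ranger 0.17.0' 'dials 1.3.0' > current.txt
#   changed_packages recorded.txt current.txt
```

The sketch treats any difference as an update, which is enough to decide whether a re-render is needed and to list old and new versions in a PR description.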
You can also trigger the workflow manually from the GitHub Actions UI, or with the gh CLI:
# Normal version check
gh workflow run check-cran-releases.yml
# Force re-render for specific packages
gh workflow run check-cran-releases.yml -f packages="ranger glmnet"
# Re-render every page
gh workflow run check-cran-releases.yml -f packages="--all"

The find/ pages display searchable tables of functions, models, and recipe steps. The data for these tables comes from CSV files generated by scripts in make_function_lists/.
Regenerate the function lists when:
- New packages are added to the tidymodels ecosystem
- A major CRAN release of a tidymodels package occurs
- New models or recipe steps are added
To regenerate all function lists:
Rscript make_function_lists/run_all.R

To force a fresh run (ignoring cache):

Rscript make_function_lists/run_all.R --fresh

To run individual generators:

Rscript make_function_lists/broom.R
Rscript make_function_lists/recipes.R
Rscript make_function_lists/tidymodels.R
Rscript make_function_lists/parsnip.R
Rscript make_function_lists/sparse.R
Rscript make_function_lists/tidyclust.R