We have reverted the change made in hardhat 1.0.0 that caused recipe preprocessors to drop non-standard roles by default when calling forge(). Determining what roles are required at bake() time is really something that should be controlled within recipes, not hardhat. This results in the following changes (#207):
The new argument, bake_dependent_roles, that was added to default_recipe_blueprint() in 1.0.0 has been removed. It is no longer needed with the new behavior.
forge() will pass on all columns from new_data to bake() except those with roles of
outcomes = TRUE, it will also pass on the "outcome" role. This is essentially the same as the pre-1.0.0 behavior, and means that, by default, all non-standard roles are required at bake() time. This assumption is now also enforced by recipes 1.0.0, even if you aren’t using hardhat or a workflow.
In the development version of recipes, which will become recipes 1.0.0, there is a new update_role_requirements() function that can be used to declare that a role is not required at bake() time. hardhat now knows how to respect that feature, and in forge() it won’t pass on columns of new_data to bake() that have roles that aren’t required at bake() time.
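As a minimal sketch of the behavior described above, assuming recipes >= 1.0.0 and a hardhat version that includes this change: the data frame, the column names, and the "id" role below are made up for illustration. A column with a non-standard role is marked as not required at bake() time, so new data passed to forge() can omit it.

```r
library(recipes)
library(hardhat)

# Hypothetical training data: `id` is only a bookkeeping column
train <- data.frame(
  y  = c(1, 2, 3, 4),
  x  = c(0.1, 0.4, 0.2, 0.9),
  id = 1:4
)

rec <- recipe(y ~ x + id, data = train)
# Give `id` a non-standard role
rec <- update_role(rec, id, new_role = "id")
# Declare that the "id" role is not required at bake() time
rec <- update_role_requirements(rec, role = "id", bake = FALSE)

processed <- mold(rec, train)

# New data without `id` (and without the outcome, since outcomes = FALSE by default)
new_data <- data.frame(x = c(0.3, 0.7))
forged <- forge(new_data, processed$blueprint)

forged$predictors
```

Without the update_role_requirements() call, the "id" role would be treated as required at bake() time, so new data missing that column would not be accepted.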
CRAN release: 2022-06-10
Fixed a bug where the results from calling mold() using hardhat < 1.0.0 were no longer compatible with calling forge() in hardhat >= 1.0.0. This could occur if you save a workflow object after fitting it, then load it into an R session that uses a newer version of hardhat (#200).
Internal details related to how blueprints work alongside forge() were heavily refactored to support the fix for #200. These changes are mostly internal or developer focused. They include:
Blueprints no longer store the clean/process functions used when calling forge(). These were stored in blueprint$forge$process() and were strictly for internal use. Storing them in the blueprint caused problems because blueprints created with old versions of hardhat were unlikely to be compatible with newer versions of hardhat. This change means that new_blueprint() and the other blueprint constructors no longer have
run_mold() has been repurposed. Rather than calling the $process() functions (which, as mentioned above, are no longer in the blueprint), the methods for this S3 generic have been rewritten to directly call the current versions of the clean and process functions that live in hardhat. This should result in fewer accidental breaking changes.
CRAN release: 2022-06-01
Recipe preprocessors now ignore non-standard recipe roles (i.e. not "predictor") by default when calling forge(). Previously, it was assumed that all non-standard role columns present in the original training data were also required in the test data when forge() is called. It seems to be more often the case that those columns are actually not required to bake() new data, and often won’t even be present when making predictions on new data. For example, a custom "case_weights" role might be required for computing case-weighted estimates at prep() time, but won’t be necessary at bake() time (since the estimates have already been pre-computed and stored). To account for the case when you do require a specific non-standard role to be present at bake() time, default_recipe_blueprint() has gained a new argument, bake_dependent_roles, which can be set to a character vector of non-standard roles that are required.
rlang >= 1.0.2 and vctrs >= 0.4.1 are now required.
Bumped required R version to >= 3.4.0 to reflect tidyverse standards.
CRAN release: 2022-01-24
CRAN release: 2021-07-14
Added a new family of extract_*() S3 generics for extracting important components from various tidymodels objects. S3 methods will be defined in other tidymodels packages. For example, tune will register an extract_workflow() method to easily extract the workflow embedded within the result of
CRAN release: 2020-11-09
create_modeling_package()) now ensures that all generated functions are templated on the model name. This makes it easier to add multiple models to the same package (#152).
CRAN release: 2020-07-02
indicators = "none" in default_formula_blueprint() no longer accidentally expands character columns into dummy variable columns. They are now left completely untouched and pass through as characters. When indicators = "traditional" or indicators = "one_hot", character columns are treated as unordered factors (#139).
The indicators argument of default_formula_blueprint() now takes character input rather than logical. To update:
    indicators = TRUE -> indicators = "traditional"
    indicators = FALSE -> indicators = "none"
Logical input for indicators will continue to work, with a warning, until hardhat 0.1.6, where it will be formally deprecated.
There is also a new indicators = "one_hot" option which expands all factor columns into K dummy variable columns corresponding to the K levels of that factor, rather than the more traditional K - 1 expansion.
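To make the three indicators options concrete, here is a small sketch. The data frame and the helper n_predictor_cols() are hypothetical and exist only for this illustration. It counts how many predictor columns mold() produces for a single three-level factor under each setting.

```r
library(hardhat)

# Hypothetical data: one numeric outcome and one factor with K = 3 levels
df <- data.frame(
  y = c(1.5, 2.0, 3.5, 4.0),
  f = factor(c("a", "b", "c", "a"))
)

# Illustrative helper: number of predictor columns produced for a given setting
n_predictor_cols <- function(indicators) {
  bp <- default_formula_blueprint(indicators = indicators)
  ncol(mold(y ~ f, df, blueprint = bp)$predictors)
}

n_traditional <- n_predictor_cols("traditional")  # K - 1 dummy columns
n_one_hot     <- n_predictor_cols("one_hot")      # K dummy columns
n_none        <- n_predictor_cols("none")         # factor passed through unexpanded
```

With K = 3 levels, the counts should differ by exactly one column between "traditional" and "one_hot", while "none" leaves the single factor column in place.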
CRAN release: 2020-05-20
Updated to stay current with the latest vctrs 0.3.0 conventions.
scream() is now stricter when checking ordered factor levels in new data against the ptype used at training time. Ordered factors must now have exactly the same set of levels at training and prediction time. See ?scream for a new graphic outlining how factor levels are handled (#132).
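The stricter check can be sketched as follows. The column name, the level labels, and the try_scream() wrapper are invented for the example, and the zero-row slice of the training data stands in for the ptype that hardhat would normally store. New data whose ordered factor has only a subset of the training levels is expected to be rejected, while identical levels pass.

```r
library(hardhat)

# Hypothetical training data with an ordered factor
train <- data.frame(
  size = factor(c("s", "m", "l"), levels = c("s", "m", "l"), ordered = TRUE)
)
# A zero-row slice serves as the training-time prototype
ptype <- train[0, , drop = FALSE]

# Same levels as training
same_levels <- data.frame(
  size = factor(c("m", "s"), levels = c("s", "m", "l"), ordered = TRUE)
)
# Only a subset of the training levels
subset_levels <- data.frame(
  size = factor(c("s", "m"), levels = c("s", "m"), ordered = TRUE)
)

# Illustrative wrapper: report whether scream() accepts the data
try_scream <- function(data) {
  tryCatch(
    {
      scream(data, ptype)
      "accepted"
    },
    error = function(e) "rejected"
  )
}

result_same   <- try_scream(same_levels)
result_subset <- try_scream(subset_levels)
```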
CRAN release: 2020-02-28
CRAN release: 2020-01-08