mlwiz.evaluation

evaluation.config

class mlwiz.evaluation.config.Config(config_dict: dict)

Bases: object

Simple class to manage the configuration dictionary as a Python object with fields.

Parameters:

config_dict (dict) – the configuration dictionary

get(key: str, default: object | None = None) → object

Returns the value stored under key in the dictionary if present, otherwise the specified default value

Parameters:
  • key (str) – the key to look up in the dictionary

  • default (object) – the default object

Returns:

a value from the dictionary

items() → list

Invokes the items() method of the configuration dictionary

Returns:

a list of (key, value) pairs

keys() → set

Invokes the keys() method of the configuration dictionary

Returns:

the set of keys in the dictionary
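As a rough illustration of what "a configuration dictionary as a Python object with fields" can look like, here is a minimal sketch. ConfigSketch is a hypothetical stand-in written for this example, not the library's class, and the attribute-lookup mechanism is an assumption.

```python
class ConfigSketch:
    """Hypothetical stand-in for a dictionary-backed configuration object."""

    def __init__(self, config_dict: dict):
        # Keep a reference to the wrapped dictionary.
        self._config_dict = config_dict

    def __getattr__(self, name: str):
        # Called only when normal attribute lookup fails, so we fall back
        # to the wrapped dictionary (assumed behaviour, for illustration).
        try:
            return self.__dict__["_config_dict"][name]
        except KeyError:
            raise AttributeError(name) from None

    def get(self, key: str, default=None):
        # Value stored under key, or the default if the key is absent.
        return self._config_dict.get(key, default)

    def items(self):
        # Delegates to the items() method of the wrapped dictionary.
        return self._config_dict.items()

    def keys(self):
        # Delegates to the keys() method of the wrapped dictionary.
        return self._config_dict.keys()


cfg = ConfigSketch({"lr": 0.01, "device": "cpu"})
learning_rate = cfg.lr                   # field-style access
batch_size = cfg.get("batch_size", 32)   # key is absent, so the default is used
```

With this design, code that receives the object can read hyper-parameters as fields while still using get() for optional keys.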

evaluation.evaluator

class mlwiz.evaluation.evaluator.RiskAssesser(outer_folds: int, inner_folds: int, experiment_class: Callable[[...], Experiment], exp_path: str, splits_filepath: str, model_configs: Grid | RandomSearch, risk_assessment_training_runs: int, model_selection_training_runs: int, higher_is_better: bool, gpus_per_task: float, base_seed: int = 42, training_timeout_seconds: int = -1)

Bases: object

Class implementing a K-Fold technique to do Risk Assessment (an estimate of the true generalization performance) and K-Fold Model Selection (selecting the best hyper-parameters for each external fold).

Parameters:
  • outer_folds (int) – The number K of outer TEST folds. You should have generated the splits accordingly

  • inner_folds (int) – The number K of inner VALIDATION folds. You should have generated the splits accordingly

  • experiment_class (Callable[…, Experiment]) – the experiment class to be instantiated

  • exp_path (str) – The folder in which to store all results

  • splits_filepath (str) – The splits filepath with additional meta information

  • model_configs (Union[Grid, RandomSearch]) – an object storing all possible model configurations, e.g., config.base.Grid

  • risk_assessment_training_runs (int) – number of final training runs to mitigate bad initializations

  • model_selection_training_runs (int) – number of training runs to mitigate bad initializations at model selection time

  • higher_is_better (bool) – whether the best model for each external fold should be selected by higher or lower score values

  • gpus_per_task (float) – Number of gpus to assign to each experiment. Can be < 1.

  • base_seed (int) – Seed used to generate experiments seeds. Used to replicate results. Default is 42

  • training_timeout_seconds (int) – optional timeout limit per experiment in seconds
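To get a feel for how these parameters interact, the following back-of-the-envelope sketch counts the training runs a nested K-fold procedure would schedule. It assumes every configuration is trained on every inner fold and that nothing is skipped. The function count_training_runs is hypothetical, and the library may schedule fewer runs in practice.

```python
def count_training_runs(
    outer_folds: int,
    inner_folds: int,
    num_configs: int,
    model_selection_training_runs: int,
    risk_assessment_training_runs: int,
) -> int:
    """Upper bound on the number of training runs (illustrative only)."""
    # Model selection: each configuration is trained on each inner fold,
    # repeated model_selection_training_runs times.
    selection_runs = inner_folds * num_configs * model_selection_training_runs
    # Risk assessment: the selected configuration is retrained a few times.
    final_runs = risk_assessment_training_runs
    # Both phases are repeated once per outer fold.
    return outer_folds * (selection_runs + final_runs)


total = count_training_runs(
    outer_folds=10,
    inner_folds=5,
    num_configs=20,
    model_selection_training_runs=1,
    risk_assessment_training_runs=3,
)
# total == 10 * (5 * 20 * 1 + 3) == 1030
```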

compute_best_hyperparameters(folder: str, outer_k: int, no_configurations: int, skip_config_ids: List[int])

Chooses the best hyper-parameters configuration using the proper validation mean score.

Parameters:
  • folder (str) – the model selection folder associated with outer fold k

  • outer_k (int) – the current outer fold to consider. Used for telegram updates

  • no_configurations (int) – number of possible configurations

  • skip_config_ids – list of configuration ids to skip
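The selection rule can be sketched as follows. This is a simplified illustration, not the library's implementation: select_best_config and the in-memory validation_scores mapping are hypothetical, whereas the real method reads results from the model selection folder.

```python
from statistics import mean


def select_best_config(validation_scores, higher_is_better, skip_config_ids=()):
    """Pick the configuration with the best mean validation score.

    validation_scores maps a 1-based configuration id to the list of
    validation scores obtained on the inner folds (hypothetical layout).
    """
    best_id, best_score = None, None
    for config_id, fold_scores in validation_scores.items():
        if config_id in skip_config_ids:
            continue  # configuration explicitly excluded from the selection
        avg = mean(fold_scores)
        is_better = (
            best_score is None
            or (avg > best_score if higher_is_better else avg < best_score)
        )
        if is_better:
            best_id, best_score = config_id, avg
    return best_id, best_score


scores = {1: [0.80, 0.82], 2: [0.90, 0.88], 3: [0.95, 0.97]}
best_id, best_score = select_best_config(scores, higher_is_better=True)
# best_id == 3; with skip_config_ids=[3] the winner becomes configuration 2
```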

compute_final_runs_score_per_fold(outer_k: int)

Computes the average scores for the final runs of a specific outer fold

Parameters:

outer_k (int) – id of the outer fold from 0 to K-1

compute_risk_assessment_result()

Aggregates Outer Folds results and computes Training and Test mean/std

model_selection(kfold_folder: str, outer_k: int, debug: bool, execute_config_id: int | None, skip_config_ids: List[int])

Performs model selection.

Parameters:
  • kfold_folder – The root folder for model selection

  • outer_k – the current outer fold to consider

  • debug – if True, sequential execution is performed and logs are printed to screen

  • execute_config_id – if debug mode is enabled, it will prioritize the execution of this configuration. It assumes indices start from 1. Use this to debug specific configurations.

  • skip_config_ids – if provided, the listed configurations will not be considered for model selection. Use it, for instance, when a run is taking too long to execute and you decide it is not worth waiting for.

process_config_results_across_inner_folds(config_folder: str, config: Config)

Averages the results for each configuration across inner folds and stores it into a file.

Parameters:
  • config_folder (str)

  • config (Config) – the configuration object

process_model_selection_runs(inner_fold_exp_folder: str, inner_k: int)

Computes the average performances for the training runs about a specific configuration and a specific inner_fold split

Parameters:
  • inner_fold_exp_folder (str) – an inner fold experiment folder of a specific configuration

  • inner_k (int) – the inner fold id
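The two averaging steps (runs within an inner fold, then inner folds within a configuration) can be sketched as below. The data layout and the use of the population standard deviation are assumptions made for illustration; the library stores these values in files and may use a different estimator.

```python
from statistics import mean, pstdev


def summarize(values):
    """Mean and (population) standard deviation of a list of scores."""
    return {"mean": mean(values), "std": pstdev(values)}


# Hypothetical validation scores: inner fold id -> one score per training run.
runs_per_inner_fold = {0: [0.5, 0.7], 1: [0.9, 0.7]}

# Step 1: average the training runs of each inner fold.
fold_summaries = {k: summarize(runs) for k, runs in runs_per_inner_fold.items()}

# Step 2: average the per-fold means to score the configuration.
config_summary = summarize([s["mean"] for s in fold_summaries.values()])
```

Averaging runs first keeps a single unlucky initialization from dominating the score of an inner fold.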

risk_assessment(debug: bool, execute_config_id: int | None = None, skip_config_ids: List[int] | None = None)

Performs risk assessment to evaluate the performances of a model.

Parameters:
  • debug – if True, sequential execution is performed and logs are printed to screen

  • execute_config_id – if debug mode is enabled, it will prioritize the execution of this configuration for each model selection procedure. It assumes indices start from 1. Use this to debug specific configurations.

  • skip_config_ids – if provided, the listed configurations will not be considered for model selection. Use it, for instance, when a run is taking too long to execute and you decide it is not worth waiting for.

run_final_model(outer_k: int, debug: bool)

Performs the final runs once the best model for outer fold outer_k has been chosen.

Parameters:
  • outer_k (int) – the current outer fold to consider

  • debug (bool) – if True, sequential execution is performed and logs are printed to screen

wait_configs(skip_config_ids: List[int])

Waits for configurations to terminate and updates the state of the progress manager

mlwiz.evaluation.evaluator.extract_and_sum_elapsed_seconds(file_path)

mlwiz.evaluation.evaluator.run_test(experiment_class: Callable[[...], Experiment], dataset_getter: Callable[[...], DataProvider], best_config: dict, outer_k: int, run_id: int, final_run_exp_path: str, final_run_torch_path: str, exp_seed: int, training_timeout_seconds: int, logger: Logger) → Tuple[int, int, float]

Ray job that performs a risk assessment run and returns bookkeeping information for the progress manager.

Parameters:
  • experiment_class (Callable[…, Experiment]) – the class of the experiment to instantiate

  • dataset_getter (Callable[…, DataProvider]) – the class of the data provider to instantiate

  • best_config (dict) – the best configuration to use for this specific outer fold

  • outer_k (int) – the current outer fold to consider

  • run_id (int) – the id of the final run (for bookkeeping reasons)

  • final_run_exp_path (str) – path of the experiment root folder

  • final_run_torch_path (str) – path where to store the results of the experiment

  • exp_seed (int) – seed of the experiment

  • training_timeout_seconds (int) – timeout for the experiment in seconds

  • logger (Logger) – a logger to log information in the appropriate file

Returns:

a tuple with outer fold id, final run id, and time elapsed

mlwiz.evaluation.evaluator.run_valid(experiment_class: Callable[[...], Experiment], dataset_getter: Callable[[...], DataProvider], config: dict, config_id: int, run_id: int, fold_exp_folder: str, fold_results_torch_path: str, exp_seed: int, training_timeout_seconds: int, logger: Logger) → Tuple[int, int, int, int, float]

Ray job that performs a model selection run and returns bookkeeping information for the progress manager.

Parameters:
  • experiment_class (Callable[…, Experiment]) – the class of the experiment to instantiate

  • dataset_getter (Callable[…, DataProvider]) – the class of the data provider to instantiate

  • config (dict) – the configuration of this specific experiment

  • config_id (int) – the id of the configuration (for bookkeeping reasons)

  • run_id (int) – the id of the training run (for bookkeeping reasons)

  • fold_exp_folder (str) – path of the experiment root folder

  • fold_results_torch_path (str) – path where to store the results of the experiment

  • exp_seed (int) – seed of the experiment

  • training_timeout_seconds (int) – timeout for the experiment in seconds

  • logger (Logger) – a logger to log information in the appropriate file

Returns:

a tuple with outer fold id, inner fold id, config id, run id, and time elapsed

mlwiz.evaluation.evaluator.send_telegram_update(bot_token: str, bot_chat_ID: str, bot_message: str)

Sends a message using Telegram APIs. Markdown can be used.

Parameters:
  • bot_token (str) – token of the user’s bot

  • bot_chat_ID (str) – identifier of the chat where to write the message

  • bot_message (str) – the message to be sent
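For orientation, the Telegram Bot API exposes a sendMessage method that accepts chat_id, text, and parse_mode parameters. The sketch below only builds such a request URL and does not send anything. Whether the library uses this exact request shape is an assumption, build_telegram_url is a hypothetical helper, and the token and chat id are placeholders.

```python
from urllib.parse import urlencode


def build_telegram_url(bot_token: str, bot_chat_ID: str, bot_message: str) -> str:
    """Build a Bot API sendMessage URL (illustrative; nothing is sent)."""
    query = urlencode(
        {
            "chat_id": bot_chat_ID,
            "parse_mode": "Markdown",  # lets the message use Markdown markup
            "text": bot_message,
        }
    )
    return f"https://api.telegram.org/bot{bot_token}/sendMessage?{query}"


# Placeholder credentials, for illustration only.
url = build_telegram_url("123:ABC", "42", "Outer fold 1 completed")
```

Passing the resulting URL to an HTTP client would deliver the message. Building it with urlencode keeps spaces and Markdown characters in the text properly escaped.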

evaluation.grid

class mlwiz.evaluation.grid.Grid(configs_dict: dict)

Bases: object

Class that implements grid-search. It computes all possible configurations starting from a suitable config file.

Parameters:

configs_dict (dict) – the configuration dictionary specifying the different configurations to try

_gen_configs() → List[dict]

Takes a dictionary of key:list pairs and computes all possible combinations.

Returns:

A list of all possible configurations in the form of dictionaries
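The idea of expanding key:list pairs into all combinations can be sketched with itertools.product. This simplified version handles only a flat dictionary; the library's helpers also recurse into nested options, which is not reproduced here. The function expand_grid is hypothetical.

```python
from itertools import product


def expand_grid(configs_dict: dict) -> list:
    """Cartesian product of the options of each hyper-parameter (flat dicts only)."""
    keys = list(configs_dict)
    # Wrap scalar values so they behave like single-option lists.
    options = [v if isinstance(v, list) else [v] for v in configs_dict.values()]
    # One dictionary per combination of options.
    return [dict(zip(keys, combo)) for combo in product(*options)]


grid = expand_grid({"lr": [0.1, 0.01], "hidden_units": [32, 64], "device": "cpu"})
# len(grid) == 4; grid[0] == {"lr": 0.1, "hidden_units": 32, "device": "cpu"}
```

The number of configurations is the product of the number of options per key, so it grows quickly as keys are added.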

_gen_helper(cfgs_dict: dict) → dict

Helper generator that yields one possible configuration at a time.

_list_helper(values: object) → object

Recursively parses lists of possible options for a given hyper-parameter.

property exp_name: str

Computes the name of the root folder

Returns:

the name of the root folder, in the form EXP-NAME_DATASET-NAME

property num_configs: int

Computes the number of configurations to try during model selection

Returns:

the number of configurations

evaluation.util

class mlwiz.evaluation.util.ProgressManager(outer_folds, inner_folds, no_configs, final_runs, show=True)

Bases: object

Class that is responsible for drawing progress bars.

Parameters:
  • outer_folds (int) – number of external folds for model assessment

  • inner_folds (int) – number of internal folds for model selection

  • no_configs (int) – number of possible configurations in model selection

  • final_runs (int) – number of final runs per outer fold once the best model has been selected

  • show (bool) – whether to show the progress bar or not. Default is True

_init_assessment_pbar(i: int)

Initializes the progress bar for risk assessment

Parameters:

i (int) – the id of the outer fold (from 0 to outer folds - 1)

_init_selection_pbar(i: int, j: int)

Initializes the progress bar for model selection

Parameters:
  • i (int) – the id of the outer fold (from 0 to outer folds - 1)

  • j (int) – the id of the inner fold (from 0 to inner folds - 1)

refresh()

Refreshes the progress bar

show_footer()

Prints the footer of the progress bar

show_header()

Prints the header of the progress bar

update_state(msg: dict)

Updates the state of the progress bar (different from showing it on screen, see refresh()) once a message is received

Parameters:

msg (dict) – message with updates to be parsed

mlwiz.evaluation.util._df_to_latex_table(df, no_decimals=2, model_as_row=True)

mlwiz.evaluation.util.choice(*args)

Implements a random choice among a list of values

mlwiz.evaluation.util.clear_screen()

Clears the CLI interface.

mlwiz.evaluation.util.create_dataframe(config_list: List[dict], key_mappings: List[Tuple[str, Callable]])

Creates a pandas DataFrame from a list of configuration dictionaries and key mappings.

Parameters:
  • config_list (List[dict]) – A list of dictionaries, where each dictionary represents a configuration. Each configuration must contain an exp_folder key and may include nested keys corresponding to hyperparameter names.

  • key_mappings (List[Tuple[str, Callable]]) – A list of tuples where the first element (str) is the hyperparameter name to extract from the configurations, and the second element (Callable) is a transformation function to apply to the extracted value.

Returns:

A DataFrame containing rows generated from config_list, with columns for exp_folder and the specified key_mappings. If a mapping value is missing, the corresponding DataFrame cell will contain None.

Return type:

pandas.DataFrame
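The role of key_mappings can be illustrated with a dependency-free sketch that builds plain dictionary rows instead of a DataFrame. The depth-first lookup of nested keys is an assumption about how a hyperparameter name is resolved, and find_key and build_rows are hypothetical helpers.

```python
def find_key(d: dict, key: str):
    """Depth-first search for key in a nested dictionary; None if absent."""
    if key in d:
        return d[key]
    for value in d.values():
        if isinstance(value, dict):
            found = find_key(value, key)
            if found is not None:
                return found
    return None


def build_rows(config_list, key_mappings):
    """One row per configuration: exp_folder plus each mapped hyper-parameter."""
    rows = []
    for config in config_list:
        row = {"exp_folder": config["exp_folder"]}
        for name, transform in key_mappings:
            raw = find_key(config, name)
            # Missing values stay None; present values are transformed.
            row[name] = transform(raw) if raw is not None else None
        rows.append(row)
    return rows


configs = [
    {"exp_folder": "config_1", "model": {"lr": "0.1"}},
    {"exp_folder": "config_2", "model": {}},
]
rows = build_rows(configs, [("lr", float)])
```

A list of rows like this can be passed to pandas.DataFrame(rows) to obtain a table with one column per mapped key.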

mlwiz.evaluation.util.create_latex_table_from_assessment_results(exp_metadata, metric_key='main_score', no_decimals='2', model_as_row=True, use_single_outer_fold=False) → str

Creates a LaTeX table from a list of experiment folders, each containing assessment results.

Parameters:
  • exp_metadata (list[tuple(str,str,str)]) – A list of (path to the experiment folder, model name, dataset name) tuples.

  • metric_key (str) – The key for the metric to extract. Default is 'main_score'.

  • no_decimals (int) – The number of rounded decimal places to display in the LaTeX table.

  • model_as_row (bool) – If True, models are rows and datasets are columns. If False, the opposite.

  • use_single_outer_fold – If True, only the first outer fold is used. This is useful because when the number of outer folds is 1, the std in the assessment file is 0, therefore we want to recover the std across the final runs of the unique outer fold.

mlwiz.evaluation.util.filter_experiments(config_list: List[dict], logic: bool = 'AND', parameters: dict = {})

Filters the list of configurations returned by the method retrieve_experiments according to a dictionary. The dictionary contains the keys and values of the configuration files you are looking for.

If you specify more than one key/value pair to look for, then the logic parameter specifies whether you want to filter using the AND/OR rule.

For a key, you can specify more than one possible value you are interested in by passing a list as the value, for instance {'device': 'cpu', 'lr': [0.1, 0.01]}

Parameters:
  • config_list – The list of configuration files

  • logic – if AND, a configuration is selected iff all conditions are satisfied. If OR, a config is selected when at least one of the criteria is met.

  • parameters – dictionary with parameters used to filter the configurations

Returns:

a list of filtered configurations like the one in input
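The AND/OR rule and the list-of-values convention can be sketched as follows. This illustration matches only top-level keys and treats any logic value other than "AND" as "OR". Both simplifications are assumptions, and filter_configs is a hypothetical function rather than the library's code.

```python
def filter_configs(config_list, logic="AND", parameters=None):
    """Keep configurations matching all (AND) or at least one (OR) criterion."""
    parameters = parameters or {}
    combine = all if logic == "AND" else any
    selected = []
    for config in config_list:
        # A criterion is met when the stored value equals the wanted value,
        # or is one of the wanted values when a list is given.
        checks = [
            config.get(key) in (wanted if isinstance(wanted, list) else [wanted])
            for key, wanted in parameters.items()
        ]
        if combine(checks):
            selected.append(config)
    return selected


configs = [
    {"device": "cpu", "lr": 0.1},
    {"device": "cuda", "lr": 0.01},
    {"device": "cuda", "lr": 0.5},
]
criteria = {"device": "cpu", "lr": [0.1, 0.01]}
both = filter_configs(configs, "AND", criteria)    # only the first configuration
either = filter_configs(configs, "OR", criteria)   # the first two configurations
```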

mlwiz.evaluation.util.get_scores_from_assessment_results(exp_folder, metric_key='main_score') → dict

Extracts scores from the configuration dictionary.

Parameters:
  • exp_folder (str) – The path to the experiment folder.

  • metric_key (str) – The key for the metric to extract. Default is 'main_score'.

mlwiz.evaluation.util.get_scores_from_outer_results(exp_folder, outer_fold_id, metric_key='main_score') → dict

Extracts scores from the configuration dictionary.

Parameters:
  • exp_folder (str) – The path to the experiment folder.

  • outer_fold_id (int) – The ID of the outer fold, from 1 on.

  • metric_key (str) – The key for the metric to extract. Default is 'main_score'.

mlwiz.evaluation.util.instantiate_data_provider_from_config(config: dict, splits_filepath: str, n_outer_folds: int, n_inner_folds: int) → DataProvider

Instantiate a data provider from a configuration file.

Parameters:
  • config (dict) – the configuration file

  • splits_filepath (str) – the path to data splits file

  • n_outer_folds (int) – the number of outer folds

  • n_inner_folds (int) – the number of inner folds

Returns:

an instance of DataProvider, i.e., the data provider

mlwiz.evaluation.util.instantiate_dataset_from_config(config: dict) → DatasetInterface

Instantiate a dataset from a configuration file.

Parameters:

config (dict) – the configuration file

Returns:

an instance of DatasetInterface, i.e., the dataset

mlwiz.evaluation.util.instantiate_model_from_config(config: dict, dataset: DatasetInterface) → ModelInterface

Instantiate a model from a configuration file.

Parameters:
  • config (dict) – the configuration file

  • dataset (DatasetInterface) – the dataset used in the experiment

Returns:

an instance of ModelInterface, i.e., the model

mlwiz.evaluation.util.load_checkpoint(checkpoint_path: str, model: ModelInterface, device: torch.device)

Load a checkpoint from a checkpoint file into a model.

Parameters:
  • checkpoint_path – the checkpoint file path

  • model (ModelInterface) – the model

  • device (torch.device) – the device, e.g., "cpu" or "cuda"

mlwiz.evaluation.util.loguniform(*args)

Performs a log-uniform random selection.

Parameters:

*args – a tuple of (log min, log max, [base]) to use. Base 10 is used if the third argument is not available.

Returns:

a randomly chosen value
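One common way to realise a log-uniform draw from (log min, log max, [base]) is to sample the exponent uniformly and raise the base to it. The sketch below follows that reading. It is an assumption about the sampling scheme rather than the library's code, and loguniform_sketch is a hypothetical name.

```python
import random


def loguniform_sketch(log_min: float, log_max: float, base: float = 10):
    """Sample base ** u with u drawn uniformly from [log_min, log_max]."""
    exponent = random.uniform(log_min, log_max)
    return base ** exponent


random.seed(0)  # fixed seed so the example is repeatable
learning_rate = loguniform_sketch(-4, -1)  # a value between 1e-4 and 1e-1
```

Sampling in log space spreads draws evenly across orders of magnitude, which suits hyper-parameters such as learning rates.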

mlwiz.evaluation.util.normal(*args)

Implements a univariate normal sampling given its parameters

mlwiz.evaluation.util.randint(*args)

Implements a random integer sampling in an interval

mlwiz.evaluation.util.retrieve_best_configuration(model_selection_folder) → dict

Once the experiments are done, retrieves the winning configuration from a specific model selection folder, and returns it as a dictionary

Parameters:

model_selection_folder – path to the folder of a model selection, that is, your_results_path/…./MODEL_SELECTION/

Returns:

a dictionary with info about the best configuration

mlwiz.evaluation.util.retrieve_experiments(model_selection_folder, skip_results_not_found: bool = False) → List[dict]

Once the experiments are done, retrieves the config_results.json files of all configurations in a specific model selection folder, and returns them as a list of dictionaries

Parameters:
  • model_selection_folder – path to the folder of a model selection, that is, your_results_path/…./MODEL_SELECTION/

  • skip_results_not_found – whether to skip an experiment if a config_results.json file has not been produced yet. Useful when analyzing experiments while others still run.

Returns:

a list of dictionaries, one per configuration, each with an extra key “exp_folder” which identifies the config folder.
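The retrieval logic can be sketched with the standard library. The sketch assumes each configuration lives in its own immediate subfolder of the model selection folder. That layout and the function collect_config_results are hypothetical; only the config_results.json file name and the "exp_folder" key come from the description above.

```python
import json
import os


def collect_config_results(model_selection_folder, skip_results_not_found=False):
    """Load config_results.json from every configuration subfolder."""
    results = []
    for entry in sorted(os.listdir(model_selection_folder)):
        config_folder = os.path.join(model_selection_folder, entry)
        if not os.path.isdir(config_folder):
            continue  # ignore stray files next to the configuration folders
        results_path = os.path.join(config_folder, "config_results.json")
        if not os.path.exists(results_path):
            if skip_results_not_found:
                continue  # the experiment may still be running
            raise FileNotFoundError(results_path)
        with open(results_path) as f:
            config_results = json.load(f)
        # Remember where the results came from.
        config_results["exp_folder"] = config_folder
        results.append(config_results)
    return results
```

Setting skip_results_not_found=True makes it possible to inspect partial results while other configurations are still training.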

mlwiz.evaluation.util.uniform(*args)

Implements a uniform sampling given an interval