mlwiz.evaluation

evaluation.config

class mlwiz.evaluation.config.Config(config_dict: dict)

Bases: object

Simple class to manage the configuration dictionary as a Python object with fields.

Parameters:: config_dict (dict) – the configuration dictionary

get(key: str, default: object | None = None) → object

Returns the value associated with the key if it is present in the dictionary, otherwise the specified default value

Parameters:

key (str) – the key to look up in the dictionary
default (object) – the default object

Returns:

a value from the dictionary

items() → list

Invokes the items() method of the configuration dictionary

Returns:: a list of (key, value) pairs

keys() → set

Invokes the keys() method of the configuration dictionary

Returns:: the set of keys in the dictionary
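The wrapper behaviour described above can be pictured with a minimal sketch. `ConfigSketch` is a hypothetical stand-in written for illustration, not the mlwiz implementation; it assumes that "fields" means attribute access backed by the wrapped dictionary.

```python
class ConfigSketch:
    """Hypothetical stand-in: exposes a dictionary's entries as object fields."""

    def __init__(self, config_dict: dict):
        self._config_dict = config_dict

    def __getattr__(self, name):
        # Invoked only when normal attribute lookup fails,
        # so entries of the wrapped dictionary behave like fields.
        try:
            return self.__dict__["_config_dict"][name]
        except KeyError:
            raise AttributeError(name)

    def get(self, key, default=None):
        # Same contract as dict.get: fall back to the default if the key is missing.
        return self._config_dict.get(key, default)

    def keys(self):
        return self._config_dict.keys()

    def items(self):
        return self._config_dict.items()


cfg = ConfigSketch({"lr": 0.01, "device": "cpu"})
print(cfg.lr, cfg.get("batch_size", 32), sorted(cfg.keys()))
```

Delegating `get`, `keys` and `items` to the underlying dictionary keeps the wrapper thin, so code that expects a mapping-like object still works.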

evaluation.evaluator

class mlwiz.evaluation.evaluator.RiskAssesser(outer_folds: int, inner_folds: int, experiment_class: Callable[[...], Experiment], exp_path: str, splits_filepath: str, model_configs: Grid | RandomSearch, risk_assessment_training_runs: int, model_selection_training_runs: int, higher_is_better: bool, gpus_per_task: float, base_seed: int = 42, training_timeout_seconds: int = -1)

Bases: object

Class implementing a K-Fold technique to do Risk Assessment (estimate of the true generalization performances) and K-Fold Model Selection (select the best hyper-parameters for each external fold).

Parameters:

outer_folds (int) – The number K of outer TEST folds. You should have generated the splits accordingly
inner_folds (int) – The number K of inner VALIDATION folds. You should have generated the splits accordingly
experiment_class – (Callable[…, Experiment]): the experiment class to be instantiated
exp_path (str) – The folder in which to store all results
splits_filepath (str) – The splits filepath with additional meta information
model_configs – (Union[Grid, RandomSearch]): an object storing all possible model configurations, e.g., config.base.Grid
risk_assessment_training_runs (int) – no of final training runs to mitigate bad initializations
model_selection_training_runs (int) – no of training runs to mitigate bad initializations at model selection time
higher_is_better (bool) – whether the best model for each external fold should be selected by higher or lower score values
gpus_per_task (float) – Number of gpus to assign to each experiment. Can be < 1.
base_seed (int) – Seed used to generate experiments seeds. Used to replicate results. Default is 42
training_timeout_seconds (int) – optional timeout limit per experiment in seconds
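To make the interplay of these parameters concrete, here is a rough sketch of how a nested K-fold procedure enumerates its training jobs. The function `nested_kfold_schedule` and the tuple layout are assumptions made for illustration; the actual scheduling inside RiskAssesser may differ.

```python
def nested_kfold_schedule(outer_folds, inner_folds, num_configs,
                          model_selection_training_runs,
                          risk_assessment_training_runs):
    """Enumerate hypothetical job descriptors in nested K-fold order."""
    jobs = []
    for outer_k in range(outer_folds):
        # Model selection: every configuration is trained on every inner fold,
        # possibly several times to mitigate bad initializations.
        for config_id in range(num_configs):
            for inner_k in range(inner_folds):
                for run_id in range(model_selection_training_runs):
                    jobs.append(("valid", outer_k, inner_k, config_id, run_id))
        # Risk assessment: the selected configuration is retrained and
        # evaluated on the test split of this outer fold.
        for run_id in range(risk_assessment_training_runs):
            jobs.append(("test", outer_k, run_id))
    return jobs


jobs = nested_kfold_schedule(outer_folds=2, inner_folds=3, num_configs=4,
                             model_selection_training_runs=1,
                             risk_assessment_training_runs=5)
print(len(jobs))
```

With these values the sketch yields 2 × 4 × 3 × 1 model selection jobs plus 2 × 5 final runs, which shows how quickly the total cost grows with the number of configurations and folds.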

compute_best_hyperparameters(folder: str, outer_k: int, no_configurations: int, skip_config_ids: List[int])

Chooses the best hyper-parameters configuration using the proper validation mean score.

Parameters:

folder (str) – the model selection folder associated with outer fold k
outer_k (int) – the current outer fold to consider. Used for telegram updates
no_configurations (int) – number of possible configurations
skip_config_ids – list of configuration ids to skip
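The selection rule can be sketched as follows. This is an illustrative approximation, not the library code: `pick_best_config` and the shape of `validation_scores` (a mapping from configuration id to per-inner-fold validation scores) are assumptions.

```python
from statistics import mean


def pick_best_config(validation_scores, higher_is_better, skip_config_ids=()):
    """Return (config_id, mean validation score) of the best configuration."""
    best_id, best_score = None, None
    for config_id, scores in validation_scores.items():
        if config_id in skip_config_ids:
            continue  # configurations the user decided not to wait for
        avg = mean(scores)
        if best_score is None:
            is_better = True
        elif higher_is_better:
            is_better = avg > best_score
        else:
            is_better = avg < best_score
        if is_better:
            best_id, best_score = config_id, avg
    return best_id, best_score


scores = {1: [0.70, 0.74], 2: [0.80, 0.82], 3: [0.90, 0.91]}
best = pick_best_config(scores, higher_is_better=True, skip_config_ids=[3])
print(best)
```

Configuration 3 has the highest mean score here, but because it is skipped the sketch falls back to configuration 2.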

compute_final_runs_score_per_fold(outer_k: int)

Computes the average scores for the final runs of a specific outer fold

Parameters:: outer_k (int) – id of the outer fold from 0 to K-1

compute_risk_assessment_result(): Aggregates Outer Folds results and compute Training and Test mean/std

model_selection(kfold_folder: str, outer_k: int, debug: bool, execute_config_id: int | None, skip_config_ids: List[int])

Performs model selection.

Parameters:

kfold_folder – The root folder for model selection
outer_k – the current outer fold to consider
debug – if True, sequential execution is performed and logs are printed to screen
execute_config_id – if debug mode is enabled, it will prioritize the execution of this configuration. It assumes indices start from 1. Use this to debug specific configurations.
skip_config_ids – if provided, the listed configurations will not be considered for model selection. Use it, for instance, when a run is taking too long to execute and you decide it is not worth waiting for.

process_config_results_across_inner_folds(config_folder: str, config: Config)

Averages the results for each configuration across inner folds and stores it into a file.

Parameters:

config_folder (str)
config (Config) – the configuration object
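A minimal sketch of this kind of aggregation is shown below. The function name, the JSON keys and the use of the population standard deviation are all assumptions made for illustration; the file layout that mlwiz actually writes is not specified here.

```python
import json
import os
import tempfile
from statistics import mean, pstdev


def summarize_inner_folds(fold_scores, out_path):
    """Average per-inner-fold validation scores and store the summary as JSON."""
    summary = {
        "avg_validation_score": mean(fold_scores),    # hypothetical key
        "std_validation_score": pstdev(fold_scores),  # population std, an assumption
    }
    with open(out_path, "w") as f:
        json.dump(summary, f)
    return summary


out_path = os.path.join(tempfile.mkdtemp(), "summary_sketch.json")
summary = summarize_inner_folds([0.5, 0.7, 0.9], out_path)
print(summary)
```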

process_model_selection_runs(inner_fold_exp_folder: str, inner_k: int)

Computes the average performances for the training runs about: a specific configuration and a specific inner_fold split

Parameters:

inner_fold_exp_folder (str) – an inner fold experiment folder of a specific configuration
inner_k (int) – the inner fold id

risk_assessment(debug: bool, execute_config_id: int | None = None, skip_config_ids: List[int] | None = None)

Performs risk assessment to evaluate the performances of a model.

Parameters:

debug – if True, sequential execution is performed and logs are printed to screen
execute_config_id – if debug mode is enabled, it will prioritize the execution of this configuration for each model selection procedure. It assumes indices start from 1. Use this to debug specific configurations.
skip_config_ids – if provided, the listed configurations will not be considered for model selection. Use it, for instance, when a run is taking too long to execute and you decide it is not worth waiting for.

run_final_model(outer_k: int, debug: bool)

Performs the final runs once the best model for outer fold outer_k has been chosen.

Parameters:

outer_k (int) – the current outer fold to consider
debug (bool) – if True, sequential execution is performed and logs are printed to screen

wait_configs(skip_config_ids: List[int]): Waits for configurations to terminate and updates the state of the progress manager

mlwiz.evaluation.evaluator.extract_and_sum_elapsed_seconds(file_path)

mlwiz.evaluation.evaluator.run_test(experiment_class: Callable[[...], Experiment], dataset_getter: Callable[[...], DataProvider], best_config: dict, outer_k: int, run_id: int, final_run_exp_path: str, final_run_torch_path: str, exp_seed: int, training_timeout_seconds: int, logger: Logger) → Tuple[int, int, float]

Ray job that performs a risk assessment run and returns bookkeeping information for the progress manager.

Parameters:

experiment_class – (Callable[…, Experiment]): the class of the experiment to instantiate
dataset_getter – (Callable[…, DataProvider]): the class of the data provider to instantiate
best_config (dict) – the best configuration to use for this specific outer fold
outer_k (int) – the current outer fold to consider
run_id (int) – the id of the final run (for bookkeeping reasons)
final_run_exp_path (str) – path of the experiment root folder
final_run_torch_path (str) – path where to store the results of the experiment
exp_seed (int) – seed of the experiment
training_timeout_seconds (int) – timeout for the experiment in seconds
logger (Logger) – a logger to log information in the appropriate file

Returns:

a tuple with outer fold id, final run id, and time elapsed

mlwiz.evaluation.evaluator.run_valid(experiment_class: Callable[[...], Experiment], dataset_getter: Callable[[...], DataProvider], config: dict, config_id: int, run_id: int, fold_exp_folder: str, fold_results_torch_path: str, exp_seed: int, training_timeout_seconds: int, logger: Logger) → Tuple[int, int, int, int, float]

Ray job that performs a model selection run and returns bookkeeping information for the progress manager.

Parameters:

experiment_class – (Callable[…, Experiment]): the class of the experiment to instantiate
dataset_getter – (Callable[…, DataProvider]): the class of the data provider to instantiate
config (dict) – the configuration of this specific experiment
config_id (int) – the id of the configuration (for bookkeeping reasons)
run_id (int) – the id of the training run (for bookkeeping reasons)
fold_exp_folder (str) – path of the experiment root folder
fold_results_torch_path (str) – path where to store the results of the experiment
exp_seed (int) – seed of the experiment
training_timeout_seconds (int) – timeout for the experiment in seconds
logger (Logger) – a logger to log information in the appropriate file

Returns:

a tuple with outer fold id, inner fold id, config id, run id, and time elapsed

mlwiz.evaluation.evaluator.send_telegram_update(bot_token: str, bot_chat_ID: str, bot_message: str)

Sends a message using Telegram APIs. Markdown can be used.

Parameters:

bot_token (str) – token of the user’s bot
bot_chat_ID (str) – identifier of the chat where to write the message
bot_message (str) – the message to be sent
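For orientation, the sketch below builds the kind of request URL that the Telegram Bot API `sendMessage` method accepts, without sending anything. `build_send_message_url` is a hypothetical helper, and how send_telegram_update actually issues the request is an assumption not confirmed by this page.

```python
from urllib.parse import urlencode


def build_send_message_url(bot_token: str, bot_chat_ID: str, bot_message: str) -> str:
    """Build (but do not send) a Telegram Bot API sendMessage request URL."""
    query = urlencode({
        "chat_id": bot_chat_ID,
        "parse_mode": "Markdown",  # lets the message use Markdown formatting
        "text": bot_message,
    })
    return f"https://api.telegram.org/bot{bot_token}/sendMessage?{query}"


url = build_send_message_url("<BOT_TOKEN>", "12345", "*Outer fold 1* completed")
print(url)
```

Issuing an HTTP GET on such a URL, for example with `urllib.request.urlopen`, would deliver the message; the sketch stops before that step so that it has no side effects.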

evaluation.grid

class mlwiz.evaluation.grid.Grid(configs_dict: dict)

Bases: object

Class that implements grid-search. It computes all possible configurations starting from a suitable config file.

Parameters:: configs_dict (dict) – the configuration dictionary specifying the different configurations to try

_gen_configs() → List[dict]

Takes a dictionary of key:list pairs and computes all possible combinations.

Returns:: A list of all possible configurations in the form of dictionaries
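The core idea, a Cartesian product over the lists of candidate values, can be sketched in a few lines. This simplified version assumes a flat dictionary; nested lists and dictionaries, which the helper methods of this class deal with, are deliberately left out, and `all_combinations` is a hypothetical name.

```python
from itertools import product


def all_combinations(grid: dict) -> list:
    """Expand a flat dict of key -> list of options into one dict per combination."""
    keys = list(grid)
    return [dict(zip(keys, values))
            for values in product(*(grid[k] for k in keys))]


configs = all_combinations({"lr": [0.1, 0.01], "hidden_units": [32, 64, 128]})
print(len(configs))
print(configs[0])
```

The number of configurations is the product of the list lengths, which corresponds to what a grid search reports as its number of configurations.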

_gen_helper(cfgs_dict: dict) → dict: Helper generator that yields one possible configuration at a time.

_list_helper(values: object) → object: Recursively parses lists of possible options for a given hyper-parameter.

property exp_name: str

Computes the name of the root folder

Returns:: the name of the root folder as made of EXP-NAME_DATASET-NAME

property num_configs: int

Computes the number of configurations to try during model selection

Returns:: the number of configurations

evaluation.random_search

class mlwiz.evaluation.random_search.RandomSearch(configs_dict: dict)

Bases: Grid

Class that implements random-search. It computes all possible configurations starting from a suitable config file.

Parameters:: configs_dict (dict) – the configuration dictionary specifying the different configurations to try

_dict_helper(configs: dict)

Recursively parses a dictionary

Returns:: A dictionary

_gen_helper(cfgs_dict: dict) → Iterator[Dict[str, Any]]

Takes a dictionary of key:list pairs and computes all possible combinations.

Returns:: A list of all possible configurations in the form of dictionaries

_sampler_helper(configs: dict)

Samples possible hyperparameter(s) and returns it (them, in this case as a dict)

Returns:: A dictionary
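As a rough illustration of sampling a configuration from a search space, consider the sketch below. The convention that a callable value acts as a sampler is an assumption made for this example only; it is not the configuration format mlwiz expects, and `sample_configuration` is a hypothetical name.

```python
import random


def sample_configuration(search_space: dict, rng: random.Random) -> dict:
    """Recursively sample one configuration from a (hypothetical) search space."""
    sampled = {}
    for name, spec in search_space.items():
        if callable(spec):
            sampled[name] = spec(rng)  # draw a value for this hyper-parameter
        elif isinstance(spec, dict):
            sampled[name] = sample_configuration(spec, rng)  # nested section
        else:
            sampled[name] = spec  # fixed value, copied as-is
    return sampled


space = {
    "lr": lambda rng: 10 ** rng.uniform(-4, -1),
    "optimizer": {
        "name": "adam",
        "weight_decay": lambda rng: rng.choice([0.0, 1e-4]),
    },
    "epochs": 100,
}
sampled = sample_configuration(space, random.Random(42))
print(sampled)
```

Passing an explicit `random.Random` instance keeps the sampling reproducible for a given seed.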

evaluation.util

class mlwiz.evaluation.util.ProgressManager(outer_folds, inner_folds, no_configs, final_runs, show=True)

Bases: object

Class that is responsible for drawing progress bars.

Parameters:

outer_folds (int) – number of external folds for model assessment
inner_folds (int) – number of internal folds for model selection
no_configs (int) – number of possible configurations in model selection
final_runs (int) – number of final runs per outer fold once the best model has been selected
show (bool) – whether to show the progress bar or not. Default is True

_init_assessment_pbar(i: int)

Initializes the progress bar for risk assessment

Parameters:: i (int) – the id of the outer fold (from 0 to outer folds - 1)

_init_selection_pbar(i: int, j: int)

Initializes the progress bar for model selection

Parameters:

i (int) – the id of the outer fold (from 0 to outer folds - 1)
j (int) – the id of the inner fold (from 0 to inner folds - 1)

refresh(): Refreshes the progress bar

show_footer(): Prints the footer of the progress bar

show_header(): Prints the header of the progress bar

update_state(msg: dict)

Updates the state of the progress bar (different from showing it on screen, see refresh()) once a message is received

Parameters:: msg (dict) – message with updates to be parsed

mlwiz.evaluation.util._df_to_latex_table(df, no_decimals=2, model_as_row=True)

mlwiz.evaluation.util.choice(*args): Implements a random choice between a list of values

mlwiz.evaluation.util.clear_screen(): Clears the CLI interface.

mlwiz.evaluation.util.create_dataframe(config_list: List[dict], key_mappings: List[Tuple[str, Callable]])

Creates a pandas DataFrame from a list of configuration dictionaries and key mappings.

Parameters:

config_list (List[dict]) – A list of dictionaries, where each dictionary represents a configuration. Each configuration must contain an exp_folder key and may include nested keys corresponding to hyperparameter names.
key_mappings (List[Tuple[str, Callable]]) – A list of tuples where the first element (str) is the hyperparameter name to extract from the configurations, and the second element (Callable) is a transformation function to apply to the extracted value.

Returns:

A DataFrame containing rows generated from config_list with columns for exp_folder and the specified key_mappings. If a mapping value is missing, the corresponding DataFrame cell will contain None.

Return type:

pandas.DataFrame

mlwiz.evaluation.util.create_latex_table_from_assessment_results(exp_metadata, metric_key='main_score', no_decimals='2', model_as_row=True, use_single_outer_fold=False) → str

Creates a LaTeX table from a list of experiment folders, each containing assessment results.

Parameters:

exp_metadata (list[tuple(str,str,str)]) – A list of (paths to the experiment folder, model name, dataset name).
metric_key (str) – The key for the metric to extract. Default is ‘main_score’.
no_decimals (int) – The number of rounded decimal places to display in the LaTeX table.
model_as_row (bool) – If True, models are rows and datasets are columns. If False, the opposite.
use_single_outer_fold – If True, only the first outer fold is used. This is useful because when the number of outer folds is 1, the std in the assessment file is 0, therefore we want to recover the std across the final runs of the unique outer fold.

mlwiz.evaluation.util.filter_experiments(config_list: List[dict], logic: bool = 'AND', parameters: dict = {})

Filters the list of configurations returned by the method retrieve_experiments according to a dictionary. The dictionary contains the keys and values of the configuration files you are looking for.

If you specify more than one key/value pair to look for, then the logic parameter specifies whether you want to filter using the AND/OR rule.

For a key, you can specify more than one possible value you are interested in by passing a list as the value, for instance {‘device’: ‘cpu’, ‘lr’: [0.1, 0.01]}

Parameters:

config_list – The list of configuration files
logic – if AND, a configuration is selected iff all conditions are satisfied. If OR, a config is selected when at least one of the criteria is met.
parameters – dictionary with parameters used to filter the configurations

Returns:

a list of filtered configurations like the one in input
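The AND/OR semantics, including list-valued criteria, can be sketched as follows. This simplified `filter_configs` is a hypothetical illustration that only inspects top-level keys; it is not the library implementation, which may also look inside nested configuration entries.

```python
def filter_configs(config_list, logic="AND", parameters=None):
    """Keep the configurations that satisfy all (AND) or any (OR) of the criteria."""
    if logic not in ("AND", "OR"):
        raise ValueError("logic must be 'AND' or 'OR'")
    parameters = parameters or {}

    def matches(config, key, wanted):
        # A list means "any of these values is acceptable" for that key.
        allowed = wanted if isinstance(wanted, list) else [wanted]
        return key in config and config[key] in allowed

    combine = all if logic == "AND" else any
    return [c for c in config_list
            if combine(matches(c, k, v) for k, v in parameters.items())]


configs = [
    {"device": "cpu", "lr": 0.1},
    {"device": "cuda", "lr": 0.01},
    {"device": "cuda", "lr": 0.5},
]
criteria = {"device": "cpu", "lr": [0.1, 0.01]}
print(filter_configs(configs, "AND", criteria))
print(filter_configs(configs, "OR", criteria))
```

In this sketch an empty criteria dictionary keeps every configuration under AND and none under OR, which follows from how `all` and `any` treat empty inputs.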

mlwiz.evaluation.util.get_scores_from_assessment_results(exp_folder, metric_key='main_score') → dict

Extracts scores from the configuration dictionary.

Parameters:

exp_folder (str) – The path to the experiment folder.
metric_key (str) – The key for the metric to extract. Default is ‘main_score’.

mlwiz.evaluation.util.get_scores_from_outer_results(exp_folder, outer_fold_id, metric_key='main_score') → dict

Extracts scores from the configuration dictionary.

Parameters:

exp_folder (str) – The path to the experiment folder.
outer_fold_id (int) – The ID of the outer fold, from 1 on.
metric_key (str) – The key for the metric to extract. Default is ‘main_score’.

mlwiz.evaluation.util.instantiate_data_provider_from_config(config: dict, splits_filepath: str, n_outer_folds: int, n_inner_folds: int) → DataProvider

Instantiate a data provider from a configuration file.

Parameters:

config (dict) – the configuration file
splits_filepath (str) – the path to data splits file
n_outer_folds (int) – the number of outer folds
n_inner_folds (int) – the number of inner folds

Returns:

an instance of DataProvider, i.e., the data provider

mlwiz.evaluation.util.instantiate_dataset_from_config(config: dict) → DatasetInterface

Instantiate a dataset from a configuration file.

Parameters:: config (dict) – the configuration file
Returns:: an instance of DatasetInterface, i.e., the dataset

mlwiz.evaluation.util.instantiate_model_from_config(config: dict, dataset: DatasetInterface) → ModelInterface

Instantiate a model from a configuration file.

Parameters:

config (dict) – the configuration file
dataset (DatasetInterface) – the dataset used in the experiment

Returns:

an instance of ModelInterface, i.e., the model

mlwiz.evaluation.util.load_checkpoint(checkpoint_path: str, model: ModelInterface, device: torch.device)

Load a checkpoint from a checkpoint file into a model.

Parameters:

checkpoint_path – the checkpoint file path
model (ModelInterface) – the model
device (torch.device) – the device, e.g., “cpu” or “cuda”

mlwiz.evaluation.util.loguniform(*args)

Performs a log-uniform random selection.

Parameters:: *args – a tuple of (log min, log max, [base]) to use. Base 10 is used if the third argument is not available.
Returns:: a randomly chosen value
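One plausible reading of this description is sketched below: the exponent is drawn uniformly between the two log bounds and the base is then raised to it. `loguniform_sketch` is a hypothetical re-implementation for illustration, and the optional `rng` argument is an addition that the documented function does not have.

```python
import random


def loguniform_sketch(log_min, log_max, base=10, rng=random):
    """Sample base**u with u drawn uniformly from [log_min, log_max]."""
    exponent = rng.uniform(log_min, log_max)
    return base ** exponent


rng = random.Random(0)
# With bounds (-4, -1) and base 10, values fall between 1e-4 and 1e-1.
values = [loguniform_sketch(-4, -1, rng=rng) for _ in range(1000)]
print(min(values), max(values))
```

Sampling in log space spreads the draws evenly across orders of magnitude, which is usually what is wanted for hyper-parameters such as learning rates.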

mlwiz.evaluation.util.normal(*args): Implements a univariate normal sampling given its parameters

mlwiz.evaluation.util.randint(*args): Implements a random integer sampling in an interval

mlwiz.evaluation.util.retrieve_best_configuration(model_selection_folder) → dict

Once the experiments are done, retrieves the winning configuration from a specific model selection folder, and returns it as a dictionary

Parameters:: model_selection_folder – path to the folder of a model selection, that is, your_results_path/…./MODEL_SELECTION/
Returns:: a dictionary with info about the best configuration

mlwiz.evaluation.util.retrieve_experiments(model_selection_folder, skip_results_not_found: bool = False) → List[dict]

Once the experiments are done, retrieves the config_results.json files of all configurations in a specific model selection folder, and returns them as a list of dictionaries

Parameters:

model_selection_folder – path to the folder of a model selection, that is, your_results_path/…./MODEL_SELECTION/
skip_results_not_found – whether to skip an experiment if a config_results.json file has not been produced yet. Useful when analyzing experiments while others still run.

Returns:

a list of dictionaries, one per configuration, each with an extra key “exp_folder” which identifies the config folder.

mlwiz.evaluation.util.uniform(*args): Implements a uniform sampling given an interval