mlwiz.experiment
experiment.experiment
Experiment wrapper for building and running training jobs.
Defines Experiment, which instantiates models/engines from configs and runs validation/test loops.
- class mlwiz.experiment.experiment.Experiment(model_configuration: dict, exp_path: str, exp_seed: int)
Bases: object
Class that handles a single standard experiment.
- Parameters:
model_configuration (dict) – the dictionary holding the experiment-specific configuration
exp_path (str) – path to the experiment folder
exp_seed (int) – the experiment’s seed to use
- _cleanup_ddp()
Tear down process group.
- _return_class_and_args(key: str) → Tuple[Callable[[...], object], dict]
Returns the class and arguments associated to a specific key in the configuration file.
- Parameters:
config – the configuration dictionary
key – a string representing a particular class in the configuration dictionary
- Returns:
a tuple (class, dict of arguments), or (None, None) if the key is not present in the config dictionary
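The lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the standalone function name, the `class_name` and `args` keys, and the convention of storing a dotted class path are all hypothetical, and the layout mlwiz actually uses may differ.

```python
import importlib
from typing import Callable, Optional, Tuple


def return_class_and_args(
    config: dict, key: str
) -> Tuple[Optional[Callable[..., object]], Optional[dict]]:
    """Resolve config[key] into (class, kwargs); (None, None) if key is absent."""
    if key not in config:
        return None, None
    entry = config[key]
    # Hypothetical layout: a bare dotted path, or a dict with "class_name"/"args".
    if isinstance(entry, str):
        dotted_path, args = entry, {}
    else:
        dotted_path, args = entry["class_name"], entry.get("args", {})
    # Split "package.module.ClassName" into module path and attribute name.
    module_name, _, class_name = dotted_path.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls, args


# Illustrative configuration using a standard-library class.
config = {"counter": {"class_name": "collections.Counter", "args": {"a": 2}}}
cls, args = return_class_and_args(config, "counter")
instance = cls(**args)
missing = return_class_and_args(config, "optimizer")
```

Returning `(None, None)` for a missing key lets the caller treat optional components by checking for `None` instead of catching an exception.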
- _run_ddp(mode: str, dataset_getter, training_timeout_seconds, logger, progress_callback: Callable[[dict], None] | None = None, should_terminate: Callable[[], bool] | None = None)
Spawn one local process per visible GPU and return rank-0 result.
- _run_test_impl(dataset_getter, training_timeout_seconds, logger, progress_callback: Callable[[dict], None] | None = None, should_terminate: Callable[[], bool] | None = None, ddp_rank: int | None = None, ddp_world_size: int = 1)
Internal final run used by both single-process and DDP paths.
- _run_valid_impl(dataset_getter, training_timeout_seconds, logger, progress_callback: Callable[[dict], None] | None = None, should_terminate: Callable[[], bool] | None = None, ddp_rank: int | None = None, ddp_world_size: int = 1)
Internal validation run used by both single-process and DDP paths.
- _set_worker_device(ddp_rank: int | None)
Set per-rank device in config.
- _setup_ddp(rank: int, world_size: int, master_port: int)
Initialize the process group for this rank.
- _should_use_ddp() → bool
Enable DDP when multiple CUDA devices are visible in this process.
- _wrap_ddp_model(model, ddp_rank: int | None)
Wrap model in DDP (single-device and model-parallel cases).
- create_engine(config: Config, model: ModelInterface) → TrainingEngine
Utility that instantiates the training engine. It looks for pre-defined fields in the configuration file, i.e., loss, scorer, optimizer, scheduler, gradient_clipper, early_stopper and plotter, all of which should be classes implementing the EventHandler interface.
- Parameters:
config (Config) – the configuration dictionary
model – the model that needs to be trained
- Returns:
a TrainingEngine object
- create_model(dim_input_features: int | Tuple[int], dim_target: int, config: Config) → ModelInterface
Instantiates a model that implements the ModelInterface interface.
- Parameters:
dim_input_features (Union[int, Tuple[int]]) – number of node features
dim_target (int) – target dimension
config (Config) – the configuration dictionary
- Returns:
a model that implements the ModelInterface interface
- run_test(dataset_getter, training_timeout_seconds, logger, progress_callback: Callable[[dict], None] | None = None, should_terminate: Callable[[], bool] | None = None)
This function returns the training, validation and test results for a final run. Do not use the test set to train the model or for early stopping! If possible, rely on already available subclasses of this class.
It implements a simple training scheme.
- Parameters:
dataset_getter (DataProvider) – a data provider
training_timeout_seconds (int) – timeout for the experiment in seconds
logger (Logger) – the logger
- Returns:
a tuple of training, validation and test dictionaries. Each dictionary has two keys:
LOSS (as defined in mlwiz.static)
SCORE (as defined in mlwiz.static)
For instance, training_results[SCORE] is a dictionary itself with other fields to be used by the evaluator.
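To make the shape of the returned tuple concrete, here is a small sketch. The string values assigned to `LOSS` and `SCORE`, the inner field names, and every number are made-up placeholders; the real constants live in mlwiz.static and the inner fields depend on the configured loss and scorer.

```python
# Hypothetical stand-ins for the constants defined in mlwiz.static.
LOSS = "loss"
SCORE = "score"


def make_results(loss_value: float, score_value: float) -> dict:
    """Build one result dictionary with the two documented top-level keys."""
    # The inner field names are illustrative only.
    return {
        LOSS: {"main_loss": loss_value},
        SCORE: {"main_score": score_value},
    }


# Placeholder values shaped like the (training, validation, test) tuple.
final_run = (
    make_results(0.10, 0.97),
    make_results(0.25, 0.91),
    make_results(0.27, 0.90),
)

training_results, validation_results, test_results = final_run
# Each entry under SCORE is itself a dictionary of named fields.
test_score_fields = sorted(test_results[SCORE].keys())
```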
- run_valid(dataset_getter, training_timeout_seconds, logger, progress_callback: Callable[[dict], None] | None = None, should_terminate: Callable[[], bool] | None = None)
This function returns the training and validation results for a model selection run. Do not attempt to load the test set inside this method! If possible, rely on already available subclasses of this class.
It implements a simple training scheme.
- Parameters:
dataset_getter (DataProvider) – a data provider
training_timeout_seconds (int) – timeout for the experiment in seconds
logger (Logger) – the logger
- Returns:
a tuple of training and validation dictionaries. Each dictionary has two keys:
LOSS (as defined in mlwiz.static)
SCORE (as defined in mlwiz.static)
For instance, training_results[SCORE] is a dictionary itself with other fields to be used by the evaluator.
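Both run_valid and run_test accept two optional hooks, progress_callback (Callable[[dict], None]) and should_terminate (Callable[[], bool]). The sketch below shows one way a caller might build a matching pair. The payload keys and the `fake_training_loop` function are hypothetical: the documentation does not state what the progress dictionary contains or how often the hooks are polled.

```python
import threading
from typing import Callable, List, Tuple


def make_callbacks() -> Tuple[
    Callable[[dict], None], Callable[[], bool], List[dict], threading.Event
]:
    """Create a progress recorder and a stop check backed by a threading.Event."""
    updates: List[dict] = []
    stop_event = threading.Event()

    def progress_callback(payload: dict) -> None:
        # Store a copy so later mutation by the producer does not affect the log.
        updates.append(dict(payload))

    def should_terminate() -> bool:
        return stop_event.is_set()

    return progress_callback, should_terminate, updates, stop_event


def fake_training_loop(
    max_epochs: int,
    progress_callback: Callable[[dict], None],
    should_terminate: Callable[[], bool],
) -> int:
    """Hypothetical stand-in for a training loop that honours both hooks."""
    completed = 0
    for epoch in range(1, max_epochs + 1):
        if should_terminate():
            break
        progress_callback({"epoch": epoch})
        completed = epoch
    return completed


progress_callback, should_terminate, updates, stop_event = make_callbacks()
first = fake_training_loop(3, progress_callback, should_terminate)
stop_event.set()  # request termination before the second run starts
second = fake_training_loop(3, progress_callback, should_terminate)
```

In real use the two callables would be passed as the progress_callback and should_terminate keyword arguments of run_valid or run_test.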
- mlwiz.experiment.experiment._ddp_worker(rank: int, world_size: int, mode: str, experiment_spec, dataset_getter_spec, training_timeout_seconds, logger_spec, master_port: int, result_queue, progress_queue, stop_flag)
Worker used by DDP spawn.
- mlwiz.experiment.experiment._find_free_port() → int
Return a free localhost TCP port.
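A common way to obtain a free port, shown here as a sketch rather than the library's actual code, is to bind a socket to port 0 and let the operating system choose. The function name below is illustrative.

```python
import socket


def find_free_port() -> int:
    """Ask the OS for an unused TCP port on the loopback interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 tells the OS to pick any currently free port.
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


port = find_free_port()
```

The socket is closed before the port is returned, so another process could in principle claim the port before the caller binds it again. This race is usually acceptable for a short-lived local rendezvous.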
- mlwiz.experiment.experiment._to_dotted_path(obj) → str
Return the dotted path for a class/function object.
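A dotted path for a class or function can typically be built from its `__module__` and `__qualname__` attributes. The sketch below assumes that approach; the function names are illustrative, and the real helper may treat edge cases differently.

```python
import importlib
from fractions import Fraction


def to_dotted_path(obj) -> str:
    """Build 'module.QualifiedName' for a class or function object."""
    return f"{obj.__module__}.{obj.__qualname__}"


def from_dotted_path(path: str):
    """Inverse lookup for top-level objects: import the module, fetch the attribute."""
    module_name, _, attribute = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attribute)


path = to_dotted_path(Fraction)
resolved = from_dotted_path(path)
```

A string path of this kind can be pickled and sent to a spawned worker, which then re-imports the object on its side.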
- mlwiz.experiment.experiment._to_queue_safe(obj)
Convert tensors/numpy values to plain Python before queue transfer.
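The conversion can be sketched with duck typing: NumPy arrays, NumPy scalars and PyTorch tensors all expose a `tolist()` method that yields plain Python values. The function below is a hypothetical illustration, not the library's code, and uses a stand-in class so that it runs without either dependency.

```python
from typing import Any


def to_queue_safe(obj: Any) -> Any:
    """Recursively replace array-like values with plain Python containers."""
    if isinstance(obj, dict):
        return {key: to_queue_safe(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Tuples become lists in this sketch; only the contents matter here.
        return [to_queue_safe(value) for value in obj]
    if hasattr(obj, "tolist"):
        # Array-like objects (NumPy, PyTorch) convert themselves via tolist().
        return obj.tolist()
    return obj


class FakeTensor:
    """Stand-in for an array-like object; exists only for this example."""

    def __init__(self, values):
        self._values = list(values)

    def tolist(self):
        return list(self._values)


payload = {"loss": FakeTensor([1.0, 2.0]), "epochs": (1, 2), "name": "run"}
safe_payload = to_queue_safe(payload)
```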