===============
Training N2JNet
===============

To train `N2JNet`, pass in the config yml file as the argument, e.g. ::

    $ python train_n2jnet.py nersc_config.yml

The "training" section of the `nersc_config.yml` config file in the repo provides an example of how to configure training. We take a look at it here.

First, we configure the dataloader. This means setting the training healpixes of CosmoDC2, the batch size, the number of CPU cores, and the input galaxy features, as well as the galaxy selection and the photometric noise level. ::

    # Dataloader kwargs
    data:
      train_dist_name: 'norm'
      train_dist_kwargs:
        loc: 0.01
        scale: 0.04
      in_dir: '/global/cscratch1/sd/jwp/n2j/data_v04'
      train_hp: [9559, 10327, 9687, 9814, 9815, 9816,
                 9942, 9943, 10070, 10071, 10072, 10198]
      val_hp: [10199, 10200, 10450]
      n_train: [50000, 50000, 50000, 50000, 50000, 50000,
                50000, 50000, 50000, 50000, 50000, 50000]
      # Final effective training set size
      n_subsample_train: 200000
      n_val: [50000, 50000, 50000]
      # Final effective val set size
      n_subsample_val: 1000
      batch_size: 1000
      val_batch_size: 1000
      num_workers: 18
      # Global (graph-level) target; final_gamma1, final_gamma2 also available
      sub_target: ['final_kappa']
      # Local (node-level) target
      sub_target_local: ['stellar_mass', 'redshift']
      # Features available; do not modify (determined at data generation time)
      features: ['galaxy_id', 'ra', 'dec', 'redshift',
                 'ra_true', 'dec_true', 'redshift_true',
                 'bulge_to_total_ratio_i',
                 'ellipticity_1_true', 'ellipticity_2_true',
                 'ellipticity_1_bulge_true', 'ellipticity_1_disk_true',
                 'ellipticity_2_bulge_true', 'ellipticity_2_disk_true',
                 'shear1', 'shear2', 'convergence',
                 'size_bulge_true', 'size_disk_true', 'size_true',
                 'mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst',
                 'mag_i_lsst', 'mag_z_lsst', 'mag_Y_lsst']
      # Features to use as input
      sub_features: ['ra_true', 'dec_true',
                     'mag_u_lsst', 'mag_g_lsst', 'mag_r_lsst',
                     'mag_i_lsst', 'mag_z_lsst', 'mag_Y_lsst']
      noise_kwargs:
        mag:
          override_kwargs: null
          depth: 5
      detection_kwargs:
        ref_features: ['mag_i_lsst']
        max_vals: [25.3]

Then we configure the optimization, i.e. the relative weighting between the loss on the global convergence labels and the loss on the local stellar mass and redshift labels, early stopping, the initial learning rate, and the learning rate decay. ::

    # Optimizer kwargs
    optimization:
      early_stop_memory: 50
      weight_local_loss: 1.0
      optim_kwargs:
        lr: 0.001
        weight_decay: 0.0001
      lr_scheduler_kwargs:
        patience: 5
        factor: 0.5
        min_lr: 0.0000001
        verbose: True

We then configure the depth and width of the network architecture. The `global_flow` key determines whether the final layer predicting the convergence is a normalizing flow. ::

    # Model kwargs
    model:
      dim_local: 50
      dim_global: 50
      dim_hidden: 50
      dim_pre_aggr: 50
      n_iter: 5
      n_out_layers: 5
      dropout: 0.04
      global_flow: False

Lastly, we set more general attributes of training, such as the random seed, the device, and the number of training epochs. ::

    # Trainer attributes
    trainer:
      device_type: 'cuda'
      checkpoint_dir: results/E1
      seed: 1028
      n_epochs: 200

    # If you want to resume training from a checkpoint
    resume_from:
      checkpoint_path: null
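The `optim_kwargs` and `lr_scheduler_kwargs` above resemble the standard arguments of PyTorch's `Adam` optimizer and `ReduceLROnPlateau` scheduler. The sketch below illustrates how kwargs of this form are typically consumed; it is an illustration under that assumption, not the actual training loop in `train_n2jnet.py`. ::

    # Hypothetical illustration: how optim_kwargs / lr_scheduler_kwargs are
    # typically wired into PyTorch. n2j's trainer may construct these objects
    # differently.
    import torch

    model = torch.nn.Linear(8, 1)  # stand-in for N2JNet

    optim_kwargs = dict(lr=0.001, weight_decay=0.0001)
    lr_scheduler_kwargs = dict(patience=5, factor=0.5, min_lr=0.0000001)

    optimizer = torch.optim.Adam(model.parameters(), **optim_kwargs)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer,
                                                           **lr_scheduler_kwargs)

    # After each epoch, step the scheduler on the validation loss; the learning
    # rate is multiplied by `factor` when the loss has not improved for
    # `patience` epochs, and never drops below `min_lr`.
    val_loss = 0.5  # placeholder value
    scheduler.step(val_loss)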
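Before submitting a long job, it can also help to sanity-check the config file. The short script below is not part of the repo; it is a minimal sketch, assuming only PyYAML, that loads the config and verifies a couple of consistency requirements implied by the walkthrough above (one `n_train` entry per training healpix, and `sub_features` drawn from the available `features`). ::

    # check_config.py -- hypothetical helper, not part of the n2j repo.
    # Loads a training config and checks a few consistency assumptions
    # implied by the walkthrough above.
    import sys

    import yaml  # PyYAML


    def check_config(path):
        with open(path) as f:
            cfg = yaml.safe_load(f)
        data = cfg['data']

        # One sample-count entry is expected per healpix.
        assert len(data['n_train']) == len(data['train_hp']), \
            'n_train must have one entry per train_hp'
        assert len(data['n_val']) == len(data['val_hp']), \
            'n_val must have one entry per val_hp'

        # Input features must be a subset of the available features.
        missing = set(data['sub_features']) - set(data['features'])
        assert not missing, f'sub_features not in features: {missing}'

        print(f'{path} passed basic checks.')


    if __name__ == '__main__':
        check_config(sys.argv[1])  # e.g. python check_config.py nersc_config.yml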