summary_stats_baseline

Summary stats baseline computations

Module Contents

Classes

SummaryStats

Matcher

Functions

get_number_counts(x, batch_indices)

Get the unweighted number counts

get_inv_dist_number_counts(x, batch_indices, pos_indices)

Get the inverse-distance-weighted number counts

get_optimal_threshold(thresholds, n_matches[, min_matches])

Get the smallest threshold that has some minimum number of matches

match(train_x, test_x, train_y, threshold)

Match summary stats between train and test within a given threshold

summary_stats_baseline.get_number_counts(x, batch_indices)

Get the unweighted number counts

Parameters

x : torch.tensor

Input features of shape [n_nodes, n_features] for a given batch

batch_indices : torch.tensor

Batch indices of shape [n_nodes,] for a given batch
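
As a rough illustration of what an unweighted number count per batch element looks like, the sketch below counts how many nodes carry each batch index. It is a NumPy stand-in rather than this module's implementation: the documented function takes torch tensors, and the helper name `count_nodes_per_graph` is hypothetical.

```python
import numpy as np


def count_nodes_per_graph(x, batch_indices):
    """Hypothetical NumPy stand-in: number of nodes assigned to each batch index."""
    x = np.asarray(x)
    batch_indices = np.asarray(batch_indices)
    # One entry of batch_indices per node; the feature values are not used
    assert x.shape[0] == batch_indices.shape[0]
    return np.bincount(batch_indices)


# Five nodes with two features each, split across three graphs in the batch
x = np.random.default_rng(0).normal(size=(5, 2))
batch_indices = np.array([0, 0, 1, 2, 2])
counts = count_nodes_per_graph(x, batch_indices)
print(counts)  # [2 1 2]
```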

summary_stats_baseline.get_inv_dist_number_counts(x, batch_indices, pos_indices)

Get the inverse-distance-weighted number counts

Parameters

x : torch.tensor

Input features of shape [n_nodes, n_features] for a given batch

batch_indices : torch.tensor

Batch indices of shape [n_nodes,] for a given batch

pos_indices : list

List of the two feature indices in x corresponding to ra and dec
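
The weighting can be pictured with the following NumPy sketch. It assumes that distance is measured from the origin of the (ra, dec) columns and that each node contributes 1/distance to the count of its batch index; neither assumption is confirmed by the documentation above. The helper name `inv_dist_counts` and the `eps` guard are hypothetical.

```python
import numpy as np


def inv_dist_counts(x, batch_indices, pos_indices, eps=1e-8):
    """Hypothetical sketch: sum of 1/distance over the nodes of each batch index."""
    x = np.asarray(x, dtype=float)
    # Assumed distance: offset from the origin in the two position columns
    dist = np.hypot(x[:, pos_indices[0]], x[:, pos_indices[1]])
    weights = 1.0 / np.maximum(dist, eps)  # eps guards against division by zero
    return np.bincount(np.asarray(batch_indices), weights=weights)


# Columns 0 and 1 hold ra, dec; column 2 is an unrelated feature
x = np.array([
    [3.0, 4.0, 0.1],  # distance 5
    [0.0, 2.0, 0.2],  # distance 2
    [1.0, 0.0, 0.3],  # distance 1
])
batch_indices = np.array([0, 0, 1])
weighted = inv_dist_counts(x, batch_indices, pos_indices=[0, 1])
print(weighted)  # approximately [0.7, 1.0]
```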

class summary_stats_baseline.SummaryStats(n_data, pos_indices=[0, 1])

update(batch, i)

Update stats for a new batch

Parameters

batch : array or dict

New batch whose contents can be accessed by the functions in loader_dict

i : int

Index of the batch, i.e. the batch is the i-th batch

set_stats(stats_path)

Load previously stored stats

Parameters

stats_path : str

Path to the .npy file of the stats dictionary

export_stats(stats_path)

Export the stats attribute to disk as an .npy file

Parameters

stats_path : str

Path to the .npy file of the stats dictionary
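
Storing a dictionary in an .npy file relies on NumPy's pickle support. The sketch below shows one way such a round trip can work with plain `np.save` and `np.load`. The dictionary layout is an assumption made for illustration, and the code does not claim to reproduce what `export_stats` and `set_stats` do internally.

```python
import os
import tempfile

import numpy as np

# Hypothetical stats dictionary keyed by summary-statistic name
stats = {"N": np.array([2.0, 1.0, 2.0]), "N_inv_dist": np.array([0.7, 1.0, 0.4])}

with tempfile.TemporaryDirectory() as tmp_dir:
    stats_path = os.path.join(tmp_dir, "stats.npy")
    # np.save wraps the dict in a 0-d object array and pickles it
    np.save(stats_path, stats, allow_pickle=True)
    # allow_pickle=True is required on load; .item() unwraps the 0-d array
    loaded = np.load(stats_path, allow_pickle=True).item()

print(sorted(loaded.keys()))  # ['N', 'N_inv_dist']
```

Because loading pickled data can execute arbitrary code, `allow_pickle=True` is best reserved for files from a trusted source.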

class summary_stats_baseline.Matcher(train_stats, test_stats, train_y, out_dir, test_y=None)

match_summary_stats(thresholds, interim_pdf_func=None, min_matches=1000, k_max=np.inf)

Match summary stats between train and test

Parameters

thresholds : dict

Matching thresholds for summary stats. Keys should be one or both of 'N' and 'N_inv_dist'.

interim_pdf_func : callable, optional

Interim prior PDF with which to reweight the samples
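
One common way to reweight samples that were drawn under an interim prior is to weight each sample by the inverse of the interim prior density and then normalize. The sketch below assumes that scheme; whether `Matcher` does exactly this is not stated in this reference. The names `reweight_by_interim_prior` and `interim_pdf` are hypothetical.

```python
import numpy as np


def reweight_by_interim_prior(samples, interim_pdf_func):
    """Hypothetical sketch: normalized weights proportional to 1 / interim prior density."""
    samples = np.asarray(samples, dtype=float)
    weights = 1.0 / interim_pdf_func(samples)
    return weights / weights.sum()


def interim_pdf(y):
    """Illustrative interim prior: unnormalized Gaussian density centered on zero."""
    return np.exp(-0.5 * (y / 0.05) ** 2)


samples = np.array([-0.05, 0.0, 0.05])
weights = reweight_by_interim_prior(samples, interim_pdf)
# Samples far from the interim prior's peak receive larger weights
print(weights)
```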

get_samples(idx, ss_name, threshold=None)

Get the pre-weighting (raw) accepted samples

Parameters

idx : int

ID of sightline

ss_name : str

Summary stats name

threshold : int, optional

Matching threshold. If None, use the optimal threshold. Default: None

Returns

np.ndarray

Samples of shape [n_matches]

get_overview_table()

summary_stats_baseline.get_optimal_threshold(thresholds, n_matches, min_matches=1000)

Get the smallest threshold that has some minimum number of matches

Parameters

thresholds : array-like

n_matches : array-like

min_matches : int
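
The selection rule described above can be sketched as follows. The sketch assumes that `n_matches[i]` is the number of matches obtained with `thresholds[i]`, that "minimum number" means at least `min_matches`, and that `None` is returned when no threshold qualifies. These details, and the name `smallest_passing_threshold`, are assumptions rather than documented behavior.

```python
import numpy as np


def smallest_passing_threshold(thresholds, n_matches, min_matches=1000):
    """Hypothetical sketch: smallest threshold with at least min_matches matches."""
    thresholds = np.asarray(thresholds)
    n_matches = np.asarray(n_matches)
    passing = n_matches >= min_matches
    if not passing.any():
        return None  # assumed fallback; the documented behavior is not specified
    return thresholds[passing].min()


thresholds = [1, 2, 4, 8]
n_matches = [120, 950, 3100, 9000]
best = smallest_passing_threshold(thresholds, n_matches, min_matches=1000)
print(best)  # 4
```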

summary_stats_baseline.match(train_x, test_x, train_y, threshold)

Match summary stats between train and test within a given threshold

Parameters

train_x : np.ndarray

Train summary stats

test_x : float

Test summary stats

train_y : np.ndarray

Train target values

threshold : float

Closeness threshold on which matching is based

Returns

tuple

Boolean mask over train_y marking the accepted samples, and the accepted samples themselves
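
A minimal sketch of threshold matching, assuming that closeness means an absolute difference no larger than the threshold (the exact criterion is not spelled out above). The name `match_within_threshold` is hypothetical and the arrays are made-up illustration data.

```python
import numpy as np


def match_within_threshold(train_x, test_x, train_y, threshold):
    """Hypothetical sketch: accept training samples whose statistic is close to test_x."""
    train_x = np.asarray(train_x)
    train_y = np.asarray(train_y)
    # Assumed closeness criterion: absolute difference within the threshold
    mask = np.abs(train_x - test_x) <= threshold
    return mask, train_y[mask]


train_x = np.array([10.0, 12.0, 15.0, 11.0])
train_y = np.array([0.01, 0.02, 0.05, 0.03])
mask, accepted = match_within_threshold(train_x, test_x=11.0, train_y=train_y, threshold=1.0)
print(mask)      # [ True  True False  True]
print(accepted)  # [0.01 0.02 0.03]
```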