CASBI.create_template_library
Module Contents
Classes
TemplateLibrary: Template library class for loading and preprocessing the data for the SBI model.
API#
- class CASBI.create_template_library.TemplateLibrary(galaxy_file_path: str, dataframe_path: str, preprocessing_path: str, M_tot: float = 1410000000.0, alpha=1.25, d: float = 0.1, bins: int = 64, sigma: float = 0.0, perc_feh: float = 0.1, perc_ofe: float = 0.1, galaxy_names_to_remove: list = ['g6.31e09.01024', 'g6.31e09.00832', 'g6.31e09.00704', 'g6.31e09.00768', 'g6.31e09.00960', 'g6.31e09.00896'])
Template library class for loading and preprocessing the data for the SBI model. Users can inspect the training and test sets through the training_galaxies and test_galaxies attributes, each of which is a dictionary containing the galaxies' “observables” and “parameters” together with the index of each galaxy in the training or test set. The get_inference_input() method returns the training and test sets in the format expected by the “CASBI.inference.sbi.train_inference()” function. The template library is first instantiated with the path to the galaxy files and the path to the dataframe containing the galaxy data produced by CASBI.preprocessing; optionally, one can set the total mass budget “M_tot”, the power-law exponent “alpha” of the mass function, the number of bins used to build the 2D observables histogram, and the standard deviation “sigma” of the observational uncertainties. To generate the training and test sets, call the gen_libary() method with the number of galaxies in the test and training sets.
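Parameters:
- galaxy_file_path: str
path to the galaxy files
- dataframe_path: str
path to the dataframe file containing the galaxy data produced by CASBI.preprocessing
- M_tot: float
total mass budget
- alpha: float
power-law exponent of the mass function
- bins: int
number of bins used to build the 2D observables histogram
- sigma: float
standard deviation of the observational uncertainties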
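The “observables” are 2D histograms built with “bins” bins per axis, with optional observational noise of standard deviation “sigma”. A minimal sketch of that idea follows; it is not CASBI's implementation. It assumes the two observables are per-star quantities (for example [Fe/H] and [O/Fe], which the perc_feh and perc_ofe parameters suggest but this page does not document) and that the noise is Gaussian scatter added to each star before binning. The function name observables_histogram_sketch and the hist_range argument are hypothetical.

```python
import numpy as np


def observables_histogram_sketch(x, y, bins=64, sigma=0.0, hist_range=None, rng=None):
    """Hypothetical sketch: 2D histogram of two per-star observables.

    Assumption: observational uncertainties are modelled as Gaussian scatter
    with standard deviation `sigma`, added to every star before binning.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if sigma > 0:
        x = x + rng.normal(0.0, sigma, size=x.shape)
        y = y + rng.normal(0.0, sigma, size=y.shape)
    # star counts per (x, y) cell; bins=64 gives a 64 x 64 image
    hist, _, _ = np.histogram2d(x, y, bins=bins, range=hist_range)
    return hist


# Usage with synthetic stars (illustrative numbers only)
rng = np.random.default_rng(1)
feh = rng.normal(-1.5, 0.4, size=5000)
ofe = rng.normal(0.3, 0.1, size=5000)
hist = observables_histogram_sketch(feh, ofe, bins=64, sigma=0.05, rng=2)
print(hist.shape)  # (64, 64)
```

Passing a fixed hist_range would keep the histogram edges identical across galaxies, which is usually what a neural network input needs; with hist_range=None each histogram spans its own data range.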
- pdf(m, m_max, m_min, alpha)
Power-law mass function.
Parameters:
- m: mass of the galaxy
- m_max: maximum galaxy mass (upper limit of the mass function)
- m_min: minimum galaxy mass (lower limit of the mass function)
- alpha: power-law index of the mass function
Returns: pdf: value of the power-law mass function at mass m
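For reference, a common normalized form of a power law truncated to [m_min, m_max] is shown below. It assumes the convention p(m) ∝ m^(-alpha) with alpha ≠ 1; this page does not state the exact expression used by pdf(), so read it as an illustration of the normalization rather than the library's formula.

```latex
\int_{m_{\min}}^{m_{\max}} m^{-\alpha}\,dm = \frac{m_{\max}^{1-\alpha} - m_{\min}^{1-\alpha}}{1-\alpha}
\quad\Longrightarrow\quad
p(m) = \frac{(1-\alpha)\, m^{-\alpha}}{m_{\max}^{1-\alpha} - m_{\min}^{1-\alpha}},
\qquad m_{\min} \le m \le m_{\max}.
```

For the default alpha = 1.25, 1 - alpha = -0.25, so numerator and denominator are both negative and p(m) stays positive.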
- cdf(m, m_max, m_min, alpha)
Cumulative distribution function of the power-law mass function.
Parameters:
- m: mass of the galaxy
- m_max: maximum galaxy mass (upper limit of the mass function)
- m_min: minimum galaxy mass (lower limit of the mass function)
- alpha: power-law index of the mass function
Returns: cdf: value of the cumulative distribution function at mass m
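Under the same assumed convention (p(m) ∝ m^(-alpha) on [m_min, m_max], alpha ≠ 1), integrating the normalized density from m_min to m gives a CDF that runs from 0 at m_min to 1 at m_max:

```latex
F(m) = \int_{m_{\min}}^{m} p(m')\,dm'
     = \frac{m^{1-\alpha} - m_{\min}^{1-\alpha}}{m_{\max}^{1-\alpha} - m_{\min}^{1-\alpha}},
\qquad F(m_{\min}) = 0,\quad F(m_{\max}) = 1.
```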
- inverse_cdf(y, m_max, m_min, alpha)
Inverse cumulative distribution function of the power-law mass function, used to sample the mass function analytically (inverse transform sampling).
Parameters:
- y: uniform random number between 0 and 1
- m_max: maximum galaxy mass (upper limit of the mass function)
- m_min: minimum galaxy mass (lower limit of the mass function)
- alpha: power-law index of the mass function
Returns: m: analytically sampled galaxy mass
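Inverse transform sampling works because, if y is uniform on [0, 1], then F^(-1)(y) is distributed according to p(m). For a power law p(m) ∝ m^(-alpha) truncated to [m_min, m_max] with alpha ≠ 1 (an assumed convention, not confirmed by this page), F(m) = (m^(1-alpha) - m_min^(1-alpha)) / (m_max^(1-alpha) - m_min^(1-alpha)), and solving F(m) = y gives a closed form. The sketch below implements it with the same argument order as inverse_cdf(y, m_max, m_min, alpha) and checks it empirically; the mass range is made up for illustration.

```python
import numpy as np


def cdf(m, m_max, m_min, alpha):
    """CDF of p(m) ∝ m**(-alpha) truncated to [m_min, m_max] (alpha != 1 assumed)."""
    a = 1.0 - alpha
    return (m**a - m_min**a) / (m_max**a - m_min**a)


def inverse_cdf(y, m_max, m_min, alpha):
    """Solve cdf(m) = y for m: uniform y in [0, 1] maps to masses drawn from p(m)."""
    a = 1.0 - alpha
    return (m_min**a + y * (m_max**a - m_min**a)) ** (1.0 / a)


rng = np.random.default_rng(42)
m_min, m_max, alpha = 1e6, 1e9, 1.25  # illustrative range; alpha is the class default
masses = inverse_cdf(rng.uniform(size=100_000), m_max, m_min, alpha)

# The empirical fraction of masses below m_test should approach cdf(m_test).
m_test = 1e7
print(np.mean(masses < m_test), cdf(m_test, m_max, m_min, alpha))
```

Because the inverse is analytic, no rejection step or numerical root finding is needed, and each uniform draw yields exactly one mass.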
- gen_subhalo_sample(samples, masses, times, nbrs)
Return the galaxy names, masses and infall times obtained by sampling the mass function and then looking for neighbors in mass space. If the nearest galaxy is too far from the sampled mass, 5 new samples are drawn and one of them is selected at random, provided it is close enough and not already in the sample list. If the total mass is not reached, the loop is stopped and the lists of samples, masses and infall times are returned.
Parameters:
- samples: list of galaxy names
- masses: list of galaxy masses
- times: list of galaxy infall times
- nbrs: nearest-neighbors model fitted to the masses
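A minimal sketch of the sampling loop described for gen_subhalo_sample(), written with NumPy only. It replaces the fitted nearest-neighbors model (nbrs) with a brute-force search in log10(mass), and it assumes a power law p(m) ∝ m^(-alpha) for the draws. The closeness threshold max_dlogm, the use of log mass, the early break when no valid candidate is left, and the name gen_subhalo_sample_sketch are all assumptions for illustration; the loop stops once the accumulated mass reaches m_tot.

```python
import numpy as np


def inverse_cdf(y, m_max, m_min, alpha):
    """Inverse CDF of p(m) ∝ m**(-alpha) on [m_min, m_max] (alpha != 1 assumed)."""
    a = 1.0 - alpha
    return (m_min**a + y * (m_max**a - m_min**a)) ** (1.0 / a)


def gen_subhalo_sample_sketch(names, masses, times, m_tot, alpha=1.25,
                              max_dlogm=0.3, n_retry=5, rng=None):
    """Hypothetical sketch: draw masses from the mass function and match each one
    to the nearest unused template galaxy in log10(mass) until m_tot is reached."""
    rng = np.random.default_rng(rng)
    masses = np.asarray(masses, dtype=float)
    log_m = np.log10(masses)
    m_min, m_max = masses.min(), masses.max()

    def nearest(m):
        # brute-force nearest neighbour (stands in for a fitted nbrs model)
        k = int(np.argmin(np.abs(log_m - np.log10(m))))
        return k, abs(log_m[k] - np.log10(m))

    samples, sample_masses, sample_times = [], [], []
    while sum(sample_masses) < m_tot:
        k, dist = nearest(inverse_cdf(rng.uniform(), m_max, m_min, alpha))
        if dist > max_dlogm or names[k] in samples:
            # fallback: draw n_retry new masses, keep close and unused matches
            matches = [nearest(inverse_cdf(u, m_max, m_min, alpha))
                       for u in rng.uniform(size=n_retry)]
            valid = [c for c, d in matches if d <= max_dlogm and names[c] not in samples]
            if not valid:
                break  # assumption: stop early with the galaxies collected so far
            k = valid[int(rng.integers(len(valid)))]
        samples.append(names[k])
        sample_masses.append(masses[k])
        sample_times.append(times[k])
    return samples, sample_masses, sample_times


names = [f"g{i:03d}" for i in range(50)]  # toy template galaxies
masses = np.logspace(6, 9, 50)
times = np.linspace(1.0, 12.0, 50)
sample, sample_masses, sample_times = gen_subhalo_sample_sketch(
    names, masses, times, m_tot=1.41e9, rng=0)
print(len(sample), sum(sample_masses))
```

Each iteration either adds a new, unused galaxy or breaks, so the loop always terminates even when the template pool cannot reach m_tot.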
- gen_halo(j, galaxies_test=None)
Generate a real halo by sampling the mass function and then looking for neighbors in mass space. Returns the histogram of the galaxy together with its mass and infall time. If the test set is provided, the method checks whether the galaxy is already present in the test set and, if so, generates a new one until it is not.
Parameters:
- j: index of the galaxy to be generated
- galaxies_test: list of galaxy names in the test set
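The "regenerate until the halo is not in the test set" rule can be sketched as a rejection loop. This is a hypothetical helper, not CASBI code: it assumes a halo is represented by the list of its subhalo (galaxy) names, compares halos as unordered sets, takes the halo generator as a callable, and adds a max_tries guard that the documented method does not mention.

```python
import random


def gen_halo_sketch(draw_halo, galaxies_test=None, max_tries=1000):
    """Hypothetical rejection loop: redraw until the halo is not in the test set.

    Assumptions: draw_halo() returns a list of subhalo names, galaxies_test is a
    list of such lists, and two halos are equal if they contain the same names.
    """
    if galaxies_test is None:
        return draw_halo()
    # order-insensitive comparison via frozensets
    test_halos = {frozenset(halo) for halo in galaxies_test}
    for _ in range(max_tries):
        halo = draw_halo()
        if frozenset(halo) not in test_halos:
            return halo
    # guard added for the sketch, so the loop cannot run forever
    raise RuntimeError("no halo outside the test set found within max_tries")


rng = random.Random(0)
pool = ["a", "b", "c"]
halo = gen_halo_sketch(lambda: rng.sample(pool, 2), galaxies_test=[["a", "b"], ["b", "c"]])
print(sorted(halo))  # ['a', 'c'], the only pair not in the test set
```

Comparing frozensets keeps membership checks O(1) per draw and makes the result independent of the order in which subhalos were sampled.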
- gen_libary(N_test, N_train)
Generate the template library of galaxies by sampling the mass function and then looking for neighbors in mass space. It instantiates the 2D histograms of the galaxies (‘observables’) and their masses and infall times (‘parameters’) as dictionaries keyed by the (i, j) index, where j is the galaxy index and i the subhalo index. The parameters are the mass, the infall time, the subhalo index and the galaxy index. The training and test sets are accessible through the training_galaxies and test_galaxies attributes.
Parameters:
- N_test: number of galaxies in the test set
- N_train: number of galaxies in the training set
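The (i, j)-keyed dictionaries described for gen_libary() can be sketched as follows. The function gen_library_sketch and the draw_halo callable are hypothetical; the sketch assumes each halo j is a list of (histogram, mass, infall time) tuples, one per subhalo i, that galaxy indices j restart at 0 in each set, and it orders each parameter vector as mass, infall time, subhalo index, galaxy index, following the order listed in the description.

```python
import numpy as np


def gen_library_sketch(N_test, N_train, draw_halo):
    """Hypothetical sketch of the (i, j)-keyed library layout.

    draw_halo(j) is an assumed callable returning a list of
    (histogram, mass, infall_time) tuples, one per subhalo i of galaxy j.
    """

    def build(n_galaxies):
        observables, parameters = {}, {}
        for j in range(n_galaxies):  # j: galaxy index within this set
            for i, (hist, mass, t_infall) in enumerate(draw_halo(j)):  # i: subhalo index
                observables[(i, j)] = hist
                # order: mass, infall time, subhalo index, galaxy index
                parameters[(i, j)] = np.array([mass, t_infall, i, j], dtype=float)
        return {"observables": observables, "parameters": parameters}

    test_galaxies = build(N_test)
    training_galaxies = build(N_train)
    return training_galaxies, test_galaxies


# Usage with a toy halo generator (random placeholder values)
rng = np.random.default_rng(0)


def toy_halo(j):
    n_sub = int(rng.integers(1, 4))  # 1 to 3 subhalos
    return [(np.zeros((8, 8)), rng.uniform(1e6, 1e9), rng.uniform(0.0, 13.0))
            for _ in range(n_sub)]


train, test = gen_library_sketch(N_test=2, N_train=5, draw_halo=toy_halo)
print(sorted({j for (_, j) in train["parameters"]}))  # [0, 1, 2, 3, 4]
```

Unlike the documented method, this sketch does not check for overlap between the two sets; that is what gen_halo()'s galaxies_test argument is described as handling.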