CASBI.preprocessing

Module Contents

Functions

extract_parameter_array

Extract the parameters and observables from the snapshot at the given path. All possible errors are checked, and if one is found it is saved as an ‘error_file’. If no stars were formed in the snapshot, the function doesn’t save any file. Two .npz files are saved, one with the parameters and another with the observables. To load the parameter values, access the arrays in the .npz file in the usual NumPy way, for example: np.load('file.npz')['star_mass']. The extracted parameters are star_mass and infall_time. The extracted observables are [Fe/H] and [O/Fe], referred to as ‘feh’ and ‘ofe’.

gen_files

Generate the parameter and observable files for all the given paths, and save them in two separate folders, one for parameters and one for observables. It is suggested to use the glob library to collect the paths of all the snapshots in the simulation, for example: path = glob.glob('storage/g?.??e??/g?.??e??.0????'). A dataframe with the errors that occurred during the extraction of the parameters and observables is also saved, in the same directory as the files.

preprocess

Save the files needed to preprocess the data for the training set. Aggregated information on Galaxy Mass, Number of stars, [Fe/H] and [O/Fe] is saved in preprocess_dir, so that the percentile cut can be computed in the gen_dataframe function.

load_data

Load the data from file_path and return a pandas dataframe with the data. This function is then distributed in the CASBI.preprocessing.gen_dataframe function.

gen_dataframe

Generate the dataframe used for the sampling process in the CASBI.template_library class.

API

CASBI.preprocessing.extract_parameter_array(sim_path='str', file_path='str', position_flag=False) -> None

Extract the parameters and observables from the snapshot at the given path. All possible errors are checked, and if one is found it is saved as an ‘error_file’. If no stars were formed in the snapshot, the function doesn’t save any file. Two .npz files are saved, one with the parameters and another with the observables. To load the parameter values, access the arrays in the .npz file in the usual NumPy way, for example: np.load('file.npz')['star_mass']. The extracted parameters are star_mass and infall_time. The extracted observables are [Fe/H] and [O/Fe], referred to as ‘feh’ and ‘ofe’.

sim_path : str

Path to the simulation snapshot. The path should end with ‘simulation_name.snapshot_number’ and it is used to create the name of the .npz files.

file_path : str

Path to the folder where the file will be saved. The file is a .npz file with parameters and observables stored in it.

position_flag : bool

Flag to save the positions of the stars in the snapshot. Default is False.

file.npz : array

The file is saved in the folder ‘/file_path/name_file_parameters.npz’. The parameters are:

file['star_mass'] : float

Total mass of the formed stars in the snapshot.

file['infall_time'] : float

Time at which the snapshot was taken, in Gyr.

file['position'] : array

Array with the positions of the formed stars in the snapshot.

The observables are:

file['feh'] : np.array

Array with the [Fe/H] of the formed stars in the snapshot.

file['ofe'] : np.array

Array with the [O/Fe] of the formed stars in the snapshot.
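The key-based access described above can be sketched as follows. This is a minimal, self-contained illustration: the file names and numeric values are made up, and the files are written with np.savez only to stand in for the output of extract_parameter_array. Only the keys star_mass, infall_time, feh and ofe come from the description above.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-ins for the files written by extract_parameter_array.
# File names and values are illustrative assumptions, not real CASBI output.
tmp_dir = tempfile.mkdtemp()
param_file = os.path.join(tmp_dir, "example_parameters.npz")
obs_file = os.path.join(tmp_dir, "example_observables.npz")

np.savez(param_file, star_mass=1.5e9, infall_time=7.2)
np.savez(obs_file, feh=np.array([-1.2, -0.8, -0.5]), ofe=np.array([0.35, 0.30, 0.22]))

# Load each archive and index it by key.
with np.load(param_file) as params:
    star_mass = float(params["star_mass"])      # scalar stored as a 0-d array
    infall_time = float(params["infall_time"])

with np.load(obs_file) as obs:
    feh = obs["feh"]  # one value per formed star
    ofe = obs["ofe"]

print(star_mass, infall_time, feh.shape, ofe.shape)
```

Opening the archive in a with block closes the underlying file once the arrays have been read into memory.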

CASBI.preprocessing.gen_files(sim_path: str, file_path: str, position_flag=False) -> None

Generate the parameter and observable files for all the given paths, and save them in two separate folders, one for parameters and one for observables. It is suggested to use the glob library to collect the paths of all the snapshots in the simulation, for example: path = glob.glob('storage/g?.??e??/g?.??e??.0????'). A dataframe with the errors that occurred during the extraction of the parameters and observables is also saved, in the same directory as the files.

sim_path : str

Path to the simulation snapshots. The path should end with ‘simulation_name.snapshot_number’ and it is used to create the name of the .npz files.

file_path : str

Path to the folder where the files will be saved.

None
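The glob pattern suggested above can be tried on a small fake directory tree. In this sketch the simulation name, the snapshot numbers and the temporary root folder are made-up examples; only the pattern itself comes from the description.

```python
import glob
import os
import tempfile

# Build a tiny fake directory tree so the pattern can be tried without real data.
# The simulation name and snapshot numbers below are made-up examples.
root = tempfile.mkdtemp()
sim_dir = os.path.join(root, "storage", "g8.26e11")
os.makedirs(sim_dir)
for name in ("g8.26e11.00512", "g8.26e11.01024", "notes.txt"):
    open(os.path.join(sim_dir, name), "w").close()

# "?" matches exactly one character, so only names shaped like
# "g?.??e??.0????" inside a folder shaped like "g?.??e??" are returned.
pattern = os.path.join(root, "storage", "g?.??e??", "g?.??e??.0????")
paths = sorted(glob.glob(pattern))
snapshot_names = [os.path.basename(p) for p in paths]
print(snapshot_names)
```

The resulting paths are the kind of input the description suggests for gen_files. Whether they are passed as a list or one at a time should be checked against the source, since the signature annotates sim_path as str.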

CASBI.preprocessing.preprocess(file_dir: str, preprocess_dir: str) -> None

Save the files needed to preprocess the data for the training set. Aggregated information on Galaxy Mass, Number of stars, [Fe/H] and [O/Fe] is saved in preprocess_dir, so that the percentile cut can be computed in the gen_dataframe function.

file_dir : str

Path to the folder where the files with the parameters and observables are saved.

preprocess_dir : str

Path to the folder where the preprocess information will be saved.

preprocess_file_path: str

Path to the file with the preprocess information.
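As a rough illustration of why aggregated values are useful for a percentile cut, the sketch below applies one with NumPy. The array name, the values and the 5th/95th percentile bounds are assumptions chosen for the example; the actual cut applied by gen_dataframe is not specified here.

```python
import numpy as np

# Hypothetical aggregated [Fe/H] values pooled over all galaxies.
feh_all = np.array([-3.0, -2.0, -1.5, -1.0, -0.8, -0.5, -0.3, 0.0, 0.2, 1.5])

# Assumed cut: keep values between the 5th and 95th percentiles.
low, high = np.percentile(feh_all, [5, 95])
mask = (feh_all >= low) & (feh_all <= high)
feh_cut = feh_all[mask]

print(low, high, feh_cut.size)
```

Computing the bounds once on the pooled values means the same limits can be applied to every galaxy.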

CASBI.preprocessing.load_data(file_path)

Load the data from file_path and return a pandas dataframe with the data. This function is then distributed in the CASBI.preprocessing.gen_dataframe function.

file_path : str

Path to the file with the parameters and observables.

df_temp : pandas.DataFrame

The dataframe with the data from the file_path.
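One way such a loader could look is sketched below. This is not the CASBI implementation: load_npz_as_dataframe is a hypothetical helper, the column layout (one row per star, with the scalar parameters repeated on every row) is an assumption, and the example file is created on the fly with made-up values.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def load_npz_as_dataframe(file_path):
    """Hypothetical helper: one row per star, scalars repeated on every row."""
    with np.load(file_path) as data:
        feh = data["feh"]
        ofe = data["ofe"]
        star_mass = float(data["star_mass"])
        infall_time = float(data["infall_time"])
    # Scalar values are broadcast by pandas to the length of the arrays.
    return pd.DataFrame(
        {"feh": feh, "ofe": ofe, "star_mass": star_mass, "infall_time": infall_time}
    )


# Made-up example file holding both parameters and observables.
example_path = os.path.join(tempfile.mkdtemp(), "example.npz")
np.savez(
    example_path,
    feh=np.array([-1.2, -0.8, -0.5]),
    ofe=np.array([0.35, 0.30, 0.22]),
    star_mass=1.5e9,
    infall_time=7.2,
)

df_temp = load_npz_as_dataframe(example_path)
print(df_temp)
```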

CASBI.preprocessing.gen_dataframe(file_dir: str, dataframe_path: str) -> None

Generate the dataframe used for the sampling process in the CASBI.template_library class.

file_dir : str

Path to the folder where the files with the parameters and observables are saved.

dataframe_path : str

Path to the folder where the dataframe will be saved.

df : pandas.DataFrame

The dataframe with the data from the file_dir.