CASBI.preprocessing
Module Contents
Functions
- extract_parameter_array(sim_path, file_path, position_flag): Extract the parameters and observables from a snapshot path and save them as two .npz files.
- gen_files(sim_path, file_path, position_flag): Generate the parameter and observable files for all the given paths and save them in two separate folders.
- preprocess(file_dir, preprocess_dir): Save the aggregated information needed to preprocess the data for the training set.
- load_data(file_path): Load the data from file_path and return a pandas DataFrame.
- gen_dataframe(file_dir, dataframe_path): Generate the dataframe used for the sampling process in the CASBI.template_library class.
API
- CASBI.preprocessing.extract_parameter_array(sim_path='str', file_path='str', position_flag=False) -> None
Extract the parameters and observables from the path. The function checks for errors, and if one is found it is saved as an 'error_file'. If no stars were formed in the snapshot, the function doesn't save any file. Otherwise, two .npz files are saved, one with the parameters and one with the observables. To load the parameter values, access the .npz file in the usual NumPy way, for example: np.load('file.npz')['star_mass']. The extracted parameters are star_mass and infall_time. The extracted observables are [Fe/H] and [O/Fe], referred to as 'feh' and 'ofe'.
Parameters:
- sim_path : str
Path to the simulation snapshot. The path should end with 'simulation_name.snapshot_number', and it is used to create the names of the .npz files.
- file_path : str
Path to the folder where the two .npz files (parameters and observables) will be saved.
- position_flag : bool
Flag to save the positions of the stars in the snapshot. Default is False.
Returns:
- file.npz : array
The parameters file is saved as '/file_path/name_file_parameters.npz' and contains:
  - file['star_mass'] : float. Total mass of the formed stars in the snapshot.
  - file['infall_time'] : float. Time at which the snapshot was taken, in Gyr.
  - file['position'] : array. Positions of the formed stars in the snapshot (saved when position_flag is True).
The observables file contains:
  - file['feh'] : np.array. [Fe/H] of the formed stars in the snapshot.
  - file['ofe'] : np.array. [O/Fe] of the formed stars in the snapshot.
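The save-and-load round trip described above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not CASBI's implementation: the helper `save_snapshot_arrays`, the `<name>_observables.npz` file name, and the toy values are hypothetical; only the key names (`star_mass`, `infall_time`, `feh`, `ofe`), the `_parameters.npz` suffix, and the "no stars, no file" rule come from the description above.

```python
import os
import tempfile

import numpy as np


def save_snapshot_arrays(name, file_path, star_mass, infall_time, feh, ofe):
    """Hypothetical helper: write a parameters file and an observables file.

    Returns the two file paths, or None when the snapshot has no stars
    (mirroring the documented "no stars, no file" behaviour).
    """
    feh = np.asarray(feh, dtype=float)
    ofe = np.asarray(ofe, dtype=float)
    if feh.size == 0:
        return None  # no stars formed: nothing is written
    param_file = os.path.join(file_path, f"{name}_parameters.npz")
    obs_file = os.path.join(file_path, f"{name}_observables.npz")  # assumed name
    np.savez(param_file, star_mass=star_mass, infall_time=infall_time)
    np.savez(obs_file, feh=feh, ofe=ofe)
    return param_file, obs_file


with tempfile.TemporaryDirectory() as tmp:
    # Toy values for a hypothetical snapshot named "g1.05e11.00512".
    param_file, obs_file = save_snapshot_arrays(
        "g1.05e11.00512", tmp,
        star_mass=2.5e8, infall_time=7.3,
        feh=[-1.2, -0.8, -0.5], ofe=[0.3, 0.2, 0.1],
    )
    # Read values back by key, as with any .npz archive.
    with np.load(param_file) as params:
        star_mass = float(params["star_mass"])
    with np.load(obs_file) as obs:
        feh = obs["feh"]
    # A snapshot without stars produces no files.
    empty = save_snapshot_arrays("g1.05e11.00001", tmp, 0.0, 1.0, [], [])
    print(star_mass, feh.shape, empty)
```

Using `np.load` as a context manager closes the archive once the needed arrays have been read, which keeps the temporary directory removable on every platform.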
- CASBI.preprocessing.gen_files(sim_path: str, file_path: str, position_flag=False) -> None
Generate the parameter and observable files for all the given paths and save them in two separate folders, one for parameters and one for observables. It is suggested to use the glob library to collect the paths of all the snapshots in the simulation, for example: path = glob.glob('storage/g?.??e??/g?.??e??.0????'). The function also saves a dataframe with the errors that occurred during the extraction of the parameters and observables, in the same directory as the files.
Parameters:
- sim_path : str
Path to the simulation snapshots. The path should end with 'simulation_name.snapshot_number', and it is used to create the names of the .npz files.
- file_path : str
Path to the folder where the files will be saved.
Returns:
- None
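A sketch of the loop described above: collect snapshot paths with the suggested glob pattern, run a per-snapshot extractor, and record failures in a pandas DataFrame. The toy directory tree mirrors the suggested pattern, but `fake_extract`, the error columns (`path`, `error`) and the file name `errors.csv` are hypothetical stand-ins, not CASBI's actual behaviour.

```python
import glob
import os
import tempfile

import pandas as pd


def fake_extract(sim_path, file_path):
    """Hypothetical extractor: fails for one snapshot to exercise error handling."""
    if sim_path.endswith("00256"):
        raise ValueError("no stellar particles")
    # A real extractor would write the .npz files into file_path here.


def run_all(paths, file_path):
    """Run the extractor on every path and collect failures in a DataFrame."""
    errors = []
    for sim_path in sorted(paths):
        try:
            fake_extract(sim_path, file_path)
        except Exception as exc:  # keep going; log the failure instead
            errors.append({"path": sim_path, "error": repr(exc)})
    error_df = pd.DataFrame(errors, columns=["path", "error"])
    error_df.to_csv(os.path.join(file_path, "errors.csv"), index=False)  # assumed format
    return error_df


with tempfile.TemporaryDirectory() as tmp:
    # Build a toy tree: storage/<sim>/<sim>.<snapshot>
    for sim in ("g1.05e11", "g2.79e12"):
        os.makedirs(os.path.join(tmp, "storage", sim))
        for snap in ("00256", "01024"):
            open(os.path.join(tmp, "storage", sim, f"{sim}.{snap}"), "w").close()

    # Each '?' matches exactly one character, as in the suggested pattern.
    pattern = os.path.join(tmp, "storage", "g?.??e??", "g?.??e??.0????")
    paths = glob.glob(pattern)
    error_df = run_all(paths, tmp)
    print(len(paths), len(error_df))
```

Catching the exception per snapshot lets a long batch finish even when a few snapshots are malformed; the error table can then be inspected afterwards.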
- CASBI.preprocessing.preprocess(file_dir: str, preprocess_dir: str) -> None
Save the files needed to preprocess the data for the training set. It saves aggregated information on galaxy mass, number of stars, [Fe/H] and [O/Fe] in preprocess_dir, so that percentile cuts can be computed in the gen_dataframe function.
Parameters:
- file_dir : str
Path to the folder where the files with the parameters and observables are saved.
- preprocess_dir : str
Path to the folder where the preprocess information will be saved.
Returns:
- preprocess_file_path : str
Path to the file with the preprocess information.
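One way a percentile cut on the aggregated quantities could work, assuming they are stored as 1-D arrays with one entry per snapshot: compute the lower and upper percentiles with `np.percentile` and keep only the entries inside that range. The helper name `percentile_mask`, the 10th/90th bounds, the log scale, and the toy `galaxy_mass` values are illustrative assumptions; the actual cuts applied by gen_dataframe are not specified here.

```python
import numpy as np


def percentile_mask(values, lower=5.0, upper=95.0):
    """Hypothetical helper: boolean mask of entries within [lower, upper] percentiles."""
    values = np.asarray(values, dtype=float)
    lo, hi = np.percentile(values, [lower, upper])
    return (values >= lo) & (values <= hi)


# Toy aggregated galaxy masses (solar masses), one per snapshot.
galaxy_mass = np.array([1e6, 3e7, 5e7, 8e7, 2e8, 4e8, 6e8, 9e8, 1e9, 5e11])

# Masses span many decades, so the cut is applied in log10 space here.
keep = percentile_mask(np.log10(galaxy_mass), lower=10, upper=90)
print(galaxy_mass[keep])
```

The same mask function could be applied to the number of stars or to the [Fe/H] and [O/Fe] aggregates, and the masks combined with `&` to drop outliers in any quantity.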
- CASBI.preprocessing.load_data(file_path)
Load the data from file_path and return a pandas DataFrame with the data. This function is distributed over the input files by the CASBI.preprocessing.gen_dataframe function.
Parameters:
- file_path : str
Path to the file with the parameters and observables.
Returns:
- df_temp : pandas.DataFrame
The dataframe with the data from file_path.
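A minimal sketch of what a loader like load_data could look like, assuming a single .npz file that holds both the scalar parameters and the per-star observables. It builds one row per star and repeats the scalar parameters in every row. The file layout, the column set, the `galaxy` tag, and the name `load_npz_as_dataframe` are hypothetical choices for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def load_npz_as_dataframe(file_path):
    """Hypothetical loader: one row per star, scalar parameters repeated per row."""
    with np.load(file_path) as data:
        df_temp = pd.DataFrame({"feh": data["feh"], "ofe": data["ofe"]})
        df_temp["star_mass"] = float(data["star_mass"])
        df_temp["infall_time"] = float(data["infall_time"])
    # Tag rows with the snapshot name taken from the file name (assumed convention).
    df_temp["galaxy"] = os.path.splitext(os.path.basename(file_path))[0]
    return df_temp


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "g1.05e11.00512.npz")
    np.savez(path, star_mass=2.5e8, infall_time=7.3,
             feh=[-1.2, -0.8], ofe=[0.3, 0.2])
    df_temp = load_npz_as_dataframe(path)
    print(df_temp)
```

A long, one-row-per-star layout keeps each column a plain float, which makes later filtering and concatenation across galaxies straightforward.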
- CASBI.preprocessing.gen_dataframe(file_dir: str, dataframe_path: str) -> None
Generate the dataframe used for the sampling process in the CASBI.template_library class.
Parameters:
- file_dir : str
Path to the folder where the files with the parameters and observables are saved.
- dataframe_path : str
Path to the folder where the dataframe will be saved.
Returns:
- df : pandas.DataFrame
The dataframe with the data from file_dir.
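Distributing a per-file loader and stacking the results can be sketched as a parallel map followed by `pd.concat`. This sketch uses `concurrent.futures.ThreadPoolExecutor` (a process pool could be swapped in for CPU-bound loading); the stand-in loader `toy_loader`, the `*.npz` glob, and the output name `dataframe.pkl` are hypothetical and not taken from CASBI.

```python
import glob
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd


def toy_loader(file_path):
    """Hypothetical stand-in for load_data: one row per star from one .npz file."""
    with np.load(file_path) as data:
        return pd.DataFrame({"feh": data["feh"], "ofe": data["ofe"]}).assign(
            galaxy=os.path.splitext(os.path.basename(file_path))[0]
        )


def build_dataframe(file_dir, dataframe_path, max_workers=4):
    """Map the loader over every file in file_dir, then stack the pieces."""
    files = sorted(glob.glob(os.path.join(file_dir, "*.npz")))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        frames = list(pool.map(toy_loader, files))  # map preserves input order
    df = pd.concat(frames, ignore_index=True)
    df.to_pickle(os.path.join(dataframe_path, "dataframe.pkl"))  # assumed format
    return df


with tempfile.TemporaryDirectory() as tmp:
    for name, n in (("g1.05e11.00512", 3), ("g2.79e12.01024", 5)):
        np.savez(os.path.join(tmp, f"{name}.npz"), feh=np.zeros(n), ofe=np.ones(n))
    df = build_dataframe(tmp, tmp)
    print(len(df), df["galaxy"].nunique())
```

Because `pool.map` returns results in the order of its inputs, sorting the file list first makes the row order of the final dataframe reproducible.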