Helpful Programs for Gathering Data and Post-processing Data
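As well as programs for creating and running large numbers of genetic algorithm trials, we have also included scripts and programs to gather and process the data across a whole set of repeated genetic algorithm runs. In this article, we introduce all of the programs that can be used for this. These programs can be run by typing the name of the program you want to run into the terminal from whatever directory you are in.
The scripts and programs that are mentioned here are: Did_Complete.py, Did_Find_LES.py, GetLESOfAllTrials.py, Postprocessing_Database.py, Postprocessing_Many_Databases_Together.py, database_viewer.py, make_energy_vs_similarity_results.py, remove_blank_arrayJobs.py, remove_overall_arrayJobs.py, tar_trials_collectively.py, and untar_trials_collectively.py.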
What to make sure is done before running any of these scripts.
If you installed Organisms through pip3
If you installed the Organisms program with pip3, these scripts will be installed in your bin. You do not need to add anything to your ~/.bashrc. You are all good to go.
If you performed a Manual installation
If you have manually added this program to your computer (such as by cloning it from GitHub), you will need to make sure that you have added the Postprocessing_Programs folder to your PATH in your ~/.bashrc file. All of these programs can be found in the Postprocessing_Programs folder. To execute these programs from the Postprocessing_Programs folder, you must include the following in your ~/.bashrc:
export PATH_TO_GA="<Path_to_Organisms>"
where <Path_to_Organisms> is the path to the genetic algorithm program. Also include the following somewhere after this in your ~/.bashrc, since it relies on PATH_TO_GA already being defined:
export PATH="$PATH_TO_GA"/Organisms/Postprocessing_Programs:$PATH
See more about this in Installation of the Genetic Algorithm.
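A quick way to confirm that the installation steps above worked is to check that each script can be found on your PATH. The following is a minimal sketch, not part of Organisms itself: the helper name find_missing_scripts is hypothetical, and the script names are simply those listed in this article.

```python
import shutil

# Script names taken from this article.
POSTPROCESSING_SCRIPTS = [
    "Did_Complete.py",
    "Did_Find_LES.py",
    "GetLESOfAllTrials.py",
    "Postprocessing_Database.py",
    "Postprocessing_Many_Databases_Together.py",
    "database_viewer.py",
    "make_energy_vs_similarity_results.py",
    "remove_blank_arrayJobs.py",
    "remove_overall_arrayJobs.py",
    "tar_trials_collectively.py",
    "untar_trials_collectively.py",
]


def find_missing_scripts(names, path=None):
    """Return the names that cannot be found as executables.

    path=None searches the PATH environment variable; passing a string
    restricts the search to the directories in that string instead.
    """
    return [name for name in names if shutil.which(name, path=path) is None]


missing = find_missing_scripts(POSTPROCESSING_SCRIPTS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All post-processing scripts were found on PATH.")
```

If any scripts are reported as missing after a manual installation, re-check the two export lines in your ~/.bashrc and open a new terminal.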
Did_Complete.py - Have all your genetic algorithm trials completed?
This is the first program that you should run before continuing on with any analysis. It is a quick program that scans through all the trials and checks whether they have completed.
To use this program, enter Did_Complete.py into the terminal in the directory that you ran your MakeTrials.py script from. You can also enter Did_Complete.py into the terminal within any folder, as long as the Trials that you ran can be found somewhere in its subdirectories.
Did_Find_LES.py - Did all your genetic algorithm trials find the global minimum?
This program is designed to determine which of your trials found the global minimum that you were searching for. To run this program, enter Did_Find_LES.py into the terminal in any directory you want. This program will go through all subdirectories in search of folders that start with Trial, and look through the results to see if the global minimum you are looking for was found in each trial.
More specifically, this program looks through the EnergyProfiles.txt file of each trial for any clusters whose energy matches a given energy to a given number of decimal places.
This program will ask the user for the energy of the cluster that the user wants to locate in each trial, the number of decimal places that the user wants to round the energy to, and the number of generations the user wants to measure the genetic algorithm trials up to (if this is not given, the program will look through every generation).
Each set of trials is measured individually for different genetic algorithms in different folders.
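The two ideas described above, searching subdirectories for folders that start with Trial and matching energies to a number of decimal places, can be sketched as follows. This is an illustration only and not the code used by Did_Find_LES.py: the function names are hypothetical, and the energies are passed in as a plain list rather than read from EnergyProfiles.txt, whose layout is not described here.

```python
import os


def find_trial_folders(root):
    """Return the paths of all folders under root whose name starts with 'Trial'."""
    trial_folders = []
    for dirpath, dirnames, _filenames in os.walk(root):
        trials = sorted(d for d in dirnames if d.startswith("Trial"))
        trial_folders.extend(os.path.join(dirpath, d) for d in trials)
        # Do not descend into the trial folders themselves.
        dirnames[:] = [d for d in dirnames if not d.startswith("Trial")]
    return trial_folders


def found_target_energy(energies, target_energy, decimal_places=2):
    """Return True if any energy equals the target once both are rounded."""
    target = round(target_energy, decimal_places)
    return any(round(energy, decimal_places) == target for energy in energies)


# Hypothetical energies from one trial, compared against a target of -10.00
print(found_target_energy([-9.50, -10.004], target_energy=-10.0))
```

Rounding both values before comparing them means that two energies differing only beyond the chosen decimal place are treated as the same cluster energy.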
GetLESOfAllTrials.py - Get information on the generations and number of minimisations performed
This program is designed to obtain the generation and the number of minimisations that each trial needed to first obtain the lowest energy cluster that it found. This program will also report the average number of generations and the average number of minimisations performed across all the trials that found the lowest of these lowest energy clusters. For example, if 5 of 20 genetic algorithm trials found a cluster with the same energy, and this cluster was lower in energy than the lowest energy clusters found by the other 15 trials, then the average number of generations and minimisations is taken over those 5 trials.
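The averaging rule described above can be illustrated with a small sketch. The function name and the (energy, generation, minimisations) tuples are hypothetical and chosen for illustration; this is not how GetLESOfAllTrials.py stores or reads its data.

```python
from statistics import mean


def average_over_best_trials(trials, decimal_places=2):
    """Average the generations and minimisations over the best trials.

    trials: one (lowest_energy, generation, minimisations) tuple per trial.
    Only trials whose rounded lowest energy equals the lowest rounded
    energy across all trials contribute to the averages.
    """
    rounded = [(round(e, decimal_places), g, m) for e, g, m in trials]
    best_energy = min(e for e, _g, _m in rounded)
    winners = [(g, m) for e, g, m in rounded if e == best_energy]
    average_generation = mean(g for g, _m in winners)
    average_minimisations = mean(m for _g, m in winners)
    return best_energy, len(winners), average_generation, average_minimisations


# Three hypothetical trials: two reach -10.00 (to 2 d.p.), one stops at -9.50
example = [(-10.001, 10, 100), (-10.003, 20, 300), (-9.5, 5, 50)]
print(average_over_best_trials(example))
```

In this example only the first two trials are averaged, because the third trial did not reach the lowest of the lowest energies.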
You can run this program by typing GetLESOfAllTrials.py in the terminal in any folder. This program will search through all subdirectories for folders whose names start with Trial, and report on those genetic algorithm trials found in the same folder (being a part of the same set of genetic algorithm trials). The program will ask for two pieces of information:
The generation you would like to search up to (Default: the full genetic algorithm run, until the LES has been found or the genetic algorithm has successfully finished).
The number of decimal places to round the energy to (Default: 2 decimal places).
You can also enter these in the terminal when you type in GetLESOfAllTrials.py:
GetLESOfAllTrials.py maximum_generation_to_sample_up_to
in which case the energy is rounded to 2 decimal places (this is the default), or you can enter into the terminal
GetLESOfAllTrials.py maximum_generation_to_sample_up_to number_of_decimal_places_to_round_the_energy_to
Each set of trials is measured individually for different genetic algorithms in different folders. This program should be run after all genetic algorithm trials have successfully finished.
Postprocessing_Database.py and Postprocessing_Many_Databases_Together.py - For breaking a large database into smaller chunks
If a database (such as the storage database in Recorded_Data/GA_Recording_Database.db) is too big to process with ase db, this program is designed to break up the database into smaller databases that can be better handled by ase db and your computer. This program will sort the clusters before placing them in the separate, smaller databases. This program will also rotate each cluster so that its principal axis of inertia points along the z axis.
To run this program, first move into the Recorded_Data folder in the terminal, then run the Postprocessing_Database.py program. Two parameters need to be entered. These are:
number_of_clusters_per_database (int): This is the maximum number of clusters you would like in each database.
sort_clusters_by (str): This tells the program how you would like the clusters sorted in the database(s).
You can also enter these in the terminal when you type in Postprocessing_Database.py:
Postprocessing_Database.py number_of_clusters_per_database
in which case the default setting for sort_clusters_by is used, or you can enter into the terminal
Postprocessing_Database.py number_of_clusters_per_database sort_clusters_by
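The sort-then-split behaviour described above can be sketched in a few lines. This is only an illustration of the idea: the function name, the dictionary layout of each cluster, and sorting by an "energy" key are assumptions, and the real program works on ASE database files rather than Python lists.

```python
def split_into_databases(clusters, number_of_clusters_per_database, sort_key):
    """Sort the clusters, then split them into chunks of a maximum size."""
    if number_of_clusters_per_database < 1:
        raise ValueError("number_of_clusters_per_database must be at least 1")
    ordered = sorted(clusters, key=sort_key)
    size = number_of_clusters_per_database
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


# Five hypothetical clusters, split into databases of at most two clusters
clusters = [
    {"name": 1, "energy": -9.2},
    {"name": 2, "energy": -10.0},
    {"name": 3, "energy": -8.7},
    {"name": 4, "energy": -9.9},
    {"name": 5, "energy": -9.5},
]
chunks = split_into_databases(clusters, 2, sort_key=lambda c: c["energy"])
for index, chunk in enumerate(chunks, start=1):
    print(index, [c["name"] for c in chunk])
```

Sorting before splitting means that each smaller database covers a contiguous range of the sorted property, so the lowest energy clusters end up together in the first database.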
Postprocessing_Many_Databases_Together.py - For compiling all databases from all your trials together and breaking them up into smaller chunks if needed
If you have performed many genetic algorithm trials and have created many Recorded_Data/GA_Recording_Database.db databases for your genetic algorithm trials, you can use the Postprocessing_Many_Databases_Together.py program to compile all the clusters you recorded across all your genetic algorithm trials together.
This is the recommended program to use if you want to understand all the various geometries of a cluster.
To run this program, first move into the folder that contains your Trials folders, then run the Postprocessing_Many_Databases_Together.py program in the terminal. Two parameters need to be entered. These are:
number_of_clusters_per_database (int): This is the maximum number of clusters you would like in each database.
sort_clusters_by (str): This tells the program how you would like the clusters sorted in the database(s).
You can also enter these in the terminal when you type in Postprocessing_Many_Databases_Together.py:
Postprocessing_Many_Databases_Together.py number_of_clusters_per_database
in which case the default setting for sort_clusters_by is used, or you can enter into the terminal
Postprocessing_Many_Databases_Together.py number_of_clusters_per_database sort_clusters_by
database_viewer.py - Viewing GA databases with the ASE database website viewer with metadata
The databases that are created by the Organisms program have metadata that allows the clusters to be organised in the database by their energy. The metadata also contains information about all the variables included in the database, for the user's convenience. However, in recent versions of ASE the metadata is not included when using the website viewer. database_viewer.py allows the metadata to be included in the ASE website viewer.
To run this program, move into the Recorded_Data folder in the terminal and run the database_viewer.py program. One parameter needs to be entered. This is:
name_of_the_database (str): This is the name of the database that you want to view.
Enter this into the terminal when you type in database_viewer.py:
database_viewer.py name_of_the_database
make_energy_vs_similarity_results.py - For analysing the genetic algorithm under the hood
It is often useful to understand how the genetic algorithm proceeded during the global optimisation of a cluster. This is especially useful if you want to analyse the efficiency of the genetic algorithm. We have created a program that can help you get under the hood of the Organisms program and understand what clusters the genetic algorithm was obtaining. It creates a series of energy vs similarity plots that act as a way of observing the clusters created on the potential energy surface. See more information about the make_energy_vs_similarity_results.py program at Information about using the make_energy_vs_similarity_results.py script.
remove_blank_arrayJobs.py - For removing blank arrayJob output and error files outside of Trials folders
If you have been making lots of repeated trials using the MakeTrials.py script and all your runs have completed, you will find that you have a lot of arrayJob files that are empty. This is because all the trials have completed and the data from the arrayJob output and error files has been moved into the respective Trial folders. This program is designed to remove these blank arrayJob files.
When you run this program, it will look through every subfolder for folders that contain Trial folders. It will then check whether the arrayJob files in those folders are blank. The blank arrayJob files will be removed.
Note that it will not delete arrayJob files that are within trial folders, that is, any folder that is named TrialX, where X is an integer.
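The behaviour described above can be sketched as follows. This is an illustration and not the source of remove_blank_arrayJobs.py: the function name is hypothetical, and it assumes that the output and error file names begin with arrayJob, which may differ on your system. The dry_run flag is an addition for safety, so that nothing is deleted until you ask for it.

```python
import os
import re

# A trial folder is any folder named TrialX, where X is an integer.
TRIAL_FOLDER = re.compile(r"^Trial\d+$")


def remove_blank_arrayjob_files(root, dry_run=True):
    """Find (and optionally delete) empty arrayJob files next to Trial folders.

    Assumes that arrayJob output and error file names start with 'arrayJob'.
    Returns the sorted list of blank files found; they are only deleted
    when dry_run is False.
    """
    blank_files = []
    for dirpath, dirnames, filenames in os.walk(root):
        has_trials = any(TRIAL_FOLDER.match(d) for d in dirnames)
        # Never look inside the TrialX folders themselves.
        dirnames[:] = [d for d in dirnames if not TRIAL_FOLDER.match(d)]
        if not has_trials:
            continue
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if filename.startswith("arrayJob") and os.path.getsize(path) == 0:
                blank_files.append(path)
                if not dry_run:
                    os.remove(path)
    return sorted(blank_files)
```

Calling remove_blank_arrayjob_files(".") first with the default dry_run=True lists what would be removed without touching any files.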
remove_overall_arrayJobs.py - For removing all arrayJob output and error files outside of Trials folders
This program will remove all arrayJob output and error files that are found alongside Trials folders.
To run this program, go into the folder that your genetic algorithms have been run in and type remove_overall_arrayJobs.py into the terminal. This program will look through all the subdirectories for folders that contain your Trials folders. It will then delete all the arrayJob output and error files that are alongside your Trials folders.
Note that it will not delete arrayJob files that are within trial folders, that is, any folder that is named TrialX, where X is an integer.
tar_trials_collectively.py - Tar all Trials folders (and other files and folders)
This program will recursively tar all subdirectories that include Trials folders. For example, if the folder called OffPerGenEquals16 contains Trial1, Trial2, Trial3, Trial4, Trial5, Run.py, RunMinimisation.py, and mass_submit.sl, this program will tar OffPerGenEquals16 and everything in it into a tar file called OffPerGenEquals16.tar, placed in the same location as OffPerGenEquals16.
This program will also delete the Trials folders, since they have all been tarred up. Files and folders will not be deleted if you enter into the terminal:
tar_trials_collectively.py False
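The tarring step described above can be sketched with Python's standard tarfile module. This is an illustration rather than the code of tar_trials_collectively.py: the function name is hypothetical, and the sketch assumes that the whole archived folder is removed when deletion is switched on, which may not match exactly what the real program deletes.

```python
import os
import shutil
import tarfile


def tar_folders_with_trials(root, delete_originals=True):
    """Tar every folder under root that directly contains Trial folders.

    Each folder X becomes X.tar alongside X. Assumption: when
    delete_originals is True, the whole archived folder is removed.
    """
    folders_to_tar = []
    for dirpath, dirnames, _filenames in os.walk(root):
        if any(d.startswith("Trial") for d in dirnames):
            folders_to_tar.append(os.path.normpath(dirpath))
            # The whole folder will be archived, so stop descending here.
            dirnames[:] = []
    tar_paths = []
    for folder in folders_to_tar:
        tar_path = folder + ".tar"
        with tarfile.open(tar_path, "w") as tar:
            # Store paths relative to the folder name, e.g. OffPerGenEquals16/Trial1
            tar.add(folder, arcname=os.path.basename(folder))
        if delete_originals:
            shutil.rmtree(folder)
        tar_paths.append(tar_path)
    return tar_paths
```

Passing delete_originals=False plays the same role as the False argument shown above: the tar files are still written, but the original files and folders are kept.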
untar_trials_collectively.py - Untar all Trials folders (and other files and folders)
This program will recursively search all subdirectories for tar files, and will untar each tar file in place. This is useful for untarring tar files that were made using tar_trials_collectively.py.
This program will also delete the tar files in the process. Tar files will not be deleted if you enter into the terminal:
untar_trials_collectively.py False
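The reverse operation can be sketched in the same way with the standard tarfile module. Again, this is an illustration under assumptions and not the code of untar_trials_collectively.py: the function name is hypothetical, and tar files are recognised purely by a .tar file extension. As with any tar extraction, only use it on archives that you created yourself.

```python
import os
import tarfile


def untar_in_place(root, delete_tar_files=True):
    """Extract every .tar file under root into the folder that holds it."""
    # Collect the tar files first, so that newly extracted folders
    # do not change what is being searched.
    tar_paths = [
        os.path.join(dirpath, filename)
        for dirpath, _dirnames, filenames in os.walk(root)
        for filename in filenames
        if filename.endswith(".tar")
    ]
    for tar_path in tar_paths:
        with tarfile.open(tar_path, "r") as tar:
            tar.extractall(path=os.path.dirname(tar_path))
        if delete_tar_files:
            os.remove(tar_path)
    return tar_paths
```

Passing delete_tar_files=False keeps the tar files after extraction, in the same way as the False argument shown above.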