Collection.py

This describes a collection of clusters, such as that stored as a population or a set of offspring.

Collection.py, 12/04/2017, Geoffrey R Weal

This class is specifically designed to hold a collection of clusters, such as in a population or in an offspring pool.

This class holds the foundation of the object used to store clusters in the population, the offspring, and other collections.

class Organisms.GA.Collection.Collection(name, size, path=None, have_database=False, write_collection_history=False, write_cluster_in_RAM=True)

This is the foundation of the object used to store clusters in the population, the offspring, and for recording clusters made during the genetic algorithm using the GA_Recording_System.py

Parameters:
  • path (str.) – The path where clusters will be written to disk. Default: None, meaning clusters are written to the same path as your Run.py execution file.

  • name (str.) – Name of the Collection. This can be any name you like; it only notes what the collection is. Collections should have unique names where possible to prevent confusion, although duplicate names will not break the program.

  • size (int) – This is the number of clusters that should be in the collection. In this version of the Organisms program, the number of clusters in the population and made during Creation of Offspring should be consistent throughout the genetic algorithm process.

  • write_collection_history (bool.) – This will tell the collection to record a txt file of what clusters were in the collection over the generations Organisms performs. Default: False

  • write_files_as (str.) –

    This tells the collection if and how to write clusters to the disk (Default: “database”). There are three options:

    • “database” if you want the Collection to make a database

    • “xyz” if you want the Collection to make xyz files

    • None, “None” or “none” if you do not want the Collection to make any cluster files
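The three accepted settings can be summarised in a small helper. This is only a sketch: normalise_write_files_as is a hypothetical name that is not part of Organisms, and rejecting any other value with a ValueError is an assumption about how such input might be handled.

```python
def normalise_write_files_as(write_files_as="database"):
    """Map the documented settings onto "database", "xyz" or None (hypothetical helper)."""
    # None and the strings "None"/"none" all mean: do not write cluster files.
    if write_files_as in (None, "None", "none"):
        return None
    if write_files_as in ("database", "xyz"):
        return write_files_as
    # Assumption: any other value is treated as a mistake.
    raise ValueError(f"Unknown write_files_as setting: {write_files_as!r}")
```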

add(index, cluster)

Adds a cluster to the Collection.

Inputs:

index (int/str.): the index of the ith cluster in the Collection. If “End” is given, the cluster will be appended to the end of the Collection list.

cluster (Organisms.GA.Cluster): The cluster to add at the ith position in the Collection.

add_clusters_into_RAM(cluster_dict, cluster_names)

This method adds clusters into the RAM

Parameters:
  • cluster_dict ({int: ASE.Cluster}) – This is a dictionary of all the clusters from the database, given as {cluster_name: Cluster}

  • cluster_names (list of int) – list of the names of the clusters that are needed for the collection

add_metadata()

Due to an issue with some versions of ASE, this method will write metadata to the ASE database, if a database is being used to store cluster information.

add_to_database(cluster, center=False)

Allows the user to write a cluster in the collection to an ASE database

Inputs:

cluster (Organisms.GA.Cluster): The cluster to add to the database.

add_to_history_file(generation_number, is_epoch=False, epoch_due_to_population_energy_convergence=None)

This definition will add the information of the population to the history file. This is supposed to be used after each generation has completed.

Parameters:
  • generation_number (int) – The current generation that the genetic algorithm run has just performed.

  • is_epoch (bool.) – Whether an epoch has just occurred.

  • epoch_due_to_population_energy_convergence (bool.) – If an epoch occurred, whether it was because the energy of the clusters converged.

assess_clusters_in_database(database_path)

This algorithm will check to make sure that there are no technical issues with the database, whether that is the original or the backup database.

Parameters:

database_path (str.) – The path to the database

Returns:

is_database_working (bool.): True if there are no technical issues with the database, False if there are technical issues.

reason_for_issue (None or str.): None if everything is all good, otherwise the exception detailing the issues with the database or the name of the cluster that caused issues.

backup_database()

This method will make a backup of all the clusters in the Collection as an ASE database

check_PoolProfileTXT_exists()

This definition checks to see if the PoolProfile folder exists

check_clusters_in_database(cluster_dict, cluster_names, cluster_energies, decimal_place)

This method will check that the database contains all the clusters you need and does not have any issues.

Parameters:
  • cluster_dict ({int: ASE.Cluster}) – This is a dictionary of all the clusters from the database, given as {cluster_name: Cluster}

  • cluster_names (list of int) – list of the names of the clusters that are needed for the collection

  • cluster_energies (list of float) – list of the energies of the clusters that are needed for the collection

Returns:

is_database_all_good (bool.): True means the database is all good and contains all the clusters needed to restore the collection, and that they have been confirmed to be of the correct energy. False means something is not working with the database, the database does not contain a required cluster, or there is an issue with clusters not having the energy that they should have.
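The check described above can be pictured with a short sketch. It assumes, for illustration only, that energies are compared after rounding both values to decimal_place, and it uses a plain {name: energy} dictionary (cluster_energy_by_name, a hypothetical stand-in) in place of real cluster objects.

```python
def check_clusters(cluster_energy_by_name, cluster_names, cluster_energies, decimal_place):
    """Return True only if every needed cluster is present with the expected energy."""
    for name, expected_energy in zip(cluster_names, cluster_energies):
        if name not in cluster_energy_by_name:
            return False  # a required cluster is missing from the database
        found_energy = cluster_energy_by_name[name]
        # Assumption: energies agree if they match after rounding.
        if round(found_energy, decimal_place) != round(expected_energy, decimal_place):
            return False
    return True
```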

check_database_and_determine_if_to_use_backup()

This method will remove any journal or lock files associated with the database, as well as check that either the database or the backup is functional and can be used to allow your genetic algorithm trial to resume.

Returns:

did_use_backup_database (bool.): True means the backup database was used. False means the original database was used.
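The decision between the two databases might look like the sketch below. The function name choose_database is hypothetical, and both the preference for the original database and the error raised when neither file is usable are assumptions, not documented behaviour.

```python
def choose_database(is_original_working, is_backup_working):
    """Return did_use_backup_database for the given health checks (hypothetical helper)."""
    if is_original_working:
        return False  # assumption: the original database is preferred when usable
    if is_backup_working:
        return True   # fall back to the backup database
    # Assumption: with no usable database the trial cannot be resumed.
    raise RuntimeError("Neither the database nor its backup is usable.")
```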

check_historyfile(resume_from_generation)

This method will check the history file to make sure that it does not contain any information from a failed generation.

If it does contain information from failed generations, it will delete those lines from the history file.

Parameters:

resume_from_generation (int) – The generation that the genetic algorithm run is being resumed from.

close()

This closes the history text file.

create_collection_history()

This definition will create the history file.

This includes the folder, contents of the folder and beginning to write the history file.

It also includes information about the clusters in the collection file when it was first created.

delete_collection_database()

This method will remove the whole database for this Collection from the disk.

does_contain_database(backup)

Checks whether the collection contains a database file or a backup database file.

Parameters:

backup (bool.) – If true, look for the backup database file. If false, look for the original database file.

Returns:

does_database_exist (bool.): Returns if either the database file or the backup database file exists.

get_cluster_energies()

Returns the energies of all the clusters that the collection contains in a list

Returns:

the energies of all the clusters that the collection contains (list)

get_cluster_from_name(name)

This method will return the cluster in the Collection with the name “name”

Inputs:

name (int): The name of the cluster you want to obtain from the Collection.

Returns:

The cluster in the Collection with the name “name” (Organisms.GA.Cluster)

get_cluster_names(order=False)

Will provide a list of all the names of all the clusters in the Collection

Inputs:

order (bool.): This tag will tell this method whether the user would like the list of names given in order.

Returns:

List of the names of all the clusters in the Collection

get_clusters()

Returns all the clusters that the collection contains in a list

Returns:

all the clusters that the collection contains (list)

get_history_path()

Return the path to the history file

Returns:

the path to the history file

get_index(name_to_find)

This method will provide the index of the cluster that has the name “name_to_find” in the Collection

Inputs:

name_to_find (int): the name of the cluster in the Collection to obtain the index for

Returns:

the index of the cluster in the Collection with the name “name_to_find”

Exceptions:

Will break if no cluster with the name “name_to_find” can be found in the Collection.
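A lookup of this kind can be sketched as a linear search. The stand-in clusters below carry only a hypothetical name attribute, and raising LookupError for a missing name is an assumption about what “break” means here.

```python
from types import SimpleNamespace


def get_index(clusters, name_to_find):
    """Return the position of the cluster whose name equals name_to_find."""
    for index, cluster in enumerate(clusters):
        if cluster.name == name_to_find:
            return index
    # Assumption: a missing name is reported with an exception.
    raise LookupError(f"No cluster named {name_to_find} in the collection")


# Stand-in clusters with a hypothetical `name` attribute.
clusters = [SimpleNamespace(name=4), SimpleNamespace(name=7), SimpleNamespace(name=9)]
found = get_index(clusters, 7)
```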

get_max_mean_min_energies()

The maximum, mean, and minimum energy of the clusters in the population.

Returns:

maximum_energy (float): The maximum energy of the clusters.

mean_energy (float): The mean energy of the clusters.

minimum_energy (float): The minimum energy of the clusters.
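The three values can be computed from a list of energies as sketched here. The function max_mean_min is a hypothetical stand-in that works on plain floats rather than cluster objects, and it assumes the list is not empty.

```python
def max_mean_min(energies):
    """Return (maximum, mean, minimum) for a non-empty sequence of energies."""
    energies = list(energies)
    mean = sum(energies) / len(energies)
    return max(energies), mean, min(energies)


maximum_energy, mean_energy, minimum_energy = max_mean_min([-3.0, -1.0, -2.0])
```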

history_file_name(end_name=None)

Get the name of the history file for this Collection.

Inputs:

end_name (str): The suffix of the name for the history file.

import_clusters_from_database_to_memory(current_generation, clusters_in_resumed_population, clusters_in_resumed_population_energies, decimal_place)

This method will attempt to obtain the clusters from the database and place them in the collection in the RAM.

This method is currently set up for reading the population, but if needed it can be reworked for general purpose.

Parameters:
  • current_generation (float) – The current generation

  • clusters_in_resumed_population (list of int) – The names of the clusters in the current population

  • clusters_in_resumed_population_energies (list of floats) – The energies of the clusters in the current population

  • decimal_place (int) – The number of decimal places that energies are rounded to in your genetic algorithm run

Returns:

did_clusters_come_from_backup (bool.): Did the imported clusters come from the backup database or the original. True if from the backup database, False if from the original database.

is_there_an_energy_range(rounding)

Determines if there is a range of energies in the collection

Parameters:

rounding (float) – The rounding of the energy of the cluster

Returns:

Whether there is a range of energies in the collection (bool.)

make_collection_folder()

Will create the directory for self.path if it does not exist.

max_energy()

The maximum energy of the clusters in the population.

Returns:

maximum_energy (float): The maximum energy of the clusters.

mean_energy()

The mean energy of the clusters in the population.

Returns:

mean_energy (float): The mean energy of the clusters.

min_energy()

The minimum energy of the clusters in the population.

Returns:

minimum_energy (float): The minimum energy of the clusters.

move_backup_database_to_normal_backup()

This method will remove the original database file and replace it with the backup database file.

open(w_or_a)

This opens the profile pool text file

Inputs:

w_or_a (str): Indicates how to open the file, whether to open it as a new file (“w”) or to append information to the history file (“a”).

pop(ith)

Pops the cluster at the ith position of the Collection and returns it.

Inputs:

ith (int): The index for the cluster that you want to get from the Collection

Returns:

The cluster that you want to obtain from the Collection (Organisms.GA.Cluster)

read_collection_database(database_path, current_generation=None)

This method will read the clusters in the database. Furthermore, this method is also designed to repair the collection database by removing any clusters that were created after the current generation if desired.

Parameters:
  • database_path (str.) – The path to the database.

  • current_generation (int) – The current generation that your genetic algorithm trial is being resumed from

Returns:

clusters (list of Organisms.GA.Clusters): This is a list of all the clusters from the database.

remove(index)

Removes a cluster from the Collection.

Inputs:

index (int): the index of the ith cluster in the Collection

remove_backup_database()

This method will remove the backup of all the clusters in the Collection, which will be in the format of an ASE database.

remove_backup_database_if_exists()

This method will remove the backup of all the clusters in the Collection, if that backup exists. The backup will be in the format of an ASE database.

remove_clusters_from_database_that_are_from_unsuccessful_generations(database_path, current_generation=None)

This method will go through the database and delete any clusters of unsuccessful generations.

Parameters:
  • database_path (str.) – The path to the database.

  • current_generation (int) – The current generation that your genetic algorithm trial is being resumed from
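The repair can be pictured as a filter over database rows. This sketch uses plain dictionaries rather than an ASE database. The “generation” key and the drop_unsuccessful helper are hypothetical, and leaving every row in place when current_generation is None is an assumption.

```python
def drop_unsuccessful(rows, current_generation=None):
    """Return the rows that survive the repair (hypothetical helper)."""
    if current_generation is None:
        return list(rows)  # assumption: nothing is removed without a generation
    # Keep only clusters created at or before the generation being resumed from.
    return [row for row in rows if row["generation"] <= current_generation]


rows = [
    {"name": 1, "generation": 0},
    {"name": 2, "generation": 3},
    {"name": 3, "generation": 4},  # made during a generation that did not complete
]
kept = drop_unsuccessful(rows, current_generation=3)
```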

remove_to_database(cluster)

Allows the user to remove a cluster in the collection from the ASE database

Inputs:

cluster (Organisms.GA.Cluster): The cluster to remove from the database.

replace(index, new_cluster)

Will replace the ith cluster in the Collection with a new cluster. Uses the self.remove and self.add methods.

Inputs:

index (int): the index of the ith cluster in the Collection

new_cluster (Organisms.GA.Cluster): The new cluster to add at the ith position in the Collection
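How replace builds on remove and add, including the “End” option of add, can be sketched with a plain Python list. SketchCollection is a hypothetical stand-in that mimics only the index handling; the real Collection also deals with writing clusters to disk.

```python
class SketchCollection:
    """Hypothetical stand-in that mimics only the index handling; no disk writes."""

    def __init__(self):
        self.clusters = []

    def add(self, index, cluster):
        # "End" appends; an integer index inserts at that position.
        if index == "End":
            self.clusters.append(cluster)
        else:
            self.clusters.insert(index, cluster)

    def remove(self, index):
        del self.clusters[index]

    def replace(self, index, new_cluster):
        # Remove the old cluster, then add the new one at the same index.
        self.remove(index)
        self.add(index, new_cluster)


collection = SketchCollection()
collection.add("End", "cluster_A")
collection.add("End", "cluster_B")
collection.replace(0, "cluster_C")
print(collection.clusters)  # ['cluster_C', 'cluster_B']
```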

sort_by_energy()

This method will sort the clusters in the list by their energy (from lowest energy to highest energy).

sort_by_name()

This method will sort the clusters in the list by their name.
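The two orderings can be illustrated with stand-in objects. The name and energy attributes used here are assumptions made for the sketch, not the documented attributes of Organisms.GA.Cluster.

```python
from types import SimpleNamespace

# Stand-in clusters with hypothetical `name` and `energy` attributes.
clusters = [
    SimpleNamespace(name=1, energy=-10.2),
    SimpleNamespace(name=3, energy=-12.5),
    SimpleNamespace(name=2, energy=-11.0),
]

# Lowest energy first, as described for sort_by_energy.
clusters.sort(key=lambda cluster: cluster.energy)
names_by_energy = [cluster.name for cluster in clusters]

# Ordered by name, as described for sort_by_name.
clusters.sort(key=lambda cluster: cluster.name)
names_by_name = [cluster.name for cluster in clusters]
```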

view_cluster(ith)

Allow the user to visually look at the cluster using the ASE gui. This is a debugging method.

Inputs:

ith (int): the index of the ith cluster in the Collection to view in the ASE gui.