CNA_Database.py

CNA_Database.py, Geoffrey Weal, 29/10/2018

This script holds the information required to make a CNA_database.

class Organisms.GA.SCM_Scripts.CNA_Database.CNA_Database(rCuts, population, cut_off_similarity, get_single_CNA_profile_method, get_similarity_profile_method, no_of_cpus, debug)

This is a database that holds all the entries of CNA_Entry objects.

Parameters:
  • rCuts (list of float) – The rCut values to scan the CNA across.

  • population (Organisms.GA.Population) – The population.

  • cut_off_similarity (float) – The maximum similarity above which clusters are considered similar enough that offspring are excluded from being accepted into future generations.

  • get_single_CNA_profile_method (__func__) – This is the method to get the CNA Profile, either from the T-SCM or the A-SCM.

  • get_similarity_profile_method (__func__) – This is the method that uses the CNA profiles of two clusters to give the similarity profile between them, whether this is obtained by the T-SCM or the A-SCM.

  • no_of_cpus (int) – The number of CPUs to use to obtain the similarity profile between two clusters.

  • debug (bool) – Get data to use to debug the SCM. Default: False

add(collection, initialise=False)

Add the similarity data of the clusters in the collection to the CNA_Database

Parameters:
  • collection (Organisms.GA.Collection) – The clusters to add to the CNA_Database

  • initialise (bool) – Whether the clusters are from a newly created population. Default: False

check_database(population)

Check that the database contains the correct number of entries.

Parameters:

population (Organisms.GA.Population) – The population
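
As a hedged illustration of what "the correct number of entries" could mean, the sketch below assumes the database stores exactly one entry per unordered pair of clusters, so that n clusters give n(n-1)/2 entries. Both that assumption and the helper name `has_expected_number_of_entries` are made up for this example and are not taken from the Organisms source.

```python
from math import comb

def has_expected_number_of_entries(entries, cluster_names):
    """Check that there is one entry per unordered pair of clusters (assumed layout)."""
    return len(entries) == comb(len(cluster_names), 2)

# Toy data: four clusters should give 4 * 3 / 2 = 6 pairwise entries.
cluster_names = [1, 2, 3, 4]
entries = {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)}
database_is_complete = has_expected_number_of_entries(entries, cluster_names)
```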

get_all_averages_for_a_cluster(cluster_name)

Get the average SCM similarities of a cluster compared to every other cluster in the similarity profile.

Parameters:

cluster_name (list of floats) – Cluster to get the average similarities of.

Returns:

All the average similarities between cluster_name and every other cluster in the database.

Return type:

list of float
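
To make the idea of per-cluster averages concrete, here is a minimal sketch. It assumes each pair of clusters maps to a list of similarity percentages, one per rCut value, held in a plain dictionary keyed by a pair of names. The dictionary layout and the function name `all_averages_for_cluster` are assumptions for this example, not the internal structure of CNA_Database.

```python
from statistics import mean

# Hypothetical stand-in for the database: (name_a, name_b) -> similarities per rCut.
similarity_profiles = {
    (1, 2): [80.0, 90.0, 100.0],
    (1, 3): [10.0, 20.0, 30.0],
    (2, 3): [50.0, 50.0, 50.0],
}

def all_averages_for_cluster(profiles, cluster_name):
    """Average similarity between cluster_name and every other cluster."""
    return [
        mean(similarities)
        for pair, similarities in sorted(profiles.items())
        if cluster_name in pair
    ]

averages = all_averages_for_cluster(similarity_profiles, 1)
```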

get_details()

Return the information about the CNA Database.

Returns:

The cut_off_similarity, get_single_CNA_profile_method, get_similarity_profile_method, no_of_cpus, debug

Return type:

(float, __func__, __func__, int, bool)

get_max_similarity(name_1, name_2)

This method obtains the maximum similarity percentage from the comparison of two clusters, name_1 and name_2.

Parameters:
  • name_1 (int) – The name of the first cluster whose entry you would like to look up in the CNA_Database.

  • name_2 (int) – The name of the second cluster whose entry you would like to look up in the CNA_Database.

Returns:

The maximum similarity between clusters name_1 and name_2.

Return type:

float
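
The sketch below shows one way such a lookup could work, assuming the similarity profile of a pair is a list of percentages (one per rCut value) stored under a sorted pair of names. The storage layout and the function name `max_similarity` are hypothetical and only serve the example.

```python
# Hypothetical stand-in for the database: sorted (name, name) -> similarities per rCut.
similarity_profiles = {
    (1, 2): [62.5, 97.0, 88.1],
    (1, 3): [12.0, 40.5, 33.3],
}

def max_similarity(profiles, name_1, name_2):
    """Return the largest similarity percentage recorded for the pair."""
    key = tuple(sorted((name_1, name_2)))  # the order of the two names should not matter
    return max(profiles[key])

result = max_similarity(similarity_profiles, 2, 1)
```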

get_similar_clusters_in_database()

Get the clusters in the CNA_Database that are deemed structurally similar.

Returns:

A list of the names of all the clusters that are similar to each other.

Return type:

list of int
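
As an illustration only, the following sketch flags clusters as similar when the maximum value of their similarity profile exceeds cut_off_similarity. Using the maximum as the criterion, the dictionary layout, and the function name `similar_clusters` are all assumptions for this example. The actual rule applied by CNA_Database may differ.

```python
# Hypothetical stand-in for the database: (name, name) -> similarities per rCut.
similarity_profiles = {
    (1, 2): [99.0, 80.0],
    (1, 3): [20.0, 30.0],
    (3, 4): [50.0, 60.0],
    (4, 5): [96.0, 10.0],
}

def similar_clusters(profiles, cut_off_similarity):
    """Names of clusters that belong to at least one pair above the cut-off."""
    similar = set()
    for (name_1, name_2), similarities in profiles.items():
        if max(similarities) > cut_off_similarity:
            similar.update((name_1, name_2))
    return sorted(similar)

names = similar_clusters(similarity_profiles, cut_off_similarity=95.0)
```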

is_pair_in_the_database(dir_1, dir_2)

Determine if a CNA entry exists for two clusters in the similarity_profile_database

Parameters:
  • dir_1 (int) – The dir of the first cluster whose entry you would like to look up in the CNA_Database.

  • dir_2 (int) – The dir of the second cluster whose entry you would like to look up in the CNA_Database.

Returns:

True if a similarity profile exists for the two clusters, False if not.

Return type:

bool
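
A membership check of this kind should not depend on the order in which the two clusters are given. The sketch below shows one way to get that behaviour by keying entries on a frozenset. The key choice and the function name `is_pair_in_database` are assumptions for illustration and are not how CNA_Database necessarily stores its entries.

```python
# Hypothetical stand-in for the database, keyed by an unordered pair of cluster names.
similarity_profiles = {
    frozenset((1, 2)): [62.5, 97.0],
    frozenset((2, 3)): [12.0, 40.5],
}

def is_pair_in_database(profiles, name_1, name_2):
    """True if an entry exists for the pair, whichever way round it is given."""
    return frozenset((name_1, name_2)) in profiles

found_forward = is_pair_in_database(similarity_profiles, 1, 2)
found_reversed = is_pair_in_database(similarity_profiles, 2, 1)
found_missing = is_pair_in_database(similarity_profiles, 1, 3)
```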

keys()

Return a list of the names of all the clusters in the database

Returns:

A list of the names of all the clusters in the database

Return type:

list of int

make_simple_table(similarity_profile_database)

This is a simple table that can be printed to the terminal, showing all the similarities between clusters in the population and offspring.

Parameters:

similarity_profile_database (Organisms.GA.SCM_Scripts.CNA_Database.CNA_Database) – This is the CNA database that contains all similarity information of clusters in the population and offspring.

print_cna_database_details()

Print information about the database.

remove(names_to_remove)

Remove all entries in the database that are associated with a particular cluster.

Parameters:

name_to_remove (int) – The name of the cluster you would like to remove from the CNA_Database.
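
Removing a cluster means dropping every pairwise entry it takes part in, not just one. A minimal sketch of that idea follows, assuming entries live in a dictionary keyed by pairs of names. The layout and the function name `remove_cluster` are hypothetical.

```python
# Hypothetical stand-in for the database: (name, name) -> similarities per rCut.
similarity_profiles = {
    (1, 2): [80.0, 90.0],
    (1, 3): [10.0, 20.0],
    (2, 3): [50.0, 50.0],
}

def remove_cluster(profiles, name_to_remove):
    """Delete, in place, every entry that involves name_to_remove."""
    # Collect the keys first so the dictionary is not modified while iterating over it.
    for pair in [pair for pair in profiles if name_to_remove in pair]:
        del profiles[pair]

remove_cluster(similarity_profiles, 1)
remaining_pairs = list(similarity_profiles)
```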

reset()

Reset the CNA profiles of generated clusters and the similarity profile database.

class Organisms.GA.SCM_Scripts.CNA_Database.Tree

This is a Tree designed for the CNA_Database to hold references of CNA_Entry.

see_tree()

This shows the clusters in the database.

Organisms.GA.SCM_Scripts.CNA_Database.cna_profile_generator(collection, rCuts)

This is a generator that returns the clusters in the collection with the rCut values to scan across.

Parameters:
  • collection (Organisms.GA.Collection) – A collection

  • rCuts (list of float) – The list of rCut values to scan across with the SCM.

Returns:

A tuple of the cluster and the rCut values to scan across with the SCM.
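
A generator of this shape pairs every cluster with the same list of rCut values, which is a convenient form for handing tasks to worker processes one at a time. The sketch below uses plain strings in place of cluster objects. The name `cluster_rcut_generator` and the toy data are assumptions for this example.

```python
def cluster_rcut_generator(collection, rCuts):
    """Yield a (cluster, rCuts) tuple for every cluster in the collection."""
    for cluster in collection:
        yield (cluster, rCuts)

# Strings stand in for cluster objects here.
collection = ["cluster_a", "cluster_b"]
rCuts = [1.0, 1.2]
tasks = list(cluster_rcut_generator(collection, rCuts))
```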

Organisms.GA.SCM_Scripts.CNA_Database.initial_similarity_profile_generator(collection, cna_database)

This is a generator that returns pairs of clusters in the collection together with their CNA profiles.

Parameters:
  • collection (Organisms.GA.Collection) – A collection

  • cna_database (list of float) – The list of clusters to scan across with the SCM.

Returns:

a tuple of the names of the clusters and their associated CNA profiles.

Return type:

(int, int, Counter, Counter)
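
For a newly created population, every cluster presumably needs to be compared with every other cluster once. The sketch below yields tuples of the documented shape (int, int, Counter, Counter) using itertools.combinations. The CNA profiles are toy Counters, and the function name `initial_pairs` and the dictionary of profiles are assumptions rather than the real data structures.

```python
from collections import Counter
from itertools import combinations

# Toy CNA profiles: cluster name -> Counter of CNA signatures (made-up values).
cna_profiles = {
    1: Counter({(4, 2, 1): 12, (5, 5, 5): 3}),
    2: Counter({(4, 2, 1): 10, (4, 2, 2): 5}),
    3: Counter({(5, 5, 5): 8}),
}

def initial_pairs(names, profiles):
    """Yield (name_1, name_2, profile_1, profile_2) for every unordered pair."""
    for name_1, name_2 in combinations(names, 2):
        yield (name_1, name_2, profiles[name_1], profiles[name_2])

pairs = list(initial_pairs([1, 2, 3], cna_profiles))
```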

Organisms.GA.SCM_Scripts.CNA_Database.similarity_profile_generator(population, offsprings, cna_database)

This is a generator that returns pairs of clusters from the population and the offspring together with their CNA profiles.

Parameters:
  • population (Organisms.GA.Population) – The population

  • offsprings (Organisms.GA.Offspring_Pool) – The collection of offspring

  • cna_database (list of float) – The list of clusters to scan across with the SCM.

Returns:

a tuple of the names of the clusters and their associated CNA profiles.

Return type:

(int, int, Counter, Counter)
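
Once a population already has its pairwise entries, only comparisons that involve new offspring should be missing. The sketch below works under that assumption: it pairs each population member with each offspring, then pairs the offspring with one another. Whether the real generator enumerates pairs in this way is not stated in this documentation. The function name `pairs_to_compare` and the toy profiles are hypothetical.

```python
from collections import Counter
from itertools import combinations, product

# Toy CNA profiles: cluster name -> Counter of CNA signatures (made-up values).
cna_profiles = {name: Counter({(4, 2, 1): name}) for name in (1, 2, 3, 4)}

def pairs_to_compare(population_names, offspring_names, profiles):
    """Yield (name_1, name_2, profile_1, profile_2) for pairs involving an offspring."""
    # Population members against offspring.
    for name_1, name_2 in product(population_names, offspring_names):
        yield (name_1, name_2, profiles[name_1], profiles[name_2])
    # Offspring against each other.
    for name_1, name_2 in combinations(offspring_names, 2):
        yield (name_1, name_2, profiles[name_1], profiles[name_2])

compared = [pair[:2] for pair in pairs_to_compare([1, 2], [3, 4], cna_profiles)]
```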