py4pa.ona

py4pa.ona.betweenness_centrality_parallel(G, processes=None)

Parallel betweenness centrality function

py4pa.ona.calc_density(df_nodes, df_edges, target_attribute)

Calculates the density of connections between the groups defined by a specific target_attribute

Parameters:

df_nodes (Pandas DataFrame) – DataFrame containing the Node list
df_edges (Pandas DataFrame) – DataFrame containing the edge list
target_attribute (String) – Name of attribute in Node List that we want to calculate the densities between

Return type:

Pandas DataFrame containing the densities, grouped by the target_attribute values
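
As a rough illustration of what a between-group density can mean, the sketch below uses plain Python lists and dicts rather than DataFrames. It assumes one common definition: the number of distinct directed edges from one group to another, divided by the number of possible ordered node pairs. The formula calc_density actually applies is not stated in this reference, and the function name group_density and the column names (id, dept, source, target) are hypothetical.

```python
from collections import Counter
from itertools import product

def group_density(nodes, edges, attribute):
    """Directed density between groups (illustrative definition only)."""
    # Map each node id to its group and count the nodes per group
    group_of = {n["id"]: n[attribute] for n in nodes}
    size = Counter(group_of.values())

    # Count distinct directed edges for each ordered pair of groups,
    # ignoring self-loops and edges to unknown nodes
    observed = Counter()
    for src, dst in {(e["source"], e["target"]) for e in edges}:
        if src != dst and src in group_of and dst in group_of:
            observed[(group_of[src], group_of[dst])] += 1

    density = {}
    for a, b in product(sorted(size), repeat=2):
        # Possible ordered pairs: n_a * n_b across groups, n_a * (n_a - 1) within one
        possible = size[a] * (size[b] - 1 if a == b else size[b])
        density[(a, b)] = observed[(a, b)] / possible if possible else 0.0
    return density

# Hypothetical sample data
nodes = [
    {"id": "ann", "dept": "HR"},
    {"id": "bob", "dept": "HR"},
    {"id": "cat", "dept": "IT"},
]
edges = [
    {"source": "ann", "target": "bob"},
    {"source": "ann", "target": "cat"},
    {"source": "cat", "target": "ann"},
]
densities = group_density(nodes, edges, "dept")
```

With this sample data, each pair of groups that can hold an edge has one of its two possible edges present; the IT-to-IT entry has no possible pairs and is reported as 0.0.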

py4pa.ona.calc_modularity(df_nodes, df_edges, target_attribute, weighted=False, direction='outbound')

Calculates the modularity of connections originating from the groups defined by a specific target_attribute

Parameters:

df_nodes (Pandas DataFrame) – DataFrame containing the Node list
df_edges (Pandas DataFrame) – DataFrame containing the edge list
target_attribute (String) – Name of attribute in Node List that we want to calculate the modularities between
weighted (Boolean default = False) – If set to True, the modularities will be weighted by the amount of email traffic. If False, they are calculated on the basis of the presence of a connection only
direction (String default = 'outbound') – Either ‘outbound’ or ‘inbound’; determines the direction of the email traffic to be considered

Returns:

py4pa.ona.clean_email_data(dir, files='all', include_subject=False, engine='c', encoding='latin', delete_old_file=False)

Cleans email data exported from Splunk, keeping the key fields only

Parameters:

dir (String) – Path to the root directory containing the data files to process
files (List or 'all' (optional)) – List containing all files to be processed within the directory. If ‘all’ is passed, then all CSV files in the directory will be processed
include_subject (Boolean default = False) – Defines whether the Subject field should be included in the cleaned data
engine (String default='c') – The Pandas engine to read in the data, either ‘c’ or ‘python’
encoding (String default = 'latin') – The file encoding of files to be read
delete_old_file (Boolean default = False) – If set to True, the original Splunk data file will be deleted

Returns:
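
To illustrate the include_subject option, here is a minimal sketch using only the standard library. It reduces CSV rows to a small set of columns. The column names (date, sender, recipient, subject, message_size) and the helper keep_key_fields are hypothetical; the fields that clean_email_data actually keeps are not listed in this reference.

```python
import csv
import io

# Hypothetical key fields; the real set kept by py4pa is not documented here
KEY_FIELDS = ["date", "sender", "recipient"]

def keep_key_fields(csv_text, include_subject=False):
    """Return the rows reduced to the key fields, optionally with the subject."""
    wanted = KEY_FIELDS + (["subject"] if include_subject else [])
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{field: row[field] for field in wanted} for row in reader]

# Hypothetical raw export with one extra column that gets dropped
raw = (
    "date,sender,recipient,subject,message_size\n"
    "2021-01-04,ann@example.com,bob@example.com,Hello,1024\n"
)
cleaned = keep_key_fields(raw)
cleaned_with_subject = keep_key_fields(raw, include_subject=True)
```

The subject column is dropped by default, which mirrors the documented default of include_subject=False.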

py4pa.ona.generate_node_edge_lists(email_data, demographic_data, demographic_key, output_dir, include_subject=False)

Generates Node and Edge lists from email data and saves them as CSV files

Parameters:

email_data (List of Strings) – List of paths to all files containing email data to be processed
demographic_data (String) – Path to file containing all node demographic data to be added to Node list
demographic_key (String) – Column in demographic_data that contains email address to act as join to email_data
output_dir (String) – Path to directory to save Node and Edge lists into. Must include ‘/’ at end.
include_subject (Boolean default = False) – Defines whether the Subject field should be included in the email data

Returns:
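
The idea of turning email traffic into Node and Edge lists can be sketched as follows. This standard-library version works on in-memory records rather than files, and it assumes that each edge is a sender and recipient pair weighted by message count. The helper build_node_edge_lists and the column names (sender, recipient, email, dept, id, source, target, weight) are hypothetical; the exact output schema of generate_node_edge_lists is not given in this reference.

```python
from collections import Counter

def build_node_edge_lists(emails, demographics, demographic_key):
    """Build node and edge lists from email records (illustrative only)."""
    # Edge list: one row per (sender, recipient) pair, weighted by message count
    weights = Counter((m["sender"], m["recipient"]) for m in emails)
    edge_list = [
        {"source": s, "target": t, "weight": w}
        for (s, t), w in sorted(weights.items())
    ]

    # Node list: every address seen in the traffic, joined to its
    # demographic row through the key column when one exists
    by_email = {row[demographic_key]: row for row in demographics}
    addresses = sorted({a for pair in weights for a in pair})
    node_list = [{"id": a, **by_email.get(a, {})} for a in addresses]
    return node_list, edge_list

# Hypothetical sample data
emails = [
    {"sender": "ann@example.com", "recipient": "bob@example.com"},
    {"sender": "ann@example.com", "recipient": "bob@example.com"},
    {"sender": "bob@example.com", "recipient": "cat@example.com"},
]
demographics = [
    {"email": "ann@example.com", "dept": "HR"},
    {"email": "bob@example.com", "dept": "IT"},
]
node_list, edge_list = build_node_edge_lists(emails, demographics, "email")
```

Addresses with no matching demographic row (cat@example.com here) still appear in the node list, only without attributes. Whether py4pa keeps or drops such nodes is not stated.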

py4pa.ona.generate_nx_digraph(node_list, edge_list)

Generates a NetworkX DiGraph object from a Node list and an Edge list

Parameters:

node_list (String or Pandas DataFrame) – Path to file containing Node list, or DataFrame of Node list
edge_list (String or Pandas DataFrame) – Path to file containing Edge list, or DataFrame of Edge list

Returns:

G – NetworkX DiGraph object

Return type:

NetworkX DiGraph