py4pa.ona

py4pa.ona.betweenness_centrality_parallel(G, processes=None)

Parallel betweenness centrality function

py4pa.ona.calc_density(df_nodes, df_edges, target_attribute)

Calculates the density of connections between the groups defined by a specific target_attribute

Parameters:
  • df_nodes (Pandas DataFrame) – DataFrame containing the Node list

  • df_edges (Pandas DataFrame) – DataFrame containing the edge list

  • target_attribute (String) – Name of attribute in Node List that we want to calculate the densities between

Return type:

Pandas DataFrame containing the densities, grouped by the target_attribute values
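The description above does not give the density formula. As a rough illustration only, the sketch below assumes a directed network and defines the density from one group to another as the number of distinct observed edges divided by the number of possible ordered node pairs. The function name group_density, the plain dict and list inputs (used here instead of DataFrames), and the formula itself are assumptions made for this example, not py4pa's implementation.

```python
from collections import Counter


def group_density(node_groups, edges):
    """Hypothetical helper: density of directed edges between groups.

    node_groups maps each node to its group (the role target_attribute
    plays in calc_density); edges is an iterable of (source, target) pairs.
    """
    sizes = Counter(node_groups.values())

    # Count each distinct directed edge once; skip self-loops and unknown nodes.
    observed = Counter()
    for source, target in set(edges):
        if source == target:
            continue
        if source not in node_groups or target not in node_groups:
            continue
        observed[(node_groups[source], node_groups[target])] += 1

    densities = {}
    for g_from in sizes:
        for g_to in sizes:
            if g_from == g_to:
                # Ordered pairs within one group, excluding self-pairs.
                possible = sizes[g_from] * (sizes[g_from] - 1)
            else:
                possible = sizes[g_from] * sizes[g_to]
            count = observed[(g_from, g_to)]
            densities[(g_from, g_to)] = count / possible if possible else 0.0
    return densities


# Made-up example data: three nodes in two groups, one duplicated edge.
node_groups = {"a": "Sales", "b": "Sales", "c": "Eng"}
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "c")]
densities = group_density(node_groups, edges)
```

Under these assumptions a density of 1.0 means every possible connection from the first group to the second is present. A group with a single member has no possible internal pairs, so its internal density is reported as 0.0.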

py4pa.ona.calc_modularity(df_nodes, df_edges, target_attribute, weighted=False, direction='outbound')

Calculates the modularity of connections originating from the groups defined by a specific target_attribute

Parameters:
  • df_nodes (Pandas DataFrame) – DataFrame containing the Node list

  • df_edges (Pandas DataFrame) – DataFrame containing the edge list

  • target_attribute (String) – Name of attribute in Node List that we want to calculate the modularities between

  • weighted (Boolean default False) – If set to True, the modularities will be weighted by the amount of email traffic. If False, they are calculated on the basis of the presence of a connection only

  • direction (String default = 'outbound') – Either ‘outbound’ or ‘inbound’; determines the direction of the email traffic to be considered

Returns:

Pandas DataFrame containing the modularities, grouped by the target_attribute values

py4pa.ona.chunks(l, n)

Divides a list of nodes l into n chunks
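The one-line description does not say how the chunks are sized. The sketch below takes it literally and splits a list into n contiguous chunks whose lengths differ by at most one. The name split_into_chunks is hypothetical, and py4pa's own chunks function may behave differently (for example by treating n as a chunk size), so check its source before relying on either reading.

```python
def split_into_chunks(items, n):
    """Hypothetical helper: split items into n contiguous chunks.

    Chunk lengths differ by at most one; if n exceeds the number of
    items, the trailing chunks are empty lists.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    items = list(items)
    size, extra = divmod(len(items), n)

    chunks = []
    start = 0
    for i in range(n):
        # The first `extra` chunks each take one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


# Seven made-up node labels split into three chunks.
node_chunks = split_into_chunks(["a", "b", "c", "d", "e", "f", "g"], 3)
```

Splitting the node list this way is a common first step when distributing a per-node computation, such as betweenness centrality, across worker processes.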

py4pa.ona.clean_email_data(dir, files='all', include_subject=False, engine='c', encoding='latin', delete_old_file=False)

Cleans email data from Splunk to key fields only

Parameters:
  • dir (String) – Path to the root directory containing the data files to process

  • files (List or 'all' (optional)) – List containing all files to be processed within the directory. If ‘all’ is passed, then all CSV files in the directory will be processed

  • include_subject (Boolean default = False) – Defines whether the Subject field should be included in the cleaned data

  • engine (String default='c') – The Pandas engine to read in the data, either ‘c’ or ‘python’

  • encoding (String default = 'latin') – The file encoding of files to be read

  • delete_old_file (Boolean default = False) – If set to True, the original Splunk data file will be deleted

Returns:

Nothing is returned by the function, but new files that have been cleaned are written to ‘dir’

py4pa.ona.generate_node_edge_lists(email_data, demographic_data, demographic_key, output_dir, include_subject=False)

Generates Node and Edge lists from email data and saves them to csv

Parameters:
  • email_data (List of Strings) – List of paths to all files containing email data to be processed

  • demographic_data (String) – Path to file containing all node demographic data to be added to Node list

  • demographic_key (String) – Column in demographic_data that contains email address to act as join to email_data

  • output_dir (String) – Path to directory to save Node and Edge lists into. Must include ‘/’ at end.

  • include_subject (Boolean default = False) – Defines whether the Subject field should be included in the email data

Returns:

  • nodeList_fPath (String) – Path to the Node List generated

  • edgeList_fPath (String) – Path to the Edge List generated
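Because output_dir must end with ‘/’, it can help to normalise the path before passing it in. The helper below is a minimal sketch of that check; ensure_trailing_slash is a hypothetical name and is not part of py4pa.

```python
def ensure_trailing_slash(path):
    """Hypothetical helper: return path unchanged if it already ends with '/',
    otherwise return it with a single '/' appended."""
    return path if path.endswith("/") else path + "/"


# Made-up directory name, normalised before use as output_dir.
output_dir = ensure_trailing_slash("output/lists")
```

The result could then be supplied as the output_dir argument of generate_node_edge_lists.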

py4pa.ona.generate_nx_digraph(node_list, edge_list)

Generates NetworkX DiGraph object

Parameters:
  • node_list (String or Pandas DataFrame) – Path to file containing Node list, or DataFrame of Node list

  • edge_list (String or Pandas DataFrame) – Path to file containing Edge list, or DataFrame of Edge list

Returns:

G – NetworkX DiGraph object

Return type:

NetworkX DiGraph

py4pa.ona.generate_nx_digraph_pandas(nodes_df, edges_df)