py4pa.ona
- py4pa.ona.betweenness_centrality_parallel(G, processes=None)
Computes betweenness centrality for the graph G in parallel across multiple processes.
- py4pa.ona.calc_density(df_nodes, df_edges, target_attribute)
Calculates the density of connections between the groups defined by target_attribute
- Parameters:
df_nodes (Pandas DataFrame) – DataFrame containing the Node list
df_edges (Pandas DataFrame) – DataFrame containing the edge list
target_attribute (String) – Name of attribute in Node List that we want to calculate the densities between
- Return type:
Pandas DataFrame containing the densities, grouped by the target_attribute values
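As a rough illustration of the idea, the sketch below computes group-to-group densities in plain Python. It assumes that density means the number of distinct directed connections observed from one group to another, divided by the number of possible connections. That definition, the function name group_densities, the input shapes, and the sample data are all assumptions made for illustration, not py4pa's implementation.

```python
from collections import Counter
from itertools import product


def group_densities(node_groups, edges):
    """Return {(source_group, target_group): density} for a directed edge list.

    node_groups: dict mapping node id -> group label (hypothetical input shape)
    edges: iterable of (source, target) pairs
    """
    sizes = Counter(node_groups.values())

    # Count distinct directed connections between each ordered pair of groups,
    # ignoring duplicate edges and self-loops.
    observed = Counter()
    for source, target in set(edges):
        if source != target:
            observed[(node_groups[source], node_groups[target])] += 1

    densities = {}
    for g_from, g_to in product(sizes, repeat=2):
        if g_from == g_to:
            # Within a group, a node cannot connect to itself.
            possible = sizes[g_from] * (sizes[g_from] - 1)
        else:
            possible = sizes[g_from] * sizes[g_to]
        count = observed[(g_from, g_to)]
        densities[(g_from, g_to)] = count / possible if possible else 0.0
    return densities


node_groups = {"a": "Sales", "b": "Sales", "c": "HR"}
edges = [("a", "b"), ("a", "c"), ("c", "b")]
densities = group_densities(node_groups, edges)
```

With DataFrames, the same counts could be obtained by joining the target attribute onto both ends of each edge and grouping by the resulting pair of columns.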
- py4pa.ona.calc_modularity(df_nodes, df_edges, target_attribute, weighted=False, direction='outbound')
Calculates the modularity of connections originating from the groups defined by target_attribute
- Parameters:
df_nodes (Pandas DataFrame) – DataFrame containing the Node list
df_edges (Pandas DataFrame) – DataFrame containing the edge list
target_attribute (String) – Name of attribute in Node List that we want to calculate the modularities between
weighted (Boolean default False) – If set to True, the modularities will be weighted by the amount of email traffic. If False, they are calculated on the basis of the presence of a connection only
direction (String default = 'outbound') – Either ‘outbound’ or ‘inbound’; determines the direction of the email traffic to be considered
- Returns:
Pandas DataFrame containing the modularities, grouped by the target_attribute values
- py4pa.ona.chunks(l, n)
Divide a list of nodes l in n chunks
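A minimal sketch of this kind of helper, reading "n chunks" literally: it splits a list into n contiguous parts whose sizes differ by at most one. The name split_into_chunks is hypothetical, and the library's own function may behave differently, for instance by treating n as the size of each chunk rather than the number of chunks.

```python
def split_into_chunks(items, n):
    """Split items into n contiguous chunks whose sizes differ by at most one."""
    items = list(items)
    if n < 1:
        raise ValueError("n must be at least 1")
    base, extra = divmod(len(items), n)
    chunks = []
    start = 0
    for i in range(n):
        # The first `extra` chunks each take one additional item.
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


node_chunks = split_into_chunks(["a", "b", "c", "d", "e"], 2)
```

Splitting the node list this way lets each chunk be handed to a separate worker, which is the usual reason such a helper sits next to a parallel centrality function.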
- py4pa.ona.clean_email_data(dir, files='all', include_subject=False, engine='c', encoding='latin', delete_old_file=False)
Cleans email data from Splunk, keeping only the key fields
- Parameters:
dir (String) – Path to the root directory containing the data files to process
files (List or 'all' (optional)) – List containing all files to be processed within the directory. If ‘all’ is passed, then all CSV files in the directory will be processed
include_subject (Boolean default = False) – Defines whether the Subject field should be included in the cleaned data
engine (String default='c') – The Pandas engine to read in the data, either ‘c’ or ‘python’
encoding (String default = 'latin') – The file encoding of files to be read
delete_old_file (Boolean default = False) – If set to True, the original Splunk data file will be deleted
- Returns:
Nothing is returned by the function; the cleaned files are written to ‘dir’
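To make the cleaning step concrete, here is a small sketch that copies a CSV file while keeping only a chosen set of columns, using the standard library rather than Pandas. The column names, the helper name keep_key_fields, and the output file name are hypothetical; the fields that py4pa actually keeps are not listed in this reference.

```python
import csv
import os
import tempfile

# Hypothetical column names; a real Splunk export may use different ones.
KEY_FIELDS = ["sender", "recipient", "date"]


def keep_key_fields(src_path, dst_path, fields, encoding="latin-1"):
    """Copy a CSV file, keeping only the columns named in `fields`."""
    with open(src_path, newline="", encoding=encoding) as src, \
            open(dst_path, "w", newline="", encoding=encoding) as dst:
        reader = csv.DictReader(src)
        # extrasaction="ignore" silently drops columns not listed in `fields`.
        writer = csv.DictWriter(dst, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            writer.writerow(row)


# Small demonstration on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    raw = os.path.join(tmp, "raw.csv")
    cleaned = os.path.join(tmp, "raw_cleaned.csv")
    with open(raw, "w", newline="", encoding="latin-1") as f:
        f.write("sender,recipient,date,subject,size\n")
        f.write("a@x.com,b@x.com,2021-01-01,Hello,12\n")
    keep_key_fields(raw, cleaned, KEY_FIELDS)
    with open(cleaned, newline="", encoding="latin-1") as f:
        cleaned_rows = list(csv.reader(f))
```

An option like include_subject could be mirrored here by appending the subject column to the list of fields before calling the helper.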
- py4pa.ona.generate_node_edge_lists(email_data, demographic_data, demographic_key, output_dir, include_subject=False)
Generates Node and Edge lists from email data and saves them to csv
- Parameters:
email_data (List of Strings) – List of paths to all files containing email data to be processed
demographic_data (String) – Path to file containing all node demographic data to be added to Node list
demographic_key (String) – Column in demographic_data that contains email address to act as join to email_data
output_dir (String) – Path to directory to save Node and Edge lists into. Must include ‘/’ at end.
include_subject (Boolean default = False) – Defines whether the Subject field should be included in the email data
- Returns:
nodeList_fPath (String) – Path to the Node List generated
edgeList_fPath (String) – Path to the Edge List generated
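The transformation described above can be sketched in plain Python as follows. This is an illustration under assumptions, not the library's code: the record keys 'sender' and 'recipient', the output keys 'id', 'source', 'target' and 'weight', the choice to keep addresses that have no demographic record, and the function name build_node_edge_lists are all hypothetical. The sketch also returns the lists in memory instead of writing CSV files.

```python
from collections import Counter


def build_node_edge_lists(emails, demographics, demographic_key):
    """Build node and edge lists from email records.

    emails: list of dicts with hypothetical 'sender' and 'recipient' keys
    demographics: list of dicts, one per person
    demographic_key: key in each demographics dict holding the email address
    """
    # Edge list: one row per (sender, recipient) pair with an email count.
    counts = Counter((e["sender"], e["recipient"]) for e in emails)
    edges = [
        {"source": s, "target": t, "weight": w}
        for (s, t), w in sorted(counts.items())
    ]

    # Node list: every address seen in the emails, joined to its demographics.
    by_email = {row[demographic_key]: row for row in demographics}
    addresses = sorted({address for pair in counts for address in pair})
    nodes = [{"id": a, **by_email.get(a, {})} for a in addresses]
    return nodes, edges


emails = [
    {"sender": "a@x.com", "recipient": "b@x.com"},
    {"sender": "a@x.com", "recipient": "b@x.com"},
    {"sender": "b@x.com", "recipient": "c@x.com"},
]
demographics = [
    {"email": "a@x.com", "dept": "Sales"},
    {"email": "b@x.com", "dept": "HR"},
]
nodes, edges = build_node_edge_lists(emails, demographics, "email")
```

Here the repeated email from a@x.com to b@x.com collapses into a single edge with weight 2, and c@x.com appears as a node without demographic attributes.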
- py4pa.ona.generate_nx_digraph(node_list, edge_list)
Generates NetworkX DiGraph object
- Parameters:
node_list (String or Pandas DataFrame) – Path to file containing Node list, or DataFrame of Node list
edge_list (String or Pandas DataFrame) – Path to file containing Edge list, or DataFrame of Edge list
- Returns:
G – NetworkX DiGraph object
- Return type:
NetworkX DiGraph
- py4pa.ona.generate_nx_digraph_pandas(nodes_df, edges_df)