COMP4900/5900: Graph Analytics
$\newcommand{\R}{\mathbb{R}}\DeclareMathOperator{\E}{\mathbf{E}}\DeclareMathOperator{\deg}{deg}\newcommand{\N}{\mathbb{N}}$

Assignment 2

Programming for this assignment should be done in Python 3 using the igraph library with some data visualization coming from matplotlib.

This searchable reference for python-igraph is useful.

You are to submit this assignment in Brightspace as a single zip file, with the following structure:

<student_id>/<input_graph_1>
<student_id>/<input_graph_2>
<student_id>/<input_graph_3>
<student_id>/<input_graph_4>
<student_id>/part1.py
<student_id>/part2.py
<student_id>/analysis.txt  (only for students in COMP5900)

If you find yourself struggling with technical issues related to using Python, igraph, or matplotlib, please feel free to ask questions on the Brightspace course forum. There are probably other students experiencing the same issues.

1. Centrality Measures

Choose two directed graphs each with at least $10,000$ vertices. One place to find these is the Stanford Large Network Dataset Collection, but you can also take any other data you are able to find. The graph must be directed but it may have self-loops and multiple edges. To make things more interesting, it would be good if you can find data in which the vertices have meaningful names.

Write a program with the name part1.py that reads each of the graphs into an igraph.Graph object and computes some centrality measures for the vertices. Your program should compute:

Your program should print out basic information about each graph, including

For each of the centrality measures computed above, your program should output a list of the top 50 most central vertices with respect to that measure. (If your data has named vertices, then you can print them here.)

2. Degree Correlations

Choose two unidrected graphs, each with at least $10,000$ vertices. One place to find these is the Stanford Large Network Dataset Collection, but you can also take any other data you are able to find.

Write a program with the name part2.py reads each of the graphs into an igraph.Graph object and computes some degree correlation measures.

Your program should print out basic information about each graph, including

For each of the centrality measures computed above, your program should output a list of the top 50 most central vertices with respect to that measure. (If your data has named vertices, then you can print them here.)