removed answer

Commit bfe4e9ad authored 2 years ago by Samuel Koovely
--- a/tutorials/ES1/ES1_Tutorial3.ipynb
+++ b/tutorials/ES1/ES1_Tutorial3.ipynb
@@ -764,9 +764,7 @@
   "execution_count": null,
   "metadata": {},
   "outputs": [],
-   "source": [
-    "dijkstra(G,'IND','FAI')"
-   ]
+   "source": []
  },
  {
   "cell_type": "markdown",

 %% Cell type:code id: tags:

 ``` python
 import networkx as nx
 %matplotlib inline
 import numpy as np
 ```

 %% Cell type:markdown id: tags:

 # Tutorial 3

 This tutorial is based on a tutorial shared on https://github.com/CambridgeUniversityPress/FirstCourseNetworkScience.

 Contents:

 1. Paths
 2. Connected components
 3. Directed paths & components
 4. Dataset: US air traffic network

 %% Cell type:markdown id: tags:

 # 1. Paths

 Let's start with a very simple, undirected network.

 %% Cell type:code id: tags:

 ``` python
 G = nx.Graph()

 G.add_nodes_from([1,2,3,4])

 G.add_edges_from([(1,2),(2,3),(1,3),(1,4)])

 nx.draw(G, with_labels=True)
 ```

 %% Cell type:markdown id: tags:

 In NetworkX, a *path* is a sequence of edges connecting two nodes (in the lecture we called it a *walk*). In this simple example, we can easily see that there is at least one path connecting nodes 3 and 4. We can verify this with NetworkX:

 %% Cell type:code id: tags:

 ``` python
 nx.has_path(G, 3, 4)
 ```

 %% Cell type:markdown id: tags:

 There can be more than one path between two nodes. Again considering nodes 3 and 4, there are two such "simple" paths:

 %% Cell type:code id: tags:

 ``` python
 list(nx.all_simple_paths(G, 3, 4))
 ```

 %% Cell type:markdown id: tags:

 A simple path is one without any cycles (in the lecture we called this a *path*). If we allowed cycles, there would be infinitely many paths because one could always just go around the cycle as many times as desired.

 We are often most interested in *shortest* paths. In an unweighted network, the shortest path is the one with the fewest edges. We can see that of the two simple paths between nodes 3 and 4, one is shorter than the other. We can get this shortest path with a single NetworkX function:

 %% Cell type:code id: tags:

 ``` python
 nx.shortest_path(G, 3, 4)
 ```

 %% Cell type:markdown id: tags:

 If you only care about the path length, there's a function for that too:

 %% Cell type:code id: tags:

 ``` python
 nx.shortest_path_length(G, 3, 4)
 ```

 %% Cell type:markdown id: tags:

 Note that a path length is defined here by the number of *edges* in the path, not the number of nodes, which implies

    nx.shortest_path_length(G, u, v) == len(nx.shortest_path(G, u, v)) - 1

 for nodes $u$ and $v$.
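
 A quick check of this identity on the four-node example graph from above, rebuilt here under the illustrative name `G_small` so that the snippet runs on its own. It is a sanity check on one graph, not a proof:

```python
import itertools
import networkx as nx

# Rebuild the small undirected example graph from section 1.
G_small = nx.Graph()
G_small.add_edges_from([(1, 2), (2, 3), (1, 3), (1, 4)])

# For every pair of distinct nodes, compare the reported length with
# the number of nodes on a shortest path minus one.
identity_holds = all(
    nx.shortest_path_length(G_small, u, v) == len(nx.shortest_path(G_small, u, v)) - 1
    for u, v in itertools.combinations(G_small.nodes, 2)
)
print(identity_holds)
```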

 %% Cell type:markdown id: tags:

 # 2. Connected components

 In the simple network above, we can see that for *every* pair of nodes, we can find a path connecting them. This is the definition of a *connected* graph. We can check this property for a given graph:

 %% Cell type:code id: tags:

 ``` python
 nx.is_connected(G)
 ```

 %% Cell type:markdown id: tags:

 Not every graph is connected:

 %% Cell type:code id: tags:

 ``` python
 G = nx.Graph()

 nx.add_cycle(G, (1,2,3))
 G.add_edge(4,5)

 nx.draw(G, with_labels=True)
 ```

 %% Cell type:code id: tags:

 ``` python
 nx.is_connected(G)
 ```

 %% Cell type:markdown id: tags:

 When no path exists between two nodes, `nx.has_path` returns `False`, and NetworkX raises an error if you ask for a shortest path between them:

 %% Cell type:code id: tags:

 ``` python
 nx.has_path(G, 3, 5)
 ```

 %% Cell type:code id: tags:raises-exception

 ``` python
 nx.shortest_path(G, 3, 5)
 ```
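
 %% Cell type:markdown id: tags:

 In a script you would usually catch this error rather than let it stop execution. A minimal sketch, rebuilding the disconnected graph above under the illustrative name `G_disc`; `nx.NetworkXNoPath` is the exception NetworkX raises in this situation:

```python
import networkx as nx

# Same disconnected graph as above: a triangle plus a separate edge.
G_disc = nx.Graph()
nx.add_cycle(G_disc, (1, 2, 3))
G_disc.add_edge(4, 5)

try:
    path = nx.shortest_path(G_disc, 3, 5)
except nx.NetworkXNoPath:
    path = None  # nodes 3 and 5 lie in different components

print(path)
```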

 %% Cell type:markdown id: tags:

 Visually, we can identify two connected components in our graph. Let's verify this:

 %% Cell type:code id: tags:

 ``` python
 nx.number_connected_components(G)
 ```

 %% Cell type:markdown id: tags:

 The `nx.connected_components()` function takes a graph and returns a generator of sets of node names, one set per connected component; wrapping it in `list()` collects them into a list. Verify that the two sets in the following list correspond to the two connected components in the drawing of the graph above:

 %% Cell type:code id: tags:

 ``` python
 list(nx.connected_components(G))
 ```

 %% Cell type:markdown id: tags:

 In case you're not familiar with Python sets, they are collections of items without duplicates. These are useful for collecting node names because node names should be unique. As with other collections, we can get the number of items in a set with the `len` function:

 %% Cell type:code id: tags:

 ``` python
 components = list(nx.connected_components(G))
 len(components[0])
 ```

 %% Cell type:markdown id: tags:

 We often care about the largest connected component, which is sometimes referred to as the *core* of the network. We can use Python's built-in `max` function to obtain it. By default, `max` compares the items themselves (for sets, `<` and `>` test subset relations), which is not helpful here. We want the component with the most nodes, so we pass `len` as a key function:

 %% Cell type:code id: tags:

 ``` python
 max(nx.connected_components(G), key=len)
 ```

 %% Cell type:markdown id: tags:

 While it's often enough to just have the list of node names, sometimes we need the actual subgraph consisting of the largest connected component. One way to get this is to pass the list of node names to the `G.subgraph()` function:

 %% Cell type:code id: tags:

 ``` python
 core_nodes = max(nx.connected_components(G), key=len)
 core = G.subgraph(core_nodes)

 nx.draw(core, with_labels=True)
 ```

 %% Cell type:markdown id: tags:

 Those of you using tab-completion with an older NetworkX version (before 2.4) will also notice a `nx.connected_component_subgraphs()` function; it has since been removed from the library. The method shown above works in all versions and is more efficient when you only care about the largest connected component.
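
 To get a subgraph for every component rather than only the largest, one option that does not rely on `nx.connected_component_subgraphs()` is to call `G.subgraph()` once per component. A sketch, with `G_disc` and `subgraphs` as illustrative names:

```python
import networkx as nx

# Disconnected example graph: a triangle plus a separate edge.
G_disc = nx.Graph()
nx.add_cycle(G_disc, (1, 2, 3))
G_disc.add_edge(4, 5)

# One independent subgraph copy per connected component, largest first.
subgraphs = [
    G_disc.subgraph(nodes).copy()
    for nodes in sorted(nx.connected_components(G_disc), key=len, reverse=True)
]

for sg in subgraphs:
    print(sorted(sg.nodes), sg.number_of_edges())
```

 The `.copy()` call makes each subgraph editable on its own; without it, `G.subgraph()` returns a read-only view of the original graph.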

 %% Cell type:markdown id: tags:

 # 3. Directed paths & components

 Let's extend these ideas about paths and connected components to directed graphs.

 %% Cell type:code id: tags:

 ``` python
 D = nx.DiGraph()
 D.add_edges_from([
    (1,2),
    (2,3),
    (3,2), (3,4), (3,5),
    (4,2), (4,5), (4,6),
    (5,6),
    (6,4),
 ])
 nx.draw(D, with_labels=True)
 ```

 %% Cell type:markdown id: tags:

 ### Directed paths

 We know that in a directed graph, an edge from an arbitrary node $u$ to an arbitrary node $v$ does not imply that an edge exists from $v$ to $u$. Since paths must follow edge direction in directed graphs, the same asymmetry applies for paths. Observe that this graph has a path from 1 to 4, but not in the reverse direction.

 %% Cell type:code id: tags:

 ``` python
 nx.has_path(D, 1, 4)
 ```

 %% Cell type:code id: tags:

 ``` python
 nx.has_path(D, 4, 1)
 ```

 %% Cell type:markdown id: tags:

 The other NetworkX functions dealing with paths take this asymmetry into account as well:

 %% Cell type:code id: tags:

 ``` python
 nx.shortest_path(D, 2, 5)
 ```

 %% Cell type:code id: tags:

 ``` python
 nx.shortest_path(D, 5, 2)
 ```

 %% Cell type:markdown id: tags:

 Since there is no edge from 5 to 3, the shortest path from 5 to 2 cannot simply backtrack the shortest path from 2 to 5 -- it has to go a longer route through nodes 6 and 4.

 %% Cell type:markdown id: tags:

 ### Directed components

 %% Cell type:markdown id: tags:

 Directed networks have two kinds of connectivity. *Strongly connected* means that there exists a directed path between every pair of nodes, i.e., that from any node we can get to any other node while following edge directionality. Think of cars on a network of one-way streets: they can't drive against the flow of traffic.

 %% Cell type:code id: tags:

 ``` python
 nx.is_strongly_connected(D)
 ```

 %% Cell type:markdown id: tags:

 *Weakly connected* means that there exists a path between every pair of nodes, regardless of direction. Think about pedestrians on a network of one-way streets: they walk on the sidewalks so they don't care about the direction of traffic.

 %% Cell type:code id: tags:

 ``` python
 nx.is_weakly_connected(D)
 ```

 %% Cell type:markdown id: tags:

 If a network is strongly connected, it is also weakly connected. The converse is not always true, as seen in this example.

 The `is_connected` function for undirected graphs will raise an error when given a directed graph.

 %% Cell type:code id: tags:raises-exception

 ``` python
 # This will raise an error
 nx.is_connected(D)
 ```
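
 %% Cell type:markdown id: tags:

 Weak connectivity is the same as ordinary connectivity once edge directions are dropped. A small sketch of that equivalence on the directed example graph, rebuilt here as `D_example` so that the snippet is self-contained:

```python
import networkx as nx

D_example = nx.DiGraph()
D_example.add_edges_from([
    (1, 2), (2, 3), (3, 2), (3, 4), (3, 5),
    (4, 2), (4, 5), (4, 6), (5, 6), (6, 4),
])

# Dropping edge directions gives an undirected graph that is_connected accepts.
U = D_example.to_undirected()

print(nx.is_connected(U) == nx.is_weakly_connected(D_example))
print(nx.is_strongly_connected(D_example))
```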

 %% Cell type:markdown id: tags:

 In the directed case, instead of `nx.connected_components` we now have `nx.weakly_connected_components` and `nx.strongly_connected_components`:

 %% Cell type:code id: tags:

 ``` python
 list(nx.weakly_connected_components(D))
 ```

 %% Cell type:code id: tags:

 ``` python
 list(nx.strongly_connected_components(D))
 ```
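
 %% Cell type:markdown id: tags:

 Within a strongly connected component, every node can reach every other node in both directions. A sketch that checks this property for the example graph (rebuilt as `D_example`; the variable names are illustrative):

```python
import itertools
import networkx as nx

D_example = nx.DiGraph()
D_example.add_edges_from([
    (1, 2), (2, 3), (3, 2), (3, 4), (3, 5),
    (4, 2), (4, 5), (4, 6), (5, 6), (6, 4),
])

sccs = list(nx.strongly_connected_components(D_example))

# Every pair inside one component must be reachable in both directions.
mutually_reachable = all(
    nx.has_path(D_example, u, v) and nx.has_path(D_example, v, u)
    for component in sccs
    for u, v in itertools.combinations(component, 2)
)
print(sorted(sorted(c) for c in sccs), mutually_reachable)
```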

 %% Cell type:markdown id: tags:

 # 4. Dataset: US air traffic network

 This repository contains several example network datasets. Among these is a network of US air travel routes:

 %% Cell type:code id: tags:

 ``` python
 G = nx.read_graphml('./datasets/openflights/openflights_usa.graphml.gz')
 ```

 %% Cell type:markdown id: tags:

 The nodes in this graph are airports, represented by their [IATA codes](https://en.wikipedia.org/wiki/List_of_airports_by_IATA_code:_A); two nodes are connected with an edge if there is a scheduled flight directly connecting these two airports. We'll assume this graph to be undirected since a flight in one direction usually means there is a return flight.

 Thus this graph has edges
 ```
 [('HOM', 'ANC'), ('BGM', 'PHL'), ('BGM', 'IAD'), ...]
 ```
 where ANC is Anchorage, IAD is Washington Dulles, etc.

 These nodes also have **attributes** associated with them, containing additional information about the airports:

 %% Cell type:code id: tags:

 ``` python
 G.nodes['IND']
 ```

 %% Cell type:markdown id: tags:

 Node attributes are stored as a dictionary, so the values can be accessed individually as such:

 %% Cell type:code id: tags:

 ``` python
 G.nodes['IND']['name']
 ```
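
 %% Cell type:markdown id: tags:

 The same dictionary-style access works for attributes you attach yourself. A sketch on a toy graph; the node codes and attribute values below are made up for illustration and are not taken from the dataset:

```python
import networkx as nx

toy = nx.Graph()
# Keyword arguments to add_node become entries in the node's attribute dictionary.
toy.add_node('AAA', name='Example Airport A', state='XX')
toy.add_node('BBB', name='Example Airport B', state='YY')
toy.add_edge('AAA', 'BBB')

print(toy.nodes['AAA'])
print(toy.nodes['AAA']['name'])

# Collect one attribute for all nodes as a {node: value} dictionary.
names = nx.get_node_attributes(toy, 'name')
print(names)
```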

 %% Cell type:markdown id: tags:

 Now let's see how big the network is:

 %% Cell type:code id: tags:

 ``` python
 print(G.number_of_nodes())
 print(G.number_of_edges())
 ```

 %% Cell type:markdown id: tags:

 We want to explore how well connected a certain node is. Let's look for example at the first airport listed in the network.

 %% Cell type:code id: tags:

 ``` python
 first_airport = list(G.nodes())[0]
 print(G.nodes[first_airport])

 print(list(G.neighbors(first_airport)))
 ```

 %% Cell type:markdown id: tags:

 So, the first listed airport is the Redding Municipal Airport ('RDD'), and it is directly connected only to 'SFO'. One can see that 'RDD' has only one neighbour by counting the non-zero elements in the first row of the network's adjacency matrix (remember that Python indices start from 0).

 %% Cell type:code id: tags:

 ``` python
 G_matrix = nx.to_numpy_array(G)  # nx.to_numpy_matrix was removed in NetworkX 3.0
 print(np.count_nonzero(G_matrix[0,:]))
 ```
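
 %% Cell type:markdown id: tags:

 The number of non-zero entries in a row of the adjacency matrix is the degree of the corresponding node. A sketch of that correspondence on the small graph from section 1, using `nx.to_numpy_array` with an explicit `nodelist` so that the row order is known (`G_small`, `row_counts` and `degrees` are illustrative names):

```python
import networkx as nx
import numpy as np

G_small = nx.Graph()
G_small.add_edges_from([(1, 2), (2, 3), (1, 3), (1, 4)])

# Fix the row/column order explicitly: row i belongs to nodelist[i].
nodelist = [1, 2, 3, 4]
A = nx.to_numpy_array(G_small, nodelist=nodelist)

row_counts = [int(np.count_nonzero(A[i, :])) for i in range(len(nodelist))]
degrees = [G_small.degree(n) for n in nodelist]
print(row_counts, degrees)
```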

 %% Cell type:markdown id: tags:

 If we only look at direct flights, 'RDD' is quite isolated. However, the first row of the i-th power of the adjacency matrix tells us how many airports are connected to 'RDD' by walks of length i. Let's see this for walks with length ranging from 1 to 5.

 %% Cell type:code id: tags:

 ``` python
 power_G = G_matrix.copy()
 for i in range(5):
    i_walks_node0 = power_G[0,:]

    print(f'Number of neighbors reachable by a walk of length {i+1}:', np.count_nonzero(i_walks_node0))
    print(f'Total number of walks of length {i+1} starting on node 0:', int(i_walks_node0.sum()))

    power_G = power_G @ G_matrix # matrix multiplication
 ```
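
 %% Cell type:markdown id: tags:

 To see why matrix powers count walks, it helps to try a graph small enough to enumerate by hand. A sketch on the four-node graph from section 1, where entry $(i, j)$ of $A^k$ is the number of walks of length $k$ from node $i$ to node $j$ (the names `G_small`, `A2` and `A3` are illustrative):

```python
import networkx as nx
import numpy as np

G_small = nx.Graph()
G_small.add_edges_from([(1, 2), (2, 3), (1, 3), (1, 4)])

# Row/column i corresponds to nodelist[i], so node 3 is index 2 and node 4 is index 3.
nodelist = [1, 2, 3, 4]
A = nx.to_numpy_array(G_small, nodelist=nodelist, dtype=int)

A2 = np.linalg.matrix_power(A, 2)
A3 = np.linalg.matrix_power(A, 3)

# Walks of length 2 from node 3 to node 4: only 3 -> 1 -> 4.
print(A2[2, 3])
# Walks of length 3 from node 3 to node 4: only 3 -> 2 -> 1 -> 4.
print(A3[2, 3])
# A closed walk of length 2 goes out along an edge and back,
# so the diagonal of A^2 holds the node degrees.
print(np.diag(A2))
```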

 %% Cell type:markdown id: tags:

 So, out of the 546 airports (including 'RDD'), one can reach 534 of these with exactly 5 flights. We also see that the total number of walks grows extremely rapidly. This is because walks can have cycles.

 Let's close this little investigation by looking at how many walks of length ranging from 1 to 5 there are between 'RDD' and its neighbour 'SFO'.

 %% Cell type:code id: tags:

 ``` python
 index_SFO = np.nonzero(G_matrix[0])  # we know that 'SFO' is the only neighbour of 'RDD'

 power_G = G_matrix.copy()
 for i in range(1,6):
    print(power_G[0][index_SFO])
    power_G = power_G @ G_matrix # matrix multiplication
 ```

 %% Cell type:markdown id: tags:

 # EXERCISE 1

 Is there a direct flight between Indianapolis and Fairbanks, Alaska (FAI)? A direct flight is one with no intermediate stops.

 %% Cell type:code id: tags:

 ``` python
 ```

 %% Cell type:markdown id: tags:

 # EXERCISE 2

 If I wanted to fly from Indianapolis to Fairbanks, Alaska what would be an itinerary with the fewest number of flights?

 %% Cell type:code id: tags:

 ``` python
 ```

 %% Cell type:markdown id: tags:

 # EXERCISE 3
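 With its default setting `method='dijkstra'`, NetworkX uses Dijkstra's algorithm to compute weighted shortest paths; when no `weight` is given, as in this tutorial, it uses a breadth-first search instead (you can check the source code [here](https://networkx.org/documentation/stable/_modules/networkx/algorithms/shortest_paths/generic.html#shortest_path)).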
 Below, you will find an implementation of the Dijkstra algorithm (taken from [here](https://gist.github.com/aeged/db5bfda411903ecd89a3ba3cb7791a05)) using a NetworkX graph and a Python [PriorityQueue](https://docs.python.org/3/library/queue.html).

 Carefully read it and identify the algorithm's steps given in the lecture's slides.

 %% Cell type:code id: tags:

 ``` python
 # dependencies for our dijkstra's implementation
 from queue import PriorityQueue
 from math import inf
 # graph dependency
 import networkx as nx


 """Dijkstra's shortest path algorithm"""
 def dijkstra(graph: 'networkx.classes.graph.Graph', start: str, end: str) -> 'List':
    """Get the shortest path of nodes by going backwards through prev list
    credits: https://github.com/blkrt/dijkstra-python/blob/3dfeaa789e013567cd1d55c9a4db659309dea7a5/dijkstra.py#L5-L10"""
    def backtrace(prev, start, end):
        node = end
        path = []
        while node != start:
            path.append(node)
            node = prev[node]
        path.append(node)
        path.reverse()
        return path

    """get the cost of edges from node -> node"""
    def cost(u, v):
        return 1 # here we consider each edge with a unit length.

    """main algorithm"""
    # predecessor of current node on shortest path
    prev = {}
    # initialize distances from start -> given node i.e. dist[node] = dist(start, node)
    dist = {v: inf for v in list(nx.nodes(graph))}
    # nodes we've visited
    visited = set()
    # prioritize nodes from start -> node with the shortest distance!
    ## elements stored as tuples (distance, node)
    pq = PriorityQueue()

    dist[start] = 0  # dist from start -> start is zero
    pq.put((dist[start], start))

    while not pq.empty():
        curr_cost, curr = pq.get()
        # skip outdated queue entries: curr was already processed with a shorter distance
        if curr in visited:
            continue
        visited.add(curr)
        #print(f'visiting {curr}')
        # look at curr's adjacent nodes
        for neighbor in graph.adj[curr]:
            # if we found a shorter path
            path = dist[curr] + cost(curr, neighbor)
            if path < dist[neighbor]:
                # update the distance, we found a shorter one!
                dist[neighbor] = path
                # update the previous node to be prev on new shortest path
                prev[neighbor] = curr
                # PriorityQueue cannot update an entry in place, so insert a
                # new one; the outdated entry is skipped when it is popped
                pq.put((dist[neighbor], neighbor))
    print("=== Dijkstra's Algo Output ===")
    #print("Distances")
    #print(dist)
    #print("Visited")
    #print(visited)
    #print("Previous")
    #print(prev)
    # we are done after every possible path has been checked
    return backtrace(prev, start, end), dist[end]
 ```
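
 %% Cell type:markdown id: tags:

 For comparison, the same steps (initialise the distances, repeatedly take the closest unfinished node, relax its edges) can be written with Python's `heapq` module and with edge weights. This is a sketch, not the implementation NetworkX uses; the function name `dijkstra_heapq`, the `weight` attribute name and the toy graph are assumptions made for illustration:

```python
import heapq
from math import inf

import networkx as nx


def dijkstra_heapq(graph, start, end, weight='weight'):
    """Return (path, distance) from start to end; edges without `weight` count as 1."""
    dist = {node: inf for node in graph.nodes}
    prev = {}
    dist[start] = 0
    heap = [(0, start)]  # entries are (distance, node)
    finished = set()

    while heap:
        d, node = heapq.heappop(heap)
        if node in finished:
            continue  # outdated entry: node was already processed with a shorter distance
        finished.add(node)
        if node == end:
            break  # the distance to end is final once it is popped
        for neighbor, attrs in graph.adj[node].items():
            candidate = d + attrs.get(weight, 1)
            if candidate < dist[neighbor]:
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))

    if dist[end] == inf:
        return None, inf  # end is not reachable from start

    # Walk the predecessor links backwards from end to start.
    path = [end]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[end]


# Hypothetical toy graph with hand-picked weights.
toy = nx.Graph()
toy.add_weighted_edges_from([('a', 'b', 4), ('a', 'c', 1), ('c', 'b', 2), ('b', 'd', 5)])
print(dijkstra_heapq(toy, 'a', 'd'))
```

 Instead of updating queue entries, this version pushes a fresh entry whenever a distance improves and ignores outdated ones when they are popped, which keeps the code short at the cost of a slightly larger heap.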

 %% Cell type:markdown id: tags:

 Use this newly defined dijkstra function to compute the distance between 'IND' and 'FAI'.

 %% Cell type:code id: tags:

 ``` python
-dijkstra(G,'IND','FAI')
 ```

 %% Cell type:markdown id: tags:

 You might find a different shortest path from the one found before (but with the same length). Indeed, by printing all the shortest paths between the two nodes we can see that, in this case, the shortest path is not unique.

 %% Cell type:code id: tags:

 ``` python
 print([p for p in nx.all_shortest_paths(G, 'IND', 'FAI')])
 ```
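
 %% Cell type:markdown id: tags:

 Non-unique shortest paths are easy to reproduce on a toy graph. A sketch using a four-node cycle (the name `square` is illustrative), where opposite corners are joined by two equally short routes:

```python
import networkx as nx

square = nx.cycle_graph(4)  # nodes 0, 1, 2, 3 joined in a ring

# Both routes around the ring from node 0 to node 2 use two edges.
paths = sorted(nx.all_shortest_paths(square, 0, 2))
print(paths)
```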

 %% Cell type:markdown id: tags:

 # EXERCISE 4

 Is it possible to travel from any airport in the US to any other airport in the US, possibly using connecting flights? In other words, does there exist a path in the network between every possible pair of airports?

 %% Cell type:code id: tags:

 ``` python
 ```

 %% Cell type:markdown id: tags:

 Finally, open the US flight dataset with [Gephi](https://gephi.org/). You can run Gephi by entering the command `gephi` in a terminal on the IMATH servers (from a ThinLinc client), or you can install it on your own computer.

 Try different layouts and visualizations. For example, the "Geo Layout" lets you plot the network according to the geographical location of each airport (you might have to install the GeoLayout plugin via Tools > Plugins). You can also detect the connected components in the "Statistics" panel and display them in different colors in the "Appearance" panel (Nodes > Partition > Component ID).