From 8c0dbd08fe4609dc638b4d5a29f8807cd3d456a9 Mon Sep 17 00:00:00 2001 From: Samuel Koovely <samuel.koovely@math.uzh.ch> Date: Fri, 16 Sep 2022 20:08:15 +0200 Subject: [PATCH] Upload New File --- tutorials/ES1/ES1_Tutorial3.ipynb | 706 ++++++++++++++++++++++++++++++ 1 file changed, 706 insertions(+) create mode 100644 tutorials/ES1/ES1_Tutorial3.ipynb diff --git a/tutorials/ES1/ES1_Tutorial3.ipynb b/tutorials/ES1/ES1_Tutorial3.ipynb new file mode 100644 index 0000000..8a8f977 --- /dev/null +++ b/tutorials/ES1/ES1_Tutorial3.ipynb @@ -0,0 +1,706 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "import networkx as nx\n", + "%matplotlib inline\n", + "import numpy as np" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Tutorial 3\n", + "\n", + "This tutorial is based on a tutorial shared on \"https://github.com/CambridgeUniversityPress/FirstCourseNetworkScience\".\n", + "\n", + "Contents:\n", + "\n", + "1. Paths\n", + "2. Connected components\n", + "3. Directed paths & components\n", + "4. Dataset: US air traffic network" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 1. Paths\n", + "\n", + "Let's start with a very simple, undirected network." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "G = nx.Graph()\n", + "\n", + "G.add_nodes_from([1,2,3,4])\n", + "\n", + "G.add_edges_from([(1,2),(2,3),(1,3),(1,4)])\n", + "\n", + "nx.draw(G, with_labels=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A *path* in a network is a sequence of edges connecting two nodes. In this simple example, we can easily see that there is indeed at least one path that connects nodes 3 and 4. 
We can verify this with NetworkX:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "nx.has_path(G, 3, 4)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There can be more than one path between two nodes. Again considering nodes 3 and 4, there are two such \"simple\" paths:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "list(nx.all_simple_paths(G, 3, 4))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A simple path is one without any cycles. If we allowed cycles, there would be infinitely many paths because one could always just go around the cycle as many times as desired.\n", + "\n", + "We are often most interested in *shortest* paths. In an unweighted network, the shortest path is the one with the fewest edges. We can see that of the two simple paths between nodes 3 and 4, one is shorter than the other. We can get this shortest path with a single NetworkX function:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "nx.shortest_path(G, 3, 4)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you only care about the path length, there's a function for that too:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "nx.shortest_path_length(G, 3, 4)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that a path length is defined here by the number of *edges* in the path, not the number of nodes, which implies\n", + "\n", + " nx.shortest_path_length(G, u, v) == len(nx.shortest_path(G, u, v)) - 1\n", + " \n", + "for nodes $u$ and $v$." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. 
Connected components\n", + "\n", + "In the simple network above, we can see that for *every* pair of nodes, we can find a path connecting them. This is the definition of a *connected* graph. We can check this property for a given graph:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "nx.is_connected(G)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Not every graph is connected:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "G = nx.Graph()\n", + "\n", + "nx.add_cycle(G, (1,2,3))\n", + "G.add_edge(4,5)\n", + "\n", + "nx.draw(G, with_labels=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "nx.is_connected(G)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`nx.has_path` returns `False` for two nodes in different components, and NetworkX will raise an error if you ask for a shortest path between them:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "nx.has_path(G, 3, 5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [ + "raises-exception" + ] + }, + "outputs": [], + "source": [ + "nx.shortest_path(G, 3, 5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Visually, we can identify two connected components in our graph. Let's verify this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "nx.number_connected_components(G)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `nx.connected_components()` function takes a graph and returns a generator of sets of node names, one such set for each connected component. 
Verify that the two sets in the following list correspond to the two connected components in the drawing of the graph above:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "list(nx.connected_components(G))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In case you're not familiar with Python sets, they are collections of items without duplicates. These are useful for collecting node names because node names should be unique. As with other collections, we can get the number of items in a set with the `len` function:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "components = list(nx.connected_components(G))\n", + "len(components[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We often care about the largest connected component, which is sometimes referred to as the *core* of the network. We can use Python's built-in `max` function to obtain the largest connected component. By default, `max` compares the items themselves (for sets, by the subset relation rather than by size), which is not helpful here. We want the component with the most nodes, so we pass `len` as a key function:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "max(nx.connected_components(G), key=len)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "While it's often enough to just have the set of node names, sometimes we need the actual subgraph consisting of the largest connected component. 
One way to get this is to pass the set of node names to the `G.subgraph()` method:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "core_nodes = max(nx.connected_components(G), key=len)\n", + "core = G.subgraph(core_nodes)\n", + "\n", + "nx.draw(core, with_labels=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Older NetworkX releases also offered a `nx.connected_component_subgraphs()` function for this purpose. It was removed in NetworkX 2.4, so the method shown here is the one to use, and it is efficient when you only care about the largest connected component." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 3. Directed paths & components\n", + "\n", + "Let's extend these ideas about paths and connected components to directed graphs." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "D = nx.DiGraph()\n", + "D.add_edges_from([\n", + " (1,2),\n", + " (2,3),\n", + " (3,2), (3,4), (3,5),\n", + " (4,2), (4,5), (4,6),\n", + " (5,6),\n", + " (6,4),\n", + "])\n", + "nx.draw(D, with_labels=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Directed paths\n", + "\n", + "We know that in a directed graph, an edge from an arbitrary node $u$ to an arbitrary node $v$ does not imply that an edge exists from $v$ to $u$. Since paths must follow edge direction in directed graphs, the same asymmetry applies for paths. Observe that this graph has a path from 1 to 4, but not in the reverse direction." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "nx.has_path(D, 1, 4)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "nx.has_path(D, 4, 1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The other NetworkX functions dealing with paths take this asymmetry into account as well:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "nx.shortest_path(D, 2, 5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "nx.shortest_path(D, 5, 2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Since there is no edge from 5 to 3, the shortest path from 5 to 2 cannot simply backtrack the shortest path from 2 to 5 -- it has to go a longer route through nodes 6 and 4." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Directed components" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Directed networks have two kinds of connectivity. *Strongly connected* means that there exists a directed path between every pair of nodes, i.e., that from any node we can get to any other node while following edge directionality. Think of cars on a network of one-way streets: they can't drive against the flow of traffic." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "nx.is_strongly_connected(D)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "*Weakly connected* means that there exists a path between every pair of nodes, regardless of direction. Think about pedestrians on a network of one-way streets: they walk on the sidewalks so they don't care about the direction of traffic." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "nx.is_weakly_connected(D)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If a network is strongly connected, it is also weakly connected. The converse is not always true, as seen in this example.\n", + "\n", + "The `is_connected` function for undirected graphs will raise an error when given a directed graph." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "raises-exception" + ] + }, + "outputs": [], + "source": [ + "# This will raise an error\n", + "nx.is_connected(D)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the directed case, instead of `nx.connected_components` we now have `nx.weakly_connected_components` and `nx.strongly_connected_components`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "list(nx.weakly_connected_components(D))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "list(nx.strongly_connected_components(D))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Dataset: US air traffic network\n", + "\n", + "This repository contains several example network datasets. Among these is a network of US air travel routes:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "G = nx.read_graphml('./datasets/openflights/openflights_usa.graphml.gz')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The nodes in this graph are airports, represented by their [IATA codes](https://en.wikipedia.org/wiki/List_of_airports_by_IATA_code:_A); two nodes are connected with an edge if there is a scheduled flight directly connecting these two airports. 
We'll assume this graph to be undirected since a flight in one direction usually means there is a return flight.\n", + "\n", + "Thus this graph has edges\n", + "```\n", + "[('HOM', 'ANC'), ('BGM', 'PHL'), ('BGM', 'IAD'), ...]\n", + "```\n", + "where ANC is Anchorage, IAD is Washington Dulles, etc.\n", + "\n", + "These nodes also have **attributes** associated with them, containing additional information about the airports:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "G.nodes['IND']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Node attributes are stored as a dictionary, so the values can be accessed individually as such:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "G.nodes['IND']['name']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's see how big the network is:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Number of airports (nodes)\n", + "print(G.number_of_nodes())\n", + "# Number of direct routes (edges)\n", + "print(G.number_of_edges())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We want to explore how well connected a certain node is. Let's look for example at the first airport listed in the network." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "first_airport = list(G.nodes())[0]\n", + "print(G.nodes[first_airport])\n", + "\n", + "print(list(G.neighbors(first_airport)))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "So, the first listed airport is the Redding Municipal Airport ('RDD'), and it is directly connected only to 'SFO'. 
This fact can also be seen by counting how many non-zero elements are present in the first row of the adjacency matrix of the network (remember that in Python, indices start from 0)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "G_matrix = nx.to_numpy_array(G)\n", + "print(np.count_nonzero(G_matrix[0]))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If we look at direct flights, 'RDD' is quite isolated, but if we look at the first row of the i-th power of the adjacency matrix, we can find how many airports are connected to 'RDD' by walks of length i! Let's see this for walks with length ranging from 1 to 5." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for i in range(1,6): #notice how the iterator range(j,k) iterates over {j, j+1, ..., k-1}\n", + "    power_i_G = np.linalg.matrix_power(G_matrix, i)\n", + "    i_walks_node0 = power_i_G[0]\n", + "    print(np.count_nonzero(i_walks_node0))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "So, out of the 546 airports (including 'RDD'), one can reach 534 of these with exactly 5 flights. Let's close this little investigation by looking at how many walks of length ranging from 1 to 5 there are between 'RDD' and its neighbour 'SFO'." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "index_SFO = np.nonzero(G_matrix[0]) #We know that 'SFO' is the only neighbour of 'RDD'\n", + "\n", + "for i in range(1,6):\n", + "    power_i_G = np.linalg.matrix_power(G_matrix, i)\n", + "    print(power_i_G[0][index_SFO])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# EXERCISE 1\n", + "\n", + "Is there a direct flight between Indianapolis and Fairbanks, Alaska (FAI)? A direct flight is one with no intermediate stops." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# EXERCISE 2\n", + "\n", + "If I wanted to fly from Indianapolis to Fairbanks, Alaska what would be an itinerary with the fewest number of flights?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# EXERCISE 3\n", + "\n", + "Is it possible to travel from any airport in the US to any other airport in the US, possibly using connecting flights? In other words, does there exist a path in the network between every possible pair of airports?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.8.3 ('venv': venv)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.3" + }, + "vscode": { + "interpreter": { + "hash": "3699540f6baed5bd29a193b0c2d028af3f2c80498e3cac18f2b44cdd848387e2" + } + } + }, + "nbformat": 4, + "nbformat_minor": 1 +} -- GitLab