{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Pandas - Data Science with Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy and NumPy arrays are our tool of choice for numeric data that resembles vectors, matrices (and higher-dimensional tensors).\n", "\n", "Where data is gathered from experiments, where we want to extract meaning by combining different data sources, and where data is often incomplete, the pandas library offers a number of useful tools (and it has become a standard tool for data scientists).\n", "\n", "In this section, we introduce the basics of Pandas.\n", "\n", "In particular, we introduce the two key data types in Pandas: the `Series` and the `DataFrame` objects." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By convention, the `pandas` library is imported under the name `pd` (in the same way that `numpy` is imported under the name `np`):" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Motivational example (Series)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Imagine we are working on software for a greengrocer or supermarket, and need to track the number of apples (10), oranges (3) and bananas (22) that are available in the supermarket.\n", "\n", "We could use a Python list (or a NumPy array) to track these numbers:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "stock = [10, 3, 22]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, we would need to remember separately that the entries are in the order of apples, oranges, and bananas. 
This could be achieved through a second list: " ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "stocklabels = ['apple', 'orange', 'banana']" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "apple : 10\n", "orange : 3\n", "banana : 22\n" ] } ], "source": [ "assert len(stocklabels) == len(stock) # check labels and \n", " # stock are consistent\n", "for label, count in zip(stocklabels, stock):\n", " print(f'{label:10s} : {count:4d}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above two-list solution is awkward in two ways: firstly, we have to use two lists to describe one set of data (and thus need to be careful to update them simultaneously, for example), and secondly, accessing the data for a given label is inconvenient: we need to find the index of the label in one list, then use this as the index into the other list, for example:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 22 bananas [index=2].\n" ] } ], "source": [ "index = stocklabels.index('banana')\n", "bananas = stock[index]\n", "print(f\"There are {bananas} bananas [index={index}].\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have come across similar examples in the section on dictionaries, and indeed a dictionary is a more convenient solution:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "stock_dic = {'apple': 10, \n", " 'orange': 3,\n", " 'banana': 22}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In a way, the keys of the dictionary contain the stock labels and the values contain the stock counts:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['apple', 
'orange', 'banana'])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock_dic.keys()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_values([10, 3, 22])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock_dic.values()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To retrieve (or change) the value for `apple`, we use `apple` as the key and retrieve the value through the dictionary's indexing notation:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock_dic['apple']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we can summarise the stock as follows:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "apple : 10\n", "orange : 3\n", "banana : 22\n" ] } ], "source": [ "for label in stock_dic:\n", " print(f'{label:10s} : {stock_dic[label]:4d}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a vast improvement over the two-list solution: (i) we only maintain one structure, which contains a value for every key, so we don't need to check that two lists have the same length; (ii) we can access individual elements through the label (using it as a key for the dictionary)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Pandas `Series` object addresses the requirements above. 
It is similar to a dictionary, but with improvements for the given problem:\n", "\n", "* the order of the items is maintained\n", "* all values share a single data type (`dtype`), which allows higher execution performance\n", "* a (large) amount of convenience functionality, for example to deal with missing data, time series, sorting, plotting, and more" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pandas `Series`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Stock example - `Series`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can create a `Series` object, for example, from a dictionary:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "stock = pd.Series({'apple': 10, \n", " 'orange': 3,\n", " 'banana': 22})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The default presentation shows the entries one per row, with the label on the left and the value on the right." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.core.series.Series" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(stock)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "apple 10\n", "orange 3\n", "banana 22\n", "dtype: int64" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The items on the left are referred to as the `index` of the Series, and are available as the `index` attribute of the `Series` object:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['apple', 'orange', 'banana'], dtype='object')" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock.index" ] }, { "cell_type": 
"code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.core.indexes.base.Index" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(stock.index)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also access the values themselves (as a NumPy array) through the `values` attribute:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([10, 3, 22])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock.values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Regarding data access, the `Series` object behaves like a dictionary:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock['apple']" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": true }, "outputs": [], "source": [ "stock['potato'] = 101 # adding more values\n", "stock['cucumber'] = 1\n" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "apple 10\n", "orange 3\n", "banana 22\n", "potato 101\n", "cucumber 1\n", "dtype: int64\n" ] } ], "source": [ "print(stock)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "apple 10\n", "orange 3\n", "banana 22\n", "potato 101\n", "cucumber 1\n", "dtype: int64" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can plot the data as a bar chart:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": 
"execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAEkCAYAAAAhJPoXAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAFJ1JREFUeJzt3X2UZVV95vHvI4ioiQLSYSGoTSKJEhUhLeLLOBHyAiERVgaJMRM7hoQkQwIxJiPJJIM6y/FlJEaNih3QaROWikSDy2E0rBaiRkX7hdBKk5VeCALDS2sAETTh5Td/nFNUddPQ1XWr6lTt8/2sVavu2ffeur91u+upfffZe59UFZKkdj1q6AIkSQvLoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1bs+hCwDYf//9a+XKlUOXIUnLyoYNG75VVSt29bglEfQrV65k/fr1Q5chSctKkutn8ziHbiSpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJatwugz7JB5LcluRrM9r2S3Jpkn/pv+/btyfJu5JsTXJVkiMXsnhJ0q7Npkf/v4Hjdmg7C1hXVYcC6/pjgOOBQ/uv04D3zU+ZkqS52uXK2Kr6XJKVOzSfCPxkf3stcDnwur79Q9VdcfzLSfZJcmBV3TxfBUuamy3PeObQJfDMa7YMXcIozXWM/oAZ4X0LcEB/+yDghhmPu7FvkyQNZOKTsX3vvXb3eUlOS7I+yfpt27ZNWoYk6WHMNehvTXIgQP/9tr79JuApMx53cN/2EFW1pqpWVdWqFSt2ufmaJGmO5hr0nwRW97dXAxfPaH9VP/vmaOBOx+claVi7PBmb5MN0J173T3IjcDbwFuDCJKcC1wOn9A+/BPg5YCtwD/DqBahZkrQbZjPr5pcf5q5jd/LYAk6ftChJ0vxxZawkNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcRMFfZLXJPl6kq8l+XCSvZMckuSKJFuTfDTJXvNVrCRp98056JMcBJwBrKqqZwF7AK8A3gq8o6qeDtwOnDofhUqS5mbSoZs9gccm2RN4HHAzcAxwUX//WuCkCV9DkjSBOQd9Vd0EvB34Jl3A3wlsAO6oqvv6h90IHDRpkZKkuZtk6GZf4ETgEODJwOOB43bj+aclWZ9k/bZt2+ZahiRpFyYZuvkp4BtVta2q7gU+DrwI2KcfygE4GLhpZ0+uqjVVtaqqVq1YsWKCMiRJj2SSoP8mcHSSxyUJcCxwNXAZcHL/mNXAxZOVKEmaxCRj9FfQnXTdCGzuf9Ya4HXAHyTZCjwJOH8e6pQkzdGeu37Iw6uqs4Gzd2i+Fjhqkp8rSZo/royVpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNW6ioE+yT5KLklyTZEuSFyTZL8mlSf6l/77vfBUrSdp9k/bo3wl8uqqeARwObAHOAtZV1aHAuv5YkjSQOQd9kicCLwHOB6iqf6+qO4ATgbX9w9YCJ01apCRp7ibp0R8CbAM+mGRTkvOSPB44oKpu7h9zC3DApEVKkuZukqDfEzgSeF9VHQHczQ7DNFVVQO3syUlOS7I+yfpt27ZNUIYk6ZFMEvQ3Aj
dW1RX98UV0wX9rkgMB+u+37ezJVbWmqlZV1aoVK1ZMUIYk6ZHMOeir6hbghiQ/1jcdC1wNfBJY3betBi6eqEJJ0kT2nPD5vwdckGQv4Frg1XR/PC5McipwPXDKhK8hSZrAREFfVVcCq3Zy17GT/FxJ0vxxZawkNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcRMHfZI9kmxK8qn++JAkVyTZmuSjSfaavExJ0lzNR4/+TGDLjOO3Au+oqqcDtwOnzsNrSJLmaKKgT3IwcAJwXn8c4Bjgov4ha4GTJnkNSdJkJu3R/wXwX4EH+uMnAXdU1X398Y3AQRO+hiRpAnMO+iQ/D9xWVRvm+PzTkqxPsn7btm1zLUOStAuT9OhfBLwsyXXAR+iGbN4J7JNkz/4xBwM37ezJVbWmqlZV1aoVK1ZMUIYk6ZHMOeir6o+r6uCqWgm8AvhsVf0KcBlwcv+w1cDFE1cpSZqzhZhH/zrgD5JspRuzP38BXkOSNEt77vohu1ZVlwOX97evBY6aj58rSZqcK2MlqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9Jjdtzrk9M8hTgQ8ABQAFrquqdSfYDPgqsBK4DTqmq2ycvVdp9z1777KFLYPPqzUOXoJGbpEd/H/DaqjoMOBo4PclhwFnAuqo6FFjXH0uSBjLnoK+qm6tqY3/7LmALcBBwIrC2f9ha4KRJi5Qkzd28jNEnWQkcAVwBHFBVN/d33UI3tCNJGsjEQZ/kB4C/BX6/qr4z876qKrrx+50977Qk65Os37Zt26RlSJIexkRBn+TRdCF/QVV9vG++NcmB/f0HArft7LlVtaaqVlXVqhUrVkxShiTpEcw56JMEOB/YUlV/PuOuTwKr+9urgYvnXp4kaVJznl4JvAj4VWBzkiv7tj8B3gJcmORU4HrglMlKlCRNYs5BX1VfAPIwdx87158rSZpfroyVpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcZNceGRJWXnW/xm6BK57ywlDlyBJD2GPXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1rpltijXD6584dAXw+juHrkBSzx69JDVuQYI+yXFJ/jnJ1iRnLcRrSJJmZ96DPskewHuA44HDgF9Octh8v44kaXYWYoz+KGBrVV0LkOQjwInA1QvwWpK0297z258dugROP/eYRXuthRi6OQi4YcbxjX2bJGkAg826SXIacFp/+N0k/zxULTPsD3xrrk/OW+exkuFN9F7whsxfJcOa7H0A8mu+Fw+K78WU333/vNTxtNk8aCGC/ibgKTOOD+7btlNVa4A1C/D6c5ZkfVWtGrqOpcD3ouP7MM33Ytpyey8WYujmq8ChSQ5JshfwCuCTC/A6kqRZmPcefVXdl+R3gc8AewAfqKqvz/frSJJmZ0HG6KvqEuCShfjZC2xJDSUNzPei4/swzfdi2rJ6L1JVQ9cgSVpAboEgSY0z6CWpcQa9JDXOoAeSPG7oGoaWzn9O8t/746cmOWroujS8JC9L8vb+6xeGrmcISfZIctnQdczVqPejT/JC4DzgB4CnJjkc+K2q+i
/DVjaI9wIPAMcAbwTuAv4WeN6QRQ0lyQnAjwN7T7VV1RuHq2gYSd5Mt3/VBX3TGUleUFV/MmBZi66q7k/yQJInVtWyu9jCqIMeeAfws/QLuqrqn5K8ZNiSBvP8qjoyySaAqrq9X/A2OknOBR4HvJSuI3Ay8JVBixrOCcBzq+oBgCRrgU3AqIK+911gc5JLgbunGqvqjOFKmp2xBz1VdUO233/j/qFqGdi9/RbTBZBkBV0Pf4xeWFXPSXJVVb0hyTnA/x26qAHtA/xrf3sJXL5sMB/vv5adsQf9Df3wTSV5NHAmsGXgmobyLuATwA8leRNdL/ZPhy1pMN/rv9+T5MnAt4EDB6xnSG8GNvXj0wFeAvzxsCUNo6rWJnks8NSqWgqbMM7aqBdMJdkfeCfwU3T/if8eOLOqvj1oYQNJ8gzgWLr3Yl1VjfKPXpI/A95N9168h+5TznlV9WeDFjaQJAcyfa7mK1V1y5D1DKU/Ef12YK+qOiTJc4E3VtXLBi5tl0Yd9JqWZL+dNN9VVfcuejFLSJLHAHsvxxNw8yHJuqo6dldtY5BkA91khcur6oi+7WtV9axhK9u1UQ7dJHk3/Vj0ziyHkysLYCPd9tK30/Xo9wFuSXIr8JtVtWHI4hZbP6S3kv53JAlV9aFBi1pESfamOyG9f5J96f5PADyB8V5I6N6qunOHc3rL4jzWKIMeWD90AUvQpcBFVfUZgCQ/A/wn4IN0Uy+fP2BtiyrJXwM/AlzJ9Mn5AkYT9MBvAb8PPBnYwHTQfwf4y6GKGtjXk7wS2CPJocAZwBcHrmlWHLoBkjwBqKq6a+hahpJkc1U9e4e2q/rZJ1dW1XOHqm2xJdkCHFb+cpDk96rq3UPXsRT0Cyv/G/AzdH/4PgP8j6r6/qCFzcKogz7JKroe6w/S/cPdAfz62IYpAJL8PbAO+Ejf9EvATwPHAV+tqiOHqm2xJfkYcEZV3Tx0LUtBkmcBh7H94rExfbrZznLsGI496K8CTq+qz/fHLwbeW1XPGbayxdfPQDobeHHf9I/AG4A76aaTbR2qtsXWTyV8Lt0iqX+bal8OsyvmW5KzgZ+kC/pLgOOBL1TVyUPWNYQkzwM+QNcxhO53Y1l0DMce9Jumzp7PaNs4pt6rHirJf9xZe1X9w2LXMrQkm4HDgU1VdXiSA4C/qaqfHri0RbecO4ZjPRk75R+SvB/4MN3Jtl8CLk9yJEBVbRyyuMWU5EeBP2TGTBOAqjpmqJqGMsZAfwTfq6oHktzXD1ncRjc7a4zunwp5gKr6QpL7hixotsYe9If338/eof0IuuAfU8h9DDiXbm+XsW4DAUCSo+kWTD0T2Ivu2sd3V9UTBi1sGOuT7AP8Fd3sm+8CXxq2pMU11fHjYTqGQ9W1O0Y9dKNpSTZU1U8MXcdSkGQ98Aq6P36rgFcBP1pVo1z6PyXJSuAJVXXVwKUsql1sT1zL4VPvqIM+yZOYPgFZwBfoljSPbguEJK+n+1j+CbY/AfmvD/ecViVZX1WrpqaX9m0POZ8zBq6MbcPYh24+AnyObmEQwK8AH6Xb+2ZsVvff/2hGWwE/PEAtQ7un36L5yiRvA25mZBfpcWXsQ/VDWK/ioeexlvxK+rH36B+yT8XOFg5pXJI8DbiVbnz+NXRb8753ZFNMz2R6Zez/m3HXd4C/qqrRrY5N8kXgy8BmZmx9UFVrBytqlsYe9H9ON1f6wr7pZOCoqvrD4aoajgtjtCNXxk5bzlOvxx70dwGPZ3qWyR5MXzmmxjTLwoUx05K8CHg98DS2/4g+umGsfgjrt+n2oYdulsn7x7iraZLX0M06+hTL7DzWqIMeHtye91C278WObh61C2OmJbmGbshmAzOmmo70JP15wKOBqeGJX6WbT/4bw1U1jCSnA2+i2yplKjhrOXQARn0yNslv0F1V6mC6nQqPptuNbowzClwYM+3OqhrzpQNnel5VHT7j+LNJ/mmwaob1WuDpVfWtoQvZXaOaSbATZ9JdOe
f6qnop3UKpUV5ggocujNnIyBbGzHBZkv+V5AVJjpz6Grqogdyf5EemDpL8MONdULcVuGfoIuZi1D164PtV9f0kJHlMVV2T5MeGLmqxpbuSwpur6g7g3CSfZoQLY2aY2nt/1Yy2sa2UnvJHdH/4ru2PVwKvHq6cQd1NN+X2MrYfo1/y0yvHHvQ39r3YvwMuTXI7cP3ANS26qqoklwDP7o+vG7aiYfWf7tT5R+D9dMOZd9DtwT7WT3p/138tO6M/GTul37HwicCnq+rfh65nsSVZC/xlVX116FqWgiQnAD/O9ifp3zhcRcNIciHd3PkL+qZXAvtU1cuHq0q7y6AX8OBMk0OB6+g+ooaus7/kt2Cdb0nOpVsV+lK6Td5OBr5SVacOWtgAklxdVYftqm0MknyDnVxr2lk3Wk5+FtgX+A/98efoPqqP0Qv7SyheVVVvSHIOMNZZOBuTHF1VXwZI8nzGe83lmeds9gZeDuw3UC27ZeyzbjTtJOCvgf2BFf3t0V1Rqfe9/vs9SZ4M3AscOGA9Q/oJ4ItJrktyHd34/POSbO4vxDEaVfXtGV83VdVfACcMXdds2KPXlFOBo6vqboAkb6X7pR7j8vdP9Sfp30Y31RS6IZwxOm7oApaKHabYPoquh78sMnRZFKlFEbafH30/0zsWjs3bgd+hG8b6EvB54H2DVjSQqhrdLLRHcM6M2/cB3wBOGaiW3WLQa8oHgSuSfKI/Pgk4f8B6hrQWuAt4V3/8SuBDLJNfai2M5Tzt1lk3elD/0fTF/eHnq2rTkPUMxZkm2pkk/xN4W7+wkH6f/tdW1Z8OW9mueTJWD6qqjVX1rv5rlCHf29hfNxYY/UwTTTt+KuQBqup24OcGrGfWHLqRev0OnkW3W+MXk3yzP34acM2QtWlJ2KPfKuXfAJI8FnjMwDXNikEvTfv5oQvQknYBsC7JB/vjVzO9ffOS5hi9JM1SkuOZ3sb80qr6zJD1zJZBL0mNc+hGkmahv/ToVM94L7pzOXcvh0uOGvSSNAtV9YNTt/trOJxId1W6Jc+hG0maoySbquqIoevYFXv0kjQLSX5xxuHUXjffH6ic3WLQS9Ls/MKM2/fRXbthWezwatBL0uw8Cjhzhy0QzgF+fdCqZsEtECRpdp6zky0Qlvz4PBj0kjRbj+p78QAk2Y9lMiqyLIqUpCXgHOBLST7WH78ceNOA9cya0yslaZaSHAYc0x9+tqquHrKe2TLoJalxjtFLUuMMeklqnEEvSY0z6CWpcQa9JDXu/wM5ZaZ+Wdqm2gAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "stock.plot(kind='bar')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can sort the data according to the values in the Series (and then plot to visualise):" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAEkCAYAAAAhJPoXAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAFJFJREFUeJzt3X2UZVV95vHvI4iIiQLSw0JQm0QSJSpCWsSXcSLkBUIirBkkxkxkDBmSGRKIMRlJJhnUWY4vIzFqVCSg02ZYKhINLofRsBCiRkWbl4ACWWEhCAwvrQFE0ISX3/xxTqWLpqGLulW9q/b5ftaqVffse27d3zrd9dS5++y9T6oKSVK/HtO6AEnS8jLoJalzBr0kdc6gl6TOGfSS1DmDXpI6Z9BLUucMeknqnEEvSZ3bvnUBALvttlutXbu2dRmStKpcfPHF366qNVvbb0UE/dq1a9mwYUPrMiRpVUly/UL2s+tGkjpn0EtS5wx6SeqcQS9JnTPoJalzBr0kdc6gl6TObTXok3wwyW1Jvj6vbdck5yX5h/H7LmN7krw7yTVJLk9ywHIWL0nauoWc0f8v4NDN2k4Czq+qfYDzx22Aw4B9xq/jgPcvTZmSpMXa6szYqvp8krWbNR8B/NT4eD1wIfD6sf3DNdxx/CtJdk6yR1XdvFQFS9Ks3vubn2tdAsefevA2e6/F9tHvPi+8bwF2Hx/vCdwwb78bxzZJUiMzX4wdz97r0b4uyXFJNiTZsHHjxlnLkCQ9jMUG/a1J9gAYv982tt8EPHXefnuNbQ9RVadV1bqqWrdmzVYXX5MkLdJig/5TwDHj42OAc+a1v3ocfXMQcKf985LU1lYvxib5CMOF192S3AicDLwVOCvJscD1wNHj7ucCPw9cA9wDvGYZapYkPQoLGXXzyw/z1CFb2LeA42ctSpK0dJwZK0mdM+glqXMGvSR1zqCXpM4Z9JLUOYNekjpn0EtS5wx6SeqcQS9JnTPoJalzBr0kdc6gl6TOGfSS1DmDXpI6Z9BLUucMeknqnEEvSZ0z6CWpcwa9JHXOoJekzhn0ktQ5g16SOmfQS1LnDHpJ6pxBL0mdM+glqXMGvSR1zqCXpM4Z9JLUOYNekjpn0EtS5wx6SercTEGf5LVJvpHk60k+kmTHJHsnuSjJNUk+lmSHpSpWkvToLTrok+wJnACsq6pnA9sBrwTeBryzqp4B3A4cuxSFSpIWZ9aum+2BxyfZHtgJuBk4GDh7fH49cOSM7yFJmsGig76qbgLeAXyLIeDvBC4G7qiq+8bdbgT2nLVISdLizdJ1swtwBLA38BTgCcChj+L1xyXZkGTDxo0bF1uGJGkrZum6+Wngm1W1saruBT4BvBjYeezKAdgLuGlLL66q06pqXVWtW7NmzQxlSJIeySxB/y3goCQ7JQlwCHAlcAFw1LjPMcA5s5UoSZrFLH30FzFcdL0EuGL8WacBrwd+N8k1wJOBM5agTknSIm2/9V0eXl
WdDJy8WfO1wIGz/FxJ0tJxZqwkdc6gl6TOGfSS1DmDXpI6Z9BLUucMeknqnEEvSZ0z6CWpcwa9JHXOoJekzhn0ktQ5g16SOmfQS1LnDHpJ6pxBL0mdM+glqXMGvSR1zqCXpM4Z9JLUOYNekjpn0EtS5wx6SeqcQS9JnTPoJalzBr0kdc6gl6TOGfSS1DmDXpI6Z9BLUucMeknqnEEvSZ0z6CWpczMFfZKdk5yd5OokVyV5YZJdk5yX5B/G77ssVbGSpEdv1jP6dwGfqapnAvsBVwEnAedX1T7A+eO2JKmRRQd9kicBLwXOAKiqf66qO4AjgPXjbuuBI2ctUpK0eLOc0e8NbAQ+lOTSJKcneQKwe1XdPO5zC7D7rEVKkhZvlqDfHjgAeH9V7Q/czWbdNFVVQG3pxUmOS7IhyYaNGzfOUIYk6ZHMEvQ3AjdW1UXj9tkMwX9rkj0Axu+3benFVXVaVa2rqnVr1qyZoQxJ0iNZdNBX1S3ADUl+fGw6BLgS+BRwzNh2DHDOTBVKkmay/Yyv/23gzCQ7ANcCr2H443FWkmOB64GjZ3wPSdIMZgr6qroMWLeFpw6Z5edKkpaOM2MlqXMGvSR1zqCXpM4Z9JLUOYNekjpn0EtS5wx6SeqcQS9JnTPoJalzBr0kdc6gl6TOGfSS1DmDXpI6Z9BLUucMeknqnEEvSZ0z6CWpcwa9JHXOoJekzhn0ktQ5g16SOmfQS1LnDHpJ6pxBL0mdM+glqXMGvSR1zqCXpM4Z9JLUOYNekjpn0EtS5wx6SeqcQS9JnZs56JNsl+TSJJ8et/dOclGSa5J8LMkOs5cpSVqspTijPxG4at7224B3VtUzgNuBY5fgPSRJizRT0CfZCzgcOH3cDnAwcPa4y3rgyFneQ5I0m1nP6P8U+C/AA+P2k4E7quq+cftGYM8Z30OSNINFB32SXwBuq6qLF/n645JsSLJh48aNiy1DkrQVs5zRvxh4eZLrgI8ydNm8C9g5yfbjPnsBN23pxVV1WlWtq6p1a9asmaEMSdIjWXTQV9UfVNVeVbUWeCXwuar6FeAC4Khxt2OAc2auUpK0aMsxjv71wO8muYahz/6MZXgPSdICbb/1Xbauqi4ELhwfXwscuBQ/V5I0O2fGSlLnDHpJ6pxBL0mdM+glqXMGvSR1zqCXpM4Z9JLUOYNekjpn0EtS5wx6SeqcQS9JnTPoJalzBr0kdc6gl6TOGfSS1DmDXpI6Z9BLUucMeknqnEEvSZ0z6CWpcwa9JHXOoJekzhn0ktQ5g16SOmfQS1LnDHpJ6pxBL0mdM+glqXMGvSR1zqCXpM4Z9JLUOYNekjq3/WJfmOSpwIeB3YECTquqdyXZFfgYsBa4Dji6qm6fvVRJs7jqmc9qXQLPuvqq1iVM0ixn9PcBr6uqfYGDgOOT7AucBJxfVfsA54/bkqRGFh30VXVzVV0yPr4LuArYEzgCWD/uth44ctYiJUmLtyR99EnWAvsDFwG7V9XN41O3MHTtSJIamTnok/wQ8JfA71TVd+c/V1XF0H+/pdcdl2RDkg0bN26ctQxJ0sOYKeiTPJYh5M+sqk+Mzbcm2WN8fg/gti29tqpOq6p1VbVuzZo1s5QhSXoEiw76JAHOAK6qqj+Z99SngGPGx8cA5yy+PEnSrBY9vBJ4MfCrwBVJLhvb/hB4K3BWkmOB64GjZytRkjSLRQd9VX0RyMM8fchif64kaWk5M1aSOmfQS1LnDHpJ6pxBL0mdM+glqXMGvSR1zqCXpM4Z9JLUOYNekjpn0EtS5wx6SeqcQS9JnTPoJalzBr0kdc6gl6TOzXLjEWnFe87657QugSuOuaJ1CZo4z+glqXMGvSR1zqCXpM4Z9JLUOYNekjpn0EtS5wx6SeqcQS9JnTPoJalzBr0kdc6gl6TOGfSS1DmDXpI6Z9BLUudcprhHb3hS6wrgDXe2rkDSyDN6SercsgR9kkOT/H2Sa5KctBzvIUlamCUP+iTbAe8FDgP2BX
45yb5L/T6SpIVZjj76A4FrqupagCQfBY4ArlyG9/oXa0/6P8v54xfkurce3roESXqI5ei62RO4Yd72jWObJKmBZqNukhwHHDdufi/J37eqZZ7dgG8v9sV52xJW0t5Mx4I3ZukqaWu24wDkP3gs/kU8FnN+6wNLUsfTF7LTcgT9TcBT523vNbY9SFWdBpy2DO+/aEk2VNW61nWsBB6LgcdhE4/FJqvtWCxH183XgH2S7J1kB+CVwKeW4X0kSQuw5Gf0VXVfkt8CPgtsB3ywqr6x1O8jSVqYZemjr6pzgXOX42cvsxXVldSYx2LgcdjEY7HJqjoWqarWNUiSlpFLIEhS5wx6SeqcQS9JnZts0CfZLskFretYKTL490n+27j9tCQHtq6rpSQ7ta5BK0uSlyd5x/j1i63rWajJrkdfVfcneSDJk6rKxdPhfcADwMHAm4C7gL8Ent+yqBaSvAg4Hfgh4GlJ9gN+o6r+c9vK2khyOPATwI5zbVX1pnYVtZHkLQxreZ05Np2Q5IVV9YcNy1qQyQb96HvAFUnOA+6ea6yqE9qV1MwLquqAJJcCVNXt44S3KXon8HOME/2q6u+SvLRtSW0kORXYCXgZwx+/o4CvNi2qncOB51XVAwBJ1gOXAgb9CveJ8Utw77jEdAEkWcNwhj9JVXVDHrwuy/2tamnsRVX13CSXV9Ubk5wC/N/WRTW0M/CP4+MVcCu3hZl00FfV+iSPB55WVSthUbWW3g18EvhXSd7McOb2R21LauaGsfumkjwWOBG4qnFNrXx//H5PkqcA3wH2aFhPS28BLh2v7QV4KfAHbUtamElPmBovprwD2KGq9k7yPOBNVfXyxqU1keSZwCEM/4nPr6pJhluS3YB3AT/NcCz+Gjixqr7TtLAGkvwx8B6G/xfvZfjEd3pV/XHTwhpJsgebrlt9tapuaVnPQk096C9muPh4YVXtP7Z9vaqe3baybS/Jrltovquq7t3mxWhFSvI4YMepDl5Icn5VHbK1tpVo0l03wL1VdedmfbFT7Ze+hGF56dsZzmJ3Bm5JcivwH6vq4pbFbQtJ3sN4jWJLJnqRfm4U0lrGvEhCVX24aVHbUJIdGS5I75ZkF4bfD4AnskpuqjT1oP9GklcB2yXZBzgB+FLjmlo5Dzi7qj4LkORngX8HfIhh6OULGta2rWxoXcBKk+QvgB8FLmPTBekCJhP0wG8AvwM8BbiYTUH/XeDPWhX1aEy962Yn4L8CP8vwj/dZ4L9X1Q+aFtZAkiuq6jmbtV0+jri4rKqe16q2VpI8Eaiquqt1La0kuQrYt6YcFKMkv11V72ldx2JMOujn+AsNSf4aOB/46Nj0S8DPAIcCX6uqA1rVtq0lWcfwSeaHGU4A7gB+bQrdV5tL8nHghKq6uXUtK0GSZwP78uDJYyv+082kgz7J84EPMvxCA9zJdH+hdwNOBl4yNv0t8EaGY/K0qrqmVW3bWpLLgeOr6gvj9kuA91XVc9tWtu2NQwmfxzBJ6p/m2qc4Mi3JycBPMQT9ucBhwBer6qiWdS3E1IPeX2g9RJJL50ZhzWu7ZEqfauYk+Tdbaq+qv9nWtbSW5ApgP+DSqtovye7A/66qn2lc2lZN/WLs/XMhD1BVX0xyX8uCWknyY8DvMW90BUBVHdyqpob+JskHgI8wXHj8JeDCJAcAVNUlLYvblqYY6I/g+1X1QJL7xu7e2xhGqq14kwz6uV9YHuYXulVdjX0cOJVhPZOpTvefs9/4/eTN2vdn+H8ymT9+SQ5imDD1LGAHhvtA311VT2xaWBsbkuwM/DnD6JvvAV9uW9LCTLLrZivLE9cUz2KTXFxVP9m6Dq0sSTYAr2Q4EVgHvBr4sapaFVP/l0uStcATq+ryxqUsyCSDXg+V5A0MH0U/yYMvuv3jw72mV0mezKYL0wV8kWFpjCkugbChqtbNDbUd2x5yDWMKnBm7So0fw17NQ/ulpzgD8pjx++/PayvgRxrU0tpHgc8zTBgD+B
XgYwxr30zNPeNy1ZcleTtwMxO7YVEPM2MnfUaf5EvAV4ArmLf0QVWtb1aUmtvSekdbmlA2BUmeDtzK0D//Woaled83seG2J7JpZuz/m/fUd4E/r6oVPzt26kE/ySFzD2e1TgZZakn+hGHc+Flj01HAgVX1e+2qUmvOjF2lkryW4cr5p7FfetVOBllqSe4CnsCm0UfbsekOZDWlESdJXgy8AXg6D+7enFyX3tiF9ZsM69DDMELvA6thhdepB/3xwJsZprjPHYia6H/iVTsZZDmMyzbvw4M/3UxuTHmSqxm6bC5m3rDbiV6YPh14LDDXtfurDHNxfr1dVQsz6YuxwOuAZ1TVt1sXsgKs2skgSy3JrzPcVWovhlUbD2JY1XTFj65YBndW1ZRvHTjf86tqv3nbn0vyd82qeRQmdfV8C64B7mldxAqx+WSQS1glk0GWwYkMdxG6vqpexjBRapI32wAuSPI/k7wwyQFzX62LauT+JD86t5HkR1glkwunfkZ/N8OwsQt4cB/9pIZXZrjzyluq6g7g1CSfYRVNBlkGP6iqHyQhyeOq6uokP966qEbm7kOwbl7bpGYHz/P7DH/4rh231wKvaVfOwk096P9q/Jq0qqok5wLPGbeva1tRczeOn27+Cjgvye3A9Y1ramL8RKPB3wIfYOjCu4Ph/hWr4lPvpC/GapMk64E/q6qvta5lJRlXb3wS8Jmq+ufW9bSQ5HDgJ3jwhek3tauojSRnMYydP3NsehWwc1W9ol1VCzPpoE/yTbZwj9CJjrq5mmGUyXUMXVphONl3yeYJS3Iqw6zQlzEseHcU8NWqOrZpYQ0kubKq9t1a20o09a6b+f2OOwKvAHZtVEtrPwfsAvzrcfvzDB9PNW0vGm8neXlVvTHJKcBUR+FckuSgqvoKQJIXsEruMzzpUTdV9Z15XzdV1Z8Ch7euq5Ejgb8AdgPWjI8ndxchPcT3x+/3JHkKcC+wR8N6WvpJ4EtJrktyHUP//POTXDHexGjFmvQZ/WbDxB7DcIY/1WNyLHBQVd0NkORtDP+RV+WUby2ZT48Xpt/OMOwWhi6cKTq0dQGLNdVQm3PKvMf3Ad8Ejm5US2vhwWOC72fTKn2arncA/4mhS+/LwBeA9zetqJGqWrUjryYd9A4de5APARcl+eS4fSRwRsN6tDKsB+4C3j1uvwr4MNM9IVqVpj7q5n8Abx8nCjGuNf26qvqjtpW1MXZlvWTc/EJVXdqyHrW3mkeaaJNJX4wFDpsLeYCquh34+Yb1NFVVl1TVu8cvQ14wjjSZ21hNI020yaS7boDtxinu/wSQ5PHA4xrXJDU3rmZaDKs1finJt8btpwNXt6xNj97Ug/5M4PwkHxq3X8OmJUilKfuF1gVo6Uy6jx4gyWFsWn72vKr6bMt6JGmpTT7oJal3k+66GW8ZN/eXbgeG/si7p3SrOEn9m3TQV9UPzz0e12Q/guFuQpLUDbtuNpPk0qrav3UdkrRUJn1Gn+TfztucW+vmB43KkaRlMemgB35x3uP7GNZid8VGSV2ZetA/BjhxsyUQTgF+rWlVkrSEpr4EwnO3sASC/fOSujL1oH/MeBYPQJJd8VOOpM5MPdROAb6c5OPj9iuANzesR5KW3OSHVybZFzh43PxcVV3Zsh5JWmqTD3pJ6t3U++glqXsGvSR1zqCXpM4Z9JLUOYNekjr3/wFdjaZ+Fwg3rwAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "stock.sort_values().plot(kind='bar')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or sort the index to get alphabetical order of our fruit and vegetables:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAEkCAYAAAAhJPoXAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAFJJJREFUeJzt3X2Q5VV95/H3RxBRE3mQWQpBHRJJlKgIGREf1o2QBwiJULtIjNk4a8iS7JJAjMlKssmibrk+rMSoiSIB3TFLqUg0WC6roUaIGpU4AwQUSGUKQWB5GA0ggiaA3/3j9+tMMzMwzb3dc7rPfb+quvr+zr2377fudH/m3PM75/xSVUiS+vWY1gVIkpaWQS9JnTPoJalzBr0kdc6gl6TOGfSS1DmDXpI6Z9BLUucMeknq3K6tCwDYZ599avXq1a3LkKQVZePGjd+sqlU7etyyCPrVq1ezYcOG1mVI0oqS5MaFPM6hG0nqnEEvSZ0z6CWpcwa9JHXOoJekzhn0ktQ5g16SOrfDoE/ygSR3JPnqvLa9k1yc5B/G73uN7Uny7iSbklyV5LClLF6StGML6dH/L+DordpOB9ZX1UHA+vEY4BjgoPHrZOB9i1OmJGlSO1wZW1WfS7J6q+bjgJ8Yb68DLgVeP7Z/qIYrjn85yZ5J9quqWxerYEma1p/++mdbl8ApZx25015r0jH6feeF923AvuPt/YGb5j3u5rFNktTI1Cdjx957PdrnJTk5yYYkGzZv3jxtGZKkhzFp0N+eZD+A8fsdY/stwFPnPe6AsW0bVXV2Va2pqjWrVu1w8zVJ0oQmDfpPAmvH22uBC+e1v3qcfXMEcLfj85LU1g5Pxib5MMOJ132S3AycAbwVOD/JScCNwInjwy8CfhbYBNwHvGYJapYkPQoLmXXziw9z11HbeWwBp0xblCRp8bgyVpI6Z9BLUucMeknqnEEvSZ0z6CWpcwa9JHXOoJekzhn0ktQ5g16SOmfQS1LnDHpJ6pxBL0mdM+glqXMGvSR1zqCXpM4Z9JLUOYNekjpn0EtS5wx6SeqcQS9JnTPoJalzBr0kdc6gl6TOGfSS1DmDXpI6Z9BLUucMeknqnEEvSZ0z6CWpcwa9JHXOoJekzhn0ktS5qYI+yWuTfC3JV5N8OMnuSQ5MclmSTUk+mmS3xSpWkvToTRz0SfYHTgXWVNWzgV2AVwJvA95ZVc8A7gROWoxCJUmTmXboZlfg8Ul2BZ4A3AocCVww3r8OOH7K15AkTWHioK+qW4B3AN9gCPi7gY3AXVX1wPiwm4H9py1SkjS5aYZu9gKOAw4EngI8ETj6UTz/5CQbkmzYvHnzpGVIknZgmqGbnwS+XlWbq+p+4OPAi4E9x6EcgAOAW7b35Ko6u6rWVNWaVatWTVGGJOmRTBP03wCOSPKEJAGOAq4BLgFOGB+zFrhwuhIlSdOYZoz+MoaTrpcDV48/62zg9cBvJ9kEPBk4dxHqlCRNaNcdP+ThVdUZwBlbNV8PHD7Nz5UkLR5XxkpS
5wx6SeqcQS9JnTPoJalzBr0kdc6gl6TOGfSS1DmDXpI6Z9BLUucMeknqnEEvSZ0z6CWpcwa9JHXOoJekzhn0ktQ5g16SOmfQS1LnDHpJ6pxBL0mdM+glqXMGvSR1zqCXpM4Z9JLUOYNekjpn0EtS5wx6SeqcQS9JnTPoJalzBr0kdc6gl6TOGfSS1DmDXpI6N1XQJ9kzyQVJrktybZIXJtk7ycVJ/mH8vtdiFStJevSm7dG/C/h0VT0TOAS4FjgdWF9VBwHrx2NJUiMTB32SPYCXAucCVNU/V9VdwHHAuvFh64Djpy1SkjS5aXr0BwKbgQ8muSLJOUmeCOxbVbeOj7kN2HfaIiVJk5sm6HcFDgPeV1WHAvey1TBNVRVQ23tykpOTbEiyYfPmzVOUIUl6JNME/c3AzVV12Xh8AUPw355kP4Dx+x3be3JVnV1Va6pqzapVq6YoQ5L0SCYO+qq6DbgpyY+OTUcB1wCfBNaObWuBC6eqUJI0lV2nfP5vAucl2Q24HngNw38e5yc5CbgROHHK15AkTWGqoK+qK4E127nrqGl+riRp8bgyVpI6Z9BLUucMeknqnEEvSZ0z6CWpcwa9JHXOoJekzhn0ktQ5g16SOmfQS1LnDHpJ6pxBL0mdM+glqXMGvSR1zqCXpM4Z9JLUOYNekjpn0EtS5wx6SeqcQS9JnTPoJalzBr0kdc6gl6TOGfSS1DmDXpI6Z9BLUucMeknqnEEvSZ0z6CWpcwa9JHXOoJekzhn0ktS5qYM+yS5JrkjyqfH4wCSXJdmU5KNJdpu+TEnSpBajR38acO2847cB76yqZwB3AictwmtIkiY0VdAnOQA4FjhnPA5wJHDB+JB1wPHTvIYkaTrT9uj/GPgvwPfH4ycDd1XVA+PxzcD+U76GJGkKEwd9kp8D7qiqjRM+/+QkG5Js2Lx586RlSJJ2YJoe/YuBlye5AfgIw5DNu4A9k+w6PuYA4JbtPbmqzq6qNVW1ZtWqVVOUIUl6JBMHfVX9XlUdUFWrgVcCn62qXwIuAU4YH7YWuHDqKiVJE1uKefSvB347ySaGMftzl+A1JEkLtOuOH7JjVXUpcOl4+3rg8MX4uZKk6bkyVpI6Z9BLUucMeknqnEEvSZ0z6CWpcwa9JHXOoJekzhn0ktQ5g16SOmfQS1LnDHpJ6pxBL0mdM+glqXMGvSR1zqCXpM4Z9JLUOYNekjpn0EtS5wx6SeqcQS9JnTPoJalzBr0kdc6gl6TOGfSS1DmDXpI6Z9BLUucMeknqnEEvSZ0z6CWpcwa9JHXOoJekzhn0ktS5XSd9YpKnAh8C9gUKOLuq3pVkb+CjwGrgBuDEqrpz+lK1YG/Yo3UF8Ia7W1cgaTRNj/4B4HVVdTBwBHBKkoOB04H1VXUQsH48liQ1MnHQV9WtVXX5ePse4Fpgf+A4YN34sHXA8dMWKUma3KKM0SdZDRwKXAbsW1W3jnfdxjC0I0lqZOqgT/IDwF8Av1VV355/X1UVw/j99p53cpINSTZs3rx52jIkSQ9jqqBP8liGkD+vqj4+Nt+eZL/x/v2AO7b33Ko6u6rWVNWaVatWTVOGJOkRTBz0SQKcC1xbVX80765PAmvH22uBCycvT5I0rYmnVwIvBn4ZuDrJlWPb7wNvBc5PchJwI3DidCVKkqYxcdBX1ReAPMzdR036cyVJi8uVsZLUOYNekjpn0EtS5wx6SeqcQS9JnTPoJalzBr0kdc6gl6TOGfSS1DmDXpI6Z9BLUucMeknqnEEvSZ0z6CWpcwa9JHVumguPLCurT/8/rUvghrce27oESdqGPXpJ6pxBL0mdM+glqXMGvSR1zqCXpM4Z9JLUOYNekjpn0EtS5wx6SeqcQS9JnTPoJalzBr0kdc6gl6TOGfSS1LlutimW9MiufeazWpfAs667tnUJM8kevSR1bkmCPsnRSf4+yaYkpy/Fa0iSFmbRgz7JLsCfAscABwO/mOTgxX4dSdLCLMUY/eHApqq6HiDJ
R4DjgGuW4LWkR/Scdc9pXQJXr726dQmacUsxdLM/cNO845vHNklSA81m3SQ5GTh5PPxOkr9vVcs8+wDfnPTJedsiVtLeVO8Fb8ziVdLWdO8DkP/ge/Ev4nsx5zfevyh1PH0hD1qKoL8FeOq84wPGtoeoqrOBs5fg9SeWZENVrWldx3LgezHwfdjC92KLlfZeLMXQzVeAg5IcmGQ34JXAJ5fgdSRJC7DoPfqqeiDJbwCfAXYBPlBVX1vs15EkLcySjNFX1UXARUvxs5fYshpKasz3YuD7sIXvxRYr6r1IVbWuQZK0hNwCQZI6Z9BLUucMeknqnEEPJHlC6xq0PCTZJcklretYLjL490n+23j8tCSHt66rlSQvT/KO8evnW9ezUDO9H32SFwHnAD8APC3JIcCvVdV/bltZG0mOBX4M2H2urare1K6ina+qHkzy/SR7VNXdretZBt4LfB84EngTcA/wF8DzWxbVQpK3MOzldd7YdGqSF1bV7zcsa0FmOuiBdwI/w7igq6r+LslL25bURpKzgCcAL2P4z+8E4G+bFtXOd4Crk1wM3DvXWFWntiupmRdU1WFJrgCoqjvHhZCz6FjgeVX1fYAk64ArAIN+uauqm/LQ/TcebFVLYy+qqucmuaqq3pjkTOD/ti6qkY+PX4L7x63HCyDJKoYe/qzaE/jH8fYeLQt5NGY96G8ah28qyWOB04BZvdbZd8fv9yV5CvAtYL+G9TRTVeuSPB54WlUth832Wno38AngXyV5M8MnvT9oW1IzbwGuGM/hBHgp8HttS1qYmV4wlWQf4F3ATzL8w/0VcFpVfatpYQ0k+UPgPcBRDBeOKeCcqvrDpoU1MJ5kewewW1UdmOR5wJuq6uWNS2siyTMZfi8CrK+qWe0MkWQ/tpyf+Nuquq1lPQs100Gv7UvyOGD3WT0ZmWQjw8nHS6vq0LHtq1X17LaV7XxJ9t5O8z1Vdf9OL6axJOur6qgdtS1HMzl0k+Q9jGOO2zOjJ93mZiGtZvy9SEJVfahpUW3cX1V3b3XuZlbHpS9n2Hb8ToYe/Z7AbUluB/5jVW1sWdzOkGR3hokK+yTZi+F9AHgSK+SiSjMZ9MCG1gUsN0n+HPhh4Eq2nJAuYBaD/mtJXgXskuQg4FTgi41rauVi4IKq+gxAkp8G/h3wQYaply9oWNvO8mvAbwFPATayJei/DfxJq6IeDYdugCRPAqqq7mldSytJrgUOLn8h5hbQ/Vfgpxn+qD8D/Peq+l7TwhpIcnVVPWertqvGGVpXVtXzWtW2syX5zap6T+s6JjHTQZ9kDUPP5AcZ/qDvAn5lFj6Obi3Jx4BTq+rW1rUsF3YAIMlfAeuBj4xNvwD8FHA08JWqOqxVbS0keTZwMA9dVLjsP/XOetBfBZxSVZ8fj18CvLeqntu2sp1vnDL2PIZFUv801z6LM02SPB/4AEMHAOBuZrcDsA9wBvCSselvgDcyvCdPq6pNrWrb2ZKcAfwEQ9BfBBwDfKGqTmhZ10LMetBfMTerYl7b5bPWSwFI8m+2115Vf72za2nNDoC2J8nVwCHAFVV1SJJ9gf9dVT/VuLQdmtWTsXP+Osn7gQ8znHj8BeDSJIcBVNXlLYvbmWYx0B/Bg3MhD1BVX0jyQMuCWknyI8DvMG82FkBVHdmqpoa+W1XfT/LAOKx3B8OMpGVv1oP+kPH7GVu1H8oQ/DPzy5zkCIYFU88CdmO43u+9VfWkpoXtRHP/wfMwHYBWdTX2MeAshv2PZnV7kDkbkuwJ/BnD7JvvAF9qW9LCzPTQjbZIsgF4JcMf9hrg1cCPVNWKWOK9GHawPXHNYi82ycaq+vHWdSw3SVYDT6qqqxqXsiAzHfRJnsyWE00FfIFhqfssboGwoarWzE2dG9u2OYeh2ZLkDQxDFJ/goSfp//HhntMrV8auXB8BPsewAATgl4CPMux9M2vuG7efvTLJ24FbmdEL04wfz1/NtuPSs7hieu34/XfntRXwQw1qaaKH
lbGz3qPfZv+S7S0QmQVJng7czjA+/1qGLVjfO0vT5+Yk+SLwZeBq5m19UFXrmhWlZpKcxpaVsf9v3l3fBv6sqpb96thZD/o/Ypg3fv7YdAJweFX9Truq1NqsTrF9OCt1kdBic2XsCpXkHuCJbJlNsAtbrihUMzbj5MXAG4Cn89Dhipn5iD4nyWsZZlR8CselV+wiocU2Dm3+OsM+9DDMxHr/StjJc6aDHv5lG9aDeGhvZebmlCe5jmHIZiPzptHN6InpU4A3M2yJMfcHUjP6n96KXSS02JKcAzwWmBvC+2WGNRe/2q6qhZnpk7FJfpXhqlIHMOzaeATDLoXL/iz6Eri7qmb10oFbex3wjKr6ZutCloEVu0hoCTy/qg6Zd/zZJH/XrJpHYSZnVcxzGsPVYm6sqpcxLJSayYttAJck+Z9JXpjksLmv1kU1sgm4r3URy8TWi4QuZ4UsEloCDyb54bmDJD/ECllENtM9euB7VfW9JCR5XFVdl+RHWxfVyNy+4mvmtc3U6uB57mWYZnoJDx2jn6nplRmuvPKWqroLOCvJp1lBi4SWwO8ydIiuH49XA69pV87CzXrQ3zz2Vv4SuDjJncCNjWtqYvxEo8Ffjl8zraoqyUXAc8bjG9pW1NzfAO9nGNq9i+E6BSvi083Mn4ydM+7euAfw6ar659b1tJDkWODHeOiJ6Te1q0itJVkH/ElVfaV1La0lOZ9h7vx5Y9OrgD2r6hXtqloYg14AJDmLYfXfyxg2sDqB4Sr3JzUtrIEkX2c71xSe0Vk31zHMSruBYUgrDJ39mduyOck1VXXwjtqWo1kfutEWLxovD3dVVb0xyZnArM7CmX+eYnfgFcDejWpp7WeAvYB/PR5/jmHYYhZdnuSIqvoyQJIXsEKuPz3rs260xXfH7/cleQpwP7Bfw3qaqapvzfu6par+GDi2dV2NHA/8ObAPsGq8PXNXHRv9OPDFJDckuYFhfP75Sa4eL1azbNmj15xPjSem384wjQ6GIZyZs9W00scw9PBn9W/lJOCIqroXIMnbGAJuRW4FMKWjWxcwqVn95dW23gH8J4aP6F8CPg+8r2lF7Zw57/YDwNeBExvV0lp46FzxB9mye+NMqaoVOyPPoNecdcA9wLvH41cBH2IGA86ppg/xQeCyJJ8Yj48Hzm1YjybgrBsBK3tGwWJL8j+At48LhRj3IH9dVf1B28raGIeyXjIefr6qrmhZjx49T8ZqzuXjdWOBlTWjYAkcMxfyAFV1J/CzDetpqqour6p3j1+G/Ark0M2MG3cnLIZd+b6Y5Bvj8dOB61rW1tAu45YY/wSQ5PHA4xrXJE3MoNfPtS5gGToPWJ/kg+Pxa9iyNa204jhGL21HkmPYsl31xVX1mZb1SNMw6CWpcw7dSFsZLzE51wPajeH8xb2zdGlJ9cWgl7ZSVT84d3vck/04hquPSSuSQzfSAiS5oqoObV2HNAl79NJWkvzbeYdze918r1E50tQMemlbPz/v9gMMe7HP6o6N6oBBL23rMcBpW22BcCbwK02rkibkFgjStp67nS0QHJ/XimXQS9t6zNiLByDJ3vjpVyuYv7zSts4EvpTkY+PxK4A3N6xHmorTK6XtSHIwcOR4+NmquqZlPdI0DHpJ6pxj9JLUOYNekjpn0EtS5wx6SeqcQS9Jnfv/O72mfsjQ/YkAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "stock.sort_index().plot(kind='bar')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `Series` object has a number of numerical methods available, including `mean` and `sum`:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "137" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock.sum()" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "27.4" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It also behaves like a sequence in that the `len` function returns the number of data points in the Series object:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(stock)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Memory usage" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For larger data sets, it can be important to know how many bytes of memory the Series occupies. 
The bytes required to store the actual series data are available as" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "40" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock.nbytes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or from the underlying numpy array directly:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "40" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock.values.nbytes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is 40 bytes, because we have 5 elements stored as int64 (each needing 8 bytes):" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('int64')" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock.dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Series object needs additional memory. 
The total, including the memory used by the index, can be queried using:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "240" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock.memory_usage()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Statistics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Several summary statistics of the data in the `stock` Series object are available through `describe()`:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "count 5.000000\n", "mean 27.400000\n", "std 41.955929\n", "min 1.000000\n", "25% 3.000000\n", "50% 10.000000\n", "75% 22.000000\n", "max 101.000000\n", "dtype: float64" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As usual, the docstrings provide further documentation (`help(stock.describe)`), and the pandas home page (`https://pandas.pydata.org`) links to the full pandas documentation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Create Series from list\n", "\n", "In the example above, we showed how to create a Series from a dictionary, where the keys of the dictionary entries served as the index of the Series object. 
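When a dictionary is combined with an explicit `index`, pandas looks up each label among the dictionary keys and follows the order given by `index`. A label without a matching key becomes `NaN`. Below is a minimal sketch reusing the `stock_dic` values from earlier; the label `grapefruit` is an illustrative addition:

```python
import pandas as pd

stock_dic = {'apple': 10, 'orange': 3, 'banana': 22}

# Labels are looked up in stock_dic in the order given by index.
# 'grapefruit' is not a key, so its value is missing (NaN), which
# also turns the dtype into float64.
selected = pd.Series(stock_dic, index=['banana', 'apple', 'grapefruit'])
print(selected)
```

Dictionary keys that are not listed in `index` (here `orange`) are left out of the result.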
\n", "\n", "We can also create a Series from a list, and provide the labels through the `index` argument:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": true }, "outputs": [], "source": [ "stock = pd.Series([10, 3, 22], index=['apple', 'orange', 'banana'])" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "apple 10\n", "orange 3\n", "banana 22\n", "dtype: int64" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we omit the `index` argument, pandas uses a default integer index starting at 0:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": true }, "outputs": [], "source": [ "stock = pd.Series([10, 3, 22])" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 10\n", "1 3\n", "2 22\n", "dtype: int64" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, an index can be assigned afterwards:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": true }, "outputs": [], "source": [ "stock.index = ['apple', 'orange', 'banana']" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "apple 10\n", "orange 3\n", "banana 22\n", "dtype: int64" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotting data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Commonly used plots are easily accessible via the `plot()` method of the Series object. We have already seen a bar plot above. 
The `Series.plot()` method accepts a `kind` argument such as `kind=\"bar\"`; equivalently, each kind of plot is available as its own method, for example `Series.plot.bar()`.\n", "\n", "Further examples:" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "stock.plot.pie()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To tailor the plot, we can either modify the axes object returned by the plotting method (first example below), or create our own axes and pass it in through the `ax` argument (second example):" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "ax = stock.plot.pie()\n", "ax.set_aspect(1)\n", "ax.set_ylabel(None);\n", "ax.set_title(None);" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAhsAAADtCAYAAAAWc7eQAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAEhxJREFUeJzt3XuQJWV9xvHvkwWCiHJxJwgIbFS0RLlolouKBsULiAKpGBGjYooEEzFqFJWyLEUqMSheAcECvCAakERRQFCQkACRqAtyFS0tXQTCZUEgKyDXX/44vWFYZthh5rzTe2a+n6qpc/o93aef2ToFz3S/3SdVhSRJUit/0HcASZI0t1k2JElSU5YNSZLUlGVDkiQ1ZdmQJElNWTYkSVJTlg1Jc1aStyS5sO8c0nxn2ZBGVJI3JFmS5HdJbkhyVpKd+861QpKlSV4229tKWv1YNqQRlOTdwGeAjwIbAZsDRwN7TeO91pjKmCRNl2VDGjFJ1gMOBQ6sqm9W1Z1VdV9VnV5V7+3W+XKSfxy3zS5Jrhu3vDTJ+5NcDtyZZI1JxjZJ8o0ky5L8Osk7xr3HIUlOSfKVJMuTXJVkcffaiQwK0OndkZf3TfB7LExyRpLbk/w2yQVJ/mCybZPs2e3j9iT/keRZ495rsyTf7HLemuSoSf7tDk9yYfdvKGmWWDak0fN8YG3g1Bm+z77AHsD6VXX/ymPAg8DpwGXApsCuwLuSvHLce+wJnNytfxpwFEBVvQn4DfCaqlq3qj4+wf7fA1wHjDE4OvOBwaaP3DbJM4CTgHd165/JoIyslWQBcAZwDbCoy3ry+B11JeY4YBvgFVV1xzT+vSRNk2VDGj1PAm4ZVxCm64iquraq7p5kbHtgrKoOrap7q+pXwHHA68etf2FVnVlVDwAnAts+hv3fB2wMbNEdmbmgJv+ypn2A71TVOVV1H/AJ4HHAC4AdgE2A93ZHeX5fVeMnha7JoKhsyKDA3PUYMkoaAs/LSqPnVmBhkjVmWDiuXcXYFsAmSW4fN7YAuGDc8o3jnt8FrP0Ych0OHAKcnQTg2Ko6bJJ1N2Fw5AKAqnowybUMjmLcB1zzKPt8OoMStENV3TuFXJKGzCMb0ui5CLgH2PtR1rkTWGfc8pMnWGeiowjjx64Ffl1V64/7eUJVvWqKOR/1K6WranlVvaeqnsrgdMy7k+w6ybb/w6D8AJBBO9kMuL7LufmjTGq9Gvgr4Kwkz5xidklDZNmQRkw33+BDwOeS7J1knSRrJtk9yYq5EZcCr0qyYZInM5jr8Fj9CFjeTRp9XJIFSZ6TZPspbn8T8NTJXkzy6iRP74rDHcADDOaJTLTtKcAeSXZNsiaD+R73AD/oct4AHJbk8UnWTvLC8fuqqpMYzAn5fpKnTTG/pCGxbEgjqKo+Cbwb+CCwjMFf928HvtWtciKDiZ1LgbOBr09jHw8Arwa2A34N3AIcD0z1So5/Bj7YXT1y0ASvbwl8H/gdg6M1R1fVeRNtW1U/B94IHNnleA2D+Rf3djlfw+B0yW8YTDrdZ4Lf5wQGV/H8e5JFU/wdJA1BJp+PJUmSNHMe2ZAkSU1ZNiRJUlOWDUmS1JRlQ5IkNTWrN/VauHBhLVq0aDZ3KUmSGrn44otvqaqxVa03q2Vj0aJFLFmyZDZ3KUmSGklyzarX8jSKJElqzLIhSZKasmxIkqSmLBuSJKkpy4YkSWr
KsiFJkpqa1UtfJUlz19YnbN13hDnjiv2u6DvCUHlkQ5IkNWXZkCRJTVk2JElSU5YNSZLUlGVDkiQ1ZdmQJElNWTYkSVJTlg1JktSUZUOSJDVl2ZAkSU2tsmwk2SzJeUl+muSqJO/sxjdMck6SX3SPG7SPK0mSRs1UjmzcD7ynqrYCdgIOTLIVcDBwblVtCZzbLUuSJD3MKstGVd1QVZd0z5cDVwObAnsBJ3SrnQDs3SqkJEkaXY9pzkaSRcBzgR8CG1XVDd1LNwIbTbLNAUmWJFmybNmyGUSVJEmjaMplI8m6wDeAd1XV/45/raoKqIm2q6pjq2pxVS0eGxubUVhJkjR6plQ2kqzJoGh8raq+2Q3flGTj7vWNgZvbRJQkSaNsKlejBPgCcHVVfWrcS6cB+3XP9wO+Pfx4kiRp1K0xhXVeCLwJuCLJpd3YB4DDgFOS7A9cA7yuTURJkjTKVlk2qupCIJO8vOtw40iSpLnGO4hKkqSmLBuSJKkpy4YkSWrKsiFJkpqybEiSpKYsG5IkqSnLhiRJasqyIUmSmrJsSJKkpiwbkiSpKcuGJElqyrIhSZKasmxIkqSmLBuSJKkpy4YkSWrKsiFJkpqybEiSpKYsG5IkqSnLhiRJasqyIUmSmrJsSJKkpiwbkiSpKcuGJElqyrIhSZKasmxIkqSmLBuSJKmpVZaNJF9McnOSK8eNHZLk+iSXdj+vahtTkiSNqqkc2fgysNsE45+uqu26nzOHG0uSJM0VqywbVXU+8NtZyCJJkuagmczZeHuSy7vTLBsMLZEkSZpTpls2jgGeBmwH3AB8crIVkxyQZEmSJcuWLZvm7iRJ0qiaVtmoqpuq6oGqehA4DtjhUdY9tqoWV9XisbGx6eaUJEkjalplI8nG4xb/DLhysnUlSdL8tsaqVkhyErALsDDJdcCHgV2SbAcUsBR4a8OMkiRphK2ybFTVvhMMf6FBFkmSNAd5B1FJktSUZUOSJDVl2ZAkSU1ZNiRJUlOWDUmS1JRlQ5IkNWXZkCRJTVk2JElSU5YNSZLUlGVDkiQ1ZdmQJElNWTYkSVJTlg1JktSUZUOSJDVl2ZAkSU1ZNiRJUlOWDUmS1JRlQ5IkNWXZkCRJTVk2JElSU5YNSZLUlGVDkiQ1ZdmQJElNWTYkSVJTlg1JktSUZUOSJDVl2ZAkSU2tsmwk+WKSm5NcOW5swyTnJPlF97hB25iSJGlUTeXIxpeB3VYaOxg4t6q2BM7tliVJkh5hlWWjqs4HfrvS8F7ACd3zE4C9h5xLkiTNEWtMc7uNquqG7vmNwEaTrZjkAOAAgM0333yau5tdiw7+Tt8R5pSlh+3RdwRJUo9mPEG0qgqoR3n92KpaXFWLx8bGZro7SZI0YqZbNm5KsjFA93jz8CJJkqS5ZLpl4zRgv+75fsC3hxNHkiTNNVO59PUk4CLgmUmuS7I/cBjw8iS/AF7WLUuSJD3CKieIVtW+k7y065CzSJKkOcg7iEqSpKYsG5IkqSnLhiRJasqyIUmSmrJsSJKkpiwbkiSpKcuGJElqyrIhSZKasmxIkqSmLBuSJKkpy4YkSWrKsiFJkpqybEiSpKYsG5IkqSnLhiRJasqyIUmSmrJsSJKkpiwbkiSpKcuGJElqyrIhSZKasmxIkqSmLBuSJKkpy4YkSWrKsiFJkpqybEiSpKYsG5Ikqak1ZrJxkqXAcuAB4P6qWjyMUJIkae6YUdnovKSqbhnC+0iSpDnI0yiSJKmpmZaNAs5OcnGSAyZaIckBSZYkWbJs2bIZ7k6SJI2amZaNnavqecDuwIFJXrzyClV1bFUtrqrFY2NjM9ydJEkaNTMqG1V1ffd4M3AqsMMwQkmSpLlj2mUjyeOTPGHFc+AVwJXDCiZJkuaGmVyNshFwapIV7/MvVfXdoaSSJElzxrTLRlX9Cth2iFkkSdIc5KWvkiSpKcuGJElqyrIhSZKasmxIkqSmLBuSJKkpy4YkSWrKsiFJkpqybEiSpKZ
mcgdRSbPtkPX6TjC3HHJH3wmkecEjG5IkqSnLhiRJasqyIUmSmrJsSJKkpiwbkiSpKcuGJElqyrIhSZKasmxIkqSmLBuSJKkpy4YkSWrKsiFJkpqybEiSpKYsG5IkqSnLhiRJasqyIUmSmrJsSJKkpiwbkiSpqRmVjSS7Jfl5kl8mOXhYoSRJ0twx7bKRZAHwOWB3YCtg3yRbDSuYJEmaG2ZyZGMH4JdV9auquhc4GdhrOLEkSdJcscYMtt0UuHbc8nXAjiuvlOQA4IBu8XdJfj6DferhFgK39B1iVfKxvhOoByPx2eQj6TuBZt9IfDbzlpH5bG4xlZVmUjampKqOBY5tvZ/5KMmSqlrcdw5pZX42tbrys9mPmZxGuR7YbNzyU7oxSZKk/zeTsvFjYMskf5xkLeD1wGnDiSVJkuaKaZ9Gqar7k7wd+B6wAPhiVV01tGSaCk9PaXXlZ1OrKz+bPUhV9Z1BkiTNYd5BVJIkNWXZkCRJTVk2JElSU5YNSZLUlGVjBCVZp+8M0soy8MYkH+qWN0+yQ9+5JPXPq1FGSJIXAMcD61bV5km2Bd5aVW/rOZpEkmOAB4GXVtWzkmwAnF1V2/ccTSLJHsCzgbVXjFXVof0lml88sjFaPg28ErgVoKouA17cayLpITtW1YHA7wGq6jZgrX4jSZDk88A+wN8DAf6CKX6nh4bDsjFiquralYYe6CWI9Ej3JVkAFECSMQZHOqS+vaCq3gzcVlUfAZ4PPKPnTPOKZWO0XNudSqkkayY5CLi671BS5wjgVOCPkvwTcCHw0X4jSQDc3T3elWQT4D5g4x7zzDvNv/VVQ/W3wGeBTRl86d3ZwIG9JpI6VfW1JBcDuzI4VL13VVmGtTo4I8n6wOHAJQyOvh3fb6T5xQmikoYiyYYTDC+vqvtmPYw0iSR/CKxdVXf0nWU+sWyMgCRH0p0Hn0hVvWMW40gTSrIU2Ay4jcGRjfWBG4GbgL+pqov7S6f5rjsFvYhxR/Sr6iu9BZpnPI0yGpb0HUCagnOAf6uq7wEkeQXw58CXgKOBHXvMpnksyYnA04BLeWhSfQGWjVnikY0RlOSJQFXV8r6zSCskuaKqtl5p7PKq2ibJpVW1XV/ZNL8luRrYqvwfXm+8GmWEJFmc5ArgcuDKJJcl+ZO+c0mdG5K8P8kW3c/7gJu6y2G9BFZ9uhJ4ct8h5jOPbIyQJJcDB1bVBd3yzsDRVbVNv8kkSLIQ+DCwczf0X8BHgDuAzavql31l0/yW5DxgO+BHwD0rxqtqz95CzTOWjRGS5CdV9dyVxi6pquf1lUmSVndJ/nSi8ar6z9nOMl9ZNkZIks8AjwNOYjC5aR8Gt4b+KkBVXdJfOs13SZ4BHMQjZ/y/tK9MklYPlo0R0h0KnEz5H3X1KcllwOeBixl3G30veVXfkuwEHAk8i8H39SwA7qyqJ/YabB7x0tcRUlUv6TuD9Cjur6pj+g4hTeAo4PXAvwKLgTfjd6PMKq9GGSFJnpTkiCSXJLk4yWeTPKnvXFLn9CRvS7Jxkg1X/PQdSgLoJigvqKoHqupLwG59Z5pPPLIxWk4GzmdwoySAvwS+Dryst0TSQ/brHt87bqyAp/aQRRrvriRrAZcm+ThwA/6xPaucszFCklxZVc9ZaewRN1KSJD0kyRYMbpu/FvAPwHoMbhvg5dizxLIxQpJ8isF14qd0Q68Fdqiqg/pLJT0kyXOArYC1V4z5/ROSLBsjJMly4PE8NNN/AXBn97ycWa0+JfkwsAuDsnEmsDtwYVW9ts9cUpIXAocAW/Dwy7I9xTdLLBsjpptwtyUP/8vRG9Ood92t9LcFflJV2ybZCPhqVb2852ia55L8jMHpk5Uvy761t1DzjBNER0iSvwbeCTyFwbcX7gT8ANi1z1xS5+6qejDJ/d2XBd7M4Cvnpb7dUVVn9R1iPnM27mh5J7A9cE13z43nMvjeCWl1sCTJ+sBxDP6CvAS4qN9IEgDnJTk8yfOTPG/FT9+
h5hNPo4yQJD+uqu2TXArsWFX3JLmqqp7ddzbNb0kCPKWqru2WFwFPrKrL+8wlwaR3X/auy7PI0yij5bruL8dvAeckuQ24pudMElVVSc4Etu6Wl/abSHqId1/un0c2RlT3LYbrAd+tqnv7ziMlOQE4qqp+3HcWaWVJ9gCezcMn1x/aX6L5xbIhaSi6Gf9bAksZXJIdBgc9tukzl5Tk88A6wEuA4xnco+hHVbV/r8HmEcuGpKHo7tK4AfCibuh84Paq8lSfepXk8qraZtzjusBZVfWiVW6sofBqFEnDsjdwIrAQGOue79lrImng7u7xriSbAPcBG/eYZ95xgqikYdkf2Kmq7gRI8jEGl74e2WsqCc7oJtd/nMFl2TA4naJZYtmQNCxh3N0Zu+fpKYs03ieAv2Nwiu8i4ALgmF4TzTOWDUnD8iXgh0lO7Zb3Br7QYx5phROA5cAR3fIbgK8Ar+st0TzjBFFJQ9PdlXHnbvGCqvpJn3kkgCQ/raqtVjWmdjyyIWloquoSBrcpl1YnlyTZqar+GyDJjsCSnjPNK5YNSdKc1H0TcQFrAj9I8ptueQvgZ31mm288jSJJmpO6e79MynvAzB7LhiRJasqbekmSpKYsG5IkqSnLhiRJasqyIUmSmvo//ZDMeR4Qc1IAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "fig, ax = plt.subplots(1, 1, figsize=(9, 3))\n", "stock.plot.bar(ax=ax)\n", "ax.set_title(\"Current stock\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also fetch the data from the series and drive the plotting \"manually\" ourselves:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAhsAAADSCAYAAADjXwLoAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAADqZJREFUeJzt3X+MZWV9x/H3R1eqgCB0xw0FllGLP9AoyoBSoGIAA6Wp2lqRtghq3dIilKpp1tYUSGOCTSy1VWgXJGBQWutPqkYhWMoPf8BCKS4sCMXdAAF2tyAutP5av/3jnsXrMLszzNxnZu7l/Uomc85zn3ue780+OfuZ55x7b6oKSZKkVp620AVIkqTRZtiQJElNGTYkSVJThg1JktSUYUOSJDVl2JAkSU0ZNiQtmCTrkhy50HVIasuwIWlKSQ5N8o0kjyR5KMl1SQ5MclKSaxe6PknDY8lCFyBp8UmyC/Al4I+BTwM7AIcBP1rIuiQNJ1c2JE3lhQBVdWlVbamq/6uqy4GfAP8IHJzk0STfB0iya5JPJNmYZH2SDyR5/PyS5F1J1ibZnOS2JK+aPGCSlyT5XpLj5+k1Sponhg1JU/kusCXJxUmOSbIbQFWtBU4GvllVO1fVc7r+/wDsCjwfeC3wNuDtAEl+Fziza9sF+C3gf/oH68LH14BTq+rSxq9N0jwzbEh6gqr6AXAoUMD5wMYklyVZNrlvkqcDbwXeX1Wbq2od8GHghK7LHwJ/U1U3VM9dVbW+7xCHAZcBb6uqL7V7VZIWimFD0pSqam1VnVRVewEvA34F+Lspui4FngH0B4j1wJ7d9t7Af29nqJOBb1TVVXMuWtKiZNiQNK2quh24iF7omPxV0Zvo3cuxT1/bcuC+bvse4AXbOfzJwPIk5wykWEmLjmFD0hMkeXGS9ybZq9vfGzge+BbwILBXkh0AqmoLvXesfDDJs5PsA7wHuKQ73AXA+5IckJ5f7fpstRk4Gvj1JGfPywuUNK8MG5Kmshl4NfDtJI/RCxlrgPcCXwduBR5IsqnrfyrwGHA3cC3wKeBCgKr6V+CDXdtm4AvA7v2DVdX3gaOAY5L8ddNXJmnepWryiqgkSdLguLIhSZKaMmxIkqSmDBuSJKkpw4YkSWrKsCFJkpqa1299Xbp0aY2Pj8/nkJIkqZEbb7xxU1WNTddvXsPG+Pg4q1evns8hJUlSI0nWT9/LyyiSJKkxw4YkSWrKsCFJkpoybEiSpKYMG5Ikqal5fTeKJGn+jK/88kKXoEVg3dnHLnQJrmxIkqS2DBuSJKkpw4YkSWrKsCFJkpoybEiSpKYMG5IkqSnDhiRJasqwIUmSmjJsSJKkpgwbkiSpqWnDRpK9k/x7ktuS3JrkT7v23ZNckeTO7vdu7cuVJEnDZiYrGz8F3ltV+wGvAU5Jsh+wEriyqvYFruz2JUmSfsG0YaOq7q+qm7rtzcBaYE/gDcDFXbeLgTe2KlKSJA2vJ3XPRpJx4JXAt4FlVXV/99A
DwLKBViZJkkbCjMNGkp2BzwKnV9UP+h+rqgJqG89bkWR1ktUbN26cU7GSJGn4zChsJHkGvaDxyar6XNf8YJI9usf3ADZM9dyqWlVVE1U1MTY2NoiaJUnSEJnJu1ECfBxYW1V/2/fQZcCJ3faJwBcHX54kSRp2S2bQ5xDgBOA7SW7u2v4COBv4dJJ3AuuBt7QpUZIkDbNpw0ZVXQtkGw8fMdhyJEnSqPETRCVJUlOGDUmS1JRhQ5IkNWXYkCRJTRk2JElSU4YNSZLUlGFDkiQ1ZdiQJElNGTYkSVJThg1JktSUYUOSJDVl2JAkSU0ZNiRJUlOGDUmS1JRhQ5IkNWXYkCRJTRk2JElSU4YNSZLUlGFDkiQ1ZdiQJElNGTYkSVJThg1JktSUYUOSJDVl2JAkSU0ZNiRJUlOGDUmS1NS0YSPJhUk2JFnT13ZmkvuS3Nz9/EbbMiVJ0rCaycrGRcDRU7SfU1X7dz9fGWxZkiRpVEwbNqrqauCheahFkiSNoLncs/HuJLd0l1l2G1hFkiRppMw2bJwHvADYH7gf+PC2OiZZkWR1ktUbN26c5XCSJGlYzSpsVNWDVbWlqn4GnA8ctJ2+q6pqoqomxsbGZlunJEkaUrMKG0n26Nt9E7BmW30lSdJT25LpOiS5FDgcWJrkXuAM4PAk+wMFrAP+qGGNkiRpiE0bNqrq+CmaP96gFkmSNIL8BFFJktSUYUOSJDVl2JAkSU0ZNiRJUlOGDUmS1JRhQ5IkNWXYkCRJTRk2JElSU4YNSZLUlGFDkiQ1ZdiQJElNGTYkSVJThg1JktSUYUOSJDVl2JAkSU0ZNiRJUlOGDUmS1JRhQ5IkNWXYkCRJTRk2JElSU4YNSZLUlGFDkiQ1ZdiQJElNGTYkSVJThg1JktSUYUOSJDVl2JAkSU1NGzaSXJhkQ5I1fW27J7kiyZ3d793alilJkobVTFY2LgKOntS2EriyqvYFruz2JUmSnmDasFFVVwMPTWp+A3Bxt30x8MYB1yVJkkbEklk+b1lV3d9tPwAs21bHJCuAFQDLly+f5XDTG1/55WbH1vBYd/axC12CJGmSOd8gWlUF1HYeX1VVE1U1MTY2NtfhJEnSkJlt2HgwyR4A3e8NgytJkiSNktmGjcuAE7vtE4EvDqYcSZI0amby1tdLgW8CL0pyb5J3AmcDRyW5Eziy25ckSXqCaW8Qrarjt/HQEQOuRZIkjSA/QVSSJDVl2JAkSU0ZNiRJUlOGDUmS1JRhQ5IkNWXYkCRJTRk2JElSU4YNSZLUlGFDkiQ1ZdiQJElNGTYkSVJThg1JktSUYUOSJDVl2JAkSU0ZNiRJUlOGDUmS1JRhQ5IkNWXYkCRJTRk2JElSU4YNSZLUlGFDkiQ1ZdiQJElNGTYkSVJThg1JktSUYUOSJDVl2JAkSU0tmcuTk6wDNgNbgJ9W1cQgipIkSaNjTmGj87qq2jSA40iSpBHkZRRJktTUXMNGAZcnuTHJikEUJEmSRstcL6McWlX3JXkucEWS26vq6v4OXQhZAbB8+fI5DidJkobNnFY2quq+7vcG4PPAQVP0WVVVE1U1MTY2NpfhJEnSEJp12EiyU5Jnb90GXg+sGVRhkiRpNMzlMsoy4PNJth7nU1X11YFUJUmSRsasw0ZV3Q28YoC1SJKkEeRbXyVJUlOGDUmS1JRhQ5IkNWXYkCRJTRk2JElSU4YNSZLUlGFDkiQ1ZdiQJElNGTYkSVJTc/3WV0mTjK/88kKXoEVg3dnHLnQJ0qLhyoYkSWrKsCFJkpoybEiSpKYMG5IkqSnDhiRJasqwIUmSmjJsSJKkpgwbkiSpKcOGJElqyrAhSZKaMmxIkqSmDBuSJKkpw4YkSWrKsCFJkpoybEiSpKYMG5Ikqak5hY0kRye5I8ldSVYOqihJkjQ6Zh02kjwd+BhwDLAfcHyS/QZVmCRJGg1zWdk4CLirqu6uqh8D/wy8YTBlSZKkUTGXsLEncE/f/r1dmyRJ0uOWtB4
gyQpgRbf7aJI7Wo/5FLYU2LTQRSykfGihK1DHuehcXCyci23n4j4z6TSXsHEfsHff/l5d2y+oqlXAqjmMoxlKsrqqJha6Dsm5qMXCubg4zOUyyg3Avkmel2QH4K3AZYMpS5IkjYpZr2xU1U+TvBv4GvB04MKqunVglUmSpJEwp3s2quorwFcGVIvmzstVWiyci1osnIuLQKpqoWuQJEkjzI8rlyRJTRk2RlySk5J8dKHrkKQnI8l4kjULXYcGw7Ah6UlJj+cOSTPmCWMRS/KFJDcmubX7cDSSPJrknK7tyiRjXftVST6S5OYka5IcNMXxxpJ8NskN3c8h8/2aNBySvKebR2uSnN79lXlHkk8Aa4C9k5yXZHU3F8/qe+66JGcluSnJd5K8uGsfS3JF1/+CJOuTLO0e+4Mk13fz95+6716SliT5ZJK1ST6TZMckf9Wdv9YkWZUk8Pg58EPdPPpuksO69vEk13Tz8aYkv9a1H9495zNJbu/G2XqsKcfQ7Bk2Frd3VNUBwARwWpJfBnYCVlfVS4H/AM7o679jVe0P/Alw4RTH+whwTlUdCPwOcEHT6jWUkhwAvB14NfAa4F3AbsC+wLlV9dKqWg/8ZfdhSS8HXpvk5X2H2VRVrwLOA97XtZ0BfL2bu58BlnfjvQQ4Djikm79bgN9v/DI1HF5Eb869BPgBvXPbR6vqwKp6GfAs4Df7+i+pqoOA0/n5uXEDcFQ3H48D/r6v/yu7vvsBzwe2/gG2vTE0C80/rlxzclqSN3Xbe9M72f8M+Jeu7RLgc339LwWoqquT7JLkOZOOdySwX19I3yXJzlX1aJPqNawOBT5fVY8BJPkccBiwvqq+1dfvLd2K2xJgD3on7Fu6x7bOyxuB3+477psAquqrSR7u2o8ADgBu6Obms+j9ByHdU1XXdduXAKcB30vy58COwO7ArcC/dX365914t/0M4KNJtgbZF/Yd//qquhcgyc3dc64FXredMTQLho1FKsnh9MLBwVX1v0muAp45RdfaxvZU+08DXlNVPxxUnXpKeWzrRpLn0VuxOLCqHk5yEb84P3/U/d7C9OeZABdX1fsHWKtGw1TntHOBiaq6J8mZTD/v/gx4EHgFvXPgD6fo//hzkjxzmjE0C15GWbx2BR7ugsaL6S1nQ+/f7M3d9u/RS+FbHQeQ5FDgkap6ZNIxLwdO3brTJX1psmuAN3bXx3eitxpxzaQ+u9ALH48kWQYcM4PjXge8BSDJ6+ldmgG4Enhzkud2j+2eZEZf7qSRtzzJwd12//luU5Kd+fm5cHt2Be6vqp8BJ9D7xOvt2RosnswYmoYrG4vXV4GTk6wF7gC2Ll8/BhyU5AP0lpqP63vOD5P8J71lw3dMcczTgI8luYXev/3VwMmN6teQqqqbupWK67umC4CHJ/X5r26u3Q7cQy9ITOcs4NIkJwDfBB4ANlfVpm4+X969y+UnwCnA+kG8Hg21O4BTklwI3EbvHqDd6N2k/AC97+iazrnAZ5O8jd559bHtda6q7yc5/0mOoWn4CaJDJsmjVbXzFO1XAe+rqtXzX5U0vSS/BGzpvlfpYOC87oZQSSPOlQ1J82U58Olu9eLH9N7lIukpwJUNSZLUlDeISpKkpgwbkiSpKcOGJElqyrAhSZKaMmxIkqSmDBuSJKmp/wc2TlDCntmBUwAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "\n", "names = list(stock.index)\n", "values = list(stock.values)\n", "\n", "fig, ax = plt.subplots(1, 1, figsize=(9, 3))\n", "ax.bar(names, values)\n", "ax.set_title('Stock');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Missing values\n", "\n", "\"Real\" data sets tend to be incomplete. Dealing with missing values is an important topic in data science. The convention in pandas is that the special floating point value \"NaN\" (standing for `N`ot `a` `N`umber) represents missing data points. For example, if we have a table for the stock, but we don't know the value for `apple`, we would replace it with `NaN`. \n", "\n", "The special `NaN` value can be created in Python using `float('nan')`, or using `numpy.nan` if the `numpy` module is imported." ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": true }, "outputs": [], "source": [ "stock['apple'] = float('nan')" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "apple NaN\n", "orange 3.0\n", "banana 22.0\n", "dtype: float64" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the `dtype` of the `stock` Series object has changed from `int64` to `float64` when we assigned `NaN` to `apple`: the whole series has been converted to float, because `NaN` is only defined for floating point numbers. 
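The sketch below shows this conversion by constructing two small Series directly, rather than assigning into an existing one. The names `labels`, `complete` and `incomplete` are illustrative. `isna()` then marks which entries are missing:

```python
import pandas as pd

labels = ['apple', 'orange', 'banana']

# Without a missing value, pandas infers an integer dtype
complete = pd.Series([10, 3, 22], index=labels)

# A single NaN (which is a float) makes the whole Series float64
incomplete = pd.Series([float('nan'), 3, 22], index=labels)

print(complete.dtype, incomplete.dtype)  # int64 float64
print(incomplete.isna())                 # True for apple only
```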
\n", "\n", "(Since version 1.0, pandas also provides an experimental missing-value marker, `pd.NA`, which nullable data types such as `Int64` use instead of `NaN`; with such a dtype, a missing entry does not force the whole Series to float.)\n", "\n", "Assume we need to calculate how many items of stock we have in total. The underlying numpy array still contains the `NaN` entry:" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([nan, 3., 22.])" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock.values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A common situation is that we have an incomplete Series or DataFrame (a DataFrame can be thought of as several Series sharing the same index) and we want to proceed with our analysis, but treat the missing values in a particular way. Summing the stock, for example, gives:" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "25.0" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock.sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `sum` example above shows that `NaN` values are skipped by default (the `skipna` parameter of `sum` defaults to `True`), which can be convenient."
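, "\n", "\n", "Sometimes missing values should not be skipped silently. `sum` accepts `skipna=False` for this, and `fillna` replaces missing entries with an explicit value before any further computation. Below is a minimal sketch on an illustrative Series named `partial`:

```python
import math

import pandas as pd

# Illustrative stand-in for the incomplete stock Series above
partial = pd.Series([float('nan'), 3, 22], index=['apple', 'orange', 'banana'])

skipped = partial.sum()               # NaN ignored (default skipna=True)
strict = partial.sum(skipna=False)    # NaN propagates into the result
filled = partial.fillna(0)            # unknown stock counted as zero

print(skipped, strict, filled.sum())  # 25.0 nan 25.0
print(math.isnan(strict))             # True
```

Which option is appropriate depends on what a missing value means for the data at hand."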
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also 'tidy up' the Series object by removing all entries that have a `NaN` value (`dropna` returns a new Series and leaves `stock` itself unchanged):" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "orange 3.0\n", "banana 22.0\n", "dtype: float64" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock.dropna()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Series data access: explicit and implicit (`loc` and `iloc`)" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "collapsed": true }, "outputs": [], "source": [ "stock = pd.Series({'apple': 10, \n", " 'orange': 3,\n", " 'banana': 22,\n", " 'cucumber' : 1,\n", " 'potato' : 110})" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "apple 10\n", "orange 3\n", "banana 22\n", "cucumber 1\n", "potato 110\n", "dtype: int64" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Indexing\n", "\n", "We can access single values through their index label, as if the `stock` Series object were a dictionary:" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "22" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock['banana']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An equivalent, and recommended, way to retrieve values by label is the `loc` attribute:" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "22" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock.loc['banana']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For convenience, pandas also (!) 
allows us to use integer positions to index into the Series object. This is called *implicit* indexing, because the index of this Series consists of the fruit names, not of integers.\n", "\n", "For example, we can also retrieve the value for `banana` through its implicit index 2: it is in the third row of the Series object, and we start counting from 0. (Recent pandas versions deprecate this positional use of `[]` and emit a `FutureWarning`.)\n" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "22" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock[2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example, this works and seems convenient, but it becomes very confusing if the index of the object itself consists of integers, because `stock[2]` would then refer to the label `2` rather than to the third position. For that reason, the explicit (and recommended) way of indexing by position is the `iloc` attribute:" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "22" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock.iloc[2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Slicing" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "apple 10\n", "orange 3\n", "banana 22\n", "cucumber 1\n", "potato 110\n", "dtype: int64" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also slice the Series by label. Unlike slicing by integer position, a label-based slice includes its end point (here `potato`):" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "orange 3\n", "banana 22\n", "cucumber 1\n", "potato 110\n", "dtype: int64" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock['orange':'potato']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or skip every 
second entry:\n", "\n" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "orange 3\n", "cucumber 1\n", "dtype: int64" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock['orange':'potato':2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data manipulation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Numerical operations on the Series object are carried out on all data values at once, in the same way that numpy arrays are processed:" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "apple -19.2\n", "orange -26.2\n", "banana -7.2\n", "cucumber -28.2\n", "potato 80.8\n", "dtype: float64" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock - stock.mean()" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "apple 3.162278\n", "orange 1.732051\n", "banana 4.690416\n", "cucumber 1.000000\n", "potato 10.488088\n", "dtype: float64" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sqrt(stock)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Where preferred, we can extract the numpy array and work with that:" ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "collapsed": true }, "outputs": [], "source": [ "data = stock.values" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "numpy.ndarray" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(data)" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/plain": [ 
"array([-19.2, -26.2, -7.2, -28.2, 80.8])" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data - data.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import and Export\n", "\n", "Pandas (and its objects `Series` and `DataFrame`) support export to and import from a number of useful formats.\n", "\n", "For example, we can write a `Series` object into a comma separated value file:" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "collapsed": true }, "outputs": [], "source": [ "stock.to_csv('stock.csv', header=False)" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "apple,10\n", "orange,3\n", "banana,22\n", "cucumber,1\n", "potato,110\n" ] } ], "source": [ "#NBVAL_IGNORE_OUTPUT\n", "!cat stock.csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also create a $\\LaTeX$ representation of the table:" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'\\\\begin{tabular}{lr}\\n\\\\toprule\\n{} & 0 \\\\\\\\\\n\\\\midrule\\napple & 10 \\\\\\\\\\norange & 3 \\\\\\\\\\nbanana & 22 \\\\\\\\\\ncucumber & 1 \\\\\\\\\\npotato & 110 \\\\\\\\\\n\\\\bottomrule\\n\\\\end{tabular}\\n'" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock.to_latex()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll come back to reading from files in the `DataFrame` section." 
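, "\n", "\n", "Besides CSV files and LaTeX, a Series can also be exported to plain Python and JSON structures. Below is a minimal sketch using `to_dict()` and `to_json()`. The standard `json` module is used only to check the exported text:

```python
import json

import pandas as pd

stock = pd.Series({'apple': 10, 'orange': 3, 'banana': 22,
                   'cucumber': 1, 'potato': 110})

as_dict = stock.to_dict()   # {label: value}, in index order
as_json = stock.to_json()   # JSON object text keyed by the index labels

print(as_dict)
print(json.loads(as_json) == as_dict)  # True
```

Both exports use the index labels as keys, so they are best suited to Series with unique labels."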
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Frame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Stock Example - `DataFrame`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Having introduced the `Series` object above, we now focus on the second important type in pandas: the `DataFrame`.\n", "\n", "As a first description, we could say that the `DataFrame` is similar to a (2d) spreadsheet: it contains rows and columns.\n", "\n", "The `Series` object we have studied above corresponds to a single column of a `DataFrame`: extracting one column from a `DataFrame` gives a `Series`.\n", "\n", "We'll continue with our stock example:" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "apple 10\n", "orange 3\n", "banana 22\n", "cucumber 1\n", "potato 110\n", "dtype: int64" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition to tracking how many items of each type we have in stock, we have a second Series object that provides the price per item at which the item is sold:" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "apple 0.55\n", "banana 0.50\n", "cucumber 0.99\n", "potato 0.17\n", "orange 1.76\n", "dtype: float64" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "price = pd.Series({'apple': 0.55, 'banana': 0.50, 'cucumber' : 0.99, 'potato' : 0.17, 'orange': 1.76})\n", "price" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `DataFrame` object allows us to treat the two series together. A convenient way to create a `DataFrame` is from a dictionary of Series, whose keys become the column names:" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " stock price\n", "apple 10 0.55\n", "banana 22 0.50\n", "cucumber 1 0.99\n", "orange 3 1.76\n", "potato 110 0.17" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "shop = pd.DataFrame({'stock' : stock, 'price' : price})\n", "shop" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because both `Series` objects have the same index labels, pandas aligns the data by label in the `DataFrame` called `shop`, even though the entries are stored in a different order in `price` and `stock`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If one Series is missing a data point, pandas will insert a `NaN` entry into that field:" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "collapsed": true }, "outputs": [], "source": [ "price2 = price.copy()" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "apple 0.55\n", "banana 0.50\n", "cucumber 0.99\n", "potato 0.17\n", "orange 1.76\n", "grapefruit 1.99\n", "dtype: float64" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "price2['grapefruit'] = 1.99\n", "price2" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " stock price\n", "apple 10.0 0.55\n", "banana 22.0 0.50\n", "cucumber 1.0 0.99\n", "grapefruit NaN 1.99\n", "orange 3.0 1.76\n", "potato 110.0 0.17" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame({'stock' : stock, 'price' : price2})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Accessing data in a DataFrame\n" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " stock price\n", "apple 10 0.55\n", "banana 22 0.50\n", "cucumber 1 0.99\n", "orange 3 1.76\n", "potato 110 0.17" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "shop" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `DataFrame` has an *index*, which is shared by all columns and shown in bold in the leftmost column. We can also access it directly:" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['apple', 'banana', 'cucumber', 'orange', 'potato'], dtype='object')" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "shop.index" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each column has a name (here `stock` and `price`):" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['stock', 'price'], dtype='object')" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "shop.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Extracting columns of data\n", "\n", "Using a column name and the index operator (`[]`), we can extract one column as a `Series` object:" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "apple 10\n", "banana 22\n", "cucumber 1\n", "orange 3\n", "potato 110\n", "Name: stock, dtype: int64" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "shop['stock']" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "apple 0.55\n", "banana 0.50\n", "cucumber 0.99\n", "orange 1.76\n", "potato 0.17\n", "Name: price, dtype: float64" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "shop['price']" ] }, { "cell_type": "markdown", "metadata": 
{}, "source": [ "\n", "### Extracting rows of data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are two ways to extract a row of data.\n", "\n", "First, explicit indexing with `.loc`, using the index label of that row:" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "stock 10.00\n", "price 0.55\n", "Name: apple, dtype: float64" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "shop.loc['apple'] # a single row is returned as a Series" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " stock price\n", "banana 22 0.50\n", "cucumber 1 0.99" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "shop.loc['banana':'cucumber'] # multiple rows are returned as a DataFrame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Second, we can use implicit (integer position) indexing with `.iloc`, as for `Series` objects:" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "stock 10.00\n", "price 0.55\n", "Name: apple, dtype: float64" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "shop.iloc[0]" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " stock price\n", "banana 22 0.50\n", "cucumber 1 0.99" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "shop.iloc[1:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Warning\n", "\n", "Note an inconsistency here: slicing with index labels (such as `.loc['banana':'cucumber']`) *includes* the end label `cucumber`, whereas positional slicing (such as `.iloc[1:3]`) does *not* include the row at position `3`.\n", "\n", "The behaviour of `.loc` is convenient and a sensible design choice when the labels are strings, as in our `stock` example (otherwise we would need to know which label comes after `cucumber`). The behaviour of `.iloc` follows the usual Python slicing convention.\n", "\n", "It is thus understandable how this situation has arisen.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data manipulation with `shop`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The real strength of `DataFrame` objects is that we can continue to process the data conveniently.\n", "\n", "For example, we can compute the financial value of the items in stock and add it as an extra column:" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " stock price value\n", "apple 10 0.55 5.50\n", "banana 22 0.50 11.00\n", "cucumber 1 0.99 0.99\n", "orange 3 1.76 5.28\n", "potato 110 0.17 18.70" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "shop['value'] = shop['price'] * shop['stock']\n", "shop" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also compute the sum of a column, for example to estimate the total value of the stock:" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "41.47" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "shop['value'].sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want to swap columns and rows, we can `transpose` the `DataFrame` like a NumPy array:" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " apple banana cucumber orange potato\n", "stock 10.00 22.0 1.00 3.00 110.00\n", "price 0.55 0.5 0.99 1.76 0.17\n", "value 5.50 11.0 0.99 5.28 18.70" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "shop.transpose()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example: European population 2017\n", "\n", "Here is a second example to demonstrate some use cases of pandas DataFrames." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we download the data. It originally comes from EUROSTAT (reference \"demo_gind\")." ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2019-06-19 20:36:43-- https://fangohr.github.io/data/eurostat/population2017/eu-pop-2017.csv\n", "Resolving fangohr.github.io... 185.199.108.153, 185.199.109.153, 185.199.111.153, ...\n", "Connecting to fangohr.github.io|185.199.108.153|:443... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 1087 (1.1K) [text/csv]\n", "Saving to: ‘eu-pop-2017.csv’\n", "\n", "eu-pop-2017.csv 100%[===================>] 1.06K --.-KB/s in 0s \n", "\n", "2019-06-19 20:36:43 (61.0 MB/s) - ‘eu-pop-2017.csv’ saved [1087/1087]\n", "\n" ] } ], "source": [ "#NBVAL_IGNORE_OUTPUT\n", "!wget https://fangohr.github.io/data/eurostat/population2017/eu-pop-2017.csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data is stored in a comma-separated values (CSV) file, which looks like this:" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "geo,pop17,pop18,births,deaths\n", "Belgium ,11351727,11413058,119690,109666\n", "Bulgaria,7101859,7050034,63955,109791\n", "Czechia,10578820,10610055,114405,111443\n", "Denmark,5748769,5781190,61397,53261\n", "Germany,82521653,82850000,785000,933000\n", "Estonia ,1315634,1319133,13784,15543\n", "Ireland,4784383,4838259,62084,30324\n", "Greece,10768193,10738868,88523,124530\n", "Spain,46527039,46659302,390024,421269\n" ] } ], "source": [ "#NBVAL_IGNORE_OUTPUT\n", "!head eu-pop-2017.csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pandas has strong support for reading files in many formats, including MS Excel, CSV, HDF5 and others. Each reading function (such as `pd.read_csv`) has a number of options to tailor the process.\n", "\n", "Many data science projects leave the data in its original files and use a few lines of Python code to import it." ] }, { "cell_type": "code", "execution_count": 85, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df = pd.read_csv('eu-pop-2017.csv')" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " geo pop17 pop18 births deaths\n", "0 Belgium 11351727 11413058 119690 109666\n", "1 Bulgaria 7101859 7050034 63955 109791\n", "2 Czechia 10578820 10610055 114405 111443\n", "3 Denmark 5748769 5781190 61397 53261\n", "4 Germany 82521653 82850000 785000 933000\n", "5 Estonia 1315634 1319133 13784 15543\n", "6 Ireland 4784383 4838259 62084 30324\n", "7 Greece 10768193 10738868 88523 124530\n", "8 Spain 46527039 46659302 390024 421269\n", "9 France 66989083 67221943 767691 603141\n", "10 Croatia 4154212 4105493 36556 53477\n", "11 Italy 60589445 60483973 458151 649061\n", "12 Cyprus 854802 864236 9229 5997\n", "13 Latvia 1950116 1934379 20828 28757\n", "14 Lithuania 2847904 2808901 28696 40142\n", "15 Luxembourg 590667 602005 6174 4263\n", "16 Hungary 9797561 9778371 94646 131877\n", "17 Malta 460297 475701 4319 3571\n", "18 Netherlands 17081507 17181084 169200 150027\n", "19 Austria 8772865 8822267 87633 83270\n", "20 Poland 37972964 37976687 401982 402852\n", "21 Portugal 10309573 10291027 86154 109586\n", "22 Romania 19644350 19523621 189474 260599\n", "23 Slovenia 2065895 2066880 20241 20509\n", "24 Slovakia 5435343 5443120 57969 53914\n", "25 Finland 5503297 5513130 50321 53722\n", "26 Sweden 9995153 10120242 115416 91972\n", "27 United Kingdom 65808573 66238007 755043 607172" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get a quick look at the `DataFrame`, we can use the `head()` method, which shows only the first 5 rows:" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " geo pop17 pop18 births deaths\n", "0 Belgium 11351727 11413058 119690 109666\n", "1 Bulgaria 7101859 7050034 63955 109791\n", "2 Czechia 10578820 10610055 114405 111443\n", "3 Denmark 5748769 5781190 61397 53261\n", "4 Germany 82521653 82850000 785000 933000" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The meaning of the columns has to be taken from the metadata. In this case, the data is described as follows:\n", "\n", "- **geo**: the country in question\n", "- **pop17**: the population count of that country as of 1 January 2017\n", "- **pop18**: the population count of that country as of 1 January 2018\n", "- **births**: the number of (live) births in the country during the year 2017\n", "- **deaths**: the number of deaths in that country during the year 2017\n", "\n", "The data is provided for all 28 European Union member states (as of 2017)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to use the country name as the index. We can achieve this either with the `set_index()` method:" ] }, { "cell_type": "code", "execution_count": 88, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df2 = df.set_index('geo')" ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " pop17 pop18 births deaths\n", "geo \n", "Belgium 11351727 11413058 119690 109666\n", "Bulgaria 7101859 7050034 63955 109791\n", "Czechia 10578820 10610055 114405 111443\n", "Denmark 5748769 5781190 61397 53261\n", "Germany 82521653 82850000 785000 933000" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that, by default, `set_index()` does not modify `df` in place; it returns a new `DataFrame`. (Many pandas operations behave this way.)\n", "\n", "As an alternative, we can pass the `index_col` argument to `read_csv` to specify which column to use as the index when the file is read:" ] }, { "cell_type": "code", "execution_count": 90, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df = pd.read_csv('eu-pop-2017.csv', index_col=\"geo\")" ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " pop17 pop18 births deaths\n", "geo \n", "Belgium 11351727 11413058 119690 109666\n", "Bulgaria 7101859 7050034 63955 109791\n", "Czechia 10578820 10610055 114405 111443\n", "Denmark 5748769 5781190 61397 53261\n", "Germany 82521653 82850000 785000 933000" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We explore the data by plotting some of it:" ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 92, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAW4AAAFYCAYAAABzgRY/AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3Xe8HHW9//HXmyQQ6S0qipDYAKUbkKLSrgVBFJWrCBYsyLUgWLjq7ypY7hUrogIaRVBAqoJiQZEiIDWBQJCilAhRlIAgTVDw8/vjO5szZ8/M7Mye3ZNM8n4+Hvs4Z2e/M/vd3dnPfudbFRGYmVl7LLe4M2BmZs04cJuZtYwDt5lZyzhwm5m1jAO3mVnLOHCbmbXM0AK3pO9KulvS9TXSHiFpbnb7vaT7h5UvM7O207D6cUt6CfAQ8P2I2LjBfu8HtoiItw8lY2ZmLTe0EndEXAT8Lb9N0rMknSNpjqSLJW1YsOvewMnDypeZWdtNnuDnmwUcEBF/kPRC4Ghg586DktYHZgDnT3C+zMxaY8ICt6SVge2A0yV1Nq/QleyNwBkR8cRE5cvMrG0mssS9HHB/RGxekeaNwHsnKD9mZq00Yd0BI+IB4HZJewEo2azzeFbfvQZw2UTlycysjYbZHfBkUhDeQNICSe8A9gHeIela4HfAq3O7vBE4JTxdoZlZpaF1BzQzs+HwyEkzs5Zx4DYza5mh9CpZe+21Y/r06cM4tJnZUmnOnDn3RMS0OmmHErinT5/O7Nmzh3FoM7OlkqQ/1k3rqhIzs5Zx4DYzaxkHbjOzlpnoSabMbBn1r3/9iwULFvDoo48u7qwsVlOnTmXddddlypQpfR/DgdvMJsSCBQtYZZVVmD59OrmJ5pYpEcG9997LggULmDFjRt/HcVWJmU2IRx99lLXWWmuZDdoAklhrrbXGfdXhwG1mE2ZZDtodg3gPHLjNzBq66KKL2HLLLZk8eTJnnHHGou0XXHABm2+++aLb1KlTOeusswb+/K7jrnDUAcUL8bz3mzsXbjez+qZ/9GcDPd78w3cb6PGqrLfeehx//PF86UtfGrV9p512Yu7cuQD87W9/49nPfjYve9nLBv78LnGb2TJj/vz5bLjhhuyzzz5stNFGvP71r+eRRx7hvPPOY4sttmCTTTbh7W9/O4899hiQRoEfcsghbLLJJmy99dbccssti7ZvuummLLdceQg944wz2HXXXVlxxRUH/jocuM1smXLzzTfznve8hxtvvJFVV12Vr3zlK7ztbW/j1F
NPZd68eTz++OMcc8wxi9KvttpqzJs3j/e9730cdNBBtZ/nlFNOYe+99x7GS6gXuCUdLOl3kq6XdLKkqUPJjZnZkD3jGc9g++23B2DfffflvPPOY8aMGTz3uc8F4K1vfSsXXXTRovSd4Lv33ntz2WX1Fui66667mDdvHi9/+csHnPukZ+CW9HTgQGBmRGwMTCKtVmNm1jrdvTpWX3312unr9gg57bTT2HPPPcc1yKZK3aqSycCTJE0GVgT+PJTcmJkN2R133LGo5PyDH/yAmTNnMn/+/EX11yeccAI77LDDovSnnnrqor/bbrttrec4+eSTh1ZNAjUCd0T8CfgScAdwF/D3iPjV0HJkZjZEG2ywAUcddRQbbbQR9913HwcffDDHHXcce+21F5tssgnLLbccBxxwwKL09913H5tuuilHHnkkRxxxBABXXXUV6667Lqeffjrvfve7ef7zn78o/fz587nzzjtHBf9B69kdUNIapEV9ZwD3A6dL2jciTuxKtz+wP6SuMmZmVSay+17e5MmTOfHEUeGLXXbZhWuuuaYw/Uc+8hE+//nPj9q21VZbsWDBgsL006dP509/+tNgMluiTlXJfwC3R8TCiPgX8CNgu+5EETErImZGxMxp02ot4mBmZn2oMwDnDmAbSSsC/wB2Aby8jZm1zvTp07n++utrp58/f/7wMjMOdeq4rwDOAK4G5mX7zBpyvszMrEStIe8RcShw6JDzYmZLuYhY5ieaiohxH8MjJ81sQkydOpV77713IIGrrTrzcU+dOr4xjJ5kyswmxLrrrsuCBQtYuHDh4s7KYtVZAWc8HLjNbEJMmTJlXKu+2AhXlZiZtYwDt5lZyzhwm5m1jAO3mVnLOHCbmbWMA7eZWcs4cJuZtYwDt5lZyzhwm5m1jAO3mVnLOHCbmbWMA7eZWcs4cJuZtUzPwC1pA0lzc7cHJB00EZkzM7Oxek7rGhE3A5sDSJoE/Ak4c8j5MjOzEk2rSnYBbo2IPw4jM2Zm1lvTwP1G4ORhZMTMzOqpHbglLQ/sAZxe8vj+kmZLmr2sL01kZjZMTUrcuwJXR8Rfix6MiFkRMTMiZk6bNm0wuTMzszGaBO69cTWJmdliVytwS1oJeCnwo+Fmx8zMeqm1yntEPAysNeS8mJlZDR45aWbWMg7cZmYtU6uqxJYNT71gbuH2v+y0+QTnxMyquMRtZtYyDtxmZi3jqhIzsyH78ht2L9z+oVN/2tfxXOI2M2sZB24zs5Zx4DYzaxkHbjOzlnHgNjNrGQduM7OWceA2M2sZB24zs5Zx4DYzaxkHbjOzlqm7As7qks6QdJOkGyVtO+yMmZlZsbpzlRwJnBMRr89We19xiHkyM7MKPQO3pNWAlwBvA4iIfwL/HG62zMysTJ0S9wxgIXCcpM2AOcAHsnUozWqb/tGfFW6ff/huE5wTs3arU8c9GdgSOCYitgAeBj7anUjS/pJmS5q9cOHCAWfTzMw66gTuBcCCiLgiu38GKZCPEhGzImJmRMycNm3aIPNoZmY5PQN3RPwFuFPSBtmmXYAbhporMzMrVbdXyfuBk7IeJbcB+w0vS2ZmVqVW4I6IucDMIefFzMxq8MhJM7OWceA2M2sZB24zs5Zx4DYzaxkHbjOzlnHgNjNrGQduM7OWceA2M2sZB24zs5Zx4DYzaxkHbjOzlnHgNjNrGQduM7OWceA2M2sZB24zs5Zx4DYza5laCylImg88CDwBPB4RXlTBzGwxqbt0GcBOEXHP0HJiZma1uKrEzKxl6gbuAH4laY6k/YeZITMzq1a3quRFEfEnSU8GzpV0U0RclE+QBfT9AdZbb70BZ9PMzDpqlbgj4k/Z37uBM4GtC9LMioiZETFz2rRpg82lmZkt0jNwS1pJ0iqd/4GXAdcPO2NmZlasTlXJU4AzJXXS/yAizhlqrszMrFTPwB0RtwGbTUBezMysBncHNDNrGQduM7OWceA2M2sZB24zs5ZpMleJtcx55z+rcP
suO986wTkxs0FyidvMrGUcuM3MWsaB28ysZRy4zcxaxoHbzKxlHLjNzFrGgdvMrGUcuM3MWsaB28ysZRy4zcxaxoHbzKxlagduSZMkXSPpp8PMkJmZVWtS4v4AcOOwMmJmZvXUCtyS1gV2A74z3OyYmVkvdUvcXwUOAf49xLyYmVkNPQO3pN2BuyNiTo90+0uaLWn2woULB5ZBMzMbrU6Je3tgD0nzgVOAnSWd2J0oImZFxMyImDlt2rQBZ9PMzDp6Bu6I+FhErBsR04E3AudHxL5Dz5mZmRVyP24zs5ZptOZkRFwIXDiUnJiZWS1eLNjMCj31grmF2/+y0+YTnBPr5qoSM7OWceA2M2sZB24zs5Zx4DYzaxkHbjOzlnHgNjNrGQduM7OWceA2M2sZB24zs5Zx4DYzaxkHbjOzlnHgNjNrGQduM7OWceA2M2sZB24zs5aps1jwVElXSrpW0u8kfWoiMmZmZsXqLKTwGLBzRDwkaQpwiaRfRMTlQ86bmZkV6Bm4IyKAh7K7U7JbDDNTZmZWrlYdt6RJkuYCdwPnRsQVw82WmZmVqRW4I+KJiNgcWBfYWtLG3Wkk7S9ptqTZCxcuHHQ+zcws06hXSUTcD1wAvKLgsVkRMTMiZk6bNm1Q+TMzsy51epVMk7R69v+TgJcCNw07Y2ZmVqxOr5J1gO9JmkQK9KdFxE+Hmy0zMytTp1fJdcAWE5AXMzOrwSMnzcxaxoHbzKxlHLjNzFrGgdvMrGUcuM3MWsaB28ysZRy4zcxaxoHbzKxlHLjNzFrGgdvMrGUcuM3MWsaB28ysZRy4zcxaxoHbzKxlHLjNzFrGgdvMrGXqLF32DEkXSLpB0u8kfWAiMmZmZsXqLF32OPChiLha0irAHEnnRsQNQ86bmZkV6Fnijoi7IuLq7P8HgRuBpw87Y2ZmVqxRHbek6aT1J68YRmbMzKy32oFb0srAD4GDIuKBgsf3lzRb0uyFCxcOMo9mZpZTp44bSVNIQfukiPhRUZqImAXMApg5c2YMLIdmtlSa/tGfFW6ff/huE5yT9qnTq0TAscCNEfGV4WfJzMyq1Kkq2R54M7CzpLnZ7ZVDzpeZmZXoWVUSEZcAmoC8mJlZDR45aWbWMg7cZmYt48BtZtYytboDmtmS57DDDmu03ZYeLnGbmbWMA7eZWcs4cJuZtYwDt5lZyzhwm5m1jAO3mVnLOHCbmbWMA7eZWcs4cJuZtYwDt5lZyzhwm5m1jAO3mVnLOHCbmbVMz9kBJX0X2B24OyI2bvoEXhDUzGyw6pS4jwdeMeR8mJlZTXXWnLxI0vThZ6W5Tb63SeH2eW+dN8E5mRief9nMYIALKUjaH9gfYL311hvUYc2sJZalatGjDji/cPt7v7nzhDz/wAJ3RMwCZgHMnDkzBnVcs7b68ht2L9z+oVN/OsE5saWNe5WYmbWMA7eZWcvU6Q54MrAjsLakBcChEXHssDNmZoN13vnPKty+y863TnBObLzq9CrZeyIyYmZm9biqxMysZRy4zcxaZmDdAQfmsNVKtv99YvNhZraEconbzKxlHLjNzFpmyasqGaIbN9yocPtGN904wTkxM+vfMhW4zWzZ0WQSurYV6lxVYmbWMi5xL0YLPnpx4fZ1D3/xBOfEzNrEgdv6tixN42m2JHFViZlZyzhwm5m1jKtKbJnVtp4EZh0ucZuZtYxL3GbWDp7HaBEHbltqNBlwYdZmtQK3pFcARwKTgO9ExOFDzVVLeXFYM5sIdZYumwQcBbwUWABcJeknEXHDsDNnyzhfGpsVqlPi3hq4JSJuA5B0CvBqwIHblilHHXB+4fb3fnPngRzfI2mtrjq9Sp4O3Jm7vyDbZmZmi4EiojqB9HrgFRHxzuz+m4EXRsT7utLtD+yf3d0AuLngcGsD9zTIX5P0wzy20zu90y876RdXXtaPiGm1jhARlTdgW+CXufsfAz7Wa7+SY80eVvphHtvpnd7pl530S1Jeym51qkquAp4jaY
ak5YE3Aj+psZ+ZmQ1Bz8bJiHhc0vuAX5K6A343In439JyZmVmhWv24I+LnwM8H8Hyzhph+mMd2eqd3+mUn/ZKUl0I9GyfNzGzJ4kmmzMxaxoHbzKxlPMlUy0h6ErBeRBT1k1+iSdokIvqa8UnSGsAzIuK6AWdraCS9CvhZRPx7cefFFp9s2pDdgOnkYm5EfKXfYy5xgVvSc4DPAc8Dpna2R8QzK/Z5clfaO0rS7RoRv+jadkBEfLMk/ZdZgnrRZIHgS8DywAxJmwOfjog9FnO+ar3/wNGSVgCOB06KiMpJRyRdCOxBOk/nAHdL+m1EfLBin92A53fl59MlaacC7yhI//aS9GcD3Y1CfwdmA9+KiEe7HnsD8FVJPySdRzeV5Ts7/jTgvxl77heOqZe0PTA3Ih6WtC+wJXBkRPxxPK9V0ryC17lIRGxaM71S8tHpC56v5/nT9LOqq+lrze23AvA6xgbjonPtbOBRYB4wkB/xJS5wA8cBhwJHADsB+1FSpSNpD+DLwNOAu4H1gRtJH26RT0h6LCLOz/Y/JHuOwsCdHWuWpMlZvk7uFWzqkPQg1SfLqiUPHUaaO+bCLN1cSTMqnue1wOeBJ5O+RJ0vUtnxG2n6/kfEi7Mf5rcDcyRdCRwXEeeWPMVqEfGApHcC34+IQyWVlrglfRNYkfSZfgd4PXBlxUs4AbgJeDnwaWCfLP9lbgOmASdn998APAg8F/g28Oau17uvpFWBvYHjJQUj59GDBcc/CTiVVDo7AHgrsLAiP8cAm0naDPgQ6TV/H9hhnK+1M83le3P7ku1Tlb6RhudPo89K0jbA14GNSAWdScDDBed+09fa8WPSj/Yc4LEeadft9ePV2HhH8NQYJbQpqdT02s6tR/o52d953dsK0l4LrAVck93fCTi24thrA5cDLwb+F/ghsHyN17ABcDjwR+AHwE4Dem8+A7wHWAVYFfgvUgm6LP3l2d9rctuuq0h/C7BRg/y8CNgv+38aMKNH+kbvf26/SaTSyp9IX76bis4LUgllHeBXwFY1Xu91XX9XBi6uSH9NV/opnfe4JP1VZduA31XstxZwEDAf+AXwB+D9Fef+dVXPmXvs6uzvJ4F35LeN97V2n2fdzzmIW5Pzp4/PajbwbOCa7HzbD/jcoF4rcH2D1/l54GWDet8i6o2c7Juk7wLfJX1JX5Xdev06PyZpOeAPkt4naU/SF7DIvyLiXmA5SctFxAXAzLIDR8Q9pB+Ro0i/8q+PiH/2eA2TgA2z2z2kk+2D2SyJ+XTPkXSGpBsk3da59Xite0TE0RHxYEQ8EBHHkGZeLPM7SW8CJmXP93Xg0or0f42IWgsoSjqUdJn+sWzTFODEHrs1ev8lbSrpCFKw3hl4VURslP1/RMEunyYN/LolIq6S9ExS0Cvzj+zvI5KeBvyLFPhL85/9vV/SxsBqpKuTMitLWi/3etZj5Nwccx5JerWkM0lXSFOArSNiV6BTQi7Lz12SdpO0BbBmRX4elPQxYF/gZ9n3ZkpJ2qavNXsJ2j53ZzsqOjRI2kbSVZIekvRPSU9IeqDi+E3On8b5j4hbgEkR8UREHAe8oiJ5o9cKXCqpeOWOsS4HzpT0D0kPSHqwx/vS2yB/BQp+aW7oY5+tSF+GdUmXlT8CtilJ++ss7ddJl69HApcWpHsQeCB3exR4qLO9Ii9HkEqt3yJ96fKP3dx1/xJgF+A60iXfYVSUnrN9LiVdkk0inST7FOU/l35F0pXCVdnts8DUivRHki6996bHFQ8wl1SVUqs03+T9z6X/DfAW4EkFj715AOfbJ4DVSQWFvwB3AZ+pSP9OYA1S1cJtpMv1AyrSvxK4A7iAFIz/SKrWWAk4qCD98cBLSo61S8G23UkBaePsOeaQftzK8vNU4IPAi7P76wFvGcRrzfZ5AamgMj97rXOBLSvSNy3l1j5/+visLiJVkXwf+AJwMHDtAF/rDaQf65tJ3/l5Zd8X4HZSzYPGe453bkMdgCPpWO
DLMaRFFyStRArCIgW91UiNXvcWpBWpV0JZw1nR8fcDTouIhwseWy1y9d2S5kTECyTNi4hN8tsqjj+ddLJuT6rz/i0pAMyvm8ce+T+uYHNEcYPUlRGxtaSrI2LL7L29LCrq5pq8/+PI/5gTtCj/BfuuQPpRG8iqC1lpdhtSMN0w23xzjG2Q7KSfBPw6InZq8BzbR8Rve22baJJWA+j1XkqaHREzJV3XOW8kXRMRW5SkH9r5I2l9UnCfQgraqwFHRyqFV+1X97WuX7Q9ihuGLwJ2jAH2Lhp24N6BNCHVX0gV+KWtzJK+GhEHlbTcEwPoOZEPqg32eTqpBJ1vOb6oIN2lpDriM4DzSfW3h0fEBuPK9OjnOBfYKyLuz+6vAZwSES8fwLE/DDyHtNLR50gNiD+IiK8P4Nh99TqQ9Lrc3anAnsCfI+LArnQ7R8T5WWPsGBHxo670+0bEiZIKe6dESTetqiBUkv480hVOrR+Pzo9mjW21G7f7fa3Zvk16TnQC1H+QGkk7Vzxvi4jNyp6jl/Hkv+HzNHqt2T4vAp4TEccp9QhaOSJuL0h3PPBMUvvGoobM8eR92L1KjiW1tNfpBtNpzf1Sr4NKuiQiXlRwAvfqNXG1pK0i4qpez5E9z+Gk2RBvAJ7INgfpMqzbB0hVGQeSGh13JvUKqDr+NOBdjD1ZykqUa3eCdpbuPqWuVN3HPSQivpDVgRf9CB5YsO1Lkl5KqkraAPhklPT26OP976vXQUT8sOt5TyZVSXXbgfRj+aqiw5Cq2/JWyv6u0jBL52U/Jj+KeiWeh4B52Q/uoqu2gh+ebYHtgGldAWpVUpXDKBGxSrbfZ0jB8QRGSq3ddfr9vlZo1nMC0nd9EvA+Uin3GaRgOErD86dR/iWdFhH/WVZYqLiCbPRaszahmaTvynGMtAltX5D89uy2fHYbt2GXuC+LiG0bpJ9E6vbVqytOv/m5iVQH90fSF6lXie9mYNOIqHPS9pOfS4GLSSdL54dhTMDKpZ8D7Nmp7sku184sKJG9KiLOllT4wxER3ys49gzgrs6lv9JAn6cMqtpmECRtQBrQ8uySxydFxBNFjw3o+R8kBZLHGbnELy0o1H3/syvTHUldAPNdUx8Ezo6IwgZZSdd2l2ZLtk0CDoyIogbgUpKuj4iNm+yzuElaJyLualKVke3X6LVKmgtsQep5skW27bqKHwYkrZzl4aG6z1Nm2CXuayT9gNQBPX+J0F0C6mx/QtL6kpaPHr09OrKT8imMLrGW1WM3rVK4jfRLWhq4x1nFs2JE/HeD/Pw/4BJJvyEFjRczsupQ/jnPzv6OCdAVTieV+jqeyLZtVbaDpBMi4s29tuUeq9u3tpO+u0T2F1LPlzK3SzqH1CB7fq9ScdMrnk5Jt666739E/Ab4jaTjywJLiYcl7QOcQnqf9iZXss8d/wlJe1Pcc6fKpWow2rWklNsZoPTZorrrut/fup9VRNyV/fu8KBhsR/mYjUavFfhnRIRS3/xOfX0hpV4wJ5D1EJJ0D6kRue+BfcMO3E8iBb2X5bYVXbrm3Qb8VtJPGH15OaY+SNL7SYN1/spIVUyQWnDH6Hwp1DVSq8IjwNysrjL/w5O/1K1dxVPgp5JeGWna3J4i4hxJW5IaySA1ZJYumaRmI/Em538sI+KfSgtnVBk1UEJpoFJpYyzwDVLV0+mky8y3kAavFOVdwPMrfoSLbEiqlnkvcKykn5LaAIqqVyBdHl9M6t3Qs6Qu6SVF24vaPLL0tUYB53/008sec/yyH/83kRq3j2SkcftNJWl/K+kbpB+1/Pfq6pL0kNps3ibpdnq0UWV+QXoff5DdfyOp+vAvpB42o6qyGn5/G31WNB9s1/S1nibpW8Dqkt5FahP6dknaWcAHI3V3RNKOWdrtStL3tMRN65rVHY0REZ8qSHsLaf3LWq3QKhmpFRGFI/2aVDX0I3fp/Ripn2qvS+9OPeYzI+LTSv2InxoRhaMDJf2K9EX9MLmReEWl/K
we9usR8ZPs/qtJl9e7FKT9GPBx0g/zI53NpO5RsyLiY937ZPs17XXQuDE5t+8apIC2T0SMqSfO0syNiM0bHPPs3N2ppFGsc0p+CJF0CSOjgF9FNgo4Ij7ZlW6HqufNSuTjIumC4kMX5z3bp2l1Q2njatFn2eT728dntTbwU+AjpP7bGwJ7l13JN32t2T4vJRVKRVresaxNqFaVViMxoH6FRTdSpf13u28DPP4FpJJi3fR9jfSreeztgXOB35OuGm4Hbhvw+3kMafDQjdn9NageWVd7JB7wLNJAgTuAO0l9zJ/dIz+lfXRL0jftW/s9shGTDZ5jB+Do7DM4DXhdRdrPAq8cx+fxDOCHNd7/nqOA+3z+aaQf0Fm9vl+kH/ue20r2fTKpj/h6pAnOytJdS268A6ma7drs/6KRibW/v/18Vlm+r8viUK0+1HVfa8N8nEkaYzA9u/0PqW2q72MOu3GyVneurn2mAYcwdjKZnXNpOi3vzye16v6MGt1sciW+a4EtIuLfVb98dS91s7Q3kQJRd0NjZWkiKxk+p+v4ZZfendLLolJqj/xfHhHbSPol8DXgz8AZEfGsivw0akBpmP/1SZfFyzPSt/aoiLi1JH3TxuT5pMEfpwE/iYL+913pG13xFOwv0lD355U83qiLaJPzLXf8Wo3bJaXhXuMMml6hbkX68ViZ9F4+QBo48ztgt4g4rSv9sdT8/uY+q39mt8LPqqBdZHlSY3IUpW/6WguOP0rR8bPvyKdI5wKkz+ywiLiv7Di9DLWOu/sEUnl3rrzORDu7Uz7RTqeR6I7sVrebzf1ZYLoIOEnS3RQ05uQcR80Jr4C/R1djSC9Kkyd9gDRKdC6p7voyUlfCIv/KGnM69aHTqO5m+VmlAQUfIjUKrkoKmEV5GdWPtVPXGtX9WJvm/zURcSSpR8ansmN8gFSlUaRpY/Jbun80VDGAJRo2Nmp098rlgM2Bqjripl1Em5xvUKNxW9KGpALOahrdz31VerfzfIb0mf46IraQtBNpeH2hSN1sN1HxIJbTCnap/f2t+1k1/Uxzar3WaNYVs7PPfaRzYGAmtI5bPbpzZWk6IxDz9aBXRURp74YGz99opJYajIZU6vM9idTwmi89lH6xlVrhtyJNlrN59iX7v4goHEii1IPgDaQGwONJs9/9T0Sc3vPF96DUG6PTjzVfevvyAPNfVOqrHNSSNca+iKzxrcf7WWsAS9fjTa4Y8kH3cWB+2Y9CP5qcb9ljnyUNES9t3M7aKl5DmqPnJ7mHHiQ13JbOddPHFeoni7ZX/fjXlWvfmRERn5H0DGCdKGnfyfZp8tk2fa09661V0tMsl5e+BxUOtcSdu6xQ9rdXdy7ommiHdHlfONGOGo4k7Fw6K021eXZRmi6jJrwiXeqWTXj1wuxvfpKcoLz0CfBoRDwqCUkrRMRN2Y9boYg4Sakv9y6k9/Q1UTGJlNKkTEcC25JK5pcBB0dE0eRX60ZE1SQ8fedfqSvam4AZSr2FOlYB/laR/08CezHSC+k4SadHxGe70jUawJLbr9EVQ0R8T6mnTacnTOViFiVf3Kr5u5ucb2R5/7ik0qqeiPgx8GNJ20bEZVX5LdD0CjX/2FTSVXPV+dmzWjTnaNI5vDOpdPwQqb2nsEDXx9Vg49eq3l0xOz3NXkuaV6YzadvepCrD/vVTMT7MG8UT7exRknZuwbYxjSC5x95N+vGYT40GRBpMeNXnaz2TNCnSYaQT5sfAz3vsU3vqVVJj45tJP9CTSZd+V5SknQVsMoz8k+oLdyR9cXbI3bakonGKFBin5u4/ia7JvbLtO5CqGO7K/nY8wgM7AAAZXUlEQVRuHyQNSS47/jxSwJib3d+QNCqyLP2OpPr232Sv93ZKJpHK0h9J6hrXmRnzRFIAOgo4YSLPN1Jj8KqkcQnnkaof9+2xz0qkH77JpCqeA4G1GjznCsCFFY//irQ4wo3ZZ/hd4PMlaTtT2OYnQatq2G762TZ6raQqxR
+TZgxdCJwFTC9JO7vOtkaf5yBOioJMbZj93bLoNsDnmUOu5TcLEFVz6P6BNGx8GK95NeArpNLUbFJDx2oN9t+BdDlbOj94FozOBn6f3X8aqfqgLP2Y2crKTnYazHbWb/77eE8vAFbP3V+dNLCmLP36DY/fmUt7LrBC9n/VvNpzgA1y959LRS8R+py/u+FrWIPULfElnVtJuk4A25M0FcVqVYFvgHm7per9zP7W6fV0RRZYOwF8GtWFtEaf7ZDfhxvJ9eABZpD1DOv3Nqyqkg+SRvQV1Y9WVh9I+lrB5r+TfqF+3LW91kjCnFsZ6Xfck6SZ2XOsz+jRWkW9Gr4LXA/8Z3b/zaRS05j6XklFVT+dEVsrU159sCfZMNssH3+WVNUY8wtJH2Xkcu4NwM87zx8R+efZteI4pZRWX3lxdvfiKOgnW9ES36sXx99Jc5Cfm+3/UuDKzjkSY3snPSLpi9S79AZYIGl1UmnpXEn3kUrUZaZEbq3PiPi9pLL5ryGbvztGpigonL+7q/pojCipC21YHdDJ527A6RHxdxUM9smOWziHCD0+L40eOTmJFFw/U/rCGlSLknpFnQk8WdL/krXvVBy71mfb77kp6bmk7rlPiYiNJW1Kqhn4bEHyg4ELlebnFymevLsi7z0tiQNwZpEuazoNbq8jXZKuRarWOKgr/dqMjCS8PKpHEm5BCqZXUD4SMp/+ZlIH/lGTZEXx1I1jBggUbcu2387IF2HRYRk5Wcq6fzWaejV7ns6x6X6+oudR/fUjOz1C3sVIHfSepAE4455RMDt+VQ8MYuycH7UHHBU81w6kUug5RT8+WZrvks6DTl3lPqSJ+svWqHwlaaTeraT3fgZpxaMLgXdFxFezdAtJfedPJp2boyJqlAzAadI4nDWev4a02MTWpKuXn0bEC7vT9kujB7E8TlrI4/GK9LuTusY9g5FeT5+KbBBYQfoNGWnfOS/qLxLS87NtKissfoTUVtHpmls634lSr63OdMA3xXjnPxryJcJrC267AE+u2Ody0pehc38yqRQxia6FGUgf4L6kmewgdZrfuuLYV5KqM/YjfanfCry1Iv0lDV7rZcCLcve3JwXVQb6fHyYt6nAbKWBeRvESWFuRRlR27r+V1KPga8CaJcfeg1SV9DDph/Lf9Li0JFWprJS7vxINqldqvN5XkUYa1k3fdOmv9YpuFelXIF1N/ii7HUx2Gd5jn82yW+GiF9m5/QrSgKNrSINNnl/j9Tat6lmz890idVN8ao3n2Iw029/7SBOuVaUtqrcfs63Pc+FrwHbDSp/7HJ5W81zovPf5OvcxbW7Z9sZxsNdt2HOVvIPUo+GC7P6OpHrCGZI+HREnFOyzBulystMHdCVSsHkiaz3Py7c0f5rUxemHlE+MNCUqVggvcKik75Aac3pNknUA8H1lfViB++g9rWt3F6fKIexRf+rVb5HmRUZpfo3PAe8n9TueRbrM7Naoz27nJTB63ogn6CotjlOjVdJpdukNaeBH50pnKqlEfDPlix0/pjTfx3mk8+7m6F2CewEjEyNtJomI+H7XcZ8AzgHOyUpme5MurT8VEd+oOHbtqh5Jb8n9n3/o+2NTL0rXfUV1kqSqK6pGc9cozUj5fsZOHFVUNTQH+J+s19KZpN5js8uO3TS9Gs57BNwj6VmMjKl4PalxvEgnDp5POtd2pHccrDaIX8OKX6VfkuqAOvefkm1bk5LFNrMXeTupSuN4UunynaQA/sWutE1bmv+PVAe+TpaHNSkpgWbpTyQ1NH4vy89xlA8pnpH9XRVYNb+t4vi1h7CTSgMX1Hzfr839fxRplFavUsHszr5kpdyq9zJ7/INZ+sOy21wKlvAa5zm0Kqk+8HLSFcb+wColaRst/VWw/5bAdyoe341UpXEhqWfJHcCuFelPIE0dcDSpKuDrwNdK0q5AKomdTlqW7hPA0xvkfQcqGodzz/910gRHt5FG0VYds9YVFWmd0gdJ1SOd5QEfBO6leu
mya0m9N3Yi19uoR57WJP2YnAf8ocb7Uis9aYnCJj1mnkma8OoRUrfNSyhpHKePONjz+fvZqcGLK6rauCH7v6pFeB3SormvBp5Wka5pS/PtBbeq7oBjup5VpB3Tm4Ue81LQ/IfnPGr0VCE1kk7O/r+JXE+DshOFhutH5vbbMvvyHUgauDCM86jWKukl+zb6ISE3r0jBYzeRm7+FNL/LTRXpb4Tec2SQSr1Xk6pINq6Rfs2qW83XuTqpzrfyvWB0d8ypZe8PaYRno3mIKOma2mOfrUmdHm4hzVU+kPQ0nPcot99KlBQkcmn6ioNVt2FXlVyoNLVmvqHxwqxR7f58QqURcnl3Zn+fKumpUTxirlFLc0TMaJj/SyU9LyrWzNT4hhQ3HcJea0UVUuD9jdK8v/8gNQAh6dmMVEF1e3WW9mBGRpVWDXefRKpP3ZDqYd99U5o/Yj/SfCXfJ7Vf3C1pRVL3xTqNoB8Evlpy/Hy12XKkH6E/VxzrwRi9ZuFtpJJlmetJAy/KLqE79iV9nh8ADsxVZZT1bJhDReM2qTTYy8M10h0HXKG0Uj2kxs1jixJGGmnYdHTzkUqzgf6KHqONJX2B1Ph9K6kB+jORWw1qvOlJn+WFkurOe3Qr6Srw4uxWNbd27ThY17AD93tJmews5/N90mxqQbo8yisdWk1JF8JoOJIQQNJ2jK1TK6vn24Y0H/ftlM/RuwHpEn11Rs83/CDpEq1K0y5OnUYxKO4pkh6I+F+lOcTXAX6Vvd+QgtP7u9NnQfinkRa2/TepaqhSpDaHm/Pd3QYl+4F5KuncOSKyYcqStpe0SkTcKukddQ9X8Vi+K+XjpDrvogmaOj/IsyX9nDTvRpBGdVYtg7c2cIOkKxkdDEbV4UZE1XwkY0TEjKx9pPbi1xo9inMSaTGLovlD8s/zFUkXMjI50n4RcU3FLo2WBgQ2IXWb3ZnR9cpF3RlvBbaNil5j40zfdN6j55FGS78Y+GJWl35dROxZkLZJHKxliesO2ITS7GJfj4i5uW2HRcRhJelPIF3eziW3hmRBibWTfv2i7VHcHbCfIcW1ujgpzTexbkQcld2/klQtFMB/x2DmKmm0sG22z0WkfuVXMvoKYFwLO2elk49F12okkjYhdXcrWluy7Fh3RMR648zPcVWPR8R+JfvtUJJ+3PNrZ8evPV95V14eJwXvN0TEewvSTiU1tj+bVF1ybFR068vt13Q2x1tIK9XUXe1qD9IgI4DfRLbS06DSN5E1vG5Fqpd/Eak677qIGFf/7LqGUuJW/wMuRrV+55WUil8OzJT05dzje5AayorMJJ0oPX+tslLoL7OqgDr2lPQ7UnXDOaTW6IMj4sSixF1VDb16SxxCWk2kY3lSa/3KpMvZcQdu6lfDdErETyE1oOW9mN7VAnU8pTtoZ3mZJ2l6QX6qzrcnlT2J0iCKDzP2CmxUia8sMPcyqABdoXYJNyJ+ozSO4U2kK4XbKbi6yHyP1EPnYtLArI1IbQy9NJ3N8XrSlerdvRJK+hypvvqkbNOBWWHp4+NJr/6XHnyA9KP2FeDbUTF9c3bF9nnSXN+iRhzsZSiBO/qfWhFGd+WbSiqNXk1xt6W7SZcaJ0p6IamOsOrSuG6dYz9VAS+LiEMk7UlqRHstaT6LwsDd8PjLR8SdufuXRBr1+DdVrHXXUL4appevUlwi/hup505hPWgDq1c8NiYQj+N8O500QOY71Fu6rEn3te4flOVJoxdL19jswwuBfSSVlnCzH6e9s9s9pPpeZdViZZ4XI7MTHku6ouopIv6osSNpr63YZXXgJklXUVGVlNkN2Dwi/p3lq9PnvTBwN0jf79KDe5NK2u8B3qk0N/pFEXFeQdovkHo31RowVMew67iR9CLSRD/HKY1yXCUibi9LHxGj6mCV+qmeUnb47NL+VZIOI7UMr1aSFmrWOeasQRpyXacqoPaQ4j6Ov0b+TkS8L3d3Wq8nqdL54Yhmy7E1KhH3Ybakd0XEqDX8lI
Z4zxnA8Tsej4hjGqQ/i/SjdDbVjcjA6B+UrE761YyM8h2EOiXcm0gl5907DauSCudkz+n0hyciHq9xHpMdt7vf94mq7vd9aK0Dj1idkekgqr7nTdIvhOZXRzEy6+KGpKuSg0hXxkVXeH8dZNCG4U/reiipemID0iX98qQS6PZV+3V5mDQwIn/cTuPVoqGxEXGYpCdIndvLHNbgeWFsVUCVs7M6vn8A/5X1EOmetrPf419REsjeTc3SUIWzSL0pkPTDiHhdj/TQsETch4OAM5WmzewE6pmk86eo8acRjcwVc7ak95AaiPM/5GVzxTwaEUVz6fSUVc+dlX0nPtrPMQqO+ceugtE0xk4D+1pSNdsFSnOun0LvQVKbSXog+1/Ak7L7vS7x30FaQ7IzffLnSX3vCwN3Vn3zFEausq+MiLJqk88B1yitnSlS3XXV+1g3fT/nP0qDwjYjNYJeRGpkLfsuzpZ0avZcvQby1TLspcvmkk2KFCPj+RctkFCyT76uaTlS6+1pEfHRXJq+G6+yBsfnRMSvlbqVTYqI0i5dTdJnAeHvWTVIp3/nX8qOXZfS/CGdD73TVeoFpEEbr4mIvuf21ehl0CoXNcjtczJplr6iEvFLI+IN/ean63g7kQbTQGoPOH9Ax72dsd3pOiLK54p5E2li/p7d17L0+e6hy5F+fHaIiG37zHr38RcVjCLiuZKeRrraG1Mwys7HV5Mu8XcmVT2eGRG/GkResueYR1oj9NHs/lTSgLLCBlRJ/wl8kTSgqTNJ3Eci4oyS9OswOshXfrfqpG96/it1ebyTNLHXNaSunK8jVY8eVvSjX9K4HVEyx00dww7cjSZFyvbpbv3+Y0Qs6EpTuiKOKlraJb2LNPJuzYh4ltIaf9+MgpXM66aXdEhEfCH7f6/I9fCQ9H9FjSc9GtNKSzSSdmZkWPFAAplyK8Sox2oxuX2eQiql/pOCEvEgfqwmgqSp0bWYQdG23GOfI5WsbiXXfS3KV3nPf2EfJ325v11Rqmykn4JRlmYNUgPlG8rO/T7z80HSNA/5ft/HRzaZVkH6a0k/9Hdn96eRplzIryJTeT52/2j2kb7R+S/pauA/IuJvStNJnMLIdBIbRUTRdBIDN+zA/WFSCeWlpEuXtwM/qKjz6t5/beDe6MqkpD9ExHNK9rklSpZGy070rUkjtjonelWg75m+6oOvGwgXp6x6qdOw9SRGpr2t0wNoKCXiiVL0+VR9ZmrYfW3Y+ikYTUCeOkvNQWqcLO33XfBdWo40cji/7QJGXx2NigXdP5p9pG90/iu3PJmko0izTx6W3R81G2inUKfRa5Xm89L3OpTDXiy47qRISNoGOJzUmPAZUmvv2sBykt4SEefkkvfbePVYRPxTWWOLUl/Mql+uOulV8n/R/SVORJQu7VVj3wsYmUCsNSQ9FXg6qe52C0Y+p1VJs+aVqdV9reyL2jGeL2yX0yR9C1g9uzp8O2kekgmlsf2+j44a/b5Jk2r9kjTSF9KkYt0Lbv83cGdE3JU911vJVU0UHLNR+j7O/0mSJmevbxdGz//fHU87DZJVk2H1Zei9SrJAfW6n9FyR9BukrjqrkWbR2jUiLldqtT2Z1De6o9/Gq99I+jjpC/tSUleeqk75ddJHyf9F923J8HLgbaR6yvyQ5gcp714G9buv5b+on6J574lamhSMhqyvft8R8ZGsHaBTQp8VEWd2JfsmzWa6bJq+qSbTSdwBY+eNz9L/13gyMZSqkqrSM9Bdeu7ss+gyQ9KNEbFR7rHCRoOml+rZpdg7gJeRSlm/JM0GV/gmdKWHNCDnO11pqi61pkZE1QopthhJel1ElA1CKUrfeCRknQavttPoVeknkxoCG1cRZt+3vSPipNy22lUT/aTvRxbfOtNJdHrQPBdYOV+HrrTizV4RMadr/0+R+nX3XY06rBJ3k9JzR75f7D+6HisMrE0v1SNNhHMWcFZELCxLp9FDzL+dXYZOA14g6f7ItXqPp6rBFg
9J+0Ya0TpdoyeaAsonForUfW1ML6MeTze0qy4NYURenxr1+5a0Kmn+jqeTuvSem93/MGmq15NyyZtUTfSTvrGIuLxg2+8Lku4FnC5pn4i4TOmNOYZ0dbTjePIwrMA9ObJuRkoThV8OEBE3VXyonb6j+X6jZPd7zbJXKXvDDiWt4rFctu0J0jwnRTPg9RpiXthdyVqjM9q0u88zVARa5XoZkea8eTrp0nxgPTMaGviIvD417fd9AmmhkctIc+1/PEv7msjNO5RpOtNlPzNjDkVEzJH0GlK17nsZmXTuFbEkLl1Gbm5quuap7r4/ETfS1J7nwsjCBqQpLX9Jmk+kO/1VXfe/kfv/8onOv28Teq6Uzt9NmpxseUbPnz5mfmpSXXlnQYHuxQUeGGBef7u4368+8z0v9/8kUmNv4bJuWZptSG1X+UUdngtsOYj0Q3ydnTnSX0SabuAUUpVx7XnTy27DquNeoup9JV1D6i96T9f2aaR6qi26tld1Kbw1Ip41vNza4qSK2QQlXRERL+zUW2f1uVfHBHe/08jAnh1II4gHNiJvIrSx22w/NDLQC0Z3T6xcFLyOYU0ytaTV+07pDtoAEbFQUtGPyDCHmNuSraqCtmmvpGHJjwx+hJHGc0iBYYkO3PQ/pL5VovnCLbW1ej7uuqp+0Yse0xCHmNuSrUeJu1GvpGGTtH1E/LbXNlv6LCuBu1N1M+YhKqpuNIQh5rb4qcf83RFReCUq6cnRNVxd0gYRcfMQstlTSaFjqax2sNGGPgBnSdBv1U0WqB2slzLR//zdF0v6REScBiDpQ6QS+PMGlrkaJG0LbAdM6+rOuCq9uyfaUmCZCNxmA7IjMEvSXqQVgG4kzWUz0ZYndWWczOh1Mx9g/CMDbUA0Mn1woSifPrj3sZeFqhKzQcn6436MNGDsjRFx6WLMy/qR5uReMSIe6b2HTSSNnj54PVLfdZGmTrhjPI2XjVaXNluWSfo1abmwjUkrHX1VUtMlrwbpaZJuIFuzVNJmko5ejPmxnIiYkXX5+zVpoNTaEbEWsDtpTve+OXCb1feNiHhLRNwfaRGP7Zjg0XhdvkqaMOtegEjrO76kcg9bHLaJiJ937kTEL0jnTt9cx21WU0Sc1XX/cdIkaotNRNzZNY1Ez0WPbcL9WdL/MLJw+D7An8dzQJe4zWqS9KCkB7Lbo5KekLQ4S9x3StoOCElTlBYuWdzzlthYe5MmqTuTNDhqWratb26cNOtDNnHZq0mXwQNZ/LePPKwNHEmaf1qketMPRETVvPe2mEhaKbJpYMd9LAdus/4tC/Nt2/hkV0XfIc3XvZ6kzYB3R8R7+j2m67jNalLxqu2FCwsPOR+frHg4ImKx1rvbGEeQGpF/AqkRWWl1nr45cJvVl5/cqbNqe/eyZROh6HJ7JdIozrVYzA2mNtagG5EduM1qioj9urdJOojULW8i8/Hl3POvAnwA2I803/OXy/azxWZUIzLp8xpXI7LruM3GoWo2wSE/75qkBUL2IS3We2RE3DfR+bDeShqRDxzPkHeXuM3Gp3qBxWE8ofRF4LWkVcs3iYiHJjoP1sgGEbFPfoOk7YG+p991idtsHBZHiVvSv0lzxT/O6Olpl6qFCJYWw5h+1yVusx56zd89wdkhIjxwrgWGOf2uA7dZD+OYv9uWbUObftdVJWZmQ9SZfnegx3TgNjMbPElfjYiDJJ1NQVVbRPQ9BsBVJWZmw3FC9nfgc7a7xG1m1jIucZuZDVHWZ/swYH1SzO1023xm38d0idvMbHgk3QQcDMwhN0fJeKbfdYnbzGy4/p4tVzYwLnGbmQ2RpMNJA25+RBrxCkBEXN33MR24zcyGR9IFBZsjInbu+5gO3GZm7eI6bjOzIeianwTSIJx7gEsi4vbxHNuT1ZiZDccqXbdVScvd/ULSG8dzYFeVmJlNoGwRjF+PZ1pXl7jNzCZQtvLNuBbgcOA2M5tAknYCxrXMnBsnzcyGQNI8xs4KuCbwZ+At4z
q267jNzAZP0vpdmwK4NyIeHvexHbjNzNrFddxmZi3jwG1m1jIO3GZmLePAbWbWMg7ctlSR9AlJN0u6RNLJkj4s6VmSzpE0R9LFkjbM0k6XdL6k6ySdJ2m9xZ1/szocuG2pIWkr4HXAZsCupHkhAGYB74+IFwAfBo7Otn8d+F5EbAqcBHxtYnNs1h93B7SlhqSDgDUi4tDs/leAvwH/D7g5l3SFiNhI0j3AOhHxL0lTgLsiYu0Jz7hZQx45aUu75YD7I2LzxZ0Rs0FxVYktTX4LvErSVEkrA7sDjwC3S9oLQMlmWfpLgc70mvsAF090hs364aoSW6pIOgx4E/BX4G7gHODXwDHAOsAU4JSI+HQ2JPk4YG1gIbBfRNyxOPJt1oQDty1VJK0cEQ9JWhG4CNh/PIuymi2JXMdtS5tZkp4HTCX1GHHQtqWOS9xmZi3jxkkzs5Zx4DYzaxkHbjOzlnHgNjNrGQduM7OWceA2M2uZ/w/KwQTdHcq76AAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df.plot(kind='bar', y='pop17')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above shows the population as of 1 Jan 2017. \n", "\n", "We'll try to improve this in two ways:\n", "\n", "- we want to count population in millions. We can do this by dividing all the data by $10^6$.\n", "\n", "- it would be interesting to sort the countries in order of size for this plot.\n", "\n" ] }, { "cell_type": "code", "execution_count": 93, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df_millions = df / 1e6" ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 94, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df_millions['pop17'].sort_values(ascending=False).plot(kind='bar')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The example above selects one column from the data frame (`['pop17']`), which returns a `Series` object. We then sort this `Series` by its values (the number of people in each country) using `sort_values()`, and plot the result. Passing `ascending=False` puts the most populous country first." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, we can sort the whole data frame, using `pop17` as the column to sort by (`by='pop17'`), and then plot only the `pop17` column:" ] }, { "cell_type": "code", "execution_count": 95, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df_millions.sort_values(by='pop17').plot(kind='bar', y='pop17')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also plot more than one column at the same time:" ] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "ax = df_millions.sort_values(by='pop17').plot(kind='bar', y=['pop17', 'pop18'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also fine-tune the plot with the usual `matplotlib` commands, applied to the `Axes` object that `plot()` returns:" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "ax = df_millions.sort_values(by='pop17').plot(kind='bar', y='pop17', figsize=(10, 4))\n", "ax.set_ylabel(\"population 2017 [in millions]\")\n", "ax.grid()\n", "ax.set_xlabel(None); # get rid of default label for x-axis ('geo')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From the numbers of births and deaths, we can compute each country's population change in 2017 that is due to births and deaths alone. This is often called the \"natural change\":" ] }, { "cell_type": "code", "execution_count": 98, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df['natural-change'] = df['births'] - df['deaths']" ] }, { "cell_type": "code", "execution_count": 99, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "geo\n", "Italy -190910\n", "Germany -148000\n", "Romania -71125\n", "Bulgaria -45836\n", "Hungary -37231\n", "Greece -36007\n", "Spain -31245\n", "Portugal -23432\n", "Croatia -16921\n", "Lithuania -11446\n", "Latvia -7929\n", "Finland -3401\n", "Estonia -1759\n", "Poland -870\n", "Slovenia -268\n", "Malta 748\n", "Luxembourg 1911\n", "Czechia 2962\n", "Cyprus 3232\n", "Slovakia 4055\n", "Austria 4363\n", "Denmark 8136\n", "Belgium 10024\n", "Netherlands 19173\n", "Sweden 23444\n", "Ireland 31760\n", "United Kingdom 147871\n", "France 164550\n", "Name: natural-change, dtype: int64" ] }, "execution_count": 99, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['natural-change'].sort_values()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From this, we can see that, in absolute terms, the natural change is most negative in Italy and Germany: there, deaths exceeded births by the largest numbers."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To relate this to the overall size of each country's population, one often uses annual rates per 1000 inhabitants, such as the birth rate [1] (and, analogously, the death rate). Here we divide by the population on 1 Jan 2017 (`pop17`):\n", "\n", "[1] https://en.wikipedia.org/wiki/Birth_rate" ] }, { "cell_type": "code", "execution_count": 100, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df['birth-rate'] = df['births'] / df['pop17'] * 1000\n", "df['death-rate'] = df['deaths'] / df['pop17'] * 1000\n", "df['natural-change-rate'] = df['natural-change'] / df['pop17'] * 1000" ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>pop17</th><th>pop18</th><th>births</th><th>deaths</th><th>natural-change</th><th>birth-rate</th><th>death-rate</th><th>natural-change-rate</th></tr>\n",
"    <tr><th>geo</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>Belgium</th><td>11351727</td><td>11413058</td><td>119690</td><td>109666</td><td>10024</td><td>10.543770</td><td>9.660733</td><td>0.883037</td></tr>\n",
"    <tr><th>Bulgaria</th><td>7101859</td><td>7050034</td><td>63955</td><td>109791</td><td>-45836</td><td>9.005389</td><td>15.459473</td><td>-6.454085</td></tr>\n",
"    <tr><th>Czechia</th><td>10578820</td><td>10610055</td><td>114405</td><td>111443</td><td>2962</td><td>10.814533</td><td>10.534540</td><td>0.279993</td></tr>\n",
"    <tr><th>Denmark</th><td>5748769</td><td>5781190</td><td>61397</td><td>53261</td><td>8136</td><td>10.680026</td><td>9.264766</td><td>1.415260</td></tr>\n",
"    <tr><th>Germany</th><td>82521653</td><td>82850000</td><td>785000</td><td>933000</td><td>-148000</td><td>9.512655</td><td>11.306123</td><td>-1.793469</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " pop17 pop18 births deaths natural-change birth-rate \\\n", "geo \n", "Belgium 11351727 11413058 119690 109666 10024 10.543770 \n", "Bulgaria 7101859 7050034 63955 109791 -45836 9.005389 \n", "Czechia 10578820 10610055 114405 111443 2962 10.814533 \n", "Denmark 5748769 5781190 61397 53261 8136 10.680026 \n", "Germany 82521653 82850000 785000 933000 -148000 9.512655 \n", "\n", " death-rate natural-change-rate \n", "geo \n", "Belgium 9.660733 0.883037 \n", "Bulgaria 15.459473 -6.454085 \n", "Czechia 10.534540 0.279993 \n", "Denmark 9.264766 1.415260 \n", "Germany 11.306123 -1.793469 " ] }, "execution_count": 101, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now look at the natural rate of change of population for each country, which is normalised by the population in that country." ] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "ax = df.sort_values(by='natural-change-rate').plot(kind='bar', y='natural-change-rate', figsize=(10, 4))\n", "ax.set_title(\"Natural change due to births and deaths per 1000 in 2017\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can show the natural change rate together with the underlying birth and death rates:" ] }, { "cell_type": "code", "execution_count": 103, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 103, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tmp = df.sort_values(by='natural-change-rate')\n", "\n", "fig, axes = plt.subplots(2, 1, figsize=(12, 6))\n", "\n", "tmp.plot(kind='bar', y=['natural-change-rate'], sharex=True, ax=axes[0])\n", "axes[0].set_title(\"Population change per 1000 in 2017\")\n", "tmp.plot(kind='bar', y=['death-rate', 'birth-rate'], sharex=True, ax=axes[1])\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have not yet used the population figures for 1 January 2018.\n", "\n", "Let's first look at the absolute change in population between 1 Jan 2017 and 1 Jan 2018:\n" ] }, { "cell_type": "code", "execution_count": 104, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df['change'] = df['pop18'] - df['pop17']" ] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "ax = df.sort_values(by='change').plot(y='change', kind='bar')\n", "ax.set_title(\"Total change in population per country in 2017\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With that information, we can estimate net migration as the total change minus the natural change. (Note that this estimate also absorbs any inaccuracies in the data and any changes in the data-gathering method, which the original data describe as \"statistical adjustment\".)" ] }, { "cell_type": "code", "execution_count": 106, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df['migration'] = df['change'] - df['natural-change']" ] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>pop17</th><th>pop18</th><th>births</th><th>deaths</th><th>natural-change</th><th>birth-rate</th><th>death-rate</th><th>natural-change-rate</th><th>change</th><th>migration</th></tr>\n",
"    <tr><th>geo</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>Belgium</th><td>11351727</td><td>11413058</td><td>119690</td><td>109666</td><td>10024</td><td>10.543770</td><td>9.660733</td><td>0.883037</td><td>61331</td><td>51307</td></tr>\n",
"    <tr><th>Bulgaria</th><td>7101859</td><td>7050034</td><td>63955</td><td>109791</td><td>-45836</td><td>9.005389</td><td>15.459473</td><td>-6.454085</td><td>-51825</td><td>-5989</td></tr>\n",
"    <tr><th>Czechia</th><td>10578820</td><td>10610055</td><td>114405</td><td>111443</td><td>2962</td><td>10.814533</td><td>10.534540</td><td>0.279993</td><td>31235</td><td>28273</td></tr>\n",
"    <tr><th>Denmark</th><td>5748769</td><td>5781190</td><td>61397</td><td>53261</td><td>8136</td><td>10.680026</td><td>9.264766</td><td>1.415260</td><td>32421</td><td>24285</td></tr>\n",
"    <tr><th>Germany</th><td>82521653</td><td>82850000</td><td>785000</td><td>933000</td><td>-148000</td><td>9.512655</td><td>11.306123</td><td>-1.793469</td><td>328347</td><td>476347</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " pop17 pop18 births deaths natural-change birth-rate \\\n", "geo \n", "Belgium 11351727 11413058 119690 109666 10024 10.543770 \n", "Bulgaria 7101859 7050034 63955 109791 -45836 9.005389 \n", "Czechia 10578820 10610055 114405 111443 2962 10.814533 \n", "Denmark 5748769 5781190 61397 53261 8136 10.680026 \n", "Germany 82521653 82850000 785000 933000 -148000 9.512655 \n", "\n", " death-rate natural-change-rate change migration \n", "geo \n", "Belgium 9.660733 0.883037 61331 51307 \n", "Bulgaria 15.459473 -6.454085 -51825 -5989 \n", "Czechia 10.534540 0.279993 31235 28273 \n", "Denmark 9.264766 1.415260 32421 24285 \n", "Germany 11.306123 -1.793469 328347 476347 " ] }, "execution_count": 107, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's plot the total change of the population per country in the top subfigure, and the contribution from natural changes and migration in the lower subfigure:" ] }, { "cell_type": "code", "execution_count": 108, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tmp = df.sort_values(by='change')\n", "fig, axes = plt.subplots(2, 1, figsize=(12, 6))\n", "\n", "tmp.plot(kind='bar', y=['change'], sharex=True, ax=axes[0])\n", "axes[0].set_title(\"Population changes in 2017\")\n", "axes[0].legend(['Total change of population (migration + natural change due to deaths and births)'])\n", "tmp.plot(kind='bar', y=['migration', 'natural-change'], sharex=True, ax=axes[1])\n", "axes[1].legend(['Migration', \"Natural change due to deaths and births\"])\n", "axes[1].set_xlabel(None);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Further reading\n", "\n", "There is a lot more to say about Pandas. The following resources may be useful, but there are countless others available:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Further reading on `[]`, `.loc[]` and `.iloc[]` from Ted Petrou as a [Jupyter Notebook](https://github.com/tdpetrou/Learn-Pandas/blob/master/Learn-Pandas/Selecting%20Subsets/01%20Selecting%20Subsets%20with%20%5B%20%5D%2C%20.loc%20and%20.iloc.ipynb) and [blog entry](https://medium.com/dunder-data/selecting-subsets-of-data-in-pandas-6fcd0170be9c).\n", "\n", "- Jake VanderPlas: Python Data Science Handbook [online](https://jakevdp.github.io/PythonDataScienceHandbook/)" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" } }, "nbformat": 4, "nbformat_minor": 2 }