python_reference/tutorials/sorting_csvs.ipynb

{
 "metadata": {
  "name": "",
  "signature": "sha256:13c10e46fea51bcbbf203261897bb4d8ea2087e39925e077d452778f78d62b1e"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "[Sebastian Raschka](http://sebastianraschka.com)  \n",
      "last updated: 05/13/2014\n",
      "\n",
      "- Open in [IPython nbviewer](http://nbviewer.ipython.org/github/rasbt/python_reference/blob/master/tutorials/sorting_csvs.ipynb?create=1)  \n",
      "- Link to this [IPython notebook on Github](https://github.com/rasbt/python_reference/blob/master/tutorials/sorting_csvs.ipynb)  \n",
      "- Link to the GitHub Repository [`python_reference`](https://github.com/rasbt/python_reference)\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<hr>\n",
      "I am looking forward to comments or suggestions, please don't hesitate to contact me via\n",
      "[twitter](https://twitter.com/rasbt), [email](mailto:bluewoodtree@gmail.com), or [google+](https://plus.google.com/118404394130788869227).\n",
      "<hr>"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# Sorting CSV files using the Python `csv` module"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "I wanted to summarize a way to sort CSV files by just using the [`csv` module](https://docs.python.org/3.4/library/csv.html) and other standard library Python modules   \n",
      "(you probably also want to consider using the [pandas](http://pandas.pydata.org) library if you are working with very large CSV files - I am planning to make this a separate topic)."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "<br>\n",
      "<hr>\n",
      "## Sections\n",
      "- [Reading in a CSV file](#reading)\n",
      "- [Printing the CSV file contents](#printing)\n",
      "- [Converting numeric cells to floats](#floats)\n",
      "- [Sorting the CSV file](#sorting)\n",
      "- [Marking min/max values in particular columns](#marking)\n",
      "- [Writing out the modified table to as a new CSV file](#writing)\n",
      "<hr>\n",
      "<br>\n",
      "<br>"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Objective:\n",
      "\n",
      "Let us assume that we have an [example CSV](../Data/test.csv) file formatted like this:\n",
      "    \n",
      "<pre>name,column1,column2,column3\n",
      "abc,1.1,4.2,1.2\n",
      "def,2.1,1.4,5.2\n",
      "ghi,1.5,1.2,2.1\n",
      "jkl,1.8,1.1,4.2\n",
      "mno,9.4,6.6,6.2\n",
      "pqr,1.4,8.3,8.4</pre>\n",
      "\n",
      "And we want to sort particular columns and eventually mark min- of max-values in the table.\n",
      "<br>\n",
      "<br>"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<a name='sections'></a>"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<a name='reading'></a>"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "##Reading in a CSV file"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "[[back to top](#sections)]"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Because we will be iterating over our CSV file a couple of times, let us read in the CSV file using the `csv` module and hold the contents in memory using a Python list object (note: be careful with very large CSV files and possible memory issues associated with this approach).\n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import csv\n",
      "\n",
      "def csv_to_list(csv_file, delimiter=','):\n",
      "    \"\"\" \n",
      "    Reads in a CSV file and returns the contents as list,\n",
      "    where every row is stored as a sublist, and each element\n",
      "    in the sublist represents 1 cell in the table.\n",
      "    \n",
      "    \"\"\"\n",
      "    with open(csv_file, 'r') as csv_con:\n",
      "        reader = csv.reader(csv_con, delimiter=delimiter)\n",
      "        return list(reader)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 4
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "csv_cont = csv_to_list('../Data/test.csv')\n",
      "\n",
      "print('first 3 rows:')\n",
      "for row in range(3):\n",
      "    print(csv_cont[row])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "first 3 rows:\n",
        "['name', 'column1', 'column2', 'column3']\n",
        "['abc', '1.1', '4.2', '1.2']\n",
        "['def', '2.1', '1.4', '5.2']\n"
       ]
      }
     ],
     "prompt_number": 5
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<a name='printing'></a>\n",
      "<br>\n",
      "<br>"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "##Printing the CSV file contents"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "[[back to top](#sections)]"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Also, let us define a short function that prints out the CSV file to the standard output screen in a slightly prettier format:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def print_csv(csv_content):\n",
      "    \"\"\" Prints CSV file to standard output.\"\"\"\n",
      "    print(50*'-')\n",
      "    for row in csv_content:\n",
      "        print('\\t'.join(row))\n",
      "    print(50*'-')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 6
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "csv_cont = csv_to_list('../Data/test.csv')\n",
      "\n",
      "print('\\n\\nOriginal CSV file:')\n",
      "print_csv(csv_cont)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "\n",
        "Original CSV file:\n",
        "--------------------------------------------------\n",
        "name\tcolumn1\tcolumn2\tcolumn3\n",
        "abc\t1.1\t4.2\t1.2\n",
        "def\t2.1\t1.4\t5.2\n",
        "ghi\t1.5\t1.2\t2.1\n",
        "jkl\t1.8\t1.1\t4.2\n",
        "mno\t9.4\t6.6\t6.2\n",
        "pqr\t1.4\t8.3\t8.4\n",
        "--------------------------------------------------\n"
       ]
      }
     ],
     "prompt_number": 7
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<a name='floats'></a>\n",
      "<br>\n",
      "<br>"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Converting numeric cells to floats"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To avoid problems with the sorting approach that can occur when we have negative values in some cells, let us define a function that converts all numeric cells into float values."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def convert_cells_to_floats(csv_cont):\n",
      "    \"\"\" \n",
      "    Converts cells to floats if possible\n",
      "    (modifies input CSV content list).\n",
      "    \n",
      "    \"\"\"\n",
      "    for row in range(len(csv_cont)):\n",
      "        for cell in range(len(csv_cont[row])):\n",
      "            try:\n",
      "                csv_cont[row][cell] = float(csv_cont[row][cell])\n",
      "            except ValueError:\n",
      "                pass "
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 8
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print('first 3 rows:')\n",
      "for row in range(3):\n",
      "    print(csv_cont[row])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "first 3 rows:\n",
        "['name', 'column1', 'column2', 'column3']\n",
        "['abc', 1.1, 4.2, 1.2]\n",
        "['def', 2.1, 1.4, 5.2]\n"
       ]
      }
     ],
     "prompt_number": 9
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<a name='sorting'></a>\n",
      "<br>\n",
      "<br>"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "##Sorting the CSV file"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "[[back to top](#sections)]"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Using the very handy [`operator.itemgetter`](https://docs.python.org/3.4/library/operator.html#operator.itemgetter) function, we define a function that returns a CSV file contents sorted by a particular column (column index or column name)."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import operator\n",
      "\n",
      "def sort_by_column(csv_cont, col, reverse=False):\n",
      "    \"\"\" \n",
      "    Sorts CSV contents by column name (if col argument is type <str>) \n",
      "    or column index (if col argument is type <int>). \n",
      "    \n",
      "    \"\"\"\n",
      "    header = csv_cont[0]\n",
      "    body = csv_cont[1:]\n",
      "    if isinstance(col, str):  \n",
      "        col_index = header.index(col)\n",
      "    else:\n",
      "        col_index = col\n",
      "    body = sorted(body, \n",
      "           key=operator.itemgetter(col_index), \n",
      "           reverse=reverse)\n",
      "    body.insert(0, header)\n",
      "    return body"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 10
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To see how (and if) it works, let us sort the CSV file in [../Data/test.csv](../Data/test.csv) by the column name \"column3\"."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "csv_cont = csv_to_list('../Data/test.csv')\n",
      "\n",
      "print('\\n\\nOriginal CSV file:')\n",
      "print_csv(csv_cont)\n",
      "\n",
      "print('\\n\\nCSV sorted by column \"column3\":')\n",
      "csv_sorted = sort_by_column(csv_cont, 'column3')\n",
      "print_csv(csv_sorted)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "\n",
        "Original CSV file:\n",
        "--------------------------------------------------\n",
        "name\tcolumn1\tcolumn2\tcolumn3\n",
        "abc\t1.1\t4.2\t1.2\n",
        "def\t2.1\t1.4\t5.2\n",
        "ghi\t1.5\t1.2\t2.1\n",
        "jkl\t1.8\t1.1\t4.2\n",
        "mno\t9.4\t6.6\t6.2\n",
        "pqr\t1.4\t8.3\t8.4\n",
        "--------------------------------------------------\n",
        "\n",
        "\n",
        "CSV sorted by column \"column3\":\n",
        "--------------------------------------------------\n",
        "name\tcolumn1\tcolumn2\tcolumn3\n",
        "abc\t1.1\t4.2\t1.2\n",
        "ghi\t1.5\t1.2\t2.1\n",
        "jkl\t1.8\t1.1\t4.2\n",
        "def\t2.1\t1.4\t5.2\n",
        "mno\t9.4\t6.6\t6.2\n",
        "pqr\t1.4\t8.3\t8.4\n",
        "--------------------------------------------------\n"
       ]
      }
     ],
     "prompt_number": 11
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "<br>"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<a name='marking'></a>\n",
      "<br>\n",
      "<br>"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Marking min/max values in particular columns"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "[[back to top](#sections)]"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To visualize minimum and maximum values in certain columns if find it quite useful to add little symbols to the cells (most people like to highlight cells with colors in e.g., Excel spreadsheets, but CSV doesn't support colors, so this is my workaround - please let me know if you figured out a better approach, I would be looking forward to your suggestion)."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def mark_minmax(csv_cont, col, mark_max=True, marker='*'):\n",
      "    \"\"\"\n",
      "    Sorts a list of CSV contents by a particular column \n",
      "    (see sort_by_column function).\n",
      "    Puts a marker on the maximum value if mark_max=True,\n",
      "    or puts a marker on the minimum value mark_max=False\n",
      "    (modifies input CSV content list).\n",
      "    \n",
      "    \"\"\"\n",
      "    \n",
      "    sorted_csv = sort_by_column(csv_cont, col, reverse=mark_max)\n",
      "    if isinstance(col, str):  \n",
      "        col_index = sorted_csv[0].index(col)\n",
      "    else:\n",
      "        col_index = col\n",
      "    sorted_csv[1][col_index] += marker\n",
      "    return None"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 12
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def mark_all_col(csv_cont, mark_max=True, marker='*'):\n",
      "    \"\"\"\n",
      "    Marks all maximum (if mark_max=True) or minimum (if mark_max=False)\n",
      "    values in all columns of a CSV contents list - except the first column.\n",
      "    Returns a new list that is sorted by the names in the first column\n",
      "    (modifies input CSV content list).\n",
      "    \n",
      "    \"\"\"\n",
      "    for c in csv_cont[0][1:]:\n",
      "        mark_minmax(csv_cont, c, mark_max, marker)\n",
      "    marked_csv = sort_by_column(csv_cont, 0, False)\n",
      "    return marked_csv"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 13
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import copy\n",
      "csv_cont = csv_to_list('../Data/test.csv')\n",
      "\n",
      "csv_marked = copy.deepcopy(csv_cont)\n",
      "mark_all_col(csv_marked, mark_max=True, marker='*')\n",
      "mark_all_col(csv_marked, mark_max=False, marker='^')\n",
      "print_csv(csv_marked)\n",
      "\n",
      "print('^: min-value\\n*: max-value')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "--------------------------------------------------\n",
        "name\tcolumn1\tcolumn2\tcolumn3\n",
        "abc\t1.1^\t4.2\t1.2^\n",
        "def\t2.1\t1.4\t5.2\n",
        "ghi\t1.5\t1.2\t2.1\n",
        "jkl\t1.8\t1.1^\t4.2\n",
        "mno\t9.4*\t6.6\t6.2\n",
        "pqr\t1.4\t8.3*\t8.4*\n",
        "--------------------------------------------------\n",
        "^: min-value\n",
        "*: max-value\n"
       ]
      }
     ],
     "prompt_number": 14
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<a name='writing'></a>\n",
      "<br>\n",
      "<br>"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Writing out the modified table to as a new CSV file"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "[[back to top](#sections)]"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "After the sorting and maybe marking of minimum and maximum values, we likely want to write out the modified data table as CSV file again."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def write_csv(dest, csv_cont):\n",
      "    \"\"\" Writes a comma-delimited CSV file. \"\"\"\n",
      "\n",
      "    with open(dest, 'w') as out_file:\n",
      "        writer = csv.writer(out_file, delimiter=',')\n",
      "        for row in csv_cont:\n",
      "            writer.writerow(row)\n",
      "\n",
      "write_csv('../Data/test_marked.csv', csv_marked)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 15
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let us read in the written CSV file to confirm that the formatting is correct:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "csv_cont = csv_to_list('../Data/test_marked.csv')\n",
      "\n",
      "print('\\n\\nWritten CSV file:')\n",
      "print_csv(csv_cont)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "\n",
        "Written CSV file:\n",
        "--------------------------------------------------\n",
        "name\tcolumn1\tcolumn2\tcolumn3\n",
        "abc\t1.1^\t4.2\t1.2^\n",
        "def\t2.1\t1.4\t5.2\n",
        "ghi\t1.5\t1.2\t2.1\n",
        "jkl\t1.8\t1.1^\t4.2\n",
        "mno\t9.4*\t6.6\t6.2\n",
        "pqr\t1.4\t8.3*\t8.4*\n",
        "--------------------------------------------------\n"
       ]
      }
     ],
     "prompt_number": 16
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    }
   ],
   "metadata": {}
  }
 ]
}
sorting csvs 2014-05-14 02:41:40 +00:00			`{`
			`"metadata": {`
			`"name": "",`
added conversion to float 2014-05-14 05:34:47 +00:00			`"signature": "sha256:13c10e46fea51bcbbf203261897bb4d8ea2087e39925e077d452778f78d62b1e"`
sorting csvs 2014-05-14 02:41:40 +00:00			`},`
			`"nbformat": 3,`
			`"nbformat_minor": 0,`
			`"worksheets": [`
			`{`
			`"cells": [`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"[Sebastian Raschka](http://sebastianraschka.com) \n",`
			`"last updated: 05/13/2014\n",`
			`"\n",`
			`"- Open in [IPython nbviewer](http://nbviewer.ipython.org/github/rasbt/python_reference/blob/master/tutorials/sorting_csvs.ipynb?create=1) \n",`
			`"- Link to this [IPython notebook on Github](https://github.com/rasbt/python_reference/blob/master/tutorials/sorting_csvs.ipynb) \n",`
			"- Link to the GitHub Repository [`python_reference`](https://github.com/rasbt/python_reference)\n"
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"<hr>\n",`
			`"I am looking forward to comments or suggestions, please don't hesitate to contact me via\n",`
			`"[twitter](https://twitter.com/rasbt), [email](mailto:bluewoodtree@gmail.com), or [google+](https://plus.google.com/118404394130788869227).\n",`
			`"<hr>"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			"# Sorting CSV files using the Python `csv` module"
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"<br>\n",`
			"I wanted to summarize a way to sort CSV files by just using the [`csv` module](https://docs.python.org/3.4/library/csv.html) and other standard library Python modules \n",
			`"(you probably also want to consider using the [pandas](http://pandas.pydata.org) library if you are working with very large CSV files - I am planning to make this a separate topic)."`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"<br>\n",`
			`"<br>\n",`
			`"<hr>\n",`
			`"## Sections\n",`
			`"- [Reading in a CSV file](#reading)\n",`
			`"- [Printing the CSV file contents](#printing)\n",`
added conversion to float 2014-05-14 05:34:47 +00:00			`"- [Converting numeric cells to floats](#floats)\n",`
sorting csvs 2014-05-14 02:41:40 +00:00			`"- [Sorting the CSV file](#sorting)\n",`
			`"- [Marking min/max values in particular columns](#marking)\n",`
			`"- [Writing out the modified table to as a new CSV file](#writing)\n",`
			`"<hr>\n",`
			`"<br>\n",`
			`"<br>"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"### Objective:\n",`
			`"\n",`
			`"Let us assume that we have an [example CSV](../Data/test.csv) file formatted like this:\n",`
			`" \n",`
			`"<pre>name,column1,column2,column3\n",`
			`"abc,1.1,4.2,1.2\n",`
			`"def,2.1,1.4,5.2\n",`
			`"ghi,1.5,1.2,2.1\n",`
			`"jkl,1.8,1.1,4.2\n",`
			`"mno,9.4,6.6,6.2\n",`
			`"pqr,1.4,8.3,8.4</pre>\n",`
			`"\n",`
			`"And we want to sort particular columns and eventually mark min- of max-values in the table.\n",`
			`"<br>\n",`
			`"<br>"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"<a name='sections'></a>"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"<a name='reading'></a>"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"##Reading in a CSV file"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"[[back to top](#sections)]"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			"Because we will be iterating over our CSV file a couple of times, let us read in the CSV file using the `csv` module and hold the contents in memory using a Python list object (note: be careful with very large CSV files and possible memory issues associated with this approach).\n"
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"collapsed": false,`
			`"input": [`
			`"import csv\n",`
			`"\n",`
			`"def csv_to_list(csv_file, delimiter=','):\n",`
			`" \"\"\" \n",`
			`" Reads in a CSV file and returns the contents as list,\n",`
			`" where every row is stored as a sublist, and each element\n",`
			`" in the sublist represents 1 cell in the table.\n",`
			`" \n",`
			`" \"\"\"\n",`
			`" with open(csv_file, 'r') as csv_con:\n",`
			`" reader = csv.reader(csv_con, delimiter=delimiter)\n",`
datafiles 2014-05-14 03:44:33 +00:00			`" return list(reader)"`
sorting csvs 2014-05-14 02:41:40 +00:00			`],`
			`"language": "python",`
			`"metadata": {},`
			`"outputs": [],`
added conversion to float 2014-05-14 05:34:47 +00:00			`"prompt_number": 4`
sorting csvs 2014-05-14 02:41:40 +00:00			`},`
			`{`
			`"cell_type": "code",`
			`"collapsed": false,`
			`"input": [`
			`"csv_cont = csv_to_list('../Data/test.csv')\n",`
			`"\n",`
			`"print('first 3 rows:')\n",`
			`"for row in range(3):\n",`
			`" print(csv_cont[row])"`
			`],`
			`"language": "python",`
			`"metadata": {},`
			`"outputs": [`
			`{`
			`"output_type": "stream",`
			`"stream": "stdout",`
			`"text": [`
			`"first 3 rows:\n",`
			`"['name', 'column1', 'column2', 'column3']\n",`
			`"['abc', '1.1', '4.2', '1.2']\n",`
			`"['def', '2.1', '1.4', '5.2']\n"`
			`]`
			`}`
			`],`
added conversion to float 2014-05-14 05:34:47 +00:00			`"prompt_number": 5`
sorting csvs 2014-05-14 02:41:40 +00:00			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"<a name='printing'></a>\n",`
			`"<br>\n",`
			`"<br>"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"##Printing the CSV file contents"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"[[back to top](#sections)]"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"Also, let us define a short function that prints out the CSV file to the standard output screen in a slightly prettier format:"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"collapsed": false,`
			`"input": [`
			`"def print_csv(csv_content):\n",`
			`" \"\"\" Prints CSV file to standard output.\"\"\"\n",`
			`" print(50*'-')\n",`
			`" for row in csv_content:\n",`
			`" print('\\t'.join(row))\n",`
			`" print(50*'-')"`
			`],`
			`"language": "python",`
			`"metadata": {},`
			`"outputs": [],`
added conversion to float 2014-05-14 05:34:47 +00:00			`"prompt_number": 6`
sorting csvs 2014-05-14 02:41:40 +00:00			`},`
			`{`
			`"cell_type": "code",`
			`"collapsed": false,`
			`"input": [`
			`"csv_cont = csv_to_list('../Data/test.csv')\n",`
			`"\n",`
			`"print('\\n\\nOriginal CSV file:')\n",`
			`"print_csv(csv_cont)"`
			`],`
			`"language": "python",`
			`"metadata": {},`
			`"outputs": [`
			`{`
			`"output_type": "stream",`
			`"stream": "stdout",`
			`"text": [`
			`"\n",`
			`"\n",`
			`"Original CSV file:\n",`
			`"--------------------------------------------------\n",`
			`"name\tcolumn1\tcolumn2\tcolumn3\n",`
			`"abc\t1.1\t4.2\t1.2\n",`
			`"def\t2.1\t1.4\t5.2\n",`
			`"ghi\t1.5\t1.2\t2.1\n",`
			`"jkl\t1.8\t1.1\t4.2\n",`
			`"mno\t9.4\t6.6\t6.2\n",`
			`"pqr\t1.4\t8.3\t8.4\n",`
			`"--------------------------------------------------\n"`
			`]`
			`}`
			`],`
added conversion to float 2014-05-14 05:34:47 +00:00			`"prompt_number": 7`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"<a name='floats'></a>\n",`
			`"<br>\n",`
			`"<br>"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"## Converting numeric cells to floats"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"To avoid problems with the sorting approach that can occur when we have negative values in some cells, let us define a function that converts all numeric cells into float values."`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"collapsed": false,`
			`"input": [`
			`"def convert_cells_to_floats(csv_cont):\n",`
			`" \"\"\" \n",`
			`" Converts cells to floats if possible\n",`
			`" (modifies input CSV content list).\n",`
			`" \n",`
			`" \"\"\"\n",`
			`" for row in range(len(csv_cont)):\n",`
			`" for cell in range(len(csv_cont[row])):\n",`
			`" try:\n",`
			`" csv_cont[row][cell] = float(csv_cont[row][cell])\n",`
			`" except ValueError:\n",`
			`" pass "`
			`],`
			`"language": "python",`
			`"metadata": {},`
			`"outputs": [],`
			`"prompt_number": 8`
			`},`
			`{`
			`"cell_type": "code",`
			`"collapsed": false,`
			`"input": [`
			`"print('first 3 rows:')\n",`
			`"for row in range(3):\n",`
			`" print(csv_cont[row])"`
			`],`
			`"language": "python",`
			`"metadata": {},`
			`"outputs": [`
			`{`
			`"output_type": "stream",`
			`"stream": "stdout",`
			`"text": [`
			`"first 3 rows:\n",`
			`"['name', 'column1', 'column2', 'column3']\n",`
			`"['abc', 1.1, 4.2, 1.2]\n",`
			`"['def', 2.1, 1.4, 5.2]\n"`
			`]`
			`}`
			`],`
			`"prompt_number": 9`
sorting csvs 2014-05-14 02:41:40 +00:00			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"<a name='sorting'></a>\n",`
			`"<br>\n",`
			`"<br>"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"##Sorting the CSV file"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"[[back to top](#sections)]"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			"Using the very handy [`operator.itemgetter`](https://docs.python.org/3.4/library/operator.html#operator.itemgetter) function, we define a function that returns a CSV file contents sorted by a particular column (column index or column name)."
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"collapsed": false,`
			`"input": [`
			`"import operator\n",`
			`"\n",`
			`"def sort_by_column(csv_cont, col, reverse=False):\n",`
			`" \"\"\" \n",`
			`" Sorts CSV contents by column name (if col argument is type <str>) \n",`
			`" or column index (if col argument is type <int>). \n",`
			`" \n",`
			`" \"\"\"\n",`
			`" header = csv_cont[0]\n",`
			`" body = csv_cont[1:]\n",`
			`" if isinstance(col, str): \n",`
			`" col_index = header.index(col)\n",`
			`" else:\n",`
			`" col_index = col\n",`
			`" body = sorted(body, \n",`
			`" key=operator.itemgetter(col_index), \n",`
			`" reverse=reverse)\n",`
			`" body.insert(0, header)\n",`
			`" return body"`
			`],`
			`"language": "python",`
			`"metadata": {},`
			`"outputs": [],`
added conversion to float 2014-05-14 05:34:47 +00:00			`"prompt_number": 10`
sorting csvs 2014-05-14 02:41:40 +00:00			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"To see how (and if) it works, let us sort the CSV file in [../Data/test.csv](../Data/test.csv) by the column name \"column3\"."`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"collapsed": false,`
			`"input": [`
			`"csv_cont = csv_to_list('../Data/test.csv')\n",`
			`"\n",`
			`"print('\\n\\nOriginal CSV file:')\n",`
			`"print_csv(csv_cont)\n",`
			`"\n",`
			`"print('\\n\\nCSV sorted by column \"column3\":')\n",`
			`"csv_sorted = sort_by_column(csv_cont, 'column3')\n",`
			`"print_csv(csv_sorted)"`
			`],`
			`"language": "python",`
			`"metadata": {},`
			`"outputs": [`
			`{`
			`"output_type": "stream",`
			`"stream": "stdout",`
			`"text": [`
			`"\n",`
			`"\n",`
			`"Original CSV file:\n",`
			`"--------------------------------------------------\n",`
			`"name\tcolumn1\tcolumn2\tcolumn3\n",`
			`"abc\t1.1\t4.2\t1.2\n",`
			`"def\t2.1\t1.4\t5.2\n",`
			`"ghi\t1.5\t1.2\t2.1\n",`
			`"jkl\t1.8\t1.1\t4.2\n",`
			`"mno\t9.4\t6.6\t6.2\n",`
			`"pqr\t1.4\t8.3\t8.4\n",`
			`"--------------------------------------------------\n",`
			`"\n",`
			`"\n",`
			`"CSV sorted by column \"column3\":\n",`
			`"--------------------------------------------------\n",`
			`"name\tcolumn1\tcolumn2\tcolumn3\n",`
			`"abc\t1.1\t4.2\t1.2\n",`
			`"ghi\t1.5\t1.2\t2.1\n",`
			`"jkl\t1.8\t1.1\t4.2\n",`
			`"def\t2.1\t1.4\t5.2\n",`
			`"mno\t9.4\t6.6\t6.2\n",`
			`"pqr\t1.4\t8.3\t8.4\n",`
			`"--------------------------------------------------\n"`
			`]`
			`}`
			`],`
added conversion to float 2014-05-14 05:34:47 +00:00			`"prompt_number": 11`
sorting csvs 2014-05-14 02:41:40 +00:00			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"<br>\n",`
			`"<br>"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"<a name='marking'></a>\n",`
			`"<br>\n",`
			`"<br>"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"## Marking min/max values in particular columns"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"[[back to top](#sections)]"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"To visualize minimum and maximum values in certain columns if find it quite useful to add little symbols to the cells (most people like to highlight cells with colors in e.g., Excel spreadsheets, but CSV doesn't support colors, so this is my workaround - please let me know if you figured out a better approach, I would be looking forward to your suggestion)."`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"collapsed": false,`
			`"input": [`
			`"def mark_minmax(csv_cont, col, mark_max=True, marker='*'):\n",`
			`" \"\"\"\n",`
			`" Sorts a list of CSV contents by a particular column \n",`
			`" (see sort_by_column function).\n",`
			`" Puts a marker on the maximum value if mark_max=True,\n",`
			`" or puts a marker on the minimum value mark_max=False\n",`
			`" (modifies input CSV content list).\n",`
			`" \n",`
			`" \"\"\"\n",`
			`" \n",`
			`" sorted_csv = sort_by_column(csv_cont, col, reverse=mark_max)\n",`
			`" if isinstance(col, str): \n",`
			`" col_index = sorted_csv[0].index(col)\n",`
			`" else:\n",`
			`" col_index = col\n",`
			`" sorted_csv[1][col_index] += marker\n",`
			`" return None"`
			`],`
			`"language": "python",`
			`"metadata": {},`
			`"outputs": [],`
added conversion to float 2014-05-14 05:34:47 +00:00			`"prompt_number": 12`
sorting csvs 2014-05-14 02:41:40 +00:00			`},`
			`{`
			`"cell_type": "code",`
			`"collapsed": false,`
			`"input": [`
			`"def mark_all_col(csv_cont, mark_max=True, marker='*'):\n",`
			`" \"\"\"\n",`
			`" Marks all maximum (if mark_max=True) or minimum (if mark_max=False)\n",`
			`" values in all columns of a CSV contents list - except the first column.\n",`
			`" Returns a new list that is sorted by the names in the first column\n",`
			`" (modifies input CSV content list).\n",`
			`" \n",`
			`" \"\"\"\n",`
			`" for c in csv_cont[0][1:]:\n",`
			`" mark_minmax(csv_cont, c, mark_max, marker)\n",`
			`" marked_csv = sort_by_column(csv_cont, 0, False)\n",`
			`" return marked_csv"`
			`],`
			`"language": "python",`
			`"metadata": {},`
			`"outputs": [],`
added conversion to float 2014-05-14 05:34:47 +00:00			`"prompt_number": 13`
sorting csvs 2014-05-14 02:41:40 +00:00			`},`
			`{`
			`"cell_type": "code",`
			`"collapsed": false,`
			`"input": [`
			`"import copy\n",`
			`"csv_cont = csv_to_list('../Data/test.csv')\n",`
			`"\n",`
			`"csv_marked = copy.deepcopy(csv_cont)\n",`
			`"mark_all_col(csv_marked, mark_max=True, marker='*')\n",`
			`"mark_all_col(csv_marked, mark_max=False, marker='^')\n",`
			`"print_csv(csv_marked)\n",`
			`"\n",`
			`"print('^: min-value\\n*: max-value')"`
			`],`
			`"language": "python",`
			`"metadata": {},`
			`"outputs": [`
			`{`
			`"output_type": "stream",`
			`"stream": "stdout",`
			`"text": [`
			`"--------------------------------------------------\n",`
			`"name\tcolumn1\tcolumn2\tcolumn3\n",`
			`"abc\t1.1^\t4.2\t1.2^\n",`
			`"def\t2.1\t1.4\t5.2\n",`
			`"ghi\t1.5\t1.2\t2.1\n",`
			`"jkl\t1.8\t1.1^\t4.2\n",`
			`"mno\t9.4*\t6.6\t6.2\n",`
			`"pqr\t1.4\t8.3\t8.4\n",`
			`"--------------------------------------------------\n",`
			`"^: min-value\n",`
			`"*: max-value\n"`
			`]`
			`}`
			`],`
added conversion to float 2014-05-14 05:34:47 +00:00			`"prompt_number": 14`
sorting csvs 2014-05-14 02:41:40 +00:00			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"<a name='writing'></a>\n",`
			`"<br>\n",`
			`"<br>"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"## Writing out the modified table to as a new CSV file"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"[[back to top](#sections)]"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"After the sorting and maybe marking of minimum and maximum values, we likely want to write out the modified data table as CSV file again."`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"collapsed": false,`
			`"input": [`
			`"def write_csv(dest, csv_cont):\n",`
			`" \"\"\" Writes a comma-delimited CSV file. \"\"\"\n",`
			`"\n",`
			`" with open(dest, 'w') as out_file:\n",`
			`" writer = csv.writer(out_file, delimiter=',')\n",`
			`" for row in csv_cont:\n",`
			`" writer.writerow(row)\n",`
			`"\n",`
			`"write_csv('../Data/test_marked.csv', csv_marked)"`
			`],`
			`"language": "python",`
			`"metadata": {},`
			`"outputs": [],`
added conversion to float 2014-05-14 05:34:47 +00:00			`"prompt_number": 15`
sorting csvs 2014-05-14 02:41:40 +00:00			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"Let us read in the written CSV file to confirm that the formatting is correct:"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"collapsed": false,`
			`"input": [`
			`"csv_cont = csv_to_list('../Data/test_marked.csv')\n",`
			`"\n",`
			`"print('\\n\\nWritten CSV file:')\n",`
			`"print_csv(csv_cont)"`
			`],`
			`"language": "python",`
			`"metadata": {},`
			`"outputs": [`
			`{`
			`"output_type": "stream",`
			`"stream": "stdout",`
			`"text": [`
			`"\n",`
			`"\n",`
			`"Written CSV file:\n",`
			`"--------------------------------------------------\n",`
			`"name\tcolumn1\tcolumn2\tcolumn3\n",`
			`"abc\t1.1^\t4.2\t1.2^\n",`
			`"def\t2.1\t1.4\t5.2\n",`
			`"ghi\t1.5\t1.2\t2.1\n",`
			`"jkl\t1.8\t1.1^\t4.2\n",`
			`"mno\t9.4*\t6.6\t6.2\n",`
			`"pqr\t1.4\t8.3\t8.4\n",`
			`"--------------------------------------------------\n"`
			`]`
			`}`
			`],`
added conversion to float 2014-05-14 05:34:47 +00:00			`"prompt_number": 16`
			`},`
			`{`
			`"cell_type": "code",`
			`"collapsed": false,`
			`"input": [],`
			`"language": "python",`
			`"metadata": {},`
			`"outputs": []`
sorting csvs 2014-05-14 02:41:40 +00:00			`}`
			`],`
			`"metadata": {}`
			`}`
			`]`
			`}`