"- Open in [IPython nbviewer](http://nbviewer.ipython.org/github/rasbt/python_reference/blob/master/tutorials/sorting_csvs.ipynb?create=1) \n",
"- Link to this [IPython notebook on Github](https://github.com/rasbt/python_reference/blob/master/tutorials/sorting_csvs.ipynb) \n",
"- Link to the GitHub Repository [`python_reference`](https://github.com/rasbt/python_reference)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<hr>\n",
"I am looking forward to comments or suggestions, please don't hesitate to contact me via\n",
"[twitter](https://twitter.com/rasbt), [email](mailto:bluewoodtree@gmail.com), or [google+](https://plus.google.com/118404394130788869227).\n",
"<hr>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Sorting CSV files using the Python `csv` module"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"I wanted to summarize a way to sort CSV files by just using the [`csv` module](https://docs.python.org/3.4/library/csv.html) and other standard library Python modules \n",
"(you probably also want to consider using the [pandas](http://pandas.pydata.org) library if you are working with very large CSV files - I am planning to make this a separate topic)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"<br>\n",
"<hr>\n",
"## Sections\n",
"- [Reading in a CSV file](#reading)\n",
"- [Printing the CSV file contents](#printing)\n",
"- [Marking min/max values in particular columns](#marking)\n",
"- [Writing out the modified table to as a new CSV file](#writing)\n",
"<hr>\n",
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Objective:\n",
"\n",
"Let us assume that we have an [example CSV](../Data/test.csv) file formatted like this:\n",
" \n",
"<pre>name,column1,column2,column3\n",
"abc,1.1,4.2,1.2\n",
"def,2.1,1.4,5.2\n",
"ghi,1.5,1.2,2.1\n",
"jkl,1.8,1.1,4.2\n",
"mno,9.4,6.6,6.2\n",
"pqr,1.4,8.3,8.4</pre>\n",
"\n",
"And we want to sort particular columns and eventually mark min- of max-values in the table.\n",
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name='sections'></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name='reading'></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##Reading in a CSV file"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to top](#sections)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because we will be iterating over our CSV file a couple of times, let us read in the CSV file using the `csv` module and hold the contents in memory using a Python list object (note: be careful with very large CSV files and possible memory issues associated with this approach).\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import csv\n",
"\n",
"def csv_to_list(csv_file, delimiter=','):\n",
" \"\"\" \n",
" Reads in a CSV file and returns the contents as list,\n",
" where every row is stored as a sublist, and each element\n",
" in the sublist represents 1 cell in the table.\n",
"To avoid problems with the sorting approach that can occur when we have negative values in some cells, let us define a function that converts all numeric cells into float values."
"Using the very handy [`operator.itemgetter`](https://docs.python.org/3.4/library/operator.html#operator.itemgetter) function, we define a function that returns a CSV file contents sorted by a particular column (column index or column name)."
"To visualize minimum and maximum values in certain columns if find it quite useful to add little symbols to the cells (most people like to highlight cells with colors in e.g., Excel spreadsheets, but CSV doesn't support colors, so this is my workaround - please let me know if you figured out a better approach, I would be looking forward to your suggestion)."