batch processing csvs

rasbt 2014-05-14 08:55:21 -04:00
parent 8f0944c7cf
commit efa50ac899

@@ -1,7 +1,7 @@
{
"metadata": {
"name": "",
- "signature": "sha256:7ce6d9e0e1dc3da5c31fc5f3a5ab7687870a76cd4adaedd6da95bc6451755b12"
+ "signature": "sha256:f56b7081a6e5b63610100fcfa0a226c7a0184dfe0d63128614a7a68555653428"
},
"nbformat": 3,
"nbformat_minor": 0,
@@ -60,6 +60,7 @@
"- [Sorting the CSV file](#sorting)\n",
"- [Marking min/max values in particular columns](#marking)\n",
"- [Writing out the modified table as a new CSV file](#writing)\n",
"- [Batch processing CSV files](#batch)\n",
"<hr>\n",
"<br>\n",
"<br>"
@@ -646,10 +647,105 @@
],
"prompt_number": 14
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name='batch'></a>\n",
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Batch processing CSV files"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to top](#sections)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"CSV files rarely come alone; usually we have to process a whole batch of similarly formatted CSV files from some output device.\n",
"For example, to process all CSV files in an input directory and save the results in a separate output directory, we can use a list comprehension to collect tuples of input and output file names."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import os\n",
"\n",
"in_dir = '../Data'\n",
"out_dir = '../Data/processed'\n",
"csvs = [\n",
"    (os.path.join(in_dir, csv),\n",
"     os.path.join(out_dir, csv))\n",
"    for csv in os.listdir(in_dir)\n",
"    if csv.endswith('.csv')\n",
"]\n",
"\n",
"for i in csvs:\n",
"    print(i)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"('../Data/test.csv', '../Data/processed/test.csv')\n",
"('../Data/test_marked.csv', '../Data/processed/test_marked.csv')\n"
]
}
],
"prompt_number": 12
},
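The cell above assumes that the output directory already exists and that every directory entry ending in `.csv` is a regular file. A minimal sketch of a more defensive variant follows, assuming Python 3; `collect_csv_pairs` is a hypothetical helper name introduced here for illustration and is not part of the notebook.

```python
import os


def collect_csv_pairs(in_dir, out_dir):
    """Return sorted (input_path, output_path) tuples for the .csv files in in_dir.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    # Create the output directory up front so that later writes do not fail.
    os.makedirs(out_dir, exist_ok=True)
    return [
        (os.path.join(in_dir, name), os.path.join(out_dir, name))
        for name in sorted(os.listdir(in_dir))  # sorted for a reproducible order
        # Skip non-CSV entries and directories that happen to end in '.csv'.
        if name.endswith('.csv') and os.path.isfile(os.path.join(in_dir, name))
    ]
```

Calling `collect_csv_pairs('../Data', '../Data/processed')` should yield the same kind of tuples as the list comprehension above, in sorted order.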
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"Next, we can wrap the processing steps we want to apply to each CSV file in a simple function and loop over our file names:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def process_csv(csv_in, csv_out):\n",
"    \"\"\"\n",
"    Takes the input and output file names of a CSV file\n",
"    and marks the minimum value in every column.\n",
"\n",
"    \"\"\"\n",
"    csv_cont = csv_to_list(csv_in)\n",
"    csv_marked = copy.deepcopy(csv_cont)\n",
"    convert_cells_to_floats(csv_marked)\n",
"    mark_all_col(csv_marked, mark_max=False, marker='*')\n",
"    write_csv(csv_out, csv_marked)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 18
},
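`process_csv` relies on helpers defined in earlier cells of the notebook (`csv_to_list`, `convert_cells_to_floats`, `mark_all_col`, `write_csv`), which are not shown in this diff. For readers without those cells at hand, here is a self-contained sketch of the same idea using only the standard `csv` module. It is an approximation and not the notebook's implementation: `mark_column_minima` is a hypothetical name, and the sketch assumes that the first row is a header and that all rows have the same length.

```python
import csv


def mark_column_minima(csv_in, csv_out, marker='*'):
    """Copy csv_in to csv_out, appending marker to the smallest numeric value per column.

    Hypothetical stand-in for illustration; not the notebook's process_csv.
    Assumes the first row is a header and all rows have equal length.
    """
    with open(csv_in, newline='') as f:
        rows = list(csv.reader(f))

    body = rows[1:]  # everything after the header row
    n_cols = len(rows[0]) if rows else 0

    def to_float(cell):
        # Return the cell as a float, or None if it is not numeric.
        try:
            return float(cell)
        except ValueError:
            return None

    for col in range(n_cols):
        values = [to_float(row[col]) for row in body]
        numeric = [v for v in values if v is not None]
        if not numeric:
            continue  # e.g. a column of row labels
        smallest = min(numeric)
        for row, value in zip(body, values):
            if value == smallest:
                row[col] += marker  # ties are all marked

    with open(csv_out, 'w', newline='') as f:
        csv.writer(f).writerows(rows)
```

Non-numeric cells are left untouched, so a label column passes through unchanged.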
{
"cell_type": "code",
"collapsed": false,
"input": [
"for csv_in, csv_out in csvs:\n",
"    process_csv(csv_in, csv_out)"
],
"language": "python",
"metadata": {},
"outputs": []