batch processing csvs

2025-05-24 17:56:18 +00:00 · 2014-05-14 08:55:21 -04:00 · 2014-05-14 08:55:21 -04:00 · efa50ac899
commit efa50ac899
parent 8f0944c7cf
1 changed files with 98 additions and 2 deletions
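The cells added in this commit rely on helpers defined earlier in the notebook (`csv_to_list`, `convert_cells_to_floats`, `mark_all_col`, `write_csv`) plus an `import copy`, none of which appear in this diff. As a rough, self-contained sketch of the same batch pattern, assuming Python 3 and plain `csv`-module reading and writing, the snippet below collects input/output path pairs, marks each column's minimum with `*`, and writes one output file per input. The names `collect_csv_pairs`, `mark_column_minima` and `batch_process` are hypothetical, and the `process_csv` here is a stand-in rewrite, not the notebook's version.

```python
import csv
import os


# Hypothetical helper names; this is a sketch, not the notebook's own code.
def collect_csv_pairs(in_dir, out_dir):
    """Return (input_path, output_path) tuples for every .csv file in in_dir."""
    os.makedirs(out_dir, exist_ok=True)  # open(..., 'w') fails if out_dir is missing
    return [
        (os.path.join(in_dir, fname), os.path.join(out_dir, fname))
        for fname in sorted(os.listdir(in_dir))
        if fname.endswith('.csv')
    ]


def mark_column_minima(rows, marker='*'):
    """Copy rows (header first) and append marker to each column's minimum."""
    header = list(rows[0])
    body = [list(row) for row in rows[1:]]  # copy so the input stays untouched
    for col in range(len(header)):
        candidates = []
        for i, row in enumerate(body):
            try:
                candidates.append((float(row[col]), i))
            except (ValueError, IndexError):
                pass  # skip non-numeric cells (e.g. row labels) and short rows
        if candidates:
            _, min_row = min(candidates)  # ties go to the first row
            body[min_row][col] += marker
    return [header] + body


def process_csv(csv_in, csv_out, marker='*'):
    """Stand-in version: read csv_in, mark column minima, write csv_out."""
    with open(csv_in, newline='') as f:
        rows = list(csv.reader(f))
    if rows:
        rows = mark_column_minima(rows, marker)
    with open(csv_out, 'w', newline='') as f:
        csv.writer(f).writerows(rows)


def batch_process(in_dir, out_dir):
    """Process every CSV file in in_dir and return the (input, output) pairs."""
    pairs = collect_csv_pairs(in_dir, out_dir)
    for csv_in, csv_out in pairs:
        process_csv(csv_in, csv_out)
    return pairs


if __name__ == '__main__':
    import tempfile

    # Toy input created on the fly; not data from the tutorial.
    with tempfile.TemporaryDirectory() as in_dir:
        with open(os.path.join(in_dir, 'test.csv'), 'w', newline='') as f:
            csv.writer(f).writerows([['name', 'a', 'b'], ['x', '3', '1'], ['y', '2', '5']])
        out_dir = os.path.join(in_dir, 'processed')
        for _, csv_out in batch_process(in_dir, out_dir):
            with open(csv_out, newline='') as f:
                print(list(csv.reader(f)))
        # → [['name', 'a', 'b'], ['x', '3', '1*'], ['y', '2*', '5']]
```

Creating `out_dir` up front with `os.makedirs(..., exist_ok=True)` avoids a `FileNotFoundError` on the first write, and because `processed` is a directory without a `.csv` suffix, nesting it inside the input directory does not cause it to be picked up as input. Non-numeric cells such as row labels are simply skipped when searching for a column's minimum.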
--- a/tutorials/sorting_csvs.ipynb
+++ b/tutorials/sorting_csvs.ipynb
@@ -1,7 +1,7 @@
 {
 "metadata": {
  "name": "",
-  "signature": "sha256:7ce6d9e0e1dc3da5c31fc5f3a5ab7687870a76cd4adaedd6da95bc6451755b12"
+  "signature": "sha256:f56b7081a6e5b63610100fcfa0a226c7a0184dfe0d63128614a7a68555653428"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
@@ -60,6 +60,7 @@
      "- [Sorting the CSV file](#sorting)\n",
      "- [Marking min/max values in particular columns](#marking)\n",
      "- [Writing out the modified table to as a new CSV file](#writing)\n",
+      "- [Batch processing CSV files](#batch)\n",
      "<hr>\n",
      "<br>\n",
      "<br>"
@@ -646,10 +647,105 @@
     ],
     "prompt_number": 14
    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "<a name='batch'></a>\n",
+      "<br>\n",
+      "<br>"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "## Batch processing CSV files"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "[[back to top](#sections)]"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "CSV files rarely come alone: usually, we have to process a whole batch of similarly formatted CSV files from some output device.  \n",
+      "For example, if we want to process all CSV files in a particular input directory and save the processed files in a separate output directory (which has to exist before we write to it), we can use a simple list comprehension to collect tuples of input and output file names."
+     ]
+    },
    {
     "cell_type": "code",
     "collapsed": false,
-     "input": [],
+     "input": [
+      "import os\n",
+      "\n",
+      "in_dir = '../Data'\n",
+      "out_dir = '../Data/processed'\n",
+      "csvs = [\n",
+      "    (os.path.join(in_dir, fname),\n",
+      "        os.path.join(out_dir, fname))\n",
+      "    for fname in os.listdir(in_dir)\n",
+      "    if fname.endswith('.csv')\n",
+      "    ]\n",
+      "\n",
+      "for i in csvs:\n",
+      "    print(i)"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "output_type": "stream",
+       "stream": "stdout",
+       "text": [
+        "('../Data/test.csv', '../Data/processed/test.csv')\n",
+        "('../Data/test_marked.csv', '../Data/processed/test_marked.csv')\n"
+       ]
+      }
+     ],
+     "prompt_number": 12
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "<br>\n",
+      "Next, we can wrap the processing steps we want to apply to each CSV file in a simple function and then loop over our pairs of file names:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "def process_csv(csv_in, csv_out):\n",
+      "    \"\"\"\n",
+      "    Takes the input and output filenames of a CSV file,\n",
+      "    marks the minimum value of every column, and writes the result to csv_out.\n",
+      "    \n",
+      "    \"\"\"\n",
+      "    csv_cont = csv_to_list(csv_in)\n",
+      "    csv_marked = copy.deepcopy(csv_cont)\n",
+      "    convert_cells_to_floats(csv_marked)\n",
+      "    mark_all_col(csv_marked, mark_max=False, marker='*')\n",
+      "    write_csv(csv_out, csv_marked)"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [],
+     "prompt_number": 18
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "for csv_in, csv_out in csvs:\n",
+      "    process_csv(csv_in, csv_out)"
+     ],
     "language": "python",
     "metadata": {},
     "outputs": []