batch processing csvs

2024-10-06 04:59:28 +00:00 · 2014-05-14 08:55:21 -04:00 · 2014-05-14 08:55:21 -04:00 · efa50ac899
commit efa50ac899
parent 8f0944c7cf
1 changed files with 98 additions and 2 deletions
--- a/tutorials/sorting_csvs.ipynb
+++ b/tutorials/sorting_csvs.ipynb
@@ -1,7 +1,7 @@
 {
 "metadata": {
  "name": "",
-  "signature": "sha256:7ce6d9e0e1dc3da5c31fc5f3a5ab7687870a76cd4adaedd6da95bc6451755b12"
+  "signature": "sha256:f56b7081a6e5b63610100fcfa0a226c7a0184dfe0d63128614a7a68555653428"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
@@ -60,6 +60,7 @@
      "- [Sorting the CSV file](#sorting)\n",
      "- [Marking min/max values in particular columns](#marking)\n",
      "- [Writing out the modified table as a new CSV file](#writing)\n",
      "- [Batch processing CSV files](#batch)\n",
      "<hr>\n",
      "<br>\n",
      "<br>"
@@ -646,10 +647,105 @@
     ],
     "prompt_number": 14
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<a name='batch'></a>\n",
      "<br>\n",
      "<br>"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Batch processing CSV files"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "[[back to top](#sections)]"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "CSV files rarely come alone; usually we have to process a whole batch of similarly formatted CSV files from some output device.  \n",
      "For example, if we want to process all CSV files in a particular input directory and save the processed files in a separate output directory, we can use a simple list comprehension to collect tuples of input and output file names."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
-     "input": [],
+     "input": [
      "import os\n",
      "\n",
      "in_dir = '../Data'\n",
      "out_dir = '../Data/processed'\n",
      "csvs = [\n",
      "    (os.path.join(in_dir, fname),\n",
      "     os.path.join(out_dir, fname))\n",
      "    for fname in os.listdir(in_dir)\n",
      "    if fname.endswith('.csv')\n",
      "]\n",
      "\n",
      "for pair in csvs:\n",
      "    print(pair)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "('../Data/test.csv', '../Data/processed/test.csv')\n",
        "('../Data/test_marked.csv', '../Data/processed/test_marked.csv')\n"
       ]
      }
     ],
     "prompt_number": 12
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "Next, we can wrap the processing steps we want to apply to each CSV file in a simple function and loop over our file names:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def process_csv(csv_in, csv_out):\n",
      "    \"\"\" \n",
      "    Takes the input and output file names of a CSV file\n",
      "    and marks the minimum value in every column.\n",
      "    \n",
      "    \"\"\"\n",
      "    csv_cont = csv_to_list(csv_in)\n",
      "    csv_marked = copy.deepcopy(csv_cont)\n",
      "    convert_cells_to_floats(csv_marked)\n",
      "    mark_all_col(csv_marked, mark_max=False, marker='*')\n",
      "    write_csv(csv_out, csv_marked)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 18
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "for csv_in, csv_out in csvs:\n",
      "    process_csv(csv_in, csv_out)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []