{ "metadata": { "name": "", "signature": "sha256:996358a25da6fc77c66d183e79209307af06bd2f9abb0656d3bb70cfc2fe597a" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Sebastian Raschka 05/09/2014" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Fixing CSV files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have a directory `../CSV_files_raw/` with CSV files where some of them have 'tab-separated' and some of them 'comma-separated' columns. \n", "Here, we will 'fix' them, i.e., have them all comma-separated, and save them to a new directory `../CSV_fixed`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we create a dictionary with the file basenames as keys. The values are lists of the file paths to the raw and new fixed CSV files. e.g., \n", "\n", " {\n", " 'abc.csv': ['../CSV_files_raw/abc.csv', '../CSV_fixed/abc.csv'], \n", " 'def.csv': ['../CSV_files_raw/def.csv', '../CSV_fixed/def.csv'], \n", " ...\n", " }" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import sys\n", "import os\n", "\n", "raw_dir = '../CSV_files_raw/'\n", "fixed_dir = '../CSV_fixed'\n", "\n", "if not os.path.exists(fixed_dir):\n", " os.mkdir(fixed_dir)\n", "\n", "f_dict = {os.path.basename(f):[os.path.join(raw_dir, f),\n", " os.path.join(fixed_dir, f)]\n", " for f in os.listdir(raw_dir) if f.endswith('.csv')} " ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we can replace the tabs with commas for the new files very easily:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "for f in f_dict.keys():\n", " with open(f_dict[f][0], 'r') as raw, open(f_dict[f][1], 'w') as fixed:\n", " for line in raw:\n", " line = line.strip().split('\\t')\n", " fixed.write(','.join(line) + '\\n')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 11 } ], "metadata": {} } ] }