Read Multiple Text Files and Add Filename as New Column in Numpy

Reading and Writing Text Files

Overview

Teaching: 60 min
Exercises: 30 min

Questions

  • How can I read in data that is stored in a file or write data out to a file?

Objectives

  • Be able to open a file and read in the information stored in that file

  • Understand the difference between the file name, the opened file object, and the data read in from the file

  • Be able to write output to a text file with simple formatting

Why do we want to read and write files?

Being able to open and read in files allows us to work with larger data sets, where it wouldn't be possible to type in each and every value and store them one-at-a-time as variables. Writing files allows us to process our data and then save the output to a file so we can look at it later.

Right now, we will practice working with a comma-delimited text file (.csv) that contains several columns of data. However, what you learn in this lesson can be applied to any general text file. In the next lesson, you will learn another way to read and process .csv data.

Paths to files

In order to open a file, we need to tell Python exactly where the file is located, relative to where Python is currently working (the working directory). In Spyder, we can do this by setting our current working directory to the folder where the file is located. Or, when we provide the file name, we can give a complete path to the file.
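As a rough sketch of the two options described above, the snippet below asks Python for its working directory and then builds a longer path by hand. The names `my_data.csv` and `data_folder` are hypothetical placeholders, not files from this lesson; substitute your own.

```python
import os

# Ask Python where it is currently working (the working directory).
working_directory = os.getcwd()
print("Working directory:", working_directory)

# Option 1: a bare file name is looked up relative to the working directory.
# 'my_data.csv' is a hypothetical file name used only for illustration.
filename = 'my_data.csv'

# Option 2: include the folder in the path that we hand to open().
# 'data_folder' is also a hypothetical name.
full_path = os.path.join('data_folder', filename)
print("Path including the folder:", full_path)

# os.path.exists() reports whether Python can find something at that location.
print("Found in the working directory?", os.path.exists(filename))
```

If the last line prints False for a file you expected to find, the working directory is probably not set to the folder you think it is.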

Lesson Setup

We will work with the practice file Plates_output_simple.csv.

  1. Locate the file Plates_output_simple.csv in the directory home/Desktop/workshops/bash-git-python.
  2. Copy the file to your working directory, home/Desktop/workshops/YourName.
  3. Make sure that your working directory is also set to the folder home/Desktop/workshops/YourName.
  4. As you are working, make sure that you save your file opening script(s) to this directory.

The File Setup

Let's open and examine the structure of the file Plates_output_simple.csv. If you open the file in a text editor, you will see that the file contains several lines of text.

[Image: DataFileRaw]

However, this is fairly difficult to read. If you open the file in a spreadsheet program such as LibreOffice Calc or Excel, you can see that the file is organized into columns, with each column separated by the commas in the image above (hence the file extension .csv, which stands for comma-separated values).

[Image: DataFileColumns]

The file contains one header row, followed by eight rows of data. Each row represents a single plate image. If we look at the column headings, we can see the data that we have collected for each plate:

  • The name of the image from which the data was collected
  • The plate number (there were 4 plates, with each plate imaged at two different time points)
  • The growth condition (either control or experimental)
  • The observation timepoint (either 24 or 48 hours)
  • The colony count for the plate
  • The average colony size for the plate
  • The percentage of the plate covered by bacterial colonies

We will read in this data file and then work to analyze the data.

Opening and reading files is a three-step process

We will open and read the file in three steps.

  1. We will create a variable to hold the name of the file that we want to open.
  2. We will call the open function to open the file.
  3. We will call a function to actually read the data in the file and store it in a variable so that we can process it.

And then, there's one more step to do!

  • When we are done, we should remember to close the file!

You can think of these three steps as being similar to checking out a book from the library. First, you have to go to the catalog or database to find out which book you need (the filename). Then, you have to go and get it off the shelf and open the book up (the open function). Finally, to gain any information from the book, you have to read the words (the read function)!

Here is an example of opening, reading, and closing a file.

    #Create a variable for the file name
    filename = 'Plates_output_simple.csv'  #This is simply a string of text

    #Open the file
    #'r' says we are opening the file to read; infile is the opened file object that we will read from
    infile = open(filename, 'r')

    #Store the data from the file in a variable
    data = infile.read()

    #Print the data in the file
    print(data)

    #close the file
    infile.close()

Once we have read the data in the file into our variable data, we can treat it like any other variable in our code.
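To make "like any other variable" a little more concrete, here is a small sketch. So that it runs without the data file, the string below simply stands in for the text that infile.read() would return; it holds two rows in the lesson's column layout.

```python
# Stand-in for the text returned by infile.read(): one long string,
# with a newline character ('\n') marking the end of each line.
data = 'colonies02.tif,2,exp,24,84,3.2,22\ncolonies03.tif,3,exp,24,792,3,78\n'

print(type(data))          # the whole file is a single string
print(len(data))           # number of characters, newlines included
print(data.count('\n'))    # number of newline characters, one per line here
print('exp' in data)       # we can search the text like any other string
```

Because read() hands back one string, line-by-line work is usually easier with the other read commands.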

Use consistent names to make your code clearer

It is a good idea to develop some consistent habits about the way you open and read files. Using the same (or similar!) variable names each time will make it easier for you to keep track of which variable is the name of the file, which variable is the opened file object, and which variable contains the read-in data.

In these examples, we will use filename for the text string containing the file name, infile for the open file object from which we can read in data, and data for the variable holding the contents of the file.

Commands for reading in files

There are a variety of commands that allow us to read in data from files.
infile.read() will read in the entire file as a single string of text.
infile.readline() will read in one line at a time (each time you call this command, it reads in the next line).
infile.readlines() will read all of the lines into a list, where each line of the file is an item in the list.
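The sketch below compares what each of the three commands returns. To stay self-contained it first writes a tiny three-line practice file into a temporary folder; the file name `practice.txt` and its contents are made up for illustration, and the setup borrows the writing commands covered later in this lesson.

```python
import os
import tempfile

# Work inside a temporary folder that is deleted automatically afterwards.
with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, 'practice.txt')  # hypothetical practice file

    # Setup: create a small file with three lines.
    outfile = open(filename, 'w')
    outfile.write('first\nsecond\nthird\n')
    outfile.close()

    # read() returns the whole file as one string.
    infile = open(filename, 'r')
    whole_file = infile.read()
    infile.close()

    # readline() returns a single line, including its trailing newline.
    infile = open(filename, 'r')
    one_line = infile.readline()
    infile.close()

    # readlines() returns a list with one string per line.
    infile = open(filename, 'r')
    all_lines = infile.readlines()
    infile.close()

# repr() shows the newline characters explicitly.
print(repr(whole_file))
print(repr(one_line))
print(all_lines)
```

The file is reopened before each command so that every read starts from the top of the file.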

Mixing these commands can have some unexpected results.

    #Create a variable for the file name
    filename = 'Plates_output_simple.csv'

    #Open the file
    infile = open(filename, 'r')

    #Print the first two lines of the file
    print(infile.readline())
    print(infile.readline())

    #call infile.read()
    print(infile.read())

    #close the file
    infile.close()

Notice that the infile.read() command started at the third line of the file, where the first two infile.readline() commands left off.

Think of it like this: when the file is opened, a pointer is placed at the top left corner of the file at the beginning of the first line. Any time a read function is called, the cursor or pointer advances from where it already is. The first infile.readline() started at the beginning of the file and advanced to the end of the first line. Now, the pointer is positioned at the beginning of the second line. The second infile.readline() advanced to the end of the second line of the file, and left the pointer positioned at the beginning of the third line. infile.read() began from this position, and advanced through to the end of the file.

In general, if you want to switch between the different kinds of read commands, you should close the file and then open it again to start over.
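The pointer behaviour can be checked with a short experiment. This is only a sketch: it writes a hypothetical three-line file called `practice.txt` into a temporary folder (using the writing commands covered later) so that it does not depend on the lesson's data file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, 'practice.txt')  # hypothetical practice file

    # Setup: create a small file with three lines.
    outfile = open(filename, 'w')
    outfile.write('line 1\nline 2\nline 3\n')
    outfile.close()

    infile = open(filename, 'r')
    first = infile.readline()      # pointer now sits at the start of line 2
    rest = infile.read()           # reads from line 2 through the end of the file
    nothing_left = infile.read()   # pointer is already at the end: empty string
    infile.close()

    # Closing and reopening puts the pointer back at the start of the file.
    infile = open(filename, 'r')
    everything = infile.read()
    infile.close()

print(repr(first))
print(repr(rest))
print(repr(nothing_left))
print(repr(everything))
```

Python file objects also provide infile.seek(0), which moves the pointer back to the start without reopening the file; this lesson sticks with closing and reopening.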

Reading all of the lines of a file into a list

infile.readlines() will read all of the lines into a list, where each line of the file is an item in the list. This is extremely useful, because once we have read the file in this way, we can loop through each line of the file and process it. This approach works well on data files where the data is organized into columns similar to a spreadsheet, because it is likely that we will want to handle each line in the same way.

The example below demonstrates this approach:

    #Create a variable for the file name
    filename = "Plates_output_simple.csv"

    #Open the file
    infile = open(filename, 'r')
    lines = infile.readlines()

    for line in lines:  #lines is a list with each item representing a line of the file
        if 'control' in line:
            print(line)  #print lines for control condition
    infile.close()  #close the file when you're done!

Using .split() to separate "columns"

Since our data is in a .csv file, we can use the split command to split each line of the file into a list. This can be useful if we want to access specific columns of the file.

    #Create a variable for the file name
    filename = "Plates_output_simple.csv"

    #Open the file
    infile = open(filename, 'r')
    lines = infile.readlines()

    for line in lines:
        sline = line.split(',')  #separates line into a list of items. ',' tells it to split the lines at the commas
        print(sline)  #each line is now a list
    infile.close()  #Always close the file!

Consistent names, again

At first glance, the variable name sline in the example above may not make much sense. In fact, we chose it to be an abbreviation for "split line", which exactly describes the contents of the variable.

You don't have to use this naming convention if you don't want to, but you should work to use consistent variable names across your code for common operations like this. It will make it much easier to open an old script and quickly understand exactly what it is doing.

Converting text to numbers

When we called the readlines() command in the previous code, Python read in the contents of the file as strings. If we want our code to recognize something in the file as a number, we need to tell it this!

For example, float('5.0') will tell Python to treat the text string '5.0' as the number 5.0. int(sline[4]) will tell our code to treat the text string stored in the 5th position of the list sline as an integer (non-decimal) number.
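A quick sketch of why the conversion matters, using one row of text in the lesson's column layout (typed in directly here so the snippet runs without the data file):

```python
# One comma-separated row, written without its trailing newline.
line = 'colonies02.tif,2,exp,24,84,3.2,22'
sline = line.split(',')

colony_count = int(sline[4])    # the text '84' becomes the integer 84
avg_size = float(sline[5])      # the text '3.2' becomes the decimal number 3.2

print(sline[4] + sline[4])            # strings are joined end to end: 8484
print(colony_count + colony_count)    # numbers are added: 168
print(avg_size * 2)
```

Note that int() only accepts text that looks like a whole number: int('3.2') raises a ValueError, so use float() for columns that may contain decimals.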

For each line in the file, the ColonyCount is stored in the 5th column (index 4 with our 0-based counting).
Modify the code above to print the line only if the ColonyCount is greater than 30.

Solution

    #Create a variable for the file name
    filename = 'Plates_output_simple.csv'

    #Open the file
    infile = open(filename, 'r')
    lines = infile.readlines()

    for line in lines[1:]:  #skip the first line, which is the header
        sline = line.split(',')  #separates line into a list of items. ',' tells it to split the lines at the commas
        colonyCount = int(sline[4])  #store the colony count for the line as an integer
        if colonyCount > 30:
            print(sline)

    #close the file
    infile.close()

Writing data out to a file

Often, we will want to write data to a new file. This is especially useful if we have done a lot of computations or data processing and we want to be able to save it and come back to it later.

Writing a file is the same multi-step process

Just like reading a file, we will open and write the file in multiple steps.

  1. Create a variable to hold the name of the file that we want to open. Often, this will be a new file that doesn't yet exist.
  2. Call a function to open the file. This time, we will specify that we are opening the file to write into it!
  3. Write the data into the file. This requires some careful attention to formatting.
  4. When we are done, we should remember to close the file!

The code below gives an example of writing to a file:

    filename = "output.txt"

    #w tells python we are opening the file to write into it
    outfile = open(filename, 'w')

    outfile.write("This is the first line of the file")
    outfile.write("This is the second line of the file")

    outfile.close()  #Close the file when we're done!

Where did my file end up?

Any time you open a new file and write to it, the file will be saved in your current working directory, unless you specified a different path in the variable filename.
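If you are unsure where that is, one option (a small sketch, not part of the original lesson) is to ask Python to expand the file name into a full path before you open it:

```python
import os

filename = 'output.txt'

# abspath() combines the current working directory with a relative file name,
# showing where a file opened under this name would be created.
# It only builds the path; it does not create or open the file.
print(os.path.abspath(filename))
```
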

Newline characters

When you examine the file you just wrote, you will see that all of the text is on the same line! This is because we must tell Python when to start on a new line by using the special string character '\n'. This newline character will tell Python exactly where to start each new line.

The example below demonstrates how to use newline characters:

    filename = 'output_newlines.txt'

    #w tells python we are opening the file to write into it
    outfile = open(filename, 'w')

    outfile.write("This is the first line of the file\n")
    outfile.write("This is the second line of the file\n")

    outfile.close()  #Close the file when we're done!

Go open the file you just wrote and check that the lines are spaced correctly.

Dealing with newline characters when you read a file

You may have noticed in the last file reading example that the printed output included newline characters at the end of each line of the file:

['colonies02.tif', '2', 'exp', '24', '84', '3.2', '22\n']
['colonies03.tif', '3', 'exp', '24', '792', '3', '78\n']
['colonies06.tif', '2', 'exp', '48', '85', '5.2', '46\n']

We can get rid of these newlines by using the .strip() function, which removes whitespace, including newline characters, from both ends of a string:

    #Create a variable for the file name
    filename = 'Plates_output_simple.csv'

    #Open the file
    infile = open(filename, 'r')
    lines = infile.readlines()

    for line in lines[1:]:  #skip the first line, which is the header
        sline = line.strip()  #get rid of trailing newline characters at the end of the line
        sline = sline.split(',')  #separates line into a list of items. ',' tells it to split the lines at the commas
        colonyCount = int(sline[4])  #store the colony count for the line as an integer
        if colonyCount > 30:
            print(sline)

    #close the file
    infile.close()
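To see the effect of .strip() in isolation, here is a short sketch on a single line of text (typed in directly, so it runs without the data file). It also shows .rstrip('\n'), a related string method that is not used elsewhere in this lesson.

```python
line = 'colonies02.tif,2,exp,24,84,3.2,22\n'

print(line.split(','))            # the last item keeps its newline: '22\n'
print(line.strip().split(','))    # the newline is removed before splitting

# strip() removes all leading and trailing whitespace, not just newlines.
padded = '  48 \n'
print(repr(padded.strip()))         # '48'
print(repr(padded.rstrip('\n')))    # '  48 ' (only the trailing newline is removed)
```

If the spaces at the ends of a line are meaningful in your data, .rstrip('\n') is the more cautious choice.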

Writing numbers to files

Just as Python automatically reads files in as strings, the write() function expects to only write strings. If we want to write numbers to a file, we will need to "cast" them as strings using the function str().

The code below shows an example of this:

    numbers = range(0, 10)
    filename = "output_numbers.txt"

    #w tells python we are opening the file to write into it
    outfile = open(filename, 'w')

    for number in numbers:
        outfile.write(str(number))

    outfile.close()  #Close the file when we're done!
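As a sketch of what happens if the str() step is skipped, the snippet below tries to write a bare number and catches the resulting error. It uses a hypothetical file `numbers.txt` in a temporary folder so that nothing is left behind, and a try/except block, which this lesson has not covered, purely to keep the script running after the error.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, 'numbers.txt')  # hypothetical output file
    outfile = open(filename, 'w')

    try:
        outfile.write(5)              # a number, not a string
        error_message = None
    except TypeError as error:
        error_message = str(error)    # Python refuses to write a non-string

    outfile.write(str(5))             # casting to a string works
    outfile.close()

    # Read the file back to confirm what was actually written.
    infile = open(filename, 'r')
    contents = infile.read()
    infile.close()

print(error_message)
print(contents)
```
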

Writing new lines and numbers

Go open and examine the file you just wrote. You will see that all of the numbers are written on the same line.

Modify the code to write each number on its own line.

Solution

    numbers = range(0, 10)  #Create the range of numbers
    filename = "output_numbers.txt"  #provide the file name

    #open the file in 'write' mode
    outfile = open(filename, 'w')

    for number in numbers:
        outfile.write(str(number) + '\n')

    outfile.close()  #Close the file when we're done!

The file you just wrote should be saved in your working directory. Open the file and check that the output is correctly formatted with one number on each line.

Opening files in different 'modes'

When we have opened files to read or write data, we have used the function parameter 'r' or 'w' to specify which "mode" to open the file in.
'r' indicates we are opening the file to read data from it.
'w' indicates we are opening the file to write data into it.

Be very, very careful when opening an existing file in 'w' mode.
'w' will overwrite any data that is already in the file! The overwritten data will be lost!

If you want to add on to what is already in the file (instead of erasing and overwriting it), you can open the file in append mode by using the 'a' parameter instead.
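The difference between 'w' and 'a' is easy to check on a throwaway file. The sketch below uses a hypothetical file `log.txt` in a temporary folder, so no real data is at risk.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, 'log.txt')  # hypothetical file name

    # First write: creates the file.
    outfile = open(filename, 'w')
    outfile.write('first run\n')
    outfile.close()

    # Opening with 'w' again erases the existing contents before writing.
    outfile = open(filename, 'w')
    outfile.write('second run\n')
    outfile.close()

    # Opening with 'a' keeps the existing contents and adds to the end.
    outfile = open(filename, 'a')
    outfile.write('third run\n')
    outfile.close()

    infile = open(filename, 'r')
    contents = infile.read()
    infile.close()

print(contents)
```

Only "second run" and "third run" survive: the second 'w' wiped out "first run", while 'a' added to what was already there.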

Pulling it all together

Read in the data from the file Plates_output_simple.csv that we have been working with. Write a new csv-formatted file that contains only the rows for control plates.
You will need to do the following steps:

  1. Open the file.
  2. Use .readlines() to create a list of lines in the file. Then close the file!
  3. Open a file to write your output into.
  4. Write the header line of the output file.
  5. Use a for loop to loop through each line in the list of lines from the input file.
  6. For each line, check if the growth condition was experimental or control.
  7. For the control lines, write the line of data to the output file.
  8. Close the output file when you're done!

Solution

Here's one way to do it:

    #Create a variable for the file name
    filename = 'Plates_output_simple.csv'

    #Open the file
    infile = open(filename, 'r')
    lines = infile.readlines()  #We will process the lines of the file later

    #close the input file
    infile.close()

    #Create the file we will write to
    filename = 'ControlPlatesData.txt'
    outfile = open(filename, 'w')
    outfile.write(lines[0])  #This will write the header line of the file

    for line in lines[1:]:  #skip the first line, which is the header
        sline = line.split(',')  #separates line into a list of items. ',' tells it to split the lines at the commas
        condition = sline[2]  #store the condition for the line as a string
        if condition == "control":
            outfile.write(line)  #The variable line is already formatted correctly!
    outfile.close()  #Close the file when we're done!

Challenge Problem

Open and read in the data from Plates_output_simple.csv. Write a new csv-formatted file that contains only the rows for the control condition and includes only the columns for Time, colonyCount, avgColonySize, and percentColonyArea. Hint: you can use the .join() function to join a list of items into a string.

    names = ['Erin', 'Mark', 'Tessa']
    nameString = ', '.join(names)  #the ', ' tells Python to join the list with each item separated by a comma + space
    print(nameString)

Erin, Mark, Tessa

Solution

    #Create a variable for the input file name
    filename = 'Plates_output_simple.csv'

    #Open the file
    infile = open(filename, 'r')
    lines = infile.readlines()  #We will process the lines of the file later

    #close the file
    infile.close()

    #Create the file we will write to
    filename = 'ControlPlatesData_Reduced.txt'
    outfile = open(filename, 'w')

    #Write the header line
    headerList = lines[0].split(',')[3:]  #This will return the list of column headers from 'time' on
    headerString = ','.join(headerList)  #join the items in the list with commas
    outfile.write(headerString)  #There is already a newline at the end, so no need to add one

    #Write the remaining lines
    for line in lines[1:]:  #skip the first line, which is the header
        sline = line.split(',')  #separates line into a list of items. ',' tells it to split the lines at the commas
        condition = sline[2]  #store the growth condition for the line as a string
        if condition == "control":
            dataList = sline[3:]
            dataString = ','.join(dataList)
            outfile.write(dataString)  #The last item still ends with a newline, so no need to add one
    outfile.close()  #Close the file when we're done!

Key Points

  • Opening and reading a file is a multistep process: defining the filename, opening the file, and reading the data

  • Data stored in files can be read in using a variety of commands

  • Writing data to a file requires attention to data types and formatting that isn't necessary with a print() statement


Source: https://eldoyle.github.io/PythonIntro/08-ReadingandWritingTextFiles/
