Delimiters and Functions with Flu Data
To review spreadsheet basics, download the data from Google’s Flu Trends. The data is just text with a lot of commas. The goal is to get your data into tidy rows and columns in a spreadsheet, so you can start looking for interesting patterns or trends.
- Copy all the data (
Cmd+C) and in Excel, paste the data (
- The pasted data appears all in one column.
- Select column A, and choose
Data > Text to Columns.
- The Convert Text to Columns Wizard appears, which allows you to define the “delimiter”, or the character that separates columns of data. Choose the comma as your delimiter. You can see a preview of the re-formatted data in the preview window below.
- In the next section of the Text to Column dialog box, you can format the date data so it’s preserved as Year-Month-Date.
- Now that all the data has been successfully transferred to your spreadsheet, you can start using formulas and/or other sorting/ filtering functions to explore the data. Let’s find the maximum number of flu searches in each country. At the bottom of the spreadsheet, at the end of column B, enter =Max(). All formulas begin with the equals sign, then the function name, and then a pair of parentheses. In between the parentheses, enter the range of data from which you want to find the maximum. You can select the cells with your mouse, or enter the beginning and end cell, separated with a colon.
- To extend your Max function to ALL your columns, simply click the bottom-right corner of the cell and drag it to the right. The function is extended and Excel “auto-increments” the range so the maximum is determined for each appropriate column.
Now that you know the maximum value for each country, you can create an =Max() function to identify which country (column) has the maximum. Your range would be the cells in the row that displays the max values.
You don’t always need to use functions. Use Data > Sort (Shift+Command+R) to sort your spreadsheet along a single column, in either descending or ascending order. That’s an easy way to re-order your data to see maximums and minimums.