Instructor notes
Lesson motivation and learning objectives
The purpose of this lesson is not to teach how to do data analysis in spreadsheets, but to teach good data organization and how to do some data cleaning and quality control in a spreadsheet program.
Lesson design
Data tidiness
- Explain that we’re teaching data organization, and that we’re using spreadsheets because most people do data entry in spreadsheets or already have data in spreadsheets.
- Emphasize that we are teaching good practice in data organization and that this is the foundation of their research practice. Without organized and clean data, it will be difficult for them to apply the things we’re teaching in the rest of the workshop to their data.
- Much of their lives as researchers will be spent on this ‘data wrangling’ stage, but some of it can be prevented with good strategies for data collection up front.
- Tell learners that we’re not teaching data analysis or plotting in spreadsheets, because it’s very manual and not reproducible. That’s why we’re teaching bash shell scripting!
- Now let’s talk about spreadsheets. When we say spreadsheets, we mean any spreadsheet program, such as Excel, LibreOffice, or OpenOffice. Most learners are probably using Excel.
- Ask the audience about things they’ve accidentally done in spreadsheets. Share an example of your own, such as accidentally sorting a single column but not the rest of the data in the spreadsheet. What are the pain points?
- As people answer, highlight some of these issues with spreadsheets.
- Go through the point about keeping track of your steps and keeping raw data raw.
- Go through the cardinal rule of spreadsheets about columns, rows and cells.
- Hand them a messy data file and have them pair up and work together to clean up the data.
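If it helps to make the single-column sorting mistake concrete for learners, the following is a minimal sketch in Python. It is not part of the lesson materials: the sample names, read counts, and function names are all hypothetical, chosen only to show how sorting one column on its own breaks the link between values in a row, and how working on a copy keeps the raw data raw.

```python
import copy

# Hypothetical raw data: each row is one observation (sample name, read count).
raw_rows = [
    {"sample": "C", "reads": 300},
    {"sample": "A", "reads": 100},
    {"sample": "B", "reads": 200},
]


def sort_single_column(rows, column):
    """Mimic the mistake: sort one column while leaving the others in place."""
    broken = copy.deepcopy(rows)  # work on a copy so the raw data stays raw
    sorted_values = sorted(row[column] for row in broken)
    for row, value in zip(broken, sorted_values):
        row[column] = value
    return broken


def sort_whole_rows(rows, column):
    """Sort complete rows so every value stays with its own observation."""
    return sorted(copy.deepcopy(rows), key=lambda row: row[column])


broken = sort_single_column(raw_rows, "sample")
correct = sort_whole_rows(raw_rows, "sample")

print(broken[0])   # sample "A" is now paired with the read count of sample "C"
print(correct[0])  # sample "A" keeps its own read count
print(raw_rows[0])  # the raw data is unchanged
```

The broken result still looks plausible at a glance, which is a useful talking point: nothing in the spreadsheet signals that the rows no longer describe real observations.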
Planning for NGS projects
- This episode depends on learners discussing exercises with one another. Be sure to give plenty of time for this discussion.
Examining Data on the NCBI SRA Database
- Learners should not actually download the ENA files in the “Downloading a few sequencing files: EMBL-EBI” section.
Concluding points
- Now your data is organized so that a computer can read and understand it. This lets you use the full power of the computer for your analyses as we’ll see in the rest of the workshop.
Technical tips and tricks
Provide information on setting up your environment so that learners can view your live coding (increasing text size, changing text color, etc.), as well as general recommendations for working with coding tools to best suit the learning environment.
Common problems
Excel looks and acts differently on different operating systems
The main challenge with this lesson is that Excel looks and behaves differently on Mac and PC, and across different versions of Excel. As a result, the presenter’s environment will match that of only some of the learners.
We need better notes and screenshots of how things work on both Mac and PC. But we likely won’t be able to cover all the different versions of Excel.
If you have a helper who has more experience with the other OS than you do, prepare them ahead of time to help with this lesson and to show learners how to do things on that OS.
People are not interactive or responsive on the exercises
This lesson depends on people working on the exercise and reporting what they fixed. If your audience is reluctant to participate, start out by pointing out a few fixes of your own, or ask a helper for their answers. This generally gets even a reluctant audience started.