
Overview
Summary
Manipulating datasets in spreadsheet software is essential for transforming raw data into actionable insights. It supports informed decision-making, improves efficiency through automation, organizes data for clarity, and uncovers trends for problem-solving. Spreadsheet manipulation enhances communication by making visualizations and summaries more interpretable. Spreadsheet’s versatility across industries highlights their critical role in ensuring accurate analysis and better outcomes.
Learning Outcomes
- Modify open access data in spreadsheet softwares for data evaluation
- Prepare the dataset content in spreadsheet softwares for data visualization
- Present preliminary relationships with the data using spreadsheet softwares
Downloading Datasets
A common access point for large datasets is an online database. Often, an online database will provide a search interface that allows users to filter a dataset and access or download only the data they need for their research.
Download one or more of the datasets below in order to become familiar with the process.
Electric vehicle population data: https://catalog.data.gov/dataset/electric-vehicle-population-data
Energy Hourly consumption data: https://www.kaggle.com/datasets/robikscube/hourly-energy-consumption
Greenhouse gas emissions data: https://www.kaggle.com/datasets/unitednations/international-greenhouse-gas-emissions
Bike sharing data: https://code.datasciencedojo.com/datasciencedojo/datasets/tree/master/Bike%20Sharing
Data Cleaning in Spreadsheet Softwares
The video or instructions below will show how to:
- Import data into Google Sheets
- Rename columns headers
- Delete columns
- Combine categories
Reference: [1]
Preparing Data with Spreadsheet Softwares
Often, the dataset you download in its original form doesn’t fit your data analysis needs exactly. You may want to make surface-level changes to this “raw” dataset so that it’s easier to work with. This process is called “cleaning” your data, or preparing it for data analysis.
It is important to remember that cleaning data should not change the values in any way.
- Renaming column headers
Remember, when naming column headers, we want to avoid using spaces or special characters (! , * are examples of special characters) so that it is easy for the computer to read.
- Removing unnecessary columns
There may be columns in the dataset that you do not plan to use. You can delete those so that the size of the file is smaller and easier to work with.
- Combining categories
Sometimes the dataset is more specific than we need.
Exploratory Data Analysis
Google Sheets and Microsoft Excel have a tool called PivotTables that help you easily calculate, summarize, and analyze data. You see comparisons, patterns, and trends in your data and then use those visualizations in reports or papers.
Reference: [2]
Python
Many people use spreadsheet applications like Google Sheets or Microsoft Excel to store, analyze, and visualize data. They are relatively easy to learn and use and have built-in advanced data analysis capabilities. Spreadsheet applications use a graphic user interface with drop-down menus and toolbars, so they can be more familiar to many of us.
Python is an open-source programming language that can also be used for data analysis purposes. There can be a steep learning curve at first, but it is a powerful tool. Python and other programming languages work well for:
- Doing research with large datasets
- Automating or repeating the same processes many times
- Showing your work so others can reproduce your research
Reference: [3]
References
[1] Cahoon, C. Data Cleaning in Google Sheets. Panopto. https://ncsu.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=056353fa-6ab0-4386-99f4-ad42014a2ce7&start=0 (accessed 2025-01-22).
[2] Cahoon, C. Exploratory Data Analysis in Google Sheets. Panopto. https://ncsu.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=1e1886b6-ce2b-4687-bcd6-ad4701266857&start=0 (accessed 2025-01-22).
[3] Cahoon, C. Using Google Colabs to work with Python. Panopto. https://ncsu.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=1fc37086-39ba-4841-a012-ad440148be27&start=0 (accessed 2025-01-22).