{"id":515,"date":"2024-05-28T18:06:47","date_gmt":"2024-05-28T18:06:47","guid":{"rendered":"https:\/\/wordpress-projects.wolfware.ncsu.edu\/ncsuereseng\/?page_id=515"},"modified":"2025-06-09T15:33:17","modified_gmt":"2025-06-09T15:33:17","slug":"part-6-b","status":"publish","type":"page","link":"https:\/\/wordpress-projects.wolfware.ncsu.edu\/ncsuereseng\/part-6-b\/","title":{"rendered":"Datasets and their Manipulation"},"content":{"rendered":"\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-1 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"862\" height=\"264\" src=\"https:\/\/wordpress-projects.wolfware.ncsu.edu\/ncsuereseng\/wp-content\/uploads\/sites\/389\/2025\/06\/e-REF-Logo-V-2-12.png\" alt=\"\" class=\"wp-image-1956\" style=\"width:100px\" srcset=\"https:\/\/wordpress-projects.wolfware.ncsu.edu\/ncsuereseng\/wp-content\/uploads\/sites\/389\/2025\/06\/e-REF-Logo-V-2-12.png 862w, https:\/\/wordpress-projects.wolfware.ncsu.edu\/ncsuereseng\/wp-content\/uploads\/sites\/389\/2025\/06\/e-REF-Logo-V-2-12-300x92.png 300w, https:\/\/wordpress-projects.wolfware.ncsu.edu\/ncsuereseng\/wp-content\/uploads\/sites\/389\/2025\/06\/e-REF-Logo-V-2-12-768x235.png 768w\" sizes=\"auto, (max-width: 862px) 100vw, 862px\" \/><\/figure>\n\n\n\n<div style=\"height:10px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" style=\"font-size:52px\">Overview<\/h2>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:350px\">\n<p style=\"font-size:18px\"><strong>Navigation<\/strong><\/p>\n\n\n\n<ul style=\"font-size:18px\" class=\"wp-block-list\">\n<li style=\"font-size:18px\"><a 
href=\"#downloading-datasets\">Downloading Datasets<\/a><\/li>\n\n\n\n<li style=\"font-size:18px\"><a href=\"#data-cleaning\">Data Cleaning in Spreadsheet Software<\/a><\/li>\n\n\n\n<li style=\"font-size:18px\"><a href=\"#exploratory-data\">Exploratory Data Analysis<\/a><\/li>\n\n\n\n<li style=\"font-size:18px\"><a href=\"#python\">Python<\/a><\/li>\n<\/ul>\n<\/div>\n<\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\" style=\"font-size:22px\">Summary<\/h2>\n\n\n\n<p style=\"font-size:22px\">Manipulating datasets in spreadsheet software is essential for transforming raw data into actionable insights. It supports informed decision-making, improves efficiency through automation, organizes data for clarity, and uncovers trends for problem-solving. Spreadsheet manipulation enhances communication by making visualizations and summaries more interpretable. Spreadsheets\u2019 versatility across industries highlights their critical role in ensuring accurate analysis and better outcomes.<\/p>\n\n\n\n<div style=\"height:10px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" style=\"font-size:22px\">Learning Outcomes<\/h2>\n\n\n\n<ul style=\"font-size:22px\" class=\"wp-block-list\">\n<li style=\"font-size:22px\"><strong>Modify open-access data<\/strong> in spreadsheet software for data evaluation<\/li>\n\n\n\n<li style=\"font-size:22px\"><strong>Prepare the dataset content<\/strong> in spreadsheet software for data visualization<\/li>\n\n\n\n<li style=\"font-size:22px\"><strong>Present preliminary relationships<\/strong> in the data using spreadsheet software<\/li>\n<\/ul>\n\n\n\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"downloading-datasets\" style=\"font-size:52px\">Downloading Datasets<\/h2>\n\n\n\n<div style=\"height:15px\" aria-hidden=\"true\" 
class=\"wp-block-spacer\"><\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<p style=\"font-size:22px\">A common access point for large datasets is an online database. Often, an online database will provide a search interface that allows users to filter a dataset and access or download only the data they need for their research.<\/p>\n\n\n\n<p style=\"font-size:22px\">Download one or more of the datasets below to become familiar with the process.<\/p>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p style=\"font-size:22px\"><strong>Electric vehicle population data:<\/strong> <a href=\"https:\/\/catalog.data.gov\/dataset\/electric-vehicle-population-data\">https:\/\/catalog.data.gov\/dataset\/electric-vehicle-population-data<\/a><\/p>\n\n\n\n<p style=\"font-size:22px\"><strong>Hourly energy consumption data:<\/strong> <a href=\"https:\/\/www.kaggle.com\/datasets\/robikscube\/hourly-energy-consumption\">https:\/\/www.kaggle.com\/datasets\/robikscube\/hourly-energy-consumption<\/a><\/p>\n\n\n\n<p style=\"font-size:22px\"><strong>Greenhouse gas emissions data:<\/strong> <a href=\"https:\/\/www.kaggle.com\/datasets\/unitednations\/international-greenhouse-gas-emissions\">https:\/\/www.kaggle.com\/datasets\/unitednations\/international-greenhouse-gas-emissions<\/a><\/p>\n\n\n\n<p style=\"font-size:22px\"><strong>Bike sharing data:<\/strong> <a href=\"https:\/\/code.datasciencedojo.com\/datasciencedojo\/datasets\/tree\/master\/Bike%20Sharing\">https:\/\/code.datasciencedojo.com\/datasciencedojo\/datasets\/tree\/master\/Bike%20Sharing<\/a><\/p>\n\n\n\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"data-cleaning\" style=\"font-size:52px\">Data Cleaning in Spreadsheet Software<\/h2>\n\n\n\n<div style=\"height:15px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<hr 
class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<p style=\"font-size:22px\"><strong>The video or instructions below will show how to:<\/strong><\/p>\n\n\n\n<ol style=\"font-size:22px\" class=\"wp-block-list\">\n<li style=\"font-size:22px\"><strong>Import data<\/strong> into Google Sheets<\/li>\n\n\n\n<li style=\"font-size:22px\"><strong>Rename column headers<\/strong><\/li>\n\n\n\n<li style=\"font-size:22px\"><strong>Delete columns<\/strong><\/li>\n\n\n\n<li style=\"font-size:22px\"><strong>Combine categories<\/strong><\/li>\n<\/ol>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-2 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\"><\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:722px\">\n<figure class=\"wp-block-embed is-type-rich is-provider-embed-handler wp-block-embed-embed-handler\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" src=\"https:\/\/ncsu.hosted.panopto.com\/Panopto\/Pages\/Embed.aspx?id=056353fa-6ab0-4386-99f4-ad42014a2ce7\" height=\"405\" width=\"720\" style=\"border: 1px solid #464646;\" allowfullscreen allow=\"autoplay\"><\/iframe>\n<\/div><\/figure>\n\n\n\n<p style=\"font-size:22px\">Reference: [1]<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\"><\/div>\n<\/div>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" style=\"font-size:22px\">Preparing Data with Spreadsheet Software<\/h2>\n\n\n\n<p style=\"font-size:22px\">Often, the dataset you download in its original form doesn\u2019t fit your data analysis needs exactly. You may want to make surface-level changes to this \u201craw\u201d dataset so that it\u2019s easier to work with. 
This process is called \u201ccleaning\u201d your data, or preparing it for data analysis.<\/p>\n\n\n\n<p style=\"font-size:22px\">It is important to remember that <strong>cleaning data should not change the values<\/strong> in any way.<\/p>\n\n\n\n<div style=\"height:10px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<ul style=\"font-size:22px\" class=\"wp-block-list\">\n<li style=\"font-size:22px\">Renaming column headers<\/li>\n<\/ul>\n\n\n\n<p style=\"font-size:22px\">When naming column headers, avoid spaces and special characters (for example, ! and *) so that the headers are easy for a computer to read.<\/p>\n\n\n\n<div style=\"height:10px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<ul style=\"font-size:22px\" class=\"wp-block-list\">\n<li style=\"font-size:22px\">Removing unnecessary columns<\/li>\n<\/ul>\n\n\n\n<p style=\"font-size:22px\">There may be columns in the dataset that you do not plan to use. You can delete those so that the file is smaller and easier to work with.<\/p>\n\n\n\n<div style=\"height:10px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<ul style=\"font-size:22px\" class=\"wp-block-list\">\n<li style=\"font-size:22px\">Combining categories<\/li>\n<\/ul>\n\n\n\n<p style=\"font-size:22px\">Sometimes the dataset is more specific than we need. In that case, you can merge several detailed categories into one broader category.<\/p>\n\n\n\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"exploratory-data\" style=\"font-size:52px\">Exploratory Data Analysis<\/h2>\n\n\n\n<div style=\"height:15px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<p style=\"font-size:22px\">Google Sheets and Microsoft Excel have a tool called PivotTables that helps you easily <strong>calculate, summarize, and analyze data<\/strong>. 
You can see comparisons, patterns, and trends in your data and then use what you find in reports or papers.<\/p>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-3 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\"><\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:722px\">\n<figure class=\"wp-block-embed is-type-rich is-provider-embed-handler wp-block-embed-embed-handler\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" src=\"https:\/\/ncsu.hosted.panopto.com\/Panopto\/Pages\/Embed.aspx?id=1e1886b6-ce2b-4687-bcd6-ad4701266857\" height=\"405\" width=\"720\" style=\"border: 1px solid #464646;\" allowfullscreen allow=\"autoplay\"><\/iframe>\n<\/div><\/figure>\n\n\n\n<p style=\"font-size:22px\">Reference: [2]<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\"><\/div>\n<\/div>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"python\" style=\"font-size:52px\">Python<\/h2>\n\n\n\n<div style=\"height:15px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<p style=\"font-size:22px\">Many people use spreadsheet applications like Google Sheets or Microsoft Excel to store, analyze, and visualize data. They are relatively easy to learn and use and have built-in advanced data analysis capabilities. 
Spreadsheet applications use a graphical user interface with drop-down menus and toolbars, so they can be more familiar to many of us.<\/p>\n\n\n\n<p style=\"font-size:22px\"><strong>Python<\/strong> is an <a href=\"https:\/\/opensource.com\/resources\/what-open-source\">open-source<\/a> programming language that can also be used for data analysis. There can be a steep learning curve at first, but it is a powerful tool. Python and other programming languages work well for:<\/p>\n\n\n\n<ul style=\"font-size:22px\" class=\"wp-block-list\">\n<li style=\"font-size:22px\">Doing research with large datasets<\/li>\n\n\n\n<li style=\"font-size:22px\">Automating or repeating the same processes many times<\/li>\n\n\n\n<li style=\"font-size:22px\">Showing your work so others can reproduce your research<\/li>\n<\/ul>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-4 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\"><\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:722px\">\n<figure class=\"wp-block-embed is-type-rich is-provider-embed-handler wp-block-embed-embed-handler\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" src=\"https:\/\/ncsu.hosted.panopto.com\/Panopto\/Pages\/Embed.aspx?id=1fc37086-39ba-4841-a012-ad440148be27\" height=\"405\" width=\"720\" style=\"border: 1px solid #464646;\" allowfullscreen allow=\"autoplay\"><\/iframe>\n<\/div><\/figure>\n\n\n\n<p style=\"font-size:22px\">Reference: [3]<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\"><\/div>\n<\/div>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<p 
style=\"font-size:22px\"><strong>References<\/strong><\/p>\n\n\n\n<p style=\"font-size:22px\">[1] Cahoon, C. <em>Data Cleaning in Google Sheets<\/em>. Panopto. https:\/\/ncsu.hosted.panopto.com\/Panopto\/Pages\/Viewer.aspx?id=056353fa-6ab0-4386-99f4-ad42014a2ce7&amp;start=0 (accessed 2025-01-22).<\/p>\n\n\n\n<p style=\"font-size:22px\">[2] Cahoon, C. <em>Exploratory Data Analysis in Google Sheets<\/em>. Panopto. https:\/\/ncsu.hosted.panopto.com\/Panopto\/Pages\/Viewer.aspx?id=1e1886b6-ce2b-4687-bcd6-ad4701266857&amp;start=0 (accessed 2025-01-22).<\/p>\n\n\n\n<p style=\"font-size:22px\">[3] Cahoon, C. <em>Using Google Colabs to work with Python<\/em>. Panopto. https:\/\/ncsu.hosted.panopto.com\/Panopto\/Pages\/Viewer.aspx?id=1fc37086-39ba-4841-a012-ad440148be27&amp;start=0 (accessed 2025-01-22).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Overview Navigation Summary Manipulating datasets in spreadsheet software is essential for transforming raw data into actionable insights. It supports informed decision-making, improves efficiency through automation, organizes data for clarity, and uncovers trends for problem-solving. Spreadsheet manipulation enhances communication by making visualizations and summaries more interpretable. 
Spreadsheet\u2019s versatility across industries highlights their critical role in ensuring [&hellip;]<\/p>\n","protected":false},"author":7544,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_acf_changed":false,"footnotes":""},"class_list":["post-515","page","type-page","status-publish","hentry"],"acf":[],"_links":{"self":[{"href":"https:\/\/wordpress-projects.wolfware.ncsu.edu\/ncsuereseng\/wp-json\/wp\/v2\/pages\/515","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wordpress-projects.wolfware.ncsu.edu\/ncsuereseng\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/wordpress-projects.wolfware.ncsu.edu\/ncsuereseng\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/wordpress-projects.wolfware.ncsu.edu\/ncsuereseng\/wp-json\/wp\/v2\/users\/7544"}],"replies":[{"embeddable":true,"href":"https:\/\/wordpress-projects.wolfware.ncsu.edu\/ncsuereseng\/wp-json\/wp\/v2\/comments?post=515"}],"version-history":[{"count":75,"href":"https:\/\/wordpress-projects.wolfware.ncsu.edu\/ncsuereseng\/wp-json\/wp\/v2\/pages\/515\/revisions"}],"predecessor-version":[{"id":1957,"href":"https:\/\/wordpress-projects.wolfware.ncsu.edu\/ncsuereseng\/wp-json\/wp\/v2\/pages\/515\/revisions\/1957"}],"wp:attachment":[{"href":"https:\/\/wordpress-projects.wolfware.ncsu.edu\/ncsuereseng\/wp-json\/wp\/v2\/media?parent=515"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}