Discuss how data curation can lead to new discoveries in disparate data sets.

This competency assessment assesses the following Outcome: IN401M5-5.

Module 5 Assessment Part 1: Assessment Details

All resources needed can be found in the Course Resources area of the course under Course Documents. You must have VirtualBox and the VirtualBox version of the Cloudera QuickStart virtual machine installed to complete item 5 in this assessment.

Part A: Data Integration

In this assessment, you will complete the following steps using the provided code.

Start your virtual machine and use the scp command to securely copy the world_dev_indicators.csv and income_metadata.csv files to the QuickStart virtual environment. The commands may look like:

    scp world_dev_indicators.csv cloudera@192.168.7.114:world_dev_indicators.csv
    scp income_metadata.csv cloudera@192.168.7.114:income_metadata.csv

Your IP address will vary depending on your network.

Log into your virtual environment and open a terminal window. At the terminal prompt, log into the MySQL database by typing:

    mysql -uroot -pcloudera

Notice the spacing: the password must follow -p immediately, with no space in between.
Once logged in, create a relational database for this class by typing:

    CREATE DATABASE in401;

Use the new database by typing the following command:

    USE in401;

Create the relational tables that will be used by typing the commands below while in MySQL:

    CREATE TABLE world_dev_indicators (
      countryname VARCHAR(256) NOT NULL,
      countrycode VARCHAR(256) NOT NULL,
      indicatorname VARCHAR(256) NOT NULL,
      indicatorcode VARCHAR(256) NOT NULL,
      year2010 DECIMAL(10,4),
      year2011 DECIMAL(10,4),
      year2012 DECIMAL(10,4),
      year2013 DECIMAL(10,4),
      year2014 DECIMAL(10,4),
      year2015 DECIMAL(10,4),
      year2016 DECIMAL(10,4),
      year2017 DECIMAL(10,4),
      year2018 DECIMAL(10,4),
      year2019 DECIMAL(10,4)
    );

    CREATE TABLE income_metadata (
      countrycode VARCHAR(256) NOT NULL,
      region VARCHAR(256),
      incomegroup VARCHAR(256)
    );

Exit MySQL and return to the cloudera user's home directory, where the files were securely copied. From there, verify that both files are in the directory and type the commands below:

    mysqlimport --fields-terminated-by='|' --local -u root -p in401 world_dev_indicators.csv
    mysqlimport --fields-terminated-by='|' --local -u root -p in401 income_metadata.csv

In your virtual environment, log back into MySQL to verify the import worked. A command such as SELECT COUNT(*) FROM world_dev_indicators; should work and show 20064 rows. A command such as SELECT COUNT(*) FROM income_metadata; should work and show 263 rows.

Complete the following questions based on the results of the query:

How many rows were returned?
What are the filtering conditions listed in the query?
What other attributes could you add to this query to extend the data to answer broader questions?
Name two additional data sources from https://data.worldbank.org/indicator that might extend the value of this data set.
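As a rough illustration of the kind of query these questions refer to, combining the two tables might amount to a join on countrycode with filtering conditions on region and income group. The sketch below is an assumption, not the course's provided query: it uses Python's built-in sqlite3 module in place of MySQL, reproduces only a few columns, and uses made-up rows. The incomegroup values and the combined_rows helper are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL in401 database (assumption:
# the join logic carries over; only a few columns are reproduced here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE world_dev_indicators (
        countryname   TEXT NOT NULL,
        countrycode   TEXT NOT NULL,
        indicatorname TEXT NOT NULL,
        year2010      REAL,
        year2011      REAL
    );
    CREATE TABLE income_metadata (
        countrycode TEXT NOT NULL,
        region      TEXT,
        incomegroup TEXT
    );
""")

# Made-up placeholder rows, NOT values from the course data files.
conn.executemany(
    "INSERT INTO world_dev_indicators VALUES (?, ?, ?, ?, ?)",
    [
        ("Country A", "AAA", "Indicator X", 10.0, 9.0),
        ("Country B", "BBB", "Indicator X", 20.0, None),
        ("Country C", "CCC", "Indicator X", 30.0, 31.0),
    ],
)
conn.executemany(
    "INSERT INTO income_metadata VALUES (?, ?, ?)",
    [
        ("AAA", "Sub-Saharan Africa", "Low income"),
        ("BBB", "Sub-Saharan Africa", "Lower middle income"),
        ("CCC", "Europe & Central Asia", "High income"),
    ],
)


def combined_rows(region, incomegroup):
    """Join indicators to income metadata, filtered by region and income group."""
    query = """
        SELECT w.countryname, m.region, m.incomegroup,
               w.indicatorname, w.year2010, w.year2011
        FROM world_dev_indicators AS w
        JOIN income_metadata AS m ON m.countrycode = w.countrycode
        WHERE m.region = ? AND m.incomegroup = ?
        ORDER BY w.countryname
    """
    return conn.execute(query, (region, incomegroup)).fetchall()


rows = combined_rows("Sub-Saharan Africa", "Low income")
print(len(rows), rows)
```

The WHERE clause holds the filtering conditions, and the SELECT list is where further attributes (for example, additional year columns) could be added to answer broader questions.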
Part B: Data Insights

Based on the results from Part A (contained in combined_indicator_data.csv), provide the answers to the following questions:

Outline three of the commonly populated indicators across Sub-Saharan Africa as a region and the low-income group. Highlight two reasons why these indicators might be common for this region and income group.
Analyze the indicator Mortality rate, under-5 (per 1,000 live births) for the low-income group. Is it populated for all countries? Is this indicator decreasing or increasing over the years? Which countries have the largest change?

Place Part A (screenshot of the successful query) and Part B answers in a single Word document.

Module 5 Assessment Part 2: Assessment Details

All resources needed can be found in the Course Resources area of the course under Course Documents.

The World Development Indicators and Income data sets, when combined, provide an extended view of different countries, how these countries have developed over time, and income by region. This assessment further explores new data insights from the integration of these data sets.

Part A: Creating Visualizations

Follow the instructions below.

Create Chart 1
Open combined_indicator_data.csv in Excel.
Choose two countries within the same region to analyze for the indicator CO2 emissions from liquid fuel consumption (kt).
Create a single line graph for the years 2010-2015 with the indicator values on the y axis and the year on the x axis. There should be two lines on the graph, one for each country.
Title the chart appropriately.
Include a legend to identify the countries.

Create Chart 2
Using combined_indicator_data.csv, create another line graph for the indicator CO2 emissions from liquid fuel consumption (kt).
Use the same two countries from the same region and the same income group as in Chart 1, for the years 2015-2019.
Follow the same directions as for Chart 1.

Part B: Data Discovery
Using combined_indicator_data.csv, answer the following questions by pasting each one into a Word document and placing your answer directly after it.

What country in Europe & Central Asia has the largest drop in CO2 emissions from liquid fuel consumption (kt) in the years 2010-2015?
What income group in Europe & Central Asia has the largest drop in CO2 emissions from liquid fuel consumption (kt) in the years 2015-2019?
What region has the largest drop in CO2 emissions from liquid fuel consumption (kt) in the years 2010-2019?

The two line charts from Part A and the Data Discovery observations from Part B are to be placed in a single Word document.
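One possible way to approach the "largest drop" questions is to subtract the first year's value from the last year's value for each country, then total those changes by country, income group, or region. The sketch below uses only Python's standard library. It assumes the combined file has a header row with the column names from the CREATE TABLE statements plus region and incomegroup, and that it is comma-delimited; the real file may differ. The sample rows are invented placeholders, and the changes and largest_drop helpers are hypothetical names.

```python
import csv
import io
from collections import defaultdict

INDICATOR = "CO2 emissions from liquid fuel consumption (kt)"

# Invented sample standing in for the combined CSV. The column names and
# the comma delimiter are assumptions; the values are placeholders.
SAMPLE = """countryname,region,incomegroup,indicatorname,year2010,year2015
Country A,Europe & Central Asia,High income,{ind},100.0,80.0
Country B,Europe & Central Asia,High income,{ind},50.0,55.0
Country C,Europe & Central Asia,Upper middle income,{ind},70.0,
Country D,South Asia,Low income,{ind},40.0,10.0
""".replace("{ind}", INDICATOR)


def changes(rows, start, end, indicator=INDICATOR):
    """Yield (row, end - start) for rows where both year values are populated."""
    for row in rows:
        if row["indicatorname"] != indicator:
            continue
        if not row[start] or not row[end]:
            continue  # skip countries where either year is missing
        yield row, float(row[end]) - float(row[start])


def largest_drop(rows, start, end, group_by="countryname", region=None):
    """Return (group, total change) with the smallest total change, or None."""
    totals = defaultdict(float)
    for row, delta in changes(rows, start, end):
        if region is None or row["region"] == region:
            totals[row[group_by]] += delta
    if not totals:
        return None
    return min(totals.items(), key=lambda item: item[1])


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
by_country = largest_drop(rows, "year2010", "year2015",
                          region="Europe & Central Asia")
by_region = largest_drop(rows, "year2010", "year2015", group_by="region")
print(by_country)
print(by_region)
```

Skipping rows with a missing year keeps unpopulated values from being read as zero, but it also means group totals only cover countries with data for both years, which is worth noting in the written answer.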
