CIS8008 DATA MINING AND DATA ANALYSIS: ASSIGNMENT 2

Description: Assignment 2 Written Practical Report
Possible marks: 100
Weighting: 30%
Word count: 3500
Due date: 05/09/16

The key frameworks and concepts covered in modules 1–5 are particularly relevant for this assignment. Assignment 2 relates to the specific course learning objectives 1, 2 and 4 and associated MBA program learning goals and skills: Global Content, Problem solving, Critical thinking, and Written Communication at level 3:
  1. Demonstrate applied knowledge of people, markets, finances, technology and management in a global context of business intelligence practice (data warehouse design, data mining process, data visualisation and performance management) and the resulting organisational change, and how these apply to the implementation of business intelligence in organisational systems and business processes.

  2. Identify and solve complex organisational problems creatively and practically through the use of business intelligence, and critically reflect on how evidence-based decision making and sustainable business performance management can effectively address real-world problems.

  4. Demonstrate the ability to communicate effectively, clearly and concisely, in a written report style for senior management, with correct and appropriate acknowledgement of the main ideas presented and discussed.

Note: you must use RapidMiner Studio for Task 1 and Tableau Desktop for Task 3 of Assignment 2. Failure to do so may result in Task 1 and/or Task 3 not being marked and zero marks being awarded.

 

Note carefully the University policy on Academic Misconduct, such as plagiarism, collusion and cheating. Any such misconduct will be detected and dealt with under the USQ Academic Integrity Procedures. If proven, Academic Misconduct may result in failure of an individual assessment, failure of the entire course, or exclusion from a University program or programs.

 

Assignment 2 consists of three main tasks, each with a number of subtasks.

 

Task 1 Exploratory data analysis and Decision Tree Analysis (Worth 35 Marks)



Task 1a) Identify, critically review and discuss the literature on the factors that determine higher adult income. This research will inform your assessment and identification of the key variables for determining higher adult incomes in the data set adult-income.csv. I suggest you relate your discussion here to the variables in the adult-income.csv data set where possible (about 600 words).

Task 1b) Conduct an exploratory data analysis of the adult-income.csv data set using the RapidMiner Studio data mining tool to understand the characteristics of each variable and the relationship of each variable to the other variables in the data set. Summarise the findings of your exploratory data analysis in a table named Task 1b Results of Exploratory Data Analysis for Adult Income Data Set, describing the key characteristics of each variable in the adult-income.csv data set, such as maximum and minimum values, average, standard deviation, most frequent value (mode), missing values and invalid values, and relationships with other variables where relevant.

Hint: The Statistics tab and the Chart tab in RapidMiner provide descriptive statistical information and useful charts such as bar charts and scatter plots. You might also like to run some correlations and chi-square tests. Indicate in this table which five variables you consider to be the key variables that contribute most to determining whether an adult has a high income (over $50,000). Note: in completing Task 1b you will find it useful to refer to the Adult Income data dictionary provided in this document (Table 1), which defines each variable in terms of its data type and range of values.
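To make the per-variable summary statistics concrete, here is a minimal Python sketch (for understanding only; Task 1 itself must be done in RapidMiner Studio) that computes minimum, maximum, mean, standard deviation, mode and missing-value counts for each column. The inline rows are made-up toy data in the adult-income.csv column layout, not values from the real file, and treating "?" as the missing-value marker is an assumption that the data dictionary does not state.

```python
import csv
import io
import statistics
from collections import Counter

# Made-up toy rows using a subset of the adult-income.csv columns.
# These values are illustrative only, NOT taken from the real data set.
SAMPLE_CSV = """age,workclass,hours-per-week,SalaryGreater50K
39,State-gov,40,<=50K
50,Self-emp-not-inc,13,<=50K
38,Private,40,<=50K
53,?,40,>50K
31,Private,50,>50K
"""

MISSING_MARKERS = {"", "?"}  # assumption: "?" or an empty cell marks a missing value


def profile_column(values):
    """Return summary statistics for one column (numeric or nominal)."""
    present = [v for v in values if v.strip() not in MISSING_MARKERS]
    summary = {
        "missing": len(values) - len(present),
        "mode": Counter(present).most_common(1)[0][0] if present else None,
    }
    try:
        numbers = [float(v) for v in present]
    except ValueError:
        return summary  # nominal column: only mode and missing count apply
    if numbers:
        summary.update(
            minimum=min(numbers),
            maximum=max(numbers),
            mean=statistics.mean(numbers),
            # the sample standard deviation needs at least two values
            stdev=statistics.stdev(numbers) if len(numbers) > 1 else 0.0,
        )
    return summary


def profile_csv(text):
    """Profile every column of a CSV string, keyed by column name."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return {col: profile_column([row[col] for row in rows]) for col in reader.fieldnames}


if __name__ == "__main__":
    for column, stats in profile_csv(SAMPLE_CSV).items():
        print(column, stats)
```

A column is treated as numeric only if every non-missing value parses as a number; otherwise only the mode and missing count are reported, mirroring the split between integer and polynominal variables in the data dictionary.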

 

Briefly discuss the key results of your exploratory data analysis and how you selected your top five variables for predicting adult income higher than $50,000 (about 600 words).
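One way to support the choice of top variables is to rank nominal variables by their chi-square statistic against SalaryGreater50K. The sketch below computes the Pearson chi-square statistic from a contingency table in plain Python; the function names and toy rows are hypothetical, and in the assignment the equivalent tests would be run in RapidMiner.

```python
from collections import Counter


def chi_square(pairs):
    """Pearson chi-square statistic and degrees of freedom for (x, y) observation pairs."""
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)
    x_totals = Counter(x for x, _ in pairs)
    y_totals = Counter(y for _, y in pairs)
    statistic = 0.0
    for x, x_count in x_totals.items():
        for y, y_count in y_totals.items():
            expected = x_count * y_count / n  # expected cell count if x and y were independent
            statistic += (joint[(x, y)] - expected) ** 2 / expected
    dof = (len(x_totals) - 1) * (len(y_totals) - 1)
    return statistic, dof


def rank_by_chi_square(rows, attributes, label):
    """Rank nominal attributes by their chi-square statistic against the label, largest first."""
    scores = {a: chi_square((row[a], row[label]) for row in rows)[0] for a in attributes}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up toy rows: attribute names follow the data dictionary, values are invented.
TOY_ROWS = [
    {"marital-status": "Married-civ-spouse", "gender": "Male", "SalaryGreater50K": ">50K"},
    {"marital-status": "Married-civ-spouse", "gender": "Female", "SalaryGreater50K": ">50K"},
    {"marital-status": "Never-married", "gender": "Male", "SalaryGreater50K": "<=50K"},
    {"marital-status": "Never-married", "gender": "Female", "SalaryGreater50K": "<=50K"},
]

if __name__ == "__main__":
    print(rank_by_chi_square(TOY_ROWS, ["gender", "marital-status"], "SalaryGreater50K"))
```

Raw chi-square values grow with the number of rows and categories, so comparing variables with very different numbers of categories this way is crude; a p-value or Cramér's V (the square root of chi-square divided by n times (min(rows, columns) - 1)) gives a fairer comparison.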

 

Task 1c) Build a Decision Tree model for predicting higher adult incomes using RapidMiner, an appropriate set of data mining operators and a reduced adult-income.csv data set determined by your exploratory data analysis in Task 1b. Provide the following outputs from RapidMiner: (1) the final Decision Tree model process, (2) the final Decision Tree diagram, and (3) the associated decision tree rules.

 

Briefly explain your final Decision Tree model process, and discuss the results of the final Decision Tree model, drawing on the key outputs (Decision Tree diagram, Decision Tree rules) for predicting higher adult income and on relevant supporting literature on the interpretation of decision trees (about 600 words).
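When interpreting the tree it helps to know how splits are chosen. RapidMiner's Decision Tree operator lets you select a split criterion, and information gain is one of the options: in ID3-style trees the attribute with the highest gain becomes the split at a node, and each root-to-leaf path reads as one decision tree rule. Below is a minimal Python sketch of entropy and information gain for nominal attributes; the function names and toy rows are hypothetical, not RapidMiner's implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, label):
    """Label entropy minus the weighted label entropy after splitting on `attribute`."""
    parent = entropy([row[label] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[label])
    children = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - children


# Made-up toy rows: attribute names follow the data dictionary, values are invented.
TOY_ROWS = [
    {"relationship": "Husband", "gender": "Male", "SalaryGreater50K": ">50K"},
    {"relationship": "Wife", "gender": "Female", "SalaryGreater50K": ">50K"},
    {"relationship": "Own-child", "gender": "Male", "SalaryGreater50K": "<=50K"},
    {"relationship": "Own-child", "gender": "Female", "SalaryGreater50K": "<=50K"},
]

if __name__ == "__main__":
    for attr in ("relationship", "gender"):
        print(attr, information_gain(TOY_ROWS, attr, "SalaryGreater50K"))
```

In this toy data, splitting on relationship produces pure leaves (gain of 1 bit), while gender leaves each branch evenly mixed (gain of 0), so a gain-based tree would split on relationship first.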

 

Note: for Tasks 1b and 1c, completing the Tutorial Activities for RapidMiner and reading the postings on the Assignment 2 discussion forum will help you determine which RapidMiner operators are appropriate to use.

 

Include in Appendix A all appropriate RapidMiner outputs, such as RapidMiner processes, graphs and tables, that support the key aspects of your exploratory data analysis and decision tree model analysis of the adult-income.csv data set. Note that you need to export these outputs from RapidMiner using the File/Print/Export Image option and include them, where relevant, in Task 1 and Appendix A of the Assignment 2 report.



Table 1 Adult Income Data Dictionary

Variable Name | Type and Description of Variable | Range of Values
1. SalaryGreater50K | Nominal; target/label | >50K, <=50K
2. age | Integer; age of adult | continuous
3. workclass | Polynominal; category of work class | Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked
4. fnlwgt | Integer; final weighting for each adult income record | continuous
5. education | Polynominal; category of education level obtained | Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool
6. education-num | Integer; education level ranking | continuous
7. marital-status | Polynominal; category of marital status of adult | Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse
8. occupation | Polynominal; category of occupation of each adult | Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces
9. relationship | Polynominal; category of adult relationship | Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried
10. race | Polynominal; race of each adult | White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black
11. gender | Nominal | Female, Male
12. capital-gain | Integer; capital gain for each adult | continuous
13. capital-loss | Integer; capital loss for each adult | continuous
14. hours-per-week | Integer; hours per week worked by each adult | continuous
15. native-country | Polynominal; native country of each adult | United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holland-Netherlands

Task 2 Data Warehousing Architecture Design (Worth 35 Marks)


 

A data warehouse is the foundation of any Business Intelligence or Business Analytics initiative. Consider the following scenario:

 

A large regional university consists of five divisions (Academics, Academic Services, Students, Research and Campus Services), each with a number of functional groups. Many different data sets reside in the functional groups across the five divisions. The university wants high-level advice on the logical design of a data warehouse architecture that will meet its reporting and decision making needs into the future.

 

Task 2a) Discuss the Kimball model and the Inmon model as possible approaches to designing and developing a data warehouse architecture that would meet the reporting and decision making needs of the large regional university described above. Consider the relative advantages and disadvantages of each approach, with appropriate in-text reference support (about 1000 words).
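To make the contrast more concrete: Kimball's approach builds dimensional data marts as star schemas (a fact table of measures surrounded by dimension tables), integrated across subject areas through conformed dimensions, whereas Inmon's approach first builds a normalised enterprise data warehouse from which data marts are then derived. The sketch below shows one hypothetical Kimball-style star schema for a student-enrolment subject area, using Python's built-in sqlite3 module; every table, column and row is an illustrative assumption, not part of the scenario.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed to three dimension tables.
SCHEMA = """
CREATE TABLE dim_student (
    student_key INTEGER PRIMARY KEY,
    student_id  TEXT NOT NULL,          -- natural key from a source system
    program     TEXT
);
CREATE TABLE dim_course (
    course_key  INTEGER PRIMARY KEY,
    course_code TEXT NOT NULL,
    faculty     TEXT
);
CREATE TABLE dim_date (
    date_key    INTEGER PRIMARY KEY,    -- e.g. 20160301
    year        INTEGER,
    semester    INTEGER
);
CREATE TABLE fact_enrolment (
    student_key    INTEGER REFERENCES dim_student(student_key),
    course_key     INTEGER REFERENCES dim_course(course_key),
    date_key       INTEGER REFERENCES dim_date(date_key),
    units_enrolled INTEGER               -- the measure being analysed
);
"""


def build_demo():
    """Create the schema in memory, load made-up rows and run a typical star-join query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_student VALUES (?, ?, ?)",
                     [(1, "S001", "MBA"), (2, "S002", "BIT")])
    conn.executemany("INSERT INTO dim_course VALUES (?, ?, ?)",
                     [(1, "C1", "Business"), (2, "C2", "Science")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(20160301, 2016, 1), (20160801, 2016, 2)])
    conn.executemany("INSERT INTO fact_enrolment VALUES (?, ?, ?, ?)",
                     [(1, 1, 20160301, 1), (2, 1, 20160801, 1), (2, 2, 20160801, 2)])
    # Slice the fact table by attributes of two dimensions.
    query = """
        SELECT c.faculty, d.year, SUM(f.units_enrolled) AS units
        FROM fact_enrolment AS f
        JOIN dim_course AS c ON f.course_key = c.course_key
        JOIN dim_date AS d ON f.date_key = d.date_key
        GROUP BY c.faculty, d.year
        ORDER BY c.faculty, d.year
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(build_demo())
```

In a Kimball design, a dimension such as dim_date would be conformed, that is, shared unchanged by star schemas for other divisions (for example Research or Campus Services), so their facts can be compared on the same terms.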

 

Task 2b) Provide a high-level diagrammatic representation of your proposed data warehouse architecture design for the large regional university outlined above.

 

Task 2c) Describe and justify your proposed data warehouse architecture design for the large regional university, as presented diagrammatically in Task 2b, with appropriate in-text referencing support (about 500 words).

 

Note that the coverage of these concepts in Chapter 2 (Data Warehousing) of the textbook is somewhat limited and dated, and may not reflect current thinking in such a fast-moving field. You will therefore need to research and critically review the current literature on the concept of a data warehouse, different data warehouse architectures, and data warehouse design and implementation methodologies in more detail.

 

Task 3 Global Bike Company Sales Reports using Tableau Desktop (Worth 20 Marks)

 

The bicycles-sales.xlsx file provided for Assignment 2 on the course study desk contains the following dimensions and information:

 

Region              Sales Date
Sub Region          Sales Period
Market              List Price
Business Segment    Unit Price
Category            Order Quantity
Model               Sales Amount
Color

 

Use Tableau Desktop with the bicycles-sales.xlsx file to produce four sales reports:

 

Task 3a) Create a sales report in a Text Table or Graph view that lists sales by sub region, business segment and model for all mountain bikes sold in the years 2002, 2003 and 2004. Analyse this sales report and comment on the key trends and patterns that are apparent (about 500 words).
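The aggregation behind a report like Task 3a can be sketched in a few lines of Python: keep one category and the required years, then sum Sales Amount by sub region, business segment, model and year, roughly what a Tableau text table computes when those dimensions are placed on Rows. Only the field names come from the list above; the category label "Mountain Bikes", the values and the function name are hypothetical, and the report itself must be built in Tableau Desktop.

```python
from collections import defaultdict
from datetime import date

# Made-up rows using field names from bicycles-sales.xlsx; all values are invented.
ROWS = [
    {"Sub Region": "North", "Business Segment": "Retail", "Category": "Mountain Bikes",
     "Model": "MTB-1", "Sales Date": date(2002, 3, 1), "Sales Amount": 3400.0},
    {"Sub Region": "North", "Business Segment": "Retail", "Category": "Mountain Bikes",
     "Model": "MTB-1", "Sales Date": date(2002, 7, 15), "Sales Amount": 1700.0},
    {"Sub Region": "South", "Business Segment": "Wholesale", "Category": "Mountain Bikes",
     "Model": "MTB-2", "Sales Date": date(2003, 5, 10), "Sales Amount": 2300.0},
    {"Sub Region": "South", "Business Segment": "Retail", "Category": "Road Bikes",
     "Model": "RD-1", "Sales Date": date(2003, 5, 10), "Sales Amount": 3500.0},
    {"Sub Region": "North", "Business Segment": "Retail", "Category": "Mountain Bikes",
     "Model": "MTB-1", "Sales Date": date(2005, 1, 1), "Sales Amount": 999.0},
]


def sales_report(rows, category, years):
    """Sum Sales Amount by (sub region, business segment, model, year) for one category."""
    totals = defaultdict(float)
    for row in rows:
        year = row["Sales Date"].year
        if row["Category"] == category and year in years:  # filters, as on Tableau's Filters shelf
            key = (row["Sub Region"], row["Business Segment"], row["Model"], year)
            totals[key] += row["Sales Amount"]
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    for key, amount in sales_report(ROWS, "Mountain Bikes", {2002, 2003, 2004}).items():
        print(key, amount)
```

Changing the category argument and the grouping key gives the equivalent logic for Tasks 3b to 3d.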

Task 3b) Create a sales report in a Text Table or Graph view that lists order quantity and sales amount by region, sub region and business segment for all bicycle clothing for the years 2002, 2003 and 2004. Analyse this sales report and comment on the key trends and patterns that are apparent (about 50 words).

 

Task 3c) Create a sales report in a Text Table or Graph view that lists by region, sub region, business segment and model for all the road bicycles in order of the total sales and profit for years 2002, 2003 and 2004. Analyse this sales report and comment on key trends and patterns that are apparent (about 50 words).

 

Task 3d) Create a sales report in a Text Table or Graph view that lists order quantity by category, model and colour for all bicycles for the years 2002, 2003 and 2004. Analyse this sales report and comment on the key trends and patterns that are apparent (about 50 words).

 

Note that you need to provide a copy of the Text Table or Graph view of each sales report in the relevant subtask (3a, 3b, 3c and 3d) of your Assignment 2 report. The Copy and Export options under the Tableau Worksheet menu allow you to copy the view of each sales report and paste it into the relevant section of Task 3 of your report.

 

Your Assignment 2 report must be structured in report format as follows:

 
  1. Title Page for Assignment 2 Report

  2. Table of Contents

  3. Body of report: main sections and subsections for the Assignment 2 tasks and subtasks, as follows:

  • Task 1 as a main heading, with appropriate subheadings for each subtask

  • Task 2 …

  • Task 3 …

  4. List of References

  5. List of Appendices


 

You need to submit two files for Assignment 2:

 
  1. Assignment 2 Report for Tasks 1, 2 and 3 in Word document format with the extension .docx

  2. Tableau packaged workbook with the extension .twbx containing the four required sales reports for Task 3


Use the following file naming convention:
  1. Student_no_Student_name_CIS8008_Ass2.docx and

  2. Student_no_Student_name_CIS8008_Ass2.twbx


