MIS772 Predictive Analytics Assignment A2

Assignment A2: SAS Enterprise Miner

Objectives
After this workshop, which consists of sessions in modules M2 and M3, students will understand how to use SAS Enterprise Miner (SAS EM) to explore data, gain insights into the problem domain and make predictions based on those insights. The workshop relies on students' knowledge of the methods and techniques introduced in a series of seminars.

Methods
During the workshop (on-campus and on-cloud) students will work in teams of up to 3 members. Teams will be given tasks and will use SAS EM to complete them. Students arriving late will work on their own!

Prerequisites
Before attending this tutorial, students are required to be familiar with:
Kattamuri S. Sarma (2013): Predictive Modeling with SAS Enterprise Miner: Practical Solutions for Business Applications, Second Edition. SAS Institute.

Workshop Schedule
Activities (no late arrivals for the on-campus sessions!), each followed by its topic:

1. Learn how to use Deakin AppsOnDemand and SAS Enterprise Miner, and create project and library folders on your home drive.
   Topic: Before Workshop
2. The workshop facilitator will explain the case at the focus of this assignment. Work in groups of up to 3.
3. Learn SAS EM and the role of nodes used to read and manipulate data from libraries, clean and transform this data, and produce statistics and charts. Learn to create decision trees, regression and neural network models. Gain hands-on experience in model validation and comparison of models' performance.
   Topic (activities 2 and 3): M2T1, M2T2 SAS EM Decision Trees & Model Comparison
4. Explore SAS EM facilities for data exploration and dimensionality reduction with data clustering. Use Ward's method of hierarchical cluster analysis to determine the number of clusters for k-means clustering. Learn how to profile and validate data clusters using the CCC statistic.
   Topic: M2T3 Clustering
5. Learn SAS EM methods of text analytics. Explore aspects of text parsing, the use of stop rules, and different ways of representing and analysing text. Filter text to reduce its complexity. Create text clusters and topics. Visualise results. Conduct customer propensity analysis. Note that topic and rule formation in text analytics follow the principles of association rule creation.
   Topic: M3T1-M3T2 Text Analytics, Association & Sequence
6. Learn how to evaluate and compare individual predictive models. Deploy large models into production. Integrate several predictive models into ensembles. Conduct validation and testing of ensemble models. Visualise and interpret the results.
   Topic: M3T3 Model Comparison & Ensembles
7. As a team, prepare a report of your findings using the provided template. The executive summary should offer interpretation and justification of results. Your forms should include screen shots of SAS EM analytic processes, tables and charts produced.
   Topic: Report and Executive Summary
8. Each team must make a single submission of its work via the CloudDeakin dropbox (possibly in multiple versions submitted weekly or daily). Submissions must include team members' names, student numbers and the group ID.
   Topic: Submission

Assignment Case Study
The following mini case study will be used in assignment A2. The workshop material for topics M2T1-M3T3 is presented in a separate handout. All amendments, extensions and assumptions should be recorded in the final submission.

Business Scenario
A marketing company has been commissioned by a number of popular airlines to understand customer satisfaction and feedback. The data set, provided as airlines.sas7bdat and airlines.csv (see CloudDeakin), contains responses from a survey evaluating customers' satisfaction with their airline travel. The data set contains 1,474 observations and 11 variables. The metadata is given in the table below.

The airlines would like to know, based on the customer survey and feedback, whether customers would recommend their airline and what their perception of value for money is. In particular, they are interested in incorporating the unstructured variable Review into any predictive modelling, as they believe it contains a lot of meaningful information.

Assessment Objective
As a data scientist for the marketing company, your role is to determine the propensity of a particular customer to recommend the airline they travelled with. The airlines would also like a list of their customers, each with a probability score that they will recommend the airline. They are also interested in improving the quality of their services and in the likely impact of issues that may develop in company logistics and during the flight.

Questions
Q1. Describe the business problem and the potential value of the predictive model to the client. Propose an analytic solution to the problem and support your recommendation with references to the data and text analytics conducted.
Q2. Explore the sample data using descriptive statistics, frequency plots and cluster analysis. Specifically identify any missing, anomalous or inconsistent data characteristics, explaining their potential impact.
Q3. Describe any treatments or transformations undertaken to resolve missing, anomalous or inconsistent data characteristics.
Q4. Perform text analytics on the "Review" data item, generating at least 5 topic clusters. Provide a description of each of the clusters generated.
Q5. Develop at least three analytic models to predict whether or not customers are likely to recommend the airline services, and their perception of value for money, for each of the following combinations of input characteristics:
   a. Using only the structured data (using appropriate columns)
   b. Using only the text data (using only the generated text topics and clusters)
   c. Using both structured and text data
Q6. For all models, provide a summary of the model assessment statistics over the training and validation data sets.
Q7. Select the best predictive model, possibly an ensemble model, and provide a summary of the model and its performance.

On-Campus / Off-Campus Submission
Both on-campus and off-campus students will work in teams created for the duration of assignment A2. Workshops will support the assignment work. Use the forms provided as a template. Deviation from the format is acceptable; however, the page limit and readability of each section must be preserved. Teams must submit the assignment via the CloudDeakin dropbox at the end of trimester, by the indicated deadline.

You will be assessed as a team, with an equal share of marks. Ensure that your team's work is unique. As this is a team effort, no extensions will be possible. Weekly contribution from all team members is necessary and must be documented.

All teams, whether on-campus or off-campus, must lodge weekly minutes of meetings in CloudDeakin's discussion area under a prominent title, for example "Minutes of Meeting 1 May 2016". The post should include the following information:
- Date and time of the meeting
- Location, either virtual or face-to-face
- Attending team members and apologies from others
- Review of tasks allocated to individual team members
- Issues discussed and actions taken, especially issues to do with lack of progress, incomplete tasks and rescheduling of tasks, with screen shots of the diagrams, charts, tables and results discussed
- Allocation of new and rescheduled tasks, team members' responsibilities and due dates for completion
- Date and time of the next meeting

Your lecturer will acknowledge the team's weekly reports, note any lack of progress and respond to the reported issues. It is important that all team members keep in touch, actively communicate with their teams and complete the assigned tasks on time. Failure to contribute to the team effort or failure to lodge weekly reports may result in expulsion from the team or splitting of the team, which can only be initiated by the lecturer. Non-contributing team members will be allocated to a team of one. Note that there is no relief or dispensation for any team of fewer than 3 members; the deliverables will always be the same.

Assessment
The assessment of the submitted assignment work will use the following rubric. Note that the solution may span one or more EM models; submit an XML file for each.

Machine Learning in SAS Enterprise Miner
Typical distribution of marks: 5% of students, 20%, 30%, 20%, 20%, 3%, 2%
Levels: Exceptional, Meets Expectations, Acceptable, Needs Improvement, Unacceptable

Exec Summary: Q1
Marks (in level order): 10, 8, 5, 2, 0
The executive summary, its insights and arguments all fit on one page. Exceptional quality presentation of the entire report and included XML files. The summary is clear and convincing. Aimed at the management reader. All decisions and recommendations identifiable. All reported aspects can be justified by tracing them back to data and text analytics. Summary of findings and recommendations based on those findings provided. All aspects cross-referenced with tables and charts. Few aspects identified and briefly described. Some errors and omissions. Aspects not identified or incomprehensible.

Data Prep: Q2 & Q3
Marks (in level order): 20, 16, 10, 4, 0
Exceptional quality presentation on one page. Crisp, short and to the point. Showing expert knowledge of data, tools and methods. All data exploration was supported with charts and tables, identifying problems in data, which were then eliminated. XML files supplied to reproduce the results. The supplied data set was explored in preparation for analysis in EM. Few aspects identified and briefly described. Some errors and omissions. Aspects not identified or incomprehensible. Modelling did not rely on SAS Enterprise Miner.

Text Analytics: Q4
Marks (in level order): 30, 24, 15, 6, 0
Exceptional quality presentation on three pages. Crisp, short and to the point. Showing expert knowledge of data, tools and methods. The model was optimised to give the best results. Cluster analysis was conducted over the provided text. The model was evaluated, results presented, explained and justified. XML files supplied to reproduce the results. Partial solution submitted on time. The required text analytics methods were applied to the provided data in EM. Few aspects identified and briefly described. Some errors and omissions. Aspects not identified or incomprehensible. Modelling did not rely on SAS Enterprise Miner.

Predictive Models: Q5 & Q6
Marks (in level order): 20, 16, 10, 4, 0
Exceptional quality presentation on three pages. Crisp, short and to the point. Showing expert knowledge of data, tools and methods. The models were cross-validated and optimised to give the best results. At least three predictive models were developed based on text only, structured data only and mixed data. Each model was evaluated, results presented, explained and justified. XML files supplied to reproduce the results. At least two predictive models were developed based on text only, and structured data only. Both models were evaluated, results presented. Partial models based on structured data not submitted on time. Few aspects identified and briefly described. Some errors and omissions. Aspects not identified or incomprehensible. Modelling did not rely on SAS Enterprise Miner.

Model Comparison: Q7
Marks (in level order): 20, 16, 10, 4, 0
Exceptional quality on two pages. Models were fully integrated to provide a highly cohesive analytic report. Ensemble models used as appropriate. All developed models were compared for their performance, best models selected and their results generated to solve a business problem. XML files supplied to reproduce the results. All developed models were compared for their performance and results reported. Few aspects identified and briefly described. Some errors and omissions. Aspects not identified or incomprehensible. Modelling did not rely on SAS Enterprise Miner.
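
Q6 and Q7 ask for a summary of model assessment statistics over the training and validation data sets and a comparison of models, possibly including an ensemble. As a rough illustration outside SAS EM, the Python sketch below computes two common assessment statistics, misclassification rate and average squared error, for three hypothetical models and a simple averaging ensemble. All scores, model names and the 0.5 cutoff are made-up assumptions for illustration only; they are not taken from the airlines data or from SAS EM output, and the assignment itself must still be completed in SAS EM.

```python
from statistics import mean


def misclassification_rate(actual, scores, cutoff=0.5):
    """Fraction of cases where the class implied by the score differs from the actual class."""
    return mean(int((s >= cutoff) != bool(a)) for a, s in zip(actual, scores))


def average_squared_error(actual, scores):
    """Mean squared difference between the 0/1 target and the predicted probability."""
    return mean((a - s) ** 2 for a, s in zip(actual, scores))


def average_ensemble(*score_lists):
    """Ensemble score per case: the plain average of the member models' probabilities."""
    return [mean(case_scores) for case_scores in zip(*score_lists)]


# Hypothetical target (NOT from airlines.csv): 1 = would recommend, 0 = would not.
actual = {"train": [1, 0, 1, 1, 0, 0], "valid": [1, 0, 0, 1]}

# Hypothetical probability scores for the three input combinations in Q5.
scores = {
    "structured only": {
        "train": [0.9, 0.2, 0.7, 0.6, 0.3, 0.4],
        "valid": [0.8, 0.4, 0.6, 0.7],
    },
    "text only": {
        "train": [0.7, 0.4, 0.4, 0.8, 0.2, 0.7],
        "valid": [0.6, 0.2, 0.3, 0.4],
    },
    "structured + text": {
        "train": [0.9, 0.1, 0.8, 0.7, 0.2, 0.3],
        "valid": [0.9, 0.1, 0.4, 0.8],
    },
}

# A simple ensemble that averages the structured-only and text-only scores.
scores["ensemble (structured + text)"] = {
    part: average_ensemble(scores["structured only"][part], scores["text only"][part])
    for part in actual
}

# Assessment statistics for every model on every partition.
summary = {}
for name, by_part in scores.items():
    for part, target in actual.items():
        summary[(name, part)] = (
            misclassification_rate(target, by_part[part]),
            average_squared_error(target, by_part[part]),
        )

for (name, part), (misc, ase) in summary.items():
    print(f"{name:<30} {part:<6} misclassification={misc:.3f}  ASE={ase:.3f}")
```

Reporting the same statistics side by side for the training and validation partitions makes it easier to spot a model that fits the training data well but generalises poorly, which is one consideration when selecting the best model.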