Data science. Many have probably heard the term, but most may not know what it means exactly. This new, high-demand field emerged from the combination of new technologies that collect and organize data and the desire to better understand and improve business, marketing, and even the human condition. Just about every sector of the economy relies on data, which means professionals who are able to analyze and use big data effectively are highly sought after. According to McKinsey Global Institute, there will be a talent shortage of nearly 200,000 trained data professionals and 1.5 million data analysts by 2019. With that in mind, the popularity – and availability – of data science master’s degree programs is quickly growing.
The following guide offers an overview of master’s degree programs in this expanding field. Prospective students can learn more about program goals and outcomes, review curriculum paths, and get information to decide whether a PhD or corporate job is the right choice after completing their master’s degree.
To meet the growing demand for skilled data analysts, colleges and universities have created master’s programs in data science. Today, there are approximately 70 graduate programs in data science analytics across the country. The main goal of these programs is to equip graduates with in-depth knowledge of data science as well as technical skills in statistical analysis. Upon graduation, students are proficient in collecting, organizing, managing and extracting insights from large, complex data sets. Students also gain hands-on experience with industry tools and statistical techniques, and learn to work in a variety of data environments. Below is a list of the core learning outcomes of data science master’s degree programs:
Computer science is the foundation of gathering, moving, and transforming data. Students gain familiarity with object-oriented programming languages, database languages, and big data tools, such as SQL, Python, Java, Hadoop, Pig, and Hive. The goal is to teach students how to gather, manipulate, and prepare data for analysis.
Students study the concepts and principles of statistical and mathematical analysis. This includes core subjects in mathematics, including linear algebra, discrete mathematics, and statistics. Understanding the mathematical underpinnings of the field forms the basis for predictive modeling and machine learning techniques.
Students learn how to organize and manage unstructured data sets. Managing data requires understanding database and data warehousing tools, such as SAS, SQL, Informatica, and Hadoop. Students gain experience designing, developing, and deploying data warehousing systems.
Analyzing data requires an understanding of applied statistics and how to use tools to generate statistical assumptions. Students develop skills working with programming languages, such as Python, C/C++, Java, and Perl.
Complementing statistical and programming skills is the ability to translate complex data into narrative and meaningful visualizations. Students develop fundamental knowledge of visualization theory, data tools (e.g. R, Tableau, D3), data patterns, and reporting, and learn how to communicate technical findings clearly and succinctly.
In order to achieve the goals discussed above, curriculum in a Master of Science in Data Science focuses on quantitative analysis skills, providing students with a broad skill set that can be translated to almost any industry. Typically, students must complete between 30 and 33 credit hours of coursework to earn a graduate degree.
Overall curriculum is integrated, blending together applied statistics and mathematics, business, and computer science. Through this comprehensive curriculum, students gain theoretical knowledge and hands-on experience with today’s modern analytical tools.
Exact coursework and requirements will vary by school, but in general, courses are divided between core coursework (~18 credit hours), required courses (~6 credit hours) and electives (~6 credit hours). Depending on the program, students may also be required to complete a practicum, comprehensive examination, thesis, or capstone project.
Core curriculum covers the fundamentals of data science, such as data mining, data warehousing, statistical programming, applied statistics, and computer programming. Required courses vary by department, but may include key topics such as predictive statistics, data mining applications, and big data analytics. Elective classes allow students to augment their degree program by selecting courses that align with their personal interests and career goals, ranging from topics in strategic marketing to data management in health informatics, systems design to advanced databases.
Below is a list of core concepts and skills grad students gain during their studies:
This course covers topics in software development, programming optimization, and troubleshooting. The class serves as a foundation for understanding software performance and covers topics including C/C++, libraries and frameworks (e.g. PETSc, BLAS), and debugging tools.
This class examines the design of parallel architectures, helping students further their programming experience. Topics of study include working with shared memory, data flow, and synchronizing parallel machines.
This class introduces students to the core concepts of probabilistic modeling and the applied statistical computational techniques used to analyze data.
This class reviews algorithms used in various statistical techniques to discover and isolate patterns or trends in data. Topics of study include linear regression, classification, variable selection, and clustering.
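As a taste of the kind of exercise such a class might assign, here is a minimal sketch of simple linear regression fitted by ordinary least squares, using only the Python standard library. The function name `fit_line` and the hours-versus-scores data are hypothetical and chosen purely for illustration; real coursework would typically use a statistics library and much larger data sets.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope is the sum of cross-deviations divided by the sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
```

The fitted slope estimates how much the score changes for each additional hour, which is the kind of pattern or trend the course description refers to.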
This course is an introduction to working with data, including the challenges faced by data professionals working with large data sets. The course examines distributed computing, techniques for acquiring unstructured data, and big data software (e.g. Hadoop).
This class reviews the fundamentals of data analysis, including central concepts such as data collection, cleaning and filtering, and statistical techniques. Topics include working with databases, SQL, visualization, and algorithms.
This course teaches students about the applied computational and statistical techniques used to work with large data sets. Topics of study include data normalization, classification, detecting anomalies in data, and predictive modeling.
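Two of the topics named above, data normalization and anomaly detection, can be illustrated together with a short sketch. It assumes z-score normalization and a simple threshold rule, which is only one of many possible approaches; the function names, the sensor readings, and the threshold of 2.0 are all hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Normalize values to zero mean and unit (sample) standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    return [(v - mu) / sigma for v in values]


def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score exceeds the threshold in absolute value."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Hypothetical sensor readings with one obviously unusual value.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0, 10.1]
print(find_anomalies(readings))
```

On very small samples a z-score can never get far above 2, so the threshold has to be chosen with the sample size in mind; production systems generally rely on more robust methods.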
This class covers techniques related to machine learning and artificial intelligence. Students study data mining methods, analyze machine learning schemes, and develop data mining algorithms.
This class is an in-depth exploration of Structured Query Language (SQL). Topics of study include working with complex SQL queries, writing SQL scripts, evaluating SQL analytical functions, and learning how to use SQL for data warehouse manipulation.
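To give a sense of what SQL analytical functions look like in practice, the sketch below runs a window-function query against an in-memory SQLite database through Python's built-in `sqlite3` module. The `sales` table and its rows are hypothetical, and the example assumes a SQLite build of version 3.25 or later, which is when window functions were introduced.

```python
import sqlite3

# In-memory database with a small, made-up sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("East", "Ana", 500),
        ("East", "Ben", 300),
        ("West", "Cal", 700),
        ("West", "Dee", 900),
    ],
)

# RANK() orders reps within each region; SUM() OVER adds a per-region total
# to every row without collapsing the rows the way GROUP BY would.
query = """
SELECT region, rep, amount,
       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region,
       SUM(amount) OVER (PARTITION BY region) AS region_total
FROM sales
ORDER BY region, rank_in_region
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

conn.close()
```

Unlike a plain aggregate, each output row keeps its detail columns while also carrying the rank and the regional total, which is what makes analytical functions useful for reporting.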
This course is the study of designing and developing data warehouse applications for use with big data. Topics of study include data warehousing trends, architectural components, data quality, data warehouse design, as well as hands-on experience building data warehouses.
This class covers the core statistical models used in experimental design. Students learn about statistical sampling, variances, linear regression, and how to think about and communicate analytical findings.
This course introduces students to using computations on data sets to solve problems. Topics are statistics-oriented: regression, collinearity, model selection, and randomization.
This class covers the fundamentals of using statistical tools in SAS. Students learn about multivariate data analysis, how to describe data, data regression and other statistical techniques.
This class focuses on SAS data analysis, teaching students the basic techniques of discrete data analysis and how to apply them to large data sets with statistical software.
This class is a review of the principles and techniques of object-oriented programming. Students study data structures and software development, and focus on a language such as C++ or Java.
This class teaches students about Python, an open-source scripting language. Students learn about its use in databases, programming, and system administration.
Because data science remains an emerging field, universities across the country continue to launch new programs and refine existing ones. This refinement includes adding concentrations to their degree programs. Although concentrations are not widely available, some programs do offer the option. Pursuing a concentration allows students to specialize their skills and knowledge by linking data science education to academics in other fields, such as business or healthcare. Below is a list of potential concentration areas data science students can pursue.
This specialization blends the study of data science with business intelligence applications. In addition to learning about quantitative methods, predictive modeling, and data management, this concentration provides students with opportunities to develop applied skills in various business areas, such as marketing, finance, or information technology.
The computer science concentration is designed for students with an undergraduate degree in the field. Curriculum concentrates on developing real-world applications, writing queries for large scale data visualization, planning and building data storage systems, and working with data sets across a variety of computing platforms.
A concentration in cybersecurity augments data analytics coursework with instruction in systems and network security, cryptography, cloud computing, randomized algorithms, and computer architecture. In this concentration, students learn how to apply statistical techniques and build analytical models to identify and handle potential cybersecurity threats.
A concentration in data analytics is designed for individuals who want to combine machine learning engineering with large, complex data sets. Students develop an advanced understanding of optimization theory, data structures, and machine learning processes and are able to develop applications for multiple industries.
The healthcare analytics specialization covers data management in healthcare. Students learn about the fundamental principles of healthcare information, including infrastructure, privacy, and quality control. Curriculum broadly covers clinical computing applications for healthcare delivery and database modeling and design for healthcare informatics.
In addition to coursework, students pursuing data science master’s degrees may be required to also complete the following:
Instead of a thesis, students may be asked to complete a capstone project. Capstone projects allow students to translate their theoretical knowledge into real-world data analysis. Working in teams, students collaborate to address a problem statement from a sponsoring organization, such as an insurance company or local department of accounting. They are responsible for the entire data analysis process, from organizing and processing data to selecting an appropriate statistical model. Students ultimately devise a solution to the problem and present their results.
A central component of the curriculum is the practicum, either an individual or team-based learning experience. Practicums provide students an opportunity to work on real-world analytics projects. The length and type of practicum depend on the institution. For example, Illinois Institute of Technology requires a 6-credit summer practicum, while the program at North Carolina State University requires an 8-month program. Students tackle an industry-specific business problem and are responsible for collecting, cleaning, analyzing, and reporting on their assigned data set.
Depending on the program, students may be required to complete a thesis project. A thesis is an original written work on a topic within the field. Working under the supervision of a faculty member, students propose a topic and project. They select an area of interest and devise a problem or element to investigate. For example, students might explore mining data off the Internet, the implications of social media data, or data sampling from massive data sets. Once the thesis is submitted, students usually complete an oral defense of their paper before a committee.
The implications of big data are starting to sweep into every industry and across business functions, creating high demand for data analytics professionals. According to a 2015 Data & Analytics Report from MIT Sloan Management Review, the need for talent is two-fold: 1) remaining competitive in a data-driven environment and 2) extracting actionable business insights from analytics. Half of survey respondents reported issues transforming data into business intelligence, nearly half (43 percent) reported a lack of available data science talent, and others reported that managing the increasing volume of data presents technical issues. The MIT Sloan Management Review report suggests that demand for trained and highly skilled data science professionals will keep growing rapidly.
So, what does this mean for freshly minted graduates of master’s in data science programs? The increasing opportunities allow them to be highly selective in their role and professional field. In other words, graduates with a master’s degree are not locked into working solely in the technology sector. The saturation of data across industries presents opportunities to apply their skills in a vast array of arenas. Below is an example list of industries where data scientists are in demand:
A growing industry, biotechnology deals with clinical studies, pharmaceutical research, and emerging sub-areas such as genomics. This research generates massive amounts of data that call for the analytical eye of data scientists. Opportunities exist to drive forward advancements in genetics, drug development, and disease treatments.
Energy is a complex, broad industry—one that creates a significant array of data. Companies are competing to create energy, find new sources of energy, and distribute it more efficiently and effectively. Data scientists can help businesses and organizations in the sector by helping them to better organize and analyze data around energy consumption patterns, use data to align supply and demand, and find ways to save costs on energy exploration.
Data science offers new approaches to analysis across all financial markets, ranging from asset management to stock trading. Whether at a brokerage firm or a multinational corporation, data scientists are called on to generate new ways of thinking about data and to use machine learning to improve investment performance.
During the past five years, healthcare spending has continually increased, reaching nearly $3 trillion in 2014. The healthcare system is a complex ecosystem of insurers, medical providers, and researchers. Data scientists are needed to find solutions to cut spending, help physicians make evidence-based decisions, and sift through mounds of research information to spur medical breakthroughs.
The longstanding trend in manufacturing has been to outsource factories to countries with lower labor costs. However, big data has introduced a new approach to manufacturing. Data science is used in manufacturing to drive improvements in supply chain management, risk management, and forecasting. Data scientists ensure companies use data analytics to remain competitive as well as cost efficient.
After earning a master’s degree, graduates may be interested in completing a doctoral program in data science. However, this may not be the right path for everyone. According to research from Burtch Works, an executive recruiting company, 92 percent of data scientists hold an advanced degree. Specifically, 48 percent have a PhD and 44 percent hold a master’s degree. A master’s degree is an avenue for employment in the field, which means prospective PhD candidates must possess a burning desire to complete a doctorate.
The PhD in data science is a flexible degree program that enables graduates to pursue employment in either the private sector or academia. PhD programs in data science seek candidates who take active ownership of their academic and professional development.
Below is a list of suggested steps master’s students can take to prepare for a doctoral program in data science.
Membership in professional data science associations serves several purposes. Associations bring together like-minded individuals and offer a range of opportunities and benefits such as continuing education courses, professional certifications, conferences, journals, and opportunities to network. Example associations within data science include the Data Science Association, Digital Analytics Association, and American Statistical Association.
After graduation, continuing education plays a vital role in helping data professionals stay on top of the latest trends and developments. MOOCs—massive open online courses—make it easy for prospective data science PhD students to continue building their skills. Courses and certificates are also available in different areas (e.g. machine learning, data visualization) from various providers, including Coursera and Udacity.
Kaggle is an online community of data scientists. Through competitions, Kaggle invites data scientists to tackle business problems from a diverse list of industries and companies. For example, Dato—a company that builds intelligent applications—sponsored a project where data scientists were asked to analyze a dataset with more than 300,000 HTML files, images, and links to devise improvements to native advertisements.
Gaining experience outside of academia can make an application to a PhD program stand out. Internships in data science are available in nearly every industry. In turn, students should seek internships relevant to their academic specialty or professional interests. For example, here are three data science internships offered by major organizations: Khan Academy (Education), ESPN (Media and Sports), and Pure Storage (Technology).
There’s a major difference between textbook study and real-world knowledge. Prior to applying to a graduate program, actually working in relevant industries is a great way to build practical experience in data science as well as show PhD admission committees that you are serious about pursuing the field.
Completing certification programs is another avenue to develop related and specific knowledge in different areas of data science. Below is an example list of data science certification programs.
In addition to formal training at the graduate level, there are a variety of resources students can utilize to further their education, prepare for a PhD program, or transition into the workforce. The list below includes scholarships, bootcamps, and popular publications for industry professionals.
Because data science is relatively new to the academic landscape, the number of scholarships for master’s students is limited. However, there are scholarships for related fields, such as computer science and mathematics. Below is a short list of example scholarships that might apply to data science students.
Sponsoring organization: Google
Application deadline: December 1, 2015
The Anita Borg Memorial Scholarship was created to encourage the success of women in computing and technology. Applicants must be full-time female students in an undergraduate or graduate program in computer science or a related field.
Sponsoring Organization: US Office of Personnel Management
Application deadline: Varies
In exchange for a service commitment in a government organization, master’s degree students are eligible to receive up to $32,000 in scholarship funding from SFS. Students also receive a professional development allowance ($3,000), health insurance allowance ($1,200) and a book allowance ($1,000). Students must be enrolled in a formal academic program focused in cybersecurity or information assurance.
Sponsoring Organization: Department of Energy
Amount: Full tuition, stipend and academic allowance
Application deadline: January 19, 2016
The DOE CSGF is a fellowship that provides four years of financial support to students studying in computer science and engineering fields. First-year master’s students are eligible to apply. The fellowship pays full tuition at any accredited university, and students receive a yearly $36,000 stipend, a $5,000 academic allowance during the first year of study, and a $1,000 allowance for each subsequent year of study.
Sponsoring Organization: National Security Agency
Application deadline: December 31, 2015
The NSA sponsors a scholarship for students enrolled in master’s degree programs in computer science and mathematics. Students must be US citizens and meet minimum GPA requirements.
Bootcamps are short, on-site, intensive training programs that can help students refine and advance their skill sets. Most bootcamps are located in tech hubs, such as New York City and San Francisco, but the number of available programs continues to increase each year.
A 4-week course that teaches the fundamentals of working with large data sets. Students proceed through a progressive list of courses, learning the principles of computer science, statistics, and machine learning.
An 8-week fellowship, the Data Incubator is designed for individuals who already have PhDs. The program is designed to enhance skills in key areas, including software engineering, numerical computation, natural language processing, data visualization and databases.
In this three-month program sponsored by the University of Chicago, students build their data science and coding skills. Selected participants work on machine learning, data mining and data science projects that have a social impact, tackling problems in education, healthcare, economic development and other fields.
In this immersive five-day bootcamp, attendees gain hands-on experience with a variety of data science topics, such as predictive modeling, predictive analysis, clustering, and data visualization. All courses include the use of Python and R.
In this comprehensive 12-week bootcamp, students become comfortable designing and implementing a data science project, come to understand the foundations of data visualization, and use data tools such as Hadoop. The program also offers job placement services to graduates.
In this 12-week program, students advance from beginner to intermediate study of data science, learning about statistical software and computer programming software such as R, Hadoop and Python. Students work on data analytics projects throughout to develop applied learning experience.
To keep up with this evolving field, students can read the work of professors and industry professionals. Below is a list of some resources to consider:
MIT graduate and industry insider Edwin Chen has worked in machine learning at Google, in speech recognition at Microsoft, in ad quality at Twitter, and in data science at Dropbox. He writes about different topics in data science, math and statistics.
The work of Nathan Yau, who holds a PhD in statistics from UCLA, FlowingData examines how data scientists and other professionals use analysis and visualization to explore data.
R Bloggers is an industry-leading blog for data professionals who use the R language and open-source software. It aggregates content written by R bloggers from across the profession.
Launched by three biostatistics professors, two from Johns Hopkins University and one from Harvard University, Simply Statistics is a site that posts articles about the latest trends in statistics.
Founded by Andrew Gelman, professor of statistics at Columbia University, the site includes posts from other professors in the field and discusses a variety of statistical and data analysis trends.