Earning A Master’s in Data Science

From Finding A Program to Completing A Degree

Data science. Many have probably heard the term, but most may not know what it means exactly. This new, high-demand field emerged from the combination of new technologies that collect and organize data and the desire to better understand and improve business, marketing, and even the human condition. Just about every sector of the economy relies on data, which means professionals who are able to analyze and use big data effectively are highly sought after. According to McKinsey Global Institute, there will be a talent shortage of nearly 200,000 trained data professionals and 1.5 million data analysts by 2019. With that in mind, both the popularity and the availability of data science master’s degree programs are growing quickly.

The following guide offers an overview of master’s degree programs in this expanding field. Prospective students can learn more about program goals and outcomes, review curriculum paths, and get information to decide whether a PhD or corporate job is the right choice after completing their master’s degree.

Understanding the Data Science Master’s Degree

To meet the growing demand for skilled data analysts, colleges and universities have created master’s programs in data science. Today, there are approximately 70 graduate programs in data science analytics across the country. The main goal of these programs is to equip graduates with in-depth knowledge of data science as well as technical skills in statistical analysis. Upon graduation, students are proficient in collecting, organizing, managing and extracting insights from large, complex data sets. Students also develop hands-on experience with industry tools, statistical techniques, and working in a variety of data environments. Below is a list of the core learning outcomes of data science master’s degree programs:

Skills in computer programming and databases

Computer science is the foundation of gathering, moving, and transforming data. Students gain familiarity with object-oriented programming languages, database languages, and big data tools, such as SQL, Python, Java, Hadoop, Pig, and Hive. The goal is to teach students how to gather, manipulate, and prepare data for analysis.
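As a rough illustration of what gathering and preparing data can look like in practice, the sketch below uses Python's built-in sqlite3 module to load a few rows into an in-memory database and summarize them with a SQL query. The table name, column names, and sample values are invented for this example and are not drawn from any particular program's curriculum.

```python
import sqlite3

# Hypothetical sample data: (region, units_sold) pairs invented for illustration.
rows = [("east", 120), ("west", 95), ("east", 80), ("west", 105), ("north", 60)]


def total_units_by_region(records):
    """Load records into an in-memory SQLite table and sum units per region."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, units INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
        cursor = conn.execute(
            "SELECT region, SUM(units) FROM sales GROUP BY region ORDER BY region"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for region, units in total_units_by_region(rows):
        print(region, units)
```

The same pattern of loading, querying, and returning tidy results carries over to larger database systems, although the connection details would differ.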

Knowledge of fundamental principles of math

Students study the concepts and principles of statistical and mathematical analysis, including core subjects such as linear algebra, discrete mathematics, and statistics. Understanding the mathematical underpinnings of the field forms the basis for predictive modeling and machine learning techniques.

Knowledge of data warehousing and data management

Students learn how to organize and manage unstructured data sets. Managing data requires understanding database and data warehousing tools, such as SAS, SQL, Informatica, and Hadoop. Students gain experience designing, developing, and deploying data warehousing systems.

Ability to apply statistical analysis to data

Analyzing data requires an understanding of applied statistics and how to use tools to generate statistical assumptions. Students develop skills working with programming languages, such as Python, C/C++, Java, and Perl.
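To make the idea of applied statistics a little more concrete, here is a minimal sketch of a simple linear regression fitted by ordinary least squares, using only Python's standard library. The function name fit_line and the small data set of study hours and test scores are assumptions made for the example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: hours studied and test scores, invented for illustration.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

if __name__ == "__main__":
    slope, intercept = fit_line(hours, scores)
    print(f"score = {slope:.2f} * hours + {intercept:.2f}")
```

In coursework, students would more likely rely on a statistics package for this, but writing the fit by hand shows what such a tool computes.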

Ability to visualize data and communicate findings

Complementing statistical and programming skills is the ability to translate complex data into narratives and meaningful visualizations. Students develop fundamental knowledge of visualization theory, data tools (e.g. R, Tableau, D3), data patterns, and reporting, and they learn how to communicate technical findings clearly and succinctly.

Overview of Master’s Level Coursework

In order to achieve the goals discussed above, curriculum in a Master of Science in Data Science focuses on quantitative analysis skills, providing students with a broad skill set that can be translated to almost any industry. Typically, students must complete between 30 and 33 credit hours of coursework to earn a graduate degree.

Overall, the curriculum is integrated, blending applied statistics and mathematics, business, and computer science. Through this comprehensive curriculum, students gain theoretical knowledge and hands-on experience with modern analytical tools.

Exact coursework and requirements will vary by school, but in general, courses are divided between core coursework (~18 credit hours), required courses (~6 credit hours) and electives (~6 credit hours). Depending on the program, students may also be required to complete a practicum, comprehensive examination, thesis, or capstone project.

Core curriculum covers the fundamentals of data science, such as data mining, data warehousing, statistical programming, applied statistics, and computer programming. Required courses vary by department, but may include key topics such as predictive statistics, data mining applications, and big data analytics. Elective classes allow students to augment their degree program by selecting courses that align with their personal interests and career goals, ranging from topics in strategic marketing to data management in health informatics, systems design to advanced databases.

Below is a list of core concepts and skills grad students gain during their studies:

High performance computing

  • Advanced programming for scientists

    This course covers topics in software development, programming optimization, and troubleshooting. The class serves as a foundation for understanding software performance and covers topics including C/C++, libraries and frameworks (e.g. PETSc, BLAS), and debugging tools.

  • Parallel computer architecture

    This class examines the design of parallel architectures, helping students further their programming experience. Topics of study include working with shared memory, data flow, and synchronizing parallel machines.

Statistical modeling

  • Statistical computing and probabilistic computing

    This class introduces students to the core concepts of probabilistic modeling and the applied statistical computational techniques used to analyze data.

  • Statistical learning

    This class reviews algorithms used in various statistical techniques to discover and isolate patterns or trends in data. Topics of study include linear regression, classification, variable selection, and clustering.

Data analytics

  • Data fundamentals

    This course is an introduction to working with data, including the challenges faced by data professionals working with large data sets. The course examines distributed computing (e.g. Hadoop), techniques for acquiring unstructured data, and statistical software.

  • Introduction to data analytics

    This class reviews the key fundamentals of data analysis, including central concepts: data collection, cleaning and filtering, and statistical techniques. Topics include working with databases, SQL, visualization, and algorithms.

Data mining and computational techniques

  • Data mining

    This course teaches students about the applied computational and statistical techniques used to work with large data sets. Topics of study include data normalization, classification, detecting anomalies in data, and predictive modeling.

  • Machine learning

    This class covers techniques related to machine learning and artificial intelligence. Students study data mining methods, analyze machine learning schemes, and develop data mining algorithms.

Data warehousing

  • Advanced SQL

    This class is an in-depth exploration of Structured Query Language (SQL). Topics of study include working with complex SQL queries, writing SQL scripts, evaluating SQL analytical functions, and learning how to use SQL for data warehouse manipulation.

  • Data warehouse development

    This course is the study of designing and developing data warehouse applications for use with big data. Topics of study include data warehousing trends, architectural components, data quality, and data warehouse design, as well as hands-on experience building data warehouses.

Statistical methods

  • Experimental statistics

    This class covers the core statistical models used in experimental design. Students learn about statistical sampling, variances, linear regression, and how to think about and communicate analytical findings.

  • Introduction to statistical modeling

    This course introduces students to using computations on data sets to solve problems. Topics are statistics-oriented: regression, collinearity, model selection, and randomization.

SAS

  • SAS applied statistics

    This class covers the fundamentals of using statistical tools in SAS. Students learn about multivariate data analysis, how to describe data, data regression and other statistical techniques.

  • Analysis of discrete data

    This class focuses on SAS data analysis, teaching students the basic techniques of discrete data analysis and how to apply them to large data sets with statistical software.

Computer programming knowledge

  • Introduction to computer programming

    This class is a review of the principles and techniques of object-oriented programming. Students study data structures and software development, and focus on a language such as C++ or Java.

  • Programming in Python

    This class teaches students about Python, an open-source scripting language. Students learn about its use in databases, programming, and system administration.
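Two of the topics named in the data mining course above, data normalization and anomaly detection, can be sketched in a few lines of Python. The example below is a simplified illustration, assuming z-score normalization and a fixed threshold of two standard deviations; the function names and sensor readings are invented, and real coursework would cover more robust methods.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Normalize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so none deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def flag_anomalies(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Hypothetical sensor readings with one obvious outlier.
readings = [10, 11, 9, 10, 12, 10, 11, 50]

if __name__ == "__main__":
    print(flag_anomalies(readings))
```

The threshold is a judgment call: a lower value flags more points as unusual, while a higher value flags only the most extreme ones.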

Concentration Areas

Because data science remains an emerging field, universities across the country continue to launch new programs and refine existing ones. This refinement includes adding concentrations to their degree programs. Although concentrations are not widely available, some programs do offer the option. Pursuing a concentration allows students to specialize their skills and knowledge by linking data science education to academics in other fields, such as business or healthcare. Below is a list of potential concentration areas data science students can pursue.

Business Analytics

This specialization blends the study of data science with business intelligence applications. In addition to learning about quantitative methods, predictive modeling, and data management, this concentration provides students with opportunities to develop applied skills in various business areas, such as marketing, finance, or information technology.

Computer Science

The computer science concentration is designed for students with an undergraduate degree in the field. Curriculum concentrates on developing real-world applications, writing queries for large scale data visualization, planning and building data storage systems, and working with data sets across a variety of computing platforms.

Cybersecurity

A concentration in cybersecurity augments data analytics coursework with instruction in systems and network security, cryptography, cloud computing, randomized algorithms, and computer architecture. In this concentration, students learn how to apply statistical techniques and build analytical models to identify and handle potential cybersecurity threats.

Data Analytics

A concentration in data analytics is designed for individuals who want to combine machine learning engineering with large, complex data sets. Students develop an advanced understanding of optimization theory, data structures, and machine learning processes and are able to develop applications for multiple industries.

Healthcare Analytics

The healthcare analytics specialization covers data management in healthcare. Students learn about the fundamental principles of healthcare information, including infrastructure, privacy, and quality control. Curriculum broadly covers clinical computing applications for healthcare delivery and database modeling and design for healthcare informatics.

Graduation Requirements

In addition to coursework, students pursuing data science master’s degrees may be required to also complete the following:

Capstone project

Instead of a thesis, students may be asked to complete a capstone project. Capstone projects allow students to translate their theoretical knowledge into real world data analysis. Working in teams, students collaborate to address a problem statement from a sponsoring organization, such as an insurance company or local department of accounting. They are responsible for the entire data analysis process, from organizing and processing data to selecting an appropriate statistical model. Students ultimately devise a solution to the problem and present their results.

Practicum

A central component of the curriculum is the practicum, either an individual or team-based learning experience. Practicums provide students an opportunity to work on real-world analytics projects. The length and type of practicum depend on the institution. For example, Illinois Institute of Technology requires a 6-credit summer practicum, while the program at North Carolina State University requires an 8-month program. Students tackle an industry-specific business problem and are responsible for collecting, cleaning, analyzing, and reporting on their assigned data set.

Thesis

Depending on the program, students may be required to complete a thesis project. A thesis consists of original written work, based on the subject. Working under the supervision of a faculty member, students propose a topic and project. They select an area of interest and devise a problem or element to investigate. For example, students might explore mining data off the Internet, the implications of social media data, or data sampling from massive data sets. Once the thesis is submitted, students usually complete an oral defense of their paper before a committee.

From Master’s Degree to Real World

The implications of big data are starting to sweep into every industry and across business functions, creating high demand for data analytics professionals. According to a 2015 Data & Analytics Report from MIT Sloan Management Review, the need for talent is twofold: 1) remaining competitive in a data-driven environment and 2) extracting actionable business insights from analytics. Half of survey respondents reported issues transforming data into business intelligence, 43 percent reported a lack of available data science talent, and others reported that managing the increasing volume of data presents technical issues. The report suggests that demand for trained and highly skilled data science professionals will grow rapidly.

So, what does this mean for freshly minted graduates of master’s in data science programs? The increasing opportunities allow them to be highly selective in their role and professional field. In other words, graduates with a master’s degree are not locked into working solely in the technology sector. The saturation of data across industries presents opportunities to apply their skills in a vast array of arenas. Below is an example list of industries where data scientists are in demand:

Biotechnology

A growing industry, biotechnology deals with clinical studies, pharmaceutical research, and emerging sub-areas such as genomics. This research generates massive amounts of data that call for the analytical eye of data scientists. Opportunities exist to drive forward advancements in genetics, drug development, and disease treatments.

Energy

Energy is a complex, broad industry that creates a significant array of data. Companies are competing to create energy, find new sources of energy, and distribute it more efficiently and effectively. Data scientists can help businesses and organizations in the sector organize and analyze data on energy consumption patterns, use data to align supply and demand, and find ways to cut the cost of energy exploration.

Finance

Data science offers new approaches to analysis across all financial markets, ranging from asset management to stock trading. Whether at a brokerage firm or a multinational corporation, data scientists are employed to generate new ways of thinking about data and to use machine learning to improve investment performance.

Healthcare

During the past five years, healthcare spending has continually increased, reaching nearly $3 trillion in 2014. The healthcare system is a complex ecosystem of insurers, medical providers, and researchers. Data scientists are needed to find solutions to cut spending, help physicians make evidence-based decisions, and sift through mounds of research information to spur medical breakthroughs.

Manufacturing

The longstanding trend in manufacturing has been to outsource factories to countries with lower labor costs. However, big data has introduced a new approach to manufacturing. Data science is used in manufacturing to drive improvements in supply chain management, risk management, and forecasting. Data scientists ensure companies use data analytics to remain competitive as well as cost efficient.

Preparing for a Doctorate Degree

After earning a master’s degree, graduates may be interested in completing a doctoral program in data science. However, this may not be the right path for everyone. According to research from Burtch Works, an executive recruiting company, 92 percent of data scientists hold an advanced degree. Specifically, 48 percent have a PhD and 44 percent hold a master’s degree. A master’s degree is an avenue for employment in the field, which means prospective PhD candidates must possess a burning desire to complete a doctorate.

The PhD in data science is a flexible degree program that enables graduates to pursue employment opportunities in either the private sector or academia. PhD programs in data science seek candidates who take active ownership of their academic and professional development.

Below is a list of suggested steps master’s students can take to prepare for a doctoral program in data science.

Join professional associations

Membership in professional data science associations serves several purposes. Associations bring together like-minded individuals and offer a range of opportunities and benefits such as continuing education courses, professional certifications, conferences, journals, and opportunities to network. Example associations within data science include the Data Science Association, Digital Analytics Association, and American Statistical Association.

Participate in self-training

After graduation, continuing education plays a vital role in helping data professionals stay on top of the latest trends and developments. MOOCs—massive open online courses—make it easy for prospective data science PhD students to continue building their skills. Courses and certificates are also available in different areas (e.g. machine learning, data visualization) from various providers, including Coursera and Udacity.

Enter Kaggle competitions

Kaggle is an online community of data scientists. Through competitions, Kaggle invites data scientists to tackle business problems from a diverse list of industries and companies. For example, Dato—a company that builds intelligent applications—sponsored a project where data scientists were asked to analyze a dataset with more than 300,000 HTML files, images, and links to devise improvements to native advertisements.

Complete an internship

Gaining experience outside of academia can make an application to a PhD program stand out. Internships in data science are available in nearly every industry, so students should seek internships relevant to their academic specialty or professional interests. For example, major organizations that have offered data science internships include Khan Academy (education), ESPN (media and sports), and Pure Storage (technology).

Gain professional experience

There’s a major difference between textbook study and real-world knowledge. Prior to applying to a graduate program, actually working in relevant industries is a great way to build practical experience in data science as well as show PhD admission committees that you are serious about pursuing the field.

Earn certifications

Completing certification programs is another avenue to develop related and specific knowledge in different areas of data science.

Fellowships, Bootcamps and Related Scholarships for Data Science Majors

In addition to formal training at the graduate level, there are a variety of resources students can utilize to further their education, prepare for a PhD program, or transition into the workforce. The list below includes scholarships, bootcamps, and popular publications for industry professionals.

Data Science Scholarships

Because data science is relatively new to the academic landscape, the number of scholarships for master’s students is limited. However, there are scholarships for related fields, such as computer science and mathematics. Below is a short list of example scholarships that might apply to data science students.

Anita Borg Memorial Scholarship
  • Sponsoring Organization: Google

  • Amount: $10,000

  • Application deadline: December 1, 2015

The Anita Borg Memorial Scholarship was created to encourage the success of women in computing and technology. Applicants must be full-time female students in an undergraduate or graduate program in computer science or a related field.

Cybercorps: Scholarship for Service
  • Sponsoring Organization: US Office of Personnel Management

  • Amount: $32,000+

  • Application deadline: Varies

In exchange for a service commitment in a government organization, master’s degree students are eligible to receive up to $32,000 in scholarship funding from SFS. Students also receive a professional development allowance ($3,000), a health insurance allowance ($1,200), and a book allowance ($1,000). Students must be enrolled in a formal academic program focused on cybersecurity or information assurance.

DOE Computational Science Graduate Fellowship
  • Sponsoring Organization: Department of Energy

  • Amount: Full tuition, stipend and academic allowance

  • Application deadline: January 19, 2016

The DOE CSGF is a fellowship that provides four years of financial support to students studying in computer science and engineering fields. First-year master’s students are eligible to apply. Full tuition is paid at any accredited university, and students receive a yearly $36,000 stipend, a $5,000 academic allowance during the first year of study, and a $1,000 allowance for each subsequent year of study.

Mathematics and Computer Science Student Scholarship
  • Sponsoring Organization: National Security Agency

  • Amount: $500

  • Application deadline: December 31, 2015

The NSA sponsors a scholarship for students enrolled in master’s degree programs in computer science and mathematics. Students must be US citizens and meet minimum GPA requirements.

Data Science Bootcamps

Bootcamps are short, on-site, intensive training programs that can help students refine and advance their skill sets. Most bootcamps are located in tech hubs, such as New York City and San Francisco, but the number of available programs continues to increase each year.

  • Bit Bootcamp
    • New York City, New York
    • 4 weeks
    • $1,500
    • Experience with SQL and object-oriented programming

    A 4-week course that teaches the fundamentals of working with large data sets. Students proceed through a progressive list of courses, learning the principles of computer science, statistics, and machine learning.

  • The Data Incubator
    • New York City, New York; Washington, DC
    • 8 weeks
    • Free
    • PhD and Postdoc

    An 8-week fellowship, the Data Incubator is designed for individuals who already have PhDs. The program is designed to enhance skills in key areas, including software engineering, numerical computation, natural language processing, data visualization and databases.

  • Data Science for Social Good
    • Chicago, Illinois
    • 12 weeks
    • Free
    • Undergraduate and graduate students

    In this three-month program sponsored by the University of Chicago, students build their data science and coding skills. Selected participants work on machine learning, data mining, and data science projects that have a social impact, tackling problems in education, healthcare, economic development, and other fields.

  • Data Science Dojo
    • Varies
    • 5 days
    • $3,000
    • Knowledge of a programming or scripting language

    In this immersive five-day bootcamp, attendees gain hands-on experience with a variety of data science topics, such as predictive modeling, predictive analysis, clustering, and data visualization. All courses include the use of Python and R.

  • Metis
    • New York City, New York; San Francisco, California
    • 12 weeks
    • $14,000
    • Previous coding experience and knowledge of statistics

    In this comprehensive 12-week bootcamp, students become comfortable designing and implementing a data science project, understand the foundations of data visualization, and use data tools such as Hadoop. The program also offers job placement services to graduates.

  • NYC Data Science Academy
    • New York City, New York
    • 12 weeks
    • $16,000
    • Master’s or PhD

    In this 12-week program, students advance from beginner to intermediate study of data science, learning about statistical software and computer programming software such as R, Hadoop and Python. Students work on data analytics projects throughout to develop applied learning experience.

Resources

To keep up with this evolving field, students can read the work of professors and industry professionals. Below is a list of some resources to consider:

Edwin Chen

MIT graduate and industry insider Edwin Chen has worked in machine learning at Google, speech recognition at Microsoft, ad quality at Twitter, and data science at Dropbox. He writes about different topics in data science, math, and statistics.

Flowing Data

The work of Nathan Yau, who holds a PhD in statistics from UCLA, Flowing Data examines how data scientists and other professionals use analysis and visualization to explore data.

R Bloggers

R Bloggers is an industry-leading blog for data professionals who use the R language and open-source software. It aggregates content written by a collection of R bloggers from across the profession.

Simply Statistics

Launched by three biostatistics professors, two from Johns Hopkins University and one from Harvard University, Simply Statistics is a site that posts articles about the latest trends in statistics.

Statistical Modeling, Causal Inference, and Social Science

Founded by Andrew Gelman, professor of statistics at Columbia University, the site includes posts from other professors in the industry and discusses a variety of statistical and data analysis trends.