Sexiest Job of the 21st Century: Data Scientist

Sarah Robinson
DATAcated

--

“A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.”
— Josh Wills of Cloudera

90% of the data we have today was generated in just the past two years. For businesses and organizations that can learn and benefit from that data, the explosive growth seems like a dream come true. However, that data is meaningless without ways to capture and analyze it, a fact that is driving the strong demand for data science professionals.

Data science can cover virtually any quantitative work. Two data scientists at different companies, or even within the same company, could do totally different types of work. The field has gradually been fracturing into more specific job titles, such as data engineer, data analyst, machine learning engineer, and so on.

This process of specialization will likely accelerate in the future. Therefore, when you’re talking about data science or applying to jobs, try to figure out which definition of data science applies in that situation, and make sure that it matches yours.

The very essence of any data science job is solving problems: the company provides you with (often ambiguous) data sets and expects you to make something out of them, something that can help solve a real business problem. When analyzed correctly, data can provide valuable insights into how the business operates at the moment and which aspects could be improved. Common ways data scientists contribute include:

  • Identifying key business metrics that should be tracked
  • Predicting the performance of these metrics
  • Predicting the behavior of customers
  • Testing product changes via experiments
  • Improving the product via the creation of data products
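To make "testing product changes via experiments" concrete, here is a minimal sketch of a two-proportion z-test, one common way to compare conversion rates in an A/B test. The function name and the visitor and conversion counts are hypothetical, chosen only for illustration; a real experiment would also need a sample size planned in advance.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # For a standard normal, the two-sided tail probability is erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 1,000 visitors per variant, 100 vs. 130 conversions.
z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests the difference between variants is unlikely to be chance alone, but it says nothing about whether the change is large enough to matter to the business.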

As a result, data scientists are increasingly asked to develop new machine learning algorithms that help a business match supply with demand and compete in its market efficiently.

Daily life of a Data Scientist

First off, a disclaimer. Ask someone who works as a data scientist about their typical day and they might laugh out loud at the notion of “typical.” When your day-to-day tasks involve building data products to solve problems for huge numbers of people, no two days look quite alike.

Although these workdays are full of flux, some aspects of the day remain the same: working with data, working with people, and working to keep up with the field.

The greatest proportion of a data scientist’s day is spent coding… which can be:

  • data cleaning and data formatting (roughly 70% of the work data scientists do)
  • prototyping
  • creating data visualizations
  • creating automations
  • developing predictive algorithms
  • building machine learning models (roughly 10% of the work data scientists do)
  • implementing your models into the product
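Since cleaning and formatting take up so much of the day, a small illustration may help. The sketch below uses only Python's standard library; the column names and the sample rows are made up for this example, and real projects often use a dataframe library instead.

```python
import csv
import io

# Hypothetical raw export: stray whitespace, inconsistent capitalization,
# a missing value, and a duplicated row.
RAW = """customer_id,city,monthly_spend
1, Boston ,120.50
2,chicago,
3,BOSTON,87
3,BOSTON,87
"""


def clean_rows(raw_text):
    """Normalize text, parse numbers, and drop incomplete or duplicate rows."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        spend = row["monthly_spend"].strip()
        if not spend:
            continue  # skip rows with no spend value
        record = (
            int(row["customer_id"]),
            row["city"].strip().title(),  # " Boston " and "BOSTON" -> "Boston"
            float(spend),
        )
        if record in seen:
            continue  # skip exact duplicates
        seen.add(record)
        cleaned.append(record)
    return cleaned


for record in clean_rows(RAW):
    print(record)
```

Dropping incomplete rows is only one possible policy; depending on the question, filling in or flagging missing values may be the better choice.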

A data scientist is responsible for building automated machine learning pipelines and personalized data products that support profitable business decisions. Having explored the data and decided on an approach, a data scientist analyzes the data to extract valuable information from it. Various algorithmic approaches can be applied to a data science problem; a few of them are listed below:

  1. Two-Class Classification Approach: works best for questions that have only two possible answers.
  2. Multi-Class Classification Approach: works best for questions that have more than two possible answers.
  3. Reinforcement Learning Algorithms: work best when the problem is not predictive in nature and instead requires learning which actions are good.
  4. Regression: works best for questions that have a real-valued answer instead of a class or a category.
  5. Clustering: works best when you have no labels and want to group similar data points together to answer questions about how the data is organized.
  6. In-Depth Data Analysis: apply statistical modeling, algorithms, and machine learning.
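To show how three of these approaches differ in practice, here is a minimal sketch in plain Python. The function names and the toy data are hypothetical, and the methods chosen (least squares, nearest centroid, one-dimensional k-means) are simple stand-ins for each family rather than what a production system would use. Reinforcement learning is left out because it needs an environment to interact with.

```python
from statistics import mean


def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for a real-valued target."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


def nearest_centroid(train, point):
    """Classification: pick the label whose training mean is closest to point."""
    labels = {label for _, label in train}
    centroids = {
        label: mean(x for x, lab in train if lab == label) for label in labels
    }
    return min(centroids, key=lambda label: abs(point - centroids[label]))


def kmeans_1d(values, k=2, iterations=10):
    """Clustering: group unlabeled values around k centers."""
    # Spread the starting centers across the sorted values.
    centers = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return sorted(centers)


# Hypothetical toy data for each question type.
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))
print(nearest_centroid([(1, "low"), (2, "low"), (9, "high"), (10, "high")], 8))
print(kmeans_1d([1, 2, 3, 10, 11, 12]))
```

The same nearest-centroid function handles the multi-class case as well, since it simply compares the point against one centroid per label.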

Data is only as good as the questions you ask. Unless a data scientist asks the right questions, they cannot choose the right algorithm to support better business decisions. Asking the right questions involves tasks such as understanding the business requirements, scoping an efficient solution, and planning the data analysis.

Having tweaked and optimized the model to obtain the best results, the next most important task of a data scientist is to effectively communicate the findings so various stakeholders can understand the insights and take further action based on them.

A picture is worth a million datasets. Data scientists work with data visualization tools like Tableau, QlikView, Power BI, and others to show, using real-life cases, how the model performs for actual customers. They create presentations with an appropriate flow to narrate the data story in a way that is easily comprehensible and compelling to the stakeholders.

It’s important to remember that although a data scientist works with data and numbers, the work is driven by a business need. Being able to see the big picture from a department’s point of view is critical.

High job growth: The median base salary is $110,000 and there are thousands of unfilled jobs right now, with many more to come: IBM predicts demand for data scientists will soar 28% by 2020.

Getting the job

Data science is a competitive field. There are a limited number of tech companies with great data science brands, and the battle for their summer internships and entry-level roles is fierce. However, once you have even a small amount of real data science work experience, it’s much easier to get a second job in the field.

The best way to gain experience as a data scientist is to create a project using real data to answer real questions. The reason is simple: it’s the closest you can get to an actual job without actually having one. Find something you’re interested in and get your own data. Scraping data off the internet is much easier than most beginners realize.
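As a rough illustration of how little code basic scraping can take, the sketch below pulls the rows out of an HTML table using only Python's standard library. The table contents and the class and function names are invented for this example; it parses an inline string so it runs without network access.

```python
from html.parser import HTMLParser

# Hypothetical page fragment standing in for a downloaded web page.
SAMPLE_HTML = """
<table>
  <tr><th>Team</th><th>Wins</th></tr>
  <tr><td>Hawks</td><td>12</td></tr>
  <tr><td>Owls</td><td>9</td></tr>
</table>
"""


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            if not self.rows:
                self.rows.append([])  # tolerate a cell with no enclosing row
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def scrape_rows(html):
    """Return the data rows of the table, skipping rows with no <td> cells."""
    parser = TableCellParser()
    parser.feed(html)
    return [row for row in parser.rows if row]


print(scrape_rows(SAMPLE_HTML))
```

For a live page, the HTML string could come from `urllib.request.urlopen(url).read().decode()`; check the site's terms of use before scraping it.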

Then, ask some questions that interest you and see how well you can answer them. Clean the data, make some graphs and models, and then write up your conclusions somewhere public. It’ll be slow going in the beginning, but that’s because you’re learning.

If you can, try to solve real-world problems for people in your community, such as doing statistics work for a school sports team or polling analysis for the school newspaper, so that you also get practice with stakeholder management.

Data scientists with a few years under their belts, even from little-known companies, often have little trouble getting hired at top companies. Thus, if you want to be a data scientist, and you don’t get an offer right off the bat from one of the famous companies, consider broadening your job search. There are lots of companies with interesting problems to solve.

Hadoop, Spark, YARN, Julia, Kafka, Airflow, Scalding, Redshift, Hive, TensorFlow, Kubernetes… there is a seemingly unending number of data science coding languages, frameworks, and tools. When you haven’t worked at a data science job before, it feels like you have to know all of them to be a real data scientist. You don’t. Pick a small set of tools that work for you. Get comfortable with them, and don’t worry about branching out too much until you’re at a job.

--

Sarah Robinson
DATAcated

Data-driven business analyst focused on gathering vital business intelligence to meet company needs and passionate about showing how easy analytics can be