Big data crunching is set to transform all aspects of modern life as every industry seeks to harness the insights data analysis can provide, but it requires a new generation of data scientists to lead this charge.
Harvard Business Review famously labelled data scientist as the sexiest profession of the 21st century, but five
years on the business community is still struggling to recruit enough sufficiently
qualified staff to meet demand. While academia
races to prepare a large enough number of courses of adequate quality, the
potential of the big data revolution is yet to be realised, as increasingly
sophisticated artificial intelligence (AI) systems process the data in their
The surge in
interest in data science is fuelled by a colossal increase in the volume of
data generated. This increase, which some estimate will total a 20,000-fold
leap in data volumes between 2000 and 2020, comes on the back of an uptick in
Internet use and the roll-out of connected devices.
"Data science is
what tells you what's hot before the experts even see it on the radar. This is
competitive advantage to the n-th degree," data science specialists import.io
wrote in a blog.
trends, corporate espionage, or stealing the competitor's best workers, data
science taps into the information that's already out there, the information
that's pointing the way a trend is headed."
hardware and data storage costs, plus a switch to open source software helped
data science become a specialist discipline, but there remains a lack of
skilled people to navigate and make sense of the vast sea of data being
"The big problem
is people don't know how to ask a question and they don't know what questions
to ask, so often there's this kitchen sink approach - we're just going to
collect all the data," said Christopher Brooks, a Research Assistant Professor
at the University of Michigan, which in October launched a series of four-week
online data science courses that have drawn more than 10,000 students to each
"If you don't
collect it the right way to answer your questions, then it ends up being
useless data. It's tough to turn into information."
Filling the skills gap
estimate 100,000-190,000 data science jobs will go unfilled in the United
States alone, from 2011 to the end of this decade.
shortage is a global phenomenon and the dearth of qualified talent is also
driving up salaries - data scientists in the US can expect a six-figure
starting salary, while recruitment website Glassdoor has ranked data scientist
as the top profession for the past two years, based on job openings, salary and
overall job satisfaction.
skills shortage could be even larger than suspected.
"There are so many
job adverts for positions that aren't called data scientists - they're called
business intelligence, customer intelligence and so forth - but what they're
looking for is a data scientist," said Brooks.
professional training are struggling to catch up with industry's demands. About
8,000 people in the US are graduating each year with master's degrees in data
science or data analytics, according to datanami.com, a pitifully small number
considering the scale of the shortage.
experts argue that master's programmes cannot teach all the necessary skills
within the 18-24 months that constitute the length of a typical degree.
"Even having gone
through a good course, it takes years of experience to become accomplished
enough to deliver safe and reliable models," Maciek Wasiak, chief executive of
Dublin-based data science start-up Xpanse Analytics, wrote in a LinkedIn blog
course names, like 'Business Analytics' or 'Data Science' ... the vast majority
of the scientists leading them have no idea (what) data science in the business
world really looks like."
According to a
2014 study, 41% of companies cited a lack of talent as a major challenge to
fully exploiting big data's potential in their operations.
Boosting big data recruitment
Nine in 10
companies said they planned to improve their data science expertise and the
same number expect big data to revolutionise business to the same extent as the
industry are trying to find solutions, but some don't come cheap. Software giant SAS runs three courses through
its Academy for Data Science - six-week courses to become certified as a
SAS Big Data Professional or a SAS Advanced Analytics Professional cost
$9,000, while the eight-week SAS Certified Data Scientist will set you back
"One of the
greatest challenges affecting our industry is the disconnect between the data
science needs and expectations of enterprises and the current skills of today's
scientists, mathematicians and engineers," wrote Travis Oliphant, CEO and
co-founder of Continuum Analytics, which is the creator and driving force
behind Anaconda, an open source analytics platform powered by Python, which
along with R is the principal programming language of data science.
its Anaconda Skills Accelerator Program (ASAP) in March 2016 to train data
data science experts requires more than traditional training programs," he
of Michigan, Washington and California, plus Baltimore's Johns Hopkins
University all offer a series of short courses for minimal fees, although
Michigan's Brooks is quick to stress these training sessions provide students
of various academic backgrounds with a basic grounding to see if they would
like to take it further.
short courses, a manager can make educated decisions about what their
department should be doing via data science," said Brooks.
"Data science is
fundamental knowledge that people need for all kinds of careers. Thinking about
data and how to interact with data is something that almost every undergraduate
student should be taught in some fashion."
Data dives into deep learning
Amid the shortage of human
data crunchers, machine learning systems are increasingly stepping up to fill
the data science skills gap, with incredibly sophisticated deep learning
algorithms powering data analysis.
Deep learning, a
growing branch of machine learning, has claimed headline-grabbing advancements
in artificial intelligence in recent years, including a victory at a January
poker tournament against some of the world's best Texas Hold 'Em players in
which a computer taught itself how to bluff, a landmark achievement in AI.
The poker triumph
followed an AI win over the world champion of Go, a fiendishly complex board
game that offers more potential outcomes than there are atoms in the universe. Yet
games are still simple compared to real-life issues and a new breed of
start-ups is using deep learning techniques to try to solve the data science
transforming the role of data scientists, easing shortages in some
sub-disciplines while creating bottlenecks elsewhere, according to Chris
Nicholson, chief executive and co-founder of San Francisco-based deep learning
data analysis firm Skymind.
specialises in database predictions, pattern recognition, image and video
processing, fraud detection and natural-language processing, for major banks
and telecom operators such as Orange.
demand for new data scientists remains strong although deep learning is
radically changing the profile and skill sets needed for the data crunchers of
tomorrow, as there is less need for the "feature engineers" who help direct the
chief executive and co-founder Chris Nicholson
Why are feature
If you're doing
fraud detection they need to know all about transactions so they can help teach
the algorithm what to look for. A feature is an input variable, and humans need
to intervene to pick the right ones. If you don't have a good feature engineer,
you're not going to be directing the algorithm to look at the right stuff and
it will never find the relevant correlations and causal relationships between
input and output.
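As a rough illustration of what a feature engineer does, here is a minimal Python sketch for a fraud-detection setting. Everything in it is an assumption made for illustration: the `Transaction` fields, the `engineer_features` helper and the three chosen features are hypothetical and are not taken from Skymind or any real fraud system. It only shows the idea of a human deciding which input variables an algorithm gets to see.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Transaction:
    # Hypothetical raw record; real systems carry many more fields.
    account: str
    amount: float
    hour: int      # hour of day, 0-23
    country: str


def engineer_features(tx, history, home_country):
    """Turn one raw transaction into hand-picked input variables (features)."""
    past_amounts = [t.amount for t in history if t.account == tx.account]
    # With no history, compare the amount against itself (ratio of 1.0).
    typical = mean(past_amounts) if past_amounts else tx.amount
    return {
        # How large is this payment relative to the account's usual spend?
        "amount_vs_typical": tx.amount / typical if typical else 0.0,
        # Did it happen in the small hours?
        "is_night": 1 if tx.hour < 6 else 0,
        # Was it made outside the customer's home country?
        "is_foreign": 1 if tx.country != home_country else 0,
    }


history = [
    Transaction("acct-1", 20.0, 12, "US"),
    Transaction("acct-1", 30.0, 18, "US"),
]
suspicious = Transaction("acct-1", 500.0, 3, "RO")
features = engineer_features(suspicious, history, home_country="US")
print(features)
```

In this sketch the domain knowledge lives entirely in the choice of the three features. A different engineer might pick different ones, which is the human bottleneck Nicholson describes.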
Deep learning can
determine which features are important without human intervention, so what does
that mean for the role of data scientists?
If you give it an
image, which is just a blob of pixels, deep learning will figure out which
pixels are important when it's trying to apply a name to a face and which are
just background; you don't have to tell it. That's a huge
difference. We used to construct templates of digital noses so the algorithms
would know what a nose looks like. They can (now) figure it out themselves,
which gets us around a problematic choke point and makes feature engineers less
important for many situations.
How do you rate
the varying data science courses?
Some are preparing
people well with traditional machine learning algorithms and some are preparing
well to work with the newer deep learning algorithms. And some may not be
preparing people well at all. Courses are springing up to meet a need in the
market because demand is there. Companies want machine learning and need data
scientists to help them get more value from data.
Who is studying
A lot of people
are switching careers, because they can get six figures as a data scientist.
You have a lot of analysts coming out of finance, a lot of scientists who may
not have found academia to be the paradise they expected. These are people with
math skills and sometimes some light coding skills. Those are a good basis
for data science.
Is data science
like many other professions in that the real training happens on the job and
not in the classroom?
The real way to
get good at tuning these neural nets is long weeks and late nights - change the
parameters and see what happens. It's empirical. It's Madame Curie and her
cauldron. It's experimental.
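The "change the parameters and see what happens" loop can be sketched in a few lines of Python. This is a toy under stated assumptions, not a neural net: the `train` function, the four data points and the candidate learning rates are all hypothetical, chosen only to show that the same model can learn slowly, learn well or blow up depending on a single setting.

```python
def train(learning_rate, steps=50):
    """Fit y = w * x to toy data with plain gradient descent; return final loss."""
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.0, 6.0, 8.0]  # generated with w = 2
    w = 0.0
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Try several settings and compare the outcomes empirically.
results = {lr: train(lr) for lr in (0.001, 0.01, 0.05, 0.2)}
best = min(results, key=results.get)

for lr, loss in results.items():
    print(f"learning_rate={lr}: final loss={loss:.3g}")
print("best setting:", best)
```

On this toy problem the smallest rate has barely converged after 50 steps and the largest one diverges. Real networks have many more settings to try, which is where the long weeks and late nights go.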
Has deep learning
improved data science's cost efficiency?
are less necessary, which is good, because there were never enough of them.
Nobody could find the feature engineers they needed. Getting better insights
from data can create huge efficiencies, because you have a better idea of what
you need to do to reach your goals. There's less waste.
Why all companies must master data science
Boris Guarisma, Big Data and Data Science Consultant at Capgemini
Big data crunching is hitting the mainstream as an ever-growing number of
companies realise that their data is an asset that, if analysed correctly,
can have a significant impact on the bottom line.
To find out why it is important for business to join the big data
revolution, we spoke to Capgemini's Boris Guarisma.
"Coursework should always be evolving to meet the needs of data science industry"
Prudhvi is a data
scientist at Salt Lake City's Snap! Finance, a web-based firm that provides
merchandise financing to people with poor credit ratings who want to buy
products such as furniture, electronics and jewellery. Prudhvi, 24, graduated
from Utah State University with a master's degree in computer science last
year, having previously completed a bachelor's degree in the same discipline at
Sri Venkateswara University in his native India.
Is a computer
science or a data science course better training to be a data scientist?
helps us learn programming concepts, which facilitate us to code easily to
preprocess data, data exploration and to run machine learning models. Apart
from coding, we need more knowledge in statistics and to understand machine
learning in a better way to become a good data scientist. I think data science
with a computer science background will give a sound foundation for a data
How could data
science academic courses be improved?
The academia of
data science should include a strong statistical foundation. This would help in
understanding machine learning at a very basic level, as a result of which we
could try to tune the models to give the best results for our dataset. The
coursework should also include concepts to deal with big data such as Hadoop,
Spark, etc. Professors should encourage students to participate in data
science competitions and guide them to succeed. Finally, I think coursework
should always be evolving to meet the needs of data science industry.
How well is
academia responding to the demand for data scientists?
As a data
scientist, I feel current academia was able to satisfy the needs of the data
science industry. The students should be more interested to explore data
science blogs and participate in the data science competitions to improve their
skills by working on different data sets. Students feel more comfortable in the
industry only if they work on different data science problems.