BY Capgemini
This page was produced by FT2, the advertising department of the Financial Times. The news and editorial staff of the Financial Times had no role in its preparation.


IT trends spotted and checked by experts


Mind the data science skills gap

21 Feb, 2017 10:15 am

Big data crunching is set to transform all aspects of modern life as every industry seeks to harness the insights data analysis can provide, but it requires a new generation of data scientists to lead this charge.

Harvard Business Review famously labelled data scientist the sexiest job of the 21st century, but five years on the business community is still struggling to recruit enough sufficiently qualified staff to meet demand.
While academia races to offer enough courses of adequate quality, the big data revolution has yet to realise its potential, and increasingly sophisticated artificial intelligence (AI) systems are processing the data in place of the missing human experts.

The surge in interest in data science is fuelled by a colossal increase in the volume of data generated. This increase, which some estimate will amount to a 20,000-fold leap in data volumes between 2000 and 2020, has been driven by rising Internet use and the roll-out of connected devices.

"Data science is what tells you what's hot before the experts even see it on the radar. This is competitive advantage to the n-th degree," data science specialists wrote in a blog.

"Forget copycat trends, corporate espionage, or stealing the competitor's best workers, data science taps into the information that's already out there, the information that's pointing the way a trend is headed."

Plunging computer hardware and data storage costs, plus a switch to open source software, helped data science become a specialist discipline, but there remains a lack of skilled people to navigate and make sense of the vast sea of data being created.

"The big problem is people don't know how to ask a question and they don't know what questions to ask, so often there's this kitchen sink approach - we're just going to collect all the data," said Christopher Brooks, a research assistant professor at the University of Michigan, which in October launched a series of four-week online data science courses that have each drawn more than 10,000 students.

"If you don't collect it the right way to answer your questions, then it ends up being useless data. It's tough to turn into information."

Filling the skills gap

Some experts estimate that between 100,000 and 190,000 data science jobs will go unfilled in the United States alone between 2011 and the end of this decade.

This skills shortage is a global phenomenon and the dearth of qualified talent is also driving up salaries - data scientists in the US can expect a six-figure starting salary, while recruitment website Glassdoor has ranked data scientist as the top profession for the past two years, based on job openings, salary and overall job satisfaction. 

However, the skills shortage could be even larger than suspected. 

"There are so many job adverts for positions that aren't called data scientists - they're called business intelligence, customer intelligence and so forth - but what they're looking for is a data scientist," said Brooks.

Education and professional training are struggling to catch up with industry's demands. About 8,000 people in the US are graduating each year with master's degrees in data science or data analytics, according to, a pitifully small number considering the scale of the shortage. 

Some industry experts argue that master's programmes cannot teach all the necessary skills within the 18 to 24 months of a typical degree.

"Even having gone through a good course, it takes years of experience to become accomplished enough to deliver safe and reliable models," Maciek Wasiak, chief executive of Dublin-based data science start-up Xpanse Analytics, wrote in a LinkedIn blog post.

"Despite the course names, like 'Business Analytics' or 'Data Science' ... the vast majority of the scientists leading them have no idea (what) data science in the business world really looks like."

According to a 2014 study, 41% of companies cited a lack of talent as a major challenge to fully exploiting big data's potential in their operations. 

Boosting big data recruitment

Nine in 10 companies said they planned to improve their data science expertise, and the same proportion expect big data to revolutionise business as thoroughly as the Internet did.

Academia and industry are trying to find solutions, but some don't come cheap. Software giant SAS runs three courses through its Academy for Data Science: the six-week courses leading to certification as a SAS Big Data Professional or a SAS Advanced Analytics Professional cost $9,000, while the eight-week SAS Certified Data Scientist course will set you back $16,000.

"One of the greatest challenges affecting our industry is the disconnect between the data science needs and expectations of enterprises and the current skills of today's scientists, mathematicians and engineers," wrote Travis Oliphant, CEO and co-founder of Continuum Analytics, the creator of and driving force behind Anaconda, an open source analytics platform powered by Python. Along with R, Python is one of the two principal programming languages of data science.

Continuum launched its Anaconda Skills Accelerator Program (ASAP) in March 2016 to train data scientists.

"Fostering true data science experts requires more than traditional training programs," he added. 

The Universities of Michigan, Washington and California, plus Baltimore's Johns Hopkins University, all offer a series of short courses for minimal fees, although Michigan's Brooks is quick to stress that these sessions give students from various academic backgrounds a basic grounding so they can decide whether to take it further.

"Through these short courses, a manager can make educated decisions about what their department should be doing via data science," said Brooks.

"Data science is fundamental knowledge that people need for all kinds of careers. Thinking about data and how to interact with data is something that almost every undergraduate student should be taught in some fashion."

Data dives into deep learning 

Amid the shortage of human data crunchers, machine learning systems are increasingly stepping up to fill the data science skills gap, with incredibly sophisticated deep learning algorithms powering data analysis.

Deep learning, a growing branch of machine learning, has produced headline-grabbing advances in artificial intelligence in recent years, including victory in a poker tournament in January against some of the world's best Texas Hold 'Em players, in which a computer taught itself how to bluff, a landmark achievement in AI.

The poker triumph followed an AI win over the world champion of Go, a fiendishly complex board game with more possible board positions than there are atoms in the universe. Yet games are still simple compared with real-life issues, and a new breed of start-ups is using deep learning techniques to try to solve the data science industry's problems.

This is transforming the role of data scientists, easing shortages in some sub-disciplines while creating bottlenecks elsewhere, according to Chris Nicholson, chief executive and co-founder of San Francisco-based deep learning data analysis firm Skymind.

Skymind specialises in database predictions, pattern recognition, image and video processing, fraud detection and natural-language processing, for major banks and telecom operators such as Orange.

For Nicholson, demand for new data scientists remains strong although deep learning is radically changing the profile and skill sets needed for the data crunchers of tomorrow, as there is less need for the "feature engineers" who help direct the AI algorithms.

Skymind chief executive and co-founder Chris Nicholson 

Why are feature engineers important?

If you're doing fraud detection, they need to know all about transactions so they can help teach the algorithm what to look for. A feature is an input variable, and humans need to intervene to pick the right ones. If you don't have a good feature engineer, you're not going to be directing the algorithm to look at the right stuff, and it will never find the relevant correlations and causal relationships between input and output.
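To make the idea of a hand-picked feature concrete, here is a minimal Python sketch of what a feature engineer might compute for a single card transaction. The record fields (`amount`, `timestamp`, `country`), the customer's `home_country` and the specific rules (a midnight-to-5am window, a one-hour burst count) are illustrative assumptions, not Skymind's or any bank's actual features.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, List


@dataclass
class Transaction:
    # Hypothetical card-transaction record; field names are illustrative.
    customer_id: str
    amount: float
    timestamp: datetime
    country: str


def engineer_features(tx: Transaction, history: List[Transaction],
                      home_country: str) -> Dict[str, float]:
    """Turn one raw transaction into hand-picked input variables (features).

    `history` holds the customer's earlier transactions; `home_country`
    is assumed to come from the account record.
    """
    past_amounts = [t.amount for t in history]
    avg_amount = mean(past_amounts) if past_amounts else tx.amount
    one_hour_ago = tx.timestamp - timedelta(hours=1)
    return {
        # How unusual is this amount for this customer? (0.0 if no baseline)
        "amount_ratio": tx.amount / avg_amount if avg_amount else 0.0,
        # Spending outside the home country is a classic fraud signal.
        "is_foreign": int(tx.country != home_country),
        # Flag purchases made between midnight and 5 am.
        "is_night": int(tx.timestamp.hour < 5),
        # Bursts of activity: purchases in the hour before this one.
        "tx_last_hour": sum(one_hour_ago <= t.timestamp < tx.timestamp
                            for t in history),
    }


history = [
    Transaction("c1", 40.0, datetime(2017, 2, 20, 12, 0), "GB"),
    Transaction("c1", 60.0, datetime(2017, 2, 21, 2, 30), "GB"),
]
suspect = Transaction("c1", 500.0, datetime(2017, 2, 21, 3, 0), "FR")
features = engineer_features(suspect, history, "GB")
print(features)
# {'amount_ratio': 10.0, 'is_foreign': 1, 'is_night': 1, 'tx_last_hour': 1}
```

A classifier would then learn from these four numbers rather than from the raw record, so deciding which ones to compute is exactly the human judgement Nicholson describes.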

Deep learning can determine which features are important without human intervention, so what does that mean for the role of data scientists?

If you give it an image, which is just a blob of pixels, deep learning will figure out which pixels are important when it's trying to apply a name to a face and which are just background; you don't have to tell it. 
That's a huge difference. We used to construct templates of digital noses so the algorithms would know what a nose looks like. They can (now) figure it out themselves, which gets us around a problematic choke point and makes feature engineers less important for many situations.

How do you rate the varying data science courses?

Some are preparing people well with traditional machine learning algorithms and some are preparing well to work with the newer deep learning algorithms. And some may not be preparing people well at all. Courses are springing up to meet a need in the market because demand is there. Companies want machine learning and need data scientists to help them get more value from data.

Who is studying data science?

A lot of people are switching careers, because they can get six figures as a data scientist. You have a lot of analysts coming out of finance, a lot of scientists who may not have found academia to be the paradise they expected. These are people with math skills and sometimes some light coding skills. Those are a good basis for data science.

Is data science like many other professions in that the real training happens on the job and not in the classroom?

The real way to get good at tuning these neural nets is long weeks and late nights - change the parameters and see what happens. It's empirical. It's Madame Curie and her cauldron. It's experimental.
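"Change the parameters and see what happens" can be made systematic with a grid search, which tries every combination of settings and keeps the best. The Python sketch below is a deliberately tiny stand-in: instead of a neural net it fits the one-parameter model y = w·x by gradient descent, and the learning rates and epoch counts tried are arbitrary assumptions chosen for illustration, not anything Skymind uses.

```python
from itertools import product


def train(learning_rate, epochs, data):
    """Fit y = w * x by plain gradient descent; return final mean squared error."""
    w = 0.0
    n = len(data)
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / n
        w -= learning_rate * grad
    return sum((w * x - y) ** 2 for x, y in data) / n


def grid_search(data, learning_rates, epoch_options):
    """Try every combination of settings and keep the lowest-error one."""
    results = {
        (lr, ep): train(lr, ep, data)
        for lr, ep in product(learning_rates, epoch_options)
    }
    best = min(results, key=results.get)
    return best, results


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy data where y = 2x
best, results = grid_search(data, [0.01, 0.1, 0.5], [10, 50])
print("best (learning rate, epochs):", best)
for setting, error in sorted(results.items()):
    print(setting, f"{error:.3g}")
```

On this toy data a learning rate of 0.01 converges slowly, 0.5 makes the error blow up, and 0.1 with 50 epochs wins. Real networks have many more knobs and each trial can take hours, which is why the process Nicholson describes eats up weeks.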

Has deep learning improved data science's cost efficiency?

Feature engineers are less necessary, which is good, because there were never enough of them. Nobody could find the feature engineers they needed. Getting better insights from data can create huge efficiencies, because you have a better idea of what you need to do to reach your goals. There's less waste.

Why all companies must master data science 

Boris Guarisma, Big Data and Data Science Consultant at Capgemini

Big data crunching is hitting the mainstream as an ever-growing number of companies realise that their data is an asset which, if analysed correctly, can have a significant impact on the bottom line.

To find out why it is important for businesses to join the big data revolution, we spoke to Capgemini's Boris Guarisma.

"Coursework should always be evolving to meet the needs of the data science industry"

Graduate perspective: Prudhvi Badri
Prudhvi is a data scientist at Salt Lake City's Snap! Finance, a web-based firm that provides merchandise financing to people with poor credit ratings who want to buy products such as furniture, electronics and jewellery. Prudhvi, 24, graduated from Utah State University with a master's degree in computer science last year, having previously completed a bachelor's degree in the same discipline at Sri Venkateswara University in his native India.

Is a computer science or a data science course better training to be a data scientist?

Computer science helps us learn programming concepts, which make it easy for us to write code to preprocess and explore data and to run machine learning models. Apart from coding, a good data scientist needs more knowledge of statistics and a better understanding of machine learning. I think data science combined with a computer science background gives a data scientist a sound foundation.

How could data science academic courses be improved?

Data science courses should include a strong statistical foundation. This helps students understand machine learning at a very basic level, so that they can tune models to give the best results for their dataset. The coursework should also cover tools for dealing with big data, such as Hadoop and Spark. Professors should encourage students to take part in data science competitions and guide them to succeed. Finally, I think coursework should always be evolving to meet the needs of the data science industry.

How well is academia responding to the demand for data scientists?

As a data scientist, I feel academia is currently able to satisfy the needs of the data science industry. Students should be more interested in exploring data science blogs and taking part in data science competitions, improving their skills by working on different datasets. Students only feel comfortable in industry once they have worked on a range of data science problems.