Friday, January 19, 2018

Is it Time to Stop Using the Term “Data Science”?

I would not be opposed to dropping the term “data science” and dissolving it into specialized disciplines. Do not misunderstand: I think the global “data science” movement was necessary and had a positive impact on the curmudgeonly corporate world. But the campaign has been won and everybody has bought into the idea. Rather than continuing to evangelize, perhaps we should allow the dust to settle so people can adjust to the change.

Data science professionals, consider no longer burdening yourselves with the heavy title of “data scientist”. Most of us do not have PhDs or encyclopedic knowledge of every new topic. Maybe we should specialize and relieve ourselves of the pressure of having to know everything. Data science has become too broad a buzzword, and it is so ubiquitous and vague that it is practically meaningless. Why would anybody want to take ownership of something so nondescript?

In this article, I want to highlight how “data science” has evolved and why it may be time to fragment it.

The Jabberwocky Effect

In 2010, there was a short-lived but memorable U.S. TV series called Better Off Ted. The show is a silly workplace comedy that lampoons corporate culture to a hyperbolic extent. But one episode, Jabberwocky (Season 1 Episode 12), captures the corporate buzzword effect too accurately.

Ted, the lead character, tries to hide budget for a pet project. When his boss Veronica confronts him, he lies and says the funds went to the revolutionary “Jabberwocky” project, which he vaguely makes up on the spot.

Here’s the funny part though. Rather than ask what “Jabberwocky” is, Veronica pretends to be “in the know” for fear of looking incompetent and out of the loop. She pushes the nonexistent Jabberwocky project on the rest of the company as top priority. With hilarious results, every leader and employee works on Jabberwocky with no idea what it is, and none of them would ever dare admit their ignorance to each other.

Blindsided by how far it escalated, Ted comes clean to Veronica right before they do a keynote on “Jabberwocky”. Veronica tells Ted to proceed anyway because “products are for people who don’t have presentations”.

I probably do not have to explain the analogy that is “Jabberwocky”. Replace that word with “Blockchain”, “Big Data”, “Bitcoin”, “Artificial Intelligence”, “Internet of Things”, or “Data Science” and you know exactly what I mean. Corporate culture has a long history of hyping innovations while people pretend to understand them, only to run into their limits and chase something else.

Now that I have highlighted the "Jabberwocky Effect", let’s continue.

A Brief History of Data Science

If you want to define “data science” as anything that has to do with “data”, you can go back to the dawn of computing. If you think math and statistics are crucial to data science just as much as data, you could go centuries back and say statisticians were the original “data scientists”.

For the sake of brevity, let’s go to the 1990s. Things used to be pretty simple. Analysts, statisticians, researchers, and data engineers were all fairly separate roles with occasional overlap. Tooling stacks often consisted of spreadsheets, R, MATLAB, SAS, and/or SQL.

Of course, throughout the 2000s things were changing. Google pushed data collection and analytics to unimaginable heights. In 2009, Google executives insisted statisticians would have the “sexiest job” for the next 10 years. I was in college at the time, and I recall finding that a strange sentiment. But lo and behold, in 2012 Harvard Business Review mainstreamed this concept called “data science” and declared data scientist the sexiest job of the 21st century.

It was at that moment the craze started in “Jabberwocky” fashion. Harvard created a void called “data science” and everyone raced to fill it. SQL developers, analysts, researchers, quants, statisticians, physicists, biologists, and a myriad of other professionals rebranded themselves as “data science” professionals. Silicon Valley companies, feeling that traditional role titles like “analyst” or “researcher” sounded too limited, renamed the roles to “data scientist”, which sounded more empowered and impactful.

Outside Silicon Valley, this added to the confusion, as most folks think of “scientists” as PhDs in white lab coats. Counterintuitively, data scientists actually come from many backgrounds (technical and nontechnical) with varying levels of education (BS, BA, MBA, and sometimes PhD). Many hiring managers, HR departments, and organizations in general struggled to define what they needed in a “data scientist”. I have heard too many anecdotes about hiring managers being asked “What skills are needed in this data science role?” and vaguely answering “Well, we need to be data-driven. That’s why we are hiring a data scientist”. Rather than defining jobs based on need, they defined jobs based on a buzzword.

Throw in scaling advancements in data engineering (think “Big Data”), as well as the rapid rise of “machine learning”, and the “data science” umbrella gets larger and more vague. More buzzwords get thrown around that many people say and few understand. Before you know it, "Big Data" and "machine learning" have become synonymous, and the distinction between disciplines is lost.

The domain of “Data Science” has been exhausted by the “Jabberwocky” effect. If we want it to continue succeeding we need to dissolve it, rather than continuing to stare blankly into the rabbit hole.

Reasons to Dissolve “Data Science”

The “data science” push did some great things. It rejuvenated old, grumpy businesses and pushed them to do something fresh and exciting. IT departments, which were traditionally stingy about giving access to data and allowing non-IT staff to write code, were forced to evolve and support such initiatives. Most importantly, it democratized technology for so many non-technology professions. The idea that a lawyer can benefit from learning to code is not so fringe anymore, and the right is no longer reserved for computer scientists and engineers.

But this is a sign that the “data science” campaign has succeeded and run its course. Continuing to push it is starting to become detrimental. Here are some reasons why:

It is Too Broad

Not too long ago, if you got a bachelor’s degree in “Business Management”, you could easily be upwardly mobile. But today, conventional success often requires specializing and focusing on a specific area, simply because our world has gotten complicated. A business student will be much better off studying finance, supply chain management, operations research, accounting, marketing, or some other specific business discipline.

I believe “Data Science” needs to go through a similar transition. Like business, it spans too many disciplines to expect total mastery. It is unproductive to try learning all of them, especially at once. Of course, high-level awareness of what’s out there is beneficial. It is also healthy to change interests over time. However, attempting to be omniscient will never yield value.

It’s always bothered me that “data science” can be creating a chart in Excel or Tableau… as well as building a machine learning algorithm. Seriously, what is up with that? These two tasks are thousands of miles apart in their nature, the technical skill needed, and the salary. Writing a SQL query versus designing a neural network? These are also unrelated skillsets and definitely not interchangeable. Yet there are those that insist we brand each of these skills equally as “data science”, and we generalize people with these diverse skills as “data scientists”.

Some folks reading this may argue “well, all these disciplines are interconnected and data science is important to keep them integrated”. That’s arguable to some degree, but marketing, finance, supply chain, accounting, and other business functions are interconnected as well. Despite a common objective, they are still distinct areas, and we no longer put emphasis on the whole of “business management”. Fragmentation and specialization are part of a domain maturing, and over time the specialties get more attention than the domain itself.

It is Overwhelming

One of the things that prompted me to write this article is the growing number of articles from data scientists confessing their feelings of “imposter syndrome”. There is this one which I’ve seen circulating. There is also this one. As time progresses, more data science professionals continue to come forward and confess their feelings of fraudulence. My initial reaction was “What took so long?” Then I felt a bit more empathetic and perhaps even sorry. Professionally, the burden of imposter syndrome can fill you with dread and keep you up at night. The question always lingers: “How long will it be until I’m discovered for the fraud I am?”

But I believe this is a symptom of the larger issue in this article. It took me way too long to figure out that “data science” has become anything and everything related to “data”. Sadly, there are folks who take it upon themselves to own all that. Why anyone would want to is beyond me.

This is all you need to do to become a confident “data scientist”. Totally achievable, right?

The above graphic is a popular “roadmap” to becoming a “data scientist”. Not only is this impractical for folks with personal lives, but why is it prescribing a “one-size-fits-all” curriculum? Maybe you can get shallow knowledge of every topic on there, but people work in different environments with different problems. At a given point in time, why not learn the tools needed for your particular job? Never mind also that tools come and go. For instance, do you even see Apache Spark, the successor to MapReduce, under "Big Data"? The only part of this roadmap not prone to obsolescence is the classic mathematical concepts.

Do not misunderstand, it’s always good to keep learning and to get a general idea of what solutions exist. But in the reality of day-to-day life, effective people know how to discern and prioritize, rather than be driven by FOMO.

It Saturated Everything

Data is like electricity now. It is used everywhere and for different purposes. In the 19th century people would marvel at what electricity enabled.

Today, there is less attention on electricity and more on the devices it is powering. It is not so much that we take electricity for granted, but ya know, there just comes a point where you stop celebrating it. It is the same thing with data. It has succeeded and become the new normal. Rather than continuing our exhausted celebration, we should focus on the next innovations that it will enable.

Do you think natural language processing can create an opportunity to improve customer complaint handling? Then push “natural language processing”, not “data science” or “machine learning”. Be specific and focused. Are you interested in optimizing profit, cost, revenue, or operational feasibility? Then position yourself on linear/integer optimization. “Data science” is just white noise now. Focus on specific and tangible areas where problems have yet to be tackled and solved.

The Buzzword Dilemma

To wrap up, here are a few final considerations. I made it clear we should stop using the term “data science”. Will that actually happen? Sooner or later, I think it will. Am I going to follow my own suggestion? I am not sure yet. While the term stays in vogue, it may be the only way to get people to show up to my talks. I cannot blame others for doing the same.

Ask yourself this also: do we use buzzwords to spur a positive change? Or to serve our own purposes? Again on a global scale, the “data science” buzzword has had a positive effect. It democratized technology across professions and empowered many people for the digital workforce. But I am sure there are folks calling themselves “data scientists” to exaggerate their capabilities and capitalize on the hype. Others are coping with the pressure to know everything, and I do hope data science fragments and specializes for the sake of their well-being.

In summary, let’s ease off on generalizing, categorizing, and forcing labels on people and what they do. Perhaps we should stop calling roles "Data Scientist" and instead make each role title reflect the tasks it entails. Hire "Data Engineers", "Mathematical Modeling Consultants", and "Machine Learning Analysts" rather than "Data Scientists". Give everyone a chance to find their niche and contribute individually in the best way they know how. In time, organizations will shape themselves and align to their people in ways that make sense.
