Four Big Factors Shaping the Future of Data Science
In this special guest feature, Ryohei Fujimaki, Ph.D., Founder and CEO of dotData, discusses how AI and ML are having a profound impact on enterprise digital transformation, becoming crucial as a competitive advantage and even for survival. As the field grows, four trends are emerging that will shape data science over the next five years. dotData is a spin-off of NEC Corporation and the first company focused on delivering full-cycle data science automation for the enterprise. Dr. Fujimaki is a world-renowned data scientist and was the youngest research fellow appointed in the 119-year history of NEC.
According to Gartner, digital business reached a tipping point last year, with 49% of CIOs reporting that their enterprises have already changed their business models or are in the process of doing so. When Gartner asked CIOs and IT leaders which technologies they expect to be most disruptive, artificial intelligence (AI) was the most frequently mentioned.
AI and ML are having a profound impact on enterprise digital transformation, becoming crucial as a competitive advantage and even for survival. As the field grows, four trends are emerging that will shape data science over the next five years:
Accelerate The Full Data Science Life-Cycle
The pressure to grow ROI from AI and ML initiatives has pushed demand for innovative solutions that accelerate AI and data science. Although data science processes are iterative and highly manual, more than 40% of data science tasks are expected to be automated by 2020, according to Gartner, resulting in increased productivity and broader use of data across the enterprise.
Recently, automated machine learning (AutoML) has become one of the fastest-growing technologies for data science. Machine learning, however, typically accounts for only 10-20% of the entire data science process. The real pain points come before the machine learning stage, in data preparation and feature engineering. The new concept of data science automation goes beyond machine learning automation to cover data preparation, feature engineering, machine learning, and putting full data science pipelines into production. With data science automation, enterprises can genuinely accelerate AI and ML initiatives.
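To make the stages named above concrete, here is a minimal sketch of a pipeline that chains data preparation, feature engineering, and a model-fitting step. It is a toy illustration only, not dotData's product or any AutoML library: the function names, the column names (`spend`, `visits`, `churned`), and the one-feature threshold "model" are all assumptions made for the example.

```python
from statistics import mean

def prepare(rows):
    """Data preparation: drop records that contain missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def engineer_features(rows):
    """Feature engineering: derive spend per visit from two raw columns.

    Assumes every remaining record has visits > 0 (hypothetical data).
    """
    return [{**r, "spend_per_visit": r["spend"] / r["visits"]} for r in rows]

def train(rows, feature, label):
    """Stand-in for the ML stage: a one-feature threshold classifier.

    The threshold is the midpoint between the class means; both classes
    must be present in the training rows.
    """
    pos = [r[feature] for r in rows if r[label] == 1]
    neg = [r[feature] for r in rows if r[label] == 0]
    threshold = (mean(pos) + mean(neg)) / 2
    sign = 1 if mean(pos) >= mean(neg) else -1
    return lambda r: 1 if sign * (r[feature] - threshold) >= 0 else 0

def run_pipeline(raw_rows):
    """Run all stages end to end and return the prepared rows and the model."""
    rows = engineer_features(prepare(raw_rows))
    model = train(rows, "spend_per_visit", "churned")
    return rows, model

# Hypothetical customer records; one has a missing value.
RAW_ROWS = [
    {"spend": 100.0, "visits": 10, "churned": 0},
    {"spend": 90.0, "visits": 9, "churned": 0},
    {"spend": 20.0, "visits": 10, "churned": 1},
    {"spend": None, "visits": 5, "churned": 1},
    {"spend": 12.0, "visits": 4, "churned": 1},
]

rows, model = run_pipeline(RAW_ROWS)
predictions = [model(r) for r in rows]
```

Even in this toy version, only `train` is "machine learning"; the other functions are the preparation and feature work that the article argues consumes most of the effort.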
Leverage Existing Resources for Democratization
Despite substantial investments in data science across many industries, the scarcity of data science skills and resources often limits the advancement of AI and ML projects in organizations. The shortage of data scientists has created a challenge for anyone implementing AI and ML initiatives, forcing a closer look at how to build and leverage data science resources.
Beyond highly specialized technical skills and mathematical aptitude, data scientists also need domain or industry knowledge relevant to a specific business area. Domain knowledge is required for problem definition and result validation, and it is a crucial enabler for delivering business value from data science. Relying on “data science unicorns” who have all of these skill sets is neither realistic nor scalable.
Enterprises are focusing on repurposing existing resources as “citizen” data scientists. The rise of AutoML and data science automation can open data science to a broader user base and allow the practice to scale. When citizen data scientists are empowered to execute standard use cases, skilled data scientists can focus on high-impact, technically challenging projects that deliver greater value.
Augment Insights for Greater Transparency
As more organizations adopt data science in their business processes, relying on AI-derived recommendations that lack transparency is becoming problematic. Increased regulatory oversight, such as the GDPR, has exacerbated the problem. Transparent insights make AI models more “oversight” friendly and have the added benefit of being far more actionable.
White-box AI models help organizations maintain accountability in data-driven decisions and stay within the boundaries of regulations. The challenge is that they need high-quality, transparent inputs (also known as “features”), and achieving that transparency often requires multiple manual iterations. Data science automation allows data scientists to explore millions of hypotheses and augments their ability to discover transparent and predictive features as business insights.
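As a rough illustration of what exploring feature hypotheses can look like at a very small scale, the sketch below generates candidate features with human-readable names (each raw column and each pairwise ratio) and ranks them by the strength of their correlation with the target. This is an assumed, simplified stand-in rather than a description of how any automation product works; the column names, the data, and the choice of Pearson correlation as the score are hypothetical.

```python
from itertools import permutations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def candidate_features(columns):
    """Yield (readable name, values) for raw columns and pairwise ratios."""
    for name, vals in columns.items():
        yield name, vals
    for (a, va), (b, vb) in permutations(columns.items(), 2):
        if all(v != 0 for v in vb):  # skip ratios that would divide by zero
            yield f"{a} / {b}", [x / y for x, y in zip(va, vb)]

def rank_features(columns, target):
    """Score each candidate by |correlation| with the target, best first."""
    scored = [(abs(pearson(vals, target)), name)
              for name, vals in candidate_features(columns)]
    return sorted(scored, reverse=True)

# Hypothetical loan data: the target marks applicants who defaulted.
COLUMNS = {
    "debt": [10, 40, 30, 5, 50, 20],
    "income": [100, 50, 120, 60, 60, 200],
}
TARGET = [0, 1, 0, 0, 1, 0]

ranked = rank_features(COLUMNS, TARGET)
for score, name in ranked:
    print(f"{name}: {score:.3f}")
```

With this toy data the top-ranked candidate is `debt / income`, a feature whose name alone tells a business user what it means. That readability is the sense in which a feature can be both predictive and transparent.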
Operationalize Data Science in Business
Although ML models are often tiny pieces of code, deploying them once they are deemed ready for production can be complicated and problematic. For example, since data scientists are not software engineers, their code may not be of production quality. Data scientists often validate models with down-sampled datasets in lab environments, so the models may not scale to production-size datasets. Also, the performance of deployed models decreases as data inevitably changes, making model maintenance pivotal to continuously extracting business value from AI and ML models.
Data and feature pipelines are much bigger and more complex than the ML models themselves, and operationalizing them is even more complicated. One promising approach is to apply concepts from continuous deployment through APIs. Data science automation can generate APIs that execute the full data science pipeline, accelerating deployment while also providing an ongoing connection to development systems to speed up the optimization and maintenance of models.
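The decay in deployed-model performance described above is usually caught by monitoring. The following is a minimal sketch, assuming true outcomes eventually become available for each prediction: it tracks accuracy over a rolling window and flags when retraining may be needed. The class name, window size, and accuracy threshold are illustrative assumptions, not part of any specific product or library.

```python
from collections import deque

class AccuracyMonitor:
    """Track rolling accuracy of a deployed model (hypothetical helper)."""

    def __init__(self, window=100, min_accuracy=0.8):
        # Only the most recent `window` outcomes are kept.
        self.outcomes = deque(maxlen=window)
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        """Store whether one prediction matched the observed outcome."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """Flag only once the window is full, to avoid noisy early alerts."""
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.min_accuracy

# Usage: four correct predictions, then two misses as the data shifts.
monitor = AccuracyMonitor(window=4, min_accuracy=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 1), (0, 0)]:
    monitor.record(predicted, actual)
healthy_flag = monitor.needs_retraining()

for predicted, actual in [(1, 0), (0, 1)]:
    monitor.record(predicted, actual)
drifted_flag = monitor.needs_retraining()
```

A check like this can sit behind the same API that serves the pipeline, so that a drop in accuracy triggers retraining rather than going unnoticed.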
Data science is at the heart of AI and ML. While the promise of AI is real, the problems associated with data science are also real. Through better planning, closer cooperation with lines of business, and automation of the more tedious and repetitive parts of the process, data scientists can finally begin to focus on what to solve rather than how to solve it.