Gone are the days when people worked with a few hundred data points. Today we deal with millions or even billions of records, and with them comes a flood of possible observations and inferences. The main concern is how to assess such a large amount of data, since it is impossible for a human to inspect it all by hand. Machine learning eases this work: it processes data to uncover patterns, and it gives systems the ability to learn without being explicitly programmed. But to get accurate and appropriate results out of an ML model, you first need to understand the data.
Google introduced Facets for exactly this purpose. Its basic aim is to present data in an interpretable, understandable form so that the user can see the whole picture at different levels of granularity. It helps developers explore large datasets and spot how they differ. Facets consists of two visualizations, Facets Overview and Facets Dive, which can change the way we understand and analyze data. The visualizations are implemented as Polymer web components backed by TypeScript code, and they can be easily embedded into Jupyter notebooks or web pages.
Facets Overview takes one or more datasets as input, analyzes them feature by feature, and visualizes the results. The input might be, for example, a training set and a test set. Common data issues that can hamper machine learning are pushed to the forefront. The main goal is to give developers insight into the characteristics of their datasets: the distribution of each feature and any unexpected values. The statistics that Overview produces can also be used to compare two or more datasets side by side. The tool handles both numeric and string features.
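To make the idea concrete, here is a minimal sketch of the kind of per-feature summary described above, written in plain Python. It is not the Facets API: the function names `summarize_feature` and `compare_datasets`, the sample records, and the choice of statistics are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean


def summarize_feature(rows, feature):
    """Summarize one feature across a list of dict records (hypothetical helper)."""
    values = [row.get(feature) for row in rows]
    present = [v for v in values if v is not None]
    summary = {"count": len(values), "missing": len(values) - len(present)}

    # Treat the feature as numeric only if every present value is a number.
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if present and len(numeric) == len(present):
        summary.update(
            kind="numeric", min=min(numeric), max=max(numeric), mean=mean(numeric)
        )
    else:
        # Everything else is treated as a string feature.
        counts = Counter(str(v) for v in present)
        top = counts.most_common(1)
        summary.update(
            kind="string", unique=len(counts), top=top[0][0] if top else None
        )
    return summary


def compare_datasets(datasets, features):
    """Build {feature: {dataset_name: summary}} so datasets sit side by side."""
    return {
        feature: {
            name: summarize_feature(rows, feature)
            for name, rows in datasets.items()
        }
        for feature in features
    }


# Made-up sample data: a tiny training set and test set.
train = [
    {"age": 34, "city": "Pune"},
    {"age": 51, "city": "Delhi"},
    {"age": None, "city": "Pune"},
]
test = [
    {"age": 29, "city": "Delhi"},
    {"age": 420, "city": None},
]

report = compare_datasets({"train": train, "test": test}, ["age", "city"])
for feature, per_dataset in report.items():
    for name, summary in per_dataset.items():
        print(feature, name, summary)
```

Reading the report for `age` next to each other, the missing value in the training set and the suspicious maximum in the test set stand out immediately, which is the sort of issue a per-feature comparison is meant to surface.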
Facets Dive is a tool for interactively exploring large amounts of data. It lets you move between a high-level overview and low-level details, which helps in drawing sound inferences. Smooth animation, zooming, and filtering make it easy to customise the view and to spot patterns in complex data. Each item in the visualization represents a single data point. Items can be positioned by faceting or bucketing them in multiple dimensions according to their feature values. With Facets Dive, you control the position, color, and visual representation of each data point based on its feature values.
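The faceting and bucketing idea can be sketched in a few lines of plain Python. Again, this is only an illustration and not how Facets Dive is implemented: the helpers `bucket_numeric` and `facet`, the bucket edges, and the sample records are all hypothetical.

```python
from collections import defaultdict


def bucket_numeric(value, edges):
    """Return a label for the half-open bucket [lo, hi) that contains value."""
    for lo, hi in zip(edges, edges[1:]):
        if lo <= value < hi:
            return f"{lo}-{hi}"
    return "other"


def facet(records, row_key, col_key):
    """Group records into grid cells keyed by (row label, column label)."""
    grid = defaultdict(list)
    for record in records:
        grid[(row_key(record), col_key(record))].append(record)
    return dict(grid)


# Made-up sample data: each dict is one data point.
people = [
    {"name": "a", "age": 23, "job": "engineer"},
    {"name": "b", "age": 37, "job": "teacher"},
    {"name": "c", "age": 41, "job": "engineer"},
    {"name": "d", "age": 29, "job": "engineer"},
]

age_edges = [20, 30, 40, 50]  # assumed bucket boundaries for the numeric feature

# Rows are age buckets (a bucketed numeric feature), columns are jobs (a string feature).
grid = facet(
    people,
    row_key=lambda r: bucket_numeric(r["age"], age_edges),
    col_key=lambda r: r["job"],
)

for cell, members in sorted(grid.items()):
    print(cell, [m["name"] for m in members])
```

Each grid cell collects the data points that share a pair of feature values, so changing `row_key` or `col_key` rearranges the same points along different dimensions.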
Facets is part of Google's PAIR (People + AI Research) initiative, which aims to build people-centric AI systems that give users a better understanding of their data. The hope is that this will lead people to discover new and interesting models, and because Facets is open source, anyone can customize the visualizations to suit specific needs. At present the visualizations work only in the Chrome browser; this is expected to be resolved in the future, as the project continues to build on existing work in AI and machine learning.