Episode Summary for Data Exploration with a New Python Library with Doris Lee

Doris Jung-Lin Lee is a graduate research assistant and Ph.D. student in the Information Management and Systems department at the University of California, Berkeley. Her research lies at the intersection of databases, data management, and human-computer interaction. She develops Lux, a Python library for accelerating and simplifying the process of data exploration.

Data exploration is the process of examining a data set, often visually, to understand what it contains and what characteristics the data has. Data scientists explore data to understand things like customer behavior and resource utilization. Common programming languages for data exploration include Python, R, and MATLAB, and scientists rely on many automated assistance tools for interactive data exploration, which has become an area of interest in the field of machine learning. One might picture automated assistance in machine learning development as fully automated robots. In practice, however, automation arrives in distinct phases.
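The summary does not show Lux's own API, so as a rough illustration of what "characteristics of the data" can mean, here is a minimal sketch in plain Python. It assumes a small hypothetical list of customer orders; the `profile` helper and the sample data are invented for illustration and are not part of Lux. Tools like Lux go further by recommending visualizations automatically.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical toy data set: each row is one customer order (illustrative only).
rows = [
    {"region": "west", "amount": 120.0},
    {"region": "east", "amount": 80.0},
    {"region": "west", "amount": 45.5},
    {"region": "north", "amount": 200.0},
]


def profile(rows):
    """Summarize each column of a list-of-dicts data set.

    Numeric columns get basic descriptive statistics; all other
    columns get their two most common values with counts.
    """
    summary = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        if all(isinstance(v, (int, float)) for v in values):
            summary[col] = {
                "min": min(values),
                "max": max(values),
                "mean": mean(values),
                "median": median(values),
            }
        else:
            summary[col] = {"top": Counter(values).most_common(2)}
    return summary


for col, stats in profile(rows).items():
    print(col, stats)
```

A hand-written profile like this answers only the questions its author thought to ask; interactive exploration tools aim to surface such summaries, and the charts that go with them, without that manual step.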

These three main phases are well explained by an analogy with cars. Cars can be fully automated, partially automated, or, like most cars today, mostly manual. Yet even today's cars have some automation built in: a driver does not need to think about how the pistons in the engine work or how the gas pedal works. Current machine learning tools, such as the scikit-learn Python library and similar packages, sit at roughly the same point. People still build pipelines by hand with these tools for a particular end objective, and, just as car manufacturers add more automation with every new model, each generation of tools automates more of that work. The end goal is a fully automated machine learning system.

The H2O framework and other automated machine learning (AutoML) tools are good examples of this trend.


This article is purposely trimmed; please visit the source to read the full article.

The post Episode Summary for Data Exploration with a New Python Library with Doris Lee appeared first on Software Engineering Daily.
