Dr. Ahmed AbuHalimeh, assistant professor of information science at UA Little Rock, has received a $103,036 grant to develop machine learning models that will improve data curation and data quality.
“I am super excited about this opportunity,” AbuHalimeh said. “It will open the door for many others in the near future. I am very excited to receive the grant in the short period of time I have been here since the spring semester. The grant will help me to establish my research and my career.”
AbuHalimeh received the two-year grant, “DC/LP: Developing Machine Learning Models to Improve the Effectiveness of Automated Data Curation Processes,” from the Arkansas NSF EPSCoR program, a multi-institutional, statewide program funded by the National Science Foundation. The program will provide $24 million over five years to expand research, workforce development, and STEM educational outreach in Arkansas.
AbuHalimeh is one of five researchers in the state who received a grant this year from the Arkansas NSF EPSCoR Track 1 project, Data Analytics that are Robust and Trusted (DART). The DART Research Seed Grant Program invites scientists throughout Arkansas to identify emerging or transformative areas of research aligned with DART’s scientific focus.
Before datasets can be used in many kinds of learning models, researchers often curate them by hand: they assess the content and quality of the source data, define data models, and track and document data processes. Applying machine learning could make data curation more automated, saving researchers time and money. This project will address the lack of automation in data curation, a problem for both industry and academic research.
“Data curation is a process of acquiring multiple sources of data, improving the quality of the data, and integrating this data into a usable information product,” AbuHalimeh said. “A team of professors and students led by Dr. John Talburt, a PI in the DART grant and a UA Little Rock professor of information science, has already developed a tool called the Data Washing Machine.
“In a clothes washing machine, you put dirty clothes in, put in some soap, set a dial, and it will automatically clean them for you. The idea is that you give the Data Washing Machine dirty datasets, and you get clean data out. However, the current Data Washing Machine is a rule-based system. In my project, we will use machine learning techniques to further automate the process of determining the quality of the data, such as for accuracy, consistency, and many other factors.”
AbuHalimeh’s long-term goal is to create a machine learning model that can automatically assess data quality, detect and correct errors, and integrate the data streams and datasets used by academic and industry researchers.