

Access Type

Open Access Thesis

Document Type


Degree Program

Chemical Engineering

Degree Type

Master of Science in Chemical Engineering (M.S.Ch.E.)

Year Degree Awarded


Month Degree Awarded



Polymer coatings are used across many industries, where they play a crucial role in protecting products and extending their shelf life. However, formulating them is non-trivial: the many variables and factors involved in the production process make formulation a complex, high-dimensional problem. Machine learning (ML) has emerged as a promising tool for such problems, showing considerable potential in polymer and chemistry-based applications, particularly those involving high-dimensional design spaces.

Our research aims to develop a physics-guided ML approach to facilitate the formulation of polymer coatings. As a first step, this project seeks the machine-readable feature representations best suited to encoding formulation ingredients. Four featurization schemes for representing polymer coating molecules were benchmarked on two polymer-informatics datasets: a large set of 700,000 common homopolymers, including epoxies and polyurethanes used as coating base materials, and a relatively small set of 1,000 epoxy-diluent formulation data points. The four schemes are the Molecular ACCess System (MACCS) keys, the extended-connectivity fingerprint (ECFP), and molecular graph-based embeddings from a chemical graph network and from a graph convolutional network (MG-GCN). Each representation was paired with ensemble models to predict properties including topological surface area and viscosity. The combination of MG-GCN embeddings with ensemble models such as the extreme boosting machine and random forest achieved the best overall performance, with coefficient of determination (R²) values of 0.74 for topological surface area and 0.84 for viscosity, which compare favorably with existing techniques. These results lay the foundation for combining ML with physical modeling to expedite the development of polymer coating formulations.
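The benchmarking pipeline described in the abstract has three parts: turn each molecule into a fixed-length feature vector, fit an ensemble regressor on those vectors, and score predictions with R². The sketch below illustrates only that shape, using the Python standard library so it runs anywhere. It is an assumption-laden toy, not the thesis's code: `hashed_ngram_fingerprint` hashes SMILES character n-grams and is *not* ECFP or MACCS (it does not perceive atoms or bonds), and `bagged_knn_predict` is a bagging ensemble of nearest-neighbor regressors, *not* the random forest or boosting models used in the work. All function names are hypothetical.

```python
import hashlib
import random
from typing import List, Sequence


def hashed_ngram_fingerprint(smiles: str, n_bits: int = 64, max_n: int = 3) -> List[int]:
    """Toy fixed-length bit vector for a SMILES string (hypothetical helper).

    Each character n-gram (n = 1..max_n) is hashed with BLAKE2b and folded
    into one of n_bits positions. This mimics only the output shape of a
    hashed fingerprint such as ECFP; it does not perceive atoms or bonds.
    """
    bits = [0] * n_bits
    for n in range(1, max_n + 1):
        for i in range(len(smiles) - n + 1):
            token = smiles[i:i + n].encode("utf-8")
            digest = hashlib.blake2b(token, digest_size=8).digest()
            bits[int.from_bytes(digest, "big") % n_bits] = 1
    return bits


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto (Jaccard) similarity of two equal-length bit vectors."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 0.0  # convention: empty vectors -> 0


def bagged_knn_predict(train_x, train_y, query, n_models: int = 25, k: int = 3, seed: int = 0) -> float:
    """Bagging ensemble: average k-NN regressors built on bootstrap resamples.

    A simple stand-in for the ensemble idea; not a random forest or a
    gradient-boosting model.
    """
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_models):
        # Bootstrap: draw len(train_x) training indices with replacement.
        sample = [rng.randrange(len(train_x)) for _ in range(len(train_x))]
        # Keep the k resampled molecules most similar to the query.
        nearest = sorted(sample, key=lambda i: tanimoto(train_x[i], query), reverse=True)[:k]
        predictions.append(sum(train_y[i] for i in nearest) / len(nearest))
    # The ensemble prediction is the mean over all bootstrap models.
    return sum(predictions) / len(predictions)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot
```

An R² of 1 means perfect predictions, and 0 means no better than always predicting the mean of the targets, which is the scale on which the reported 0.74 and 0.84 should be read. For a closer analogue of the actual study, one would likely swap in RDKit's MACCS keys or Morgan (ECFP-style) fingerprints and scikit-learn's `RandomForestRegressor` and `r2_score`.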


First Advisor

Peng Bai

Second Advisor

Dimitrios Maroudas

Third Advisor

Hui Guan

Creative Commons License

Creative Commons Attribution 4.0 License