Open Access Dissertation

Degree Name

Doctor of Philosophy (PhD)

Degree Program

Electrical and Computer Engineering

First Advisor

David Irwin

Subject Categories

Computer and Systems Architecture | Computer Engineering | OS and Networks | Power and Energy | Systems and Communications | Systems Architecture


The aggregate solar capacity in the U.S. is rising rapidly due to continuing decreases in the cost of solar modules. For example, the installed cost per Watt (W) for residential photovoltaics (PVs) decreased by 6X from 2009 to 2018 (from $8/W to $1.2/W), resulting in the installed aggregate solar capacity increasing 128X from 2009 to 2018 (from 435 megawatts to 55.9 gigawatts). This increasing solar capacity is imposing operational challenges on utilities in balancing electricity's real-time supply and demand, as solar generation is more stochastic and less predictable than aggregate demand.

To address this problem, both academia and utilities have shown strong interest in solar analytics that accurately monitor, predict, and react to variations in intermittent solar power. Prior solar analytics are mostly "white-box" approaches that rely on site-specific information and expert knowledge, and thus do not scale. Recent research instead focuses on "black-box" approaches that use training data to automatically learn a custom machine learning (ML) model. Unfortunately, this approach requires months to years of training data and often does not incorporate well-known physical models of solar generation, which reduces its accuracy. In this dissertation, we instead present a hybrid "black-box" approach to solar analytics that achieves the best of both. Our hypothesis is that, by integrating "black-box" machine learning approaches with "white-box" physical models, the hybrid "black-box" approach can enable a wide range of accurate solar analytics, including modeling, disaggregation, and localization, with limited training data and without knowledge of key system parameters. In evaluating our hypothesis, we make the following contributions:

(Mostly) ML "black-box" Solar Modeling. To benefit from both ML and physical approaches, we present a configurable hybrid "black-box" ML approach that combines well-known relationships from physical models with unknown relationships learned via ML. Rather than manually determining values for physical model parameters, our approach automatically calibrates them by finding the values that best fit the data. This calibration requires much less data (as few as 2 datapoints) than training an ML model. We show that our hybrid approach significantly improves solar modeling accuracy.
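To illustrate how a physical model can be calibrated from only two datapoints, here is a minimal sketch. It is not the dissertation's model: it assumes a hypothetical two-parameter form, power = k * irradiance * (1 + c * (temp - 25)), where k (a size/efficiency factor) and c (a temperature coefficient) are the unknowns, and all function names are illustrative.

```python
def calibrate(point_a, point_b):
    """Solve for (k, c) from two observations.

    Each point is (clear_sky_irradiance_w_m2, cell_temp_c, power_w).
    Assumed model (hypothetical): power = k * irradiance * (1 + c * (temp - 25)).
    """
    (i1, t1, w1), (i2, t2, w2) = point_a, point_b
    # Dividing by irradiance makes the model linear: y = k + (k * c) * x
    y1, y2 = w1 / i1, w2 / i2
    x1, x2 = t1 - 25.0, t2 - 25.0
    if x1 == x2:
        raise ValueError("the two observations need different temperatures")
    slope = (y2 - y1) / (x2 - x1)   # equals k * c
    k = y1 - slope * x1             # intercept equals k
    return k, slope / k


def predict(k, c, irradiance_w_m2, cell_temp_c):
    """Evaluate the calibrated model at new conditions."""
    return k * irradiance_w_m2 * (1.0 + c * (cell_temp_c - 25.0))


# Two synthetic observations generated with k = 5.0 and c = -0.004
k, c = calibrate((800.0, 35.0, 3840.0), (600.0, 15.0, 3120.0))
```

With two unknown parameters, two observations are enough to determine them exactly; more observations would call for a least-squares fit instead.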

(Mostly) Physical "black-box" Solar Modeling. The physical model used in the hybrid model above performs significantly worse than other approaches. To determine the primary source of this inaccuracy, we conduct a large-scale data analysis and show that the only weather metrics that affect solar output are temperature and cloud cover. We then derive a new physical model that accurately quantifies cloud cover's effect on solar generation at all sites, and enhance it with an ML model that learns each site's unique shading effect. We show that this hybrid model yields higher accuracy than current state-of-the-art ML approaches. We also identify a universal weather-solar effect that has not been articulated before and is broadly applicable to other solar analytics.

Solar Disaggregation. Solar forecast models require historical solar generation data for training. Unfortunately, pure solar generation data is often not available, as the vast majority of small-scale residential solar deployments (<10kW) are "Behind the Meter" (BTM), such that the smart meter data exposed to utilities represents only the net of a building's solar generation and its energy consumption. To address this problem, we design SunDance, a "black-box" system that leverages both the clear-sky maximum solar generation model and the universal weather-solar effect from the hybrid "black-box" models above. We show that SunDance can accurately disaggregate solar generation from net meter data without access to a building's pure solar generation data for training.
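The accounting behind disaggregation can be sketched as follows. This is only an illustration of the net-meter identity, not SunDance itself. It assumes the sign convention net = consumption - solar, a per-interval clear-sky maximum that is already known, and a hypothetical weather factor between 0 and 1 giving the fraction of clear-sky output that weather lets through.

```python
def disaggregate(net_kw, clear_sky_kw, weather_factor):
    """Split net meter readings into estimated solar and consumption.

    Assumes net = consumption - solar for every interval (an assumption).
    """
    solar, consumption = [], []
    for net, clear_sky, factor in zip(net_kw, clear_sky_kw, weather_factor):
        estimate = max(0.0, clear_sky * factor)
        # Consumption cannot be negative, so solar must be at least -net
        # whenever the building is exporting power.
        estimate = max(estimate, -net)
        solar.append(estimate)
        consumption.append(net + estimate)
    return solar, consumption


# Night, a mostly clear interval, and an interval where the weather-based
# estimate (2.0 kW) is too low to explain the 3.5 kW being exported.
solar, consumption = disaggregate(
    net_kw=[1.0, -2.0, -3.5],
    clear_sky_kw=[0.0, 4.0, 4.0],
    weather_factor=[1.0, 0.75, 0.5],
)
```

The lower bound on solar comes directly from the meter reading, which is one reason net meter data still carries usable information about generation.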

Solar-based Localization. The energy data produced by solar-powered homes is considered "anonymous" and is usually publicly available if it is not associated with identifying account information, e.g., a name and address. Our key insight is that solar energy data is not anonymous: every location on Earth has a unique solar signature that embeds detailed location information. We design SunSpot to localize the source of solar generation data and show that it can localize a solar-powered home to within a radius of 500 meters and 28 kilometers for per-second and per-minute resolution data, respectively.
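One way to see why the sampling interval bounds localization accuracy is a back-of-the-envelope calculation based on Earth's rotation. The sketch below is not SunSpot's algorithm: it assumes longitude is read off the UTC time of solar noon, ignores the equation of time, and uses an equatorial circumference of roughly 40,075 km. The function names are illustrative.

```python
import math

EQUATOR_KM = 40075.0       # approximate equatorial circumference
SECONDS_PER_DAY = 86400.0  # one mean solar day


def longitude_from_solar_noon(noon_utc_hours):
    """Longitude in degrees (east positive) from the UTC hour of solar noon.

    Earth turns 15 degrees per hour; the equation of time is ignored.
    """
    return (12.0 - noon_utc_hours) * 15.0


def east_west_km_per_sample(sample_seconds, latitude_deg=0.0):
    """East-west distance the Sun's noon line sweeps in one sample interval."""
    circumference = EQUATOR_KM * math.cos(math.radians(latitude_deg))
    return circumference * sample_seconds / SECONDS_PER_DAY


per_second_km = east_west_km_per_sample(1)    # about 0.46 km at the equator
per_minute_km = east_west_km_per_sample(60)   # about 27.8 km at the equator
```

Under these assumptions, one sample of uncertainty in the time of solar noon corresponds to roughly half a kilometer for per-second data and roughly 28 kilometers for per-minute data at the equator, the same order as the radii quoted above, although the dissertation's own derivation may differ.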

Weather-based Localization. However, the solar-based localization above has a fundamental limit due to Earth's rotation. To localize further, toward a specific home, we identify another key insight: every location on Earth has a distinct weather signature that uniquely identifies it. Interestingly, we find that localizing coarse (one-hour resolution) solar data using its weather signature is more accurate than localizing finer (one-minute or one-second resolution) solar data using its solar signature. Both SunSpot and Weatherman, our weather-based localization approach, expose a serious new privacy threat from energy data that has not been presented before.