Dong Wang, Jeff Dalley, Elaine Wright
New Zealand is home to hundreds of tracks distributed across the country. These track locations are exposed to heterogeneous weather patterns, to varying degrees, which inevitably influence visitors' decisions to visit. Weekend and public holiday effects complicate those decisions further: visitors are more likely to go during (long) weekends and holidays.
In this pioneering study, we adopted an AI approach that can tease apart weather impacts from weekend and holiday effects on visitor numbers on any track.
The outputs of this study (Map 1 and Fig. 1) can help infrastructure planners, pricing teams, managers and directors make sound management decisions.
We live in a world that increasingly relies on products and services featuring some form of Artificial Intelligence (AI). AI, especially approaches based on artificial neural networks (ANNs), is rapidly becoming essential for the analysis of complex data and for decision support.
ANNs are highly parameterized, non-linear models with sets of interconnected processing units called neurons that can be used to approximate the relationship between the input and output signals of a complex system (Stefaniak et al. 2005). Typically, ANNs are applied to predict the response of one or more variables given one or more explanatory variables, where smooth functions are fitted to the dataset while residual errors are minimized through iterative training (Hornik 1991).
Compared with the conventional statistical models used by traditional statisticians (e.g., generalized linear/additive regression), ANNs have been shown to have more powerful (arguably unmatched) predictive capability. Leveraging these advantages, in this study we use ANNs to 1) predict visitor counts (VC), and 2) quantify and explain which predictors are most influential among a range of confounding predictors/factors on a specific track. Based on this, we derive a generic weather index for each track that singles out the effect of the weather predictors.
The input is a set of predictors: four weather parameters (daily wind speed (WS), daily minimum temperature (T), daily rain amount (Rain) and daily solar radiation (SR)), a daily holiday indicator (flagged as 0 or 1), a weekend indicator (flagged as 0 or 1) and a varying number of lagged weather parameters. The output is daily VC.
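To make the input layout concrete, the sketch below assembles one feature row per day from the predictors listed above. It is only an illustration: the function name `build_features`, the dictionary keys and the choice of `max_lag` are assumptions made for this example, not the configuration used in the study.

```python
from typing import Dict, List

# Hypothetical key names for the four weather parameters described in the text.
WEATHER_KEYS = ["WS", "T", "Rain", "SR"]


def build_features(days: List[Dict[str, float]], max_lag: int) -> List[List[float]]:
    """Return one feature row for each day that has a full lag history.

    Each row holds the same-day weather values, the holiday flag, the weekend
    flag, and then the weather values from 1..max_lag days earlier.
    `days` must be in chronological order with no missing days.
    """
    rows = []
    for i in range(max_lag, len(days)):
        row = [days[i][k] for k in WEATHER_KEYS]
        row += [days[i]["holiday"], days[i]["weekend"]]
        for lag in range(1, max_lag + 1):
            row += [days[i - lag][k] for k in WEATHER_KEYS]
        rows.append(row)
    return rows
```

With four weather parameters, two indicator flags and `max_lag = 2`, each row has 4 + 2 + 2 × 4 = 14 entries, and the first `max_lag` days are dropped because their lag history is incomplete.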
The primary reason to consider a lagged structure in the model specification is that weather parameters are usually autocorrelated. Both over- and under-specification of lags are known to affect the model's response, i.e., VC. In this study, we followed Wang and Lu (2006a) and Lu and Wang (2014) to identify the optimal lags. For a more systematic, generic and rigorous treatment of the effects of the past history of predictors, we recommend Burnham and Anderson (2010).
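As a simple diagnostic of the autocorrelation mentioned above, the sample autocorrelation of a daily weather series can be computed at a few lags. This sketch is not the lag-selection procedure of Wang and Lu (2006a) or Lu and Wang (2014); the function name `autocorr` is an assumption for illustration.

```python
from typing import Sequence


def autocorr(x: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of series x at a non-negative integer lag.

    Uses the common estimator that divides the lagged covariance sum by the
    total sum of squared deviations, so autocorr(x, 0) is exactly 1.
    """
    n = len(x)
    if lag < 0 or lag >= n:
        raise ValueError("lag must satisfy 0 <= lag < len(x)")
    mean = sum(x) / n
    total = sum((v - mean) ** 2 for v in x)
    if total == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    cov = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return cov / total
```

Values that stay well away from zero over several lags suggest that past weather carries information about the present day, which is the motivation for including lagged predictors.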
Only daily VC and weather data from peak-season months (October to April) were used in modelling. Peak-season data from all available years, i.e., all years present in the VC data, were pooled. This is intended to reduce 'noise' in modelling, because visitors are abundant only in the peak-season months.
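The peak-season pooling described above amounts to a month filter applied across all years. A minimal sketch follows, assuming records are `(date, visitor_count)` pairs; the names `PEAK_MONTHS` and `peak_season` are illustrative only.

```python
from datetime import date
from typing import List, Tuple

# October to April, as stated in the text.
PEAK_MONTHS = {10, 11, 12, 1, 2, 3, 4}


def peak_season(records: List[Tuple[date, float]]) -> List[Tuple[date, float]]:
    """Keep only records whose date falls in a peak-season month, pooling all years."""
    return [(d, vc) for d, vc in records if d.month in PEAK_MONTHS]
```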
The number of layers in an ANN can affect model performance. We tested a range of hidden-layer counts and found that a single hidden layer was mostly sufficient for our purpose. Additional layers did increase prediction accuracy in some cases, but the gains were negligible. Root mean squared error (RMSE) was used as the metric in model selection. For a thorough treatment of ANNs and their applications, we recommend Shanmuganathan and Samarasinghe (2016).
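For reference, RMSE over n days with observed counts y_i and predictions ŷ_i is sqrt((1/n) Σ (y_i − ŷ_i)²). The sketch below shows how candidate configurations could be compared on this metric. The candidate labels and the helper `select_by_rmse` are assumptions for illustration; how the predictions were produced in the study is not shown here.

```python
import math
from typing import Dict, Sequence, Tuple


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equal-length, non-empty sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def select_by_rmse(
    observed: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> Tuple[str, Dict[str, float]]:
    """Return the label of the candidate with the lowest RMSE, plus all scores."""
    scores = {name: rmse(observed, pred) for name, pred in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

When two candidates score almost the same, as reported above for deeper networks, the simpler configuration is the natural choice.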
Initially, we tested the configured ANNs on their ability to rank the relative importance of input predictors on two selected New Zealand walks: Tongariro Alpine Crossing and Dolomite Point. The results were encouraging: Tongariro Alpine Crossing is affected by weather (summed over all four weather parameters) almost twice as much as Dolomite Point, which is consistent with the experience of subject matter experts from the tourism industry.
DOC owns 700+ tracks/walks across New Zealand. The quality of VC data varies between walks, which poses a risk that could compromise ANN modelling. Based on known ANN performance and the quality of VC data on the two walks in Section 2.4, three criteria were considered when picking candidate walks for ANN modelling:
|
All daily weather data were estimated on a regular (~5 km) grid covering the whole of New Zealand, i.e., the VCSN data produced by NIWA. The estimates are produced every day, based on the spatial interpolation of actual observations made at climate stations located around the country. A thin-plate smoothing spline model is used for the spatial interpolation. This model incorporates two location variables (latitude and longitude) and a third "pattern" variable (Tait et al. 2012). The software used for the interpolation is ANUSPLIN (Hutchinson 2012). In modelling, we used the VCSN data from the grid point closest to each walk.
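Picking the grid point closest to a walk can be sketched with a great-circle distance. This is an illustrative assumption about how "closest" might be computed; the source does not state the distance measure used, and the function names below are hypothetical.

```python
import math
from typing import Iterable, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    r = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_grid_point(walk: LatLon, grid: Iterable[LatLon]) -> LatLon:
    """Return the grid point with the smallest haversine distance to the walk."""
    return min(grid, key=lambda p: haversine_km(walk[0], walk[1], p[0], p[1]))
```

On a grid of roughly 5 km spacing, the nearest point should lie within a few kilometres of the walk's reference location.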
ANNs are not statistical models. Once fitted, an ANN is not statistically identifiable: for a given dataset and configuration, there can be an unlimited number of ANNs with different weights that could generate very different predictions. The implicit relationship that an ANN builds between inputs and outputs is usually difficult to interpret directly, let alone use to rank the relative importance of predictors or assess the model's sensitivity. A common criticism is therefore that ANNs are 'black box' models that offer minimal insight into the relationships between input and output variables. Wang and Lu (2006b), among a few others, rebutted this concern by describing methods to extract information about such variable relationships from ANNs. Here, building on the solutions in Wang and Lu (2006b), we provide another version of these methods to explore the relationship between the outcome variable, VC, and the predictor(s) of interest.
Our method, in a nutshell, is to explore the relationship between VC and a predictor of interest, say WS, while holding the other covariates at constant values (at their 5th, 25th, 50th, 75th and 95th percentiles), after the ANN has been built.
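The profiling idea just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `model` is any callable standing in for a trained ANN, and `percentile`, `sensitivity_profiles` and the `fixed` argument (used to pin scenario flags such as holiday and weekend) are hypothetical names introduced for the example.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile by linear interpolation between order statistics, q in [0, 100]."""
    s = sorted(values)
    if not s:
        raise ValueError("values must be non-empty")
    pos = (len(s) - 1) * q / 100
    lo = int(pos)  # floor, since pos is non-negative
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def sensitivity_profiles(
    model: Callable[[Dict[str, float]], float],
    data: Dict[str, Sequence[float]],
    target: str,
    fixed: Optional[Dict[str, float]] = None,
    levels: Sequence[float] = (5, 25, 50, 75, 95),
    steps: int = 20,
) -> Dict[float, List[Tuple[float, float]]]:
    """Sweep `target` over its observed range while holding other predictors constant.

    For each level q, every non-target predictor is set to its q-th percentile,
    then any values in `fixed` override those settings. The result maps each
    level to a list of (target value, predicted VC) pairs.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    fixed = fixed or {}
    lo, hi = min(data[target]), max(data[target])
    sweep = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    profiles = {}
    for q in levels:
        base = {name: percentile(vals, q) for name, vals in data.items() if name != target}
        base.update(fixed)
        profiles[q] = [(x, model({**base, target: x})) for x in sweep]
    return profiles


def toy_model(x: Dict[str, float]) -> float:
    """Made-up stand-in for a trained ANN; the coefficients carry no meaning."""
    return 100 - 2 * x["WS"] + 0.5 * x["T"] + 30 * x["holiday"]
```

Calling `sensitivity_profiles(toy_model, data, "WS", fixed={"holiday": 1})` yields one response curve per percentile level. If the curves differ mainly by a vertical shift, the effect of WS depends little on the level of the other predictors; diverging curves indicate an interaction.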
As an illustration, suppose we want to know the effect of WS on VC on the Tongariro Alpine Crossing under the holiday-and-weekend scenario.
|
Track sites in close geographic proximity are expected to experience similar weather impacts (especially when visitation levels are similar). It is therefore interesting to test whether the AI models reflect this expectation.
Below we single out one such pair of sites, Great Lake Trail, Kawakawa Bay and Great Lake Trail, W2K Track, to show how similar the weather impact on visitation is at each.
Four levels of weather impact were used to classify the weather impact on visitation across NZ.
|
Weather impact appears to be stronger for walks in the west and north-west of NZ.
Stefaniak B, Cholewiński W, Tarkowska A. Algorithms of Artificial Neural Networks - Practical application in medical science. Polski Merkuriusz Lekarski 2005;19:819-22.
Hornik K (1991). “Approximation Capabilities of Multilayer Feedforward Networks.” Neural Networks, 4(2), 251–257. doi:10.1016/0893-6080(91)90009-t.
Wang, D, Lu, WZ, 2006a, Ground-level ozone prediction using multi-layer perceptron trained with an innovative hybrid approach, Ecological Modelling, 198(3-4), 332-340
Lu, WZ, Wang, D, 2014, Learning machines: Rationale and application in ground-level ozone prediction, Applied Soft Computing, 24, 135-141.
Burnham, K. P., and D. R. Anderson. Model Selection and Multi-Model Inference: A Practical Information-Theoretic Approach. New York: Springer, 2010.
R Core Team, 2019. R: A Language and Environment for Statistical Computing. Version 3.2.0. R Foundation for Statistical Computing, Vienna, Austria.
Shanmuganathan, Subana, and Sandhya Samarasinghe, eds. Artificial Neural Network Modelling. Cham: Springer, 2016.
Tait A, Sturman J, Clark M, 2012. An assessment of the accuracy of interpolated daily rainfall for New Zealand. Journal of Hydrology (NZ), 51(1), 25-44.
Hutchinson M, 2012. ANUSPLIN version 4.3. https://researchers.anu.edu.au/publications/38018
Wang, D, Lu, WZ, 2006b, Interval estimation of urban ozone level and selection of influential factors by employing automatic relevance determination model, Chemosphere, 62, 1600-1611.
This work benefited from numerous discussions with DOC's other statisticians, Ian Westbrooke and Helene Thygesen, and with NIWA climate scientist Andrew Tait. Zac Taylor took great care in quality-assuring piles of counter data. We also thank NIWA staff Andrew Tait, Petra Pearce and Gregor Macara, who kindly provided the VCSN weather data.