How much does weather affect visitation on NZ tracks?

An AI approach to divide and conquer

Dong Wang, Jeff Dalley, Elaine Wright


1. Introduction and preview of the tools

New Zealand is home to hundreds of tracks distributed across the country. These track locations experience heterogeneous weather patterns of varying intensity, which inevitably influence visitors' decisions to visit. Those decisions are further complicated by weekend and public holiday effects: visitors are more likely to go during (long) weekends and holidays.

In this pioneering study, we adopted an AI approach that can tease apart weather impacts from weekend and holiday effects on visitor numbers on any track.

The outputs of this study (Map 1 and Fig. 1) can help infrastructure planners, pricing teams, managers and directors make sound management decisions.

Map 1. Weather effect map
Fig. 1. Weather effects on their original and standardised ([0,1]) scales

2. Methodology

We live in a world that increasingly relies on products and services featuring some form of artificial intelligence (AI). AI methods, especially those based on artificial neural networks (ANNs), are rapidly becoming essential for analysing complex data and supporting decisions.

ANNs are highly parameterized, non-linear models made up of interconnected processing units called neurons. They can be used to approximate the relationship between the input and output signals of a complex system (Stefaniak et al. 2005). Typically, ANNs are applied to predict one or more response variables from one or many explanatory variables: smooth functions are fitted to the dataset while the residual error is minimized through iterative training (Hornik 1991).

Compared with conventional statistical models (e.g., generalized linear/additive regression), ANNs often show stronger predictive capability on complex, non-linear data. Leveraging this advantage, we use ANNs in this study to 1) predict visitor counts (VC), and 2) quantify and explain which predictors are most influential among a range of confounding predictors on a specific track. Based on this, we derive a generic weather index for each track that singles out the effect of the weather predictors.

2.1 ANNs Model Specifications

The inputs are a set of predictors: four daily weather parameters (wind speed (WS), minimum temperature (T), rainfall amount (Rain) and solar radiation (SR)), a daily holiday indicator (0 or 1), a weekend indicator (0 or 1), and a varying number of lagged weather parameters. The output is daily VC.
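
To make the input layout concrete, here is a minimal sketch of how one daily record with lagged weather columns could be assembled. The helper `add_lags`, the column naming (`WS_lag1`, ...), the lag count and all values are illustrative assumptions, not the study's actual code or data.

```python
def add_lags(rows, weather_keys, n_lags):
    """Append lagged copies of the weather columns to each daily record.

    rows: list of dicts ordered by date, one per day (assumed contiguous).
    The first n_lags days are dropped because they lack a full history.
    """
    out = []
    for i in range(n_lags, len(rows)):
        rec = dict(rows[i])  # copy today's predictors and VC
        for key in weather_keys:
            for lag in range(1, n_lags + 1):
                rec[f"{key}_lag{lag}"] = rows[i - lag][key]
        out.append(rec)
    return out


WEATHER = ["WS", "T", "Rain", "SR"]

# Made-up daily records: weather, holiday/weekend flags, and the target VC.
days = [
    {"WS": 3.0, "T": 8.0, "Rain": 0.0, "SR": 21.0, "holiday": 0, "weekend": 0, "VC": 410},
    {"WS": 9.5, "T": 5.5, "Rain": 12.0, "SR": 6.0, "holiday": 0, "weekend": 1, "VC": 150},
    {"WS": 4.0, "T": 7.0, "Rain": 1.0, "SR": 18.0, "holiday": 1, "weekend": 1, "VC": 620},
]

features = add_lags(days, WEATHER, n_lags=1)
print(features[0])
```

In this sketch every weather parameter gets the same number of lags; in practice the lag count may differ per parameter.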

The primary reason for including a lagged structure in the model specification is that weather parameters are usually autocorrelated. Both over- and under-specification of lags are known to affect the model's response, i.e., VC. In this study, we followed Wang and Lu (2006a) and Lu and Wang (2014) to identify the optimal lags. For a more systematic, generic and rigorous treatment of the effects of the past history of predictors, we recommend Burnham and Anderson (2010).

2.2 Peak-season models to establish the daily relationship between VC and its influential predictors

Only daily VC and weather data from the peak-season months (October to April) were used in modelling. Peak-season data from all available years, i.e., all years present in the VC data, were pooled. This is intended to reduce noise in modelling, because visitors are abundant only in the peak-season months.
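
As an illustration, the peak-season filter can be expressed in a few lines. The record layout and the name `peak_season_only` are assumptions for this sketch, and the values are made up.

```python
from datetime import date

# Peak-season months named in the text: October through April.
PEAK_MONTHS = {10, 11, 12, 1, 2, 3, 4}


def peak_season_only(records):
    """Keep daily records whose date falls in a peak-season month.

    Records from all years are pooled; only the month is checked.
    """
    return [r for r in records if r["date"].month in PEAK_MONTHS]


records = [
    {"date": date(2018, 12, 26), "VC": 830},
    {"date": date(2019, 7, 14), "VC": 95},   # winter: dropped
    {"date": date(2020, 4, 11), "VC": 540},
]

kept = peak_season_only(records)
print([r["date"].isoformat() for r in kept])
```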

2.3 ANNs configuration

The number of layers in an ANN can affect model performance. We tested a range of hidden-layer counts and found that a single hidden layer was sufficient in most cases. Additional layers did increase prediction accuracy in some cases, but the gains were negligible. Root mean squared error (RMSE) was used as the metric for model selection. For a thorough treatment of ANNs and their applications, we recommend Shanmuganathan and Samarasinghe (2016).
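
The RMSE comparison can be sketched as follows. The configuration labels and the prediction values are invented for illustration; the source does not report per-configuration scores.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pick_best(observed, predictions_by_config):
    """Score each configuration and return the lowest-RMSE label with all scores."""
    scores = {name: rmse(observed, pred) for name, pred in predictions_by_config.items()}
    best = min(scores, key=scores.get)
    return best, scores


observed_vc = [100, 150, 200]
candidates = {
    "1 hidden layer": [110, 140, 205],
    "2 hidden layers": [108, 142, 204],
}

best, scores = pick_best(observed_vc, candidates)
print(best, scores)
```

A pure lowest-RMSE rule favours the deeper network here even though the gain is small. Preferring the simpler network when the difference is negligible, as the text describes, would need an explicit tolerance, which this sketch leaves out.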

2.4 Initial model testing

Initially, we tested the configured ANNs on their ability to rank the relative importance of input predictors on two selected New Zealand walks: Tongariro Alpine Crossing and Dolomite Point. The results were encouraging: Tongariro Alpine Crossing is almost twice as strongly affected by weather (summed over all four weather parameters) as Dolomite Point, which is consistent with the experience of subject matter experts from the tourism industry.

2.5 Site selection to build a fuller picture of weather impact across New Zealand

DOC owns 700+ tracks/walks across New Zealand. The quality of VC data varies between walks, which poses a risk to ANN modelling. Based on the known ANN performance and the VC data quality for the two walks in Section 2.4, three criteria were applied when picking candidate walks for ANN modelling:

  1. low % of hourly VC data missing (<= 10%)

  2. high % of active hours: hours with VC > 0 (>= 40%)

  3. ideally high number of years with data availability (>= 2)


This resulted in 40+ candidate sites being picked for modelling.
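
The three criteria can be written as a small eligibility check. This is a sketch under stated assumptions: missing hours are represented as `None`, the active-hour share is taken over all hours (the text does not specify the denominator), and the "ideally" criterion on years is treated as a hard threshold.

```python
def is_candidate(hourly_counts, n_years,
                 max_missing=0.10, min_active=0.40, min_years=2):
    """Apply the three site-selection criteria to one walk.

    hourly_counts: hourly VC values, with None marking a missing hour.
    n_years: number of years with VC data for the walk.
    """
    total = len(hourly_counts)
    if total == 0:
        return False
    missing = sum(1 for c in hourly_counts if c is None)
    active = sum(1 for c in hourly_counts if c is not None and c > 0)
    return (
        missing / total <= max_missing      # criterion 1: few missing hours
        and active / total >= min_active    # criterion 2: enough active hours
        and n_years >= min_years            # criterion 3: enough years of data
    )


good_site = [5, 0, 3, None, 2, 0, 1, 4, 0, 6]   # 10% missing, 60% active
patchy_site = [None, None, 1, 0]                # 50% missing

print(is_candidate(good_site, n_years=3), is_candidate(patchy_site, n_years=3))
```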

2.6 Source of weather data and simulation software

All daily weather data were estimates on a regular (~5 km) grid covering the whole of New Zealand, i.e., VCSN data simulated by NIWA. The estimates are produced every day, based on the spatial interpolation of actual observations made at climate stations located around the country. A thin-plate smoothing spline model is used for the spatial interpolation. This model incorporates two location variables (latitude and longitude) and a third "pattern" variable (Tait et al. 2012). The software used for the interpolation is ANUSPLIN (Hutchinson 2012). For each walk, we used the VCSN data from the grid point closest to it.
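
Picking the closest grid point can be sketched with a great-circle distance. The coordinates below are illustrative placeholders, not actual VCSN grid points or surveyed track locations, and the function names are assumptions for this example.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_grid_point(site, grid_points):
    """Return the (lat, lon) grid point closest to the site."""
    return min(grid_points, key=lambda g: haversine_km(site[0], site[1], g[0], g[1]))


site = (-39.14, 175.58)  # hypothetical walk location
grid = [(-39.125, 175.575), (-39.175, 175.625), (-39.075, 175.525)]

print(nearest_grid_point(site, grid))
```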

2.7 Simulations for weather sensitivity analysis after ANNs modelling

ANNs are not statistical models. A trained ANN is not statistically identifiable: for a given dataset and configuration, there can be an unlimited number of ANNs with different weights, which may generate very different predictions. The implicit relationships that ANNs build between inputs and outputs are usually difficult to interpret directly, let alone to use for ranking the relative importance of predictors or assessing model sensitivity. A common criticism is therefore that ANNs are 'black box' models that offer minimal insight into the relationships between input and output variables. Wang and Lu (2006b), among a few others, rebutted this concern by describing methods to extract information about such relationships from ANNs. Here, building on the solutions in Wang and Lu (2006b) in particular, we provide another method for exploring the relationship between the outcome variable, VC, and the predictor(s) of interest.

Our method, in a nutshell, explores the relationship between VC and a predictor of interest, say WS, after the ANN has been built, while holding the other covariates at constant values (their 5th, 25th, 50th, 75th and 95th percentiles).

As an illustration, suppose we want to know the effect of WS on VC at Tongariro Alpine Crossing in the Holiday and Weekend scenario.

Steps
  1. calculate the ith (e.g., 5th) percentile of the other weather covariates (T, SR and Rain) from the daily data for Holiday and Weekend days

  2. plug the ith percentile of the other weather covariates into the ANN

  3. while holding the other weather covariates constant at that percentile, vary WS from 0 to its known daily maximum and record the VC outputs from the ANN

  4. fit a linear regression of the VC outputs on WS; record and graph the fitted model: slope, intercept, p-value and R-squared

  5. the slope (-51.7) and the intercept (734) describe the WS effect on VC in this scenario, shown in Fig. 2a
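
The steps above can be sketched as one sweep for a single scenario. Everything named here is an assumption for illustration: `toy_model` is a made-up linear stand-in for the trained ANN, `sweep_effect` and its arguments are hypothetical, and the data are invented. The p-value from step 4 is omitted because it needs a t-distribution, which the Python standard library does not provide.

```python
def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def ols(x, y):
    """Simple least-squares fit of y on x: (slope, intercept, R-squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_tot = sum((b - my) ** 2 for b in y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return slope, intercept, r2


def sweep_effect(predict, scenario_rows, target, others, q, flags, n_steps=50):
    """Hold `others` at their q-th level, sweep `target` from 0 to its max, fit a line."""
    fixed = {k: percentile([r[k] for r in scenario_rows], q) for k in others}
    top = max(r[target] for r in scenario_rows)
    grid = [top * i / (n_steps - 1) for i in range(n_steps)]
    vc = [predict({**fixed, **flags, target: v}) for v in grid]
    return ols(grid, vc)


def toy_model(x):
    """Stand-in for the trained ANN; NOT the study's model."""
    return (800 - 40 * x["WS"] + 5 * x["T"] - 3 * x["Rain"] + 2 * x["SR"]
            + 100 * x["holiday"] + 80 * x["weekend"])


# Made-up daily weather for Holiday and Weekend days.
rows = [
    {"WS": 2.0, "T": 6.0, "Rain": 0.0, "SR": 20.0},
    {"WS": 7.0, "T": 9.0, "Rain": 4.0, "SR": 12.0},
    {"WS": 12.0, "T": 3.0, "Rain": 15.0, "SR": 5.0},
]

slope, intercept, r2 = sweep_effect(
    toy_model, rows, target="WS", others=["T", "Rain", "SR"],
    q=5, flags={"holiday": 1, "weekend": 1},
)
print(round(slope, 3), round(intercept, 3), round(r2, 3))
```

Because the stand-in model is linear in WS, the fitted slope simply recovers its WS coefficient. With a real ANN the response may be curved, and the fitted line is then a linear summary of it.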


In this way, for a particular weather parameter (WS here), we examined its effect under 20 possible scenarios: 5 (percentiles of the other weather covariates) × 2 (weekend or not) × 2 (holiday or not), as shown in Fig. 2.
Slopes and intercepts were then standardized, which makes the slopes of the regression lines (i.e., the weather effects) comparable. The WS effect on VC at Tongariro Alpine Crossing in the Holiday and Weekend scenario is therefore -0.78, the standardized slope.
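
The text does not state how the slopes were standardized. One common convention, assumed here purely for illustration, rescales the raw slope by the ratio of the standard deviations of the predictor and the response, which for a single-predictor regression equals the correlation coefficient and lies in [-1, 1]. The data below are made up.

```python
import math


def standardized_slope(x, y):
    """Raw OLS slope rescaled by sd(x) / sd(y).

    ASSUMPTION: this is one common definition of a standardized slope;
    the study may have used a different standardization.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    raw_slope = sxy / sxx
    return raw_slope * math.sqrt(sxx / syy)


ws = [0, 1, 2, 3, 4]
vc = [10, 8, 7, 3, 2]

beta = standardized_slope(ws, vc)
print(round(beta, 4))
```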

The overall effect of a particular weather parameter (WS here) is the sum of the absolute values of the standardized slopes across all 20 scenarios. For Tongariro Alpine Crossing, this effect was 14.157, as shown in Figs. 1, 2 and 3.

Following the same steps, we obtained the effects of the other three weather parameters on Tongariro Alpine Crossing, as shown in Figs. 1 and 3. This allows us to quantify the effect of each weather parameter separately and to rank their relative importance.
The overall weather effect on a particular walk is then the simple sum of the effects of the individual weather parameters. For Tongariro Alpine Crossing, it is 40.321.
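
The two-level aggregation described above (sum of absolute standardized slopes per parameter, then a sum across parameters) can be sketched as follows. The slope values are invented, and only three scenarios per parameter are listed instead of 20 to keep the example short.

```python
def parameter_effect(standardized_slopes):
    """Effect of one weather parameter: sum of |standardized slope| over scenarios."""
    return sum(abs(s) for s in standardized_slopes)


def overall_weather_effect(slopes_by_parameter):
    """Per-parameter effects plus their simple sum for one walk."""
    per_param = {k: parameter_effect(v) for k, v in slopes_by_parameter.items()}
    return per_param, sum(per_param.values())


# Illustrative standardized slopes (the study uses 20 scenarios per parameter).
slopes = {
    "WS": [-0.8, -0.6, -0.7],
    "T": [0.4, 0.3, 0.5],
    "Rain": [-0.9, -0.5, -0.2],
    "SR": [0.1, 0.2, 0.3],
}

per_param, total = overall_weather_effect(slopes)
ranking = sorted(per_param, key=per_param.get, reverse=True)
print(ranking, round(total, 3))
```

Taking absolute values means that positive and negative effects do not cancel: the index measures the strength of the weather influence, not its direction.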

We then computed the individual and overall weather effects for the other eligible candidate sites selected as per Section 2.5; the results are shown in Fig. 1.

The overall weather effects were then scaled to the range [0,1], which allows us to rank weather impact by site, as shown in Fig. 1 and Map 1.
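
The text does not say which scaling was used. A minimal sketch assuming min-max scaling across sites is shown below; the site names and values are placeholders.

```python
def min_max_scale(effects_by_site):
    """Scale overall weather effects to [0, 1] across sites.

    ASSUMPTION: min-max scaling; the study's exact scaling is not stated.
    If all sites share the same value, every site is mapped to 0.0.
    """
    lo = min(effects_by_site.values())
    hi = max(effects_by_site.values())
    span = hi - lo
    if span == 0:
        return {site: 0.0 for site in effects_by_site}
    return {site: (v - lo) / span for site, v in effects_by_site.items()}


overall = {"site_a": 40.0, "site_b": 25.0, "site_c": 10.0}  # hypothetical sites

scaled = min_max_scale(overall)
ranked = sorted(scaled, key=scaled.get, reverse=True)
print(scaled, ranked)
```

Under this assumption the least weather-sensitive site always scores 0 and the most sensitive scores 1, so the index is relative to the set of sites modelled.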

Fig. 2. Wind effects on Tongariro Xing Mangatepopo Tk (Tongariro Alpine Crossing) under 20 scenarios, and the overall wind effect



Fig. 3. How the overall weather effects (before scaling) are calculated under 20 scenarios for Tongariro Xing Mangatepopo Tk (Tongariro Alpine Crossing)



3. Case studies: weather impact on sites with close geo-locations

Track sites that are geographically close are expected to experience similar weather impact (especially when visitation levels are similar). It is interesting to test whether the AI models reproduce this.

Below we single out one such pair of sites, "Great Lake Trail, Kawakawa Bay" and "Great Lake Trail, W2K Track", and show how similar the weather impact on their visitation is.


4. Results and Discussions

4.1 Classification of weather impact across NZ

Four levels were used to classify weather impact on visitation across NZ:

  1. Level 1: weak. All weather Index (sum effect of Rain, T, WS, SR) is in the range of 0 to 0.25

  2. Level 2: lower medium. All weather Index (sum effect of Rain, T, WS, SR) is in the range of 0.25 to 0.5

  3. Level 3: higher medium. All weather Index (sum effect of Rain, T, WS, SR) is in the range of 0.5 to 0.75

  4. Level 4: strong. All weather Index (sum effect of Rain, T, WS, SR) is in the range of 0.75 to 1
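
The four levels map directly onto a threshold function. The listed ranges share their boundary values (0.25, 0.5, 0.75), so this sketch assumes that each boundary belongs to the lower level; the function name is illustrative.

```python
def weather_level(index):
    """Map a scaled all-weather index in [0, 1] to (level, label).

    ASSUMPTION: boundary values (0.25, 0.5, 0.75) fall in the lower level.
    """
    if not 0.0 <= index <= 1.0:
        raise ValueError("index must lie in [0, 1]")
    if index <= 0.25:
        return 1, "weak"
    if index <= 0.5:
        return 2, "lower medium"
    if index <= 0.75:
        return 3, "higher medium"
    return 4, "strong"


for value in (0.1, 0.25, 0.6, 1.0):
    print(value, weather_level(value))
```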

Weather impact appears stronger on walks in the west and north-west of NZ.

5. References

Stefaniak B, Cholewiński W, Tarkowska A. Algorithms of Artificial Neural Networks - Practical application in medical science. Polski Merkuriusz Lekarski 2005;19:819-22.
Hornik K (1991). “Approximation Capabilities of Multilayer Feedforward Networks.” Neural Networks, 4(2), 251–257. doi:10.1016/0893-6080(91)90009-t.
Wang, D, Lu, WZ, 2006a, Ground-level ozone prediction using multi-layer perceptron trained with an innovative hybrid approach, Ecological Modelling, 198(3-4), 332-340
Lu, WZ, Wang, D, 2014, Learning machines: Rationale and application in ground-level ozone prediction, Applied Soft Computing, 24, 135-141.
Burnham, K. P., and D. R. Anderson. Model Selection and Multi-Model Inference: A Practical Information-Theoretic Approach. New York: Springer, 2010.
R Core Team, 2019. R: A Language and Environment for Statistical Computing. Version 3.2.0. R Foundation for Statistical Computing, Vienna, Austria.
Shanmuganathan, Subana, and Sandhya Samarasinghe. Artificial Neural Network Modelling: Subana Shanmuganathan, Sandhya Samarasinghe, Editors. Cham: Springer, 2016.
Tait A, Sturman J, Clark M, 2012. An assessment of the accuracy of interpolated daily rainfall for New Zealand. Journal of Hydrology (NZ), 51(1), 25-44.
Hutchinson M, 2012. ANUSPLIN version 4.3. https://researchers.anu.edu.au/publications/38018
Wang, D, Lu, WZ, 2006b, Interval estimation of urban ozone level and selection of influential factors by employing automatic relevance determination model, Chemosphere, 62, 1600-1611.

6. Acknowledgement

This work benefited from numerous discussions with DOC statisticians Ian Westbrooke and Helene Thygesen and NIWA climate scientist Andrew Tait. Zac Taylor took great care in quality-assuring large volumes of counter data. We also thank NIWA staff Andrew Tait, Petra Pearce and Gregor Macara, who kindly provided the VCSN weather data.