Developing information-theoretic models for testing the power and significance of pinniped survival rate estimates using differing monitoring techniques.

The effective monitoring and assessment of trends in protected, endangered or declining animal populations is an important tool for management and conservation organizations. Several life history parameters need to be accurately estimated to provide valid population models, including survival rates (preferably specific to age class and sex), reproductive rates and recruitment, to name only the most salient. Different techniques for collecting such vital data and monitoring populations of interest have been used historically, and new techniques are being developed. Classic techniques for monitoring survival include longitudinal mark-resight studies using permanent animal identification such as hot-branding, or VHF-based telemetry transmitters (Burnham et al. 1987, White & Burnham 1999), and the evaluation of age structure information collected in regions or rookeries through cross-sectional, one-time sampling (Holmes & York 2003). Mark-resight studies have been used not only to assess survival rates and population trends, but also to estimate population sizes (e.g., Ries et al. 1998). In fact, mark-recapture models were originally developed for abundance estimates, and not for the determination of survival rates or population trends (Schwarz 2001). Various non-VHF telemetry devices have also been used on fish (Graves et al. 2002, Goodyear 2002) and pinnipeds (McConnel et al. 2004). Early approaches to mark-resight studies required specifically structured resight efforts, whereas more recent approaches based on the Cormack-Jolly-Seber model allow the use of opportunistic resight data (see Link & Barker 2005). All of these approaches, including those using classic telemetry transmitters, have to either assess or assume dispersal and emigration rates and patterns (Barker 1997, White & Burnham 1999).

Recently, we developed a new type of telemetry transmitter specifically designed for the collection of long-term, longitudinal data from individual animals, without any spatial constraints or resight effort (Horning & Hill 2005). These satellite-linked "Life History Transmitters" (LHX tags) will provide information on the date an animal dies, irrespective of location. Thus, these devices will provide data of a new structure, with unrestricted spatial resolution and temporal survival resolution of one day. In addition, the resight effort is essentially unlimited, since each animal will be effectively monitored continuously throughout its life.

The recent development of long-term telemetry devices that are not spatially constrained, for the assessment of survival rates and/or population trends, calls for appropriate approaches to determining the statistical power of such sampling and the statistical significance of effects. Classic approaches to power and significance testing have substantial limitations. Many classic tests are parametric and carry inherent assumptions about the nature of population sampling. These assumptions may not be equally satisfied by different monitoring techniques, which makes a comparative assessment of the suitability of those techniques difficult to conduct (although some attempts have been made, see Link & Barker 2005). In addition, many approaches are based on an analysis of variance and/or linear relationships, and thus require either very large sample sizes or substantial effects to avoid incurring Type II errors, that is, failing to detect a real effect (Gerrodette 1987).

In recent years, many authors have questioned the appropriateness of testing the significance of divergent population trends against arbitrary probability thresholds. In particular, null-hypothesis testing based on arbitrary P value thresholds has been severely criticized (Berger & Sellke 1987, Johnson 1995, Anderson et al. 2000). Many authors have instead suggested an information-theoretic comparison based on an approach first presented by H. Akaike (Akaike 1974). This approach, now known as the application of Akaike’s Information Criterion (AIC), has gained acceptance and use in recent years, although it is not without its critics (DeLeeuw 1992, Buckland et al. 1997, Burnham & Anderson 1998, Anderson et al. 2000). In this approach, multiple alternative models that could explain an observed effect are constructed and then ranked by how well each is supported by the data relative to the number of parameters it uses. Such an approach would seem better suited to a comparison of different population monitoring techniques, in particular a comparison of sensitivity and required sample sizes. The latter is of particular interest when comparing techniques with differing degrees of invasiveness, such as hot-branding or implanted telemetry devices. One reason for the development of LHX tags is the assumption that a substantially smaller sample size is required, as a result of the new data structure these tags provide. However, sample sizes cannot currently be compared directly, since LHX and branding sample designs have to meet different assumptions.
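As an illustration only of how such a ranking works, the following minimal Python sketch compares two candidate survival models with AIC and converts the AIC values into Akaike weights. The data, the binomial survival model and all function names are hypothetical assumptions made for this example; they are not taken from the project.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def akaike_weights(aic_values):
    """Convert AIC values into Akaike weights (relative support, summing to 1)."""
    best = min(aic_values)
    relative = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(relative)
    return [r / total for r in relative]

def binomial_log_likelihood(survivors, n, rate):
    """Log-likelihood of `survivors` out of `n` animals under survival
    probability `rate`. The binomial coefficient is omitted because it is
    identical for every model fitted to the same data."""
    return survivors * math.log(rate) + (n - survivors) * math.log(1 - rate)

# Hypothetical data: (survivors, animals monitored) for two age classes.
juveniles = (30, 50)
adults = (45, 50)

# Model A (assumed): one survival rate shared by both classes, 1 parameter.
pooled = (juveniles[0] + adults[0]) / (juveniles[1] + adults[1])
ll_a = (binomial_log_likelihood(*juveniles, pooled)
        + binomial_log_likelihood(*adults, pooled))

# Model B (assumed): a separate survival rate per class, 2 parameters.
ll_b = (binomial_log_likelihood(*juveniles, juveniles[0] / juveniles[1])
        + binomial_log_likelihood(*adults, adults[0] / adults[1]))

aics = [aic(ll_a, 1), aic(ll_b, 2)]
weights = akaike_weights(aics)
print("AIC:", aics)
print("Akaike weights:", weights)
```

The model with the lowest AIC receives the largest weight; the extra parameter of Model B is only rewarded if it improves the fit by more than the penalty of 2 AIC units.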

Current approaches to determining the statistical power and significance of survival rate estimates derived from data collected by various methods have substantial limitations. Different assumptions and parameters apply to these methods; they cannot always be assessed, and they preclude a direct comparison of optimal sampling regimes and experimental designs. In addition, the applicability of null-hypothesis testing using arbitrary probability thresholds has been questioned. The recent trend toward information-theoretic ranking of multiple models has not yet been extended to such efforts, and to power testing in particular. We propose to develop a new method to estimate the power and sensitivity of different methods for estimating survival rates in wild pinniped populations. This new method will combine information-theoretic approaches with randomization techniques, and will be directly suited to analyzing data from a new telemetry transmitter recently developed specifically for such applications.

This project is supported by:

North Pacific Universities Marine Mammal Research Consortium through the North Pacific Marine Science Foundation.

We are developing a novel approach to power and significance testing for survival data collected by different techniques. The new approach will combine the ranking of multiple models using Akaike’s Information Criterion with randomization procedures. Randomization tests are in many ways the most basic statistical tests (see Manly 1997). A randomization procedure directly calculates the probability that a given type of pattern appears in a data set under the null hypothesis, which states that the observed pattern has appeared purely by chance in a random set of observations. A randomization test thus seeks to determine whether the null hypothesis is reasonable for a given data set. For such a test, a test statistic St is chosen that quantifies the observed pattern (e.g., a correlation coefficient). The observed value of St is then compared to the distribution of St obtained when the data set is repeatedly reorganized at random (randomized). If the null hypothesis is true, then all possible orderings of the data are equally likely to occur, and the observed value of St should look like a typical value from this distribution. While a comparison to such a null hypothesis of complete randomness is often of little interest, other comparisons are permissible and more useful, such as comparisons to previously published patterns or theoretically likely ones. The significance of St can be calculated as the proportion of randomized values equally or more extreme than the observed value of St. Randomization procedures are ideally suited for the analysis of large, finite data sets, and in particular for the analysis of telemetered time series data (see Horning & Trillmich 1999).
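The procedure described above can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the test developed in this project: the test statistic is a Pearson correlation coefficient, the data values are invented, and the names `pearson_r` and `randomization_test` are hypothetical.

```python
import random

def pearson_r(x, y):
    """Pearson correlation coefficient, used here as the test statistic St."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def randomization_test(x, y, n_perm=5000, seed=1):
    """Compare the observed St with its distribution under random
    re-pairings of x and y (reshuffling without replacement)."""
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= abs(observed):
            extreme += 1
    # The observed arrangement counts as one of the possible orderings.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value

# Invented example data: age in years and a survival index per age.
age = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
index = [0.52, 0.58, 0.61, 0.66, 0.70, 0.69, 0.75, 0.78, 0.80, 0.84]

observed, p_value = randomization_test(age, index)
print("St =", observed, "p =", p_value)
```

The returned significance is the proportion of randomized values of St that are at least as extreme as the observed one, so its resolution is limited by the number of permutations drawn.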

Randomization has two distinct advantages and two distinct disadvantages. Randomization delivers valid significance levels without the random sampling from a larger data set that "conventional" statistics require, and as a direct consequence randomization procedures are largely exempt from the distributional restrictions that apply to conventional parametric statistics. However, and for the same reason, results from randomization tests cannot be directly extrapolated to a larger, sampled data set; results initially apply only within the complete data set at hand. This restriction, however, is irrelevant when dealing with a model of a population, where an entire population can be simulated, producing a "complete" data set. Simulating an entire population furthermore addresses the second disadvantage of randomization tests: the difficulty in dealing with small data sets. Small data sets do not offer enough distinct permutations to obtain accurate significance levels. This latter shortcoming can also be addressed with special modifications of randomization procedures: bootstrapping and Monte-Carlo simulations. Bootstrapping is a special case of randomization statistics, in which a limited sample from a larger data set is itself resampled. Knowing nothing about a sampled population other than the sample itself, we can approximate what might happen were the population to be resampled by resampling the sample. In other words, the distribution of the sample is taken as the best available descriptor of the distribution in the population itself. The difference from general randomization procedures is that in bootstrapping each element drawn is returned to the data set before the next draw, so an element can appear more than once in a resample: bootstrapping is a "resampling with replacement" procedure, whereas conventional randomization procedures resample without replacement.
In a special case, the Monte-Carlo simulation, resampling is achieved by using an assumed model. Some authors, however, consider randomization tests to be a special case of Monte-Carlo simulations, in which the assumed model is complete randomness (all data pairings are equally likely). Some of the recent efforts to integrate differently structured data for the modeling of demographics are based on Bayesian analysis, using Monte-Carlo simulations with a Markov chain as the assumed model (Link & Barker 2005). However, because such a model is designed to assess population demographics, many life history factors need to be parameterized, including some that may not be identifiable.
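To make the distinction between resampling with and without replacement concrete, here is a minimal Python sketch of a percentile bootstrap for an annual survival proportion. The sample, the choice of a percentile interval and the function name `bootstrap_ci` are assumptions made for illustration only.

```python
import random

def mean(values):
    return sum(values) / len(values)

def bootstrap_ci(sample, statistic, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap interval: resample the sample itself WITH
    replacement and take quantiles of the recomputed statistic."""
    rng = random.Random(seed)
    n = len(sample)
    stats = sorted(
        statistic(rng.choices(sample, k=n))  # elements may repeat
        for _ in range(n_boot)
    )
    lower_idx = int((alpha / 2) * n_boot)
    upper_idx = n_boot - lower_idx - 1
    return stats[lower_idx], stats[upper_idx]

# Hypothetical sample: 1 = survived the year, 0 = died (18 of 25 survived).
outcomes = [1] * 18 + [0] * 7

low, high = bootstrap_ci(outcomes, mean)
print("observed survival:", mean(outcomes))
print("bootstrap 95% interval:", (low, high))
```

A permutation of `outcomes` would always contain exactly 18 survivors, so its mean never changes; only resampling with replacement produces the variation from which the interval is read.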

  1. Develop an effective power test for comparing survival rates between a simulated population and a sampled population, based on an information-theoretic approach and independent of the type of data collection/structure.
  2. Use this power test to compare sensitivity and sample sizes for three different methods for determining survival rates, using the example of the Steller sea lion.
  3. Use the power test to assess the effects of seasonality in survival on sensitivity/required sample sizes for three different survival rate measures.
  4. Use the power test to assess the effects of ontogenetic changes in annual survival rates on sensitivity/required sample sizes for three different survival rate measures.
  5. Develop the above approach into a test that will permit the comparison of changes in measured juvenile survival rates for Steller sea lions, using different monitoring techniques.
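
One way to picture how simulation and information-theoretic model ranking might be combined into a power estimate is the following Python sketch. It is a hedged illustration, not the method this project will develop: it assumes simple binomial annual survival in two equally sized groups, treats a difference of more than 2 AIC units as "detecting" a difference, and uses hypothetical function names and survival rates throughout.

```python
import math
import random

def log_lik(survivors, n, rate):
    """Binomial log-likelihood (constant term omitted)."""
    # Clamp the rate to avoid log(0) when all or no animals survive.
    rate = min(max(rate, 1e-9), 1 - 1e-9)
    return survivors * math.log(rate) + (n - survivors) * math.log(1 - rate)

def delta_aic(s1, s2, n):
    """AIC(shared-rate model) minus AIC(separate-rates model)
    for two groups of n animals with s1 and s2 survivors."""
    pooled = (s1 + s2) / (2 * n)
    ll_shared = log_lik(s1, n, pooled) + log_lik(s2, n, pooled)
    ll_separate = log_lik(s1, n, s1 / n) + log_lik(s2, n, s2 / n)
    return (2 * 1 - 2 * ll_shared) - (2 * 2 - 2 * ll_separate)

def simulate_survivors(n, rate, rng):
    """Number of animals out of n that survive, each with probability rate."""
    return sum(rng.random() < rate for _ in range(n))

def estimated_power(n, rate_a, rate_b, n_sim=1000, threshold=2.0, seed=1):
    """Fraction of simulated studies in which the separate-rates model is
    favored by more than `threshold` AIC units (an assumed cut-off)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        s1 = simulate_survivors(n, rate_a, rng)
        s2 = simulate_survivors(n, rate_b, rng)
        if delta_aic(s1, s2, n) > threshold:
            hits += 1
    return hits / n_sim

# Hypothetical survival rates: 0.85 (baseline) versus 0.70 (reduced).
power_small = estimated_power(20, 0.85, 0.70)
power_large = estimated_power(200, 0.85, 0.70)
print("n = 20 per group:", power_small)
print("n = 200 per group:", power_large)
```

Repeating the calculation over a range of sample sizes would trace out a power curve; a real comparison of monitoring techniques would additionally have to model resight probability, emigration and the other technique-specific assumptions discussed above.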

Akaike, H. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control. 19:716-723. 

Anderson, D.R., K.P. Burnham and W.L. Thompson. 2000. Null Hypothesis Testing: Problems, Prevalence, and an Alternative. J. Wildl. Manage. 64(4): 912-923.

Barker, R.J. 1997. Joint modeling of live-recapture, tag-resight, and tag-recovery data. Biometrics. 53: 666-677.

Berger, J.O. and T. Sellke. 1987. Testing a point null hypothesis: the irreconcilability of P values and evidence. J. Am. Stat. Assoc. 82: 112-122.

Buckland, S.T., K.P. Burnham and N.H. Augustin. 1997. Model selection: an integral part of inference. Biometrics. 53: 603-618.

Burnham, K.P., D.R. Anderson, G.C. White, C. Brownie and K.H. Pollock. 1987. Design and analysis methods for fish survival experiments based on release-recapture. American Fisheries Society Monograph 5.

Burnham, K.P. and D.R. Anderson. 1998. Model selection and inference: a practical information-theoretic approach. Springer Verlag, Berlin, Germany.

DeLeeuw, J. 1992. Introduction to Akaike (1973) information theory and an extension of the maximum likelihood principle. Pages 599-609 in S. Kotz and N.L. Johnson, editors: Breakthroughs in Statistics. Volume 1. Springer Verlag, London, UK.

Gerrodette, T. 1987. A Power Analysis for Detecting Trends. Ecology. 68(5): 1364-1372.

Graves, J. E., B.E. Luckhurst and E.D. Prince. 2002. An evaluation of pop-up satellite tags for estimating postrelease survival of blue marlin (Makaira nigricans) from a recreational fishery. Fish. Bull. 100: 134-142.

Goodyear, C.P. 2002. Factors affecting robust estimates of the catch and release mortality using pop-up tag technology. In: Symposium on catch and release in marine recreational fisheries, A. Studholme, E.D. Prince, and J. Lucy, eds., Spec. Pub. Am. Fish. Soc., pp. 172-179.

Holmes, E.E. and A.E. York. 2003. Using Age Structure to Detect Impacts on Threatened Populations: a Case Study with Steller Sea Lions. Conservation Biology. 17(6): 1794-1806.

Horning, M. and F. Trillmich. 1999. Lunar cycles in diel prey migrations exert a stronger effect on the diving of juveniles than adult Galapagos fur seals. Proc. Roy. Soc. Lond. B. 266: 1127-1132.

Horning, M. and R.D. Hill. 2005. Designing an archival satellite transmitter for life-long deployments on oceanic vertebrates: The Life History Transmitter. IEEE Journal of Oceanic Engineering. 30(4): 807-817.

Johnson, D.H. 1995. The insignificance of statistical significance testing. J. Wildl. Manage. 63: 763-772.

Manly, B.F.J. 1997. Randomization, bootstrap and Monte-Carlo methods in biology. 2nd edition. Chapman & Hall, London, UK. 399 pp.

McConnel, B.C., R. Beaton, E. Bryant, C. Hunter, P. Lovell and A. Hall. 2004. Phoning home - a new GSM mobile phone telemetry system to collect mark-recapture data. Mar. Mamm. Sci. 20: 274-283.

Ries, E.H., L.R. Hiby and P.J.H. Reijnders. 1998. Maximum likelihood population size estimation of harbour seals in the Dutch Wadden Sea based on a mark-recapture experiment. J. Appl. Ecol. 35: 332-339.

Schwarz, C.J. 2001. The Jolly-Seber model: More than just abundance. J. Appl. Agr. Biol. Env. Stat. 6: 195-205.

White, G.C. and K.P. Burnham. 1999. MARK – survival estimation from populations of marked animals. Bird Study. 46 (supplement): S120-S139.

Willis, K. and M. Horning. 2005. A novel approach to measuring heat flux in swimming animals. J. Exp. Mar. Biol. Ecol. 315(2): 147-162.

Willis, K., M. Horning, D.A.S. Rosen and A.W. Trites. 2005. Spatial variation in heat flux in Steller sea lions: evidence for consistent avenues of heat exchange along the body trunk. J. Exp. Mar. Biol. Ecol. 315(2): 163-175.