Title: | Exploring Water Quality Monitoring Data |
---|---|
Description: | Functions to assist in the processing and exploration of data from environmental monitoring programs. The package name stands for "water quality" and reflects the original focus on time series data for physical and chemical properties of water, as well as the biota. Intended for programs that sample approximately monthly, quarterly or annually at discrete stations, a feature of many legacy data sets. Most of the functions should be useful for analysis of similar-frequency time series regardless of the subject matter. |
Authors: | Alan Jassby [aut], James Cloern [ctb], Jemma Stachelek [ctb, cre] |
Maintainer: | Jemma Stachelek <[email protected]> |
License: | GPL-2 |
Version: | 1.0.2 |
Built: | 2025-02-21 06:16:53 UTC |
Source: | https://github.com/jsta/wql |
One of the more tedious tasks in exploring environmental data sets is creating usable time series from the original complex data sets, especially when you want many series at will that group data in different ways. So wql also provides a way of transforming data sets to a common format that then allows a diversity of time series to be created quickly. A few functions are specific to the fields of limnology and oceanography.
The plots are designed for easy use, not for publication-quality graphs.
Nonetheless, extensive customization is possible by passing options through
...{}
, adding annotations in the case of base graphics, and adding
layers in the case of ggplot2 objects.
Two functions are used mainly for preparing the times series:
a function that transforms incoming data to a common data
structure in the form of the WqData
class
a function that easily prepares time series objects from this class
The WqData
class can be easily adapted to non-aquatic data.
Obviously, the depth
field can be used for elevation in atmospheric
studies. But more generally, the site
and depth
fields can be
used for many two-way classifications and don't need to refer to spatial
location.
Some of the time series functions include:
a variety of plots to examine changes in seasonal patterns
nonparametric trend tests
time series interpolation and related manipulations
a simple decomposition of a series into different time scales
phenological analyses
the use of empirical orthogonal functions to detect multiple independent mechanisms underlying temporal change
A few functions are specialized for the aquatic sciences:
converting between oxygen concentrations and percent saturation
converting between salinity and conductivity
The capabilities of wql are more fully explained in the accompanying vignette: “wql: Exploring environmental monitoring data”.
Maintainer: Jemma Stachelek [email protected] [contributor]
Authors:
Alan Jassby
Other contributors:
James Cloern [contributor]
Useful links:
Report bugs at https://github.com/jsta/wql/issues
date2decyear
date2decyear(w)
date2decyear(w)
w |
date |
Alan Jassby, James Cloern
A class union of "Date"
and "POSIXct"
classes.
A virtual Class: No objects may be created from it.
showClass("DateTime")
showClass("DateTime")
The function decomposes a time series into a long-term mean, annual, seasonal and "events" component. The decomposition can be multiplicative or additive, and based on median or mean centering.
decompTs( x, event = TRUE, type = c("mult", "add"), center = c("median", "mean") )
decompTs( x, event = TRUE, type = c("mult", "add"), center = c("median", "mean") )
x |
a monthly time series vector |
event |
whether or not an "events" component should be determined |
type |
the type of decomposition, either multiplicative ("mult") or additive ("add") |
center |
the method of centering, either median or mean |
The rationale for this simple approach to decomposing a time series, with examples of its application, is given by Cloern and Jassby (2010). It is motivated by the observation that many important events for estuaries (e.g., persistent dry periods, species invasions) start or stop suddenly. Smoothing to extract the annualized term, which can disguise the timing of these events and make analysis of them unnecessarily difficult, is not used.
A multiplicative decomposition will typically be useful for a biological community- or population-related variable (e.g., chlorophyll-a) that experiences exponential changes in time and is approximately lognormal, whereas an additive decomposition is more suitable for a normal variable. The default centering method is the median, especially appropriate for series that have large, infrequent events.
If event = TRUE
, the seasonal component represents a recurring
monthly pattern and the events component a residual series. Otherwise, the
seasonal component becomes the residual series. The latter is appropriate
when seasonal patterns change systematically over time. You can use
plotSeason
and seasonTrend
to investigate the
way seasonality changes.
A monthly time series matrix with the following individual time series:
original |
original time series |
annual |
annual mean series |
seasonal |
repeating seasonal component |
events |
optionally, the residual or "events" series |
Alan Jassby, James Cloern
Cloern, J.E. and Jassby, A.D. (2010) Patterns and scales of phytoplankton variability in estuarine-coastal ecosystems. Estuaries and Coasts 33, 230–241.
# Apply the function to a single series (Station 27) and plot it: y <- decompTs(sfbayChla[, 's27']) y plot(y, nc=1, main="")
# Apply the function to a single series (Station 27) and plot it: y <- decompTs(sfbayChla[, 's27']) y plot(y, nc=1, main="")
decyear2date
decyear2date(x)
decyear2date(x)
x |
date |
Alan Jassby, James Cloern
Electrical conductivity data are converted to salinity using the Practical Salinity Scale and an extension for salinities below 2.
ec2pss(ec, t, p = 0)
ec2pss(ec, t, p = 0)
ec |
conductivity, mS/cm |
t |
temperature, Celsius |
p |
gauge pressure, decibar |
ec2pss
converts electrical conductivity data to salinity using the
Practical Salinity Scale 1978 in the range of 2-42 (Fofonoff and Millard
1983). Salinities below 2 are calculated using the extension of the
Practical Salinity Scale (Hill et al. 1986).
R2pss
is the same function, except that conductivity ratios rather
than conductivities are used as input.
ec2pss
and R2pss
both return salinity values on the
Practical Salinity Scale.
Input pressures are not absolute pressures but rather gauge pressures. Gauge pressures are measured relative to 1 standard atmosphere, so the gauge pressure at the surface is 0.
Alan Jassby, James Cloern
Fofonoff N.P. and Millard Jr R.C. (1983) Algorithms for Computation of Fundamental Properties of Seawater. UNESCO Technical Papers in Marine Science 44. UNESCO, Paris, 53 p.
Hill K.D., Dauphinee T.M. and Woods D.J. (1986) The extension of the Practical Salinity Scale 1978 to low salinities. IEEE Journal of Oceanic Engineering 11, 109-112.
# Check values from Fofonoff and Millard (1983): R = c(1, 1.2, 0.65) t = c(15, 20, 5) p = c(0, 2000, 1500) R2pss(R, t, p) # 35.000 37.246 27.995 # Repeat calculation with equivalent conductivity values by setting # ec <- R * C(35, 15, 0): ec = c(1, 1.2, 0.65) * 42.9140 ec2pss(ec, t, p) # same results
# Check values from Fofonoff and Millard (1983): R = c(1, 1.2, 0.65) t = c(15, 20, 5) p = c(0, 2000, 1500) R2pss(R, t, p) # 35.000 37.246 27.995 # Repeat calculation with equivalent conductivity values by setting # ec <- R * C(35, 15, 0): ec = c(1, 1.2, 0.65) * 42.9140 ec2pss(ec, t, p) # same results
Finds and rotates empirical orthogonal functions (EOFs).
eof(x, n, scale. = TRUE)
eof(x, n, scale. = TRUE)
x |
a data frame or matrix, with no missing values |
n |
number of EOFs to retain for rotation |
scale. |
logical indicating whether the (centered) variables should be scaled to have unit variance |
EOF analysis is used to study patterns of variability (“modes”) in a matrix time series and how these patterns change with time (“amplitude time series”). Hannachi et al. (2007) give a detailed discussion of this exploratory approach with emphasis on meteorological data. In oceanography and climatology, the time series represent observations at different spatial locations (columns) over time (rows). But columns can also be seasons of the year (Jassby et al. 1999) or even a combination of seasons and depth layers (Jassby et al. 1990). EOF analysis uses the same techniques as principal component analysis, but the time series are observations of the same variable in the same units. Scaling the data is optional, but it is the default here.
Eigenvectors (unscaled EOFs) and corresponding eigenvalues (amount of
explained variance) are found by singular value decomposition of the
centered and (optionally) scaled data matrix using prcomp
. In
order to facilitate a physical interpretation of the variability modes, a
subset consisting of the n
most important EOFs is rotated (Richman
1986). eofNum
can be used to help choose n
. Hannachi et
al. (2007) recommend orthogonal rotation of EOFs scaled by the square root
of the corresponding eigenvalues to avoid possible computation problems and
reduce sensitivity to the choice of n
. We follow this recommendation
here, using the varimax
method for the orthogonal rotation.
Note that the signs of the EOFs are arbitrary.
A list with the following members:
REOF |
a matrix with rotated EOFs |
amplitude |
a matrix with amplitude time series of REOFs |
eigen.pct |
all eigenvalues of correlation matrix as percent of total variance |
variance |
variance explained by retained EOFs |
Alan Jassby, James Cloern
Hannachi, A., Jolliffe, I.T., and Stephenson, D.B. (2007) Empirical orthogonal functions and related techniques in atmospheric science: A review. International Journal of Climatology 27, 1119–1152.
Jassby, A.D., Powell, T.M., and Goldman, C.R. (1990) Interannual fluctuations in primary production: Direct physical effects and the trophic cascade at Castle Lake, California (USA). Limnology and Oceanography 35, 1021–1038.
Jassby, A.D., Goldman, C.R., Reuter, J.E., and Richards, R.C. (1999) Origins and scale dependence of temporal variability in the transparency of Lake Tahoe, California-Nevada. Limnology and Oceanography 44, 282–294.
Richman, M. (1986) Rotation of principal components. Journal of Climatology 6, 293–335.
eofNum
, eofPlot
,
monthCor
, ts2df
# Create an annual matrix time series chla1 <- aggregate(sfbayChla, 1, mean, na.rm = TRUE) chla1 <- chla1[, 1:12] # remove stations with missing years # eofNum (see examples) suggests n = 1 eof(chla1, 1)
# Create an annual matrix time series chla1 <- aggregate(sfbayChla, 1, mean, na.rm = TRUE) chla1 <- chla1[, 1:12] # remove stations with missing years # eofNum (see examples) suggests n = 1 eof(chla1, 1)
Plots the variances associated with empirical orthogonal functions (EOF). Useful for deciding how many EOFs to retain for rotation.
eofNum(x, n = nrow(x), scale. = TRUE)
eofNum(x, n = nrow(x), scale. = TRUE)
x |
a data frame or matrix, with no missing values |
n |
effective sample size |
scale. |
logical indicating whether the (centered) variables should be scaled to have unit variance |
Calculates the eigenvalues from an EOF analysis, as described in
eof
. The eigenvalues are plotted against eigenvalue number
(sometimes called a “scree plot”), and the cumulative variance as %
of total is plotted over each eigenvalue. The approximate 0.95 confidence
limits are depicted for each eigenvalue using North et al.'s (1982)
rule-of-thumb, which ignores any autocorrelation in the data. If the
autocorrelation structure is assessed separately and can be expressed in
terms of effective sample size (e.g., Thiebaux and Zwiers 1984), then
n
can be set equal to this number.
There is no universal rule for deciding how many of the EOFs should be retained for rotation (Hannachi et al. 2007). In practice, the number is chosen by requiring a minimum cumulative variance, looking for a sharp break in the spectrum, requiring that confidence limits not overlap, various Monte Carlo methods, or many other techniques. The plot produced here enables the first three methods.
A plot of the eigenvectors.
Alan Jassby, James Cloern
Hannachi, A., Jolliffe, I.T., and Stephenson, D.B. (2007) Empirical orthogonal functions and related techniques in atmospheric science: A review. International Journal of Climatology 27, 1119–1152.
North, G., Bell, T., Cahalan, R., and Moeng, F. (1982) Sampling errors in the estimation of empirical orthogonal functions. Monthly Weather Review 110, 699–706.
Thiebaux H.J. and Zwiers F.W. (1984) The interpretation and estimation of effective sample sizes. Journal of Climate and Applied Meteorology 23, 800–811.
eof
, interpTs
, monthCor
,
eofPlot
# Create an annual time series data matrix from sfbay chlorophyll data # Average over each year chla1 <- aggregate(sfbayChla, 1, mean, na.rm = TRUE) chla1 <- chla1[, 1:12] # remove stations with missing years eofNum(chla1) # These stations appear to act as one with respect to chlorophyll # variability on the annual scale because there's one dominant EOF.
# Create an annual time series data matrix from sfbay chlorophyll data # Average over each year chla1 <- aggregate(sfbayChla, 1, mean, na.rm = TRUE) chla1 <- chla1[, 1:12] # remove stations with missing years eofNum(chla1) # These stations appear to act as one with respect to chlorophyll # variability on the annual scale because there's one dominant EOF.
Plots the rotated empirical orthogonal functions or amplitude time series
resulting from eof
.
eofPlot(x, type = c("coef", "amp"), rev = FALSE, ord = FALSE)
eofPlot(x, type = c("coef", "amp"), rev = FALSE, ord = FALSE)
x |
result of the function |
type |
whether the EOF coefficients or amplitudes should be plotted |
rev |
logical indicating whether coefficients and amplitudes should be
multiplied by |
ord |
logical indicating whether coefficients should be ordered by size |
When the columns of the original data have a natural order, such as stations
along a transect or months of the year, there may be no need to reorder the
EOF coefficients. But if there is no natural order, such as when columns
represents disparate sites around the world, the plot can be more
informative if coefficients are ordered by size (ord = TRUE
).
Coefficients and amplitudes for a given EOF may be more easily interpreted
if rev = TRUE
, because the sign of the first coefficient is
arbitrarily determined and all the other signs follow from that choice.
A plot of the EOF coefficients or amplitudes.
Alan Jassby, James Cloern
# Create an annual matrix time series chla1 <- aggregate(sfbayChla, 1, mean, na.rm = TRUE) chla1 <- chla1[, 1:12] # remove stations with missing years # eofNum (see examples) suggests n = 1 e1 <- eof(chla1, n = 1) eofPlot(e1, type = 'coef') eofPlot(e1, type = 'amp')
# Create an annual matrix time series chla1 <- aggregate(sfbayChla, 1, mean, na.rm = TRUE) chla1 <- chla1[, 1:12] # remove stations with missing years # eofNum (see examples) suggests n = 1 e1 <- eof(chla1, n = 1) eofPlot(e1, type = 'coef') eofPlot(e1, type = 'amp')
Imterpolates or substitutes missing data in a time series for gaps up to a specified size.
interpTs( x, type = c("linear", "series.median", "series.mean", "cycle.median", "cycle.mean"), gap = NULL )
interpTs( x, type = c("linear", "series.median", "series.mean", "cycle.median", "cycle.mean"), gap = NULL )
x |
object of class |
type |
method of interpolation or substitution |
gap |
maximum gap to be replaced |
When type = "linear"
, the function performs linear interpolation of
any NA
runs of length smaller than or equal to gap
. When
gap = NULL
, gaps of any size will be replaced. Does not change
leading or trailing NA
runs. This interpolation approach is best for
periods of low biological activity when sampling is routinely suspended.
When type = "series.median"
or "series.mean"
, missing values
are replaced by the overall median or mean, respectively. This may be
desirable when missing values are not allowed but one wants, for example, to
avoid spurious enhancement of trends.
When type = "cycle.median"
or type = "cycle.mean"
, missing
values are replaced by the median or mean, respectively, for the same cycle
position (i.e., same month, quarter, etc., depending on the frequency). This
may give more realistic series than using the overall mean or median.
Intended for time series but first three types will work with any vector or matrix. Matrices will be interpolated by column.
The time series with some or all missing values replaced.
Alan Jassby, James Cloern
### Interpolate a vector time series and highlight the imputed data chl27 <- sfbayChla[, 's27'] x1 <- interpTs(chl27, gap = 3) plot(x1, col = 'red') lines(chl27, col = 'blue') x2 <- interpTs(chl27, type = "series.median", gap = 3) plot(x2, col = 'red') lines(chl27, col = 'blue') ### Interpolate a matrix time series and plot results x3 <- interpTs(sfbayChla, type = "cycle.mean", gap = 1) plot(x3[, 1:10], main = "SF Bay Chl-a\n(gaps of 1 month replaced)")
### Interpolate a vector time series and highlight the imputed data chl27 <- sfbayChla[, 's27'] x1 <- interpTs(chl27, gap = 3) plot(x1, col = 'red') lines(chl27, col = 'blue') x2 <- interpTs(chl27, type = "series.median", gap = 3) plot(x2, col = 'red') lines(chl27, col = 'blue') ### Interpolate a matrix time series and plot results x3 <- interpTs(sfbayChla, type = "cycle.mean", gap = 1) plot(x3[, 1:10], main = "SF Bay Chl-a\n(gaps of 1 month replaced)")
Acts on a matrix or data frame with depth in the first column and observations for different variables (or different sites, or different times) in each of the remaining columns. The trapezoidal mean over the given depths is calculated for each of the variables. Replicate depths are averaged, and missing values or data with only one unique depth are handled. Data are not extrapolated to cover missing values at the top or bottom of the layer. The result can differ markedly from the simple mean even for equal spacing of depths, because the top and bottom values are weighted by 0.5 in a trapezoidal mean.
layerMean(d)
layerMean(d)
d |
data.frame |
Alan Jassby, James Cloern
TRUE
if x
is a leap year, FALSE
otherwise.
leapYear(x)
leapYear(x)
x |
integer year |
Alan Jassby, James Cloern
Applies Kendall's tau test for the significance of a monotonic time series trend (Mann 1945). Also calculates the Sen slope as an estimate of this trend.
mannKen( x, plot = FALSE, type = c("slope", "relative"), order = FALSE, pval = 0.05, pchs = c(19, 21), ... )
mannKen( x, plot = FALSE, type = c("slope", "relative"), order = FALSE, pval = 0.05, pchs = c(19, 21), ... )
x |
A numeric vector, matrix or data frame |
plot |
Should the trends be plotted when x is a matrix or data frame? |
type |
Type of trend to be plotted, actual or relative |
order |
Should the plotted trends be ordered by size? |
pval |
p-value for significance |
pchs |
Plot symbols for significant and not significant trend estimates, respectively |
... |
Other arguments to pass to plotting function |
The Sen slope (alternately, Theil or Theil-Sen slope)—the median slope joining all pairs of observations—is expressed by quantity per unit time. The fraction of missing slopes involving the first and last fifths of the data are provided so that the appropriateness of the slope estimate can be assessed and results flagged. Schertz et al. [1991] discuss this and related decisions about missing data. Other results are used for further analysis by other functions. Serial correlation is ignored, so the interval between points should be long enough to avoid strong serial correlation.
For the relative slope, the slope joining each pair of observations is
divided by the first of the pair before the overall median is taken. The
relative slope makes sense only as long as the measurement scale is
non-negative (not, e.g., temperature on the Celsius scale). Comparing
relative slopes is useful when the variables in x
have different
units.
If plot = TRUE
, then either the Sen slope (type = "slope"
) or
the relative Sen slope (type = "relative"
) are plotted. The plot
symbols indicate, respectively, that the trend is significant or not
significant. The plot can be customized by passing any arguments used by
dotchart
such as xlab
or xlim
, as well as
graphical parameters described in par
.
A list of the following if x
is a vector:
sen.slope |
Sen slope |
sen.slope.rel |
Relative Sen slope |
p.value |
Significance of slope |
S |
Kendall's S |
varS |
Variance of S |
miss |
Fraction of missing slopes connecting
first and last fifths of |
or a matrix with corresponding columns if
x
is a matrix or data frame.
Approximate p-values with corrections for ties and continuity are used
if or if there are any ties. Otherwise, exact p-values based on
Table B8 of Helsel and Hirsch (2002) are used. In the latter case,
should be interpreted as
.
Alan Jassby, James Cloern
Mann, H.B. (1945) Nonparametric tests against trend. Econometrica 13, 245–259.
Helsel, D.R. and Hirsch, R.M. (2002) Statistical methods in water resources. Techniques of Water Resources Investigations, Book 4, chapter A3. U.S. Geological Survey. 522 pages. doi:10.3133/twri04A3
Schertz, T.L., Alexander, R.B., and Ohe, D.J. (1991) The computer program EStimate TREND (ESTREND), a system for the detection of trends in water-quality data. Water-Resources Investigations Report 91-4040, U.S. Geological Survey.
tsp(Nile) # an annual time series mannKen(Nile) y <- sfbayChla y1 <- interpTs(y, gap=1) # interpolate single-month gaps only y2 <- aggregate(y1, 1, mean, na.rm=FALSE) mannKen(y2) mannKen(y2, plot=TRUE) # missing data means missing trend estimates mannKen(y2, plot=TRUE, xlim = c(0.1, 0.25)) mannKen(y2, plot=TRUE, type='relative', order = TRUE, pval = .001, xlab = "Relative trend") legend("topleft", legend = "p < 0.001", pch = 19, bty="n")
tsp(Nile) # an annual time series mannKen(Nile) y <- sfbayChla y1 <- interpTs(y, gap=1) # interpolate single-month gaps only y2 <- aggregate(y1, 1, mean, na.rm=FALSE) mannKen(y2) mannKen(y2, plot=TRUE) # missing data means missing trend estimates mannKen(y2, plot=TRUE, xlim = c(0.1, 0.25)) mannKen(y2, plot=TRUE, type='relative', order = TRUE, pval = .001, xlab = "Relative trend") legend("topleft", legend = "p < 0.001", pch = 19, bty="n")
meanSub
meanSub(x, sub, na.rm = FALSE)
meanSub(x, sub, na.rm = FALSE)
x |
numeric vector |
sub |
integer index |
na.rm |
logical |
Alan Jassby, James Cloern
monthCor
monthCor(x)
monthCor(x)
x |
ts |
Alan Jassby, James Cloern
Converts dates to the corresponding numeric month.
monthNum(y)
monthNum(y)
y |
date |
Alan Jassby, James Cloern
First aggregates multivariate matrix time series by year. Then converts to a vector time series in which “seasons” correspond to these annualized values for the original variables.
mts2ts(x, seas = 1:frequency(x), na.rm = FALSE)
mts2ts(x, seas = 1:frequency(x), na.rm = FALSE)
x |
An object of class "mts" |
seas |
Numeric vector of seasons to aggregate in original time series. |
na.rm |
Should missing data be ignored when aggregating? |
The seas
parameter enables focusing the subsequent analysis on
seasons of special interest, or to ignore seasons where there are too many
missing data. The function can be used in conjunction with seaKen
to
conduct a Regional Kendall trend analysis. Sometimes just plotting the
resulting function can be useful for exploring a spatial transect over time.
A vector time series
Alan Jassby, James Cloern
## Quick plot a spatial transect of chlorophyll a during the ## spring bloom period (Feb-Apr) for each year. y <- mts2ts(sfbayChla, seas = 2:4) plot(y, type = 'n') abline(v = 1978:2010, col = 'lightgrey') lines(y, type = 'h')
## Quick plot a spatial transect of chlorophyll a during the ## spring bloom period (Feb-Apr) for each year. y <- mts2ts(sfbayChla, seas = 2:4) plot(y, type = 'n') abline(v = 1978:2010, col = 'lightgrey') lines(y, type = 'h')
Finds dissolved oxygen concentration in equilibrium with water-saturated air.
oxySol(t, S, P = NULL)
oxySol(t, S, P = NULL)
t |
temperature, degrees C |
S |
salinity, on the Practical Salinity Scale |
P |
pressure, atm |
Calculations are based on the approach of Benson and Krause (1984), using
Green and Carritt's (1967) equation for dependence of water vapor partial
pressure on t
and S
. Equations are valid for temperature in
the range 0-40 C and salinity in the range 0-40.
Dissolved oxygen concentration in mg/L at 100% saturation. If
P = NULL
, saturation values at 1 atm are calculated.
Alan Jassby, James Cloern
Benson, B.B. and Krause, D. (1984) The concentration and isotopic fractionation of oxygen dissolved in fresh-water and seawater in equilibrium with the atmosphere. Limnology and Oceanography 29, 620-632.
Green, E.J. and Carritt, D.E. (1967) New tables for oxygen saturation of seawater. Journal of Marine Research 25, 140-147.
# Convert DO into % saturation for 1-m depth at Station 32. # Use convention of expressing saturation at 1 atm. sfb1 <- subset(sfbay, depth == 1 & stn == 32) dox.pct <- with(sfb1, 100 * dox/oxySol(temp, sal)) summary(dox.pct)
# Convert DO into % saturation for 1-m depth at Station 32. # Use convention of expressing saturation at 1 atm. sfb1 <- subset(sfbay, depth == 1 & stn == 32) dox.pct <- with(sfb1, 100 * dox/oxySol(temp, sal)) summary(dox.pct)
Locates a single change-point in an annual series based on the Pettitt test.
pett(x, plot = FALSE, order = FALSE, pval = 0.05, pchs = c(19, 21), ...)
pett(x, plot = FALSE, order = FALSE, pval = 0.05, pchs = c(19, 21), ...)
x |
a numeric vector, matrix or data frame with no missing interior values |
plot |
Should the trends be plotted when x is a matrix? |
order |
Should the plotted trends be ordered by size? |
pval |
p-value for significance |
pchs |
Plot symbols for significant and not significant trend estimates, respectively |
... |
Other arguments to pass to plotting function |
Pettitt's (1979) method is a rank-based nonparametric test for abrupt
changes in a time series. It uses the Mann-Whitney statistic for testing
that two samples (before and after the change-point) come from the same
distribution, choosing the change-point that maximizes the statistic. The
p-value is approximate but accurate to 0.01 for 0.5.
Serial correlation is ignored, so the interval between points should be long
enough to avoid strong serial correlation. The size of the change is
estimated as the median difference between all pairs of observations in
which the first one is after the change-point and the second is up to the
change-point.
Missing values are allowed at the beginning or end of each variable but interior missing values will produce an NA. Otherwise the change-point might not be meaningful.
If plot = TRUE
, a dot plot of change.times
is shown. If
sort = TRUE
, the dots are sorted by change.time
. The plot
symbols indicate, respectively, that the trend is significant or not
significant. The plot can be customized by passing any arguments used by
dotchart
such as xlab
, as well as graphical parameters
described in par
.
A list of the following if x
is a vector:
pettitt.K |
Pettitt's statistic |
p.value |
significance probability for statistic |
change.point |
last position preceding change to new level |
change.time |
if available, time of change.point position |
change.size |
median of all differences between points after and up to change.point |
or a matrix with corresponding columns if x
is a matrix or data frame.
The change.point
returned by these functions is the last
position before the series actually changes, for consistency with the
original Pettitt test. But for reporting purposes, the following position
might be more appropriate to call the “change-point”.
The Pettitt test produces a supposed change-point, even when the trend is smooth, or when the abrupt change is smaller than the long-term smooth change. Remove any smooth, long-term trend before applying this test.
Alan Jassby, James Cloern
Pettitt, A. N. (1979) A non-parametric approach to the change-point problem. Journal of the Royal Statistical Society. Series C (Applied Statistics) 28(2), 126–135.
# data from Pettitt (1979, Table 1): y <- c(-1.05, 0.96, 1.22, 0.58, -0.98, -0.03, -1.54, -0.71, -0.35, 0.66, 0.44, 0.91, -0.02, -1.42, 1.26, -1.02, -0.81, 1.66, 1.05, 0.97, 2.14, 1.22, -0.24, 1.60, 0.72, -0.12, 0.44, 0.03, 0.66, 0.56, 1.37, 1.66, 0.10, 0.80, 1.29, 0.49, -0.07, 1.18, 3.29, 1.84) pett(y) # K=232, p=0.0146, change-point=17, the same results as Pettitt # identify the year of a change-point in an annual time series: pett(Nile) # apply to a matrix time series: y <- ts.intersect(Nile, LakeHuron) pett(y) pett(y, plot = TRUE, xlab = "Change-point") legend("topleft", legend = "p < 0.05", pch = 19, bty="n") # note how a smooth trend can disguise a change-point: # smooth trend with change-point at 75 y <- 1:100 + c(rep(0, 75), rep(10, 25)) pett(y) # gives 50, erroneously pett(residuals(lm(y~I(1:100)))) # removing trend gives 75, correctly
# data from Pettitt (1979, Table 1): y <- c(-1.05, 0.96, 1.22, 0.58, -0.98, -0.03, -1.54, -0.71, -0.35, 0.66, 0.44, 0.91, -0.02, -1.42, 1.26, -1.02, -0.81, 1.66, 1.05, 0.97, 2.14, 1.22, -0.24, 1.60, 0.72, -0.12, 0.44, 0.03, 0.66, 0.56, 1.37, 1.66, 0.10, 0.80, 1.29, 0.49, -0.07, 1.18, 3.29, 1.84) pett(y) # K=232, p=0.0146, change-point=17, the same results as Pettitt # identify the year of a change-point in an annual time series: pett(Nile) # apply to a matrix time series: y <- ts.intersect(Nile, LakeHuron) pett(y) pett(y, plot = TRUE, xlab = "Change-point") legend("topleft", legend = "p < 0.05", pch = 19, bty="n") # note how a smooth trend can disguise a change-point: # smooth trend with change-point at 75 y <- 1:100 + c(rep(0, 75), rep(10, 25)) pett(y) # gives 50, erroneously pett(residuals(lm(y~I(1:100)))) # removing trend gives 75, correctly
Finds various measures of the amplitude of the annual cycle, or of some specified season range.
x |
A seasonal time series, or a class |
season.range |
A vector of two numbers specifying the season range to be considered. |
phenoAmp
gives three measures of the amplitude of a seasonal cycle:
the range, the variance, and the median absolute deviation, along with the
mean and median to allow calculation of other statistics as well.
These measures can be restricted to a subset of the year by giving the
desired range of season numbers. This can be useful for isolating measures
of, say, the spring and autumn phytoplankton blooms in temperate waters. In
the case of a monthly time series, for example, a non-missing value is
required for every month or the result will be NA
, so using a period
shorter than one year can also help avoid any months that are typically not
covered by the sampling program. Similarly, in the case of dated
observations, a shorter period can help avoid times of sparse data. The
method for time series allows for other than monthly frequencies, but
season.range
is always interpreted as months for zoo
objects.
Note that the amplitude is sensitive to the number of samples for small
numbers. This could be a problem for zoo
objects if the sample number
is changing greatly from year to year, depending on the amplitude measure
and the underlying data distribution. So use ts
objects or make sure
that the sample number stays more or less the same over time.
tsMake
can be used to produce ts
and zoo
objects
suitable as arguments to this function.
A matrix of class ts
or zoo
with individual series for
the range, variance, median absolute deviation, mean, median and – in the
case of zoo
objects – number of samples.
Cloern, J.E. and Jassby, A.D. (2008) Complex seasonal patterns of primary producers at the land-sea interface. Ecology Letters 11, 1294–1303.
y <- sfbayChla[, "s27"] phenoAmp(y) # entire year # i.e., Jan-Jun only, which yields results for more years phenoAmp(y, c(1, 6))
y <- sfbayChla[, "s27"] phenoAmp(y) # entire year # i.e., Jan-Jun only, which yields results for more years phenoAmp(y, c(1, 6))
Finds various measures of the amplitude of the annual cycle.
Finds various measures of the phase of the annual cycle, or of some specified month range.
x |
A seasonal time series, or a class |
season.range |
A vector of two numbers specifying the season range to be considered. |
out |
The form of the output. |
... |
Additional arguments to be passed for changing integration defaults. |
phenoPhase
gives three measures of the phasing of a seasonal cycle:
the time of the maximum (Cloern and Jassby 2008), the fulcrum or
center of gravity, and the weighted mean season (Colebrook 1979). The latter
has sometimes been referred to in the literature as “centre of
gravity”, but it is not actually the same. These measures differ in their
sensitivity to changes in the seasonal pattern, and therefore also in their
susceptibility to sampling variability. The time of maximum is the most
sensitive, the weighted mean the least.
These measures can be restricted to a subset of the year by giving the
desired range of seasons. This can be useful for isolating measures of, say,
the spring and autumn phytoplankton blooms in temperate waters. In the case
of a seasonal time series, a non-missing value is required for every season
or the result will be NA
, so using a period shorter than one year can
also help avoid any seasons that are typically not covered by the sampling
program. Similarly, in the case of dated observations, a shorter period can
help avoid times of sparse data. The method for time series allows for other
than monthly frequencies, but season.range
is always interpreted as
months for zoo
objects. The method for time series requires data for
all seasons in season.range
. The method for zoo
objects will
provide a result regardless of number of sampling days, so make sure that
data are sufficient for a meaningful result.
The measures are annum-centric, i.e., they reflect the use of calendar year as the annum, which may not be appropriate for cases in which important features occur in winter and span two calendar years. Such cases can be handled by lagging the time series by an appropriate number of months, or by subtracting an appropriate number of days from the individual dates.
tsMake
can be used to produce ts
and zoo
objects
suitable as arguments to this function.
The default parameters used for the integrate
function in
phenoPhase
may fail for certain datasets. Try increasing the number
of subdivisions above its default of 100 by adding, for example,
subdivisions = 1000
to the arguments of phenoPhase
.
A data frame with columns year, time of the maximum, fulcrum,
weighted mean time and – in the case of zoo
objects – number of
observations. In the case of seasonal time series, the results are all given
as decimal seasons of the year. In the case of dated observations, the
results can be dates, day of the year, or julian day with an origin of
1970-01-01, depending on the option out
.
Cloern, J.E. and Jassby, A.D. (2008) Complex seasonal patterns of primary producers at the land-sea interface. Ecology Letters 11, 1294–1303.
Colebrook, J.M. (1979) Continuous plankton records - seasonal cycles of phytoplankton and copepods in the North Atlantic ocean and the North Sea. Marine Biology 51, 23–32.
# ts example y <- sfbayChla[, "s27"] p1 <- phenoPhase(y) p1 apply(p1, 2, sd, na.rm = TRUE) # max.time > fulcrum > mean.wt phenoPhase(y, c(3, 10)) # zoo example sfb <- wqData(sfbay, c(1, 3, 4), 5:12, site.order = TRUE, type = "wide", time.format = "%m/%d/%Y") y <- tsMake(sfb, focus = "chl", layer = c(0, 5), type = "zoo") phenoPhase(y[, "s27"])
# ts example y <- sfbayChla[, "s27"] p1 <- phenoPhase(y) p1 apply(p1, 2, sd, na.rm = TRUE) # max.time > fulcrum > mean.wt phenoPhase(y, c(3, 10)) # zoo example sfb <- wqData(sfbay, c(1, 3, 4), 5:12, site.order = TRUE, type = "wide", time.format = "%m/%d/%Y") y <- tsMake(sfb, focus = "chl", layer = c(0, 5), type = "zoo") phenoPhase(y[, "s27"])
Finds various measures of the phase of the annual cycle.
Divides the time range for a monthly time series into different eras and plots composites of seasonal pattern. Can also plot each month separately for the entire record.
plotSeason( x, type = c("by.era", "by.month"), num.era = 4, same.plot = TRUE, ylab = NULL, num.col = 3 )
plotSeason( x, type = c("by.era", "by.month"), num.era = 4, same.plot = TRUE, ylab = NULL, num.col = 3 )
x |
Monthly time series |
type |
Plot seasonal pattern by era, or each month for the entire record |
num.era |
Integer number of eras, or vector of era year breaks |
same.plot |
Should eras be plotted by month? |
ylab |
Optional character string label for y-axis |
num.col |
Number of columns when plotted |
If num.era
is an integer, the time range is divided into that many
equal eras; otherwise, the time range is divided into eras determined by the
num.era
vector of years. When plotted "by.era"
and
same.plot = FALSE
, the composite patterns are plotted in a horizontal
row for easier comparison, which limits the number of periods that can be
examined. Boxes based on fewer than half of the maximum possible years
available are outlined in red. If same.plot = TRUE
, a single plot is
produced with era boxplots arranged by month. When plotted
"by.month"
, values for each month are first converted to standardized
anomalies, i.e., by subtraction of long-term mean and division by standard
deviation. As always, and especially with these plots, experiment with the
device aspect ratio and size to get the clearest information.
A plot (and the corresponding object of class "ggplot"
).
Alan Jassby, James Cloern
chl27 <- sfbayChla[, "s27"] plotSeason(chl27, num.era = c(1978, 1988, 1998, 2008), ylab = "Stn 27 Chl-a") ## Not run: plotSeason(chl27, num.era = 3, same.plot = FALSE, ylab = "Stn 27 Chl-a") plotSeason(chl27, "by.month", ylab = "Stn 27 Chl-a") ## End(Not run)
chl27 <- sfbayChla[, "s27"] plotSeason(chl27, num.era = c(1978, 1988, 1998, 2008), ylab = "Stn 27 Chl-a") ## Not run: plotSeason(chl27, num.era = 3, same.plot = FALSE, ylab = "Stn 27 Chl-a") plotSeason(chl27, "by.month", ylab = "Stn 27 Chl-a") ## End(Not run)
Creates line plot of vector or matrix time series, including any data surrounded by NAs as additional points.
plotTs( x, dot.size = 1, xlab = NULL, ylab = NULL, strip.labels = colnames(x), ... )
plotTs( x, dot.size = 1, xlab = NULL, ylab = NULL, strip.labels = colnames(x), ... )
x |
matrix or vector time series |
dot.size |
size of dots representing isolated data points |
xlab |
optional x-axis label |
ylab |
optional y-axis label |
strip.labels |
labels for individual time series plots |
... |
additional options |
The basic time series line plot ignores data points that are adjacent to
missing data, i.e., not directly connected to other observations. This can
lead to an uninformative plot when there are many missing data. If one
includes both a point and line plot, the resulting graph can be cluttered
and difficult to decipher. plotTs
plots only isolated points as well
as lines joining adjacent observations.
Options are passed to the underlying facet_wrap
function in
ggplot2. The main ones of interest are ncol
for setting the
number of plotting columns and scales = "free_y"
for allowing the y
scales of the different plots to be independent.
A plot or plots and corresponding object of class “ggplot”.
Alan Jassby, James Cloern
# Chlorophyll at 4 stations in SF Bay chl <- sfbayChla[, 1:4] plotTs(chl, dot.size = 1.5, ylab = 'Chl-a', strip.labels = paste('Station', substring(colnames(chl), 2, 3)), ncol = 1, scales = "free_y")
# Chlorophyll at 4 stations in SF Bay chl <- sfbayChla[, 1:4] plotTs(chl, dot.size = 1.5, ylab = 'Chl-a', strip.labels = paste('Station', substring(colnames(chl), 2, 3)), ncol = 1, scales = "free_y")
Series are illustrated by vertical lines extending from individual data values to the long-term mean. The axes are not scaled in any way. Anomaly plots are useful for visualizing shifts in time series levels.
plotTsAnom(x, xlab = NULL, ylab = NULL, strip.labels = colnames(x), ...)
plotTsAnom(x, xlab = NULL, ylab = NULL, strip.labels = colnames(x), ...)
x |
matrix or vector time series |
xlab |
optional x-axis label |
ylab |
optional y-axis label |
strip.labels |
labels for individual time series plots |
... |
additional options |
Options are passed to the underlying facet_wrap
function in
ggplot2. The main ones of interest are ncol
for setting the
number of plotting columns and scales = "free_y"
for allowing the y
scales of the different plots to be independent.
A plot and corresponding object of class “ggplot”.
Alan Jassby, James Cloern
# Spring bloom size for 6 stations in SF Bay bloom <- aggregate(sfbayChla[, 1:6], 1, meanSub, sub=3:5) plotTsAnom(bloom, ylab = 'Chl-a', strip.labels = paste('Station', substring(colnames(bloom), 2, 3)), ncol = 2, scales = "free_y")
# Spring bloom size for 6 stations in SF Bay bloom <- aggregate(sfbayChla[, 1:6], 1, meanSub, sub=3:5) plotTsAnom(bloom, ylab = 'Chl-a', strip.labels = paste('Station', substring(colnames(bloom), 2, 3)), ncol = 2, scales = "free_y")
Monthly values are transformed into deciles or other bins, and corresponding colors are plotted in a month by year matrix.
plotTsTile( x, plot.title = NULL, legend.title = NULL, four = TRUE, loganom = TRUE, square = TRUE, legend = TRUE, trim = TRUE, overall = TRUE, stat = c("median", "mean") )
plotTsTile( x, plot.title = NULL, legend.title = NULL, four = TRUE, loganom = TRUE, square = TRUE, legend = TRUE, trim = TRUE, overall = TRUE, stat = c("median", "mean") )
x |
monthly time series. |
plot.title |
plot title. |
legend.title |
legend title. |
four |
logical indicating if data should be binned into 4 special groups or into deciles. |
loganom |
logical indicating if data should be transformed into log-anomalies. |
square |
logical indicating if tiles should be square. |
legend |
logical indicating if a legend should be included. |
trim |
logical indicating if leading and trailing NA values should be removed. |
overall |
determines whether anomalies are calculated with respect to overall mean or to long-term mean for the same month. |
stat |
determines whether anomalies are calculated and binned using mean or median. |
If four = TRUE
, then x
is first divided into a positive and
negative bin. Each bin is then further divided into two bins by its mean,
yielding a total of four bins. If four=FALSE
, then x
is simply
divided into deciles. In either case, each bin has its own assigned color,
with colors ranging from dark blue (smallest numbers) through light blue and
pink to red.
Although four = TRUE
can be useful for any data in which 0 represents
a value with special significance, it is especially so for data converted
into log-anomalies, i.e., log10(x/xbar)
where xbar = mean(x,
na.rm=TRUE)
. The mean month then has value 0, and a value of -1, for
example, indicates original data equal to one-tenth the mean. Log-anomaly
transforms can be particularly appropriate for biological populations, in
which variability is often approximately proportional to the mean.
When loganom = TRUE
, the anomalies are calculated with respect to the
overall mean month. This differs from, for example, the log-anomaly
zooplankton plot of O'Brien et al. (2008), in which a monthly anomaly is
calculated with respect to the mean value of the same month. To get the
latter behavior, set overall = FALSE
. A further option is to set
stat = "median"
rather than the default stat = "mean"
, in
which case xbar = median(x, na.rm = TRUE)
, and the positive and
negative bins are each divided into two bins by their median instead of
mean. Using combinations of these different options can reveal complementary
information.
You may want to set square = FALSE
and then adjust the plot window
manually if you plan to use the plot in a subsequent layout or if there is
too much white space.
An image plot of monthly values classified into either deciles or into four bins as described above (and corresponding object of class “ggplot”).
Alan Jassby, James Cloern
O'Brien T., Lopez-Urrutia A., Wiebe P.H., Hay S. (editors) (2008) ICES Zooplankton Status Report 2006/2007. ICES Cooperative Research Report 292, International Council for the Exploration of the Sea, Copenhagen, 168 p.
# plot log-anomalies in four bins chl27 = sfbayChla[, 's27'] plotTsTile(chl27, legend.title = 'Chl log-anomaly') # plot deciles plotTsTile(chl27, plot.title = 'SF Bay station 27', legend.title = 'chlorophyll', four = FALSE, loganom = FALSE, square = FALSE)
# plot log-anomalies in four bins chl27 = sfbayChla[, 's27'] plotTsTile(chl27, legend.title = 'Chl log-anomaly') # plot deciles plotTsTile(chl27, plot.title = 'SF Bay station 27', legend.title = 'chlorophyll', four = FALSE, loganom = FALSE, square = FALSE)
R2pss
R2pss(R, t, p = 0)
R2pss(R, t, p = 0)
R |
conductivity ratio, dimensionless |
t |
temperature, Celsius |
p |
gauge pressure, decibar |
Alan Jassby, James Cloern
Calculates the Seasonal or Regional Kendall test of trend significance, including an estimate of the Sen slope.
seaKen( x, plot = FALSE, type = c("slope", "relative"), order = FALSE, pval = 0.05, mval = 0.5, pchs = c(19, 21), ... )
seaKen( x, plot = FALSE, type = c("slope", "relative"), order = FALSE, pval = 0.05, mval = 0.5, pchs = c(19, 21), ... )
x |
A numeric vector, matrix or data frame made up of seasonal time series |
plot |
Should the trends be plotted when x is a matrix or data frame? |
type |
Type of trend to be plotted, actual or relative to series median |
order |
Should the plotted trends be ordered by size? |
pval |
p-value for significance |
mval |
Minimum fraction of seasons needed with non-missing slope estimates |
pchs |
Plot symbols for significant and not significant trend estimates, respectively |
... |
Other arguments to pass to plotting function |
The Seasonal Kendall test (Hirsch et al. 1982) is based on the Mann-Kendall
tests for the individual seasons (see mannKen
for additional
details). p-values provided here are not corrected for serial
correlation among seasons.
If plot = TRUE
, then either the Sen slope in units per year
(type = "slope"
) or the relative slope in fraction per year
(type = "relative"
) is plotted. The relative slope is defined
identically to the Sen slope except that each slope is divided by the first
of the two values that describe the slope. Plotting the relative slope is
useful when the variables in x
are always positive and have different
units.
The plot symbols indicate, respectively, that the trend is statistically
significant or not. The plot can be customized by passing any arguments used
by dotchart
such as xlab
, as well as graphical
parameters described in par
.
If mval
or more of the seasonal slope estimates are missing, then
that trend is considered to be missing. The seasonal slope estimate
(mannKen
), in turn, is missing if half or more of the possible
comparisons between the first and last 20% of the years are missing.
The function can be used in conjunction with mts2ts
to calculate a
Regional Kendall test of significance for annualized data, along with a
regional estimate of trend (Helsel and Frans 2006). See the examples below.
A list of the following if x
is a vector: seaKen
returns a list with the following members:
sen.slope |
Sen slope |
sen.slope.pct |
Sen slope as percent of mean |
p.value |
significance of slope |
miss |
for each season, the fraction missing of slopes connecting first and last 20% of the years |
or a
matrix with corresponding columns if x
is a matrix or data frame.
Alan Jassby, James Cloern
Helsel, D.R. and Frans, L. (2006) Regional Kendall test for trend. Environmental Science and Technology 40(13), 4066-4073.
Hirsch, R.M., Slack, J.R., and Smith, R.A. (1982) Techniques of trend analysis for monthly water quality data. Water Resources Research 18, 107-121.
# Seasonal Kendall test: chl <- sfbayChla # monthly chlorophyll at 16 stations in San Francisco Bay seaKen(sfbayChla[, 's27']) # results for a single series at station 27 seaKen(sfbayChla) # results for all stations seaKen(sfbayChla, plot=TRUE, type="relative", order=TRUE) # Regional Kendall test: # Use mts2ts to change 16 series into a single series with 16 "seasons" seaKen(mts2ts(chl)) # too many missing data # better when just Feb-Apr, spring bloom period, # but last 4 stations still missing too much. seaKen(mts2ts(chl, seas = 2:4)) seaKen(mts2ts(chl[, 1:12], 2:4)) # more reliable result
# Seasonal Kendall test: chl <- sfbayChla # monthly chlorophyll at 16 stations in San Francisco Bay seaKen(sfbayChla[, 's27']) # results for a single series at station 27 seaKen(sfbayChla) # results for all stations seaKen(sfbayChla, plot=TRUE, type="relative", order=TRUE) # Regional Kendall test: # Use mts2ts to change 16 series into a single series with 16 "seasons" seaKen(mts2ts(chl)) # too many missing data # better when just Feb-Apr, spring bloom period, # but last 4 stations still missing too much. seaKen(mts2ts(chl, seas = 2:4)) seaKen(mts2ts(chl[, 1:12], 2:4)) # more reliable result
Calculates the Seasonal Kendall test of significance, including an estimate of the Sen slope, for rolling windows over a time series.
seaRoll( x, w = 10, plot = FALSE, pval = 0.05, mval = 0.5, pchs = c(19, 21), xlab = NULL, ylab = NULL, ... )
seaRoll( x, w = 10, plot = FALSE, pval = 0.05, mval = 0.5, pchs = c(19, 21), xlab = NULL, ylab = NULL, ... )
x |
A seasonal time series vector. |
w |
The window width for “rolling” estimates of slope. |
plot |
Indicates if a plot should be drawn |
pval |
p-value for significance |
mval |
Minimum fraction of seasons needed with non-missing slope estimates |
pchs |
Plot symbols for significant and not significant trend estimates, respectively |
xlab |
Optional label for x-axis |
ylab |
Optional label for y-axis |
... |
Other arguments to pass to plotting function |
The function seaRoll
applies seaKen
to rolling time windows of
width w
. A minimum w
of five years is required. For any
window, a season is considered missing if half or more of the possible
comparisons between the first and last 20% of the years is missing. If
mval
or more of the seasons are missing, then that windowed trend is
considered to be missing.
If plot = TRUE
, a point plot will be drawn with the Sen slope plotted
at the leading year of the trend window. The plot symbols indicate,
respectively, that the trend is significant or not significant. The plot can
be customized by passing any arguments used by plot.default
,
as well as graphical parameters described in par
.
seaRoll
returns a matrix with one row per time window
containing the Sen slope, the relative Sen slope, and the p-value.
Rows are labelled with the leading year of the window.
Alan Jassby, James Cloern
chl27 <- sfbayChla[, 's27'] seaRoll(chl27) seaRoll(chl27, plot = TRUE)
chl27 <- sfbayChla[, 's27'] seaRoll(chl27) seaRoll(chl27, plot = TRUE)
Finds the trend for each season and each variable in a time series.
seasonTrend(x, plot = FALSE, type = c("slope", "relative"), pval = 0.05, ...)
seasonTrend(x, plot = FALSE, type = c("slope", "relative"), pval = 0.05, ...)
x |
Time series vector, or time series matrix with column names |
plot |
Should the results be plotted? |
type |
Type of trend to be plotted, actual or relative to series median |
pval |
p-value for significance |
... |
Further options to pass to plotting function |
The Mann-Kendall test is applied for each season and series (in the case of
a matrix). The actual and relative Sen slope (actual divided by median for
that specific season and series); the p-value for the trend; and the
fraction of missing slopes involving the first and last fifths of the data
are calculated (see mannKen
).
If plot = TRUE
, each season for each series is represented by a bar
showing the trend. The fill colour indicates whether or not.
If the fraction of missing slopes is 0.5 or more, the corresponding trends
are omitted.
Parameters can be passed to the plotting function, in particular, to
facet_wrap
in ggplot2. The most useful parameters here are
ncol
(or nrow
), which determines the number of columns (or
rows) of plots, and scales
, which can be set to "free_y"
to
allow the y-axis to change for each time series. Like all ggplot2
objects, the plot output can also be customized extensively by modifying and
adding layers.
A data frame with the following fields:
series |
series names |
season |
season number |
sen.slope |
Sen slope in original units per year |
sen.slope.rel |
Sen slope divided by median for that specific season and series |
p |
p-value for the trend according to the Mann-Kendall test. |
missing |
Proportion of slopes joining first and last fifths of the data that are missing |
Alan Jassby, James Cloern
mannKen
, plotSeason
,
facet_wrap
x <- sfbayChla seasonTrend(x) seasonTrend(x, plot = TRUE, ncol = 4)
x <- sfbayChla seasonTrend(x) seasonTrend(x, plot = TRUE, ncol = 4)
Selected observations and variables from U.S. Geological Survey water quality stations in south San Francisco Bay. Data include CTD and nutrient measurements.
sfbay
is a data frame with 23207 observations (rows) of 12
variables (columns):
[, 1] |
date |
date |
[, 2] |
time |
time |
[, 3] |
stn |
station code |
[, 4] |
depth |
measurement depth |
[, 5] |
chl |
chlorophyll a |
[, 6] |
dox.pct
|
dissolved oxygen |
[, 7] |
spm |
suspended particulate matter |
[, 8] |
ext |
extinction coefficient |
[, 9] |
sal |
salinity |
[, 10]
|
temp |
water temperature |
[, 11] |
nox
|
nitrate + nitrite |
[, 12] |
nhx |
ammonium |
sfbayStns
is a data frame with 16 observations of 6 variables:
[, 1] |
site |
station code |
[,
2] |
description |
station description |
[, 3] |
lat |
latitude |
[, 4] |
long |
longitude |
[, 5] |
depthMax |
maximum depth, in m |
[, 6]
|
distFrom36 |
distance from station 36, in km |
sfbayVars
is a data frame with 7 observations of 3 variables:
[, 1] |
variable |
water quality variable code |
[, 2] |
description |
description |
[,
3] |
units |
measurement units |
sfbayChla
is a time series matrix (380 months x
16 stations)
of average 0-5 m chlorophyll a concentrations calculated from the
data in sfbay
.
The original downloaded dataset was modified by taking a subset of six
well-sampled stations and the period 1985–2004. Variable names were also
simplified. The data frames sfbayStns
and sfbayVars
describe
the stations and water quality variables in more detail; they were created
from information at the same web site. Note that the station numbers in
sfbayStns
have been prefixed with s
to make station codes into
legal variable names. sfbayChla
was constructed from the entire
downloaded sfbay dataset and encompasses the period 1969–2009.
Downloaded from https://sfbay.wr.usgs.gov/water-quality-database/ on 2009-11-17.
data(sfbay) str(sfbay) str(sfbayStns) str(sfbayVars) plot(sfbayChla[, 1:10], main = "SF Bay Chl-a")
data(sfbay) str(sfbay) str(sfbayStns) str(sfbayVars) plot(sfbayChla[, 1:10], main = "SF Bay Chl-a")
Tests for homogeneity of seasonal trends using method proposed by van Belle
and Hughes (1984). Seasons with insufficient data as defined in
mannKen
are ignored.
trendHomog(x)
trendHomog(x)
x |
A vector time series with frequency > 1 |
chisq.trend |
"Trend" chi-square. |
chisq.homog |
"Homogeneous" chi-square. |
p.value |
For null hypothesis that trends are homogeneous. |
n |
Number of seasons used. |
Alan Jassby, James Cloern
van Belle, G. and Hughes, J.P. (1984) Nonparametric tests for trend in water quality. Water Resources Research 20, 127-136.
## Apply to a monthly vector time series to test homogeneity ## of seasonal trends. x <- sfbayChla[, 's27'] trendHomog(x)
## Apply to a monthly vector time series to test homogeneity ## of seasonal trends. x <- sfbayChla[, 's27'] trendHomog(x)
Convert monthly time series vector to a year x
month data frame for
several possible subsequent analyses. Leading and trailing empty rows are
removed.
ts2df(x, mon1 = 1, addYr = FALSE, omit = FALSE)
ts2df(x, mon1 = 1, addYr = FALSE, omit = FALSE)
x |
monthly time series vector |
mon1 |
starting month number, i.e., first column of the data frame |
addYr |
rows are normally labelled with the year of the starting month,
but |
omit |
if |
Our main use of ts2df
is to convert a single monthly time series into
a year x
month data frame for EOF analysis of interannual
variability.
monthCor
finds the month-to-month correlations in a monthly time
series x
. It is useful for deciding where to start the 12-month
period for an EOF
analysis (mon1
in ts2df
), namely, at
a time of low serial correlation in x
.
An n x 12
data frame, where n
is the number of years.
Alan Jassby, James Cloern
Craddock, J. (1965) A meteorological application of principal component analysis. Statistician 15, 143–156.
# San Francisco Bay station 27 chlorophyll has the lowest serial # correlation in Oct-Nov, with Sep-Oct a close second chl27 <- sfbayChla[, 's27'] monthCor(chl27) # Convert to a data frame with October, the first month of the # local "water year", in the first column tsp(chl27) chl27 <- round(chl27, 1) ts2df(chl27, mon1 = 10, addYr = TRUE) ts2df(chl27, mon1 = 10, addYr = TRUE, omit = TRUE)
# San Francisco Bay station 27 chlorophyll has the lowest serial # correlation in Oct-Nov, with Sep-Oct a close second chl27 <- sfbayChla[, 's27'] monthCor(chl27) # Convert to a data frame with October, the first month of the # local "water year", in the first column tsp(chl27) chl27 <- round(chl27, 1) ts2df(chl27, mon1 = 10, addYr = TRUE) ts2df(chl27, mon1 = 10, addYr = TRUE, omit = TRUE)
Creates a matrix time series object from an object of class "WqData"
,
either all variables for a single site or all sites for a single variable.
object |
Object of class |
focus |
Name of a site or water quality variable. |
layer |
Number specifying a single depth; a numeric vector of length 2
specifying top and bottom depths of layer; a list specifying multiple depths
and/or layers; or just the string |
type |
|
qprob |
quantile probability, a number between 0 and 1. |
When qprob = NULL
, the function averages all included depths for each
day, the implicit assumption being that the layer is well-mixed and/or the
samples are evenly distributed with depth in the layer. If layer =
"max.depths"
, then only the value at the maximum depth for each time, site
and variable combination will be used. If no layer is specified, all depths
will be used.
The function produces a matrix time series of all variables for the
specified site or all sites for the specified variable. If type =
"ts.mon"
, available daily data are averaged to produce a monthly time
series, from which a quarterly or annual series can be created if needed. If
you want values for the actual dates of observation, then set type =
"zoo"
.
When qprob
is a number from 0 to 1, it is interpreted as a
probability and the corresponding quantile is used to aggregate observations
within the specified layer. So to get the maximum, for example, use qprob =
1. If type = "ts.mon"
, the same quantile is used to aggregate all the
available daily values.
A matrix of class "mts"
or "zoo"
.
The layer list is allowed to include negative numbers, which may have
been used in the WqData
object to denote variables that apply to the
water column as a whole, such as, say, -1 for light attenuation coefficient.
This enables focus = 's27'
and layer = list(-1, c(0, 5))
to
produce a time series matrix for station 27 that includes both attenuation
coefficient and chlorophyll averaged over the top 5 m. Negative numbers may
also have been used in the WqData
object to identify qualitative
depths such as “near bottom”, which is not uncommon in historical
data sets. So data from such depths can be aggregated easily with other data
to make these time series.
# Create new WqData object sfb <- wqData(sfbay, c(1, 3:4), 5:12, site.order = TRUE, time.format = "%m/%d/%Y", type = "wide") # Find means in the 0-10 m layer y <- tsMake(sfb, focus = "s27", layer = c(0, 10)) plot(y, main = "Station 27") # Or select medians in the same layer y1 <- tsMake(sfb, focus = "s27", layer = c(0, 10), qprob = 0.5) plot(y1, main = "Station 27") # Compare means:medians apply(y / y1, 2, mean, na.rm = TRUE) # Combine a layer with a single additional depth y <- tsMake(sfb, focus = "chl", layer = list(c(0, 2), 5)) plot(y, main = "Chlorophyll a, ug/L") # Use values from the deepest samples y <- tsMake(sfb, focus = "dox", layer = "max.depths", type = "zoo") head(y) plot(y, type = "h", main = "'Bottom' DO, mg/L")
# Create new WqData object sfb <- wqData(sfbay, c(1, 3:4), 5:12, site.order = TRUE, time.format = "%m/%d/%Y", type = "wide") # Find means in the 0-10 m layer y <- tsMake(sfb, focus = "s27", layer = c(0, 10)) plot(y, main = "Station 27") # Or select medians in the same layer y1 <- tsMake(sfb, focus = "s27", layer = c(0, 10), qprob = 0.5) plot(y1, main = "Station 27") # Compare means:medians apply(y / y1, 2, mean, na.rm = TRUE) # Combine a layer with a single additional depth y <- tsMake(sfb, focus = "chl", layer = list(c(0, 2), 5)) plot(y, main = "Chlorophyll a, ug/L") # Use values from the deepest samples y <- tsMake(sfb, focus = "dox", layer = "max.depths", type = "zoo") head(y) plot(y, type = "h", main = "'Bottom' DO, mg/L")
Creates a matrix of observations indexed by time.
tsSub
tsSub(x1, seas = 1:frequency(x1))
tsSub(x1, seas = 1:frequency(x1))
x1 |
ts |
seas |
numeric |
Alan Jassby, James Cloern
wqData
is a constructor for the "WqData"
class that is often
more convenient to use than new
. It converts a data.frame containing
water quality data in “long” or “wide” format to a
"WqData"
object. In “long” format, observations are all in one
column and a second column is used to designate the variable being observed.
In “wide” format, observations for each variable are in a separate
column.
wqData( data, locus, wqdata, site.order, time.format = "%Y-%m-%d", type = c("long", "wide") )
wqData( data, locus, wqdata, site.order, time.format = "%Y-%m-%d", type = c("long", "wide") )
data |
Data frame containing water quality data. |
locus |
Character or numeric vector designating column names or
numbers, respectively, in |
wqdata |
In the case of “long” data, character or numeric vector
designating column names or numbers, respectively, in |
site.order |
If |
time.format |
Conversion specification for |
type |
Either “long” or “wide” |
If the data are already in long format, the function has little to do but
rename the data fields. If in wide format, the reshape2 package is
called to melt
the data. The function also removes NA
observations, converts site
to (possibly ordered) factors with valid
variable names, and converts time
to class "Date"
or
"POSIXct"
and ISO 8601 format, depending on time.format
.
An object of class "WqData"
.
Alan Jassby, James Cloern
International Organization for Standardization (2004) ISO 8601. Data elements and interchange formats - Information interchange - Representation of dates and times.
as.Date
, strptime
,
WqData-class
## Not run: # Create new WqData object from sfbay data. First combine date and time # into a single string after making sure that all times have 4 digits. sfb <- within(sfbay, time <- substring(10000 + time, 2, 5)) sfb <- within(sfb, time <- paste(date, time, sep = ' ')) sfb <- wqData(sfb, 2:4, 5:12, site.order = TRUE, type = "wide", time.format = "%m/%d/%Y %H%M") head(sfb) tail(sfb) # If time of day were not required, then the following would suffice: sfb <- wqData(sfbay, c(1,3,4), 5:12, site.order = TRUE, type = "wide", time.format = "%m/%d/%Y") ## End(Not run)
## Not run: # Create new WqData object from sfbay data. First combine date and time # into a single string after making sure that all times have 4 digits. sfb <- within(sfbay, time <- substring(10000 + time, 2, 5)) sfb <- within(sfb, time <- paste(date, time, sep = ' ')) sfb <- wqData(sfb, 2:4, 5:12, site.order = TRUE, type = "wide", time.format = "%m/%d/%Y %H%M") head(sfb) tail(sfb) # If time of day were not required, then the following would suffice: sfb <- wqData(sfbay, c(1,3,4), 5:12, site.order = TRUE, type = "wide", time.format = "%m/%d/%Y") ## End(Not run)
A simple extension or subclass of the "data.frame"
class for typical
“discrete” water quality monitoring programs that examine phenomena
on a time scale of days or longer. It requires water quality data to be in a
specific “long” format, although a generating function
wqData
can be used for different forms of data.
Objects can be created by calls of the form
new("WqData", d)
, where d
is a data.frame. d
should
have columns named time, site, depth, variable, value
of class
"DateTime", "factor", "numeric", "factor", "numeric"
, respectively.
DateTime-class
, tsMake,WqData-method
,
wqData
showClass("WqData") # Construct the WqData object sfb as shown in the wqData examples. sfb <- wqData(sfbay, c(1, 3, 4), 5:12, site.order = TRUE, type = "wide", time.format = "%m/%d/%Y") # Summarize the data summary(sfb) # Create boxplot summary of data plot(sfb, vars = c("chl", "dox", "spm"), num.col = 2) # Extract some of the data as a WqData object sfb[1:10, ] # first 10 observations sfb[sfb$depth == 20, ] # all observations at 20 m
showClass("WqData") # Construct the WqData object sfb as shown in the wqData examples. sfb <- wqData(sfbay, c(1, 3, 4), 5:12, site.order = TRUE, type = "wide", time.format = "%m/%d/%Y") # Summarize the data summary(sfb) # Create boxplot summary of data plot(sfb, vars = c("chl", "dox", "spm"), num.col = 2) # Extract some of the data as a WqData object sfb[1:10, ] # first 10 observations sfb[sfb$depth == 20, ] # all observations at 20 m
A variety of small utilities used in other functions.
d |
A numeric matrix or data frame with depth in the first column and observations for some variable in each of the remaining columns. |
na.rm |
Should missing data be removed? |
seas |
An integer vector of seasons to be retained. |
sub |
An integer vector. |
w |
A vector of class |
x |
A numeric vector. |
x1 |
A matrix or vector time series. |
y |
A vector of class |
date2decyear
: Converts object of class "Date"
to decimal year
assuming time of day is noon.
decyear2date
: Converts decimal year to object of class "Date"
.
layerMean
: Acts on a matrix or data frame with depth in the first
column and observations for different variables (or different sites, or
different times) in each of the remaining columns. The trapezoidal mean over
the given depths is calculated for each of the variables. Replicate depths
are averaged, and missing values or data with only one unique depth are
handled. Data are not extrapolated to cover missing values at the top or
bottom of the layer. The result can differ markedly from the simple mean
even for equal spacing of depths, because the top and bottom values are
weighted by 0.5 in a trapezoidal mean.
leapYear
: TRUE
if x
is a leap year, FALSE
otherwise.
meanSub
: Mean of a subset of a vector.
monthNum
: Converts dates to the corresponding numeric month.
tsSub
: Drops seasons from a matrix or vector time series.
years
: Converts dates to the corresponding numeric years.
dates <- as.Date(c("1996-01-01", "1999-12-31", "2004-02-29", "2005-03-01")) date2decyear(dates) decyear2date(c(1996.0014, 1999.9986, 2004.1626, 2005.1630)) z <- c(1, 2, 3, 5, 10) # 5 depths x <- matrix(rnorm(30), nrow = 5) # 6 variables at 5 depths layerMean(cbind(z, x)) leapYear(seq(1500, 2000, 100)) leapYear(c(1996.9, 1997)) ## Aggregate monthly time series over Feb-Apr only. aggregate(sfbayChla, 1, meanSub, sub = 2:4) monthNum(as.Date(c("2007-03-17", "2003-06-01"))) ## Ignore certain seasons in a Seasonal Kendall test. c27 <- sfbayChla[, "s27"] seaKen(tsSub(c27)) # Aug and Dec missing the most key data seaKen(tsSub(c27, seas = c(1:7, 9:11))) y <- Sys.time() years(y)
dates <- as.Date(c("1996-01-01", "1999-12-31", "2004-02-29", "2005-03-01")) date2decyear(dates) decyear2date(c(1996.0014, 1999.9986, 2004.1626, 2005.1630)) z <- c(1, 2, 3, 5, 10) # 5 depths x <- matrix(rnorm(30), nrow = 5) # 6 variables at 5 depths layerMean(cbind(z, x)) leapYear(seq(1500, 2000, 100)) leapYear(c(1996.9, 1997)) ## Aggregate monthly time series over Feb-Apr only. aggregate(sfbayChla, 1, meanSub, sub = 2:4) monthNum(as.Date(c("2007-03-17", "2003-06-01"))) ## Ignore certain seasons in a Seasonal Kendall test. c27 <- sfbayChla[, "s27"] seaKen(tsSub(c27)) # Aug and Dec missing the most key data seaKen(tsSub(c27, seas = c(1:7, 9:11))) y <- Sys.time() years(y)
Registration of S3 class "zoo"
as a formally defined class. Used here
to allow the "zoo"
class to appear in method signatures.
A virtual Class: No objects may be created from it.
showClass("zoo")
showClass("zoo")