Hypothesis testing with Phenospex data

Introduction

When running plant phenotyping experiments, researchers usually want clear answers to specific questions: Which cultivar performs better? Which genotypes cope best with salt or drought stress? Which fertilizer gives the largest boost to plant growth? Which pesticide has the least impact on plant performance? To answer such questions, researchers typically use statistical hypothesis testing: they test whether the data provide enough evidence to reject a null hypothesis (for example, "the treatment has no effect on plant growth"). This R script shows how such hypothesis tests can be carried out, using p < 0.05 as the threshold for statistical significance.

To obtain a small test dataset, a salt-stress experiment was performed with 8 basil plants. The experiment started on the morning of 3 February 2021, and at 11:00h the plants received a salt-stress treatment through a regular watering: three plants were watered with a half-saturated salt solution (20 g salt/100 ml water), three plants with a saturated salt solution (40 g salt/100 ml water), and two plants served as controls (watered with plain tap water). All plants were scanned every 10 minutes with a Phenospex DualScan TraitFinder holding two PlantEye F500 scanners. After the experiment, the CSV spreadsheet data were downloaded from the HortControl user interface. The very basic analysis below serves as a proof of concept for data acquisition, importing data into R, timestamp conversion, figure creation, data filtering and hypothesis testing (t-test and ANOVA).

Disclaimer

R is free, open-source statistical software supported by a large community of researchers worldwide. As R is not developed by Phenospex, the Phenospex Service Team does not provide support for installing R or for working in RStudio. We are, however, open to ideas and collaborations. To contact us about ideas or collaborations, please send an email to support@phenospex.com.

Load libraries

library(data.table) # data.table: fast import of large data files and efficient further processing of data in R
library(ggplot2) # ggplot2: creating figures, e.g. for data visualization

Import CSV file into R

The CSV file can be downloaded from the PSX-Data tab in the HortControl interface. It contains numerical data, both morphological and spectral parameters, calculated from the digital scans that the PlantEyes generate. These calculations, which yield values such as Digital Biomass and height, or average Hue and NDVI, are performed by Phenospex's proprietary Phena chain.

Please note: to import the CSV file, make sure it is located in R's current working directory (check it with getwd() and change it with setwd()); otherwise fread() will not find the file. Alternatively, pass the full file path to fread().
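A minimal sketch of a check to run before the import: it stops with a readable message, including the current working directory, when the file cannot be found, and otherwise returns the absolute path that will be read. check_csv_path() is a hypothetical helper written for this example, not part of data.table or HortControl.

```r
# Hypothetical helper: confirm that a CSV file can be found before importing it.
# Relative paths are resolved against the current working directory (getwd()).
check_csv_path <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: '", path, "'. Current working directory is: ",
         getwd(), ". Use setwd() or pass the full path.")
  }
  normalizePath(path)  # absolute path of the file that will be read
}

# Example usage with the file name from this tutorial (not run here):
# csv_path <- check_csv_path("Basil_Salt_Treatment_MPE_20210203_20210317_planteye.csv")
# dataframe1 <- data.table::fread(csv_path, encoding = "UTF-8")
```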

# Import the CSV data, reading it as UTF-8 so that special characters in column names (such as the ³ in 'Digital biomass [mm³]') are decoded correctly:
dataframe1 = fread('Basil_Salt_Treatment_MPE_20210203_20210317_planteye.csv', encoding = "UTF-8")

# Convert the timestamp values to R's standard date-time class (POSIXct), which ggplot2 handles natively:
dataframe1$timestamp = as.POSIXct(dataframe1$timestamp)
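as.POSIXct() without a tz argument interprets timestamps in the session's local time zone, so the same script can yield shifted times on a computer set to another zone. The sketch below makes format and time zone explicit and shows a base-R equivalent of the inclusive %between% time-window filter used later. Two assumptions: the "YYYY-MM-DD HH:MM:SS" format is inferred from the filter strings in this tutorial, and "UTC" is only a placeholder, since the time zone of the HortControl export is not stated here.

```r
# Example timestamps in the format used by the filters in this tutorial
# (assumption: HortControl exports "YYYY-MM-DD HH:MM:SS").
raw_timestamps <- c("2021-02-03 09:11:25", "2021-02-03 09:21:30", "2021-02-03 14:33:28")

# Parse with an explicit format and time zone. "UTC" is a placeholder;
# replace it with the time zone in which the scanner logged its data.
parsed <- as.POSIXct(raw_timestamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Base-R equivalent of an inclusive time-window filter (like %between%):
window_start <- as.POSIXct("2021-02-03 09:11:25", tz = "UTC")
window_end   <- as.POSIXct("2021-02-03 09:21:30", tz = "UTC")
in_window <- parsed >= window_start & parsed <= window_end
print(in_window)   # TRUE TRUE FALSE
```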

Create overview figure

R is a powerful tool for visualizing data with all sorts of figures. As a first example, and to get an overview of the data, we plot Digital Biomass over time, giving each plant its own color (each unit contains one basil plant). Data points are additionally shaped by treatment (circles, triangles and squares).

ggplot(dataframe1,aes(x=timestamp,y=`Digital biomass [mm³]`,color=unit, shape=treatment))+
  geom_line()+
  geom_point(size=3)+
  ylab('Digital biomass [mm³]')+
  xlab('Timestamp')+
  theme_minimal()

Filter samples and create boxplot

To check whether the treatment groups already differed in Digital Biomass before the actual treatment, it makes sense to first visualize the Digital Biomass of some early measurements in a boxplot and then perform hypothesis tests (e.g. a two-sample t-test).

# In this example, we keep only the first two time points by selecting the time range that contains the first two scans:
dataframe2 = dataframe1[timestamp%between%c('2021-02-03 09:11:25','2021-02-03 09:21:30')]
# Note: two time points were selected because with a single time point the control group would contain only two data points, too few for a meaningful boxplot or statistical test. This is not the most elegant or correct approach: repeated scans of the same plant are not independent replicates, so they inflate the apparent sample size. Apart from filtering by time, as in this example, any of the other (meta)data columns can be used for filtering.

# Basic boxplot using R's built-in functionality
boxplot(dataframe2$`Digital biomass [mm³]` ~ dataframe2$treatment, col="#00A07D", xlab = "Treatment", ylab = "Digital biomass [mm³]")
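Instead of hard-coding a time window, the scans can be selected per plant: sort by unit and time, then keep the first (or last) n scans of each unit, regardless of the exact clock times. A sketch in base R, assuming one row per scan with the columns unit and timestamp, as in the CSV used here. select_scans() and the toy table are hypothetical, for illustration only; the helper is written for a plain data.frame, so for the tutorial data it could be called as select_scans(as.data.frame(dataframe1), n = 2).

```r
# Hypothetical helper: keep the first (or last) n scans of every unit (plant).
select_scans <- function(df, n = 2, last = FALSE) {
  df <- df[order(df$unit, df$timestamp), ]               # sort by plant, then by time
  rank_fun <- if (last) function(i) rev(seq_along(i)) else seq_along
  scan_rank <- ave(seq_len(nrow(df)), df$unit, FUN = rank_fun)  # 1, 2, 3, ... within each plant
  df[scan_rank <= n, ]
}

# Toy input (made-up units and times, for illustration only):
toy <- data.frame(
  unit      = rep(c("A", "B"), each = 3),
  timestamp = as.POSIXct("2021-02-03 09:00:00", tz = "UTC") + rep(c(0, 600, 1200), 2),
  treatment = rep(c("Control", "Salt1"), each = 3)
)

first_two <- select_scans(toy, n = 2)              # first two scans of each plant
last_one  <- select_scans(toy, n = 1, last = TRUE) # most recent scan of each plant
print(nrow(first_two))   # 4
```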

Perform t-tests on first scans

Welch two-sample t-tests (the default of R's t.test function) are performed to check whether the treatment groups already differed significantly before the actual treatment.

# Creating data objects using standard R functionality:
Control_data_standard_R <- dataframe2$`Digital biomass [mm³]`[dataframe2$treatment=="Control"]
Salt0.5_data_standard_R <- dataframe2$`Digital biomass [mm³]`[dataframe2$treatment=="Salt0.5"]
Salt1_data_standard_R <- dataframe2$`Digital biomass [mm³]`[dataframe2$treatment=="Salt1"]

# Alternatively, data objects can be created using the data.table package, which is a powerful data handling tool! The syntax is then as follows:
Control_data <- dataframe2[treatment=="Control", (`Digital biomass [mm³]`)]
Salt0.5_data <- dataframe2[treatment=="Salt0.5", (`Digital biomass [mm³]`)]
Salt1_data <- dataframe2[treatment=="Salt1", (`Digital biomass [mm³]`)]
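With more treatment groups, writing one line per group gets repetitive. Base R's split() returns a named list with one vector per treatment level. The sketch uses R's built-in PlantGrowth dataset (plant weights for a control and two treatments) as a stand-in for the Phenospex CSV; the commented line shows how the same call would look for the tutorial data.

```r
# Built-in example data: plant weights for a control group and two treatments
data(PlantGrowth)

# One call creates all group vectors at once, named by treatment level
groups <- split(PlantGrowth$weight, PlantGrowth$group)
names(groups)              # "ctrl" "trt1" "trt2"
sapply(groups, length)     # 10 observations per group

# For the tutorial data (not run here):
# groups <- split(dataframe2$`Digital biomass [mm³]`, dataframe2$treatment)
# t.test(groups$Control, groups$Salt0.5)
```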

# T-test on Control group versus Salt0.5 group:
t.test(Control_data, Salt0.5_data)

## 
##  Welch Two Sample t-test
## 
## data:  Control_data and Salt0.5_data
## t = -0.10639, df = 7.8097, p-value = 0.918
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -653580.8  596169.1
## sample estimates:
## mean of x mean of y 
##   3067958   3096663

# p-value = 0.918, so no statistically significant difference between the Control and Salt0.5 groups before treatment

# Control group versus Salt1 group:
t.test(Control_data, Salt1_data)

## 
##  Welch Two Sample t-test
## 
## data:  Control_data and Salt1_data
## t = 2.4969, df = 3.9859, p-value = 0.06721
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -45106.75 839811.75
## sample estimates:
## mean of x mean of y 
##   3067958   2670605

# p-value = 0.067, so no statistically significant difference between the Control and Salt1 groups at the 0.05 threshold
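R's t.test() runs Welch's two-sample t-test by default (var.equal = FALSE), which does not assume equal variances in the two groups. This is why the outputs above report non-integer degrees of freedom such as 7.8097. The statistic is t = (mean(x) - mean(y)) / sqrt(var(x)/n_x + var(y)/n_y), and the degrees of freedom follow the Welch-Satterthwaite approximation. The sketch recomputes both, plus the two-sided p-value, on R's built-in PlantGrowth data (a stand-in for the basil data) and compares them with t.test().

```r
data(PlantGrowth)
x <- PlantGrowth$weight[PlantGrowth$group == "ctrl"]
y <- PlantGrowth$weight[PlantGrowth$group == "trt1"]

# Welch t statistic: difference in means divided by the unpooled standard error
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)
t_manual <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation of the degrees of freedom
df_manual <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df_manual)

res <- t.test(x, y)   # Welch test is R's default (var.equal = FALSE)
c(manual = t_manual,  t.test = unname(res$statistic))
c(manual = df_manual, t.test = unname(res$parameter))
c(manual = p_manual,  t.test = res$p.value)
```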

Please note that R offers many hypothesis tests besides t-tests, such as one-way or two-way ANOVA, MANOVA and permutation tests. R can also test the assumptions behind these tests, for example whether your data are normally distributed (normality testing). To keep things simple, we have so far used only the t.test function, which is part of R's built-in functionality. Running several pairwise t-tests on more than two groups inflates the chance of a false positive, whereas a one-way ANOVA compares all groups in a single test. For the sake of practice, we therefore show how the same analysis can be done with a one-way ANOVA.
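A permutation test, one of the alternatives mentioned above, makes no normality assumption: it shuffles the group labels many times and counts how often a difference in means at least as large as the observed one arises by chance. A minimal sketch on R's built-in PlantGrowth data (a stand-in for the basil data); the number of shuffles (10,000) and the seed are arbitrary choices.

```r
data(PlantGrowth)
set.seed(42)  # make the random shuffles reproducible

# Compare two groups from R's built-in PlantGrowth data
d <- droplevels(subset(PlantGrowth, group %in% c("ctrl", "trt2")))
observed <- as.numeric(diff(tapply(d$weight, d$group, mean)))  # mean(trt2) - mean(ctrl)

# Shuffle the group labels many times and recompute the difference in means
n_perm <- 10000
perm_diffs <- replicate(n_perm, {
  shuffled <- sample(d$group)
  as.numeric(diff(tapply(d$weight, shuffled, mean)))
})

# Two-sided p-value: share of shuffles at least as extreme as the observed difference
# (+1 in numerator and denominator counts the observed labelling itself)
p_perm <- (sum(abs(perm_diffs) >= abs(observed)) + 1) / (n_perm + 1)
p_perm
```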

ANOVA testing on first scans

one.way <- aov(`Digital biomass [mm³]` ~ treatment, data = dataframe2)
summary(one.way)

##             Df    Sum Sq   Mean Sq F value Pr(>F)
## treatment    2 6.465e+11 3.233e+11   2.215  0.149
## Residuals   13 1.898e+12 1.460e+11

# p-value = 0.149, so as with the separate t-tests, no statistically significant differences between treatment groups prior to treatment.
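aov() assumes that the residuals are roughly normally distributed and that all groups have similar variances. The sketch below checks both assumptions with base-R tests and also runs Welch's one-way ANOVA (oneway.test), which drops the equal-variance assumption in the same way the Welch t-test does. It uses R's built-in PlantGrowth data as a stand-in; with only a handful of observations per group, as in the basil experiment, such assumption tests have little power, so a non-significant result does not prove that the assumptions hold.

```r
data(PlantGrowth)
fit <- aov(weight ~ group, data = PlantGrowth)

# Shapiro-Wilk test on the ANOVA residuals: a small p-value suggests
# the residuals are not normally distributed
sw <- shapiro.test(residuals(fit))
sw

# Bartlett's test: a small p-value suggests unequal variances across groups
bt <- bartlett.test(weight ~ group, data = PlantGrowth)
bt

# Welch's one-way ANOVA does not assume equal variances,
# analogous to the Welch t-test that t.test() runs by default
welch <- oneway.test(weight ~ group, data = PlantGrowth, var.equal = FALSE)
welch
```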

Filter samples and create boxplot

After the treatment, it again makes sense to visualize the data in a boxplot before performing the final hypothesis tests.

# Select the last two time points (more than three hours after the treatment):
dataframe3 = dataframe1[timestamp%between%c('2021-02-03 14:33:28','2021-02-03 14:37:33')]

# Basic boxplot
boxplot(dataframe3$`Digital biomass [mm³]` ~ dataframe3$treatment, col="#00A07D", xlab = "Treatment", ylab = "Digital biomass [mm³]")

Perform t-tests on last scans

Welch two-sample t-tests are performed to check whether the treatment groups differ significantly after the treatment, which is the primary question of the experiment.

# Creating data objects:
Control_data <- dataframe3$`Digital biomass [mm³]`[dataframe3$treatment=="Control"]
Salt0.5_data <- dataframe3$`Digital biomass [mm³]`[dataframe3$treatment=="Salt0.5"]
Salt1_data <- dataframe3$`Digital biomass [mm³]`[dataframe3$treatment=="Salt1"]

# Control group versus Salt0.5 group:
t.test(Control_data, Salt0.5_data)

## 
##  Welch Two Sample t-test
## 
## data:  Control_data and Salt0.5_data
## t = 35.456, df = 5.7379, p-value = 6.077e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2584203 2971928
## sample estimates:
## mean of x mean of y 
## 2994830.0  216764.7

# p-value = 6.077e-08, so a statistically significant difference between the Control and Salt0.5 groups

# Control group versus Salt1 group:
t.test(Control_data, Salt1_data)

## 
##  Welch Two Sample t-test
## 
## data:  Control_data and Salt1_data
## t = 38.378, df = 4.8866, p-value = 2.986e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2647942 3030995
## sample estimates:
## mean of x mean of y 
## 2994830.0  155361.6

# p-value = 2.986e-07, so a statistically significant difference between the Control and Salt1 groups

# Salt0.5 group versus Salt1 group:
t.test(Salt0.5_data, Salt1_data)

## 
##  Welch Two Sample t-test
## 
## data:  Salt0.5_data and Salt1_data
## t = 1.0727, df = 9.6047, p-value = 0.3096
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -66856.91 189662.98
## sample estimates:
## mean of x mean of y 
##  216764.7  155361.6

# p-value = 0.3096, so no statistically significant difference between the two salt treatment groups
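Running three separate t-tests at a 0.05 threshold raises the chance of at least one false positive across the family of comparisons. pairwise.t.test() runs all pairwise comparisons at once and adjusts the p-values; with pool.sd = FALSE each comparison is a separate-variance (Welch-type) t-test, like the default t.test() calls above. The Holm correction is one reasonable choice among several (see p.adjust). The sketch uses R's built-in PlantGrowth data; the commented call shows the equivalent for the tutorial data.

```r
data(PlantGrowth)
pw <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                      pool.sd = FALSE,            # unpooled SDs (Welch-type tests, as t.test() uses)
                      p.adjust.method = "holm")   # Holm correction for multiple comparisons
pw

# For the tutorial data (not run here):
# pairwise.t.test(dataframe3$`Digital biomass [mm³]`, dataframe3$treatment,
#                 pool.sd = FALSE, p.adjust.method = "holm")
```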

ANOVA testing on last scans

one.way <- aov(`Digital biomass [mm³]` ~ treatment, data = dataframe3)
summary(one.way)

##             Df    Sum Sq   Mean Sq F value   Pr(>F)    
## treatment    2 2.368e+13 1.184e+13    1038 4.59e-15 ***
## Residuals   13 1.483e+11 1.141e+10                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# p-value = 4.59e-15, so as with the t-tests, there are statistically significant differences in digital biomass between treatment groups
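A significant ANOVA shows that at least one group differs, but not which ones. Tukey's honest significant difference test, TukeyHSD(), applied to the aov() fit, reports every pairwise difference with family-wise adjusted confidence intervals and p-values. Like aov() itself, it assumes similar variances across groups. The sketch uses R's built-in PlantGrowth data as a stand-in; the commented call shows the equivalent for the tutorial data.

```r
data(PlantGrowth)
fit <- aov(weight ~ group, data = PlantGrowth)

# All pairwise group differences with family-wise 95% confidence intervals
tukey <- TukeyHSD(fit, conf.level = 0.95)
tukey

# For the tutorial data (not run here):
# TukeyHSD(aov(`Digital biomass [mm³]` ~ treatment, data = dataframe3))
```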

Conclusion

In this small experiment, plants watered with a concentrated salt solution lost Digital Biomass rapidly: within four hours after treatment (and possibly considerably earlier, judging from the overview graph), the Digital Biomass of plants in both salt treatment groups was statistically significantly lower than that of the control plants (p < 3e-07). This is in line with earlier work by the famous Dutch botanist Prof. Hugo de Vries, who found that plant cells can lose turgor rapidly through osmosis (Untersuchungen über die mechanischen Ursachen der Zellstreckung, ausgehend von der Einwirkung von Salzlösungen auf den Turgor wachsender Pflanzenzellen [Investigations into the mechanical causes of cell elongation, based on the effect of salt solutions on the turgor of growing plant cells], Leipzig, 1877).
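To put the loss in numbers, the group means printed by the t-tests can be compared between the first and the last scans. This small arithmetic sketch uses only the means reported in the outputs above; it compares group means rather than following individual plants.

```r
# Group means of Digital biomass [mm³], copied from the t-test outputs above
first_scans <- c(Control = 3067958,   Salt0.5 = 3096663,  Salt1 = 2670605)
last_scans  <- c(Control = 2994830.0, Salt0.5 = 216764.7, Salt1 = 155361.6)

# Remaining Digital biomass at the last scans, as a percentage of the first scans
remaining_pct <- round(100 * last_scans / first_scans, 1)
remaining_pct   # Control 97.6, Salt0.5 7.0, Salt1 5.8
```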

Hugo de Vries

15-September-2021
