We previously discussed the Wilcoxon-Mann-Whitney test and the Wilcoxon-Mann-Whitney odds (WMWodds) test for comparing non-normally distributed data between two groups. We now discuss another nonparametric test, the Kruskal-Wallis test, otherwise known as the Kruskal-Wallis one-way analysis of variance by ranks, for comparing non-normal data from three or more groups.

In their article “Minimal Current Intensity to Elicit an Evoked Motor Response Cannot Discern Between Needle-Nerve Contact and Intraneural Needle Insertion” published in this month’s issue of Anesthesia & Analgesia, Dr. Thomas Weismann, Department of Anesthesiology and Intensive Care Therapy, Philipps University Marburg, University Hospital Giessen-Marburg, Marburg, Germany, and colleagues examined the minimal stimulation current (MSC) that elicited an evoked motor response (EMR) at three needle positions (the needle-nerve contact position, the intraneural position, and 1 mm distance to the nerve, i.e., the control position) under three pulse duration settings applied at random (0.1, 0.3, or 1.0 milliseconds) in six anesthetized pigs whose brachial plexus nerves were surgically exposed. Their results suggest that minimal current intensity to elicit a motor response cannot reliably distinguish between a needle-nerve contact and an intraneural needle insertion.

Because the MSC intensity distributions were non-normal, the authors presented their data as median (interquartile range) and applied a series of nonparametric statistical tests for group comparisons. Some statistical terms, such as the Wilcoxon-Mann-Whitney test and the Hodges-Lehmann estimator, should sound familiar, since both were covered in an earlier blog post. Other terms, however, were not covered previously. For example, the authors performed the nonparametric Kruskal-Wallis test to check whether individual animals influenced the results. In other words, did the primary endpoint (i.e., the pairwise differences in MSC between the three needle tip positions) differ among the six animals? So, what is the Kruskal-Wallis test?

The Kruskal-Wallis test is an extension of the Wilcoxon-Mann-Whitney test to three or more independent groups. As the nonparametric alternative to the parametric one-way analysis of variance (ANOVA), it tests the null hypothesis that all independent samples have identical distribution functions against the alternative hypothesis that at least two of the samples differ. Note that a significant Kruskal-Wallis test result does not identify how many differences exist or where they occur. Pairwise group comparisons are therefore still needed, e.g., by applying the Wilcoxon-Mann-Whitney test.
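To make the mechanics concrete, here is a minimal Python sketch of the Kruskal-Wallis H statistic with the usual tie correction and a chi-square approximation for the p-value. The function names (`midranks`, `chi2_sf`, `kruskal_wallis`) and the example values are our own illustrative assumptions, not the authors' code or data, and the chi-square approximation can be rough for very small samples.

```python
import math
from collections import Counter


def midranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence for the regularized upper incomplete gamma function.
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def kruskal_wallis(*groups):
    """Return (H, approximate p-value) for three or more independent groups.

    Assumes the pooled data are not all identical.
    """
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    h = max(h, 0.0) / (1.0 - tie_term / (n ** 3 - n))
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical measurements for three independent groups (not study data).
group_a = [0.21, 0.35, 0.28, 0.40]
group_b = [0.52, 0.47, 0.61, 0.55]
group_c = [0.90, 0.75, 0.83, 1.02]
h_stat, p_value = kruskal_wallis(group_a, group_b, group_c)
print(f"H = {h_stat:.3f}, approximate p = {p_value:.4f}")
```

A small p-value here would only indicate that at least two groups differ; it would not say which ones.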

Using the Jonckheere-Terpstra test, the authors reported that increasing the pulse duration (0.1 ms vs. 0.3 ms vs. 1 ms) had a significant negative influence on MSC intensity, with a shorter duration resulting in higher MSC intensity (i.e., the median MSC for the 0.1 ms pulse duration ≥ the median MSC for the 0.3 ms pulse duration ≥ the median MSC for the 1 ms pulse duration). The Jonckheere-Terpstra test shares the null hypothesis of the Kruskal-Wallis test, namely that the independent samples come from the same population; however, its alternative hypothesis is more specific and requires an a priori ordering (ascending or descending) of the populations from which the samples are drawn. For example, at each needle location, the authors compared three groups of MSC intensities measured under increasing pulse durations (0.1 ms, 0.3 ms, and 1 ms). When there is an a priori ordering, the Jonckheere-Terpstra test generally has higher statistical power than the Kruskal-Wallis test. Because its alternative hypothesis is more flexible, though, the Kruskal-Wallis test is applied more frequently in the literature.
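The ordered alternative can be illustrated with a short Python sketch of the Jonckheere-Terpstra statistic and its large-sample normal approximation. This is a sketch under stated assumptions: the function name `jonckheere_terpstra` and the numbers are hypothetical, the variance formula used is the one for data without ties, and the one-sided p-value tests for an increasing trend across the groups in the order given (list the groups in reverse to test a decreasing trend).

```python
import math


def jonckheere_terpstra(groups):
    """Return (J, z, one-sided p) for an increasing trend across ordered groups.

    J sums, over every pair of groups (earlier, later), the number of
    observation pairs in which the later group's value is larger
    (ties count 0.5). The variance below assumes no ties.
    """
    j_stat = 0.0
    for a in range(len(groups) - 1):
        for b in range(a + 1, len(groups)):
            for x in groups[a]:
                for y in groups[b]:
                    if x < y:
                        j_stat += 1.0
                    elif x == y:
                        j_stat += 0.5
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    mean = (n * n - sum(s * s for s in sizes)) / 4.0
    variance = (n * n * (2 * n + 3)
                - sum(s * s * (2 * s + 3) for s in sizes)) / 72.0
    z = (j_stat - mean) / math.sqrt(variance)
    p_increasing = 0.5 * math.erfc(z / math.sqrt(2.0))
    return j_stat, z, p_increasing


# Hypothetical currents, listed from the longest to the shortest pulse so that
# the a priori expectation (shorter pulse, higher current) is an increasing trend.
pulse_1_0_ms = [0.10, 0.14, 0.12, 0.18]
pulse_0_3_ms = [0.22, 0.19, 0.30, 0.26]
pulse_0_1_ms = [0.45, 0.38, 0.52, 0.41]
j_stat, z, p = jonckheere_terpstra([pulse_1_0_ms, pulse_0_3_ms, pulse_0_1_ms])
print(f"J = {j_stat:.1f}, z = {z:.2f}, one-sided p = {p:.4f}")
```

With samples this small, an exact or permutation-based p-value would usually be preferable to the normal approximation.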

Multiple comparisons, or multiplicity, must be controlled to avoid an inflated false positive error rate (i.e., Type I error rate). In their work, the authors described both a Bonferroni adjustment (i.e., 0.05 divided by the total number of tests, the most conservative approach) and a customized closed testing procedure (less conservative) that yielded a final significance level of 0.0167 (i.e., alpha = 0.05/3). This may explain why the authors computed and reported 98.33% (= (1 - 0.0167) * 100) confidence intervals (CIs) of the effect estimates instead of the usual 95% (= (1 - 0.05) * 100) CIs (see their table 2).
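The arithmetic behind the adjusted significance level and the matching confidence level can be checked in a few lines of Python. The helper name `split_alpha` is our own, and the sketch covers only the division of alpha, not the authors' closed testing procedure itself.

```python
def split_alpha(alpha, n_tests):
    """Return (adjusted alpha, matching confidence level in percent)."""
    adjusted = alpha / n_tests
    confidence_percent = (1.0 - adjusted) * 100.0
    return adjusted, confidence_percent


adjusted, confidence = split_alpha(0.05, 3)
print(f"adjusted alpha = {adjusted:.4f}, CI level = {confidence:.2f}%")
# prints: adjusted alpha = 0.0167, CI level = 98.33%
```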

Furthermore, it is worth noting that the authors computed Hodges-Lehmann estimators (i.e., the median difference and its CI) in their pairwise comparisons of MSC intensity using Wilcoxon-Mann-Whitney tests. As we discussed before, when the two groups are assumed to have distributions of the same shape (note that the authors measured their MSC intensities at the different needle locations in the same animals [see their table 1]) and the data are ranked, the Hodges-Lehmann estimator can be an insightful way to present the effect size. However, as we previously emphasized, the Wilcoxon-Mann-Whitney test can be statistically significant even when the medians of the two groups are equal; in that case, the Hodges-Lehmann estimator would be inappropriate or must be interpreted cautiously. The Wilcoxon-Mann-Whitney odds (WMWodds) would then be the right statistic for reporting the effect size when comparing two independent groups.
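For two independent samples, the Hodges-Lehmann point estimate of the shift is the median of all pairwise differences between the groups. The Python sketch below shows only that point estimate; the function name `hodges_lehmann_shift` and the values are illustrative assumptions, and the confidence interval is not computed here.

```python
from statistics import median


def hodges_lehmann_shift(x, y):
    """Median of all pairwise differences x_i - y_j (two-sample shift estimate)."""
    return median(xi - yj for xi in x for yj in y)


# Hypothetical values for two independent groups (not study data).
group_x = [5, 7, 9]
group_y = [1, 2, 3]
print(hodges_lehmann_shift(group_x, group_y))  # prints 5
```

Note that this is not the same as the difference between the two group medians, although the two coincide in this small example (7 - 2 = 5).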

To provide some guidance on choosing the appropriate test for simple statistical comparisons, we prepared a quick reference that describes a battery of nonparametric and parametric tests of significance (Table). This reference table, however, does not cover tests for survival time data (e.g., the log-rank test) or regression methods (e.g., linear, logistic, or Cox regression) for more advanced statistical modeling. When necessary, we will cover these topics in future blog posts.

Table: Parametric and nonparametric tests of significance

| Goal | Normal data (parametric tests) | Non-normal data: rank, scale, etc. (nonparametric tests) | Binomial data (two possible outcomes) |
| --- | --- | --- | --- |
| Compare one group to a hypothetical value | One group t-test | Wilcoxon signed rank test | One sample proportion test, Chi-square goodness of fit test |
| Compare two unrelated groups | Unpaired t-test | Wilcoxon-Mann-Whitney test | Chi-square test, or Fisher exact test |
| Compare two related groups | Paired t-test | Wilcoxon signed rank test | McNemar’s test |
| Compare three or more unrelated groups | Analysis of variance (ANOVA) | Kruskal-Wallis test, Jonckheere-Terpstra test* | Chi-square test, or Fisher exact test |
| Compare three or more related groups | Repeated measures ANOVA | Friedman test | Cochran Q test |

*: The Jonckheere-Terpstra test has a specific alternative hypothesis that requires an a priori ordering of the populations from which the samples are drawn.