I have a data set with tens of thousands of observations of medical cost data. This data is highly skewed to the right and has a lot of zeros. It looks like this for two sets of people (in this case two age bands with > 3000 obs each):

    Min.

If I perform Welch's t-test on this data I get a result back:

    Welch Two Sample t-test

    t = -0.4777, df = 3366.488, p-value = 0.6329
    alternative hypothesis: true difference in means is not equal to 0

I know it's not correct to use a t-test on this data since it's so badly non-normal. However, if I use a permutation test for the difference of the means, I get nearly the same p-value all the time (and it gets closer with more iterations).

Using the perm package in R and permTS with exact Monte Carlo:

    Exact Permutation Test Estimated by Monte Carlo

    alternative hypothesis: true mean x - mean y is not equal to 0

    p-value estimated from 500 Monte Carlo replications
    99 percent confidence interval on p-value:

Why is the permutation test statistic coming out so close to the t.test value? If I take logs of the data then I get a t.test p-value of 0.28 and the same from the permutation test. I thought the t-test values would be more garbage than what I am getting here. This is true of many other data sets I have like this, and I am wondering why the t-test appears to be working when it shouldn't.

My concern here is that the individual costs are not i.i.d. There are many sub-groups of people with very different cost distributions (women vs men, chronic conditions, etc.) that seem to violate the i.i.d. requirement for the central limit theorem, or should I not worry about that?

Neither the t-test nor the permutation test has much power to identify a difference in means between two such extraordinarily skewed distributions. Thus they both give anodyne p-values indicating no significance at all. The issue is not that they seem to agree; it is that because they have a hard time detecting any difference at all, they simply cannot disagree!

For some intuition, consider what would happen if a change in a single value occurred in one dataset. Suppose that the maximum of 721,700 had not occurred in the second data set, for instance. The mean would have dropped by approximately 721700/3000, which is about 240.
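
A rough way to explore this is a small simulation. The sketch below is in Python (standard library only) rather than R, and everything in it is an assumption made for illustration: the zero-inflated lognormal generator, its parameters, the helper names, and the use of a normal approximation for the Welch p-value (reasonable only because the degrees of freedom are in the thousands). It is not the data or the code behind the output quoted in the question. It computes a Welch p-value and a Monte Carlo permutation p-value on two skewed samples of 3000 each, then measures how far one sample mean moves when its single largest value is dropped.

```python
import math
import random
import statistics


def welch_t(x, y):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = statistics.variance(x) / len(x)
    vy = statistics.variance(y) / len(y)
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def normal_two_sided_p(t):
    """Two-sided p-value from a normal approximation (assumes large df)."""
    return math.erfc(abs(t) / math.sqrt(2))


def permutation_p(x, y, reps, rng):
    """Monte Carlo permutation p-value for |mean(x) - mean(y)|."""
    observed = abs(statistics.fmean(x) - statistics.fmean(y))
    pooled = list(x) + list(y)
    n = len(x)
    total = sum(pooled)
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # random relabelling of the pooled values
        sx = sum(pooled[:n])
        diff = abs(sx / n - (total - sx) / (len(pooled) - n))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (reps + 1)


def simulate_costs(n, rng):
    """Hypothetical zero-inflated, right-skewed costs (not the real data)."""
    return [0.0 if rng.random() < 0.6 else rng.lognormvariate(6.0, 2.5)
            for _ in range(n)]


rng = random.Random(1)
x = simulate_costs(3000, rng)
y = simulate_costs(3000, rng)

t, df = welch_t(x, y)
p_t = normal_two_sided_p(t)
p_perm = permutation_p(x, y, 500, rng)
print(f"Welch t = {t:.4f}, df = {df:.1f}, approx p = {p_t:.4f}")
print(f"Permutation p (500 replications) = {p_perm:.4f}")

# Effect of a single observation: drop the largest value from y.
y_trimmed = sorted(y)[:-1]
shift = statistics.fmean(y) - statistics.fmean(y_trimmed)
print(f"max(y) = {max(y):.1f}, mean shift from dropping it = {shift:.1f}")
print(f"mean(x) - mean(y) = {statistics.fmean(x) - statistics.fmean(y):.1f}")
```

Comparing the last two printed lines shows how the shift caused by one observation relates to the whole observed difference in means. Changing the seed or the lognormal parameters gives a feel for how unstable that difference is when the tail is this heavy.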