Inter-Egg Correlation, Notes and Discussion

The inter-egg correlation work is difficult and requires discussion of methodology and interpretation. This page provides some of the content of our exchanges as background for the primary results.

Date: Sat, 22 Apr 2000 08:10:50 -0400 (EDT)
From: rdnelson 
To: Matthew.J.Salganik@ccmail.census.gov
Cc: Doug Mast, York Dobyns,
     Roger D. Nelson
Subject: Re: Inter-Egg correlation

On Fri, 21 Apr 2000 Matthew.J.Salganik@ccmail.census.gov wrote:

> Here is Doug's reply to my e-mail.  I think he left you off the cc list
> accidentally.  I am going to mull this over some this weekend.  Take care.
> 
Hi Matt,

Thanks for the copy.  I will copy this note to Doug too, and
to York, with whom I talked about it a bit.  Doug's response
is somewhat similar to the observation I made about the comparison 
of the synchronized and control counts -- both should be
affected in the same way by any general non-independence problem,
and the empirical results actually show opposite deviations
from expected counts.

One thing that would be helpful is to know exactly how the
array of correlations was constructed.  Doug says
"The method is then to compute Pearson correlation coefficients 
between the signals from all possible egg pairs, over a 
large number of one-minute intervals."  That seems clear,
and I read it to mean that if we have eggs A B C, there
will be correlations AB AC and BC.  The question is
whether these are independent, and the concern is, if AB
and AC are strongly correlated, then BC must be too.  On the
null hypothesis, there are no correlations and AB AC and BC
are indeed independent, but if there is an influence that affects 
the eggs in a common way, then we should expect the worrisome 
non-independence, but for precisely the reason that there is
an anomalous external agency.  Then the question becomes, can we 
legitimately count all three excess correlations?
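
The pairwise construction described above can be sketched as follows.
This is only an illustration under stated assumptions, not Doug's
actual code: the signals are simulated Gaussian noise, the names
pearson, pairwise_correlations, simulate and shared_weight are
hypothetical, and a long run (2000 samples rather than one minute) is
used so the pattern is easy to see.  With no common component the
three correlations AB, AC and BC scatter around zero; with a common
component added to every egg, all three rise together.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(signals):
    """Return {(egg_i, egg_j): r} for every unordered pair of eggs."""
    return {(a, b): pearson(signals[a], signals[b])
            for a, b in combinations(sorted(signals), 2)}


def simulate(rng, eggs, n, shared_weight):
    """Simulated signals: independent noise plus an optional common part.

    shared_weight = 0 corresponds to the null hypothesis; a positive
    value stands in for an influence that affects all eggs alike.
    """
    common = [rng.gauss(0, 1) for _ in range(n)]
    return {egg: [rng.gauss(0, 1) + shared_weight * c for c in common]
            for egg in eggs}


rng = random.Random(0)  # fixed seed so the run is repeatable
null_r = pairwise_correlations(simulate(rng, "ABC", 2000, 0.0))
shared_r = pairwise_correlations(simulate(rng, "ABC", 2000, 1.0))

for pair in null_r:
    print(pair, round(null_r[pair], 3), round(shared_r[pair], 3))
```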

(York suggests that one approach would be to count all
correlations to a single egg, i.e., count AB and AC.  One
could then repeat this with B as the pivot egg and again
with C, etc.  Each such set would be a separate independent
estimate of the inter-egg correlation, whose average would
be a well-qualified estimate using all available data.)
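
One way to read York's suggestion is sketched below, using the mean
correlation to the pivot egg as the per-set estimate.  That choice,
the simulated data, and the name pivot_estimates are assumptions made
for illustration; York may have had a count of significant
correlations in mind instead.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pivot_estimates(signals):
    """For each pivot egg, average its correlations with all other eggs."""
    eggs = sorted(signals)
    estimates = {}
    for pivot in eggs:
        rs = [pearson(signals[pivot], signals[other])
              for other in eggs if other != pivot]
        estimates[pivot] = sum(rs) / len(rs)
    return estimates


# Simulated data: four eggs, 60 one-second values in a single minute.
rng = random.Random(1)
signals = {egg: [rng.gauss(0, 1) for _ in range(60)] for egg in "ABCD"}

per_pivot = pivot_estimates(signals)                 # one estimate per pivot
overall = sum(per_pivot.values()) / len(per_pivot)   # average over pivots
print(per_pivot, overall)
```

When the per-set estimate is a plain mean, as here, the average over
pivots is the same number as the mean over all distinct pairs, since
each pair is counted once from each of its two eggs.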

Assuming the control pairs are constructed in the same way as the 
synchronized pairs (the same offset is used for each set of 
correlations, that is, a "pseudo-synchronized" set is created from 
pairs with a common offset) they constitute a proper comparison set
in which the effect of non-independence is exactly the same, so 
we can be certain that differences between synchronized and control 
counts are not affected by the possible non-independence.  
Furthermore, the empirical counts show an excess for synchronized 
and a deficit for the control pairs, which strongly indicates that
the method of counting all significant correlations is not
contaminated by any effect of non-independence. 
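
The comparison of synchronized and control counts might look roughly
like this.  Several things here are assumptions rather than facts
about Doug's analysis: the control set is built by lagging the second
egg of every pair by one common offset (in whole minutes), the
significance test is a one-sided Fisher z approximation at the 0.10
level, and the data are simulated noise.  The names is_significant
and count_significant are hypothetical.

```python
import math
import random
from itertools import combinations
from statistics import NormalDist


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_significant(r, n, alpha=0.10):
    """One-sided test of r > 0 using the Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return z > NormalDist().inv_cdf(1 - alpha)


def count_significant(streams, minute_len, offset_minutes):
    """Count significant pair correlations over all one-minute blocks.

    offset_minutes = 0 gives the synchronized set; a positive value
    gives a "pseudo-synchronized" control set in which the second egg
    of each pair is lagged by the same number of minutes.
    """
    eggs = sorted(streams)
    n_minutes = len(streams[eggs[0]]) // minute_len - offset_minutes
    hits = total = 0
    for m in range(n_minutes):
        first = slice(m * minute_len, (m + 1) * minute_len)
        start = (m + offset_minutes) * minute_len
        second = slice(start, start + minute_len)
        for a, b in combinations(eggs, 2):
            r = pearson(streams[a][first], streams[b][second])
            hits += is_significant(r, minute_len)
            total += 1
    return hits, total


# Simulated data: three eggs, 200 minutes of 60 one-second values each.
rng = random.Random(2)
streams = {egg: [rng.gauss(0, 1) for _ in range(200 * 60)] for egg in "ABC"}

sync_hits, sync_total = count_significant(streams, 60, 0)
ctrl_hits, ctrl_total = count_significant(streams, 60, 1)
print(sync_hits / sync_total, ctrl_hits / ctrl_total)
```

Because both sets are counted by the same function, any bias in the
counting procedure itself enters both in the same way, which is the
point of the comparison.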

Since it does seem that there is, however, some non-independence, 
as described, it is worth considering why it doesn't seem to
create a problem in the counting method.  My guess is that
the countable correlations are distributed more homogeneously 
than would be suggested by our image of how an "effect"
should work.  We envision a minute in which the effect
impinges on all the eggs in the network, thus resulting in
correlations among a given set of eggs, but this may be an
incorrect picture.  Instead, we may be seeing a single large
correlation here and another there, and finding an excess
of these otherwise unrelated correlations in different sets of
synchronized eggs.  Any non-independence, in this view, would 
constitute a second-order effect that is too small to observe,
and too small to affect Doug's measurable.

It is interesting to think about this, but a formal
assessment is, alas, beyond my capacities.  I am copying
this to York for his information, and hoping he may comment.

Roger

-----

[Doug's response to Matt's inquiry]

> Hi Matt,
> 
> Thanks for your comments.
> 
> > One question I had was about the independence of the correlation values.
> > For example if we have three data streams A, B, and C and we calculate the
> > three p-values of the correlation of the possible combinations of the three 
> > (i.e. p-value for A cor B, A cor C, and B cor C).  I am wondering if these
> > three p-values are independent.  I mean if A correlates with B and A 
> > correlates with C then B probably correlates with C.
> 
> I don't really know.  Your statement makes some sense.  But then, the
> intercorrelations of the egg data (the raw, not the chi-squared data)
> closely match the theory.  For example, one tenth of the signal
> pairs should have correlation coefficients above the threshold for the
> 0.10 significance level, and the actual fraction of correlation coefficients
> above that threshold (as seen in the tables for the synchronous and both
> control runs) is 0.10, within +- 0.01% in each case.
> 
> So, although I can't rigorously prove that the independence assumption
> is justified, the empirical data suggest that the inter-egg correlations
> are close to independent (or at least that any non-independent effects
> average out in the long run).
> 
> Cheers,
> 
> Doug.
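
Doug's check, comparing the observed fraction of correlations above
the threshold with the nominal 0.10, can be put in numbers as below.
The counts are made up purely for illustration and are not taken from
his tables, and the name fraction_check is hypothetical.  Note that
the binomial standard error used here itself assumes the pairs are
independent, so it is only a rough yardstick.

```python
import math


def fraction_check(hits, total, alpha=0.10):
    """Observed fraction above threshold and its z-score against alpha."""
    fraction = hits / total
    std_err = math.sqrt(alpha * (1 - alpha) / total)  # binomial, assumes independence
    return fraction, (fraction - alpha) / std_err


# Hypothetical counts, for illustration only.
fraction, z = fraction_check(hits=1012, total=10000)
print(round(fraction, 4), round(z, 2))
```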