Talk:Cross-correlation


Old comments[edit]

This page is very difficult to follow, when expecting a form of cross correlation that isn't signal processing. Why not start with a general description? —Preceding unsigned comment added by Quatch (talkcontribs) 16:14, 2 September 2009 (UTC)[reply]

I have a more basic problem with this entry than all of these other entries!
The author has described a CROSS CORRELATION as having a peak at zero shift.
THAT, MY FRIENDS, is an AUTO CORRELATION FUNCTION!!!
The CROSS CORRELATION FUNCTION has a peak at the time shift that is the DIFFERENCE between the arrival times of two nominally identical signals received at differing times - that is, the cross-correlation function measures THE DELAY between the arrival times of two nominally identical signals.
Long ago while employed at National Technical Systems (NTS) I wrote software on an HP 5451C mainframe based Fourier Analyzer/data acquisition system for determining this and all manner of other typical signal processing functions.
This was decades before Matlab and similar applications!
We were on contract from DOD to determine if one could tell if a missile (can't remember which, too long ago!) was present in a launcher, or if there was a dummy present, using remote sensing techniques.
I used Bendat and Piersol's tomes on these subjects to write my programs.
I'm just say'n.
Somebody needs to fix this article. Not me, and not tonight. Pcmacd (talk) 02:24, 17 November 2022 (UTC)[reply]

Building up text to add once verified...

given a reference signal and an input signal,
sref = 01011010010110000010111101111001001011010010111000100101101111
sinp = 01111011100100111000000111110011110010011100100111100001001110
the cross-correlation of the reference signal with the input signal reaches its maximum of 0.61 when the input signal is rotated to the left 5 places (Δt = −5).
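
A minimal NumPy sketch of how a claim like this can be checked. Mapping the bits to ±1 and normalizing to zero mean and unit norm is just one convention, so the exact peak value (0.61 above) depends on that choice:

import numpy as np

sref = "01011010010110000010111101111001001011010010111000100101101111"
sinp = "01111011100100111000000111110011110010011100100111100001001110"

def prep(s):
    # map '0'/'1' to -1/+1, remove the mean, scale to unit norm
    x = np.array([1.0 if c == "1" else -1.0 for c in s])
    x -= x.mean()
    return x / np.linalg.norm(x)

a, b = prep(sref), prep(sinp)
# normalized circular cross-correlation for every left-rotation of the input
corr = [np.dot(a, np.roll(b, -k)) for k in range(len(a))]
best = int(np.argmax(corr))
print(best, round(corr[best], 2))  # left-rotation giving the peak, and its value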

Waveguy 06:10, 28 Jan 2005 (UTC)


Things to cover:

  • Different variations, like the above binary signals, "regular old" digital signals like PCM audio, 2D cross-correlation of images, etc.
  • Circular cross-correlation
  • Faster calculation with the use of FFTs

- Omegatron 04:41, Mar 20, 2005 (UTC)

Move to Cross covariance[edit]

Please look at the cross covariance article.

I moved the original definition (without dividing by the sigmas) to the cross covariance page. I know there is a lot of disagreement on the difference between covariance and correlation, or whether there is a difference, but it seems to be the consensus of the relevant pages that correlation does involve dividing by the sigmas, while covariance does not. See Covariance matrix for a short discussion. So, since the new stuff added did not divide by the sigmas, I reverted back to the original. Here is a table I have been using for the relevant pages.

NO SIGMAS                            WITH SIGMAS
Covariance                           Correlation
Cross covariance                     Cross correlation (see ext)
Autocovariance                       Autocorrelation
Covariance matrix                    Correlation matrix
Estimation of covariance matrices

PAR 02:35, 10 July 2005 (UTC)[reply]

Discrete-Time Signal Processing by Oppenheim, Schafer, and Buck, which is the definitive textbook for DSP, defines the cross-correlation of two signals without dividing by any sigma. Numerical Recipes in C by Press et al. also defines it without dividing by sigma.
What you are calling the "cross correlation", dividing by sigma, is called the "linear-correlation coefficient" in the statistics text I happen to have on my shelf (Data Reduction and Error Analysis for the Physical Sciences by Bevington and Robinson.)
Perhaps there is a difference in usage between the statistics and signal-processing/engineering communities. Even if so, it is not Wikipedia's place to anoint one usage as the "right" one.
—Steven G. Johnson 19:13, July 10, 2005 (UTC)

I'm not anointing here, I'm just trying to clarify things. Looking at the table above, the cross-correlation article was the only one in conflict with every other article in the table as far as the sigmas were concerned, so I changed it. If you have a better idea, let's do it. PAR 01:52, 11 July 2005 (UTC)[reply]

Please realize that the comments here apply to every other correlation article in the table. I think that the articles should list forms both with and without sigma, probably merging the covariance/correlation articles to avoid duplication, explain the context for the different usages in signal processing and statistics, and explain the impact of the sigma. As it stands, Wikipedia is anointing one particular usage as the correct one, which is wrong. —Steven G. Johnson 15:55, July 11, 2005 (UTC)
It seems that the definition is ambiguous. Either we need to find the dominant definition and go with that, or we have to present both. Cburnett 19:23, July 11, 2005 (UTC)

After checking 7 different statistics books, the following is unanimous:

  • The covariance of two different random variates X and Y is
Cov(X,Y) = E[ (X − E[X]) (Y − E[Y]) ], where E[X] is the expectation value of X.
  • The (linear) correlation coefficient is
R(X,Y) = Cov(X,Y) / (S(X) S(Y)), where S(X) is the standard deviation of X (a quick numerical check of this follows below).
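
A quick numerical sanity check of that relationship (NumPy; np.corrcoef computes exactly the covariance divided by the two standard deviations):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)

cov = np.cov(x, y)[0, 1]                    # Cov(X,Y), means subtracted
r = cov / (x.std(ddof=1) * y.std(ddof=1))   # divide by the sigmas
print(round(r, 6) == round(np.corrcoef(x, y)[0, 1], 6))  # True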

Oppenheim et al. is the only one to define cross-covariance and cross-correlation, and they do it in a very consistent way (the covariances subtract the means, the correlations do not):

cross-correlation: φ_xy[m] = E[ x[n+m] y*[n] ]
cross-covariance: γ_xy[m] = E[ (x[n+m] − μ_x) (y[n] − μ_y)* ]
autocorrelation: φ_xx[m] = E[ x[n+m] x*[n] ]
autocovariance: γ_xx[m] = E[ (x[n+m] − μ_x) (x[n] − μ_x)* ]

I think it's clear that my moving cross-correlation to cross-covariance was wrong. It's not the sigmas that distinguish correlation from covariance, it's the subtraction of the means. The division by the sigmas is another issue. I would like to alter all articles to conform with Oppenheim's definition, and add very clearly the alternate definitions. There will be no conflict with the 7 books I mentioned, but there will be a clear conflict with the autocorrelation article as it stands. I understand that we do not want to favor a particular set of definitions if there is any controversy, but it seems that the controversy is not so bad, and we do want some clarity and predictability in these articles, rather than conflicting or missing definitions. I will make these changes in a day or two unless there is an objection. PAR 00:18, 12 July 2005 (UTC)[reply]

I have also seen "correlation function" used in physical contexts for the subtracted-mean version. The difference is also blurred because in many important cases the mean is zero. I would prefer if auto-correlation, auto-covariance, cross-correlation, and cross-covariance were all defined on a single page (with appropriate redirects). Mathematically, they are so closely related that it hardly makes sense to me to have separate pages. (I'm not sure what to do about the dividing-by-sigma variant, since I'm not so familiar with that). —Steven G. Johnson 01:37, July 12, 2005 (UTC)

OK - how about this: A page entitled "covariance and correlation" or something. It explains that there are conflicting definitions. Then it adopts Oppenheim's definitions for the sake of clarity, not because they are best, but because we need to settle on some consistent way of speaking about things. Then it redirects to the various pages, each of which is rewritten consistent with these definitions, including the important alternate definitions. They are also rewritten as if they might be subsections of the main page. If after this is all done, they look like they ought to go in the main article, we can do that. That way there's no big lump of work that needs to be done, it can all be done slowly, in pieces. PAR 02:07, 12 July 2005 (UTC)[reply]

Sounds good to me. —Steven G. Johnson 02:45, July 12, 2005 (UTC)

Ok - I put up a "starter page" at Covariance and correlation. PAR 04:06, 12 July 2005 (UTC)[reply]

By the way, I don't think there is anything wrong with an editorial policy that enforces consistent terminology. This is not at odds with NPOV: alternative definitions should be mentioned, but for the sake of coherence and consistency a common set of terms should be used. See for example groupoid vs. magma (algebra) for a precedent. --MarkSweep 16:58, 12 July 2005 (UTC)[reply]

some example[edit]

I need examples of mean, median, mode, variability, range, variance, correlation, standard deviation, and skewness related to marketing.

f*?[edit]

The article mentions f* in many equations but doesn't define it. What's f*? —Ben FrantzDale 20:07, 19 December 2006 (UTC)[reply]

A superscript asterisk indicates the complex conjugate. --Abdull 21:06, 20 May 2007 (UTC)[reply]

Zero-Lag?[edit]

Could someone please add information on the zero-lag value? We also need to mention the algebraic properties of the cross-correlation operation: it is distributive over addition, but unlike convolution it is neither commutative nor associative.

Cross-correlation and convolution[edit]

I don't see yet why f ⋆ g = f*(−t) ∗ g. Let's simplify things to f(t) and g(t) being real functions, therefore f ⋆ g = f(−t) ∗ g.

As (f ∗ g)(t) = ∫ f(τ) g(t − τ) dτ, what does f(−t) ∗ g look like expressed in integral form?

Besides, if convolution is commutative and cross-correlation is not commutative, why can you say f ⋆ g = f(−t) ∗ g at all? --Abdull 21:23, 20 May 2007 (UTC)[reply]

Answer to the commutativity question: commutativity of convolution only lets you swap the two convolved functions, f(−t) ∗ g = g ∗ f(−t); it does not give f ⋆ g = g ⋆ f, because g ⋆ f = g(−t) ∗ f is a different expression. For real signals one instead has (f ⋆ g)(t) = (g ⋆ f)(−t).

Confusion with definition on convolution page[edit]

The definition used on the convolution page is inconsistent with the one used here, so it is unclear how to derive the first property of the cross-correlation starting from the definition of convolution currently in use. Would it be such a bad thing to show the entire derivation here, i.e. just fill in the 2-3 steps between the correlation of f and g and the convolution of f*(-t) and g? Alternatively, one could define the convolution in terms of the covariance, since that would not require a definition that's not listed on the page and would create consistency. --Kdmckale (talk) 13:17, 18 October 2015 (UTC)[reply]
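
For what it's worth, here is a sketch of those missing steps, assuming the definition (f ⋆ g)(τ) = ∫ conj(f(t)) g(t+τ) dt used on this page:

\begin{align}
(f \star g)(\tau) &= \int_{-\infty}^{\infty} \overline{f(t)}\, g(t+\tau)\, dt && \text{(definition)} \\
                  &= \int_{-\infty}^{\infty} \overline{f(u-\tau)}\, g(u)\, du && (u = t+\tau) \\
                  &= \int_{-\infty}^{\infty} h(\tau-u)\, g(u)\, du = (h * g)(\tau), && \text{where } h(t) := \overline{f(-t)},
\end{align}

so f ⋆ g is exactly the convolution of f*(−t) with g.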

Corrected a mistake in the definition of cross-correlation as convolution:

Added property that:

Sergio Rescia (talk) 17:06, 9 February 2018 (UTC) Sergio Rescia[reply]

Dangling citation[edit]

The in-text citation "(Campbell, Lo, and MacKinlay 1996)" isn't all that useful without the actual reference. Could whoever put that in please add the full citation in a "references" section?

appropriate integration limits[edit]

The article currently does not describe what the appropriate integration limits are. Can someone who knows the answer please add them? Ngchen 14:58, 7 October 2007 (UTC)[reply]

As an example of (to me) confusing integration limits, consider the convolution of two distributions X ~ Gamma(kx, tx) and Y ~ Gamma(ky, ty) (where the k_i are always > 0) -- the goal being to derive a distribution for the difference d = X − Y. It took me most of an evening to realize that the answer is piecewise, with different formulas for d<0 and d>0 (because you have to make sure the arguments of the Gamma distributions are positive -- I think). I eventually figured it out :), but perhaps someone with a better understanding could clarify? Anon.
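
A numerical sketch of that situation (SciPy; the shape and scale values are made up). The density of d = X − Y is the convolution of the density of X, supported on d ≥ 0, with the density of −Y, supported on d ≤ 0, which is where the piecewise behaviour around d = 0 comes from:

import numpy as np
from scipy.stats import gamma

kx, tx = 2.0, 1.0              # illustrative shape/scale for X
ky, ty = 3.0, 0.5              # illustrative shape/scale for Y
dt = 0.01
t = np.arange(0.0, 30.0, dt)

fx = gamma.pdf(t, kx, scale=tx)    # density of X, zero for t < 0
fy = gamma.pdf(t, ky, scale=ty)    # density of Y, zero for t < 0
fd = np.convolve(fx, fy[::-1]) * dt          # density of d = X - Y
d = np.arange(-(len(t) - 1), len(t)) * dt    # grid for the difference
print(round(d[np.argmax(fd)], 2), round(fd.sum() * dt, 3))  # mode, total mass ~ 1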

Possible split[edit]

I think that the usage and terminology in "signal processing" and in "statistics" are so different that a split into articles specific to each is required. Melcombe (talk) 11:47, 26 March 2008 (UTC)[reply]

Things to add (but not by me)[edit]

Can someone please add some words about cross-correlating in the frequency domain, a basic description, its advantages (computing economy), disadvantages (only circular correlation unless periods are lengthened and high-pass is used), etc.

Can someone also add a few words about multidimensional correlation and how it reduces to sets of linear correlation?

Fleem (talk) 11:13, 26 September 2008 (UTC)[reply]

Another thing that might be included here or on a disambiguation page is the computing definition of the term. Here is a brief definition: a cross-correlator is a generic name for a process that compares (usually two) input streams and usually produces 'summing' or 'difference' streams. All streams follow their own defined rules, and 'summing' and 'difference' are generic labels rather than mathematical terms. The centre of the definition is that it is about feature comparison. Common examples might include text-searching algorithms in word processors or database analysis, or various levels in image processing. Lucien86 (talk) 16:22, 13 May 2009 (UTC)[reply]

Confusing Definition[edit]

For some reason, when working with cross-correlations, the notation for the time lag "t" and the time variable "tau" is switched from the usual convention--little "t" as time and "tau" as a temporal shift. I think the definition could be substantially improved if the time lag is stated explicitly as "t", as is done in "Numerical Recipes in C", 2nd Ed., on page 498, Eq. 12.0.10. I will not change the page, however, because I am not in the habit of changing pages before I get feedback from others.

Bjf624 (talk) 17:07, 18 February 2009 (UTC)[reply]

I agree, usually "tau" is used for the lag, and "t" for the dummy variable in the integration. --98.201.96.169 (talk) 23:29, 30 January 2011 (UTC)[reply]

Help please[edit]

Is there any accepted definition of when a cross-correlation can be rated as "good"? Assuming any given normalized cross-correlation, a correlation coefficient of 1/−1 points to identical functions / inverted identical functions, while a coefficient of 0 clearly defines a "non-correlation" (or orthogonal signals). But what about the values in between? Is 0.8 a good correlation, or would 0.9 do the job? Is there a general classification of coefficients in terms of correlation quality? Or does that very much depend on the subject of application? I understand that the whole approach of cross-correlation is a statistical one. Hence there may even be research into that field, describing correlation quality (cross-correlation coefficients) by statistical values such as standard deviation? Could someone knowledgeable maybe give some references and include the information in the article? Thank you! --194.246.46.15 (talk) 16:13, 1 July 2009 (UTC)[reply]

No. Sometimes an apple is just an apple. --Drizzd (talk) 11:55, 4 July 2009 (UTC)[reply]
Yes, there is more that could be said about that. It depends a lot on your signal-to-noise ratio and on the frequency content of your expected signal. If you have zero noise, then 1 means perfect alignment and anything else is imperfect alignment. If your signals are band-limited (no infinite spatial frequencies), then values near 1.0 mean near-perfect alignment. A finite SNR means you can never expect 1.0 even for perfect alignment, and the worse the SNR, the lower the maximum expected correlation value. So yes, the value means something, but no, it isn't directly interpretable. E.g., a maximum correlation of 0.99 doesn't mean there is a 99% chance that that is the correct correlation. You have to think about what could cause less-than-perfect correlation values. —Ben FrantzDale (talk) 19:42, 6 July 2009 (UTC)[reply]
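
To put a number on the SNR point (NumPy; the noise levels are arbitrary): correlating a signal against a noisy copy of itself at perfect alignment gives roughly 1/sqrt(1 + noise_variance), not 1.0:

import numpy as np

rng = np.random.default_rng(4)
s = rng.normal(size=100000)            # clean reference signal
for noise in (0.0, 0.5, 2.0):
    x = s + noise * rng.normal(size=s.size)
    r = np.corrcoef(s, x)[0, 1]        # correlation at perfect alignment
    print(noise, round(r, 3), round(1 / np.sqrt(1 + noise**2), 3))
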
Intermediate values of correlation have an interpretation in terms of how well one series can be predicted from another, in the sense that one minus the square of the correlation says what proportion of the variance of one series would be left as error variance if a simple shift and scale were applied optimally from one series to the other. But in signal processing you could hope to do better than this by using an optimally filtered version of one series to predict the other. Ideas about prewhitening both series could come in here. But even a small amount of correlation could be important depending on what is being predicted ... even a small improvement in predictive power might be important. Otherwise it may be worth looking at other related articles such as correlation function and correlation. Melcombe (talk) 16:03, 7 July 2009 (UTC)[reply]

Incorrect formulation of normalised cross-correlation[edit]

The formula given for the normalised cross-correlation is wrong/misleading. The normalisation varies over the convolution, and is calculated based on the current template position. Refer to Lewis's paper (http://scribblethink.org/Work/nvisionInterface/nip.pdf) for more information. --StefanVanDerWalt (talk) 09:50, 6 December 2009 (UTC)[reply]

Known properties of cross-correlation should be added[edit]

I would suggest that all useful properties, such as distributivity etc., are listed in a way analogous to how it is done in the convolution article. Furthermore, the notation with "h(-)" as used in the properties section was new to me; I didn't find a definition anywhere. Cheers, Jo. —Preceding unsigned comment added by 129.132.71.148 (talk) 07:40, 6 April 2010 (UTC)[reply]

different asterisks[edit]

>> If either f or g is Hermitian, then f⋆g = f∗g.

At first glance, this seems to be a trivial identity which holds for _any_ f and g. At a closer look, you notice the difference between the five-pointed and the six-pointed asterisk. I think this can easily lead to confusion. Should we use a different notation? DrBaumeister (talk) 01:31, 24 April 2011 (UTC)[reply]

x[n] = {1, 2, 3, 4, 8, -1, -7, -6}. Find:

  a) the autocorrelation of x[n]
  b) the cross-correlation of x[n] with {-6, -7, -1, 8, 4, 3, 2, 1}  — Preceding unsigned comment added by 14.139.128.15 (talk) 10:48, 12 July 2011 (UTC)[reply]

Implementation of Circular Cross-Correlation via FFTs[edit]

When implementing a cross-correlation in digital logic, it is often useful to implement the algorithm as a circular cross-correlation through the use of FFTs.

In the following days, I plan to produce an Algorithms section, including a description and implementation of the FFT-based approach.

I will also include an implementation of the "shift, multiply and sum" implementation. Does anyone know if there is a formal name for what I am calling the "Shift, multiply, and sum" approach?

Also: does anyone know of any other algorithms?

Kyle.drerup (talk) 05:26, 25 January 2012 (UTC)[reply]

Since cross-correlation is equivalent to convolution with a sign flip in the function argument, every fast convolution algorithm gives a fast correlation algorithm. See Convolution#Fast convolution algorithms. (Unfortunately, that section needs a lot of work. It doesn't even mention the Karatsuba algorithm or the Toom-Cook algorithm, and those algorithms in turn are described in their articles as "multiplication algorithms" when in fact they can be used for any convolution problem.) — Steven G. Johnson (talk) 14:41, 25 January 2012 (UTC)[reply]
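
For the record, a minimal NumPy sketch of the FFT route (circular cross-correlation via the correlation theorem; zero-pad both inputs to length >= N1 + N2 - 1 first if a linear correlation is wanted):

import numpy as np

def circular_xcorr(a, b):
    # c[m] = sum_n conj(a[n]) * b[(n + m) mod N], computed in O(N log N)
    return np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b))

a = np.array([1.0, 2.0, 3.0, 4.0, 0.0, 0.0])
b = np.roll(a, 2)                            # b is a rotated copy of a
print(np.argmax(circular_xcorr(a, b).real))  # -> 2, the rotation applied above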

Vitya 28.10.2015

"Does anyone know if there is a formal name for what I am calling the "Shift, multiply, and sum" approach?"
I have read in a few places that it's called the "brute-force algorithm".

Hermitian properties[edit]

>> If either f or g is Hermitian, then: f ⋆ g = f ∗ g

The same statement can also be found in the Hermitian page.

Is that true? I think only if f is Hermitian does f ⋆ g = f ∗ g hold. Can anyone show the proof of f ⋆ g = f ∗ g if g is Hermitian? -Clarafly (talk) 07:23, 17 June 2012 (UTC)[reply]

I agree that this statement is incorrect. Counterexample: f anything not Hermitian, g the Dirac impulse (which is Hermitian). Then f ⋆ g = f*(−t) and f ∗ g = f(t), which differ unless f is Hermitian. --Drizzd (talk) 13:59, 17 June 2012 (UTC)[reply]

Visual comparison graphs are offset[edit]

The "Visual comparison" picture is a great idea, but the lines that represents f*g, f♦g and f♦f (where I'm using "♦" for correlation since I can't find a star symbol) are offset. Assuming f(t) and g(t) use the same time scale, f*g should be shifted right and f♦g and f♦f should be shifted left. To be a bit more precise, and for the sake of simplicity assuming f(t) and g(t) are nonzero between t=0 and t=1 (i.e., defining t=0 at the leading edges of f and g, and t=1 at the falling edge of f and where g(t) goes to zero again), f*g is zero for t<0 and maximum when t=1; whereas f♦g and f♦f are zero for t>1 and maximum when t=0. It's also really easy to read the wrong thing into the graphs of the overlapping functions, but on the other hand there's a lot of intuitive value in it as well, so I'm not sure I would change that. But if there's a way to change the picture to shift the black lines, it would be a lot clearer. — Preceding unsigned comment added by Gdlong (talkcontribs) 20:56, 15 November 2013 (UTC)[reply]


Me too, I have noticed this. The cross-correlation figure is wrong!! — Preceding unsigned comment added by Ahmedrashed00 (talkcontribs) 11:58, 7 July 2015 (UTC)[reply]

Problem with Nonlinear Section[edit]

"This problem arises because some moments can go to zero and this can incorrectly suggest that there is little correlation between two signals when in fact the two signals are strongly related by nonlinear dynamics."

If the moments go to zero and the correlation becomes zero, then it isn't incorrect to suggest that there is little correlation between the two signals. The writer seems to be confusing "correlation" with "connection" or some other word that means a relationship exists. The problem arises because correlation is a measurement of linear dependence, so it makes sense for nonlinear dependencies to circumvent the measurement. A Wikipedia article itself describes this for random variables (and stochastic processes are simply a series of random variables, so it applies just as well): http://en.wikipedia.org/wiki/Correlation_and_dependence — Preceding unsigned comment added by 71.80.79.67 (talk) 09:40, 6 February 2014 (UTC)[reply]
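
A small numerical illustration of that point (NumPy): y = x^2 is completely determined by x, yet for a symmetric x the linear correlation is essentially zero:

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 100000)
y = x**2                                  # strong nonlinear dependence on x
print(round(np.corrcoef(x, y)[0, 1], 3))  # ~ 0.0: correlation misses it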

Error in Time series Analysis Section[edit]

The writer says, correctly, that in time series analysis the cross-correlation is the normalized covariance function (i.e. Pearson's correlation coefficient). It then shows in LaTeX the definition of the regular covariance function, without normalization by the standard deviations of the two random variables. — Preceding unsigned comment added by 71.80.79.67 (talk) 05:10, 9 February 2014 (UTC)[reply]

When the variables f and g are normalized, the cross-correlation is identical to Pearson's r.
Cross-correlation = SUM[f(i)·g(i−lag)] / N_overlap, where normalizing the raw data time series F and G means f(i) = (F(i) − F_mean)/F_stdev and g(i) = (G(i) − G_mean)/G_stdev.
This is exactly the same as:
Cross-correlation = S_FG / sqrt(S_FF · S_GG)
where:
S_FG = SUM[(F(i) − F_mean) · (G(i−lag) − G_mean)] over the overlap
S_FF = SUM[(F(i) − F_mean)^2] for all i
S_GG = SUM[(G(i) − G_mean)^2] for all i 85.221.95.150 (talk) 19:19, 6 April 2024 (UTC)[reply]
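
A short NumPy check of that identity at a single lag (the series and the lag here are invented):

import numpy as np

rng = np.random.default_rng(2)
F = rng.normal(size=500)
G = np.roll(F, -3) + 0.3 * rng.normal(size=500)  # G(i) ~ F(i + 3) plus noise
lag = 3

Fo = F[lag:]             # F(i) over the overlap
Go = G[:len(G) - lag]    # G(i - lag) over the overlap

f = (Fo - Fo.mean()) / Fo.std()
g = (Go - Go.mean()) / Go.std()
lhs = np.sum(f * g) / len(f)           # SUM[f(i)*g(i-lag)] / N_overlap
rhs = np.corrcoef(Fo, Go)[0, 1]        # Pearson's r on the same overlap
print(round(lhs, 6) == round(rhs, 6))  # True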

Zero Lag Peak Statement is Slightly Incorrect[edit]

"there will always be a peak at a lag of zero unless the signal is a trivial zero signal."

Peak here means "highest point in the signal", so even for the trivial zero signal, there is still a peak at a lag of 0 (equal to 0). — Preceding unsigned comment added by 96.38.109.155 (talk) 13:11, 14 March 2014 (UTC)[reply]

Visual explanation needed[edit]

The article on convolution has an extremely helpful section for understanding what the convolution operation does: https://en.wikipedia.org/wiki/Convolution#Derivations It unpacks the terms in the function, the steps in the algorithm, and animates it.

Someone should do the same for this article! Richard☺Decal (talk) 04:32, 26 February 2015 (UTC)[reply]

Missing or misleading content in the introduction[edit]

(1) The definition given in the section entitled "Time-delay analysis" is different from both of the definitions given in the introduction. Yet it is a good, mature definition, and deserves to be mentioned in the introduction.

The "signal processing" definition in the introduction refers only to a time-lagged single inner product, not a random variable. On the other hand, the "probability and statistics" definition in the introduction does not have the structure of a time series or time-dependent signal; in particular it has no concept of "time lag".

(2) I also have trouble seeing how the time-lagged inner product of the introduction is useful for signals. Don't signals last forever, so that they are not in L^2 and the convolution integral would be ill-defined or infinite?

I feel that is explained by "It is commonly used for searching a long signal for a shorter, known feature." An example is a matched filter for detecting the arrival of radar pulse returns. It cross-correlates the streaming signal with the finite-length pulse shape, producing peaks at the points that pulses occur.
--Bob K (talk) 13:31, 20 November 2016 (UTC)[reply]
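
A toy version of that matched filter (NumPy; the pulse shape, arrival times, and noise level are invented for illustration):

import numpy as np

rng = np.random.default_rng(3)
pulse = np.array([1.0, 2.0, 3.0, 2.0, 1.0])  # known finite-length pulse
signal = 0.2 * rng.normal(size=200)          # noisy streaming signal
signal[60:65] += pulse                       # a pulse arrives at n = 60
signal[140:145] += pulse                     # another arrives at n = 140

# cross-correlate the stream with the pulse; peaks mark the arrival times
out = np.correlate(signal, pulse, mode="valid")
print(out[:100].argmax(), 100 + out[100:].argmax())  # -> 60 140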

I'm sure that for true signals, the integration is done over a finite time-window, yet this is not mentioned in the article.

178.38.161.142 (talk) 14:07, 25 May 2015 (UTC)[reply]

Questionable property[edit]

Regarding the last entry in the properties section:

I cannot reproduce the given equation . According to my calculations:

Can you give a reference for this property?

Best regards — Preceding unsigned comment added by 194.166.51.226 (talk) 08:45, 20 November 2016 (UTC)[reply]

I get the same result as you. Using this relationship three times:
it follows that:
--Bob K (talk) 22:51, 23 November 2016 (UTC)[reply]

Nice! — Preceding unsigned comment added by 213.162.68.159 (talk) 08:44, 26 November 2016 (UTC)[reply]

Bad property[edit]

That minus sign before the f* in the first property is just flat out wrong. — Preceding unsigned comment added by 188.174.57.61 (talk) 13:58, 14 August 2017 (UTC)[reply]

Reduction in animated .gif size[edit]

The animated .gif on this page (link here) is quite large at 7.13MB. It uses 20 fps for 40s (800 frames total). I suggest reducing the frame rate to get the .gif under 2 MB.

Would anyone have an objection to this? I cannot contact the original uploader of this .gif, but it has a CC 4.0 license.

Davidjessop (talk) 00:16, 24 February 2019 (UTC)[reply]

No objection. Frankly, I don't find it helpful. You can delete it completely, as far as I'm concerned. --Bob K (talk) 05:04, 24 February 2019 (UTC)[reply]

asterisks[edit]

There is discussion in talk:convolution about the notation of the operators for convolution and cross-correlation. It seems that this article uses five- and six-pointed asterisks to distinguish them. Is there a reference for these being the WP:COMMONNAME (well, except that they aren't names) for them? After about 2 seconds, I forget which one is which. Gah4 (talk) 21:39, 7 June 2019 (UTC)[reply]

Also, List_of_mathematical_symbols_by_subject suggests that rho is used for cross-correlation. Asterisks with any number of points are not mentioned. Gah4 (talk) 21:42, 7 June 2019 (UTC)[reply]