CHSTR_FACTOR
Define a chunk strength factor to manipulate the frequency with which test item bigrams, trigrams, etc., appear in training items.
Basic chunk strength factor
CHSTR_FACTOR(aglss, 'fname') defines target values for low and high chunk strength test items. High chunk strength items will contain chunks occurring about twice as often in training items as chunks within low chunk strength items.
The chunk strength of a test item is the average strength of the chunks it contains. The strength of a chunk is defined as F/(F+E), where F is the chunk's frequency across all training items, and E is the chunk's expected frequency (roughly, the average frequency of all chunks of the same size, ignoring nongrammatical chunks, i.e., chunks that appear in no grammatical string). Chunk strength is always between 0 and 1, and a chunk that occurs with average frequency will have a chunk strength of 0.5.
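The MATLAB sketch below evaluates F/(F+E) for a few made-up frequencies to illustrate these properties: a chunk at exactly the expected frequency gets 0.5, a chunk that never occurs in training items gets 0, and strength approaches but never reaches 1 as F grows. The name chunk_strength and the values of E and F are illustrative only, not part of the toolbox. Dividing through by E shows that strength depends only on the relative frequency F/E.

```matlab
% Strength of a chunk with training frequency F and expected frequency E
chunk_strength = @(F, E) F ./ (F + E);

E = 10;                        % hypothetical expected frequency
F = [0 5 10 20 40];            % hypothetical chunk frequencies
s = chunk_strength(F, E)       % 0  0.3333  0.5000  0.6667  0.8000

% Same values from the relative frequency r = F/E, since F/(F+E) = r/(r+1)
r = F ./ E;
s_rel = r ./ (r + 1);
```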
The example below generates candidate items of length 3 to 10 from XX_GRAMMAR, then chooses training items and test items that are either low or high chunk strength relative to the training items.
The first output variable returned by CHSTR_FACTOR is the AGLSS object, updated to reflect the chunk strength factor. The second is a cell array of strings naming the chunk strength levels, and the third contains the target chunk strength value for each level.
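% Candidate items of length 3 to 10 from XX_GRAMMAR
s_xx = aglss(xx_grammar, [3 10]);
[s, levnames, tgts] = chstr_factor(s_xx, 'ChStr');
levnames
tgts
% One test set per chunk strength level
s = factorial_testsets(s, {'ChStr', levnames{:}});
% Choose 6 training items and 2 test items per test set
s = choose_items(s, 6, 2);
disp('Training items:');
disp(format_train_items(s));
disp('Test items:');
disp(format_test_items(s));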
Potential items:
Grammar involves 2 symbols (xy)
2040 possible strings of length 3-10
369 grammatical strings (18.09%)
1671 ungrammatical strings (81.91%)
Using all 369 grammatical strings
Using all 1671 ungrammatical strings
1 of 4 chunks of length 2 appear in no grammatical items
3 of 8 chunks of length 3 appear in no grammatical items
levnames =
    'LowChStr'    'HighChStr'
tgts =
    0.4000    0.5714
Choosing training item 1....
Choosing training item 2....
Choosing training item 3....
Choosing training item 4....
Updating potential items....
Choosing test item 1 for each set... 2 1.
Choosing training item 5....
Updating potential items....
Choosing test item 2 for each set... 2 1.
Choosing training item 6....
Training items:
Itm_num  Itm_name
01       yxxxyxy
02       xyxyxxxxy
03       yxxxxxyxyx
04       yxyxxxxy
05       yxxxxyxy
06       yxyxxxy
Test items:
Tset_num  ChStr_cat  Itm_num  Itm_name   ChStr  ChStr_tgt
01        LowChStr   01       yxxxyyyxx  0.399  0.400
01        LowChStr   02       yyxxxyyxx  0.399  0.400
02        HighChStr  01       xxx        0.570  0.571
02        HighChStr  02       xxxxx      0.570  0.571
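The MATLAB sketch below recomputes the ChStr column of this example from the six training items it lists. It is an illustration under stated assumptions, not the CHSTR_FACTOR source: E is taken separately for each chunk length as the mean training frequency of that length's grammatical chunks, and chunks containing yy are assumed to be the nongrammatical ones. To match the printed values, it also assumes that, for each chunk length, F/(F+E) is applied to the mean training frequency of the item's chunks, and the per-length results are then averaged. For items built from identical chunks, such as xxx, this agrees with averaging individual chunk strengths; for mixed items such as yxxxyyyxx, averaging individual chunk strengths would give about 0.33 instead of the printed 0.399. The variable names are hypothetical.

```matlab
% Training and test items copied from the example output above
train_items = {'yxxxyxy', 'xyxyxxxxy', 'yxxxxxyxyx', 'yxyxxxxy', 'yxxxxyxy', 'yxyxxxy'};
test_items  = {'xxx', 'xxxxx', 'yxxxyyyxx', 'yyxxxyyxx'};

% Assumption: the grammatical chunks are those without 'yy', consistent with
% "1 of 4 chunks of length 2" and "3 of 8 chunks of length 3" in the output
gram = {{'xx', 'xy', 'yx'}, {'xxx', 'xxy', 'xyx', 'yxx', 'yxy'}};

% F(c): total (overlapping) occurrences of chunk c across all training items
freq = @(c) sum(cellfun(@(w) numel(strfind(w, c)), train_items));

chstr = zeros(1, numel(test_items));
for t = 1:numel(test_items)
    item = test_items{t};
    per_size = zeros(1, numel(gram));
    for k = 1:numel(gram)
        n = numel(gram{k}{1});                % chunk length: 2, then 3
        E = mean(cellfun(freq, gram{k}));     % expected frequency for this length
        % Training frequency of every length-n chunk in the test item
        % (nongrammatical chunks such as 'yy' contribute F = 0)
        Fitem = arrayfun(@(j) freq(item(j:j+n-1)), 1:numel(item)-n+1);
        Fbar = mean(Fitem);
        per_size(k) = Fbar / (Fbar + E);      % F/(F+E) applied to the mean F
    end
    chstr(t) = mean(per_size);                % then averaged over chunk lengths
end
disp(round(chstr * 1000) / 1000)  % compare with ChStr: 0.570 0.570 0.399 0.399
```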
Defining chunk strengths from relative chunk frequencies
CHSTR_FACTOR(aglss, 'fname', [f1 f2 ... fN]) defines target values for N chunk strength categories, based on a vector F of relative chunk frequencies. Test items in category I will contain chunks whose frequency of occurrence in training items is, on average, approximately proportional to F(I).
The example below builds on the AGLSS object S_XX created in the previous example. Three levels of chunk strength are defined, with high chunk strength items being composed of chunks that are about three times as frequent in training items as the chunks making up the low chunk strength items.
[s, levnames, tgts] = chstr_factor(s_xx, 'MyChStr', [1 2 3]);
levnames
tgts
s = factorial_testsets(s, {'MyChStr', levnames{:}});
s = choose_items(s, 6, 2);
disp('Training items:');
disp(format_train_items(s));
disp('Test items:');
disp(format_test_items(s));
1 of 4 chunks of length 2 appear in no grammatical items
3 of 8 chunks of length 3 appear in no grammatical items
levnames =
    'MyChStr1'    'MyChStr2'    'MyChStr3'
tgts =
    0.3333    0.5000    0.6000
Choosing training item 1....
Choosing training item 2....
Choosing training item 3....
Choosing training item 4....
Updating potential items....
Choosing test item 1 for each set... 2 1 3.
Choosing training item 5....
Updating potential items....
Choosing test item 2 for each set... 3 2 1.
Choosing training item 6....
Training items:
Itm_num  Itm_name
01       xxxxyxyxxy
02       xxxy
03       xyxxxyxy
04       xxxxyxy
05       xxxxxy
06       xyxyxxxy
Test items:
Tset_num  MyChStr_cat  Itm_num  Itm_name    MyChStr  MyChStr_tgt
01        MyChStr1     01       xxxyyyyyxy  0.337    0.333
01        MyChStr1     02       yyyyyxyxyx  0.302    0.333
02        MyChStr2     01       yxxxyxy     0.500    0.500
02        MyChStr2     02       xyxyxxx     0.500    0.500
03        MyChStr3     01       xxxxxxy     0.586    0.600
03        MyChStr3     02       xxx         0.598    0.600
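The tgts values in this example and the previous one fit a simple rule: express each F(I) relative to the mean of the vector, which plays the role of E, and apply F/(F+E), giving F(I)/(F(I) + mean(F)). The MATLAB sketch below checks that rule against the printed targets. rel2tgt is a hypothetical name, and the rule is inferred from the outputs rather than taken from CHSTR_FACTOR itself.

```matlab
% Hypothetical helper (inferred rule, not the toolbox source): relative
% frequencies f -> targets f(I) / (f(I) + mean(f))
rel2tgt = @(f) f ./ (f + mean(f));

t12  = rel2tgt([1 2])      % default levels: 0.4000  0.5714
t123 = rel2tgt([1 2 3])    % this example:   0.3333  0.5000  0.6000
t24  = rel2tgt([2 4])      % same as [1 2]; only the ratios matter
```

Under this rule, only the ratios among the elements of F matter, so scaling F by a constant leaves the targets unchanged.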
Absolute chunk strengths
CHSTR_FACTOR(aglss, 'fname', [ch1 ch2 ... chN]), for CH(I) between 0 and 1, directly specifies chunk strength values for N categories. Test items in category I will have chunk strength approximately CH(I).
The example below builds on the AGLSS object S_XX created for a previous example. Two levels of chunk strength are defined, one lower than average, and one higher than average.
The third output variable from CHSTR_FACTOR matches the specified chunk strength values.
[s, levnames, tgts] = chstr_factor(s_xx, 'ChStr', [.3 .6]);
levnames
tgts
s = factorial_testsets(s, {'ChStr', levnames{:}});
s = choose_items(s, 6, 2);
disp('Training items:');
disp(format_train_items(s));
disp('Test items:');
disp(format_test_items(s));
1 of 4 chunks of length 2 appear in no grammatical items
3 of 8 chunks of length 3 appear in no grammatical items
levnames =
    'LowChStr'    'HighChStr'
tgts =
    0.3000    0.6000
Choosing training item 1....
Choosing training item 2....
Choosing training item 3....
Choosing training item 4....
Updating potential items....
Choosing test item 1 for each set... 1 2.
Choosing training item 5....
Updating potential items....
Choosing test item 2 for each set... 2 1.
Choosing training item 6....
Training items:
Itm_num  Itm_name
01       yxxxxxyxy
02       yxxxxxyxyx
03       xyxyxxxxxy
04       xyxxxxxyxy
05       yxyxxxxxxy
06       yxyxxxxxyx
Test items:
Tset_num  ChStr_cat  Itm_num  Itm_name  ChStr  ChStr_tgt
01        LowChStr   01       xyyxx     0.302  0.300
01        LowChStr   02       xxyyx     0.302  0.300
02        HighChStr  01       xxxxxxxy  0.606  0.600
02        HighChStr  02       yxxxxxx   0.602  0.600
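Inverting s = F/(F+E) gives F/E = s/(1 - s), so an absolute target implies a chunk frequency relative to E. The MATLAB sketch below applies that to the targets [.3 .6]: the high chunk strength items call for chunks about 3.5 times as frequent as those in the low chunk strength items, versus about 2 times for the default targets. tgt2rel is an illustrative name, and the calculation assumes that an item's strength is governed by the average training frequency of its chunks.

```matlab
% Invert s = F/(F+E): relative chunk frequency F/E = s/(1 - s)
tgt2rel = @(s) s ./ (1 - s);

r = tgt2rel([.3 .6])                          % 0.4286  1.5000
ratio = r(2) / r(1)                           % about 3.5
default_ratio = tgt2rel(4/7) / tgt2rel(0.4)   % about 2 (default targets)
```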
Naming chunk strength categories
CHSTR_FACTOR(aglss, 'fname', T, {'name1', 'name2', ...}) specifies names for the different categories of chunk strength, as an alternative to the default names otherwise assigned by CHSTR_FACTOR.
The example below builds on the AGLSS object S_XX created for a previous example. Two levels of chunk strength are defined, with the high level based on chunks about twice as frequent in training items as those of the low level. The levels are named 'L' and 'H' (for low and high chunk strength, respectively).
[s, levnames, tgts] = chstr_factor(s_xx, 'ChStr12', [1 2], {'L', 'H'});
levnames
tgts
s = factorial_testsets(s, {'ChStr12', levnames{:}});
s = choose_items(s, 6, 2);
disp('Training items:');
disp(format_train_items(s));
disp('Test items:');
disp(format_test_items(s));
1 of 4 chunks of length 2 appear in no grammatical items
3 of 8 chunks of length 3 appear in no grammatical items
levnames =
    'L'    'H'
tgts =
    0.4000    0.5714
Choosing training item 1....
Choosing training item 2....
Choosing training item 3....
Choosing training item 4....
Updating potential items....
Choosing test item 1 for each set... 2 1.
Choosing training item 5....
Updating potential items....
Choosing test item 2 for each set... 1 2.
Choosing training item 6....
Training items:
Itm_num  Itm_name
01       yxxxyxy
02       xyxyxxxxy
03       yxxxxxyxyx
04       yxyxxxxy
05       yxxxxyxy
06       yxyxxxy
Test items:
Tset_num  ChStr12_cat  Itm_num  Itm_name   ChStr12  ChStr12_tgt
01        L            01       yxxxyyyxx  0.399    0.400
01        L            02       yyxxxyyxx  0.399    0.400
02        H            01       xxx        0.570    0.571
02        H            02       xxxxx      0.570    0.571
Relating chunk strength to chunk frequencies
CHSTR_FACTOR(aglss, 'fname', T, N, [P_lo P_hi]) controls the fraction of chunks in the lowest and highest chunk strength items that are expected to come from the appropriate end of the frequency spectrum. Here T is the vector of relative chunk frequencies or absolute chunk strengths, and N is the cell array of level names, as described above; either can be given as [] to use the defaults.
By default, chunk frequencies are controlled so that items can reach their target chunk strength with a mixture of 80% chunks of the appropriate frequency and 20% chunks from the opposite end of the frequency spectrum. For example, a low chunk strength item of length 6 contains 5 bigrams, so it could contain 4 low frequency bigrams and 1 high frequency bigram.
The mixture of allowable chunk frequencies can be controlled by specifying P_lo and P_hi. P_lo specifies the fraction of low frequency chunks expected in low chunk strength items, and P_hi specifies the fraction of high frequency chunks expected in high chunk strength items. If P_hi is the same as P_lo, it can be omitted.
P_lo and P_hi do not alter chunk strength categories. Rather, they help determine targets for the frequencies of different chunks within training items. This will affect the selection of training items, which will then affect the selection of test items in different chunk strength categories.
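One way to make the 80/20 arithmetic concrete is sketched below in MATLAB. It is not necessarily what CHSTR_FACTOR computes internally: it assumes that an item's chunk strength is set by the average training frequency of its chunks, and that low and high chunk strength items mix just two kinds of chunks, one low and one high in frequency. Under those assumptions, P_lo and P_hi give two linear equations for the chunk frequencies, relative to E, that training items should aim for. r_item, r_chunks, and Ps are hypothetical names.

```matlab
% Default targets for the lowest and highest categories (tgts in the examples)
t = [0.4, 4/7];
% Target average chunk frequency relative to E, from s = F/(F+E)
r_item = t ./ (1 - t);                 % [2/3 4/3]

% Rows are [P_lo P_hi]: the default, then the setting used in the example below
Ps = [0.8 0.8; 1 0.9];
r_chunks = zeros(2, size(Ps, 1));
for k = 1:size(Ps, 1)
    P_lo = Ps(k, 1);  P_hi = Ps(k, 2);
    % Low items:  P_lo*r_lo + (1-P_lo)*r_hi = r_item(1)
    % High items: (1-P_hi)*r_lo + P_hi*r_hi = r_item(2)
    A = [P_lo, 1 - P_lo; 1 - P_hi, P_hi];
    r_chunks(:, k) = A \ r_item(:);    % [r_lo; r_hi] for this setting
end
r_chunks
```

With the defaults, the low and high frequency chunks land at about 0.44 and 1.56 times E; with [1 0.9] they move to about 0.67 and 1.41 times E, because purer mixtures need less extreme chunk frequencies to reach the same targets.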
The example below builds on the AGLSS object S_XX created for a previous example. Chunk strength categories and names are given as [] to get the default values. P_lo is set to 1, which aims to compose low chunk strength items entirely of relatively low frequency chunks. P_hi is set to 0.9, which aims to compose high chunk strength items of 90% high frequency chunks.
[s, levnames, tgts] = chstr_factor(s_xx, 'MyChStr', [], [], [1 0.9]);
levnames
tgts
s = factorial_testsets(s, {'MyChStr', levnames{:}});
s = choose_items(s, 6, 2);
disp('Training items:');
disp(format_train_items(s));
disp('Test items:');
disp(format_test_items(s));
1 of 4 chunks of length 2 appear in no grammatical items
3 of 8 chunks of length 3 appear in no grammatical items
levnames =
    'LowMyChStr'    'HighMyChStr'
tgts =
    0.4000    0.5714
Choosing training item 1....
Choosing training item 2....
Choosing training item 3....
Choosing training item 4....
Updating potential items....
Choosing test item 1 for each set... 1 2.
Choosing training item 5....
Updating potential items....
Choosing test item 2 for each set... 1 2.
Choosing training item 6....
Training items:
Itm_num  Itm_name
01       yxxxyxy
02       yxxxxxyxy
03       yxxxxyxyxy
04       yxyxxxxy
05       yxxxxyxy
06       yxyxxxy
Test items:
Tset_num  MyChStr_cat  Itm_num  Itm_name    MyChStr  MyChStr_tgt
01        LowMyChStr   01       xyxyy       0.396    0.400
01        LowMyChStr   02       xyxxxxyyyy  0.410    0.400
02        HighMyChStr  01       xxx         0.570    0.571
02        HighMyChStr  02       xxxxx       0.570    0.571
Chunk sizes used to compute chunk strength
CHSTR_FACTOR(aglss, 'fname', T, N, P, CHSIZE) specifies the maximum chunk size relevant to chunk strength.
By default, chunk strength is based on bigram and trigram chunk frequencies (chunk strength always ignores unigram letter frequencies). Specifying CHSIZE as 2 will limit the computation of chunk strength to bigrams. Specifying CHSIZE as 4 or greater will include chunks larger than trigrams in the computation of chunk strength.
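To see which chunks enter the computation for a given CHSIZE, the MATLAB sketch below lists every overlapping chunk of length 2 through CHSIZE in a single item; an item of length L has L-n+1 chunks of length n. The variable names are illustrative, and the sketch only shows the chunk inventory, not how CHSTR_FACTOR weights the different chunk sizes.

```matlab
item = 'xyxyxxx';        % a test item from an earlier example
chsize = 3;              % default; try 2 (bigrams only) or 4
counts = zeros(1, chsize - 1);

for n = 2:chsize         % chunk strength never uses unigrams (n = 1)
    % All overlapping chunks of length n in the item
    chunks = arrayfun(@(j) item(j:j+n-1), 1:numel(item)-n+1, ...
                      'UniformOutput', false);
    counts(n - 1) = numel(chunks);
    fprintf('%d-grams (%d): %s\n', n, numel(chunks), strjoin(chunks, ' '));
end
```

For this item the loop prints 6 bigrams (xy yx xy yx xx xx) and 5 trigrams (xyx yxy xyx yxx xxx).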
The example below builds on the AGLSS object S_XX created for a previous example. Here, chunk strength is limited to bigrams.
[s, levnames, tgts] = chstr_factor(s_xx, 'BigramStr', [], [], [], 2);
levnames
tgts
s = factorial_testsets(s, {'BigramStr', levnames{:}});
s = choose_items(s, 6, 2);
disp('Training items:');
disp(format_train_items(s));
disp('Test items:');
disp(format_test_items(s));
1 of 4 chunks of length 2 appear in no grammatical items
levnames =
    'LowBigramStr'    'HighBigramStr'
tgts =
    0.4000    0.5714
Choosing training item 1....
Choosing training item 2....
Choosing training item 3....
Choosing training item 4....
Updating potential items....
Choosing test item 1 for each set... 1 2.
Choosing training item 5....
Updating potential items....
Choosing test item 2 for each set... 1 2.
Choosing training item 6....
Training items:
Itm_num  Itm_name
01       yxxy
02       yxxxxy
03       yxx
04       yxxxy
05       yxyxxxxy
06       yxxxxyxy
Test items:
Tset_num  BigramStr_cat  Itm_num  Itm_name    BigramStr  BigramStr_tgt
01        LowBigramStr   01       xyyyxxxyyx  0.400      0.400
01        LowBigramStr   02       xyyxxyyyxx  0.400      0.400
02        HighBigramStr  01       xxx         0.582      0.571
02        HighBigramStr  02       yxxxxxxxx   0.570      0.571