CHSTR_FACTOR

Define a chunk strength factor to manipulate the frequency with which test item bigrams, trigrams, etc., appear in training items.

Basic chunk strength factor

CHSTR_FACTOR(aglss, 'fname') defines target values for low and high chunk strength test items. High chunk strength items will contain chunks occurring about twice as often in training items as chunks within low chunk strength items.

The chunk strength of a test item is the average strength of the chunks it contains. The strength of a chunk is defined as F/(F+E), where F is the chunk's frequency across all training items and E is the chunk's expected frequency (roughly, the average frequency across all chunks, ignoring nongrammatical chunks). Chunk strength therefore always lies between 0 and 1, and a chunk that occurs with average frequency (F = E) has a chunk strength of 0.5.
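The definition above can be illustrated with a short stand-alone calculation. This is only a sketch: the frequencies in F are made up, E is simply taken as the mean of F as a stand-in for the toolbox's expected frequency, and no AGLSS function is called.

```matlab
% Hypothetical frequencies of four chunks across all training items.
F = [2 4 6 12];

% Assumed expected frequency: the mean over the chunks considered.
E = mean(F);

% Strength of each chunk, F/(F+E); always strictly between 0 and 1.
chunk_str = F ./ (F + E);

% Chunk strength of a test item that contains chunk 1 once and chunk 3 twice:
% the average strength of the chunks it contains.
item_chunks = [1 3 3];
item_str = mean(chunk_str(item_chunks));
```

Chunk 3 occurs with exactly the average frequency (F = E = 6), so its strength is 0.5.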

The example below generates training items based on the XX_GRAMMAR, and generates test items that are either low or high chunk strength relative to the training items.

The first output variable returned by CHSTR_FACTOR is an AGLSS object, updated to reflect the chunk strength factor. The second output variable is a cell array of strings naming the different levels of chunk strength defined. Actual chunk strength target values are returned as the third output variable.

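% Build the set of potential items from the XX grammar (string lengths 3 to 10).
s_xx = aglss(xx_grammar, [3 10]);

% Add a chunk strength factor named 'ChStr' with the default low and high levels.
[s, levnames, tgts] = chstr_factor(s_xx, 'ChStr');

% Display the level names and the target chunk strengths.
levnames

tgts

% Define one test set per chunk strength level, then choose
% 6 training items and 2 test items per test set.
s = factorial_testsets(s, {'ChStr', levnames{:}});
s = choose_items(s, 6, 2);

disp('Training items:');
disp(format_train_items(s));

disp('Test items:');
disp(format_test_items(s));

 Potential items: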
	Grammar involves 2 symbols (xy)
	2040 possible strings of length 3-10
		369 grammatical strings (18.09%)
		1671 ungrammatical strings (81.91%)
	Using all 369 grammatical strings
	Using all 1671 ungrammatical strings
 
	1 of 4 chunks of length 2 appear in no grammatical items
	3 of 8 chunks of length 3 appear in no grammatical items

levnames = 

    'LowChStr'
    'HighChStr'


tgts =

    0.4000    0.5714

 Choosing training item 1....
 Choosing training item 2....
 Choosing training item 3....
 Choosing training item 4....
 Updating potential items....
 Choosing test item 1 for each set... 2 1.
 Choosing training item 5....
 Updating potential items....
 Choosing test item 2 for each set... 2 1.
 Choosing training item 6....

Training items:
  Itm_num    Itm_name
       01     yxxxyxy
       02   xyxyxxxxy
       03  yxxxxxyxyx
       04    yxyxxxxy
       05    yxxxxyxy
       06     yxyxxxy

Test items:
 Tset_num     ChStr_cat  Itm_num    Itm_name      ChStr  ChStr_tgt
       01      LowChStr       01   yxxxyyyxx      0.399      0.400
       01      LowChStr       02   yyxxxyyxx      0.399      0.400
       02     HighChStr       01         xxx      0.570      0.571
       02     HighChStr       02       xxxxx      0.570      0.571

Defining chunk strengths from relative chunk frequencies

CHSTR_FACTOR(aglss, 'fname', [f1 f2 ... fN]) defines target values for N chunk strength categories, based on a vector of relative chunk frequencies. Test items in category I will contain chunks whose relative frequency of occurrence in training items is, on average, approximately f(I).
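The target values printed in the examples on this page are consistent with applying F/(F+E) to the relative frequencies, with E taken as their mean. The sketch below shows that arithmetic. The helper name rel2tgt is hypothetical, and whether CHSTR_FACTOR computes its targets in exactly this way is an assumption.

```matlab
% Hypothetical helper: relative chunk frequencies -> chunk strength targets,
% assuming the expected frequency E is the mean of the relative frequencies.
rel2tgt = @(f) f ./ (f + mean(f));

tgts_two   = rel2tgt([1 2]);     % 1/2.5 and 2/3.5
tgts_three = rel2tgt([1 2 3]);   % 1/3, 2/4 and 3/5
```

For [1 2] this gives 0.4000 and 0.5714, and for [1 2 3] it gives 0.3333, 0.5000 and 0.6000, which matches the tgts values displayed in the example output.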

The example below builds on the AGLSS object S_XX created in the previous example. Three levels of chunk strength are defined, with high chunk strength items being composed of chunks that are about three times as frequent in training items as the chunks making up the low chunk strength items.

[s, levnames, tgts] = chstr_factor(s_xx, 'MyChStr', [1 2 3]);

levnames

tgts

s = factorial_testsets(s, {'MyChStr', levnames{:}});
s = choose_items(s, 6, 2);

disp('Training items:');
disp(format_train_items(s));

disp('Test items:');
disp(format_test_items(s));

	1 of 4 chunks of length 2 appear in no grammatical items
	3 of 8 chunks of length 3 appear in no grammatical items

levnames = 

    'MyChStr1'
    'MyChStr2'
    'MyChStr3'


tgts =

    0.3333    0.5000    0.6000

 Choosing training item 1....
 Choosing training item 2....
 Choosing training item 3....
 Choosing training item 4....
 Updating potential items....
 Choosing test item 1 for each set... 2 1 3.
 Choosing training item 5....
 Updating potential items....
 Choosing test item 2 for each set... 3 2 1.
 Choosing training item 6....

Training items:
  Itm_num    Itm_name
       01  xxxxyxyxxy
       02        xxxy
       03    xyxxxyxy
       04     xxxxyxy
       05      xxxxxy
       06    xyxyxxxy

Test items:
 Tset_num   MyChStr_cat  Itm_num    Itm_name    MyChStr MyChStr_tgt
       01      MyChStr1       01  xxxyyyyyxy      0.337       0.333
       01      MyChStr1       02  yyyyyxyxyx      0.302       0.333
       02      MyChStr2       01     yxxxyxy      0.500       0.500
       02      MyChStr2       02     xyxyxxx      0.500       0.500
       03      MyChStr3       01     xxxxxxy      0.586       0.600
       03      MyChStr3       02         xxx      0.598       0.600

Absolute chunk strengths

CHSTR_FACTOR(aglss, 'fname', [ch1 ch2 ... chN]), for CH(I) between 0 and 1, directly specifies chunk strength values for N categories. Test items in category I will have chunk strength approximately CH(I).
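Because chunk strength is F/(F+E), a chunk strength value CH corresponds to a relative frequency F/E = CH/(1-CH). The sketch below applies that inversion to see how far apart two absolute targets are in frequency terms. It is plain arithmetic on the definition and does not call the toolbox; the variable names are illustrative only.

```matlab
ch = [0.3 0.6];              % absolute chunk strength targets

% Invert ch = F/(F+E) to get each category's chunk frequency relative to E.
rel_freq = ch ./ (1 - ch);

% How many times more frequent chunks in the higher category are.
ratio = rel_freq(2) / rel_freq(1);
```

With targets of 0.3 and 0.6, chunks in the higher category are about 3.5 times as frequent as chunks in the lower one.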

The example below builds on the AGLSS object S_XX created in the first example. Two levels of chunk strength are defined, one lower than average and one higher than average.

The third output variable from CHSTR_FACTOR matches the specified chunk strength values.

[s, levnames, tgts] = chstr_factor(s_xx, 'ChStr', [.3 .6]);

levnames

tgts

s = factorial_testsets(s, {'ChStr', levnames{:}});
s = choose_items(s, 6, 2);

disp('Training items:');
disp(format_train_items(s));

disp('Test items:');
disp(format_test_items(s));

	1 of 4 chunks of length 2 appear in no grammatical items
	3 of 8 chunks of length 3 appear in no grammatical items

levnames = 

    'LowChStr'
    'HighChStr'


tgts =

    0.3000    0.6000

 Choosing training item 1....
 Choosing training item 2....
 Choosing training item 3....
 Choosing training item 4....
 Updating potential items....
 Choosing test item 1 for each set... 1 2.
 Choosing training item 5....
 Updating potential items....
 Choosing test item 2 for each set... 2 1.
 Choosing training item 6....

Training items:
  Itm_num    Itm_name
       01   yxxxxxyxy
       02  yxxxxxyxyx
       03  xyxyxxxxxy
       04  xyxxxxxyxy
       05  yxyxxxxxxy
       06  yxyxxxxxyx

Test items:
 Tset_num     ChStr_cat  Itm_num    Itm_name      ChStr  ChStr_tgt
       01      LowChStr       01       xyyxx      0.302      0.300
       01      LowChStr       02       xxyyx      0.302      0.300
       02     HighChStr       01    xxxxxxxy      0.606      0.600
       02     HighChStr       02     yxxxxxx      0.602      0.600

Naming chunk strength categories

CHSTR_FACTOR(aglss, 'fname', T, {'name1', 'name2', ...}) specifies names for the different categories of chunk strength, as an alternative to the default names otherwise assigned by CHSTR_FACTOR.
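The default names seen in the example output on this page follow a simple pattern: 'Low' and 'High' prefixed to the factor name when there are two levels, and the factor name followed by a number when there are three. The sketch below reproduces that pattern. It is inferred only from the outputs shown here, so the rule CHSTR_FACTOR actually applies (especially for other numbers of levels) is an assumption.

```matlab
fname = 'MyChStr';   % factor name, as passed to CHSTR_FACTOR
nlev  = 3;           % number of chunk strength categories

if nlev == 2
    % Two levels: 'Low<fname>' and 'High<fname>'.
    names = {['Low' fname], ['High' fname]};
else
    % Otherwise (assumed): '<fname>1', '<fname>2', ...
    names = cell(1, nlev);
    for k = 1:nlev
        names{k} = sprintf('%s%d', fname, k);
    end
end
```

Passing an explicit cell array of names, as described above, avoids relying on this pattern.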

The example below builds on the AGLSS object S_XX created in the first example. Two levels of chunk strength are defined, with one level based on chunks that are twice as frequent as those of the other. The levels are named 'L' and 'H' (for low and high chunk strength, respectively).

[s, levnames, tgts] = chstr_factor(s_xx, 'ChStr12', [1 2], {'L', 'H'});

levnames

tgts

s = factorial_testsets(s, {'ChStr12', levnames{:}});
s = choose_items(s, 6, 2);

disp('Training items:');
disp(format_train_items(s));

disp('Test items:');
disp(format_test_items(s));

	1 of 4 chunks of length 2 appear in no grammatical items
	3 of 8 chunks of length 3 appear in no grammatical items

levnames = 

    'L'    'H'


tgts =

    0.4000    0.5714

 Choosing training item 1....
 Choosing training item 2....
 Choosing training item 3....
 Choosing training item 4....
 Updating potential items....
 Choosing test item 1 for each set... 2 1.
 Choosing training item 5....
 Updating potential items....
 Choosing test item 2 for each set... 1 2.
 Choosing training item 6....

Training items:
  Itm_num    Itm_name
       01     yxxxyxy
       02   xyxyxxxxy
       03  yxxxxxyxyx
       04    yxyxxxxy
       05    yxxxxyxy
       06     yxyxxxy

Test items:
 Tset_num   ChStr12_cat  Itm_num    Itm_name    ChStr12 ChStr12_tgt
       01             L       01   yxxxyyyxx      0.399       0.400
       01             L       02   yyxxxyyxx      0.399       0.400
       02             H       01         xxx      0.570       0.571
       02             H       02       xxxxx      0.570       0.571

Relating chunk strength to chunk frequencies

CHSTR_FACTOR(aglss, 'fname', T, N, [P_lo P_hi]) controls the fraction of chunks in the lowest and highest chunk strength items that are expected to be at the correct end of the frequency spectrum.

By default, chunk frequencies are controlled so that items can achieve their target chunk strength by containing a mixture of 80% chunks of the correct frequency and 20% chunks from the opposite end of the frequency spectrum. For example, a low chunk strength item of length 6 contains 5 bigrams, so it could contain 4 low frequency bigrams and 1 high frequency bigram.

The mixture of allowable chunk frequencies can be controlled by specifying P_lo and P_hi. P_lo specifies the fraction of low frequency chunks expected in low chunk strength items. P_hi specifies the fraction of high frequency chunks expected in high chunk strength items. If P_hi is the same as P_lo, it can be omitted.

P_lo and P_hi do not alter chunk strength categories. Rather, they help determine targets for the frequencies of different chunks within training items. This will affect the selection of training items, which will then affect the selection of test items in different chunk strength categories.
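One way to picture how P_lo and P_hi could translate into chunk-level targets is as a pair of mixture equations, in which each item's chunk strength is a weighted average of a low frequency chunk strength and a high frequency chunk strength. The sketch below solves those two equations for the default item targets. It is an illustration only: the variable names are hypothetical, and the toolbox's actual procedure for setting chunk frequency targets is not described here and may differ.

```matlab
t    = [0.4; 4/7];   % item-level targets (the default 0.4000 and 0.5714)
P_lo = 0.8;          % fraction of low frequency chunks in low items
P_hi = 0.8;          % fraction of high frequency chunks in high items

% Row 1: low item target  = P_lo*s_lo     + (1-P_lo)*s_hi
% Row 2: high item target = (1-P_hi)*s_lo + P_hi*s_hi
A = [P_lo, 1 - P_lo; 1 - P_hi, P_hi];

% Chunk-level strengths [s_lo; s_hi] that reproduce both item targets.
s_chunk = A \ t;

% Bigram mix for a low chunk strength item of length 6 (5 bigrams).
n_bigrams = 6 - 1;
n_low  = round(P_lo * n_bigrams);
n_high = n_bigrams - n_low;
```

Under this reading, the low frequency chunks need a strength below the low item target and the high frequency chunks a strength above the high item target, because each item also contains some chunks from the opposite end of the spectrum.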

The example below builds on the AGLSS object S_XX created in the first example. Chunk strength categories and names are given as [] to get the default values. P_lo is set to 1, which aims to compose low chunk strength items entirely of relatively low frequency chunks. P_hi is set to 0.9, which aims to compose high chunk strength items of 90% high frequency chunks.

[s, levnames, tgts] = chstr_factor(s_xx, 'MyChStr', [], [], [1 0.9]);

levnames

tgts

s = factorial_testsets(s, {'MyChStr', levnames{:}});
s = choose_items(s, 6, 2);

disp('Training items:');
disp(format_train_items(s));

disp('Test items:');
disp(format_test_items(s));

	1 of 4 chunks of length 2 appear in no grammatical items
	3 of 8 chunks of length 3 appear in no grammatical items

levnames = 

    'LowMyChStr'
    'HighMyChStr'


tgts =

    0.4000    0.5714

 Choosing training item 1....
 Choosing training item 2....
 Choosing training item 3....
 Choosing training item 4....
 Updating potential items....
 Choosing test item 1 for each set... 1 2.
 Choosing training item 5....
 Updating potential items....
 Choosing test item 2 for each set... 1 2.
 Choosing training item 6....

Training items:
  Itm_num    Itm_name
       01     yxxxyxy
       02   yxxxxxyxy
       03  yxxxxyxyxy
       04    yxyxxxxy
       05    yxxxxyxy
       06     yxyxxxy

Test items:
 Tset_num   MyChStr_cat  Itm_num    Itm_name    MyChStr MyChStr_tgt
       01    LowMyChStr       01       xyxyy      0.396       0.400
       01    LowMyChStr       02  xyxxxxyyyy      0.410       0.400
       02   HighMyChStr       01         xxx      0.570       0.571
       02   HighMyChStr       02       xxxxx      0.570       0.571

Chunk sizes used to compute chunk strength

CHSTR_FACTOR(aglss, 'fname', T, N, P, CHSIZE) specifies the maximum chunk size relevant to chunk strength.

By default, chunk strength is based on bigram and trigram chunk frequencies (chunk strength always ignores unigram letter frequencies). Specifying CHSIZE as 2 will limit the computation of chunk strength to bigrams. Specifying CHSIZE as 4 or greater will include chunks larger than trigrams in the computation of chunk strength.
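To make the role of CHSIZE concrete, the sketch below lists the chunks of one item for sizes 2 up to a chosen maximum. It is a stand-alone illustration with hypothetical variable names, not the toolbox's own code, and it assumes every occurrence of a chunk within an item is counted.

```matlab
item   = 'yxxxyxy';   % a training item from the first example
chsize = 3;           % largest chunk size to include

chunks = {};
for n = 2:chsize                          % unigrams (n = 1) are never used
    for k = 1:(numel(item) - n + 1)
        chunks{end+1} = item(k:k+n-1);    % chunk of n letters starting at k
    end
end

n_chunks = numel(chunks);   % 6 bigrams + 5 trigrams for a 7-letter item
```

Setting chsize to 2 in this sketch would leave only the six bigrams, and setting it to 4 would add the four 4-letter chunks.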

The example below builds on the AGLSS object S_XX created in the first example. Here, chunk strength is limited to bigrams.

[s, levnames, tgts] = chstr_factor(s_xx, 'BigramStr', [], [], [], 2);

levnames

tgts

s = factorial_testsets(s, {'BigramStr', levnames{:}});
s = choose_items(s, 6, 2);

disp('Training items:');
disp(format_train_items(s));

disp('Test items:');
disp(format_test_items(s));

	1 of 4 chunks of length 2 appear in no grammatical items

levnames = 

    'LowBigramStr'
    'HighBigramStr'


tgts =

    0.4000    0.5714

 Choosing training item 1....
 Choosing training item 2....
 Choosing training item 3....
 Choosing training item 4....
 Updating potential items....
 Choosing test item 1 for each set... 1 2.
 Choosing training item 5....
 Updating potential items....
 Choosing test item 2 for each set... 1 2.
 Choosing training item 6....

Training items:
  Itm_num    Itm_name
       01        yxxy
       02      yxxxxy
       03         yxx
       04       yxxxy
       05    yxyxxxxy
       06    yxxxxyxy

Test items:
 Tset_num BigramStr_cat  Itm_num    Itm_name  BigramStr BigramStr_tgt
       01  LowBigramStr       01  xyyyxxxyyx      0.400         0.400
       01  LowBigramStr       02  xyyxxyyyxx      0.400         0.400
       02 HighBigramStr       01         xxx      0.582         0.571
       02 HighBigramStr       02   yxxxxxxxx      0.570         0.571