CHNOV_FACTOR

Define a chunk novelty factor to manipulate how many of a test item's chunks (bigrams by default) were not present in any training item.

Contents

Basic chunk novelty factor

CHNOV_FACTOR(aglss, 'fname') defines target values for low and high chunk novelty test items, containing 0 or 1 novel bigrams, respectively. The selection of training items will avoid some bigrams that would otherwise be allowed by the grammar.

Chunk novelty is the number of a test item's bigrams (or chunks of a different size, if specified) that do not appear in training items.
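
As a rough illustration of this definition, the sketch below computes bigram novelty in plain MATLAB, without calling the toolbox. The helper names chunks_of and chnov are hypothetical, and the training items are copied from the example output shown further down. Counting every occurrence of an unseen bigram (rather than counting each distinct bigram once) is an assumption; it is consistent with the scores displayed in the examples, but the toolbox's own computation is not shown here.

```matlab
% Hypothetical helper (not part of the toolbox): list every chunk of
% size n in a string, in order, as a cell array of strings.
chunks_of = @(str, n) arrayfun(@(k) str(k:k+n-1), 1:(numel(str)-n+1), ...
                               'UniformOutput', false);

% Training items copied from the first example's output.
train = {'xyxy', 'xyxyxyxyxy', 'xyx', 'yxy', 'xyxyxyxy', 'xyxyxy'};

% Collect the distinct bigrams that occur anywhere in the training items.
per_item = cellfun(@(s) chunks_of(s, 2), train, 'UniformOutput', false);
train_bigrams = unique([per_item{:}]);

% Assumed definition: number of bigram occurrences in the test item
% that match no training bigram.
chnov = @(str) sum(~ismember(chunks_of(str, 2), train_bigrams));

disp(chnov('yxyxy'));    % only xy and yx, both seen in training
disp(chnov('xyyxyx'));   % contains one occurrence of the unseen bigram yy
```

Under these assumptions the two calls give 0 and 1, matching the ChNov_2 column for the same items in the test item display.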

The example below generates training items based on the XXX_GRAMMAR. Test items are then generated that have either low or high novelty relative to bigrams in the training items.

The first output variable returned by CHNOV_FACTOR is an AGLSS object, updated to reflect the chunk novelty factor. The second output variable is a cell array of strings naming the different levels of chunk novelty defined. Actual chunk novelty target values are returned as the third output variable.

Only two of the four possible bigrams (xy and yx) occur in the training items, even though the grammar allows all four. By default, CHNOV_FACTOR excludes two bigrams from training so that they can be used as novel bigrams in test items.
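
The claim about the training items can be checked by hand. The sketch below is an illustration in plain MATLAB, not a toolbox call: it lists which of the four possible bigrams occur in the six training items from the example output. The variable names are hypothetical.

```matlab
% The four bigrams that can be built from the symbols x and y.
all_bigrams = {'xx', 'xy', 'yx', 'yy'};

% Training items copied from the example output.
train = {'xyxy', 'xyxyxyxyxy', 'xyx', 'yxy', 'xyxyxyxy', 'xyxyxy'};

% True if the bigram bg occurs in at least one training item.
occurs = @(bg) any(cellfun(@(s) ~isempty(strfind(s, bg)), train));
in_training = cellfun(occurs, all_bigrams);

used     = all_bigrams(in_training);
reserved = all_bigrams(~in_training);

fprintf('In training:     %s\n', strjoin(used, ' '));
fprintf('Not in training: %s\n', strjoin(reserved, ' '));
```

For these items, xy and yx occur in training while xx and yy do not, so xx and yy are the bigrams left available as novel bigrams.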

The display of test items lists chunk novelty scores for unigrams, bigrams, and trigrams, along with the target values. Since targets are specified here only for bigrams, the target values for unigrams and trigrams are listed as 'NaN' (for "not a number").

s_xxx = aglss(xxx_grammar, [3 10]);

[s, levnames, tgts] = chnov_factor(s_xxx, 'ChNov');

levnames

tgts

s = factorial_testsets(s, {'ChNov', levnames{:}});
s = choose_items(s, 6, 2);

disp('Training items:');
disp(format_train_items(s));

disp('Test items:');
disp(format_test_items(s));
 Potential items:
	Grammar involves 2 symbols (xy)
	2040 possible strings of length 3-10
		180 grammatical strings ( 8.82%)
		1860 ungrammatical strings (91.18%)
	Using all 180 grammatical strings
	Using all 1860 ungrammatical strings
 
	1 of 8 chunks of length 3 appear in no grammatical items

levnames = 

    'LowChNov'
    'HighChNov'


tgts =

     0     1

 Choosing training item 1....
 Choosing training item 2....
 Choosing training item 3....
 Choosing training item 4....
 Updating potential items....
 Choosing test item 1 for each set... 1 2.
 Choosing training item 5....
 Updating potential items....
 Choosing test item 2 for each set... 2 1.
 Choosing training item 6....

Training items:
  Itm_num    Itm_name
       01        xyxy
       02  xyxyxyxyxy
       03         xyx
       04         yxy
       05    xyxyxyxy
       06      xyxyxy

Test items:
 Tset_num     ChNov_cat  Itm_num    Itm_name    ChNov_1    ChNov_2    ChNov_3 ChNov_tgt_1 ChNov_tgt_2 ChNov_tgt_3
       01      LowChNov       01       yxyxy      0.000      0.000      0.000         NaN       0.000         NaN
       01      LowChNov       02        yxyx      0.000      0.000      0.000         NaN       0.000         NaN
       02     HighChNov       01      xyyxyx      0.000      1.000      2.000         NaN       1.000         NaN
       02     HighChNov       02    xxyxyxyx      0.000      1.000      1.000         NaN       1.000         NaN

Chunk novelty categories

CHNOV_FACTOR(aglss, 'fname', [z1 z2 ... zN]) specifies chunk novelty target values for N categories. Test items in category I will have Z(I) novel bigrams (or chunks of a different size, if specified).

The example below builds on the AGLSS object S_XXX created for the previous example. Three levels of chunk novelty are defined, for test items containing 0, 1, or 2 novel bigrams.

[s, levnames, tgts] = chnov_factor(s_xxx, 'MyChunkNov', [0 1 2]);

levnames

tgts

s = factorial_testsets(s, {'MyChunkNov', levnames{:}});
s = choose_items(s, 6, 2);

disp('Training items:');
disp(format_train_items(s));

disp('Test items:');
disp(format_test_items(s));
	1 of 8 chunks of length 3 appear in no grammatical items

levnames = 

    'MyChunkNov1'
    'MyChunkNov2'
    'MyChunkNov3'


tgts =

     0     1     2

 Choosing training item 1....
 Choosing training item 2....
 Choosing training item 3....
 Choosing training item 4....
 Updating potential items....
 Choosing test item 1 for each set... 2 1 3.
 Choosing training item 5....
 Updating potential items....
 Choosing test item 2 for each set... 2 3 1.
 Choosing training item 6....

Training items:
  Itm_num    Itm_name
       01        xyxy
       02  xyxyxyxyxy
       03         xyx
       04         yxy
       05    xyxyxyxy
       06      xyxyxy

Test items:
 Tset_num MyChunkNov_cat  Itm_num    Itm_name MyChunkNov_1 MyChunkNov_2 MyChunkNov_3 MyChunkNov_tgt_1 MyChunkNov_tgt_2 MyChunkNov_tgt_3
       01    MyChunkNov1       01       yxyxy        0.000        0.000        0.000              NaN            0.000              NaN
       01    MyChunkNov1       02        yxyx        0.000        0.000        0.000              NaN            0.000              NaN
       02    MyChunkNov2       01      xyyxyx        0.000        1.000        2.000              NaN            1.000              NaN
       02    MyChunkNov2       02    xxyxyxyx        0.000        1.000        1.000              NaN            1.000              NaN
       03    MyChunkNov3       01    yxxyyxyx        0.000        2.000        4.000              NaN            2.000              NaN
       03    MyChunkNov3       02  yxyxyyxyyx        0.000        2.000        4.000              NaN            2.000              NaN
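
The relation between scores and categories in the display above can be reproduced with a short sketch in plain MATLAB. It recomputes bigram novelty for the six test items and looks up each score in the target vector. The helper names are hypothetical, and counting occurrences of unseen bigrams is an assumption rather than documented toolbox behavior.

```matlab
% Training and test items copied from the output above.
train = {'xyxy', 'xyxyxyxyxy', 'xyx', 'yxy', 'xyxyxyxy', 'xyxyxy'};
tests = {'yxyxy', 'yxyx', 'xyyxyx', 'xxyxyxyx', 'yxxyyxyx', 'yxyxyyxyyx'};
tgts     = [0 1 2];
levnames = {'MyChunkNov1', 'MyChunkNov2', 'MyChunkNov3'};

% Hypothetical helpers: bigrams of a string, and the count of bigram
% occurrences that never appear in training.
bigrams_of = @(str) arrayfun(@(k) str(k:k+1), 1:(numel(str)-1), ...
                             'UniformOutput', false);
per_item = cellfun(bigrams_of, train, 'UniformOutput', false);
train_bigrams = unique([per_item{:}]);
chnov = @(str) sum(~ismember(bigrams_of(str), train_bigrams));

% Score each test item and find the category whose target it matches.
scores = cellfun(chnov, tests);
[matched, idx] = ismember(scores, tgts);
cats = levnames(idx(matched));   % every score matches a target here

for k = 1:numel(tests)
    fprintf('%-12s %d  %s\n', tests{k}, scores(k), cats{k});
end
```

Note that yxyxyyxyyx reaches the target of 2 although yy is its only distinct novel bigram. This suggests that the score counts occurrences of novel bigrams, not distinct novel bigrams.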

Naming chunk novelty categories

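CHNOV_FACTOR(aglss, 'fname', T, {'name1', 'name2', ...}) specifies names for the different categories of chunk novelty, as an alternative to the default names otherwise assigned. T is the vector of target values described above; the example passes [] for T and obtains the default targets of 0 and 1.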

The example below builds on the AGLSS object S_XXX created for a previous example. The default levels of chunk novelty are named 'L' and 'H' (for Low and High). These names appear in the display of test items.

[s, levnames, tgts] = chnov_factor(s_xxx, 'ChNov', [], {'L', 'H'});

levnames

tgts

s = factorial_testsets(s, {'ChNov', levnames{:}});
s = choose_items(s, 6, 2);

disp('Training items:');
disp(format_train_items(s));

disp('Test items:');
disp(format_test_items(s));
	1 of 8 chunks of length 3 appear in no grammatical items

levnames = 

    'L'    'H'


tgts =

     0     1

 Choosing training item 1....
 Choosing training item 2....
 Choosing training item 3....
 Choosing training item 4....
 Updating potential items....
 Choosing test item 1 for each set... 2 1.
 Choosing training item 5....
 Updating potential items....
 Choosing test item 2 for each set... 1 2.
 Choosing training item 6....

Training items:
  Itm_num    Itm_name
       01        xyxy
       02  xyxyxyxyxy
       03         xyx
       04         yxy
       05    xyxyxyxy
       06      xyxyxy

Test items:
 Tset_num     ChNov_cat  Itm_num    Itm_name    ChNov_1    ChNov_2    ChNov_3 ChNov_tgt_1 ChNov_tgt_2 ChNov_tgt_3
       01             L       01       yxyxy      0.000      0.000      0.000         NaN       0.000         NaN
       01             L       02        yxyx      0.000      0.000      0.000         NaN       0.000         NaN
       02             H       01      xyyxyx      0.000      1.000      2.000         NaN       1.000         NaN
       02             H       02    xxyxyxyx      0.000      1.000      1.000         NaN       1.000         NaN

Reserving novel grammatical chunks

CHNOV_FACTOR(aglss, 'fname', T, NAMES, M) specifies the minimum number of grammatical bigrams to exclude from training items, over and above any ungrammatical bigrams. M determines how many different bigrams (or chunks of a different length, if specified) will be available to appear in test items as novel bigrams, that is, bigrams that do not appear in any training item. By default, M is 2.

The example below builds on the AGLSS object S_XXX created for a previous example. Here, chunk novelty is combined factorially with grammaticality to generate four sets of test items. The chunk novelty factor reserves a single grammatical bigram which is excluded from training items. This is the only novel bigram available to grammatical test items.

[s, levnames, tgts] = chnov_factor(s_xxx, 'ChNov', [], [], 1);

levnames

tgts

[s, glevs] = gram_factor(s, 'gram');
s = factorial_testsets(s, {'ChNov', levnames{:}}, {'gram', glevs{:}});
s = choose_items(s, 6, 2);

disp('Training items:');
disp(format_train_items(s));

disp('Test items:');
disp(format_test_items(s));
	1 of 8 chunks of length 3 appear in no grammatical items

levnames = 

    'LowChNov'
    'HighChNov'


tgts =

     0     1

 Choosing training item 1....
 Choosing training item 2....
 Choosing training item 3....
 Choosing training item 4....
 Updating potential items....
 Choosing test item 1 for each set... 3 2 4 1.
 Choosing training item 5....
 Updating potential items....
 Choosing test item 2 for each set... 2 3 4 1.
 Choosing training item 6....

Training items:
  Itm_num    Itm_name
       01        xyxy
       02         yxy
       03         xxy
       04     yxyxxxy
       05    xyxyxxxy
       06    yxyxxxxy

Test items:
 Tset_num     ChNov_cat      gram_cat  Itm_num    Itm_name    ChNov_1    ChNov_2    ChNov_3 ChNov_tgt_1 ChNov_tgt_2 ChNov_tgt_3
       01      LowChNov             G       01   xyxyxxxxx      0.000      0.000      0.000         NaN       0.000         NaN
       01      LowChNov             G       02     yxyxxxx      0.000      0.000      0.000         NaN       0.000         NaN
       02     HighChNov             G       01  yxyxxyxyyx      0.000      1.000      2.000         NaN       1.000         NaN
       02     HighChNov             G       02  yxyyxyxxxy      0.000      1.000      2.000         NaN       1.000         NaN
       03      LowChNov            NG       01     xxxyxxy      0.000      0.000      0.000         NaN       0.000         NaN
       03      LowChNov            NG       02   xyxxxyxxx      0.000      0.000      0.000         NaN       0.000         NaN
       04     HighChNov            NG       01    yxxyyxyx      0.000      1.000      2.000         NaN       1.000         NaN
       04     HighChNov            NG       02   xxxxyxxyy      0.000      1.000      1.000         NaN       1.000         NaN
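
As a check on the output above, the sketch below (plain MATLAB, hypothetical variable names, no toolbox calls) finds the bigrams missing from the six training items and counts how often the reserved bigram occurs in each test item.

```matlab
% Training items copied from the output above.
train = {'xyxy', 'yxy', 'xxy', 'yxyxxxy', 'xyxyxxxy', 'yxyxxxxy'};
all_bigrams = {'xx', 'xy', 'yx', 'yy'};

% Bigrams that occur in no training item.
occurs = @(bg) any(cellfun(@(s) ~isempty(strfind(s, bg)), train));
reserved = all_bigrams(~cellfun(occurs, all_bigrams));

% Test items copied from the output above, grouped by chunk novelty level.
low  = {'xyxyxxxxx', 'yxyxxxx', 'xxxyxxy', 'xyxxxyxxx'};
high = {'yxyxxyxyyx', 'yxyyxyxxxy', 'yxxyyxyx', 'xxxxyxxyy'};

% strfind returns one start index per occurrence, so numel counts them.
count_in = @(items) cellfun(@(s) numel(strfind(s, reserved{1})), items);
low_counts  = count_in(low);
high_counts = count_in(high);

fprintf('Reserved bigram: %s\n', reserved{1});
disp(low_counts);
disp(high_counts);
```

For these items the only bigram missing from training is yy. It occurs once in each HighChNov item and never in a LowChNov item, whether the item is grammatical or not.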

Reserving novel chunks regardless of grammaticality

CHNOV_FACTOR(aglss, 'fname', T, NAMES, [N M]) specifies the minimum number of chunks (N) to exclude from training items whether those chunks are grammatical or not, in addition to the minimum number of grammatical chunks (M) to exclude. In particular, M can be set to 0 if grammaticality is not going to be used as a stimulus factor.

The example below builds on the AGLSS object S_XXX created for a previous example. Here, a single bigram is specifically excluded from training items, and it need not be a grammatical bigram. However, the XXX_GRAMMAR allows all four of the possible bigrams, so in this case training items will have to avoid a grammatical bigram to leave a novel bigram available to items in the test set.

[s, levnames, tgts] = chnov_factor(s_xxx, 'ChNov', [], [], [1 0]);

levnames

tgts

s = factorial_testsets(s, {'ChNov', levnames{:}});
s = choose_items(s, 6, 2);

disp('Training items:');
disp(format_train_items(s));

disp('Test items:');
disp(format_test_items(s));
	1 of 8 chunks of length 3 appear in no grammatical items

levnames = 

    'LowChNov'
    'HighChNov'


tgts =

     0     1

 Choosing training item 1....
 Choosing training item 2....
 Choosing training item 3....
 Choosing training item 4....
 Updating potential items....
 Choosing test item 1 for each set... 2 1.
 Choosing training item 5....
 Updating potential items....
 Choosing test item 2 for each set... 2 1.
 Choosing training item 6....

Training items:
  Itm_num    Itm_name
       01        xyxy
       02         yxy
       03         xxy
       04     yxyxxxy
       05    xyxyxxxy
       06    yxyxxxxy

Test items:
 Tset_num     ChNov_cat  Itm_num    Itm_name    ChNov_1    ChNov_2    ChNov_3 ChNov_tgt_1 ChNov_tgt_2 ChNov_tgt_3
       01      LowChNov       01   xyxyxxxxx      0.000      0.000      0.000         NaN       0.000         NaN
       01      LowChNov       02     xxxyxxy      0.000      0.000      0.000         NaN       0.000         NaN
       02     HighChNov       01    yxxyyxyx      0.000      1.000      2.000         NaN       1.000         NaN
       02     HighChNov       02   xxxxyxxyy      0.000      1.000      1.000         NaN       1.000         NaN

Chunk size used to compute chunk novelty

CHNOV_FACTOR(aglss, 'fname', T, NAMES, NM, CHSIZE) specifies the length of chunks on which to compute chunk novelty.

The example below builds on the AGLSS object S_XXX created for a previous example. Here, chunk novelty is defined on trigrams rather than bigrams. At least two trigrams must be excluded from training items, with no minimum on how many of them are grammatical. Under the XXX_GRAMMAR, only one of the eight possible trigrams is ungrammatical. That trigram plus one grammatical trigram will be excluded from training items, leaving two novel trigrams available to items in the test set.

[s, levnames, tgts] = chnov_factor(s_xxx, 'TrigramNov', [], [], [2 0], 3);

levnames

tgts

s = factorial_testsets(s, {'TrigramNov', levnames{:}});
s = choose_items(s, 6, 2);

disp('Training items:');
disp(format_train_items(s));

disp('Test items:');
disp(format_test_items(s));
	1 of 8 chunks of length 3 appear in no grammatical items

levnames = 

    'LowTrigramNov'
    'HighTrigramNov'


tgts =

     0     1

 Choosing training item 1....
 Choosing training item 2....
 Choosing training item 3....
 Choosing training item 4....
 Updating potential items....
 Choosing test item 1 for each set... 2 1.
 Choosing training item 5....
 Updating potential items....
 Choosing test item 2 for each set... 1 2.
 Choosing training item 6....

Training items:
  Itm_num    Itm_name
       01  yxyxxxyxyy
       02        yxyy
       03     xxxyxyy
       04     yxyxxxy
       05    yxyxxxxy
       06      xxyxyy

Test items:
 Tset_num TrigramNov_cat  Itm_num    Itm_name TrigramNov_1 TrigramNov_2 TrigramNov_3 TrigramNov_tgt_1 TrigramNov_tgt_2 TrigramNov_tgt_3
       01  LowTrigramNov       01   xxxxyxxyy        0.000        0.000        0.000              NaN              NaN            0.000
       01  LowTrigramNov       02   xyxyxxxxx        0.000        0.000        0.000              NaN              NaN            0.000
       02 HighTrigramNov       01    yxxyyxyx        0.000        0.000        1.000              NaN              NaN            1.000
       02 HighTrigramNov       02   yxxxxxyyy        0.000        0.000        1.000              NaN              NaN            1.000
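
The same kind of check extends to other chunk sizes. The sketch below (plain MATLAB, hypothetical variable names, no toolbox calls) enumerates all chunks of length n over the two symbols and lists those that occur in none of the training items shown above. The enumeration via dec2base is a convenience that assumes at most ten symbols.

```matlab
symbols = 'xy';
n = 3;                       % chunk size (trigrams)
k = numel(symbols);

% Enumerate all k^n chunks: each row of idx holds n symbol indices.
idx = dec2base(0:k^n-1, k, n) - '0' + 1;
all_chunks = cellstr(symbols(idx));      % 8-by-1 cell: xxx, xxy, ..., yyy

% Training items copied from the output above.
train = {'yxyxxxyxyy', 'yxyy', 'xxxyxyy', 'yxyxxxy', 'yxyxxxxy', 'xxyxyy'};

% A chunk is "seen" if it occurs in at least one training item.
seen = cellfun(@(c) any(cellfun(@(s) ~isempty(strfind(s, c)), train)), ...
               all_chunks);
unseen = all_chunks(~seen)';

fprintf('Trigrams absent from training: %s\n', strjoin(unseen, ' '));
```

For these training items the absent trigrams are yyx and yyy, which agrees with the two novel trigrams the example reserves. Each HighTrigramNov item in the display contains exactly one of them.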