Prior stimuli

Compute characteristics of extant training and test sets, such as those from previous studies.

In addition to its ability to select sets of AGL training and test items that manipulate various characteristics, StimSelect also provides functions to evaluate the same characteristics of specified training and test sets. These functions require training and test items to be listed in character arrays or cell arrays of strings. Some of the functions also require a finite state grammar to be defined.

Importing a list of training items

To evaluate StimSelect characteristics of a specified AGL stimulus set, the training items must be listed as strings in a character array or a cell array. Often, the first step is to enter the set of training items into a text file (e.g. my_train.txt), with one item per line.

Once the file of training items has been created, the Import Wizard can be used to load the items into a corresponding variable in Matlab: select Import Data from the File menu, browse to the appropriate file, and follow the wizard's prompts. Sample data can be found in pb_train.txt, which is located in the Sample_Scripts directory of StimSelect.

Matlab provides many other ways to load data from files or the clipboard; see the Data Import and Export chapter of the Matlab programming manual for details. One of them, the textscan() function, is shown below.

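fid = fopen('pb_train.txt');
assert(fid ~= -1, 'Could not open pb_train.txt');  % fopen returns -1 on failure
C = textscan(fid, '%s');   % read one whitespace-delimited string per item
fclose(fid);
pb_train = C{1};           % cell array of strings, one item per cell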

pb_train
pb_train = 

    'VJ'
    'VJTVJ'
    'VJTVTV'
    'VJTVX'
    'VJTVXJ'
    'VJTXVJ'
    'VT'
    'VXJJ'
    'VXJJJJ'
    'XVJTVJ'
    'XVT'
    'XVX'
    'XVXJ'
    'XVXJJ'
    'XVXJJJ'
    'XXVJ'
    'XXVT'
    'XXVXJ'
    'XXVXJJ'
    'XXXVT'
    'XXXVTV'
    'XXXVX'
    'XXXXVX'

Importing a list of test items

As with training items discussed above, a set of test items can be entered into a text file (e.g. my_test.txt), and the Import Wizard can be used to load the items into a Matlab variable. Sample test items can be found in pb_test.txt, which is located in the Sample_Scripts directory of StimSelect.

Alternatively, the textscan() function can be used, as shown below.

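fid = fopen('pb_test.txt');
assert(fid ~= -1, 'Could not open pb_test.txt');  % fopen returns -1 on failure
C = textscan(fid, '%s');   % read one whitespace-delimited string per item
fclose(fid);
pb_test = C{1};            % cell array of strings, one item per cell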

pb_test
pb_test = 

    'JXVT'
    'TVJ'
    'VJJXVT'
    'VJTV'
    'VJTVT'
    'VJTXVX'
    'VTV'
    'VTVJ'
    'VTVJJ'
    'VX'
    'VXJ'
    'VXJJX'
    'VXJTJ'
    'VXVJ'
    'XJJ'
    'XVJTVT'
    'XVJTVX'
    'XVTV'
    'XVTVJ'
    'XVTVJJ'
    'XVXT'
    'XVXV'
    'XVXVJ'
    'XXJJ'
    'XXTX'
    'XXV'
    'XXVJJJ'
    'XXVTV'
    'XXVTVJ'
    'XXVVJJ'
    'XXVXJ'

Remove leading and trailing blanks if necessary

If the strings containing the training and test items include leading or trailing blanks, they should be removed using the strtrim() function.

pb_train = strtrim(pb_train);
pb_test = strtrim(pb_test);
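
As a quick check before calling the StimSelect functions, the sketch below trims a small made-up list of items and confirms that the result is a cell array of non-empty strings. The variable name items and its contents are illustrative only and are not part of StimSelect.

```matlab
% Hypothetical items with stray blanks, as might result from a hand-edited file
items = {' VJ', 'XVT  ', ' XXVT '};

% strtrim accepts a cell array of strings and trims every element
items = strtrim(items);

% Sanity checks: a cell array of strings with no empty entries
assert(iscellstr(items), 'Items must be a cell array of strings.');
assert(~any(cellfun(@isempty, items)), 'Empty items found.');

disp(items);
```

The same two assertions can be applied to pb_train and pb_test after import.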

Compute frequency of chunks in training items

The TRAINSET_CHFREQ function computes the frequencies of training item chunks (sub-strings). See help trainset_chfreq for details.

Q = TRAINSET_CHFREQ(train_items, max_size) returns the frequencies for chunks of size 1 to MAX_SIZE. The return value is a cell array in which each Q{i} contains a matrix of frequency counts for chunks of size i.

[Q, S, CX] = TRAINSET_CHFREQ(...) also returns a string S listing the symbols from which the training items are composed, and a cell array CX in which each cell CX{i} is a cell array of strings listing the possible chunks of length i.

max_chsize = 3; % trigrams are big enough
[chfreq, syms, chunks] = trainset_chfreq(pb_train, max_chsize);

disp(' ');
disp(' chunk freq');
for sz = 1:max_chsize
    if sz <= 2
        ch = chunks{sz};   % chunks of length sz
        fq = chfreq{sz};   % matching frequency counts
        for i = 1:numel(fq)
            fprintf(1, '   %-2s   %2d\n', ch(i,:), fq(i));
        end
    else
        disp('Larger chunks computed but not shown in this example');
    end
end
 
 chunk freq
   J    27
   T    12
   V    31
   X    40
   JJ    8
   TJ    0
   VJ   11
   XJ    8
   JT    6
   TT    0
   VT    6
   XT    0
   JV    0
   TV    7
   VV    0
   XV   15
   JX    0
   TX    1
   VX   12
   XX   13
Larger chunks computed but not shown in this example
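
The counts above appear to be total occurrences of each chunk across all training items (for example, VJ occurs 11 times in pb_train when every position is counted). A minimal sketch of that sliding-window count, independent of StimSelect, is given below. It uses a three-item subset of the training list, and the variable names are illustrative assumptions rather than anything TRAINSET_CHFREQ exposes.

```matlab
% Small illustrative subset of the training items
train = {'VJ', 'VJTVJ', 'VXJJ'};
chunk = 'VJ';              % chunk (sub-string) to count
n = numel(chunk);

count = 0;
for k = 1:numel(train)
    item = train{k};
    % Slide a window of length n across the item
    for p = 1:(numel(item) - n + 1)
        if strcmp(item(p:p+n-1), chunk)
            count = count + 1;
        end
    end
end

fprintf('%s occurs %d times\n', chunk, count);   % VJ occurs 3 times
```

Running the same loop over the full pb_train list offers a simple cross-check of individual entries in the table above.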

Identify G and NG items

Test items can be identified as grammatical (G) or non-grammatical (NG) by attempting to parse them with a specified finite state grammar.

ISGRAMMATICAL(fsg, test_items) returns a vector of 1's and 0's indicating which of the specified test items are G (1) and which are NG (0), relative to the specified grammar (FSG).

See StimSelect_summary.html for information on how to specify a grammar.

The examples below use the grammar from Knowlton & Squire (1996, Exp. 1), which is pre-defined in a function included with StimSelect.

f = knowlton_squire_grammar();
isg = isgrammatical(f, pb_test);

disp(' ');
disp('item       gram');
for i = 1:length(pb_test)
    fprintf(1, '%-10s ', pb_test{i});
    if isg(i)
        fprintf(1, 'G');
    else
        fprintf(1, 'NG');
    end
    fprintf(1, '\n');
end
 
item       gram
JXVT       NG
TVJ        NG
VJJXVT     NG
VJTV       NG
VJTVT      G
VJTXVX     G
VTV        G
VTVJ       G
VTVJJ      G
VX         G
VXJ        G
VXJJX      NG
VXJTJ      NG
VXVJ       NG
XJJ        NG
XVJTVT     G
XVJTVX     G
XVTV       G
XVTVJ      G
XVTVJJ     G
XVXT       NG
XVXV       NG
XVXVJ      NG
XXJJ       NG
XXTX       NG
XXV        NG
XXVJJJ     NG
XXVTV      G
XXVTVJ     G
XXVVJJ     NG
XXVXJ      G
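
To illustrate what parsing an item with a finite state grammar involves, the sketch below walks each item through a small transition table. The grammar here is a made-up toy, not the Knowlton & Squire grammar, and the table layout is an assumption chosen for illustration. It is not the format that StimSelect or ISGRAMMATICAL uses.

```matlab
% Toy deterministic grammar (hypothetical): each row is {from_state, symbol, to_state}
trans = {1, 'X', 2; ...
         1, 'V', 3; ...
         2, 'X', 2; ...
         2, 'V', 3; ...
         3, 'J', 3};
start_state = 1;
accept = 3;                       % states in which an item may legally end

items = {'XXVJ', 'VX', 'V'};
isg = false(size(items));

for k = 1:numel(items)
    state = start_state;
    ok = true;
    for c = items{k}              % iterate over the symbols of the item
        row = find([trans{:,1}] == state & [trans{:,2}] == c, 1);
        if isempty(row)           % no legal transition: item is NG
            ok = false;
            break;
        end
        state = trans{row, 3};
    end
    % G only if every symbol was consumed and parsing ended in an accepting state
    isg(k) = ok && any(state == accept);
end

disp(isg);                        % 1 0 1
```

An item is rejected either because some symbol has no legal transition from the current state, or because parsing stops in a state that is not allowed to end an item.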

Determine similarity between test and training items

The TESTSET_SIM function computes whole-item similarities between test and training items.

TESTSET_SIM(test_items, train_items) returns the average similarity of each test item to the whole set of training items. See help testset_sim for more options.

sim = testset_sim(pb_test, pb_train);

disp(' ');
disp('item        sim');
for i = 1:length(pb_test)
    fprintf(1, '%-10s  %4.3f\n', pb_test{i}, sim(i));
end
 
item        sim
JXVT        0.459
TVJ         0.427
VJJXVT      0.440
VJTV        0.457
VJTVT       0.431
VJTXVX      0.489
VTV         0.401
VTVJ        0.476
VTVJJ       0.448
VX          0.410
VXJ         0.487
VXJJX       0.454
VXJTJ       0.517
VXVJ        0.541
XJJ         0.400
XVJTVT      0.499
XVJTVX      0.524
XVTV        0.485
XVTVJ       0.546
XVTVJJ      0.515
XVXT        0.514
XVXV        0.518
XVXVJ       0.582
XXJJ        0.459
XXTX        0.417
XXV         0.453
XXVJJJ      0.499
XXVTV       0.497
XXVTVJ      0.550
XXVVJJ      0.489
XXVXJ       0.566

Compute novel chunks in test items

The TESTSET_CHNOV function computes chunk novelty for a set of test items relative to specified training items.

TESTSET_CHNOV(test_items, train_items, sz) returns the number of chunks of length SZ in each test item that do not appear in any of the specified training items.

chsize = 2;
chnov = testset_chnov(pb_test, pb_train, chsize);

disp(' ');
disp('item        chnov');
for i = 1:length(pb_test)
    fprintf(1, '%-10s  %d\n', pb_test{i}, chnov(i));
end
 
item        chnov
JXVT        1
TVJ         0
VJJXVT      1
VJTV        0
VJTVT       0
VJTXVX      0
VTV         0
VTVJ        0
VTVJJ       0
VX          0
VXJ         0
VXJJX       1
VXJTJ       1
VXVJ        0
XJJ         0
XVJTVT      0
XVJTVX      0
XVTV        0
XVTVJ       0
XVTVJJ      0
XVXT        1
XVXV        0
XVXVJ       0
XXJJ        0
XXTX        1
XXV         0
XXVJJJ      0
XXVTV       0
XXVTVJ      0
XXVVJJ      1
XXVXJ       0
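
The idea behind chunk novelty can be sketched without StimSelect as follows. The example counts the bigrams of one test item that appear in none of a small, illustrative set of training items. It counts every position separately, and whether TESTSET_CHNOV treats repeated novel chunks the same way is an assumption not confirmed here.

```matlab
% Small illustrative training subset and a single test item
train = {'XVT', 'VJ', 'XXVT'};
item  = 'JXVT';
sz    = 2;                        % chunk length (bigrams)

nov = 0;
for p = 1:(numel(item) - sz + 1)
    chunk = item(p:p+sz-1);
    % strfind on a cell array returns one (possibly empty) index list per item
    found = any(~cellfun(@isempty, strfind(train, chunk)));
    if ~found
        nov = nov + 1;            % chunk never occurs in training
    end
end

fprintf('%s has %d novel chunk(s) of length %d\n', item, nov, sz);   % 1 (JX)
```

Here only JX is novel, which agrees with the value of 1 reported for JXVT in the table above.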

Compute chunk strength

The TESTSET_CHSTR function computes chunk strength for a set of test items relative to specified training items. It is ordinarily desirable to specify a particular grammar so that chunk strength can be normalized by the number of grammatical chunks.

TESTSET_CHSTR(test_items, train_items, fsg, sz) returns the chunk strength for each test item, considering chunks of length 2 to SZ. Chunk strength is normalized by the number of chunks allowed by grammar FSG.

max_chsize = 3;
f = knowlton_squire_grammar();
chstr = testset_chstr(pb_test, pb_train, f, max_chsize);

disp(' ');
disp('item        chstr');
for i = 1:length(pb_test)
    fprintf(1, '%-10s  %4.3f\n', pb_test{i}, chstr(i));
end
 
item        chstr
JXVT        0.403
TVJ         0.434
VJJXVT      0.349
VJTV        0.543
VJTVT       0.496
VJTXVX      0.519
VTV         0.394
VTVJ        0.420
VTVJJ       0.376
VX          0.540
VXJ         0.614
VXJJX       0.498
VXJTJ       0.428
VXVJ        0.445
XJJ         0.532
XVJTVT      0.511
XVJTVX      0.535
XVTV        0.488
XVTVJ       0.479
XVTVJJ      0.440
XVXT        0.519
XVXV        0.573
XVXVJ       0.556
XXJJ        0.470
XXTX        0.175
XXV         0.655
XXVJJJ      0.527
XXVTV       0.554
XXVTVJ      0.537
XXVVJJ      0.440
XXVXJ       0.636

Compute rule strength

The TESTSET_RULESTR function computes rule strength for a set of test items relative to specified training items.

TESTSET_RULESTR(test_items, train_items, [], sz) returns the rule strength of each test item based on the fraction of chunks of length 2 to SZ in the test item that also appear in at least one training item.

max_chsize = 3;
% The third argument is passed as [], so no grammar needs to be defined here
rulestr = testset_rulestr(pb_test, pb_train, [], max_chsize);

disp(' ');
disp('item        rulestr');
for i = 1:length(pb_test)
    fprintf(1, '%-10s  %4.3f\n', pb_test{i}, rulestr(i));
end
 
item        rulestr
JXVT        0.600
TVJ         1.000
VJJXVT      0.556
VJTV        1.000
VJTVT       1.000
VJTXVX      1.000
VTV         1.000
VTVJ        1.000
VTVJJ       0.857
VX          1.000
VXJ         1.000
VXJJX       0.714
VXJTJ       0.571
VXVJ        0.800
XJJ         1.000
XVJTVT      1.000
XVJTVX      1.000
XVTV        1.000
XVTVJ       1.000
XVTVJJ      0.889
XVXT        0.600
XVXV        0.800
XVXVJ       0.857
XXJJ        0.800
XXTX        0.400
XXV         1.000
XXVJJJ      0.889
XXVTV       1.000
XXVTVJ      1.000
XXVVJJ      0.556
XXVXJ       1.000
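
As a worked illustration of the fraction described above, the sketch below computes rule strength for a single test item against a small, illustrative training subset. It is a stand-alone approximation of the definition given in the text, not the TESTSET_RULESTR implementation, and the variable names are assumptions.

```matlab
% Small illustrative training subset and a single test item
train  = {'XVT', 'VJ', 'XXVT'};
item   = 'JXVT';
max_sz = 3;                       % consider chunks of length 2 to max_sz

hits  = 0;                        % chunks that occur in at least one training item
total = 0;                        % all chunks of the test item
for sz = 2:max_sz
    for p = 1:(numel(item) - sz + 1)
        chunk = item(p:p+sz-1);
        total = total + 1;
        if any(~cellfun(@isempty, strfind(train, chunk)))
            hits = hits + 1;
        end
    end
end

rs = hits / total;
fprintf('%s  %4.3f\n', item, rs);   % JXVT  0.600
```

The item JXVT has five chunks (JX, XV, VT, JXV, XVT), of which XV, VT and XVT occur in the training subset, giving 3/5 = 0.600. This matches the value listed for JXVT above.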