Using AGL StimSelect

There are six basic steps in using StimSelect to identify training and test items for an AGL study: defining a finite state grammar, creating a StimSelect AGLSS object, defining factors to control test item characteristics, specifying combinations of factors, identifying training and test items, and displaying the selected items.

The six steps are illustrated by the six groups of expressions below. The first three expressions define a finite state grammar that generates a language of letter strings composed of Xs and Ys. The next expression creates an AGLSS object based on the grammar, restricted to strings that are four to seven letters in length. The next two expressions define factors to control the grammaticality of test items (grammatical or non-grammatical) and their degree of similarity to training items (low or high similarity). The next expression combines these factors factorially to identify four combinations (low and high similarity grammatical items, and low and high similarity non-grammatical items). The next expression runs a constraint satisfaction process to identify an appropriate set of 12 training items plus a set of three test items for each combination of the two factors. Six training items are chosen first (more or less at random) before selection of test items begins. The final two expressions display the training and test items identified by the constraint satisfaction process.

 g = fsg([], 'xy', 2);
 [nil, g] = link(g, 1, 'xy.', [1 2 -1]);
 [nil, g] = link(g, 2, 'x.', [1 -1]);
 s = aglss(g, [4 7]);
 s = gram_factor(s, 'gram');
 s = sim_factor(s, 'Sim');
 s = factorial_testsets(s, {'gram', 'G', 'NG'}, ...
        {'Sim', 'LowSim', 'HighSim'});
 s = choose_items(s, 12, 3, 6);
 format_train_items(s)
 format_test_items(s)

Each step is described in more detail below.

Defining a finite state grammar

A finite state grammar can be defined by creating an empty FSG object with the necessary number of states, and adding links from one state to another.

To create an empty FSG object, call the FSG class constructor method. The call

 g = fsg([], 'letters', n_states);

creates an FSG object for a finite state grammar involving n_states states and the specified set of letters. The FSG object is empty at this stage in the sense that no state transitions are defined. When state transitions are added, each must be labelled with one of the letters in the specified set.

Transitions between states are added by calling the link method. For example, the expression

 [nil, g] = link(g, fromState, 'c', toState);

adds a transition from state fromState to state toState, labelled with the specified character. States are numbered from 1 to n_states, with state 1 being the start state. The label for the state transition must be chosen from the set of allowable letters declared when the FSG object was first created.

End states are declared by adding a placeholder link to state -1, labelled as '' or '.'.

 [nil, g] = link(g, endState, '.', -1);

It is often convenient to specify all the transitions out of a particular state with a single call to the link method. The expression

 [nil, g] = link(g, fromState, 'letters', [s1 s2 ... sN]);

adds transitions from state fromState to each of states s1 to sN, labelled with the corresponding letters.

The expressions below define a finite state grammar for a language based on the letters 'x' and 'y'. The grammar has two states. The start state (state 1) either accepts 'x' and stays in the same state, or 'y' and transitions to state 2. State 2 accepts 'x' and transitions to state 1. Both states can terminate.

 g = fsg([], 'xy', 2);
 [nil, g] = link(g, 1, 'xy.', [1 2 -1]);
 [nil, g] = link(g, 2, 'x.', [1 -1]);

A list of letter strings produced by a grammar can be obtained by calling the grammatical_strings method. The expression

 items = grammatical_strings(g, [minlen maxlen]);

returns all the strings of length minlen to maxlen produced by grammar g.
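
To make the idea of enumeration concrete, the sketch below lists by brute force the strings of length 3 to 4 accepted by the two-state 'xy' grammar defined above, without using StimSelect. The transition table next, the vector canEnd and the other variable names are illustrative assumptions, not part of the FSG class, and grammatical_strings may work quite differently internally.

```matlab
% Assumed encoding of the two-state example grammar:
% rows are states, columns are the letters 'x' and 'y'; 0 means "no transition".
letters = 'xy';
next = [1 2;     % state 1: 'x' -> state 1, 'y' -> state 2
        1 0];    % state 2: 'x' -> state 1, 'y' not allowed
canEnd = [true true];    % both states may terminate

nL = numel(letters);
found = {};
for len = 3:4
    for k = 0:nL^len - 1
        % Read k as a base-nL number to obtain one candidate string
        idx = mod(floor(k ./ nL.^(len-1:-1:0)), nL) + 1;
        state = 1;                       % state 1 is the start state
        for j = 1:len
            state = next(state, idx(j));
            if state == 0, break; end    % no such transition: reject
        end
        if state > 0 && canEnd(state)
            found{end+1} = letters(idx); %#ok<AGROW>
        end
    end
end

disp(numel(found))   % 13 strings: 5 of length 3 and 8 of length 4
```

Brute force is only practical for small alphabets and short strings, since the number of candidates grows exponentially with string length.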

The grammaticality of strings can be tested by calling the isgrammatical method, with a character array or a cell array of strings:

 isG = isgrammatical(g, items);

A list of grammatical and non-grammatical items can be generated by calling the generate_sample_strings method. The call

 [items, isG] = generate_sample_strings(g, ...
                     [minLen maxLen], [maxG maxNG]);

returns a cell array of strings, items, each of length minLen to maxLen, including up to maxG grammatical items and up to maxNG non-grammatical items. The logical vector isG indicates which of the corresponding items are grammatical.

Instead of specifying maxG and maxNG, a single overall total number of items can be specified.

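generate_sample_strings reports the symbols used by the grammar, the number of possible strings in the requested length range, how many of those strings are grammatical and ungrammatical, and how many of each are used.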

The example below uses the grammar defined in pothos_bailey_grammar.m:

[items, isG] = generate_sample_strings(pothos_bailey_grammar(), [4 8], 10000);
 Potential items:
	Grammar involves 4 symbols (JTVX)
	87296 possible strings of length 4-8
		108 grammatical strings ( 0.12%)
		87188 ungrammatical strings (99.88%)
	Using all 108 grammatical strings
	Using sample of 9892 ungrammatical strings
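
The counts in this report can be checked by hand. The sketch below reproduces the arithmetic in plain MATLAB. The variable names are illustrative, and the assumption that the ungrammatical sample fills whatever remains of the 10000-item limit once all grammatical strings are kept is inferred from the output above, not from the StimSelect source.

```matlab
nSymbols = 4;        % J, T, V, X
lens     = 4:8;      % string lengths considered
maxItems = 10000;    % overall limit passed to generate_sample_strings
nG       = 108;      % grammatical count taken from the report above

nPossible = sum(nSymbols .^ lens);        % all strings of length 4 to 8
nNG       = nPossible - nG;               % everything else is ungrammatical
nNGsample = min(nNG, maxItems - nG);      % assumed: fill the remaining quota

fprintf('%d possible strings\n', nPossible);
fprintf('%d ungrammatical strings (%.2f%%)\n', nNG, 100 * nNG / nPossible);
fprintf('%d ungrammatical strings sampled\n', nNGsample);
```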
 

Creating a StimSelect AGLSS object

Stimulus selection with StimSelect is controlled via an AGLSS object (for AGL StimSelect). The class constructor call

 s = aglss(grammar, [minLen maxLen], maxItems);

creates an AGLSS object that uses the specified grammar to distinguish grammatical from non-grammatical letter strings. The length range and maxItems are passed to generate_sample_strings as described above. If maxItems is not specified, StimSelect will consider at most 10,000 strings.

Alternatively, potential G and NG items can be passed to aglss explicitly instead of a grammar. This gives the user control over the NG items, rather than allowing them to be arbitrary strings of symbols, so StimSelect can work with the full set of potential items of interest instead of a random sample from a large set of unconstrained NG strings.

The example below uses a toy grammar (xx_grammar.m) to generate potential items, and uses a more conservative grammar (xy_grammar.m) to determine which ones are to be treated as grammatical and which ones non-grammatical for purposes of stimulus selection.

items = grammatical_strings(xx_grammar(), [3 4]);
isG = isgrammatical(xy_grammar(), items);
s = aglss(items, isG);

disp(' ')
disp('Grammatical items')
disp(items(isG,:))
disp(' ')
disp('Non-grammatical items')
disp(items(~isG,:))
 
Grammatical items
xyx 
xyxy
yxy 
yxyx
 
Non-grammatical items
xxx 
xxxx
xxxy
xxy 
xxyx
xyxx
yxx 
yxxx
yxxy

Defining factors to control test item characteristics

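StimSelect includes methods to define several types of factors, including grammaticality (gram_factor), similarity to training items (sim_factor), chunk strength (chstr_factor), chunk novelty (chnov_factor) and familiarity (fam_factor).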

These methods are illustrated in detail by files in the Sample_Scripts directory (or see the formatted examples in Sample_Scripts/HTML).

Expert users can create their own factors, but this requires a thorough understanding of how StimSelect works.

Specifying combinations of factors

The method factorial_testsets specifies how levels of various factors combine to create different sets of test items.

To combine any number of factors factorially (all possible combinations), each factor is named as the first item in a cell array, along with the names of the levels of that factor that are to be used. The example below defines two levels of similarity and three levels of chunk strength and combines them factorially.

 s = aglss(pothos_bailey_grammar(), [4 8]);
 s = sim_factor(s, 'Sim');
 s = chstr_factor(s, 'ChStr', [1 2 3]);
 s = factorial_testsets(s, {'Sim', 'LowSim', 'HighSim'}, ...
         {'ChStr', 'ChStr1', 'ChStr2', 'ChStr3'});
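
As a plain MATLAB illustration of what "factorial" means here, the sketch below enumerates every pairing of the two similarity levels with the three chunk strength levels, giving six test sets. It does not call StimSelect; the variable names are illustrative, and the order in which factorial_testsets numbers its test sets is not documented here and may differ.

```matlab
simLevels   = {'LowSim', 'HighSim'};
chstrLevels = {'ChStr1', 'ChStr2', 'ChStr3'};

% ndgrid yields every pairing of an index into each list of levels
[i, j] = ndgrid(1:numel(simLevels), 1:numel(chstrLevels));
combos = [reshape(simLevels(i), [], 1), reshape(chstrLevels(j), [], 1)];

% One row per test set: 2 x 3 = 6 combinations
for k = 1:size(combos, 1)
    fprintf('%-8s %s\n', combos{k, :});
end
```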

As an alternative to complete factorial combinations, customized combinations of factor levels can be specified by putting parallel lists of corresponding factor levels inside an enclosing cell array. The example below combines low chunk strength with both low and high values of chunk novelty, while combining high chunk strength only with low chunk novelty. The dubious combination of high chunk strength and high chunk novelty is left out.

 s = aglss(pothos_bailey_grammar(), [4 8]);
 s = chstr_factor(s, 'ChStr');
 s = chnov_factor(s, 'ChNov');
 s = factorial_testsets(s, ...
     {{'ChStr', 'LowChStr', 'LowChStr', 'HighChStr'}, ...
      {'ChNov', 'LowChNov', 'HighChNov', 'LowChNov'}} );

The two methods of specifying combinations can be used together. The example below factorially combines grammaticality with the three combinations of chunk strength and chunk novelty described above.

 s = aglss(pothos_bailey_grammar(), [4 8]);
 s = gram_factor(s, 'gram');
 s = chstr_factor(s, 'ChStr');
 s = chnov_factor(s, 'ChNov');
 s = factorial_testsets(s, {'gram', 'G', 'NG'}, ...
     {{'ChStr', 'LowChStr', 'LowChStr', 'HighChStr'}, ...
      {'ChNov', 'LowChNov', 'HighChNov', 'LowChNov'}} );
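
The sketch below spells out, in plain MATLAB, the six test sets that this mixed specification implies: each grammaticality level is crossed with the three paired chunk strength and chunk novelty combinations. It does not call StimSelect; the variable names are illustrative, and the ordering of the sets is an assumption.

```matlab
gramLevels = {'G', 'NG'};
% Parallel lists: position k in one list pairs with position k in the other
chstrLevels = {'LowChStr', 'LowChStr',  'HighChStr'};
chnovLevels = {'LowChNov', 'HighChNov', 'LowChNov'};

sets = cell(0, 3);
for a = 1:numel(gramLevels)          % crossed factorially
    for k = 1:numel(chstrLevels)     % paired by position, not crossed
        sets(end+1, :) = {gramLevels{a}, chstrLevels{k}, chnovLevels{k}}; %#ok<AGROW>
    end
end

for k = 1:size(sets, 1)
    fprintf('%-3s %-10s %s\n', sets{k, :});
end
```

The combination of HighChStr with HighChNov never appears, because the parallel lists do not pair those two levels.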

Identifying training and test items

The method choose_items initializes and runs a constraint satisfaction process to identify appropriate training and test items based on the specifications of an AGLSS object. The expression

 s = choose_items(s, nTrain, nTest, headStart, allowFamiliar);

uses the specification in AGLSS object s to identify nTrain training items plus nTest test items for each combination of factor levels.

The parameter headStart determines how many training items are chosen before selection of test items begins. After this "head start", stimulus selection proceeds by identifying a training item then one test item for each combination of factor levels, then another training item, and so on.

The optional parameter allowFamiliar specifies whether training items are eligible to also be test items. The default value is false, ensuring that all test items are novel. The value true can be used in conjunction with the familiarity factor fam_factor.

The method choose_more_items adds further training and/or test items, as desired. This is especially useful if it is necessary to choose at least two new training items for every test item (e.g. if there are at least two test sets that require familiar items). choose_more_items takes the same arguments as choose_items.

Displaying the selected training and test items

The methods format_train_items and format_test_items return formatted strings describing the training and test items stored within an AGLSS object.

The method format_test_items allows an optional second argument, which can be either 'detail' or 'summary'. If this argument is omitted or specified as 'detail', the formatted string returned by format_test_items lists individual test items along with their actual and target values for each factor. If 'summary' is specified, the formatted string just lists the average factor values for each set of test items (that is, each combination of factor levels), along with the target values for comparison.

These methods are illustrated below.

s_xxx = aglss(xxx_grammar, [3 10]);
s = sim_factor(s_xxx, 'Sim');
s = factorial_testsets(s, {'Sim', 'LowSim', 'HighSim'});
s = choose_items(s, 6, 2);

disp('Training items:');
disp(format_train_items(s));

disp('Test items:');
disp(format_test_items(s));

disp('Test item summary:');
disp(format_test_items(s, 'summary'));
 Potential items:
	Grammar involves 2 symbols (xy)
	2040 possible strings of length 3-10
		180 grammatical strings ( 8.82%)
		1860 ungrammatical strings (91.18%)
	Using all 180 grammatical strings
	Using all 1860 ungrammatical strings
 
 Choosing training item 1....
 Choosing training item 2....
 Choosing training item 3....
 Choosing training item 4....
 Updating potential items....
 Choosing test item 1 for each set... 2 1.
 Choosing training item 5....
 Updating potential items....
 Choosing test item 2 for each set... 1 2.
 Choosing training item 6....

Training items:
  Itm_num    Itm_name
       01    xxyxyxxy
       02   xxxxyxyyx
       03  yxyyxyxxxx
       04      xyxyyx
       05      xyxyxy
       06  xyxyyxyxxx

Test items:
 Tset_num       Sim_cat  Itm_num    Itm_name        Sim    Sim_tgt
       01        LowSim       01  xyyyyyyxxy      0.585      0.588
       01        LowSim       02   yxyxxyyyy      0.584      0.588
       02       HighSim       01     xxyxyxx      0.757      0.760
       02       HighSim       02   xxyxyxyxx      0.762      0.760

Test item summary:
 Tset_num       Sim_cat        Sim    Sim_tgt
       01        LowSim      0.585      0.588
       02       HighSim      0.759      0.760
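
The relationship between the detailed and summary listings can be checked with a few lines of plain MATLAB. The sketch below averages the Sim values copied from the detailed listing above. Because those values are already rounded to three decimals, the means agree with the summary only to within about 0.001. The variable names are illustrative.

```matlab
% Sim values copied from the detailed test item listing above
simLow  = [0.585 0.584];    % test set 01 (LowSim)
simHigh = [0.757 0.762];    % test set 02 (HighSim)
simTgt  = [0.588 0.760];    % target values for the two sets

avgSim = [mean(simLow), mean(simHigh)];
fprintf('LowSim   mean %.4f   target %.3f\n', avgSim(1), simTgt(1));
fprintf('HighSim  mean %.4f   target %.3f\n', avgSim(2), simTgt(2));
```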