Using AGL StimSelect
There are 6 basic steps in using StimSelect to identify training and test items for an AGL study:
- Define an FSG object for the finite state grammar that generates the grammatical letter strings of interest.
- Create a StimSelect AGLSS object based on the finite state grammar.
- Define factors to control characteristics of the test items like grammaticality, chunk strength, or similarity to training items.
- Specify how levels of the various factors combine to create different sets of test items.
- Identify training and test items based on the specified combinations of factor levels.
- Display the selected training and test items.
The six steps are illustrated by the six groups of expressions below. The first three expressions define a finite state grammar that generates a language of letter strings composed of Xs and Ys. The next expression creates an AGLSS object based on the grammar, restricted to strings that are four to seven letters in length. The next two expressions define factors to control the grammaticality of test items (grammatical or non-grammatical) and their degree of similarity to training items (low or high similarity). The next expression combines these factors factorially to identify four combinations (low and high similarity grammatical items, and low and high similarity non-grammatical items). The next expression runs a constraint satisfaction process to identify an appropriate set of 12 training items plus a set of three test items for each combination of the two factors. Six training items are chosen first (more or less at random), before selection of test items begins. The final two expressions display the training and test items identified by the constraint satisfaction process.
g = fsg([], 'xy', 2);
[nil, g] = link(g, 1, 'xy.', [1 2 -1]);
[nil, g] = link(g, 2, 'x.', [1 -1]);

s = aglss(g, [4 7]);

s = gram_factor(s, 'gram');
s = sim_factor(s, 'Sim');

s = factorial_testsets(s, {'gram', 'G', 'NG'}, ...
    {'Sim', 'LowSim', 'HighSim'});

s = choose_items(s, 12, 3, 6);

format_train_items(s)
format_test_items(s)
Each step is described in more detail below.
Defining a finite state grammar
A finite state grammar can be defined by creating an empty FSG object with the necessary number of states, and adding links from one state to another.
To create an empty FSG object, call the FSG class constructor method. The call
g = fsg([], 'letters', n_states);
creates an FSG object for a finite state grammar involving n_states states and the specified set of letters. The FSG object is empty at this stage in the sense that no state transitions are defined. When state transitions are added, each must be labelled with one of the letters in the specified set.
Transitions between states are added by calling the link method. For example, the expression
[nil, g] = link(g, fromState, 'c', toState);
adds a transition from state fromState to state toState, labelled with the specified character. States are numbered from 1 to n_states, with state 1 being the start state. The label for the state transition must be chosen from the set of allowable letters declared when the FSG object was first created.
End states are declared by adding a placeholder link to state -1, labelled as '' or '.'.
[nil, g] = link(g, endState, '.', -1);
It is often convenient to specify all the transitions out of a particular state with a single call to the link method. The expression
[nil, g] = link(g, fromState, 'letters', [s1 s2 ... sN]);
adds transitions from state fromState to each of states s1 to sN, labelled with the corresponding letters.
The expressions below define a finite state grammar for a language based on the letters 'x' and 'y'. The grammar has two states. The start state (state 1) either accepts 'x' and stays in the same state, or 'y' and transitions to state 2. State 2 accepts 'x' and transitions to state 1. Both states can terminate.
g = fsg([], 'xy', 2);
[nil, g] = link(g, 1, 'xy.', [1 2 -1]);
[nil, g] = link(g, 2, 'x.', [1 -1]);
A list of letter strings produced by a grammar can be obtained by calling the grammatical_strings method. The expression
items = grammatical_strings(g, [minlen maxlen]);
returns all the strings of length minlen to maxlen produced by grammar g.
The grammaticality of strings can be tested by calling the isgrammatical method, with a character array or a cell array of strings:
isG = isgrammatical(g, items);
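To make the behaviour of such a grammar concrete, here is a minimal Python sketch (not StimSelect code) that models the two-state x/y grammar described above and mimics what enumerating and testing strings involves. The dictionary layout and the function names `is_grammatical` and `grammatical_strings` are illustrative assumptions, and the sketch assumes a deterministic grammar (at most one transition per state and letter).

```python
from itertools import product

# Transition table for the two-state x/y grammar described in the text.
# Assumption: this layout is illustrative, not StimSelect's internal format.
# TRANSITIONS[state][letter] gives the next state.
TRANSITIONS = {
    1: {"x": 1, "y": 2},
    2: {"x": 1},
}
END_STATES = {1, 2}  # both states can terminate
LETTERS = "xy"


def is_grammatical(string):
    """Return True if the grammar can generate `string` starting from state 1."""
    state = 1
    for letter in string:
        if letter not in TRANSITIONS[state]:
            return False
        state = TRANSITIONS[state][letter]
    return state in END_STATES


def grammatical_strings(min_len, max_len):
    """List every grammatical string of length min_len..max_len, sorted."""
    found = []
    for length in range(min_len, max_len + 1):
        for letters in product(LETTERS, repeat=length):
            candidate = "".join(letters)
            if is_grammatical(candidate):
                found.append(candidate)
    return sorted(found)


items = grammatical_strings(3, 4)
print(len(items))             # 13
print(is_grammatical("xyx"))  # True
print(is_grammatical("xyy"))  # False: state 2 has no 'y' transition
```

Brute-force enumeration is adequate for short strings; for long strings a walk over the transition table would avoid generating every candidate.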
A list of grammatical and non-grammatical items can be generated by calling the generate_sample_strings method. The call
[items, isG] = generate_sample_strings(g, ...
    [minLen maxLen], [maxG maxNG]);
returns a cell array of strings, items, of length minLen to maxLen, including up to maxG grammatical items and up to maxNG non-grammatical items. The logical vector isG indicates which of the corresponding items are grammatical.
Instead of specifying maxG and maxNG, a single overall total number of items (maxItems) can be specified as the third argument.
generate_sample_strings reports the following information:
- the letters used by the grammar,
- the number of ways these letters can be combined within the specified length limits,
- the number of those strings that are grammatical (in absolute terms and as a percentage),
- the number that are non-grammatical,
- the number of grammatical strings to be considered for stimulus selection (a random subset is chosen if necessary to restrict the total number of strings considered to maxItems), and
- the number of non-grammatical strings to be considered.
The example below uses the grammar defined in pothos_bailey_grammar.m:
[items, isG] = generate_sample_strings(pothos_bailey_grammar(), [4 8], 10000);
Potential items:
Grammar involves 4 symbols (JTVX)
87296 possible strings of length 4-8
108 grammatical strings ( 0.12%)
87188 ungrammatical strings (99.88%)
Using all 108 grammatical strings
Using sample of 9892 ungrammatical strings
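The figures in this report can be checked by hand. The Python sketch below (not StimSelect code) reproduces the arithmetic. The function name `count_possible_strings` is hypothetical, the count of 108 grammatical strings is taken from the report rather than computed, and the rule that all grammatical strings are kept while non-grammatical strings fill the remainder of maxItems is an assumption inferred from the report.

```python
def count_possible_strings(n_letters, min_len, max_len):
    """Number of distinct letter strings of length min_len..max_len."""
    return sum(n_letters ** length for length in range(min_len, max_len + 1))


n_possible = count_possible_strings(4, 4, 8)  # 4 letters (JTVX), lengths 4-8
n_gram = 108                                  # taken from the report above
n_nongram = n_possible - n_gram
max_items = 10000

# Assumption: grammatical strings are kept first, and non-grammatical
# strings are sampled to fill whatever remains of max_items.
n_gram_used = min(n_gram, max_items)
n_nongram_used = min(n_nongram, max_items - n_gram_used)

print(n_possible)                               # 87296
print(f"{100 * n_gram / n_possible:.2f}%")      # 0.12%
print(f"{100 * n_nongram / n_possible:.2f}%")   # 99.88%
print(n_nongram_used)                           # 9892
```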
Creating a StimSelect AGLSS object
Stimulus selection with StimSelect is controlled via an AGLSS object (for AGL StimSelect). The class constructor call
s = aglss(grammar, [minLen maxLen], maxItems);
creates an AGLSS object that uses the specified grammar to distinguish grammatical from non-grammatical letter strings. The other parameters are passed to generate_sample_strings as described above. If maxItems is not specified, StimSelect will consider at most 10,000 strings.
Alternatively, potential G and NG items can be passed explicitly to aglss instead of a grammar. This gives the user control over the NG items, rather than allowing them to be arbitrary strings of symbols. StimSelect can then work with the full set of potential items of interest, instead of a random sample from a large set of unconstrained NG symbol strings.
The example below uses a toy grammar (xx_grammar.m) to generate potential items, and uses a more conservative grammar (xy_grammar.m) to determine which ones are to be treated as grammatical and which ones non-grammatical for purposes of stimulus selection.
items = grammatical_strings(xx_grammar(), [3 4]);
isG = isgrammatical(xy_grammar(), items);
s = aglss(items, isG);
disp(' ')
disp('Grammatical items')
disp(items(isG,:))
disp(' ')
disp('Non-grammatical items')
disp(items(~isG,:))

Grammatical items
xyx
xyxy
yxy
yxyx

Non-grammatical items
xxx
xxxx
xxxy
xxy
xxyx
xyxx
yxx
yxxx
yxxy
Defining factors to control test item characteristics
StimSelect includes methods to define the following types of factors:
- gram_factor defines a grammaticality factor to distinguish between grammatical and non-grammatical items. See gram_factor_sample.html for more details.
- chstr_factor defines a chunk strength factor that measures the frequency with which test item bigrams, trigrams, etc., appear in training items. See chstr_factor_sample.html for more details.
- chnov_factor defines a chunk novelty factor that measures how many of a test item's bigrams, etc., were not present in any training items. See chnov_factor_sample.html for more details.
- rulestr_factor defines a rule strength factor that measures the fraction of a test item's bigrams, etc., that were present in training items. See rulestr_factor_sample.html for more details.
- sim_factor defines a similarity factor that measures the whole-item similarity between test and training items. See sim_factor_sample.html for more details.
- fam_factor defines a familiarity factor that distinguishes between familiar (Old) and novel (New) test items. See fam_factor_sample.html for more details.
These methods are illustrated in detail by files in the Sample_Scripts directory (or see the formatted examples in Sample_Scripts/HTML).
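To give a feel for the chunk-based measures listed above, here is a minimal Python sketch (not StimSelect code). It follows the one-line descriptions in the list, but the exact formulas are assumptions: chunk strength is taken as the mean training frequency of a test item's chunks, only bigrams and trigrams are counted, and all function names are hypothetical. StimSelect's own definitions may differ; see the sample scripts for the authoritative behaviour.

```python
from collections import Counter


def chunks(item, sizes=(2, 3)):
    """All contiguous substrings of the given sizes (here bigrams and trigrams)."""
    return [item[i:i + n] for n in sizes for i in range(len(item) - n + 1)]


def chunk_counts(training_items, sizes=(2, 3)):
    """How often each chunk occurs across all training items."""
    counts = Counter()
    for item in training_items:
        counts.update(chunks(item, sizes))
    return counts


def chunk_strength(test_item, training_items, sizes=(2, 3)):
    """Assumed definition: mean training frequency of the test item's chunks.
    Requires a test item with at least one chunk (length >= 2)."""
    counts = chunk_counts(training_items, sizes)
    test_chunks = chunks(test_item, sizes)
    return sum(counts[c] for c in test_chunks) / len(test_chunks)


def chunk_novelty(test_item, training_items, sizes=(2, 3)):
    """Number of the test item's chunks that occur in no training item."""
    counts = chunk_counts(training_items, sizes)
    return sum(1 for c in chunks(test_item, sizes) if counts[c] == 0)


def rule_strength(test_item, training_items, sizes=(2, 3)):
    """Fraction of the test item's chunks that occur in some training item."""
    counts = chunk_counts(training_items, sizes)
    test_chunks = chunks(test_item, sizes)
    return sum(1 for c in test_chunks if counts[c] > 0) / len(test_chunks)


training = ["xyx", "xxy"]
test = "xyy"  # chunks: xy, yy, xyy; only xy appears in training (twice)
print(chunk_novelty(test, training))  # 2
print(round(chunk_strength(test, training), 3))  # 0.667
print(round(rule_strength(test, training), 3))   # 0.333
```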
Expert users can create their own factors, but this requires a thorough understanding of how StimSelect works.
Specifying combinations of factors
The method factorial_testsets specifies how levels of various factors combine to create different sets of test items.
To combine any number of factors factorially (all possible combinations), each factor is named as the first item in a cell array, along with the names of the levels of that factor that are to be used. The example below defines two levels of similarity and three levels of chunk strength and combines them factorially.
s = aglss(pothos_bailey_grammar(), [4 8]);
s = sim_factor(s, 'Sim');
s = chstr_factor(s, 'ChStr', [1 2 3]);
s = factorial_testsets(s, {'Sim', 'LowSim', 'HighSim'}, ...
    {'ChStr', 'ChStr1', 'ChStr2', 'ChStr3'});
As an alternative to complete factorial combinations, customized combinations of factor levels can be specified by putting parallel lists of corresponding factor levels inside an enclosing cell array. The example below combines low chunk strength with both low and high values of chunk novelty, while combining high chunk strength only with low chunk novelty. The dubious combination of high chunk strength and high chunk novelty is left out.
s = aglss(pothos_bailey_grammar(), [4 8]);
s = chstr_factor(s, 'ChStr');
s = chnov_factor(s, 'ChNov');
s = factorial_testsets(s, ...
    {{'ChStr', 'LowChStr', 'LowChStr', 'HighChStr'}, ...
     {'ChNov', 'LowChNov', 'HighChNov', 'LowChNov'}} );
The two methods of specifying combinations can be used together. The example below factorially combines grammaticality with the three combinations of chunk strength and chunk novelty described above.
s = aglss(pothos_bailey_grammar(), [4 8]);
s = gram_factor(s, 'gram');
s = chstr_factor(s, 'ChStr');
s = chnov_factor(s, 'ChNov');
s = factorial_testsets(s, {'gram', 'G', 'NG'}, ...
    {{'ChStr', 'LowChStr', 'LowChStr', 'HighChStr'}, ...
     {'ChNov', 'LowChNov', 'HighChNov', 'LowChNov'}} );
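The way the two styles of specification combine can be sketched in Python (not StimSelect code). The function `combine_factor_levels` is a hypothetical stand-in that only enumerates the resulting test sets. It assumes a single factor is a flat list whose first element is the factor name, and that parallel factors are given as a list of such lists, mirroring the cell arrays in the examples above.

```python
from itertools import product


def combine_factor_levels(*groups):
    """Enumerate test sets from factor groups.

    Each group is either a single factor ['name', 'level1', 'level2', ...]
    or a list of parallel factors [['nameA', 'a1', ...], ['nameB', 'b1', ...]]
    whose levels are paired position by position. Groups are crossed
    factorially with one another.
    """
    expanded = []
    for group in groups:
        specs = group if isinstance(group[0], list) else [group]
        names = [spec[0] for spec in specs]
        level_lists = [spec[1:] for spec in specs]
        if len({len(levels) for levels in level_lists}) != 1:
            raise ValueError("parallel factors need equally long level lists")
        # zip pairs up corresponding levels of the parallel factors
        expanded.append([dict(zip(names, combo)) for combo in zip(*level_lists)])

    testsets = []
    for parts in product(*expanded):
        merged = {}
        for part in parts:
            merged.update(part)
        testsets.append(merged)
    return testsets


testsets = combine_factor_levels(
    ["gram", "G", "NG"],
    [["ChStr", "LowChStr", "LowChStr", "HighChStr"],
     ["ChNov", "LowChNov", "HighChNov", "LowChNov"]],
)
print(len(testsets))  # 6 (2 grammaticality levels x 3 paired combinations)
for testset in testsets:
    print(testset)
```

The high chunk strength, high chunk novelty pairing never appears in the result because it is absent from the parallel lists.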
Identifying training and test items
The method choose_items initializes and runs a constraint satisfaction process to identify appropriate training and test items based on the specifications of an AGLSS object. The expression
s = choose_items(s, nTrain, nTest, headStart, allowFamiliar);
uses the specification in AGLSS object s to identify nTrain training items plus nTest test items for each combination of factor levels.
The parameter headStart determines how many training items are chosen before selection of test items begins. After this "head start", stimulus selection proceeds by identifying a training item then one test item for each combination of factor levels, then another training item, and so on.
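The ordering described above can be sketched as a simple schedule in Python (not StimSelect code). The function `selection_order` is hypothetical and only lists the order of selection steps; it performs no constraint satisfaction. It assumes that any training items still outstanding after the last round of test items are chosen at the end, which the text does not state.

```python
def selection_order(n_train, n_test, head_start):
    """Return the order of selection steps as (kind, number) pairs.

    A ('test', k) step stands for choosing test item k for every
    combination of factor levels.
    """
    head_start = min(head_start, n_train)
    steps = [("train", i) for i in range(1, head_start + 1)]
    next_train = head_start + 1
    for test_num in range(1, n_test + 1):
        # One more training item, then one test item per test set.
        if next_train <= n_train:
            steps.append(("train", next_train))
            next_train += 1
        steps.append(("test", test_num))
    # Assumption: remaining training items are chosen last.
    steps.extend(("train", i) for i in range(next_train, n_train + 1))
    return steps


order = selection_order(n_train=6, n_test=2, head_start=3)
print(order)
# [('train', 1), ('train', 2), ('train', 3), ('train', 4), ('test', 1),
#  ('train', 5), ('test', 2), ('train', 6)]
```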
The optional parameter allowFamiliar specifies whether training items are eligible to also be test items. The default value is false, ensuring that all test items are novel. The value true can be used in conjunction with the familiarity factor fam_factor.
The method choose_more_items adds additional training and/or test items, as desired. This is especially useful if it is necessary to choose at least two new training items for every test item (e.g. if there are at least two test sets that require familiar items). choose_more_items takes the same arguments as choose_items.
Displaying the selected training and test items
The methods format_train_items and format_test_items return formatted strings describing the training and test items stored within an AGLSS object.
The method format_test_items allows an optional second argument, which can be either 'detail' or 'summary'. If this argument is omitted or specified as 'detail', the formatted string returned by format_test_items lists individual test items along with their actual and target values for each factor. If 'summary' is specified, the formatted string just lists the average factor values for each set of test items (that is, each combination of factor levels), along with the target values for comparison.
These methods are illustrated below.
s_xxx = aglss(xxx_grammar, [3 10]);
s = sim_factor(s_xxx, 'Sim');
s = factorial_testsets(s, {'Sim', 'LowSim', 'HighSim'});
s = choose_items(s, 6, 2);
disp('Training items:');
disp(format_train_items(s));
disp('Test items:');
disp(format_test_items(s));
disp('Test item summary:');
disp(format_test_items(s, 'summary'));

Potential items:
Grammar involves 2 symbols (xy)
2040 possible strings of length 3-10
180 grammatical strings ( 8.82%)
1860 ungrammatical strings (91.18%)
Using all 180 grammatical strings
Using all 1860 ungrammatical strings
Choosing training item 1....
Choosing training item 2....
Choosing training item 3....
Choosing training item 4....
Updating potential items....
Choosing test item 1 for each set... 2 1.
Choosing training item 5....
Updating potential items....
Choosing test item 2 for each set... 1 2.
Choosing training item 6....

Training items:
Itm_num  Itm_name
01       xxyxyxxy
02       xxxxyxyyx
03       yxyyxyxxxx
04       xyxyyx
05       xyxyxy
06       xyxyyxyxxx

Test items:
Tset_num  Sim_cat  Itm_num  Itm_name    Sim    Sim_tgt
01        LowSim   01       xyyyyyyxxy  0.585  0.588
01        LowSim   02       yxyxxyyyy   0.584  0.588
02        HighSim  01       xxyxyxx     0.757  0.760
02        HighSim  02       xxyxyxyxx   0.762  0.760

Test item summary:
Tset_num  Sim_cat  Sim    Sim_tgt
01        LowSim   0.585  0.588
02        HighSim  0.759  0.760