Blair - Interlanguage Parsing

PARSING AS ONE COMPONENT
OF THE L2 ENGLISH INTERLANGUAGE SYSTEM

R. Jeffrey Blair
jeffreyb@dpc.aichi-gakuin.ac.jp
Aichi Gakuin Junior College, Nagoya, Japan

This paper attempts to affirm the importance of sentential constituents and the parsing process itself to English language competence. It includes a brief review of various concepts of parsing and explores some of the most prominent theories of psycholinguistic parsing to arrive at a general framework that generates some interesting hypotheses to be investigated. It further proposes the investigation not only of the parsing system of native English speakers, but also of the role of parsing in learners' Interlanguage systems. Finally, two specific, concrete research tasks are presented.

Like Abney (1991), I have had the intuition that I and other English speakers speak in chunks and, perhaps, even think in chunks of language. According to Abney, these chunks "correspond ... to prosodic patterns" in such a way that "the strongest stresses in the sentence fall one to a chunk, and pauses are most likely to fall between chunks." Typically they consist of "a single content word surrounded by a constellation of function words" (Abney, 1991, 257). He also points out that the ordering of chunks within a sentence is much more flexible than the word order within chunks.

The flow of language is generated and must be analyzed at multiple levels. Carroll (1994) gives four--(a) phonological, (b) lexical, (c) syntactic, and (d) discoursal--but the list could obviously be expanded. For purposes of comprehension, the flow has to be segmented at each of these various levels, the segments interpreted, and the relationship between segments identified for interpretation at the next level. We will refer to this process as parsing. At one end of the spectrum, clusters of phonemes must be isolated and identified as syllables and then words (a-b). At the other end, groups of sentences form paragraphs and larger discourse units (c-d). One possible psychological motivation for parsing is a limit on the cognitive capacity of Short Term Memory, which seems to have a limit of seven units of information plus or minus two (Miller, 1956; Carroll, 1994, 51-57), though automatic processing--of common words, for example--may reduce some of the cognitive load involved in verbal communication.

The chunks of language to which Abney is referring, also known as [sentential] constituents, can be viewed as an intermediate unit of language somewhere between the word and sentence. They are occasionally marked in written discourse by commas, colons, or semi-colons; but such cases are the exception. Unlike words, which are consistently set off by blank spaces, or sentences, clearly marked by capital letters and periods, even in written communication these chunks have to be sorted out by the Human Parsing Mechanism from what is often ambiguous data. Clark and Clark (1977, 51) identify constituents, rather than words or sentences, as "the natural unit of perfectly fluent or 'ideal speech'."

Though English language students are not perfectly fluent, and their speech and writing would not be described as ideal, they, like native English speakers and writers, process and produce words and sentences in a systematic fashion (Corder, 1967; Selinker, 1972). If chunks are "the natural unit" for the fluent discourse of native English speakers, would they not also be the natural unit for English Interlanguage? As such, it would seem that the parsing of sentences should be an important area of study in the field of Interlanguage. The first questions to ask are: (a) do English learners parse their sentences? and (b) where do they parse them? If it turns out that they parse their sentences differently--thus producing different chunks from native speakers--the next step would be to investigate how that difference affects their comprehension and production of English.

An extensive literature on parsing extends across a number of different fields (Karttunen and Zwicky, 1985). First chronologically is the prescriptive parsing of traditional grammar, with which we need not be concerned. In linguistics, we are interested in a description of grammars and how people process a language. Parsing has a dual role in language: an analytical role in the comprehension processes and a synthetic role in language production. It cannot be assumed that these two processes are the same process in reverse (Kay, 1985). Rather, that is something that must be examined empirically. In the mathematical field of formal language theory, a number of different parsing algorithms are currently being explored: top-down and bottom-up, deterministic and nondeterministic, parallel and sequential with backtracking. In their work, computer scientists are free to stipulate and shape the grammar and parsing principles of a language. Linguists, on the other hand, must discover them in their data. In artificial intelligence studies involving natural languages, computer engineers no longer exercise much control over the grammar of the input, but still have free rein in deciding how to process that input. In psycholinguistics, however, both the linguistic data and the psycholinguistic processes generating and analyzing it are viewed as social and biological forces of nature which are beyond any single person's control.

The fundamental question [at the sentence level of language comprehension] is this: when we encounter a string of words, how do we make sense of it? There seems to be agreement that we identify and isolate constituent chunks from the string, and that we utilize two different sets of strategies to do this. One approach uses function words to syntactically identify phrase or clause boundaries (see Clark and Clark, 1977, 57-72); the other starts with the content words (Bever, 1970, cited in Clark and Clark, 1977) and then searches memory for words to attach to them. Thus sentence parsing can be seen in two different ways. The syntactic approach is a top-down process of dividing a sentence into meaningful chunks, whereas the semantic approach posits a bottom-up process of grouping words into meaningful clusters.

The syntactic approach, according to Kimball's Seven Principles (1973), is a two-step process that involves (a) Short Term Memory and (b) a processing unit. Short Term Memory provides the initial processing. As words enter Short Term Memory a grammatical frame is constructed from the top down in anticipation of an entire sentence and words are connected. Though English is a look-ahead language, it only allows a delay in attachment of one or two words. When a verb appears its argument structure is then integrated into the framework. Function words--including articles, prepositions, modals, and the "to" before infinitives--signal the beginning of a new node, to which constituents are added as long as they can be assimilated into the phrase. When a word cannot be assimilated, the phrase may be closed and moved from Short Term Memory to the processing unit. If, on the other hand, it is still not complete, then a second node may be processed while the first node is placed on hold. Limits on the capacity of the Short Term Memory, however, prevent simultaneous processing of a third node. Incoming words are preferentially attached to the node being processed, then the node on hold, and only as a last resort to nodes in the processing unit, and higher up on the tree.
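The role of function words as signals of a new node can be illustrated with a very rough sketch. The Python fragment below is not Kimball's parser: it builds no tree, ignores argument structure, and has no memory limit. It only opens a new chunk whenever a function word arrives after the current chunk has already received a content word. The word list FUNCTION_WORDS and the function name segment are my own assumptions for illustration.

```python
# Hypothetical, deliberately tiny function-word list (an assumption for illustration).
FUNCTION_WORDS = {"a", "an", "the", "to", "of", "in", "on", "at", "for",
                  "can", "will", "would", "should", "might", "must"}


def segment(sentence):
    """Split a sentence into chunks, opening a new chunk at each function word."""
    chunks = []
    for word in sentence.split():
        is_function = word.lower() in FUNCTION_WORDS
        # A chunk holding only function words is still waiting for its content word.
        open_chunk_has_content = bool(chunks) and any(
            w.lower() not in FUNCTION_WORDS for w in chunks[-1])
        if not chunks or (is_function and open_chunk_has_content):
            chunks.append([word])      # start a new node
        else:
            chunks[-1].append(word)    # assimilate into the current node
    return [" ".join(chunk) for chunk in chunks]


print(segment("Jack likes to go to the beaches in Hawaii"))
```

With this word list the sentence is divided as "Jack likes", "to go", "to the beaches", "in Hawaii". A different list, or a sentence with relative clauses, would expose the limits of so simple a heuristic very quickly.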

Although other researchers seem to agree that syntactic parsing takes place in two different stages (Frazier and Fodor, 1978), there is disagreement on the processes involved. The first stage of Kimball's (1973) model processes strings of words top-down, assigning each word a place in the surface structure of the whole sentence, while the second stage assigns individual phrasal units of all sizes, as they are completed, to their locations in deep structure. Fodor, Bever, and Garrett (1974) propose a parser that first identifies clause boundaries and determines each clause's internal structure. The assignment of clauses to the sentence structure, however, is postponed until the second stage. Frazier and Fodor's (1978) first stage--the Sausage Machine--operates in a similar manner except that the constituents that it identifies are determined by size, instead of syntactic status. Rather than identifying clauses per se, it slices up a language string into constituent strings of about seven words. These smaller strings, then, are sent to a second-stage processor and organized into a meaningful hierarchy.

Psycholinguistic research on parsing, as can be seen from the previous discussion, seems to concentrate almost exclusively on the process and ignore the product--the locations of the parsing boundaries. Researchers are interested in (a) the interaction of syntactic and semantic approaches, (b) the priority of strategies in the human parsing algorithm, and (c) the resolution of ambiguities (see Pereira, 1985 for a discussion of Kimball, 1973; Frazier and Fodor, 1978; and Wanner, 1980). Typically, these researchers look at processing times to see which of a pair of sentences is more difficult to process: complement verbs or noncomplement verbs (Fodor, Garrett, and Bever, 1968), embedded sentences with or without relative pronouns (Fodor and Garrett, 1967; Hakes and Cairns, 1970; Hakes and Foss, 1970), and various other combinations (Carroll, 1986, 183-191). The underlying assumption seems to be that the locations of the boundaries of Abney's chunks will be the same for all subjects, who are assumed to be native speakers or, at least, to have native-speaker competence. As English language teachers, however, we should ask ourselves if this assumption is an empirically valid one for our students. Where do language learners break up their English sentences? Do they parse them in the same locations as a native speaker would?

A Preliminary Study

Several years ago, when teaching classes as part of an ESL program at a university in Hawaii, I administered tests to my upper level classes--a grammar class and a writing class--in order to get some pedagogical feedback on verb usage. The students were asked to fill in blanks in a reading passage with the appropriate form of the verbs. In some places the context required them to supply a verb followed by the infinitive form of a second verb. A few of the responses that came back surprised me, because the students seemed to be attaching the "to" to the end of the first (tensed) verb, not the infinitive. Here is an example:

Jack [like] _________ [go] _________ to the beaches in Hawaii.

In the first blank they had written "likes to", leaving the verb stem "go" all by itself in the second blank. This provided me with some unsolicited evidence that my students might have different concepts of what constitutes a sentential constituent than I, as a native speaker of English, have. I decided to explore this area further by introducing my students to the process of parsing sentences.

I used sentences taken from the newspaper which, because we were studying modals in grammar class, all contained that structure, and I showed my students how these fairly long sentences could be broken down into manageable chunks that answered wh-questions about the sentence. Later I tested the 18 students [five Japanese speakers (j), seven Chinese speakers (ch), two Korean speakers (k), and five speakers of other languages (o)] on it by having them parse eight sentences themselves in the classroom under test conditions [closed book, no talking]. I specified in the test how many chunks each sentence was to be divided into.

In order to compare my non-native students' parsing with what a native speaker might do, I asked four TESL graduate students to take the same test. Then I identified, as a candidate for Interlanguage variation, places where non-native speakers had inserted a break, but no native speakers had--where the "natural unit" of fluent speech had been violated. These, of course, were balanced by an equal number of instances where the native speakers had inserted breaks, but a non-native speaker did not. Combining two units does not seem to violate the unity of constituents.
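The comparison just described amounts to a set difference: the breaks a learner made minus every break that any native speaker made. A minimal Python sketch follows, assuming each parse is written out as a string with " / " marking the breaks. The function names and the sample sentence are hypothetical and are not taken from the test materials.

```python
def breaks(parsed):
    """Return the set of word counts after which a '/' break occurs."""
    positions, count = set(), 0
    for token in parsed.split():
        if token == "/":
            positions.add(count)   # break falls after the count-th word
        else:
            count += 1
    return positions


def candidate_breaks(learner_parse, native_parses):
    """Breaks made by the learner that no native speaker made."""
    native_union = set().union(*(breaks(p) for p in native_parses))
    return breaks(learner_parse) - native_union


# Hypothetical parses of an invented sentence, for illustration only.
native = ["I think / it would be legal"]
learner = "I / think / it / would be legal"
print(candidate_breaks(learner, native))
```

Here the learner's breaks after the first and third words are flagged, while the break after "think", which a native speaker also made, is not.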

A wide assortment of variations emerged. Here are some examples:

Kind of Break (with examples)               j   ch    k    o
infinitive / direct object breaks           5    5    2    1
  -- to do / the assignments
  -- to get / it
adjective/noun breaks                       4    3    0    3
  -- for the right-to-die / movement
  -- medical / procedure
pronoun/verb breaks                         3    4    3    0
  -- it / would be
  -- I / think
preposition/noun breaks                     2    3    0    2
  -- because of / their suffering
  -- under / certain circumstances
clause-internal breaks                      3    4    0    0
  -- After / one or two physicians
  -- people who / are dying
  -- because / she refused
noun/adjective clause breaks                3    2    1    0
  -- people / who are dying
be/predicate adjective breaks               0    3    1    1
  -- was / in the seventh grade
  -- will be / able
infinitive-internal breaks                  0    3    0    1
  -- to / do
  -- to / get it
modal/verb breaks                           1    0    1    1
  -- should / be legal
  -- might even / be convicted
passive-internal breaks                     0    1    0    0
  -- might even be / convicted
Totals                                     21   28    8    9

The data clearly shows non-native-like variation in parsing within the constraints of this task, and thus lends support to the hypothesis that second language learners may have their own principles for parsing sentences, which could be considered a part of their Interlanguage system. There doesn't seem to be any clear pattern of variation by language group, except that the miscellaneous group [o=others] had fewer than half as many errors per subject (9/5=1.8) as the three single-language groups [j=Japanese, 21/5=4.2; ch=Chinese, 28/7=4.0; and k=Korean, 8/2=4.0].
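The per-subject rates quoted above can be reproduced in a few lines of Python from the column totals and the group sizes reported for this study. The variable names are my own.

```python
# Column totals from the table and group sizes as reported in the text.
totals = {"j": 21, "ch": 28, "k": 8, "o": 9}
subjects = {"j": 5, "ch": 7, "k": 2, "o": 5}

# Non-native-like breaks per subject for each language group.
rates = {group: totals[group] / subjects[group] for group in totals}

for group, rate in rates.items():
    print(f"{group}: {rate:.1f}")
```

With only two to seven subjects in a group, these rates are descriptive and cannot support any statistical claim about differences between language groups.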

One possible explanation for a small part of the data can be found in a study (Shimizu, 1993) that compared ten Japanese-as-a-foreign-language Americans and ten Japanese in parsing Japanese sentences. All Japanese made a break between nominals joined by the particle "no", but the non-native Japanese speakers were much less sensitive to that break (Shimizu, 1993, 13-15). The syntactic equivalent in English would be two nouns joined by "of", which can also be parsed--between the first noun and the prepositional phrase. More often than not, however, the semantic equivalent is a single noun phrase in which an adjective fills the role of the first Japanese nominal and its particle. The native English speakers in our study never parsed such noun phrases. Those Japanese speakers who placed a break between the adjective and noun might have been parsing English noun phrases much like they would parse the semantic equivalent in Japanese.

Methodological Problems and Solutions

The pilot study had both native English speakers and non-native speakers parse Target Language sentences. Parsing, however, is intimately related to the syntactic structure of the sentence (see Perfetti, 1990). So, perhaps, Interlanguage parsing would be best demonstrated on Interlanguage sentences, as a part of the complete Interlanguage system.

Another interesting twist to this question is to ask: how would native speakers parse ungrammatical Interlanguage sentences? This would complement the work that has been done on Foreigner Talk, focusing on comprehension in interactions between native and non-native speakers. Since this would be a natural context for divergent syntactic and semantic cues, the Competition Model (see MacWhinney and Bates, 1989) could probably be applied here to a realistic language situation. Each of the four combinations of native and non-native parser segmenting Target Language and Interlanguage sentences, then, has enough substance to warrant investigation.

                              Target Language    Interlanguage
Native speaker parsing               +                 +
Non-native speaker parsing           +                 +

A second, and more disturbing, weakness in this study was the method of data collection. Not only were the subjects aware of what was being tested, they had to be instructed how to do it. Even some of the native speakers needed an explanation of how to parse sentences; it's not something many people consciously think about. I would now like to propose two tasks that could be used by researchers or teachers themselves to find out where language learners parse (a) the sentences they read and (b) the sentences they write. These two methods of eliciting parsed sentences are designed to be less artificial and less transparent than simply asking learners to divide up sentences. The proposed methods should also obviate the need to instruct those whose language systems are being investigated how to parse sentences, or even to explain what parsing is.

Parsing in Learner Generated Sentences

Abney's description of sentential constituents as "a single content word surrounded by a constellation of function words" (Abney, 1991, 257) suggests a relatively natural task, which might mimic some of the psycholinguistic processes involved in sentence construction. The idea is to provide the learners with the content words and have them supply the "constellations" of function words--thus utilizing semantic, bottom-up, synthetic parsing strategies. If each content word is in the center of a separate line (Figure 1) with ample space on both sides for the insertion of function words, then the learners can be expected to place them on the same line as the associated content word, so that constituent boundaries would fall between the lines. Yet the learners are never asked to consciously identify the boundaries, only to construct sentences appropriate for the listed content words. The researcher or teacher might want to provide a translation in the learners' native language of the target sentence in order to get as much data as possible on a predetermined sentence, or might choose to let the subjects make their own sentences, thus allowing them more freedom of expression. Using the sentence from the previous study as the target sentence, for example, would give us:

_______________________________ Jack _______________________________
_______________________________ like _______________________________
________________________________ go ________________________________
_______________________________ beach ______________________________
______________________________ Hawaii ______________________________
Figure 1
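A worksheet of this kind is easy to generate for any list of content words. The short Python sketch below is one way to do so. The function name worksheet and the default line width of 70 characters are my own assumptions.

```python
def worksheet(content_words, width=70):
    """Return one line per content word, centered between runs of underscores."""
    return "\n".join(f" {word} ".center(width, "_") for word in content_words)


print(worksheet(["Jack", "like", "go", "beach", "Hawaii"]))
```

The content words are given in uninflected form, as in Figure 1, so that inflections as well as function words are left for the learner to supply.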

Parsing in the Analysis of Sentences

The various two-stage parsing models discussed above--particularly Kimball's model involving (a) Short Term Memory and (b) a processing unit--suggest another approach to accessing a learner's parsing strategies. Following the tradition of eye fixation times (Frazier and Rayner, 1982) and self-paced word-at-a-time reading (Britt, 1987) in psycholinguistic studies of reading, I would propose the use of simple computer programs to determine where learners and native speakers parse English sentences for the purposes of comprehension. I have already written a prototype program in Basic. Unfortunately, the computers that fill school computer labs these days do not seem capable of executing programs written in such an archaic language. Still, since those of you familiar with the newer languages and software--such as C or HyperCard, perhaps--might be able to figure out how to accomplish the same routine using these more accessible modern formats, let me describe how my prototype works.

This time the idea is to mimic the two stages of processing in verbal comprehension, particularly the first stage, involving Short Term Memory, as a way of discovering what gets passed from the first stage to the processing stage. The chunks that get passed from one stage to the next, then, should be our sentential constituents--giving us insight into syntactic, top-down, analytical parsing strategies.

First, a reading passage is printed on the computer screen, one word at a time. Each word disappears when the next one appears, and there is no way to backtrack. The task for the reader is to type what she reads, phrase by phrase, before forgetting the exact wording. The theory is that the lengths of the phrases are limited by the capacity of Short Term Memory, just as they are in the Human Parsing Mechanism, and the location of the breaks between phrases will be mediated by language comprehension processes.

The reader controls both the pace of the reading--by pressing the space bar to get the next word--and at what point she stops to type each phrase--by pressing the return key to toggle between the reading mode and typing mode. The computer can keep track of (a) what was read during each interval and (b) each phrase the reader types. Reading comprehension questions could also be inserted at the end of the reading passage in order to find out how well the reader was able to comprehend the passage and in order to conceal the fact that it is the reader's parsing that is under investigation.
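For readers who would like to try the routine in a newer language, here is a minimal Python sketch of the record-keeping such a program needs. It is not a translation of the Basic prototype, and the class and method names are my own assumptions. Screen display and single-keystroke input are deliberately left out, since they depend on the computer being used; next_word stands in for a press of the space bar and submit for the return key followed by the typed phrase.

```python
class ReadingSession:
    """Tracks what was shown and what was typed in a word-at-a-time reading task."""

    def __init__(self, passage):
        self.words = passage.split()
        self.position = 0   # index of the next word to display
        self.shown = []     # words displayed since the last typed phrase
        self.log = []       # (words read during the interval, phrase typed)

    def next_word(self):
        """Space bar: return the next word, or None when the passage is finished."""
        if self.position >= len(self.words):
            return None
        word = self.words[self.position]
        self.position += 1
        self.shown.append(word)
        return word

    def submit(self, typed):
        """Return key: record the typed phrase against the interval just read."""
        self.log.append((" ".join(self.shown), typed))
        self.shown = []


# A scripted session standing in for a live reader (hypothetical responses).
session = ReadingSession("Jack likes to go to the beaches in Hawaii")
for _ in range(3):
    session.next_word()
session.submit("Jack likes to")
while session.next_word() is not None:
    pass
session.submit("go to the beach in Hawaii")
print(session.log)
```

Because each interval is stored next to the phrase typed after it, the log records both where the reader chose to stop and how accurately the wording was retained.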

Hypotheses

What might we expect to discover about English parsing from these two investigations? From the discussion earlier in this paper we might draw the following hypotheses:

H1 Native speakers will parse sentences in the same fixed locations. There will be a single system, which is part of the native speakers' competence.
H2 Each learner's parsing strategies will be systematic, but will deviate from the system used by native speakers.
H3 The parsing systems of advanced learners will show less deviation from the native-speaker norms than the systems of beginners will.
H4 Despite differences in the features (bottom-up vs. top-down), strategies (semantic vs. syntactic), and possible differences in psycholinguistic processes, the locations of the boundaries between constituents for generative parsing and analytical parsing will be the same. Keep in mind, however, that generative parsing may exhibit a greater number of (smaller) chunks due to the nature of the research tasks.
H5 The average length of sentential constituents for the analytical parsing task will be very close to seven words, with a standard deviation of one or two words.
H6 Learners will read only one or two words ahead of the phrases that they type.
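H5 and H6 call for simple measurements on the data from the analytical task. The Python sketch below shows one way they might be computed, assuming the data for a reader is a list of pairs, each holding the words read during an interval and the phrase typed afterward. That format, the function names, and the sample log are my own assumptions, and the sample log is invented.

```python
from statistics import mean, stdev


def chunk_lengths(log):
    """Number of words in each typed phrase (the measure needed for H5)."""
    return [len(typed.split()) for _, typed in log]


def look_ahead(log):
    """Words read but not yet typed after each phrase (the measure needed for H6)."""
    ahead, read_total, typed_total = [], 0, 0
    for read, typed in log:
        read_total += len(read.split())
        typed_total += len(typed.split())
        ahead.append(read_total - typed_total)
    return ahead


# Hypothetical log, invented for illustration only.
log = [("Jack likes to go", "Jack likes"),
       ("to the", "to go"),
       ("beaches in Hawaii", "to the beaches in Hawaii")]

lengths = chunk_lengths(log)
print(mean(lengths), stdev(lengths))   # stdev needs at least two phrases
print(look_ahead(log))
```

The look-ahead count assumes that the reader types the words in the order read and neither omits nor adds any. Real data would first have to be aligned against the passage.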

Conclusions

Our review of the literature indicates the importance of sentential constituents and the parsing process itself in the English linguistic system. Psycholinguistic theories of parsing propounded by Kimball (1973); Fodor, Bever, and Garrett (1974); Frazier and Fodor (1978); and Wanner (1980) provide a general framework that suggests some relatively simple research tasks and hypotheses. Such an investigation should not be confined to the parsing system of native English speakers. It should be extended also to include the role of parsing in the English Interlanguage systems of learners from various L1 backgrounds. I invite other teacher-researchers in countries throughout the world to utilize the above and similar research tasks to explore their students' segmenting of English sentences and to share the results in order to formulate an empirical description of Interlanguage parsing.

Acknowledgments

The author wishes to express his sincere thanks to Ray Donahue (Nagoya Gakuin University), William Bonk (Kanda University of Foreign Studies), and Judy Yoneoka (Kumamoto Gakuen University) for their valuable critical comments on an earlier draft of this paper. Not all of the advice received was necessarily heeded, however, and I retain full responsibility for the final product.

References

Abney, S. (1991). Parsing by chunks. In R. Berwick et al. (Eds.). Principle-Based Parsing: Computation and Psycholinguistics. Dordrecht: Kluwer, 257-278.

Bever, T. (1970). The cognitive basis for linguistic structures. In J. Hayes (Ed.). Cognition and the Development of Language. NY: John Wiley & Sons, 279-352.

Britt, A. (1987). Parsing in context. Unpublished master's thesis, University of Pittsburgh.

Britt, M.A., C. Perfetti, S. Garrod, and K. Rayner (1992). Parsing in discourse: Context effects and their limits. Journal of Memory and Language, 31, 293-314.

Carroll, D. (1994). Psychology of Language. Pacific Grove, CA: Brooks/Cole.

Clark, H. and E. Clark (1977). Psychology and Language. New York: Harcourt Brace Jovanovich.

Corder, S. (1967). The significance of learners' errors. IRAL, 5(4), 161-170.

Fodor, J. (1975). The Language of Thought. New York: Thomas Y. Crowell.

Fodor, J., T. Bever, and M. Garrett (1974). The Psychology of Language: An Introduction to Psycholinguistics and Generative Grammar. NY: McGraw-Hill.

Fodor, J. and M. Garrett (1967). Some syntactic determinants of sentential complexity. Perception and Psychophysics, 2, 289-296.

Fodor, J., M. Garrett, and T. Bever (1968). Some syntactic determinants of sentential complexity, II: Verb structure. Perception and Psychophysics, 3, 453-461.

Frazier, L. and J. Fodor (1978). The sausage machine: A new two-stage parsing model. Cognition, 6, 291-325.

Frazier, L. and K. Rayner (1982). Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology, 14, 178-210.

Karttunen, L. and A. Zwicky (1985). Introduction. In Dowty, D., L. Karttunen, and A. Zwicky (Eds., 1985). Natural Language Parsing. Cambridge: Cambridge University Press, 1-25.

Hakes, D. and Cairns (1970). Fr

Hakes, D. and D. Foss (1970). Decision processes during sentence comprehension: Effects of surface structure reconsidered. Perception and Psychophysics, 8, 413-416.

Kay, M. (1985). Parsing in functional unification grammar. In Dowty, D., L. Karttunen, and A. Zwicky (Eds., 1985). Natural Language Parsing. Cambridge: Cambridge University Press, 251-278.

Kimball, J. (1973). Seven principles of surface structure parsing in natural language. Cognition, 2(1), 15-47.

MacWhinney, B. (1987). Applying the competition model to bilingualism. Applied Psycholinguistics, 8, 315-327.

MacWhinney, B. and E. Bates (1989). Functionalism and the competition model. In B. MacWhinney and E. Bates (Eds.). The Cross-Linguistic Study of Sentence Processing. Cambridge: Cambridge University Press, 3-73.

Miller, G. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81-97.

Pereira, F. (1985). A new characterization of attachment preferences. In Dowty, D., L. Karttunen, and A. Zwicky (Eds., 1985). Natural Language Parsing. Cambridge: Cambridge University Press, 307-319.

Perfetti, C. (1990). The cooperative language processors: Semantic influences in an autonomous syntax. In D. Balota et al. (Eds.), Comprehension Processes in Reading. Hillsdale, NJ: Lawrence Erlbaum Associates, 205-230.

Sasaki, Y. (1991). English and Japanese interlanguage comprehension strategies: An analysis based on the competition model. Applied Psycholinguistics, 12, 47-73.

Selinker, L. (1972). Interlanguage. IRAL, 10(3), 209-231.

Shimizu, T. (1993). Native/non-native differences in segmentation of Japanese sentences written in hiragana. An unpublished term paper. University of Hawaii, Department of ESL. Spring 1993.

Wanner, E. (1980). The ATN and the Sausage Machine: Which one is baloney? Cognition, 8, 209-225.

Research and Working Papers
http://www.aichi-gakuin.ac.jp/~jeffreyb/research/
