Analysis of Linear Equating of Senior School Certificate Multiple-Choice Examination Papers in Economics

ABSTRACT


INTRODUCTION
In the teaching-learning process, learners are exposed to a series of instructions, and there is a need to gauge constantly the extent to which they respond to those instructions, which can be termed students' performance [1]. Ascertaining this requires a measuring device, which brings about the use of the test. A test is an assessment instrument, tool, technique or method used systematically to measure a sample of behavior by posing a set of questions or tasks to which students respond, in order to gauge mastery of a skill or knowledge in the curriculum content [2] [1]. Tests are designed to measure the quality, ability, skill or knowledge of a sample against a given standard [1]. Tests can be standardized or non-standardized.
Standardized tests are administered to millions of candidates at once each year. They use controlled, standard procedures for administration and scoring, and are used to evaluate students' learned skills in academic subjects, teachers' and schools' performance, the quality of the curriculum, and the educational system. Most standardized tests are external to the school environment and have predetermined conditions: candidates attempt similar questions of the same difficulty level, sit under the same supervised test conditions, and are scored in the same way. The interpretations are consistent across schools and years [3] [4] and are used for making comparisons of students' achievement and promoting accountability in education. The primary purpose of standardized tests is to achieve fair assessment of learning outcomes, upon which important decisions are based.
In any standardized test, especially where two or more examining bodies perform similar functions (e.g., the West African Examinations Council (WAEC), the National Examinations Council (NECO), and the National Business and Technical Examinations Board (NABTEB), all of which conduct the Senior School Certificate Examination), test equating is essential. Equating is a process that converts scores on one form of a test to the score scale of another form. It is a family of psychometric approaches for aligning the numerical format of different measurement instruments [5], and it provides a linkage between two or more sets of scores generated from tests. Equating may be viewed as a form of scale aligning in which very strong requirements are placed on the tests being linked. The goal of equating is to produce a linkage between scores on two test forms so that the scores from each test form can be used as if they had come from the same test [6]. It is a technical procedure conducted to establish comparable scores on different versions of a test, allowing them to be used interchangeably [7]. It is an important aspect of establishing and maintaining the technical quality of a testing program because it directly affects the validity of assessments [7]. When two tests or items have been successfully equated, educators can validly interpret performance on one test form as having the same substantive meaning as the equated score on the other test form [5] [7].
Test equating methods are statistical tools used to produce exchangeable scores across different test forms [7]. Equating can be carried out under either of the two recognized measurement theories, Classical Test Theory (CTT) and Item Response Theory (IRT). In CTT, the mean scores and standard deviations are used to equate performance on two forms. The Tucker and Levine Observed Score methods are the two recognized equating methods. The Tucker method estimates the relationship between observed scores on two test forms. Observed-score equating refers to the transformation of the raw scores of a new test, "X", into the raw scores of an old test, "Y". It is used to ensure that scores from different test forms are comparable and can be used interchangeably. The Levine True Score method estimates the relationship between true scores on the two forms. In IRT, equating means the process of placing scores from two parallel test forms onto a common score scale, so that the scores from the two forms can be compared directly or treated as if they come from the same test form. Both horizontal and vertical test equating are recognized. Vertical equating refers to equating tests administered at the same time to groups of test-takers with different abilities in different years of schooling (e.g., Senior Secondary 1 and Senior Secondary 2 students). Horizontal equating refers to equating tests administered to different groups of test-takers with similar abilities (e.g., male and female Senior Secondary 1 students); different tests are used to avoid practice effects. Test equating supports the validity and reliability of the instrument across forms and years, the fairness of items, test security, and the continuity of the program.
Linear equating is implemented by reflecting the ability level of the students and the spread of scores onto the reference scale scores. It provides a transformation under which scores from two tests are considered equated if they correspond to equal standard score deviates. It is useful with small samples and when the accuracy of the results is most important near the mean [5]. Five requirements are widely viewed as necessary for a linking to be an equating [8]. Those requirements are:
1. The Equal Construct Requirement: The two tests should both be measures of the same construct (e.g., latent trait, skill, ability).
2. The Equal Reliability Requirement: The two tests should have the same level of reliability.
3. The Symmetry Requirement: The equating transformation for mapping the scores of "Y" to those of "X" should be the inverse of the equating transformation for mapping the scores of "X" to those of "Y".
4. The Equity Requirement: It should be a matter of indifference to an examinee which of the two tests the examinee takes.
5. The Population Invariance Requirement: The equating function used to link the scores of X and Y should be the same regardless of the choice of (sub)population from which it is derived.
Concerning best practices, Requirements 1 and 2 mean that the tests need to be built to the same specifications, while Requirement 3 precludes regression methods from being a form of test equating. [9] argues that Requirement 4 implies both Requirements 1 and 2. Requirement 4 is, however, hard to evaluate empirically and its use is primarily theoretical (Hanson, 1991; Lord, 1980). As noted by [8], Requirement 5, which is easy to assess in practice, can also be used to explain why Requirements 1 and 2 are needed. If two tests measure different things or are not equally reliable, then the standard linking methods will not produce results that are invariant across certain subpopulations of examinees.
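The symmetry requirement can be made concrete for the linear case. The notation below is assumed for illustration and does not come from the source: $\mu_X, \sigma_X$ and $\mu_Y, \sigma_Y$ denote the means and standard deviations of scores on forms X and Y in the same population, and $\rho$ is the correlation between the two sets of scores. A sketch under those assumptions:

```latex
% Linear equating: x and its equated score share the same standard score deviate
\frac{l_Y(x) - \mu_Y}{\sigma_Y} = \frac{x - \mu_X}{\sigma_X}
\quad\Longrightarrow\quad
l_Y(x) = \mu_Y + \frac{\sigma_Y}{\sigma_X}\,(x - \mu_X)

% Solving for x gives the Y-to-X function, so the two directions are inverses
l_X(y) = \mu_X + \frac{\sigma_X}{\sigma_Y}\,(y - \mu_Y) = l_Y^{-1}(y)

% The regressions of Y on X and of X on Y, by contrast
\hat{y}(x) = \mu_Y + \rho\,\frac{\sigma_Y}{\sigma_X}\,(x - \mu_X),
\qquad
\hat{x}(y) = \mu_X + \rho\,\frac{\sigma_X}{\sigma_Y}\,(y - \mu_Y)
```

Composing the two regression lines gives $\hat{x}(\hat{y}(x)) = \mu_X + \rho^2 (x - \mu_X)$, so they are inverses of each other only when $|\rho| = 1$. This is why regression methods fail the symmetry requirement while the linear equating function satisfies it.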
In the method proposed by [10] for the non-equivalent groups with anchor test (NEAT) design, the anchor test has a central role as a proxy for ability, because the conditional means and variances given the anchor scores are used to obtain a family of equating transformations. Linear equating is implemented by reflecting the ability level of the students and the spread of scores onto the reference scale scores [11]. To equate scores on the new form to scores on the reference form in a group of test-takers, each score on the new form is transformed into the score on the reference form that lies the same number of standard deviations above or below the mean of the group. It is called linear equating because the relationship between the raw scores and the adjusted scores plots as a straight line [12].
10.12198/spekta.v3i1.4447
There have been studies on linear equating both within and outside Nigeria. For instance, [13] compared mean equating, linear equating, and equipercentile equating using various degrees of pre-smoothing (including none at all) in samples ranging in size from 25 to 200, using data from only one test. In that study, mean equating is the most accurate of the small-sample methods for below-average scores, but the least accurate for above-average scores. Linear equating is more accurate than equipercentile equating for below-average and near-average scores, but less accurate for scores more than one standard deviation above the mean. In all test equating studies, according to IRT, alternate forms should be balanced in terms of equivalent test information functions (TIF). To be specific, an examinee who takes Form A should not be more or less advantaged than one who takes Form B or Form C [14]. [6] carries out a study that compares linear equating and Rasch equating. The study concludes that Rasch equating provides essentially the same results as linear equating.
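The difference between mean equating and linear equating discussed above can be sketched in a few lines. This is a minimal illustration that assumes both forms are taken by the same or equivalent groups; it does not implement the anchor-based Tucker or Levine methods, and the function names and score lists are hypothetical, not the study's data.

```python
from statistics import mean, pstdev


def mean_equate(x, new_scores, ref_scores):
    """Mean equating: shift x by the difference between the two form means."""
    return x - mean(new_scores) + mean(ref_scores)


def linear_equate(x, new_scores, ref_scores):
    """Linear equating: map x to the reference-form score that lies the same
    number of standard deviations from its mean."""
    z = (x - mean(new_scores)) / pstdev(new_scores)
    return mean(ref_scores) + z * pstdev(ref_scores)


# Hypothetical raw scores, for illustration only.
new_form = [10, 12, 14, 16, 18]
ref_form = [11, 14, 17, 20, 23]

for x in (10, 14, 18):
    print(x, mean_equate(x, new_form, ref_form),
          round(linear_equate(x, new_form, ref_form), 2))
```

The two methods agree at the mean of the new form and diverge away from it whenever the forms differ in spread, because mean equating ignores the standard deviations.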
The motivating factor for carrying out this study is the belief held by many people that it is impossible to develop multiple forms of a test that have the same psychometric properties. Some stakeholders seem to have the impression that anyone with NECO or NABTEB SSCE results is half-baked; to them, only WASCE is of sufficient quality to be accepted. In Nigeria, some studies have cast doubt on the quality of the Senior Secondary School Certificate Examinations. [15] points out that there are vast differences in the quality of certificate examinations conducted by the various examination bodies. [16, 17, 18, 19, 20] remark that the standard of the SSCE conducted by NECO is low compared to that of the SSCE conducted by WAEC.
Moreover, some tertiary institutions and employers of labour tend to prefer students with credit passes in the SSCE conducted by WAEC to those with credit passes in the SSCE conducted by NECO and NABTEB. They believe that the SSCE conducted by WAEC has a higher standard than the SSCE conducted by NECO [21]. Thus, the researchers conduct a study on the conversion of WAEC, NECO, and NABTEB scores to a common scale for comparison, to establish whether this assertion is true. Therefore, this study statistically analyzes linear equating of WAEC, NECO, and NABTEB Senior School Certificate multiple-choice test items in Economics.

METHODS
A multi-stage sampling technique is adopted for the study. The first sampling procedure is stratification of schools into the three Senatorial Districts of Kwara State. The second sampling procedure is proportionate sampling: ten per cent (10%) of the total number of public senior secondary schools in each Senatorial District is selected. Hence, eight (8), seven (7), and fifteen (15) public senior secondary schools are selected in Kwara North, Kwara Central, and Kwara South respectively, giving a total of thirty (30) public schools. A purposive sampling technique is also used to select Senior Secondary School 3 students, because they are in the final stage of their secondary school programme and are ready to write their external and final examinations. The researchers expect that they have covered much of the syllabus. Kwara Central has 478 (42.7%) respondents, Kwara North has 264 (23.6%), and Kwara South has 377 (33.7%). A total of one thousand one hundred and nineteen (1,119) students participate in the study.
For data collection, the researchers adopt the 2009 WAEC (Form A), NECO (Form B), and NABTEB (Form C) Economics multiple-choice items, composed into independent and anchor tests. Each test form has its own unique items and shares a set of twenty common items located at numbers 11-30 in each test form. Test forms A, B, and C contain 30, 40, and 30 WAEC, NECO, and NABTEB multiple-choice items (unique items) respectively, and each form also contains the 20 multiple-choice common items.
The researchers determine the content validity of the instruments (Forms A, B and C) with the formula $r_{ts} = 1 - \frac{\sum |d|}{100}$, where $r_{ts}$ is the correlation between test and syllabus contents and $d$ is the difference (ignoring signs) between corresponding percentage weightings of each of the WAEC, NECO and NABTEB Economics topics in the syllabus and in the test. Internal consistency is also estimated, using the split-half method of estimating reliability. Content validity coefficients of 0.67, 0.64, and 0.60 and reliability coefficients of 0.79, 0.76, and 0.70 are obtained for forms A, B, and C respectively.
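The content validity coefficient described above is straightforward to compute once the percentage weightings are tabulated. The sketch below is a minimal illustration; the function name, the topic names, and the weightings are hypothetical and are not taken from the study.

```python
def content_validity(syllabus_pct, test_pct):
    """Return 1 - sum(|d|) / 100, where d is the per-topic difference between
    the syllabus and test percentage weightings."""
    if syllabus_pct.keys() != test_pct.keys():
        raise ValueError("both weightings must cover the same topics")
    total_abs_diff = sum(abs(syllabus_pct[t] - test_pct[t]) for t in syllabus_pct)
    return 1 - total_abs_diff / 100


# Hypothetical topic weightings in percent (each set sums to 100).
syllabus = {"demand and supply": 30, "production": 25,
            "money and banking": 25, "public finance": 20}
test = {"demand and supply": 40, "production": 20,
        "money and banking": 20, "public finance": 20}

r_ts = content_validity(syllabus, test)
print(round(r_ts, 2))
```

A coefficient of 1 means the test weights every topic exactly as the syllabus does; the coefficient falls as the absolute differences accumulate.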
This shows that the instruments are good and reliable. The researchers use these Economics multiple-choice papers because they are interested in equating the Nigerian Senior Secondary School Certificate Examination (SSCE) Economics multiple-choice papers of the three different examination bodies. Research questions one and two are answered using means and standard deviations, while research question three is answered using percentile ranks.
[...] 11.30 and 11.28 respectively. Also, 17 and 5 are the test-takers' highest and lowest scores on the common items in the three test forms, while standard deviations of 2.44, 2.38, and 2.38 are obtained for forms A, B, and C respectively. This implies that there is no difference in examinees' ability or proficiency in the subject; thus score equating among WAEC, NECO, and NABTEB multiple-choice items in Economics is upheld. The finding also implies that the common items are equally difficult for the three groups, since the common-item scores have the same distribution in the three forms.

RESULTS AND DISCUSSION
To ensure that the condition for equating using the non-equivalent groups method is not violated, respondents' scores on the common items are subjected to one-way ANOVA, to determine whether there is any significant difference in the test-takers' performance at the 0.05 level of significance, as shown in Table 2.
Table 3 shows the performance of the students on the unique items. Respondents in form C, with a mean performance of 41.54, skewness of -1.83, and kurtosis of 4.295, have the best performance. The skewness value of -1.83 indicates that the mass of the scores is clustered to the right, at the high values, and the kurtosis value of 4.295 indicates a sharp peak, with the distribution concentrated at those high values. Test-takers in form B have the next-best performance, with a mean of 36.40, skewness of 0.203, and kurtosis of -0.220, while the lowest mean performance, 35.45, is recorded by the respondents in form A, with skewness of 0.136 and kurtosis of -0.822. The positive skewness values indicate that the mass of the scores clusters to the left, at the low values. This implies that there is a difference in the pattern of students' performance on unique items across the test forms. The higher mean performance and negative skewness of students on test form C indicate that it is relatively easy compared to test forms A and B.
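The one-way ANOVA check on common-item scores mentioned above can be sketched with the standard library alone. This is a minimal illustration of the F statistic, not the researchers' actual analysis; the function name and the score lists are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of lists of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical common-item scores for three forms, for illustration only.
form_a = [11, 12, 10, 13, 9]
form_b = [12, 11, 10, 12, 11]
form_c = [10, 13, 11, 12, 10]

f_stat, df_b, df_w = one_way_anova_f([form_a, form_b, form_c])
print(round(f_stat, 3), df_b, df_w)
```

The F value is then compared with the critical value for (df_between, df_within) at the 0.05 level; a statistics library such as SciPy (`scipy.stats.f_oneway`) reports the corresponding p-value directly. A non-significant result supports treating the three groups as comparable in ability.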

Research Question Three:
What are the results of linear equating of WAEC, NECO, and NABTEB Senior School Certificate Economics multiple choice papers with the use of standard score deviates?
This finding corroborates the submission of Alfred (2011) that there is a significant difference in the difficulty level of Economics multiple-choice items conducted by WAEC, NECO, and NABTEB, with means of 37.30, 35.48, and 30.66 respectively. In the same vein, the finding supports that of [22] that there is a difference in the difficulty index of Mathematics examinations conducted by WAEC, NECO, NABTEB and JAMB (Joint Admissions and Matriculation Board) in Nigeria. The finding is contrary to the submission of [23] that there is no significant difference in the difficulty level, reliability, and validity coefficients of Mathematics items constructed by WAEC, NECO, and NABTEB. The finding also does not substantiate the submission of [24] that WAEC and NECO have the same difficulty indices.
To establish the outcome of linear equating, the summary of equated scores of WAEC, NECO and NABTEB is presented in Table 5.

Table 5. Summary of equated scores of WAEC, NECO and NABTEB
WAEC    NECO    NABTEB
39      39      39
40      40      40
41      41      42
45      46      50
49      51      57
50      52      60
53      56      65

Table 5 shows that scores of 39 and 40 in WAEC are equivalent to 39 and 40 in both NECO and NABTEB, while a score of 41 in WAEC and NECO is equivalent to 42 in NABTEB. A score of 45 in WAEC is equivalent to 46 and 50 in NECO and NABTEB respectively, and a score of 49 in WAEC is equivalent to 51 and 57 in NECO and NABTEB respectively. A score of 50 in WAEC is equivalent to 52 and 60 in NECO and NABTEB respectively, and a score of 53 in WAEC is equivalent to 56 and 65 in NECO and NABTEB respectively.
Findings on linear equating of WAEC, NECO, and NABTEB Senior School Certificate Economics multiple-choice papers with the use of standard score deviates reveal that a score of 39 in WAEC (Form A) is equivalent to 39 in both NECO (Form B) and NABTEB (Form C), because they correspond to the same standard score deviate (34) in Table 4. Table 5 shows that the equivalent scores in WAEC and NECO are closer to each other than either is to NABTEB. This can be attributed to the relatively long period of operation of these two Senior School Certificate Examination bodies. This finding supports that of [25,26], who report that there are no significant differences in the difficulty level of WAEC and NECO multiple-choice items in Mathematics. The finding is not consistent with the submission of [27] that NECO is inferior to WAEC in all standards. It also disagrees with the submission of [18] that WAEC SSCE multiple-choice Biology items are more difficult than NECO SSCE multiple-choice Biology items. The difference in the equivalence of NABTEB scores to WAEC and NECO scores recorded in this study supports the submission of [22] that there is a difference in the difficulty index of Mathematics examinations conducted by WAEC, NECO, NABTEB and JAMB (Joint Admissions and Matriculation Board) in Nigeria.
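The claim that WAEC and NECO scores sit closer together than NABTEB scores can be read directly off the equated values reported for Table 5. The short sketch below only tabulates the differences from the WAEC score; the variable names are illustrative and the values are those stated in the text.

```python
# Equated scores as reported for Table 5: (WAEC, NECO, NABTEB).
table5 = [
    (39, 39, 39),
    (40, 40, 40),
    (41, 41, 42),
    (45, 46, 50),
    (49, 51, 57),
    (50, 52, 60),
    (53, 56, 65),
]

# Gap between each body's equated score and the corresponding WAEC score.
neco_gap = [neco - waec for waec, neco, _ in table5]
nabteb_gap = [nabteb - waec for waec, _, nabteb in table5]

for (waec, _, _), g_neco, g_nabteb in zip(table5, neco_gap, nabteb_gap):
    print(f"WAEC {waec}: NECO {g_neco:+d}, NABTEB {g_nabteb:+d}")
```

The NECO gap never exceeds 3 score points over this range, while the NABTEB gap grows to 12 points at a WAEC score of 53, which is the pattern the discussion attributes to NABTEB being the easier form.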

CONCLUSION
The examinees do not differ in proficiency, because the mean performance of respondents on the common items in test forms A, B and C is not significantly different. Respondents perform differently on the unique items across the test forms. A score of 39 in WAEC is equivalent to 39 in both NECO and NABTEB, while a score of 41 in WAEC and NECO is equivalent to 42 in NABTEB, because they correspond to the same standard score deviate (34). A score of 45 in WAEC is equivalent to 46 and 50 in NECO and NABTEB respectively, because they correspond to the same standard score deviate (47), and a score of 53 in WAEC is equivalent to 56 and 65 in NECO and NABTEB respectively, because they correspond to the same standard score deviate (64). The researchers therefore conclude that the 2009 WAEC and NECO SSCE Economics multiple-choice items tend to be equivalent, while those of NABTEB are different. An equating method with a lower coefficient of variation (linear equating) should be employed for equating scores. A regulatory body to standardize and monitor examinations conducted by the examining bodies should be established in Nigeria. Privately owned examination bodies should be allowed to come on board; the proliferation would lead to healthy competition, which could result in the achievement of standards across the examination bodies.