Birkman International conducted a test-retest reliability analysis in April-May 2018. This document describes the procedures followed and reports the results obtained.

Birkman International (BI), founded by Dr. Roger Birkman, is a behavioral and occupational assessment company with a global reach. The Birkman Method® (TBM), our industry-leading personality assessment, is widely accepted and critically acclaimed. The prototype of The Birkman Method was created in 1951 and attained scientific form in 1965. Over the years, TBM has been constantly reviewed, updated, and improved by qualified psychometricians and organizational psychologists. TBM demonstrates strong psychometric and statistical properties.

The Birkman Method is unique within the behavioral assessment arena in that it compares a person’s perceptions of self with his/her perceptions of most people. In effect, it reports a person’s social worldview. Another unique contribution of TBM is that it reports each respondent’s social desirability bias. It is well known that individuals tend to present themselves in a socially desirable light when answering questions about themselves. TBM integrates social desirability into its scoring processes and deals with it directly, instead of ignoring the concept or applying “correction techniques” as many personality assessments available today do.


With more than forty scales (measurands), TBM reports characteristics of a respondent, characteristics of interaction between a respondent and other people, and characteristics of interaction between a respondent and situations, all with a single assessment. For the test-retest reliability analysis, the nine Components (Social Energy, Physical Energy, Emotional Energy, Self-Consciousness, Assertiveness, Insistence, Incentives, Restlessness, and Thought), the six Perspectives (Distinctiveness, Alignment, Image Management, Social Acuity, Self-Affirming, and Others-Affirming), and the ten Occupational Interests (Musical, Scientific, Artistic, Literary, Social Service, Numerical, Technical, Administrative, Outdoor, and Persuasive) were chosen as measurands.

Test-Retest Reliability

An important element of the reliability testing for any psychometric instrument is the degree of agreement between the results of successive measurements, separated in time, of the same measurand carried out under the same conditions of measurement [1,2]. The test-retest reliability coefficient measures the strength and direction of the association between two successive values of a measurand. It also indicates the overall consistency of the measurand over time. To assess test-retest reliability, the assessment is administered to the same group of people twice, with a fixed amount of time between the two administrations. The results from the two sessions are then compared to evaluate the relationship between the two sets of scores.
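As an illustration of the comparison described above, the following minimal sketch computes a Pearson correlation between paired scores from two administrations. This document does not state which coefficient BI used, so the choice of Pearson's r is an assumption, and the function name and the sample scores are hypothetical rather than taken from the study.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between paired scores from two administrations.

    Assumption: the test-retest coefficient is a Pearson correlation;
    the source document does not name the statistic it used.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(first)
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    # Sum of cross-products of deviations from each session's mean.
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    # Sums of squared deviations for each session.
    ss_1 = sum((a - mean_1) ** 2 for a in first)
    ss_2 = sum((b - mean_2) ** 2 for b in second)
    if ss_1 == 0 or ss_2 == 0:
        raise ValueError("scores must vary within each administration")
    return cov / sqrt(ss_1 * ss_2)


# Hypothetical scale scores for five respondents (not data from this study).
test_scores = [10, 20, 30, 40, 50]
retest_scores = [12, 18, 33, 41, 48]
r = pearson_r(test_scores, retest_scores)
```

In a study of this kind, the same calculation would be repeated once per scale, using each respondent's first and second score on that scale as one pair.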


BI’s overriding goal for this effort was to minimize or eliminate any inherent bias in either the form or the process. Toward this end, BI hired a third-party market research firm to recruit potential respondents who had no existing knowledge of TBM and who could reasonably be presumed to be willing to complete the questionnaire twice (although that was not part of the initial request). Keeping in mind that TBM is a behavioral and occupational assessment tool, BI also stipulated that the sample should consist only of working-age people, and that its demographic makeup should match, as closely as possible, the demographics of the working population in the United States. A link to the TBM assessment was forwarded to the 583 selected respondents, of whom 422 successfully completed the questionnaire. Respondents in this study were financially incentivized to participate. At this point they did not know that they would be asked to complete the assessment a second time. After 15 days, a new link to the assessment was forwarded to those who had completed it the first time. Successful completion of this second iteration earned a higher incentive. Of the 422 respondents, 414 successfully completed the second assessment.


Following successful completion of the data collection phase, BI analyzed the data for initial insights. Almost immediately an anomaly was discovered: the amount of time some respondents took to complete the questionnaire was either extremely small or extremely large in comparison to the time taken by the average Birkman respondent. Historically, most respondents have taken between 15 and 45 minutes to complete the assessment, with an average time of around 30 minutes. Clearly many (if not all) of the respondents with very small time-to-complete figures were not fully engaged in the process, and likely were not even fully reading the questions. The data from those instances are spurious and would distort the results. Because of these concerns, BI imposed a filtering criterion based only on the total time taken to complete the assessment. Retaining only those respondents who took between 15 and 45 minutes to complete the assessment yielded 120 respondents. This step was deemed necessary to ensure a reasonable level of data quality. It was presumed that the final sample of 120 respondents represented people who were “serious” about their charge and who were honest in their answers to the questionnaire. The actual analysis of test-retest reliability was conducted on this sample. Tables 1-6 highlight the demographics of the final set of respondents.
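The time-based filter described above can be sketched as follows. This is an illustration only: the record layout, the field names, the inclusive treatment of the 15 and 45 minute boundaries, and the requirement that both administrations fall inside the window are all assumptions, since the document specifies only the 15 to 45 minute range.

```python
MIN_MINUTES = 15  # lower bound of the window stated in the text
MAX_MINUTES = 45  # upper bound of the window stated in the text


def within_window(minutes, low=MIN_MINUTES, high=MAX_MINUTES):
    """True if a completion time falls inside the window (bounds inclusive by assumption)."""
    return low <= minutes <= high


def filter_by_completion_time(records):
    """Keep respondents whose completion times fall inside the window.

    Assumption: each record holds one completion time per administration
    under the hypothetical keys 'minutes_test' and 'minutes_retest', and
    both times must qualify for the respondent to be retained.
    """
    return [
        rec for rec in records
        if within_window(rec["minutes_test"]) and within_window(rec["minutes_retest"])
    ]


# Hypothetical respondents (not data from this study).
respondents = [
    {"id": "A", "minutes_test": 28, "minutes_retest": 31},  # kept
    {"id": "B", "minutes_test": 6, "minutes_retest": 25},   # first session too fast
    {"id": "C", "minutes_test": 40, "minutes_retest": 95},  # second session too slow
    {"id": "D", "minutes_test": 15, "minutes_retest": 45},  # on the boundaries, kept
]
retained = filter_by_completion_time(respondents)
retained_ids = [rec["id"] for rec in retained]
```

Filtering on time alone keeps the rule independent of the scores themselves, so the retained sample is not selected on the basis of how consistent a respondent's answers turned out to be.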


The range of possible values for the test-retest reliability coefficient is 0 to 1. In practice, however, values of exactly “0” or “1” do not occur in the domain of psychometrics. Generally speaking, higher values indicate higher levels of reliability, and lower values are associated with lower levels of reliability. Essentially, we are concerned with answering the question, “Does TBM consistently and reliably measure the same thing(s) over time?” In other words, is the Method stable? Scales with a reliability coefficient closer to 1 are considered more stable than scales with a reliability coefficient closer to 0. Table 7 provides a general guideline for interpreting test-retest reliability coefficients. Tables A-C highlight the reliability coefficients of the scales under scrutiny. One can observe that most of the TBM scales exhibit good reliability. Among Components, Emotional Energy and Physical Energy top the list; among Perspectives, Self-Affirming and Alignment; and among Interests, Musical and Outdoor.


From the scores observed for each independent TBM scale, the reliability coefficients indicate that scale values result from systematic rather than chance or random factors [3], and that the measurands are stable over time. These results are consistent with historical analyses of test-retest reliability (even though in this case the test conditions were less than ideal) and give us high confidence that TBM is stable and consistent. In summary, these results indicate that a given respondent would get essentially the same results from one administration to another, given similar testing conditions and minimal knowledge acquisition between the administrations.

