By • 4/21/09
In Why Race Matters, Michael Levin distinguishes the morality of whites from that of blacks on the basis of respect for moral principles, or ideal standards of behavior that are binding for all people. As one example, whites are relatively likely to behave in accordance with the principle of the Golden Rule: do unto others as you would have them do unto you. The behavior of blacks, by contrast, is less influenced by principle and more by immediate self-interest. This racial difference in respect for principle results in lower crime rates among whites than blacks and greater respect for human rights and norms of civility.1
While Levin’s case for racial differences in principled moral reasoning is intuitively plausible, he references no research that directly addresses this question—indeed, little such research was available in mid-1990s, when he was writing his book. Today, however, there exists a substantial body of research that documents differences in moral reasoning by race, and it not only confirms Levin’s insight, but broadens its application. Since much of this research has received little attention and has never been synthesized in a single article, all relevant findings are reviewed here.
Psychologists working in the tradition of Lawrence Kohlberg have devised tests that measure moral maturity and have conducted surveys on the moral reasoning of different racial populations. Their research leads to the following conclusions:
Lawrence Kohlberg was a University of Chicago and Harvard psychology professor who worked out a theory of moral development between the 1950s and the 1980s. Kohlberg’s theory has been enormously influential among psychologists, and a robust research tradition based on his work continues today. Kohlberg believed that moral development universally progressed through a series of definable stages that ended in principled moral reasoning, although only a minority reached this stage. Though they have retained the essentials of his theory, Kohlberg’s successors have found that a simplified version of it has greater empirical validity.
For Kohlbergians, morality becomes ever more encompassing and rational as people progress through the stages: people begin by viewing morality in terms of narrow self-interest, then begin to see themselves in the context of a community. Fully mature moral reasoners view morality in terms of rights and duties that all men, including themselves, are obligated to obey. What follows is a very brief summary of the Kohlberg’s theory (see here for a more extensive one).
At Stage 1 of the Kohlbergian progression, people take an obedience and punishment attitude to morality: one should obey the law simply because one will be punished if one does not. Stage 2 thinkers reason about moral issues in terms of immediate self-interest: the right decision is the one that best serves their own personal interests and happiness. Stage 3 thinkers grasp the need to sacrifice immediate self-interest in order to maintain good relations with other people in their sphere of acquaintance. Good behavior at this stage means showing emotions like love, empathy, trust, and concern for others that result in cooperative relations with one’s family and community.
At Stage 4, people come to understand the need for the maintenance of universal norms such as laws and religious codes. Norms are necessary to avoid the conflict and chaos that would result if everyone were loyal only to themselves (Stage 2) or their family and friends (Stage 3). Stage 4 moral reasoners, however, are incapable of engaging in principled criticism of the conventional social order because they recognize no principles of morality in the name of which that order can be criticized. People at this stage are thus conventional, dogmatic, and authoritarian.
Kohlbergians call the final stage in moral development the “P” stage and take P to stand either for “principled” or “post-conventional.” At this stage, people reason about moral problems from principles, or ideal standards of behavior. Examples of such principles are working for the greatest good of society as a whole, mandating fair treatment, guaranteeing human rights, providing for the needy, and so forth. Moral reasoners at this stage believe the law and other social norms ought to conform to these ideal standards and are capable of formulating principled criticisms of the social order.3
The instrument most commonly used to measure moral reasoning today is the Defining Issues Test (DIT). This test presents subjects with moral dilemmas and asks them to choose among a number of different responses to the dilemmas. There is an example of one section of the DIT test here. Each response represents the perspective of one of the stages in Kohlberg’s scheme. The statistic most frequently reported from the DIT test is the percentage of subjects scoring at the P stage, or the “P-score.”
Decades of research have established that the DIT is an effective measure of moral reasoning. High scores on the DIT are linked to behaviors that are generally considered to be morally admirable like community involvement, professional honesty, good job performance, and low rates of delinquency. People with high DIT scores are also rated as more moral by their peers than those with low scores.4
Tests on white and Asian populations have found that as people grow older and more educated, they normally proceed through the progression described by Kohlberg.5 In fact, education level is the best predictor of P-score. Furthermore, P-scores correlate with scores on intelligence and aptitude tests in the 0.2 to 0.5 range. Nevertheless, population variations in moral reasoning cannot be fully accounted for by variations in education, intelligence, personality traits, or political attitudes; moral reasoning is thus an independent personality factor.6
Psychologists have found consistent variances in intelligence by race and ethnicity, with Asians scoring highest on IQ tests, followed by whites, then Arabs and Hispanics, and blacks at the bottom. Other research has found a similar racial continuum across an array of personality traits related to morality, with blacks and Hispanics scoring highest in aggressiveness, proneness to crime, and recklessness, and lowest in work commitment and ability to delay gratification; Asians and whites score at the opposite extremes.7
Given the correlation of moral maturity to intelligence and blacks’ and Hispanics’ relative proneness to irresponsible behavior, one would expect moral maturity scores to show the same distribution. There is now an extensive body of research on moral reasoning scores in different nations proving that this is in fact the case. Every cross-national study comparing the P-scores of white and Asian populations to those of other populations has found that the former are significantly higher.8
A 2001 article by Uwe Gielen and Diomedes Markoulis collected the results of studies of moral reasoning using the DIT in 14 nations around the world.9 The chart below takes the average of the early high school, late high school, and college P-scores in the 12 nations for which this information was available and the racial makeup of the populations tested was clear. At the right, averages for the four racial groups tested are presented. The relative performance by race shows the familiar racial continuum, with Asians scoring highest in moral maturity and blacks lowest.

The black score in the chart above is based only on the Nigerian sample because it is the only black sample for which we have scores for college as well as high school students. However, studies of predominantly black samples of high school students in Jamaica and Trinidad-Tobago have confirmed that they have significantly lower P-scores than other races do.10
There are two reasons to suspect that $racial differences$ in P-scores are larger than indicated in the Gielen and Markoulis review. The first flows from the fact that educational attainment is the best predictor of moral stage. Researchers have found that people’s moral development tends to stop at the average score for the last level of education that they completed. If someone drops out of high school, his moral stage score is not likely to increase during the rest of his life.11 Since the large majority of people in white and Asian nations attend at least some high school and a third or more go on to college, the moral reasoning scores above are likely quite a good estimate of the averages in those countries. However, in many developing nations, including most of the ones in which moral reasoning has been studied, only small minorities attend college, and fewer than half attend high school. Consequently, the high-school and college samples studied in the developing nations represent a relatively elite segment of the population and may have high moral reasoning scores in comparison to their compatriots.
The second reason why the cross-national studies may underestimate $racial differences$ stems from the large number of invalid tests in the developing nations samples. The inventors of the DIT devised an ingenious way of ascertaining whether subjects understand the test. The test requires that subjects both rate and rank the various possible responses to the moral dilemmas posed in the test. First they rate a response as being of great, moderate, or little importance in assessing the moral choice, then they rank which of the choices are most important to them—see the sample DIT question here. If the ratings and rankings are inconsistent, psychologists judge that the subjects have not understood the test and throw it out. Information about the number of tests disqualified is not available in all cases, but in two of the three studies of Arab samples and studies of high school students in Belize and Trinidad-Tobago, more than half of the tests were thrown out.12 What this means is that for large percentages, possibly the majority, of people in black and Arab nations, the DIT is not an effective measurement of moral reasoning. Since moral judgment is related to intelligence, it stands to reason that people who cannot understand a fairly simple psychological test are not likely to be advanced moral reasoners.
The most common statistic we have about in moral reasoning is the P-score, but there is also some information about how the different nationalities scored at each stage in Kohlberg’s progression. The chart below shows stage scores for samples of high-school and college students in four nations, one white, one Asian, one Arab, and one black.13 The most striking differences are blacks’ relatively high score at Stage 2, the self-interested stage, and Arabs high score at Stage 4, the authoritarian maintaining norms stage.

Are $racial differences$ in moral reasoning due to genes or environment? It would strengthen the innatist case if these differences were found within nations as well as cross-nationally. After all, people growing up in the same nation usually share a substantially similar environment. If $racial differences$ appear both cross-nationally and within nations, it is likely that they have an innate basis.
There has been some testing of $racial differences$ in moral reasoning scores within nations, but far less than one would expect given the importance of this subject and the abundance of DIT testing in the past three decades. Almost all studies that address this subject are buried in obscure journals and based on small and unrepresentative samples of racial populations. Besides, most of the studies were not designed to test for $racial differences$ at all; the authors collected information about these differences in the course of investigating some other topic. The lack of comprehensive and definitive surveys of moral reasoning by race testifies to the cowardice of the social science community when it comes to the subject of $racial differences$. Over the past 30 years, James R. Rest, the major contemporary scholar in the Kohlbergian tradition, has written several summaries of research on differences in DIT scores among American demographic groups. Differences by gender, political ideology, profession, religious denomination, and so forth are covered in detail. However, we never learn anything about differences by race. That Rest would ignore such a salient group divide in American life while investigating so many less important ones is clear evidence of a deliberate refusal to explore the subject of $racial differences$.14
Most studies of within-nations $racial differences$ in moral reasoning have found that white scores are significantly higher than those of blacks and Hispanics. Moreover, in all of the best studies—those based on racial samples that are both large and representative of racial populations as a whole—whites get higher scores.
The table below summarizes the results of the best within-nations studies:15
| Study | Nation | Test | Subjects | Race/ethnicity | Results |
| Ferns and Thom 2001 | South Africa | RAQ | Adolescents aged 12 to 19 | 293 white and 295 black | White scores significantly higher. |
| Cortese | USA | Based on DIT | Children aged 7 to 15 | 100 white and 69 Mexican-American | White scores significantly higher. |
| Ji 2004 | USA | DIT | Undergraduates | 125 white and 150 non-white | White scores significantly higher. |
The Ferns and Thom article is probably the best study of $racial differences$ in moral reasoning ever conducted. The study used the Reasons for Action Questionnaire (RAQ), a multiple-choice pencil and paper test that is based on Kohlberg’s progression. The researchers chose the test because they found that, as in many of the other samples discussed above, large numbers of participants did not understand the DIT. The researchers made sure that most of the subjects understood the test items beforehand, so the large numbers of exclusions that plague studies using the DIT in non-white nations were not a problem.
The results were striking: not only did whites score significantly higher in principled moral reasoning, but there was not a single black score in the principled range. Consistent with the Nigerian sample discussed above, black South Africans scored significantly higher in Stage 2, or self-interested, moral reasoning than whites.
Anthony Cortese’s 1982 study found significant differences between white and Mexican-American children on a test “similar to” the DIT—no other information about the test is given. Eighty-three percent of the principled moral scores were obtained by white children. It should be noted that Cortese controls for IQ in his experimental design, meaning that the effect of IQ on moral reasoning scores is removed from the results. Since whites have significantly higher IQ scores than Mexican-Americans, $racial differences$ in moral reasoning are probably underestimated. The Hispanic children were fluent in English and were at least the third generation in the United States, so the differences cannot be ascribed to difficulties of language and cultural assimilation.
Finally, the Ji 2004 study was conducted on a sample of white and non-white undergraduates at a “Protestant university in an urban community in California.” Ji does not specify the race/ethnicity of the non-white subjects, but since most non-whites in California are either Hispanic or black, especially in urban areas, they probably made up the majority of the non-white subjects.
In the Appendix to this article, I have summarized the results of other research on within-nations differences in moral reasoning. These studies are less conclusive than the three summarized above because they are based on samples of racial groups that are either small (often less than 20) or very non-representative (graduate students or gifted high school seniors), making the finding of significant differences in moral reasoning unlikely. Nevertheless, five of the eight relevant studies found higher moral reasoning scores among whites than blacks.
While Kohlberg and many of his followers believed that the progression they described was universal, cross-cultural testing suggests otherwise. White and Asian populations pass through the Kohlbergian progression as they mature, but Arabs do not appear to, and it is uncertain whether blacks and other races do.
The chart below shows P-scores for various racial groups by education level. The chart is based on the national data used in Figure 1; additionally, information about the P-scores of pre-high school and high school students from a majority-black Trinidad-Tobago sample has been added to the black scores.16 Whereas whites and Asians show clear evidence of moral development as they mature, the Arab scores show no progression.

Although the black scores in the chart above seem roughly to follow the same pattern of moral development as the white and Asian scores, other studies cast doubt on this conclusion. In the Ferns and Thom study of South African adolescents cited above, the authors found that the white sample did show moral development as they grew older, but that in the black sample no moral development pattern emerged. A study of predominantly black Jamaican high school students aged 11 to 18 also found no evidence of moral development. In a sample of high school students aged 12 to 20 from Belize, who were of mixed black, Amerindian, and white descent, significant differences in moral development were found in one measure of moral development but not in another.17 These contradictory results make it impossible to draw conclusions about moral maturation among blacks and other races.
Fig. 3 also makes clear that whites mature more slowly than Asians; however, by college age, the P-scores of the two races are equal. Generally speaking, the gap between whites and Asians on the one hand and blacks and Arabs on the other increases with age. There is scarcely any difference among the races before high-school.
We cannot be certain that $racial differences$ in moral reasoning have an innate basis since the wide range of tests that have proved the innateness of racial IQ differences—adoption and twin studies, for example—have not been performed on moral reasoning. That this is so is more testimony to the refusal of the social science community to confront the subject of innate $racial differences$.
As I wrote in The Reality of Racial Differences, given that social scientists betray their responsibility to provide the world with important information on the subjects of their expertise, we are entitled to make our best guess as to the innateness of $racial differences$ on the basis of the information that is available. I proposed a simple test: do the same $racial differences$ crop up in different environments? That is, if $racial differences$ in intelligence, moral reasoning, time-preferences, or any other trait appear in different cultures, then, in the absence of other evidence to the contrary, it is reasonable to assume that they have an innate basis.
By this standard, it is beyond any reasonable doubt that there are innate differences in moral reasoning between whites and blacks, as these differences show up consistently both in cross-national and within-nations studies. It is also reasonable to conclude more broadly that there are innate differences between whites and Asians on the one hand and Arabs and blacks on the other, as Asian and white scores are consistently high across nations and those of Arabs and blacks consistently low.
Other facts about the peoples under consideration strengthen this assessment. We know that there are innate differences among the races in IQ. Since moral reasoning correlates with IQ and shows the same racial hierarchy, it is reasonable to assume that $racial differences$ in moral reasoning are innate as well. Finally, to echo Michael Levin, the relatively high rates of crime, disorder, and tyranny in black, Arab, and Hispanic societies suggest that these races are less likely to reason in a principled manner than whites and Asians. Overall then, the best guess is that the differences discussed here have some innate basis.
Principled moral reasoning is the basis of a just society, that is, a society in which human rights are respected, people are treated impartially, and a concern for the general good prevails. Only principled moral reasoners are capable of creating a just society because only they can rationally evaluate the morality of the existing order.
Racial differences in moral reasoning are, consequently, of the greatest significance both for the study of behavior and public policy. We all know that there are persistent disparities in rates of crime among the races in the United States, and, looking around the world, we are dismayed by the flagrant disregard for human rights that prevails in many African, Arab, and Hispanic nations. The findings presented here at least partially explain the reasons for these differences in behavior. High levels of principled moral reasoning are undoubtedly one of the sources of the distinctive traits of Western culture.
Many recommendations for policy could be drawn from the facts detailed in this article. A straightforward one is that Western nations curtail or eliminate immigration from black, Arab, and Hispanic nations. The evidence suggests that these immigrants will probably not assimilate to Western moral norms and may import the crime, injustice, and tyranny that plague their homelands.
Research on differences in moral reasoning also leaves us with questions. If Asians show higher levels of principled reasoning than whites, why is it that the idea of inalienable human rights and other foundations of a just society emerged first in the West? And why do blatantly unjust regimes still prevail in China and North Korea? Clearly, while moral reasoning partially accounts for the uniqueness of Western culture, other factors are at play, and future research should aim to discern them.
Excluding the three summarized in the main text, below are all published studies of within-nations $racial differences$ in moral reasoning that were accessible to me through research libraries, including the Library of Congress, and the major databases of social science articles, like PsycArticles and PsycInfo. None of the experimental designs used in these studies is very well-suited to the detection of $racial differences$ in psychology. The sample sizes by race are often less than 20, for example, so very large differences are necessary to attain statistical significance. Moreover, the populations tested—registered nurses, graduate students—are often non-representative of the broader racial populations. Since members of these categories are likely to share educational or other credentials, $racial differences$ within these categories are likely smaller than they are in the general population.
Despite these disadvantageous conditions, the majority of relevant studies find significant or somewhat significant differences in moral reasoning between whites and other non-Asian racial groups. The five studies confirming the thesis are marked in green and the three disconfirming it are marked in red. The non-colored studies are not relevant to the thesis.18
| Study | Test | Nation | Subjects | Race/ethnic breakdown | Results |
| Gielen et al. 1986 | DIT | Belize | High school students | 68 Garifuna (black/Amerindian hybrid), 26 Creoles (white/black hybrid), 13 Hispanic, 11 other | Difference between Creole and Garifuna scores “somewhat significant” (p<.07). |
| Gielen et al. 1986 | DIT | Trinidad & Tobago | High school students | 94 blacks, 19 East Indians, 31 Dougla (Black/East Indian hybrid), 3 other | No significant differences. |
| Gielen et al. 1986 | DIT | USA | High school students | 37 Italian background, 19 Irish, 17 other white, 16 Hispanic, 16 black | No significant differences. |
| Murk and Addleman | DIT | USA | Under-graduates | 173 white, 14 black, 3 Hispanic, 2 Oriental, 3 other | Whites scores significantly higher than black. |
| Tudin et al. | RAQ | South Africa | Under- graduates | 35 white and 33 black | White scores significantly higher than black. |
| Wilson | DIT | USA | Registered nurses | 183 white, 29 black, 2 Asian, 4 Hispanic | No significant differences. |
| Howard-Hamilton and Franks | DIT | USA | Gifted high school seniors | 75 white, 20 black, 19 Asian, 8 East Indian, 14 Hispanics, 7 other | White and Asian scores significantly than black and East Indian scores. |
| Ji 1997 | DIT | USA | Graduate students | 95 white and 70 Asian | White scores significantly higher. |
| Ji 2004 | DIT | USA | Graduate students | 72 white and 64 non-white | No significant differences. |
| Sias et al. | DIT | USA | Substance abuse counselors | 168 white, 14 black, 2 Hispanic, 1 Asian, 1 Amerindian, 1 other | Whites significantly higher than non-white. |
The disconfirming studies carry little weight for reasons beyond those already mentioned. In the Gielen et al. 1986 study of American high school students, not only were the samples of blacks and Hispanics small, but the group of white students was divided up into those of Irish, Italian, and other background, making the detection of significant differences among racial groups less likely. Wilson 1995’s sample was based on the fraction of registered nurses who replied to a questionnaire including the DIT test that she sent them in the mail; only 30 percent did so. It seems likely that people who fill out and return psychological tests sent to them in the mail differ in moral character from those who do not; consequently, this sample is not very useful for detecting differences in moral reasoning.
The Tudin et al. study used an idiosyncratic method of scoring the Reasons for Action Questionnaire that placed a large segment of the population a hybrid 2/5 stage. Combining these students with others in Stage 5, they found no significant differences by race. Using the normal method of scoring, the P-score for white students was 14 percent and for black students, 3 percent, which is certain to be a significant difference. It should also be noted that the relatively high black scores on Stage 2 moral reasoning found elsewhere were confirmed in this sample—31 percent for blacks and 20 percent for whites.
Since cross-cultural studies find that Asians obtain slightly higher P-scores than whites, Ji 1997’s finding that an American sample of white graduate students had significantly higher P-scores than their Asian peers is intriguing. Since Ji does not give us relevant information about the samples, however, no meaningful interpretation of the discrepancy is possible. It may be that the Asians included in the sample were southeast Asians rather than the northeast Asians discussed in this paper.