Problematic khat use has been poorly defined and measured. This study aimed to develop a problematic khat use screening tool in the Gurage community, South-central Ethiopia. We used a series of methods (systematic review, qualitative study, and cognitive interviewing) to generate a pool of items for the screening tool. Classical test theory (inter-item and item-total correlations, intraclass correlation, Cronbach's alpha, and exploratory factor analysis) and 2-parameter item response theory (IRT) were used for the initial psychometric evaluation of the scale. Initially, we developed a pool of 50 items. IRT analysis indicated that the item “During the past three months, how often have you experienced irritability when you didn’t chew khat?” showed the highest discrimination (slope α = 2.44). The item “During the past three months, to what extent have you failed to do what was normally expected of you because of your use of khat?” was the most difficult item (first threshold β = 0.25) and the only one with a positive first threshold. The test information function shows that the scale provides the most information in the moderate range of the latent trait. We found that employing a series of methods is very important for item selection and refinement. We developed a 17-item Problematic Khat Use Screening Test that is likely to be more useful in the general population or in primary health care settings than in clinical populations. Future studies should conduct a full-scale validation of the scale.

Khat is considered a natural amphetamine because it contains active ingredients such as cathinone and cathine (Kalix, 1988). Like amphetamines, khat induces increased heart rate and blood pressure, increased energy and concentration, insomnia, hyperventilation, decreased appetite, pupillary dilation, and constipation after recent use (Cox & Rampes, 2003; Dagne et al., 2010). Problematic khat use, which has different indicators including increased frequency of chewing and long chewing sessions, is more strongly associated with other public health problems than khat use itself (Thomas & Williams, 2013). Although evidence on physical health harms is scarce, problematic khat use is associated with mental health problems (psychosis and depression) and social harms (Cox & Rampes, 2003; Thomas & Williams, 2013).

Early detection of problematic substance use may help in the treatment of substance use disorders and prevent long-term substance-related problems (Kaner et al., 2009). Thus, screening tools are essential to facilitate early detection and treatment of substance use disorders. Screening tools should have characteristics such as being brief, clear, adaptable to different patterns of use, valid, reliable, and applicable to daily practice (Piontek et al., 2008).

Screening tools are commonly used for psychoactive substances with well-established harms and dependence potential (Saunders et al., 1993).

There are screening tools for problematic alcohol use (AUDIT and FAST) (Saunders et al., 1993), problematic cannabis use (CPQ, CAST, CUDIT, MSI-X, MPS, RCQ-M), and problematic substance use more broadly (DUDIT and CRAFFT) (López-Pelayo et al., 2014). However, there are no screening tools for problematic khat use.

Problematic khat use is a burden in Ethiopia, other East African countries, and the Arabian Peninsula. In East Africa, the prevalence of daily khat use is estimated to be 80-90% among adult males and 10-60% among adult females (Numan, 2004; Odenwald et al., 2005). The prevalence of khat use has been estimated at 67.9% in Yemen and 59% in Somalia (Elmi, 1983; Numan, 2004). In Ethiopia, khat was the second most widely used psychoactive substance (Fekadu et al., 2007). The national prevalence of khat use is 15.3% (Haile & Lakew, 2015); in some regions where khat is cultivated, such as Harari, the prevalence rises to 50% (Teklie et al., 2017).

The construct of “problematic khat use” is not clearly established in the scientific community, which is why some of its aspects, such as withdrawal experiences, have not been explicitly addressed in the various versions of the Diagnostic and Statistical Manual of Mental Disorders (DSM) or the International Classification of Diseases (ICD). A feasible, acceptable, and validated problematic khat use screening tool would facilitate future epidemiological studies. A brief and valid tool would also serve as a basis for establishing cohorts and conducting prospective follow-up studies of different aspects of problematic khat use. Such a screening tool may also have clinical utility for the early detection of problematic khat use.

Prior work on problematic khat use has frequently adapted the Severity of Dependence Scale (SDS) or DSM-IV criteria developed for other drugs of abuse and dependence (Kassim et al., 2010; Nakajima et al., 2014).

In Ethiopia, a study by Duresso et al. (2016) offers some preliminary evidence that khat use disorder is a valid diagnostic category within the DSM-5 framework, but to our knowledge this effort has not been replicated or extended. The SDS also appears valid for identifying individuals experiencing a khat use disorder syndrome (Duresso et al., 2018). However, the construct of problematic khat use is broader than the DSM-5 criteria for stimulant use disorders or the SDS (Mihretu et al., 2017).

In another study, a harmful khat use screening tool (Gebrehanna et al., 2014) was developed by assembling items from the AUDIT, DSM, ICD, and other instruments. Indicators borrowed from problematic alcohol and cannabis use may lack domain specificity and contextual relevance when applied to problematic khat use (Cronbach & Meehl, 1955). Both of the previous Ethiopian studies measured problematic khat use among university students. They also lacked conceptual and content validity because they were not informed by a qualitative conceptualisation of the construct (Higgins & Green, 2011), and the adaptation process was not rigorous (Mihretu et al., 2019). The use of poorly validated screening tools could be the main source of discrepancies in the existing literature on the harms of khat use (Warfa et al., 2007).

Previous studies reported indicators of problematic khat use such as frequency of use (twice or more per week) (Gebrehanna et al., 2014) and duration of chewing sessions (≥6 hours) (El-Setouhy et al., 2016). However, none of them employed modern psychometric methods such as item response theory or latent profile analysis, which can provide stronger evidence about the psychometric properties of a problematic khat use screening tool (Pontes et al., 2014).

The existing studies on measuring problematic khat use, particularly local studies, are atheoretical and confirmatory in their approach and thus lack specificity (Billieux et al., 2015). Previous studies tried to establish the construct validity of problematic khat use, but these attempts were not informed by a clear, testable, and explanatory hypothesis or theory.

Previous studies also rate poorly against the recommended psychometric properties of a screening tool (Piontek et al., 2008). According to COSMIN (COnsensus-based Standards for the selection of health Measurement INstruments), the reliability, validity, and responsiveness of a tool need to be evaluated. However, previous studies did not consider the content validity of problematic khat use measures, which is one of the most important measurement properties. The COSMIN working group also recommends that a tool should, first of all, be shown to have content that is relevant, comprehensive, and comprehensible with respect to the construct of interest and the target population (Mokkink et al., 2010).

Thus, we attempted to address the gaps identified above by designing the different phases of the study systematically. A systematic review and a qualitative study were conducted before the current study to identify indicators of problematic khat use and to explore how people conceptualise it (Mihretu et al., 2019; Mihretu et al., 2020). These prior studies focused on defining and validating the construct of problematic khat use. The current study aimed to develop a screening tool for problematic khat use in Gurage, South-central Ethiopia, and to conduct its initial psychometric evaluation. In a separate study, we validated the tool and found it to be useful and valid (Mihretu et al., 2022).

Study design

The development and validation of the problematic khat use (PKU) screening tool involved the following methods and phases: i) a systematic review to identify any existing screening tools used to measure problematic khat use (Mihretu et al., 2019); ii) a qualitative study on the conceptualization and domains relevant to problematic khat use (Mihretu et al., 2020); iii) a pilot phase after the development of the PKU screening tool; and iv) a full-scale validation study to assess the psychometric properties of the PKU screening tool (Mihretu et al., 2022). All methods were conducted with the recommended rigor and are described in detail in the respective publications. The current study focused on the development and initial psychometric evaluation of the scale using qualitative, cross-sectional, and cohort study designs. Our approach to scale development was therefore both inductive and deductive.

Setting: The study was conducted in Gurage zone, in the south-central part of Ethiopia. Farming is the main occupation in rural areas, while petty trading is common in urban areas. The staple food is kocho, which is made from false banana. The zone is a known khat-producing area and home to many individuals who use khat (Alem et al., 1999). The study setting is described in detail in a previous publication (Mihretu et al., 2020).

Participants, sample size determination, and sampling

The target population of this pilot study was all adults (18 years and above) who had been living in Wolkitie town and Kebena woreda, Gurage zone, for at least six months. A sampling frame was obtained from health posts in the kebele (the smallest administrative unit in the study setting). Two-stage random sampling was used: households were first randomly selected, and then one person who uses khat was randomly selected in households with more than one such person. When a selected household had no member who used khat, the next household was selected. A total of 352 individuals who use khat (one per household) participated in the pilot cross-sectional study. During data collection, almost half of the visited households were not eligible, but only 8 households declined our invitation to take part in the study. A sub-sample of 65 participants was selected to determine test-retest reliability. Sample size was determined in line with recommendations for factor analysis and IRT (i.e., 5 to 10 participants per item in the scale) (Schumacker & Lomax, 2004). Accordingly, the sample size for this study was about 6 times the size of the initial item pool and about 20 times the number of items in the final scale.
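The participants-per-item ratios behind the 5-to-10 rule can be checked with simple arithmetic. The sketch below uses the pilot sample (352) and item counts reported in this paper: 54 items before the pretest and 50 in the pilot pool (Table 1), and 17 in the final scale. The helper names are illustrative only, not part of the study's analysis.

```python
N_PILOT = 352  # pilot participants reported above

# Item counts at different stages (Table 1 and the final scale)
ITEM_COUNTS = {
    "pool before pretest": 54,  # after the second expert panel
    "pilot pool": 50,           # after 4 items were dropped at pretest
    "final scale": 17,
}

def participants_per_item(n_participants: int, n_items: int) -> float:
    """Ratio of respondents to items."""
    return n_participants / n_items

def meets_ratio_rule(n_participants: int, n_items: int, minimum: float = 5.0) -> bool:
    """True if the sample has at least `minimum` respondents per item."""
    return participants_per_item(n_participants, n_items) >= minimum

for label, k in ITEM_COUNTS.items():
    ratio = participants_per_item(N_PILOT, k)
    print(f"{label}: {k} items, {ratio:.1f} per item, rule met: {meets_ratio_rule(N_PILOT, k)}")
```

Depending on which version of the pool is counted, the ratio is roughly 6.5 to 7 participants per item, inside the 5-to-10 range, and about 20.7 for the 17-item final scale.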

Process of the development of the screening tool and Item reduction

Figure 1.
Process of problematic khat use screening tool development

The item selection criteria were: 1) high discrimination and location parameters representing a wide range of the latent trait (IRT); 2) factor loading; 3) adequacy of descriptive findings against the criteria set in Figure 1, such as reliability, item-scale correlation, item-item correlation, mean, and standard deviation; and 4) salience to the study participants or experts (clinical and epidemiological utility). Items that performed well across all criteria were considered for the scale.

Items were also chosen on the basis of face validity, clinical relevance, and coverage of relevant conceptual domains (i.e., khat use, khat dependence, and adverse consequences of khat use). Finally, special attention in item selection was given to gender and rural-urban appropriateness.
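The statistical part of this screening (Table 1 reports retaining items that met more than one statistical criterion) can be sketched as a count of criteria met. The cutoffs below (ISC ≥ 0.30, loading ≥ 0.40, ICC ≥ 0.30) and the names `CUTOFFS`, `criteria_met`, and `retain` are hypothetical; the study's actual adequacy criteria are given in Figure 1. The item statistics are taken from Table 3.

```python
# Hypothetical cutoffs for illustration; the study's actual criteria are in Figure 1.
CUTOFFS = {"isc": 0.30, "loading": 0.40, "icc": 0.30}

# Item-scale correlation (ISC), factor loading, and test-retest ICC from Table 3
ITEMS = {
    "pk1":  {"isc": 0.55, "loading": 0.54, "icc": 0.05},
    "pk22": {"isc": 0.70, "loading": 0.76, "icc": 0.52},
    "pk33": {"isc": 0.52, "loading": 0.45, "icc": 0.40},
    "pk47": {"isc": 0.55, "loading": 0.52, "icc": 0.05},
}

def criteria_met(stats, cutoffs=CUTOFFS):
    """Count how many statistical criteria an item satisfies."""
    return sum(stats[name] >= threshold for name, threshold in cutoffs.items())

def retain(stats, cutoffs=CUTOFFS, min_criteria=2):
    """Keep an item that satisfies more than one criterion (at least 2 here)."""
    return criteria_met(stats, cutoffs) >= min_criteria

for code, stats in ITEMS.items():
    print(code, criteria_met(stats), retain(stats))
```

A count-based rule lets an item with weak test-retest agreement (such as pk1, ICC 0.05) survive on its other strengths, which matches the later observation that ICC was low for some retained items.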

Data collection methods and procedures

Before the pilot survey, we interviewed participants to list all potential items that constitute problematic khat use; all interview data were written down and recorded. The interview findings were presented to a panel of experts for evaluation. Cognitive interviewing with individuals who use khat was then conducted to determine whether the items covered the relevant issues and were understandable, and to identify possible ways of rephrasing them. The final item pool, in Likert-scale format, was administered to individuals who use khat. Data were collected at the respondents’ homes by trained data collectors supervised by the principal investigator.

Data analysis

Expert consensus meetings and cognitive interviews were audio-recorded, and a note-taker documented all the data. The analysis was conducted at the item level, and items with the highest consensus were included in the next version of the scale. We used descriptive statistics (mean, standard deviation, inter-item and item-total correlations, intraclass correlation, and Cronbach’s alpha) and multivariate statistics (exploratory factor analysis and item response theory) to refine the scale and select the items to be included in the final scale. For the exploratory factor analysis, we chose maximum likelihood factoring (Fabrigar et al., 1999) and varimax rotation (Nunnally & Bernstein, 1994), considering the data distribution. We used item response theory (IRT), specifically a 2-parameter model (estimated as a graded response model for the ordered response categories; Table 4), to examine the relationship between a participant’s position on the latent trait and the probability of endorsing each item [18]. The first parameter, “difficulty” (also called location or severity), is the level of the latent trait at which a respondent has a 50% probability of endorsing the item or, for ordered categories, of responding in or above a given category. The second parameter, “discrimination” (slope), reflects the item’s ability to distinguish among people at different levels of the latent trait. The IRT assumptions of unidimensionality, local independence, and monotonicity were examined (Wirth & Edwards, 2007). We used STATA 16 (Stata Corporation, n.d.) and SPSS 23 with AMOS (Arbuckle, 2019) to analyze the data.
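To make the two parameters concrete, the sketch below computes response-category probabilities under Samejima's graded response model for the irritability item (PK22), using its estimates from Table 4. It assumes the logistic form without the 1.7 scaling constant, P(X ≥ k | θ) = 1 / (1 + exp(-α(θ − βk))), which is how Stata's `irt grm` is parameterized; category probabilities are differences between adjacent cumulative curves. Function names are illustrative.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def cumulative_probs(theta, a, thresholds):
    """P(X >= k | theta) for k = 1..m under the graded response model."""
    return [logistic(a * (theta - b)) for b in thresholds]

def category_probs(theta, a, thresholds):
    """P(X = k | theta) for k = 0..m, as differences of adjacent cumulative curves."""
    cum = [1.0] + cumulative_probs(theta, a, thresholds) + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Irritability when not chewing (pk22): slope and thresholds from Table 4
A_PK22 = 2.44
B_PK22 = [-0.51, -0.07, 0.50, 0.80]

for theta in (-1.0, 0.0, 1.0):
    probs = category_probs(theta, A_PK22, B_PK22)
    print(theta, [round(p, 3) for p in probs])
```

At θ equal to a threshold βk, the probability of responding in category k or higher is exactly 0.5, which is the "difficulty" interpretation given above; a larger α makes these cumulative curves steeper.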

Ethical considerations

The Institutional Review Board (IRB) of the College of Health Sciences, Addis Ababa University (Ref 008/18/Psy) provided us ethical clearance. Written informed consent was sought and obtained from all the participants before data collection.

Item generation, selection, and refinement

Table 1.
Process of item pool generation, item selection, and refinement
Method Results 
Decision about the theoretical framework A systematic review and qualitative study -7 domains: khat-related problems, khat dependency, adverse psychological reactions, the pattern of use, reasons for khat use, khat use with other psychoactive substances, and the context of khat use. 
Categorization and decision on the concepts related to problematic khat use Experts panel -Retain 6 domains. -Khat use with other psychoactive substances was dropped because it reflects polysubstance use rather than an independent domain of problematic khat use 
Item generation Item selection -Item listing -From other problematic substance use screening tools -54 items -Separate file 1 
Experts panel -Reduced to 45 items -Experts recommended a 3-month time frame -Experts recommended a uniform response category for simplicity 
Preliminary content and face validity of the items Cognitive interviewing -46 items -5 unclear items were dropped, and another 6 were added -Amount of khat was found inappropriate, and a proxy measure (duration of the khat session) was suggested 
Experts panel -54 items after separating and restating double-barrelled items -Items measuring non-problematic khat use, such as “How often do you chew khat for recreation?”, were dropped. 
Pretest -4 items endorsed by few participants (<2%) were dropped, e.g., abdominal pain, headache, and staying home when not chewing khat. 
Item analysis to inform refinement and selection Pilot study -20 items meeting more than one statistical criterion (mean, IIC, factor loading, ISC, ICC, difficulty, and severity) were retained. 
Final item refinement and scale evaluation Experts panel and cognitive interviewing -Finally, 17 items were retained after evaluation against the mean, IIC, factor loading, ISC, ICC, difficulty, and severity. -Item 6, “How much do you believe that you are addicted to khat use?”, was dropped because experts commented that it reflects a broader concept than a single item can capture. -Experts added a new item, “How often do you use khat in a risky situation?”, but it was not found valid by the participants 

Content validity

Experts evaluated each item for clarity and content relevance. The initial pool comprised 50 items falling under seven themes of problematic khat use: 1) khat-related problems; 2) khat dependency; 3) adverse psychological reactions; 4) pattern of use (frequency, duration of session, amount of khat, time of use); 5) reasons for khat use; 6) khat use with other psychoactive substances; and 7) context of khat use.

Face validity and scale refinement

Based on the cognitive interviewing and pre-pilot, we improved the readability of the scale and made its writing style, layout, and formatting more consistent and clear. We dropped items that did not seem acceptable, applicable, or usable to the participants (Table 1).

The mean age of participants was 33 years. Only 41 (11.6%) participants were female, and 226 (64.2%) were urban residents. The mean age at onset of khat use was 20 years. Participants had used khat for an average of ten consecutive years (minimum one year, maximum 55 years). The average daily amount of khat cost 39.5 ETB (about 1 USD) at the time of data collection. On average, participants reported feeling high (a state called mirqanna) for about 3 hours. Additional information is presented in Table 2.

Table 2.
Socio-demographic and general khat use characteristics of pilot study participants
Characteristics Frequency Percentage 
Mean age in years (±SD) 32.66 (±12.2) 
Gender Male 311 88.4 
Female 41 11.6 
Residence Urban 226 64.2 
Rural 124 35.2 
Marital status Single 154 43.8 
Married 193 54.8 
Widowed or divorced 1.4 
Education Cannot read and write 46 13.1 
Read and write only 30 8.5 
Primary 129 36.6 
Secondary 74 21.0 
College and above 67 19.0 
Average monthly income (±SD) 2077 (±1909) 
Religion Orthodox 95 27.0 
Muslim 241 68.5 
Protestant 10 2.8 
Catholic 1.7 
Employment status House-wife 2.3 
Farmer 97 27.6 
Own business 35 9.9 
Student 35 9.9 
Employee 57 16.2 
Daily worker 113 32.1 
Others 2.0 
Relative wealth Low 184 52.3 
Medium 153 43.5 
High 15 4.3 
With whom you chew With others  245  69.6 
Alone 107 30.4 
Family history of chewing Yes 212 60.2 
No 134 38.1 
Mean age at onset of khat use (±SD) 20.27 (±5.8) 
Mean consecutive years of khat use (±SD) 10.28 (±9.39) 
Average daily cost of khat, ETB (USD) 39.5 (1) 
Hours feeling high/mirqanna (±SD) 3 (±1.6) 

Descriptive statistics and CTT analysis reports

Table 3.
Descriptive statistics and CTT analysis reports
Items Item code Mean SD IIC (number of items with inter-item r > 0.30 and < 0.90) ISC ICC Alpha Factor loading 
In the past three months, how often have you chewed khat? pk1 3.35 1.13 16 0.55 0.05 .924 .54 
Marked diminished effect with continued use of the same amount of khat pk50 1.63 1.53 24 0.62 0.18 .924 .61 
Restlessness pk49 1.36 1.55 25 0.66 0.13 .924 .62 
Increased amount of khat over time pk40 1.45 1.43 22 0.6 0.42 .924 .55 
During the past three months, how often have you experienced distressing emotional or behavioral symptoms when you didn’t chew khat? pk18 1.49 1.63 18 0.62 0.55 .922 .65 
During the past three months, how often have you felt depressed when you didn’t chew khat? pk21 2.23 1.56 20 0.54 0.17 .926 .77 
During the past three months, how often have you experienced irritability when you didn’t chew khat? pk22 1.79 1.57 19 0.7 0.52 .922 .76 
During the past three months, how often have you experienced vivid, unpleasant dreams when you didn’t chew khat? pk23 1.60 1.68 16 0.6 0.45 .923 .65 
During the past three months, how often have you experienced tear falling that blurred your vision when you didn’t chew khat? pk26 1.55 1.56 21 0.67 0.44 .923 .72 
During the past three months, how often have you experienced frequent yawning when you didn’t chew khat? pk27 2.13 1.59 17 0.62 0.35 .922 .67 
During the past three months, how often have your motivation to work reduced when you didn’t chew khat? pk28 1.79 1.59 16 0.62 0.44 .923 .63 
During the past three months, how often have you experienced fatigue or reduced energy when you didn’t chew khat? pk30 1.45 1.55 15 0.55 0.22 .924 .58 
During the past three months, how often have you had a strong craving to chew khat? pk32 1.99 1.52 23 0.63 0.4 .926 .68 
During the past three months, how often have you chewed khat to treat distressing or dysphoric experiences because you didn’t chew khat? pk5 2.32 1.5 21 0.68 0.25 .922 .7 
During the past three months, when you chewed khat, on average, how much time have you spent chewing khat without doing what was normally expected of you? pk4 2.66 .92 19 0.57 0.23 .92 .51 
During the past three months, how much have khat led to financial problems? pk33 1.66 1.62 13 0.52 0.4 .923 .45 
During the past three months, to what extent have you failed to do what was normally expected of you because of your use of khat? pk47 1.07 1.61 16 0.55 0.05 .924 .52 

High mean scores were observed for the items about frequency of khat use (pk1, X̅ = 3.35), hours spent chewing khat (pk4, X̅ = 2.66), chewing khat to relieve distress (pk5, X̅ = 2.32), and depressed mood when khat is withdrawn (pk21, X̅ = 2.23). Item-scale (item-total) correlations ranged from 0.13 to 0.7. Eleven items had item-item correlations below 0.3. Item-item correlation findings are presented in Table 3. For all 17 items in the final scale, the item-scale correlations (0.52 to 0.7), Cronbach’s alpha (about 0.92), and EFA factor loadings (0.45 to 0.77) were within the acceptable range. In contrast, test-retest reliability was low for some items (ICC < 0.3). Descriptive statistics and CTT analysis reports for all 50 items are provided in separate file 2.
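For readers reproducing the CTT statistics, the sketch below shows one common way to compute Cronbach's alpha and the corrected item-total correlation (each item against the sum of the remaining items). The response matrix is hypothetical, not study data, and whether Table 3's ISC is the corrected or uncorrected version is not stated here, so treat the correction as an assumption.

```python
import math
import statistics

def cronbach_alpha(rows):
    """Cronbach's alpha; rows are respondents, columns are items."""
    k = len(rows[0])
    columns = list(zip(*rows))
    item_variance_sum = sum(statistics.variance(col) for col in columns)
    total_variance = statistics.variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def corrected_item_total(rows, j):
    """Correlation of item j with the sum of the other items."""
    item = [row[j] for row in rows]
    rest = [sum(row) - row[j] for row in rows]
    return pearson(item, rest)

# Hypothetical 0-4 Likert responses (6 respondents x 4 items), not study data
RESPONSES = [
    [0, 0, 1, 0],
    [1, 1, 1, 2],
    [2, 2, 1, 2],
    [3, 2, 3, 3],
    [4, 4, 3, 4],
    [4, 3, 4, 4],
]

print(round(cronbach_alpha(RESPONSES), 3))  # 0.969
print([round(corrected_item_total(RESPONSES, j), 2) for j in range(4)])
```

Using sample variances throughout keeps alpha unchanged whether n or n - 1 is the divisor, because only the ratio of item to total variance enters the formula.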

Dimensionality

An exploratory factor analysis (EFA) was carried out. The Kaiser-Meyer-Olkin (KMO) measure was 0.94, indicating adequate sampling, and Bartlett’s test of sphericity was significant, indicating sufficient correlation among the items for factor analysis. The EFA, using maximum likelihood factoring and varimax rotation, showed a high ratio of the first eigenvalue (7.3) to the second (1.2), about 6 to 1.

Initially, two factors had eigenvalues greater than 1. The first factor accounted for about 43% of the variance, more than 20 percentage points above the second factor. Only one item, increased amount over time (pk40), loaded on both factors, and its two loadings were not significantly different. The scree plot suggested a one-factor solution. Considering several criteria (the scree test, Kaiser’s eigenvalue-greater-than-one rule, and parallel analysis), we settled on a unidimensional solution, labelled problematic khat use.
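Parallel analysis, one of the criteria above, compares observed eigenvalues with those expected from random data of the same size. The sketch below is a minimal version of Horn's method on principal-component eigenvalues of the correlation matrix, applied to simulated one-factor data; the simulation settings (loading 0.7, seeds, 200 replications, 95th percentile) are assumptions, and the study's software may implement the procedure differently. The final line checks the 43% figure, assuming the EFA was run on the 17 retained items: a first eigenvalue of 7.3 out of a total of 17 is about 43% of the variance.

```python
import numpy as np

def sorted_eigenvalues(data):
    """Eigenvalues of the item correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]

def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Horn's parallel analysis: retain leading factors whose observed
    eigenvalue exceeds the chosen percentile of random-data eigenvalues."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = sorted_eigenvalues(data)
    simulated = np.array([sorted_eigenvalues(rng.standard_normal((n, p)))
                          for _ in range(n_sims)])
    reference = np.percentile(simulated, percentile, axis=0)
    n_factors = 0
    for obs, ref in zip(observed, reference):
        if obs <= ref:
            break
        n_factors += 1
    return n_factors, observed, reference

# Simulated one-factor data (hypothetical, not the study data):
# 352 respondents, 17 items, each loading 0.7 on a single latent trait.
rng = np.random.default_rng(1)
n_resp, n_items, loading = 352, 17, 0.7
trait = rng.standard_normal((n_resp, 1))
noise = rng.standard_normal((n_resp, n_items))
data = loading * trait + np.sqrt(1 - loading**2) * noise

n_factors, observed, reference = parallel_analysis(data)
print(n_factors)                    # factors retained
print(observed[0] / observed[1])    # first-to-second eigenvalue ratio
print(7.3 / 17)                     # share of variance for eigenvalue 7.3 with 17 items
```

Unlike the eigenvalue-greater-than-one rule, which flagged a second factor here, parallel analysis only retains factors that beat what sampling noise alone would produce.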

Figure 2.
Scree Plot

Item response theory (IRT)

Table 4.
Slope (α) and category threshold parameters (β) estimated by the Graded Response Model
Items Item code Slope α Threshold β1 Threshold β2 Threshold β3 Threshold β4 
Frequency pk1 1.69 -2.88 -1.7 -0.90 -0.75 
Absence of feeling high for reduced amount pk50 1.7 -0.57 0.15 0.63 1.42 
Restlessness pk49 1.87 -0.26 0.35 0.89 1.52 
Increased amount over time pk40 1.31 -0.7 0.37 1.24 1.77 
Emotional or behavioral symptoms when not chewing pk18 2.21 -0.26 0.33 0.66 1.19 
Depressed mood pk21 2.35 -0.99 -0.42 0.20 0.62 
Irritability pk22 2.44 -0.51 -0.07 0.5 0.8 
Vivid unpleasant dreams-dukak pk23 1.93 -0.22 0.15 0.61 .93 
Tear falling pk26 2.27 -0.3 0.14 0.61 .98 
Yawning pk27 2.01 -0.93 -0.42 0.16 .7 
Reduced motivation pk28 1.78 -0.51 -0.11 0.43 1.04 
Fatigue or reduced energy pk30 1.55 -0.14 0.26 0.77 1.42 
Craving-harara pk32 1.84 -0.91 -0.26 0.34 .86 
Chewing for self-treatment pk5 1.87 -1.26 -0.47 0.003 .62 
Time spent pk4 1.46 -4.36 -2.72 0.02 1.31 
Financial problems pk33 1.39 -0.64 0.03 0.73 1.57 
Chewing to do routines pk47 1.64 0.25 0.77 1.32 1.84 

Note: The higher the slope parameter α, the better the item discriminates between respondents at different levels of the latent trait. Each between-category threshold parameter β is the point on the latent trait at which the probability of responding in that category or a higher one reaches 0.5. The wider the spread of the threshold parameters β (from negative to positive), the wider the range of the latent trait over which the item is informative.

Next, we fit the 2-parameter IRT model to the data. Discrimination (slope) parameters ranged from 1.31 (increased amount over time, PK40) to 2.44 (irritability when khat is withdrawn, PK22), indicating that each item discriminated well between less and more problematic khat use. The irritability item (PK22) showed the highest discrimination. Items on depressed mood (PK21), tear falling (PK26), emotional or behavioral symptoms (PK18), and yawning (PK27) when khat is withdrawn also discriminated highly (α > 2.0).

The item about chewing khat to do routines (PK47) was the most difficult (first-category threshold β1 = 0.25); it was the only item with a positive first threshold. Fatigue or reduced energy (PK30), vivid unpleasant dreams (dukak, PK23), restlessness (PK49), emotional or behavioral symptoms when not chewing (PK18), and tear falling/blurred vision (PK26) were also relatively difficult compared with the other items (β1 > -0.5). The discrimination estimate of 2.44 for irritability (PK22) means that, in the logistic metric, a one-standard-deviation increase in the latent trait (problematic khat use) raises the log-odds of endorsing irritability at or above a given category by 2.44, which multiplies those odds by about e^2.44 ≈ 11.5.

The severity (difficulty) parameters, calibrated to the observed sample, give the point above or below the mean of the latent trait (in standard deviations) at which a respondent has a 50% probability of endorsing that response category or a higher one. In this sample, the first-category thresholds are at or below 0.0 for all items except chewing to do routines, so participants at the average level of the latent trait, and those somewhat below it, have at least a 50% probability of endorsing these items above their lowest category. For the frequency of khat use item (PK1), all four thresholds are negative, so even a participant at the mean of the latent trait has more than a 50% probability of responding in the highest category; this makes it the least difficult item. Irritability (PK22), depressed mood (PK21), tear falling/blurred vision (PK26), emotional or behavioral symptoms when not chewing (PK18), restlessness (PK49), yawning (PK27), chewing for self-treatment (PK5), vivid unpleasant dreams (dukak, PK23), craving (harara, PK32), and reduced motivation (PK28) provided more information than the other items, as shown by the item information functions (Figure 3).
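The slope interpretation and the threshold statements in the paragraphs above can be reproduced from the table with a few lines of arithmetic. This sketch assumes the logistic graded response parameterization without the 1.7 scaling constant; under a different metric the odds ratio would differ. The dictionary holds the first thresholds (β1) transcribed from the table, and all names are illustrative.

```python
import math


def boundary_prob(theta, a, b):
    """Logistic probability of responding at or above the category that threshold b opens."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_odds(p):
    return math.log(p / (1.0 - p))


# Irritability (PK22): a one-unit (one-SD) increase in theta adds `a` to the log-odds,
# so the odds are multiplied by exp(a), not by a itself
a_pk22, b1_pk22 = 2.44, -0.51
shift = log_odds(boundary_prob(1.0, a_pk22, b1_pk22)) - log_odds(boundary_prob(0.0, a_pk22, b1_pk22))
odds_ratio = math.exp(shift)
print(f"log-odds shift = {shift:.2f}, odds ratio = {odds_ratio:.2f}")

# First thresholds (beta_1) transcribed from the table above
FIRST_THRESHOLDS = {
    "PK1": -2.88, "PK50": -0.57, "PK49": -0.26, "PK40": -0.70, "PK18": -0.26,
    "PK21": -0.99, "PK22": -0.51, "PK23": -0.22, "PK26": -0.30, "PK27": -0.93,
    "PK28": -0.51, "PK30": -0.14, "PK32": -0.91, "PK5": -1.26, "PK4": -4.36,
    "PK33": -0.64, "PK47": 0.25,
}
positive_first = [code for code, b in FIRST_THRESHOLDS.items() if b > 0]
relatively_difficult = [code for code, b in FIRST_THRESHOLDS.items() if -0.5 < b <= 0]
print("positive first threshold:", positive_first)
print("first threshold between -0.5 and 0:", relatively_difficult)

# Frequency item (PK1): even its highest threshold (-0.75) is below the mean
p_top_at_mean = boundary_prob(0.0, 1.69, -0.75)
print(f"PK1: P(highest category | theta = 0) = {p_top_at_mean:.2f}")
```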

Figure 3.
Item information function

The test information function (TIF; Figure 4) shows that the test is most informative for individuals with moderate levels of the latent trait and that information declines steeply at both extremes. All IRT analysis reports and graphs are presented in separate file 3.
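The shape of the TIF can be approximated from the tabulated parameters. The sketch below assumes a logistic graded response model and uses Samejima's item information, I(θ) = Σ_k (dP_k/dθ)² / P_k, summed over items to give the TIF, with 1/√I(θ) as the standard error. It is an illustration built from the rounded estimates in the table, not a reproduction of the authors' software output, so its numbers may differ somewhat from Figure 4.

```python
import math

# (slope, [beta_1, beta_2, beta_3, beta_4]) for each item, transcribed from the table above
ITEMS = {
    "PK1": (1.69, [-2.88, -1.70, -0.90, -0.75]),
    "PK50": (1.70, [-0.57, 0.15, 0.63, 1.42]),
    "PK49": (1.87, [-0.26, 0.35, 0.89, 1.52]),
    "PK40": (1.31, [-0.70, 0.37, 1.24, 1.77]),
    "PK18": (2.21, [-0.26, 0.33, 0.66, 1.19]),
    "PK21": (2.35, [-0.99, -0.42, 0.20, 0.62]),
    "PK22": (2.44, [-0.51, -0.07, 0.50, 0.80]),
    "PK23": (1.93, [-0.22, 0.15, 0.61, 0.93]),
    "PK26": (2.27, [-0.30, 0.14, 0.61, 0.98]),
    "PK27": (2.01, [-0.93, -0.42, 0.16, 0.70]),
    "PK28": (1.78, [-0.51, -0.11, 0.43, 1.04]),
    "PK30": (1.55, [-0.14, 0.26, 0.77, 1.42]),
    "PK32": (1.84, [-0.91, -0.26, 0.34, 0.86]),
    "PK5": (1.87, [-1.26, -0.47, 0.003, 0.62]),
    "PK4": (1.46, [-4.36, -2.72, 0.02, 1.31]),
    "PK33": (1.39, [-0.64, 0.03, 0.73, 1.57]),
    "PK47": (1.64, [0.25, 0.77, 1.32, 1.84]),
}


def boundary_prob(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, thresholds):
    """Samejima's information for one graded item: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cum = [1.0] + [boundary_prob(theta, a, b) for b in thresholds] + [0.0]
    # Derivative of each cumulative curve is a * P*(1 - P*); it is 0 at the fixed end points
    dcum = [a * p * (1.0 - p) for p in cum]
    info = 0.0
    for k in range(len(thresholds) + 1):
        p_k = cum[k] - cum[k + 1]
        dp_k = dcum[k] - dcum[k + 1]
        if p_k > 0:
            info += dp_k ** 2 / p_k
    return info


def total_information(theta):
    """Test information: the sum of item information over all 17 items."""
    return sum(item_information(theta, a, bs) for a, bs in ITEMS.values())


for theta in (-4, -3, -2, -1, 0, 1, 2, 3, 4):
    info = total_information(theta)
    print(f"theta = {theta:+d}: information = {info:6.2f}, SE = {1.0 / math.sqrt(info):.2f}")
```

For a single dichotomous item the formula reduces to the familiar 2PL information a²P(1 - P), which is a useful sanity check on the graded version.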

Figure 4.
Test information function

No non-uniform DIF was found for sex, residence, or the mean-split age category (p > 0.05), except for one item (reduced motivation), which was flagged for DIF by age category (p < 0.05). No uniform DIF was found for sex (p > 0.05). Uniform DIF was found for five items by residence and by mean-split age category (p < 0.05). The items about reduced motivation, amount of time spent chewing khat, fatigue or reduced energy, and increased amount of khat over time were flagged for both residence and age category. The non-uniform and uniform DIF reports are presented in separate file 4.
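The DIF procedure is not described in this section, so the following is only a conceptual sketch of the two DIF types for a single dichotomous item under a 2-parameter logistic model, with hypothetical group parameters rather than estimates from this study. Uniform DIF means the two groups' log-odds differ by the same amount at every trait level (same slope, shifted threshold); non-uniform DIF means the gap changes with trait level (different slopes), which corresponds to a group-by-trait interaction in logistic-regression DIF.

```python
def logit_gap(theta, item_g1, item_g2):
    """Difference in log-odds of endorsement between two groups at trait level theta (2PL)."""
    (a1, b1), (a2, b2) = item_g1, item_g2
    return a1 * (theta - b1) - a2 * (theta - b2)


def dif_type(item_g1, item_g2, grid, tol=1e-9):
    """Label the DIF pattern: 'none', 'uniform' (constant gap) or 'non-uniform' (gap varies)."""
    gaps = [logit_gap(t, item_g1, item_g2) for t in grid]
    if max(abs(g) for g in gaps) < tol:
        return "none"
    if max(gaps) - min(gaps) < tol:
        return "uniform"
    return "non-uniform"


GRID = [x / 10 for x in range(-40, 41)]  # trait levels from -4 to 4

# Hypothetical (slope, threshold) pairs for group 1 and group 2; not study estimates
print(dif_type((1.8, -0.5), (1.8, 0.0), GRID))  # same slope, shifted threshold
print(dif_type((2.4, 0.0), (1.2, 0.0), GRID))   # different slopes, curves cross at theta = 0
print(dif_type((1.8, 0.0), (1.8, 0.0), GRID))   # identical parameters
```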

Through a systematic review, a qualitative study, an expert panel, and item listing, we first generated a pool of items to measure problematic khat use. Item reduction then drew on several criteria, including CTT statistics (inter-item and item-total correlations, Cronbach’s alpha, and exploratory factor analysis) and the IRT results of the pilot study. Finally, we developed a 17-item Problematic Khat Use Screening Test (PKUST-17).

Like other measures of problematic substance use, the current problematic khat use screening test has a unidimensional factor solution (Budney, 2006). However, the proportion of variance accounted for by the factor (43%) was lower than for measures of other substances: other studies reported 67.4% to 78.6% of variance explained for cannabis, alcohol, stimulant, and cocaine dependence syndromes (Budney, 2006). The lower proportion for the problematic khat use screening tool could reflect the novelty of the test, its slightly broader content (unlike many other measures, it is not limited to dependence), and its focus on the general population rather than a clinical population. The unidimensional solution is consistent with studies that strongly support the DSM-5 approach of diagnosing a single substance use disorder that combines abuse and dependence criteria (Hasin et al., 2013).
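The variance figures above can be related to factor loadings. For standardized items, the proportion of total variance a single factor accounts for is the sum of squared loadings divided by the number of items. The sketch below inverts that relation under the simplifying, hypothetical assumption that all loadings are equal, to show what average loading the reported 43% implies and how it compares with the lowest and highest figures (67.4% and 78.6%) cited for other substances.

```python
import math


def proportion_of_variance(loadings):
    """Share of total standardized variance explained by one factor: sum of squared loadings / items."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)


def equal_loading_for(proportion):
    """Loading that, if shared by every item, yields the given proportion of variance."""
    return math.sqrt(proportion)


N_ITEMS = 17  # PKUST-17

# 43% of the variance of 17 standardized items is a sum of squared loadings of 0.43 * 17
print(f"PKUST-17: sum of squared loadings {0.43 * N_ITEMS:.2f}, "
      f"equal loading about {equal_loading_for(0.43):.2f}")

# The equal-loading value depends only on the proportion, not on the number of items
for proportion in (0.674, 0.786):
    print(f"{proportion:.1%} of variance -> equal loading about {equal_loading_for(proportion):.2f}")
```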

All but one item showed high or very high discrimination. According to Baker (2001), slopes of 1.35–1.69 indicate high and slopes of 1.70 and above very high discrimination; the exception, increased amount over time (PK40, α = 1.31), falls just below the high cut-off, in Baker's moderate range (0.65–1.34). The TIF (Figure 4) shows that the scale is most informative in the moderate range of the latent trait and that information declines steeply at both extremes. Information is fairly uniform between -2 and 2 and drops fairly quickly outside that range (Reise et al., 2005). In this region, IRT scale scores have a standard error near 0.35, which corresponds to a reliability of roughly 0.9 (1 − 0.35² ≈ 0.88) (Reise et al., 2005). According to Embretson and Reise (2000), information refers to the precision with which an item measures the latent trait at a particular level. Thus, the TIF indicates that the problematic khat use screening test is most precise at moderate trait levels and provides less information for individuals with latent trait estimates beyond ±3, which implies that the screening tool may be more useful in the general population and in primary health care settings than in clinical populations.
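Baker (2001) labels slopes of 0.01–0.34 very low, 0.35–0.64 low, 0.65–1.34 moderate, 1.35–1.69 high, and 1.70 and above very high. The sketch below applies those bands to the slope column of the table and also shows the approximate standard-error-to-reliability conversion used above, assuming a standardized latent trait (variance 1). The function and variable names are illustrative.

```python
from collections import Counter

# Baker (2001) verbal labels for slope values, checked from the top band down
BAKER_BANDS = [
    (1.70, "very high"),
    (1.35, "high"),
    (0.65, "moderate"),
    (0.35, "low"),
    (0.01, "very low"),
]


def baker_label(a):
    for lower_bound, label in BAKER_BANDS:
        if a >= lower_bound:
            return label
    return "none"


def reliability_from_se(se, trait_variance=1.0):
    """Approximate reliability 1 - SE^2 / var(theta); theta is standardized by default."""
    return 1.0 - se ** 2 / trait_variance


# Slope estimates transcribed from the table above
SLOPES = {
    "PK1": 1.69, "PK50": 1.70, "PK49": 1.87, "PK40": 1.31, "PK18": 2.21,
    "PK21": 2.35, "PK22": 2.44, "PK23": 1.93, "PK26": 2.27, "PK27": 2.01,
    "PK28": 1.78, "PK30": 1.55, "PK32": 1.84, "PK5": 1.87, "PK4": 1.46,
    "PK33": 1.39, "PK47": 1.64,
}

counts = dict(Counter(baker_label(a) for a in SLOPES.values()))
below_high = [code for code, a in SLOPES.items() if baker_label(a) not in ("high", "very high")]
print("labels:", counts)
print("below Baker's high band:", below_high)
print(f"SE 0.35 -> reliability about {reliability_from_se(0.35):.2f}")
```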

The problematic khat use screening tool we developed differs from many other measures of problematic substance use in general, and of problematic khat use in particular, mainly in not treating social harms and impaired control as essential criteria (APA, 2013). We attribute this to the largely sociocultural and functional reasons for khat use in the study setting, which limit social harm. We dropped items about guilt or remorse after chewing, others’ concern, and hazardous use, which are essential items in other measures such as the Alcohol Use Disorders Identification Test (AUDIT; Saunders et al., 1993). In the study setting, khat use for the reasons described above is not ego-dystonic and is customary within some sociocultural frameworks (Mihretu et al., 2020). The study found that withdrawal experiences were statistically significant indicators of problematic khat use. These withdrawal criteria resemble those for stimulant withdrawal and some of those for cannabis or opioid withdrawal (APA, 2013). Our findings suggest that problematic khat use is defined mainly by khat dependence, although the scale also includes items on frequency of khat use, financial harms, and time spent chewing khat.

Amount of use is an essential indicator for many substances, but in the study setting no precise unit for measuring the amount of khat proved valid. Length of the khat-chewing session could serve as an alternative proxy for an increased amount of khat over time. Using session length rather than amount of khat could also help reduce stigma (Rehm et al., 2013).

We found frequency of khat use to be the most important indicator of problematic khat use, but the frequency threshold needs to be established by future studies. All items are also simple, brief, and easy to understand; they can be self-administered and do not require specific administration training.

The study has several strengths. We used both inductive and deductive approaches to develop the scale and refine its items, which overcomes a limitation of many previous studies on measuring problematic khat use that relied on a deductive approach alone. Most existing instruments were also constructed using classical test theory, which has psychometric limitations that bear directly on severity measurement. According to Reise et al. (2005), factor analysis, for example, does not use the full power of modern scaling techniques that take item properties (e.g., item-level severity) into account. Thus, although factor-analytic methods alone can produce scales with high internal consistency, they do not allow one to build instruments that discriminate among individuals along selected ranges of an underlying trait. A limitation of the study is that differential item functioning analysis requires a larger sample than the current one to establish how the items truly function across participant characteristics.

We developed a 17-item Problematic Khat Use Screening Test (PKUST-17). The tool has items on frequency of khat use, amount of time spent chewing khat, increased amount of khat over time, financial problems due to khat use, and withdrawal experiences. Initial EFA found that the 17 items load onto a single factor. IRT suggested that all 17 items have acceptable difficulty (severity) and contribute useful information to the total score. According to the IRT test information curve, the screening tool should have good utility in the general population or in primary health care settings. Future studies should evaluate the psychometric properties (all domains of validity and reliability) of the final scale.

The authors would like to acknowledge the study participants and the African Mental Health Research Initiative (AMARI) of the DELTAS Africa Initiative.

This work was supported through the DELTAS Africa Initiative (DEL-15-01). The DELTAS Africa Initiative is an independent funding scheme of the African Academy of Sciences’ (AAS) Alliance for Accelerating Excellence in Science in Africa (AESA) and supported by the New Partnership for Africa’s Development Planning and Coordinating Agency (NEPAD Agency), with funding from the Wellcome Trust (DEL-15-01) and the UK Government. The views expressed in this publication are those of the authors and not necessarily those of AAS, NEPAD Agency, Wellcome Trust or the UK Government.

The authors have no competing interests.

All relevant data are analyzed and included in the manuscript; the full dataset is available from the authors upon request.

Alem, A., Kebede, D., & Kullgren, G. (1999). The prevalence and socio-demographic correlates of khat chewing in Butajira, Ethiopia. Acta Psychiatrica Scandinavica, 100, 84–91. https://doi.org/10.1111/j.1600-0447.1999.tb10699.x
APA. (2013). Diagnostic and statistical manual of mental disorders (DSM-5®). American Psychiatric Pub.
Arbuckle, J. (2019). IBM SPSS Amos 23 user’s guide. Amos Development Corporation.
Baker, F. B. (2001). The basics of item response theory. ERIC.
Billieux, J., Schimmenti, A., Khazaal, Y., Maurage, P., & Heeren, A. (2015). Are we overpathologizing everyday life? A tenable blueprint for behavioral addiction research. Journal of Behavioral Addictions, 4(3), 119–123. https://doi.org/10.1556/2006.4.2015.009
Budney, A. J. (2006). Are specific dependence criteria necessary for different substances: how can research on cannabis inform this issue? Addiction, 101, 125–133. https://doi.org/10.1111/j.1360-0443.2006.01582.x
Cox, G., & Rampes, H. (2003). Adverse effects of khat: a review. Advances in Psychiatric Treatment, 9(6), 456–463. https://doi.org/10.1192/apt.9.6.456
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302. https://doi.org/10.1037/h0040957
Dagne, E., Adugna, Y., Kebede, E., & Atilaw, Y. (2010). Determination of Levels of Cathine in Khat (Catha edulis) Leaves and its Detection in Urine of Khat Chewers: A Preliminary Report. In press.
Duresso, S. W., Matthews, A. J., Ferguson, S. G., & Bruno, R. (2016). Is khat use disorder a valid diagnostic entity? Addiction, 111(9), 1666–1676. https://doi.org/10.1111/add.13421
Duresso, S. W., Matthews, A. J., Ferguson, S. G., & Bruno, R. (2018). Using the Severity of Dependence Scale to screen for DSM-5 khat use disorder. Human Psychopharmacology: Clinical and Experimental, 33(2), e2653. https://doi.org/10.1002/hup.2653
Elmi, A. S. (1983). The chewing of khat in Somalia. Journal of Ethnopharmacology, 8(2), 163–176. https://doi.org/10.1016/0378-8741(83)90052-1
El-Setouhy, M., Alsanosy, R. M., Alsharqi, A., & Ismail, A. A. (2016). Khat Dependency and Psychophysical Symptoms among Chewers in Jazan Region, Kingdom of Saudi Arabia. BioMed Research International, 2016, 1–6. https://doi.org/10.1155/2016/2642506
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Lawrence Erlbaum Associates.
Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4(3), 272–299. https://doi.org/10.1037/1082-989x.4.3.272
Fekadu, A., Alem, A., & Hanlon, C. (2007). Alcohol and drug abuse in Ethiopia: Past, present and future. African Journal of Drug and Alcohol Studies, 61.
Gebrehanna, E., Berhane, Y., & Worku, A. (2014). Prevalence and predictors of harmful khat use among university students in Ethiopia. Substance Abuse: Research and Treatment, 8, 45–51. https://doi.org/10.4137/sart.s14413
Haile, D., & Lakew, Y. (2015). Khat chewing practice and associated factors among adults in Ethiopia: Further analysis using the 2011 demographic and health survey. PLoS ONE, 10(6), e0130460. https://doi.org/10.1371/journal.pone.0130460
Hasin, D. S., O’Brien, C. P., Auriacombe, M., Borges, G., Bucholz, K., Budney, A., Compton, W. M., Crowley, T., Ling, W., Petry, N. M., Schuckit, M., & Grant, B. F. (2013). DSM-5 criteria for substance use disorders: Recommendations and rationale. American Journal of Psychiatry, 170(8), 834–851. https://doi.org/10.1176/appi.ajp.2013.12060782
Higgins, J. P., & Green, S. (2011). Cochrane handbook for systematic reviews of interventions (Vol. 4). John Wiley & Sons.
Kalix, P. (1988). Khat: A plant with amphetamine effects. Journal of Substance Abuse Treatment, 5(3), 163–169. https://doi.org/10.1016/0740-5472(88)90005-0
Kaner, E. F. S., Dickinson, H. O., Beyer, F., Pienaar, E., Schlesinger, C., Campbell, F., Saunders, J. B., Burnand, B., & Heather, N. (2009). The effectiveness of brief alcohol interventions in primary care settings: A systematic review. Drug and Alcohol Review, 28(3), 301–323. https://doi.org/10.1111/j.1465-3362.2009.00071.x
Kassim, S., Islam, S., & Croucher, R. (2010). Validity and reliability of a Severity of Dependence Scale for khat (SDS-khat). Journal of Ethnopharmacology, 132(3), 570–577. https://doi.org/10.1016/j.jep.2010.09.009
López-Pelayo, H., Batalla, A., Balcells, M. M., Colom, J., & Gual, A. (2014). Assessment of cannabis use disorders: A systematic review of screening and diagnostic instruments. Psychological Medicine, 45(6), 1121–1133. https://doi.org/10.1017/s0033291714002463
Mihretu, A., Fekadu, A., Norton, S., Habtamu, K., & Teferra, S. (2022). Validation of the Problematic Khat Use Screening Test: A Cross-Sectional Study. European Addiction Research, 28(4), 275–286. https://doi.org/10.1159/000522618
Mihretu, A., Nhunzvi, C., Fekadu, A., Norton, S., & Teferra, S. (2019). Definition and Validity of the Construct “Problematic Khat Use”: A Systematic Review. European Addiction Research, 25(4), 161–172. https://doi.org/10.1159/000499970
Mihretu, A., Teferra, S., & Fekadu, A. (2017). What constitutes problematic khat use? An exploratory mixed methods study in Ethiopia. Substance Abuse Treatment, Prevention, and Policy, 12(1), 17. https://doi.org/10.1186/s13011-017-0100-y
Mihretu, A., et al. (2020). Exploring the concept of problematic khat use: A qualitative study in Gurage community, South-central Ethiopia. BMJ Open.
Mokkink, L. B., Terwee, C. B., Patrick, D. L., Alonso, J., Stratford, P. W., Knol, D. L., Bouter, L. M., & de Vet, H. C. W. (2010). The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. Journal of Clinical Epidemiology, 63(7), 737–745. https://doi.org/10.1016/j.jclinepi.2010.02.006
Nakajima, M., Dokam, A., Alsameai, A., AlSoofi, M., Khalil, N., & al’Absi, M. (2014). Severity of khat dependence among adult khat chewers: The moderating influence of gender and age. Journal of Ethnopharmacology, 155(3), 1467–1472. https://doi.org/10.1016/j.jep.2014.07.030
Numan, N. (2004). Exploration of adverse psychological symptoms in Yemeni khat users by the Symptoms Checklist-90 (SCL-90). Addiction, 99(1), 61–65. https://doi.org/10.1111/j.1360-0443.2004.00570.x
Nunnally, J. C., & Bernstein, I. (1994). Validity. Psychometric Theory, 3, 99–132.
Odenwald, M., Neuner, F., Schauer, M., Elbert, T., Catani, C., Lingenfelder, B., Hinkel, H., Häfner, H., & Rockstroh, B. (2005). Khat use as risk factor for psychotic disorders: A cross-sectional and case-control study in Somalia. BMC Medicine, 3(1), 5. https://doi.org/10.1186/1741-7015-3-5
Piontek, D., Kraus, L., & Klempova, D. (2008). Short scales to assess cannabis-related problems: A review of psychometric properties. Substance Abuse Treatment, Prevention, and Policy, 3(1), 25. https://doi.org/10.1186/1747-597x-3-25
Pontes, H. M., Király, O., Demetrovics, Z., & Griffiths, M. D. (2014). The conceptualisation and measurement of DSM-5 Internet Gaming Disorder: The development of the IGD-20 Test. PLoS ONE, 9(10), e110137. https://doi.org/10.1371/journal.pone.0110137
Rehm, J., Marmet, S., Anderson, P., Gual, A., Kraus, L., Nutt, D. J., Room, R., Samokhvalov, A. V., Scafato, E., Trapencieris, M., Wiers, R. W., & Gmel, G. (2013). Defining substance use disorders: Do we really need more than heavy use? Alcohol and Alcoholism, 48(6), 633–640. https://doi.org/10.1093/alcalc/agt127
Reise, S. P., Ainsworth, A. T., & Haviland, M. G. (2005). Item response theory: Fundamentals, applications, and promise in psychological research. Current Directions in Psychological Science, 14(2), 95–101. https://doi.org/10.1111/j.0963-7214.2005.00342.x
Saunders, J. B., Aasland, O. G., Babor, T. F., De la Fuente, J. R., & Grant, M. (1993). Development of the alcohol use disorders identification test (AUDIT): WHO collaborative project on early detection of persons with harmful alcohol consumption-II. Addiction, 88(6), 791–804. https://doi.org/10.1111/j.1360-0443.1993.tb02093.x
Schumacker, R. E., & Lomax, R. G. (2004). A beginner’s guide to structural equation modeling. Psychology Press. https://doi.org/10.4324/9781410610904
Stata Corporation. (n.d.). Stata statistical software. College Station, Texas, USA.
Teklie, H., Gonfa, G., Getachew, T., Defar, A., Bekele, A., Bekele, A., & Taye, G. (2017). Prevalence of Khat chewing and associated factors in Ethiopia: Findings from the 2015 national Non-communicable diseases STEPS survey. Ethiopian Journal of Health Development, 31(1), 320–330.
Thomas, S., & Williams, T. (2013). Khat (Catha edulis): A systematic review of evidence and literature pertaining to its harms to UK users and society. Drug Science, Policy and Law, 1, 205032451349833. https://doi.org/10.1177/2050324513498332
Warfa, N., Klein, A., Bhui, K., Leavey, G., Craig, T., & Stansfeld, S. A. (2007). Khat use and mental illness: A critical review. Social Science & Medicine, 65(2), 309–318. https://doi.org/10.1016/j.socscimed.2007.04.038
Wirth, R. J., & Edwards, M. C. (2007). Item factor analysis: Current approaches and future directions. Psychological Methods, 12(1), 58–79. https://doi.org/10.1037/1082-989x.12.1.58
This is an open access article distributed under the terms of the Creative Commons Attribution License (4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Supplementary data