REVIEW

A systematic review of interrater reliability of pressure ulcer classification systems

Jan Kottner, Kathrin Raeder, Ruud Halfens and Theo Dassen

Aims. To review systematically the interrater reliability of pressure ulcer classification systems in order to find out which classification should be used in daily practice.
Background. Pressure ulcer classification systems are important tools in research and practice. They aim at providing accurate and precise communication, documentation and treatment decisions. Pressure ulcer classifications are criticised for their low degree of interrater reliability.
Design. Systematic review.
Methods. The databases MEDLINE, EMBASE and CINAHL and the World Wide Web were searched. Original research studies estimating interrater reliability of pressure ulcer classification systems were included. Study selection, data extraction and quality assessment were conducted independently by two reviewers.
Results. Twenty-four out of 339 potentially relevant studies were included in the final data synthesis. Due to the heterogeneity of the studies, a meaningful comparison was impossible.
Conclusions. There is at present not enough evidence to recommend a specific pressure ulcer classification system for use in daily practice. Interrater reliability studies are required in which comparable raters apply different pressure ulcer classification systems to comparable samples.
Relevance to clinical practice. It is necessary to determine the interrater reliability of pressure ulcer classifications among all users in clinical practice. If interrater reliability is low, the use of those systems is questionable. On the basis of this review there are no recommendations as to which system is to be given preference.
Key words: classification, diagnosis, nurses, nursing, pressure ulcer, systematic review
Accepted for publication: 3 July 2008

Authors: Jan Kottner, MA, RN, Associate Professor, Department of Nursing Science, Centre for Humanities and Health Sciences, Charité-Universitätsmedizin Berlin, Berlin, Germany; Kathrin Raeder, RN, Master Student, Certified Wound Expert, Department of Nursing Science, Centre for Humanities and Health Sciences, Charité-Universitätsmedizin Berlin, Berlin, Germany; Ruud Halfens, PhD, Faculty of Health, Medicine and Life Sciences, Department of Health Care and Nursing Sciences, Universiteit Maastricht, Maastricht, The Netherlands; Theo Dassen, PhD, RN, Director of the Department of Nursing Science, Centre for Humanities and Health Sciences, Charité-Universitätsmedizin Berlin, Berlin, Germany

Correspondence: Jan Kottner, Charité-Universitätsmedizin Berlin, Department of Nursing Science, Centre for Humanities and Health Sciences, Charitéplatz 1, 10117 Berlin, Germany. Telephone: +49 30 450 529 054. E-mail: [email protected]

© 2009 The Authors. Journal compilation © 2009 Blackwell Publishing Ltd. Journal of Clinical Nursing, 18, 315–336. doi: 10.1111/j.1365-2702.2008.02569.x

Introduction

Pressure ulcers (PUs) are serious health problems (Allman 1997). In Europe the prevalence of pressure-related damage to the skin ranges from 10.5% in hospital patients (Bours et al. 2002) to 6.1% in nursing home residents (Lahmann et al. 2005). PUs are associated with pain and distress for the individuals affected, and their treatment causes extensive health care costs (Graves et al. 2005). Furthermore, pressure-related injuries are considered an important indicator of the quality of care (Calianno 2007). PU classification systems are important tools in PU research and

management. They aim at providing consistent and accurate PU assessment that promotes accurate communication, precise documentation and treatment decisions (Banks 1998, Stotts 2001). The first well-documented PU classification system was proposed by Shea in 1975. Shea differentiated five categories: the category 'Closed Pressure Sore', indicating a deep tissue injury beneath intact skin, and the grades I–IV, with increasing numbers indicating more severe tissue damage (Table 1). During the following years this classification was modified several times and new classification systems with varying numbers of categories were introduced (Table 1) (Reid & Morison 1994, Healey 1996, Haalboom et al. 1997). The National Pressure Ulcer Advisory Panel of the United States (1989, 1997) and the European Pressure Ulcer Advisory Panel (1998) proposed the two most widely used PU classifications. Apart from some differences, both classifications have a lot in common. The National Pressure Ulcer Advisory Panel (2007) has recently published updated definitions of PU grades.

PU classification systems are generally criticised for their low level of interrater reliability, which calls into question their use in practice, research and quality assurance (Haalboom et al. 1997, Russell 2002b, Dealey & Lindholm 2006). Interrater or interobserver reliability indicates the degree to which two or more independently operating raters or observers agree when rating the same subject or object. It is affected by properties of the measurement instrument (e.g. number of categories, operational definitions of items), the qualification, knowledge and training of observers, the conditions of observation and population characteristics (e.g. prevalence) (Kraemer 1979, Suen 1988, Shrout 1998). Although interrater reliability is a major criterion for assessing the quality and adequacy of an instrument, the criterion of validity is important as well. However, an instrument that is unreliable cannot be valid: a high level of interrater reliability is a prerequisite for validity (Shrout 1998, Polit & Beck 2004).

At present there are many studies investigating the interrater reliability of different PU classifications; however, no attempt has been made to systematically review the available research evidence regarding their degree of interrater reliability.

Aims

The purpose of this study was to systematically review the interrater reliability among health care workers using PU classification systems, to find out which classification can be recommended for daily use. We also aimed to determine the factors affecting interrater reliability.

Methods

Search

The search was conducted by the first investigator (JK) and was supported by a librarian specialised in medical databases. The databases MEDLINE (1965–June 2007), EMBASE (1989–June 2007) and CINAHL (1995–June 2007) were systematically searched. The search included synonyms for PU in combination with different terms for classification. To identify research studies that examined PU classifications as part of larger studies or trials, we added terms for various cross-sectional or prospective study designs (Fig. 1). The search software WebSPIRS Version 5.12 (Ovid Technologies, NY, USA) was used. Studies from reference lists that seemed to be relevant were obtained and analysed as well. In addition, Google was used to search the World Wide Web, combining terms for PUs and classifications. Authors of conference abstracts found on the Internet were contacted for detailed information. No other steps were taken to locate unpublished material or grey literature.

Figure 1 Search strategy on the databases MEDLINE, EMBASE and CINAHL using ERL WebSPIRS:
#1 pressure ulcer or pressure sore or decubitus or bed sore or bedsore or decubitus ulcer
#2 classification or grade* or stage*
#3 la = German or la = English
#4 #1 and #2 and #3
#5 diagnosis* or diagnostic* or systematic or clinical trial or cohort studies or group or prevalence or incidence or reliability or validity
#6 #4 and #5

Study selection

The results from the literature search were screened independently by the first (JK) and the second investigator (KR) by reading the title and/or the abstract. To gain as many relevant studies as possible, we determined broad inclusion criteria:
1 Original research studies estimating interrater reliability of PU classifications
2 Language: English or German.
Exclusion criteria were:
1 Retrospective study designs, reviews, discussion papers
2 Studies using data from patient records (chart review).
Afterwards, the results were compared and any discrepancies were discussed until an agreement was achieved.

Data extraction

Data from relevant studies were extracted using a data extraction sheet. It contained: names of authors; year of publication; name of country or region; description of the PU classification system(s) including the number of categories; methods used to conduct the interrater reliability study; number, training and qualification of raters; number and characteristics of subjects or cases; number of observations per rater; and inclusion of normal skin (yes/no). Interrater reliability coefficients and 95% CIs were taken from the original study or calculated whenever raw data or contingency tables were presented. Data were extracted by the first (JK) and the second investigator (KR) independently. Afterwards, the results were compared and any discrepancies were discussed until an agreement was achieved.

Table 1 Description of pressure ulcer classifications included in the review

Shea (1975)
Grade I: Most apparent clinical presentation is an irregular, ill-defined area of soft tissue swelling and induration with associated heat and erythema overlying a bony prominence. Extreme Grade I is a moist superficial irregular ulceration limited to epidermis exposing underlying dermis and resembling an abrasion. Anatomic limit: dermis.
Grade II: Clinically presented as a shallow full-thickness skin ulcer whose edges are more distinct. Anatomic limit: subcutaneous fat.
Grade III: Irregular full-thickness skin defect extending into the subcutaneous fat exposing a draining, foul smelling, infected, necrotic base which has undermined the skin for a variable distance. Anatomic limit: deep fascia.
Grade IV: Clinical presentation resembles that of a Grade III except that bone can be identified in the base of the ulceration, which is more extensively undermined with profuse drainage and necrosis. Anatomic limit: no limit.
Closed Pressure Sore: Ischemic necrosis in the subcutaneous fat without skin ulceration leading to the development of a bursa-like cavity filled with necrotic debris. Resembling a Grade III sore in extent and depth. Overlying pigmented, thickened and fibrotic skin eventually ruptures, creating a small skin defect draining a large base. Anatomic limit: deep fascia.

Stirling 2-digit (Reid & Morison 1994)
Stage 0: No clinical evidence of a pressure sore: 0.0 Normal appearance, intact skin; 0.1 Healed with scarring; 0.2 Tissue damage, but not assessed as a pressure sore.
Stage 1: Discolouration of intact skin – light finger pressure applied to the site does not alter the discolouration: 1.1 Non-blanchable erythema with increased local heat; 1.2 Blue/purple/black discolouration. The sore is at least Stage 1.
Stage 2: Partial-thickness skin loss or damage involving epidermis and/or dermis: 2.1 Blister; 2.2 Abrasion; 2.3 Shallow ulcer, without undermining of adjacent tissue; 2.4 Any of these with underlying blue/purple/black discolouration or induration. The sore is at least Stage 2.
Stage 3: Full-thickness skin loss involving damage or necrosis of subcutaneous tissue but not extending to underlying bone, tendon or joint capsule: 3.1 Crater, without undermining of adjacent tissue; 3.2 Crater, with undermining of adjacent tissue; 3.3 Sinus, the full extent of which is not certain; 3.4 Full-thickness skin loss, but wound bed covered with necrotic tissue (hard or leathery black/brown/broken tissue or softer yellow/cream/grey slough) which masks the true extent of tissue damage. The sore is at least Stage 3. Until debrided it is not possible to observe whether damage extends into muscle or involves damage to bone or supporting structures.
Stage 4: Full-thickness skin loss with extensive destruction and tissue necrosis extending to underlying bone, tendon or joint capsule: 4.1 Visible exposure of bone, tendon or joint capsule; 4.2 Sinus assessed as extending to bone, tendon or capsule.

Yarkony-Kirk (Yarkony et al. 1990)
1: Red area, present longer than 30 minutes but less than 24 hours, or present longer than 24 hours.
2: Epidermis and/or dermis ulcerated with no subcutaneous fat observed.
3: Subcutaneous fat observed, no muscle observed.
4: Muscle/fascia observed, but no bone observed.
5: Bone observed, but no involvement of joint space.
6: Involvement of joint space.

Torrance (1983) in Edwards & Banks (1999)
Grade 1: Area of blanching hyperaemia. Light finger pressure will cause blanching of the erythema.
Grade 2: Non-blanching hyperaemia. Erythema remains when light finger pressure is applied. Superficial damage may be present as blistering, induration or swelling. Epidermal ulceration may expose dermis.
Grade 3: Ulceration progresses through the dermis to the junction of the subcutaneous tissue. Ulceration has distinct edges, but erythema and induration is present around the wound.
Grade 4: Ulceration extends into the relatively avascular subcutaneous fat and underlying muscle becomes swollen and inflamed. Progress is temporarily impeded resulting in lateral extension of the wounding. There is a distinct edge to the ulcer, but deeper areas are distorted.
Grade 5: Infective necrosis affects the deeper fascia and muscle. Tissue destruction is rapid, joints, bursae and body cavities can be involved and there is a risk of osteomyelitis. Sores may communicate, resulting in massive tissue destruction.

Surrey (1983) in Healey (1995)
Stage 1: Non-blanching erythema.
Stage 2: Superficial break in the skin.
Stage 3: Destruction of the skin without cavity.
Stage 4: Destruction of the skin with cavity.

EPUAP (1998)
Grade 1: Non-blanchable erythema of intact skin. Discolouration of the skin, warmth, oedema, induration or hardness may also be used as indicators, particularly on individuals with darker skin.
Grade 2: Partial-thickness skin loss involving epidermis, dermis or both. The ulcer is superficial and clinically presents itself as an abrasion or blister.
Grade 3: Full-thickness skin loss involving damage to or necrosis of subcutaneous tissue that may extend down to, but not through, underlying fascia.
Grade 4: Extensive destruction, tissue necrosis, or damage to muscle, bone, or supporting structures with or without full-thickness skin loss.

NPUAP (1989)
Stage I: An observable pressure-related alteration of intact skin the indicators of which, as compared to an adjacent or opposite area on the body, may include changes in skin colour (red, blue, purple tones), skin temperature (warmth or coolness), skin stiffness and/or sensation (pain).
Stage II: Partial-thickness skin loss involving epidermis and/or dermis. The ulcer is superficial and clinically presents itself as an abrasion, blister or shallow crater.
Stage III: Full-thickness skin loss involving damage or necrosis of subcutaneous tissue that may extend down to, but not through, underlying fascia. The ulcer clinically presents itself as a deep crater with or without undermining of adjacent tissue.
Stage IV: Full-thickness skin loss with extensive destruction, tissue necrosis or damage to muscle, bone or supporting structures (e.g. tendon, joint capsule, etc.).

NPUAP = National Pressure Ulcer Advisory Panel (1989, updated 1997); EPUAP = European Pressure Ulcer Advisory Panel (1998).

Study quality assessment

An intensive search for instruments evaluating the quality of interrater reliability studies yielded no results. To our knowledge there are no proposed guidelines or statements presenting criteria for the quality evaluation of interrater reliability studies. Based on previous studies dealing with the same problem (Audigé et al. 2004, Brorson & Hróbjartsson 2008) and known factors influencing interrater reliability (Kraemer 1979, Shrout 1998, Dunn 2004), nine criteria were considered important:
(1) Were the classification system(s) and the number of categories clearly described?
(2) Were the repeated assessments conducted independently?
(3) Was the number of subjects/targets stated?
(4) Were the assessment conditions clearly described (e.g. skin assessment, use of photographs)?
(5) Was the number of raters stated?
(6) Were rater characteristics described?
(7) Was the number of observations per rater stated?
(8) Was there a description as to whether normal skin was included?
(9) Were the computations of interrater reliability coefficients clearly stated and comprehensible?
With regard to these quality criteria, each study was carefully evaluated as to whether the required information was reported or not. One fulfilled criterion corresponded to one point. After adding up all points, the overall quality score ranged from zero to nine points. Only those studies with a quality score of seven or more were included in the final data synthesis. We decided to use this cut-off point because we were unable to validly interpret the results of studies in which more than two criteria were missing. Both reviewers (JK & KR) evaluated each study independently. Any discrepancies were resolved by consensus.
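To make the scoring rule concrete, the following is a minimal sketch of how the nine reporting criteria translate into a 0–9 quality score with the inclusion cut-off of seven points. It is not the authors' code; the field names are illustrative only.

```python
# Minimal sketch (illustrative field names, not the review authors' code):
# one point per fulfilled reporting criterion, overall score 0-9, cut-off >= 7.

CRITERIA = [
    "classification_and_categories_described",   # criterion 1
    "assessments_independent",                    # criterion 2
    "number_of_subjects_stated",                  # criterion 3
    "assessment_conditions_described",            # criterion 4
    "number_of_raters_stated",                    # criterion 5
    "rater_characteristics_described",            # criterion 6
    "observations_per_rater_stated",              # criterion 7
    "inclusion_of_normal_skin_reported",          # criterion 8
    "coefficient_computation_comprehensible",     # criterion 9
]

def quality_score(study: dict) -> int:
    """Sum of fulfilled criteria; ranges from 0 to 9."""
    return sum(1 for criterion in CRITERIA if study.get(criterion, False))

def include_in_synthesis(study: dict, cutoff: int = 7) -> bool:
    """Apply the review's cut-off: studies missing more than two criteria are excluded."""
    return quality_score(study) >= cutoff

# Hypothetical example: a study reporting everything except criteria 7 and 8.
example = {criterion: True for criterion in CRITERIA}
example["observations_per_rater_stated"] = False
example["inclusion_of_normal_skin_reported"] = False
print(quality_score(example), include_in_synthesis(example))  # 7 True
```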

Data synthesis and interpretation

Studies were divided into two groups depending on whether skin inspection was actually conducted or whether the assessment results were based on images. This classification seemed appropriate because 'real-life situations' (Healey 1996) are hardly comparable to rather artificial ones (Defloor & Schoonhoven 2004, Hart et al. 2006), and the use of images limits the clinical information which is useful for pressure ulcer classification (Defloor et al. 2006). Results of the original studies were summarised qualitatively.

A single summary measure was considered inappropriate because different statistical approaches were used to calculate the interrater reliability coefficients, such as the overall proportion of agreement (po), Cohen's kappa (κ), weighted kappa (κw), the proportion of differences in per cent, Spearman's rho, the intraclass correlation coefficient (ICC) and non-specified correlation coefficients. Further measures of precision, such as standard errors or confidence intervals, are prerequisites for meta-analysis but were missing in almost all studies. Finally, study designs and methods of data analysis varied considerably between the studies.

The obtained κ-values were interpreted according to the widely used benchmarks of Landis and Koch (1977). They labelled the ranges of κ-values as follows: 0.00–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, 0.81–1.00 almost perfect. Weighted kappa (κw) was interpreted in the same way as κ (Fleiss et al. 2003). ICC-values were interpreted as proportions of variance explained by within-subject differences. However, κ- and ICC-values are highly influenced by the proportion of the trait in the rated sample, which means that obtained coefficients can be low when the measures are applied to a very homogeneous sample (e.g. a high proportion of normal skin in the sample). Values are further influenced by the number of categories. po-values were regarded as valuable coefficients, but an interpretation and comparison of po between the studies was difficult as well: similar to κ, its value is influenced by the presence of the trait and the number of categories and, above all, it has not been corrected for chance agreement. Correlation coefficients, such as Spearman's rho, were regarded as inappropriate measures, because their values indicate the degree and direction of association but not of agreement (Bland & Altman 1986).
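For readers unfamiliar with the coefficients discussed above, the following sketch (illustrative only, not taken from any of the reviewed studies) shows how the overall proportion of agreement (po) and Cohen's kappa are computed for two raters, and how a kappa value maps onto the Landis and Koch (1977) benchmarks:

```python
from collections import Counter

def po_and_kappa(ratings_a, ratings_b):
    """Observed agreement (po) and Cohen's kappa for two raters' paired ratings."""
    n = len(ratings_a)
    po = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected chance agreement pe from the raters' marginal category frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(freq_a) | set(freq_b)
    pe = sum(freq_a[c] * freq_b[c] for c in categories) / n ** 2
    kappa = (po - pe) / (1 - pe) if pe < 1 else 1.0
    return po, kappa

def landis_koch(kappa):
    """Verbal label for a kappa value according to Landis and Koch (1977)."""
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"

# Hypothetical ratings of ten skin sites on a five-category scale (0 = no PU ... 4).
nurse_1 = [0, 0, 1, 2, 2, 3, 0, 1, 4, 0]
nurse_2 = [0, 1, 1, 2, 3, 3, 0, 0, 4, 0]
po, kappa = po_and_kappa(nurse_1, nurse_2)
print(f"po = {po:.2f}, kappa = {kappa:.2f} ({landis_koch(kappa)})")  # po = 0.70, kappa = 0.60 (moderate)
```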

Results

Search and selection of studies

The search and selection process for included studies is shown in Fig. 2. A total of 1650 references were found in the searched databases; 417 of these were duplicates and 906 were excluded on the basis of their titles or abstracts. The remaining 327 studies were screened against the inclusion criteria. The majority of these studies were pressure ulcer prevalence or incidence studies lacking interrater reliability estimation. Furthermore, seven studies were selected from the World Wide Web search and five studies from the reference lists.

Eight studies were not considered for the review because results were either lacking (n = 4), the same study had been published in different sources (n = 3) or the original study was unobtainable (n = 1). In the end, 47 studies were deemed relevant.

Figure 2 Selection process of included studies: potentially relevant studies from the electronic search (n = 327), from reference lists (n = 5) and from the World Wide Web (n = 7); studies not meeting the inclusion criteria after evaluation of the full text (n = 286); studies examining interrater reliability but not reporting results (n = 4); same study published in different journals (n = 3); original study unobtainable (n = 1); relevant studies included in the systematic review (n = 47).

Study characteristics

All 47 studies included in the review were published in English between 1977 and 2007. Most studies were conducted in the USA (n = 16) and the UK (n = 9). Nineteen studies originated from other European countries, one study was identified as being from China and two studies were conducted in Canada. Actual skin assessment in a clinical setting was conducted in 30 studies, and in 15 studies interrater reliability was investigated on the basis of images (e.g. slides, photographs); both assessment methods were investigated in two studies. Overall, six different classification systems were examined, including modifications and the use of different numbers of categories. Four studies applied PU classifications that could not be explicitly assigned to one specific system and two studies examined the interrater reliability for non-blanchable erythema only.

Quality assessment

Results of the quality assessment are shown in Table 2. A quality score of seven or above was assigned to 24 of the 47 relevant studies (51%), and only those studies were included in the final data synthesis. The most frequently missing details were the number of observations per rater (n = 30), a description of whether assessments were conducted independently (n = 22) and information as to whether normal skin was included (n = 21). In four cases, data from two distinct studies were presented; in Table 2 these are referred to as 'study 1' and 'study 2'.

Final data synthesis

Interrater reliability studies based on skin examination

Ten studies describing assessment results of a real skin or wound examination were included in the final data synthesis. Study characteristics, methods and results are shown in Table 3. The studies are very heterogeneous regarding the applied classification systems, methods, numbers of categories, raters and rated subjects and are therefore hardly comparable.

Interrater reliability for the EPUAP classification including the additional category 'no PU' was investigated in three studies. Coefficients ranged from κ = 0.97 (95% CI 0.92–1.00) (Bours et al. 1999) to κ = 0.31 (Pedley 2004). Corresponding po-values ranged from 1.00 (Bours et al. 1999) to 0.49 (Pedley 2004). The study of Bours et al. (1999) was a large-scale test of a data collection form which included the EPUAP system, and numerous raters with different backgrounds were involved. Only one pair of raters and fewer patients participated in the studies by Halfens et al. (2001) and Pedley (2004).

The NPUAP system was examined in three studies. The proportion of agreement ranged from 100% (Gawron 1994) to 58% (Lyder et al. 1999). Even though Allman et al. (1995) applied the complete NPUAP system, agreement was only measured for the category 'non-blanchable erythema' (po = 0.79). For the complete NPUAP system no κ-values were reported.

Interrater reliability coefficients for the Stirling classification (Pedley 2004) were comparably low, with κ-values of 0.37 (po = 0.54) for the one-digit version and 0.48 (po = 0.54) for the two-digit version, indicating fair to moderate agreement. When the adapted Shea scale was applied, interrater reliability between all raters was κ = 0.42 (95% CI 0.10–0.74) (po = 0.67), indicating moderate agreement (Buntinx et al. 1986). Using the adapted Torrance scale, agreement among trained nurses ranged from 92% to 98% (Nixon et al. 1998). After application of an adapted version of the EPUAP/NPUAP classification (Nixon et al. 2005), the interrater reliability between one research nurse team leader and trained, experienced clinical research nurses was κ = 0.97 (95% CI 0.93–1.00), which can be labelled as almost perfect; the interrater reliability between the research nurses and trained ward nurses was much lower [κ = 0.63 (95% CI 0.61–0.66)], which corresponds to substantial agreement. Vanderwee et al. (2006) measured interrater reliability for the distinction between blanchable and non-blanchable erythema: when a transparent disc was used, κ was 0.72 (po = 0.97); when the 'finger method' was applied, κ was 0.69 (po = 0.92), indicating no difference between the two assessment methods.

All raters investigated in these studies were specialised, trained or experienced in pressure ulcer diagnosis. Buntinx et al. (1986), Bours et al. (1999), Pedley (2004), Nixon et al. (2005) and Vanderwee et al. (2006) put the major emphasis of their studies on the investigation of interrater reliability; the remaining studies conducted smaller-scale interrater reliability examinations as part of larger studies.

Table 2 Assessment of methodological quality of reliability studies of pressure ulcer classification systems. Each study was rated against the nine quality criteria described in the text — (1) description of the classification, (2) independency of assessments, (3) number of subjects/targets stated, (4) description of assessment conditions, (5) number of raters stated, (6) description of rater characteristics, (7) observations per rater stated, (8) inclusion of normal skin reported and (9) computations of interrater reliability coefficients stated — as fulfilled, not fulfilled or not stated, and the fulfilled criteria were summed to a quality score (QS) between zero and nine. Studies with a quality score of seven or more were included in the final data synthesis.

Studies assessed: Barbenel et al. (1977), Warner & Hall (1986), Yarkony et al. (1990), Allcock et al. (1994), Braden & Bergstrom (1994), Gawron (1994) (studies 1 and 2), Allman et al. (1995), Arnold & Watterworth (1995) (studies 1 and 2), Ferrell et al. (1995), Healey (1995), Bergstrom et al. (1996), Buntinx et al. (1986), Bergstrom et al. (1998), Nixon et al. (1998) (studies 1 and 2), Bergquist & Frantz (1999) (Bergquist 2001), Bours et al. (1999), Carlson et al. (1999), Derre et al. (1999), Lyder et al. (1999), Halfens et al. (2001), Russell & Reynolds (2001), Bours et al. (2002) (Vanderwee et al. 2007a), Schoonhoven et al. (2002a, 2002b), Marrie et al. (2003), Verdu (2003), Defloor & Schoonhoven (2004), Groeneveld et al. (2004), Pedley (2004), Buckley et al. (2005), Kwong et al. (2005), Nixon et al. (2005) (studies 1 and 2), Vanderwee et al. (2005), Alves (2006), Briggs (2006), Defloor et al. (2006), Feuchtinger et al. (2006), Gunningberg (2006), Hart et al. (2006), Noonan et al. (2006), Localio et al. (2006), Vanderwee et al. (2006), Feuchtinger et al. (2007), Gajewski et al. (2007), Schoonhoven et al. (2007), Stausberg et al. (2007), Vanderwee et al. (2007b), Vanderwee et al. (2007c).

Table 3 Characteristics and findings of interrater reliability studies based on real skin assessment included in the final data synthesis (po = proportion of agreement, κ = kappa)

Gawron (1994), USA. NPUAP, category 'necrosis' added (5 categories). Assessment of patients with PUs by the principal investigator and trained registered nurses (k = 23); hospital patients with PUs (n = 2). Results: po = 0.90 to 1.00.

Allman et al. (1995), USA. NPUAP (4 categories). Two independent assessments of hospital patients (n = 26) by research nurses (k = 2) within two days. Results: non-blanchable erythema: po = 0.79. Part of an incidence study.

Buntinx et al. (1986), Belgium. Shea, category 'blanching erythema' added (5 categories). Independent assessments of PUs (n = 20, convenience sample) among 20 geriatric patients by nurses (k = 3) and physicians (k = 3) with chronic wound experience but without special training in assessment. Results: no stage 0 PU was observed; agreement for pairs (excluding grade 0): po = 0.40 to 0.80, κ = 0.12 to 0.65; average agreement between all raters (excluding grade 0): po = 0.67, κ = 0.42 (95% CI 0.10–0.74).

Nixon et al. (1998), UK. Adapted Torrance scale, category 'no skin discolouration' added and 'Grade 2' divided into 'non-blanching redness' and 'superficial skin damage' (7 categories). Paired assessments by trained nurses in the pre-study (study 1, k = 94; 664 skin sites of hospital patients) and the main study (study 2, k not stated; 851 skin sites). Results: study 1: po = 0.98, disagreement in 15 skin sites affecting 12 patients; study 2: po = 0.92, disagreement in 72 skin sites; disagreement mainly in 'no skin discolouration' and 'blanching erythema'. Part of an incidence study; po recalculated.

Bours et al. (1999), Netherlands. EPUAP, category 'no PU' added (5 categories). Two independent assessments in hospital (trained staff nurses and one researcher; n = 45 patients, 674 observations), nursing home (n = 23 residents, 344 observations) and home health care (trained primary nurses and one wound care specialist; n = 90 patients, 1348 observations). Results: hospital: po = 1.00, κ = 0.97 (95% CI 0.92–1.00); nursing home: po = 0.94, κ = 0.81 (95% CI 0.73–0.90); home health care (nurses vs. wound care specialist): po = 0.98, κ = 0.49 (95% CI 0.35–0.63); most disagreements occurred in diagnosing grade 1. Part of a validation study; 95% CIs recalculated.

Lyder et al. (1999), USA. NPUAP (4 categories). Comparison of skin assessment results between trained research nurse 1 (melanocentric skin assessment: observing changes in hue (purple), temperature and induration, using natural lighting), trained research nurse 2 (skin assessment according to current nursing practice) and the principal investigator or a wound/skin care clinical nurse specialist; Black and Latino/Hispanic elders (n = 24). Results: agreement between research nurse 1 and the principal investigator or clinical nurse specialist: po = 0.78; between research nurse 2 and the principal investigator or clinical nurse specialist: po = 0.58. Part of an incidence study.

Halfens et al. (2001), Netherlands. EPUAP, category 'no PU' added (5 categories). Two independent assessments of hospital patients (n = 28) by trained staff nurses (k = 2) using a transparent disc. Results: κ = 0.90.

Pedley (2004), UK. Stirling 1-digit (5), Stirling 2-digit (15) and EPUAP with added category 'PU yes/no' (5 categories). Simultaneous and independent assessments of PU areas (n = 35) on 30 adult hospital patients using the different scales by trained registered nurses experienced in tissue viability (k = 2). Results: Stirling 1-digit: κ = 0.37, po = 0.54; Stirling 2-digit: κ = 0.48, po = 0.54; EPUAP: κ = 0.31, po = 0.49; pressure ulcer (yes/no): κ = 0.22, po = 0.86. Part of a prospective study.

Nixon et al. (2005), UK. Adapted EPUAP/NPUAP, categories 'no skin changes', 'blanching erythema' and 'black eschar' added (7 categories). Simultaneous and independent paired assessments of skin sites; study 1: clinical research nurse team leader (k = 1) and trained, experienced clinical research nurses (k = 4), 107 skin sites of 16 hospital patients; study 2: clinical research nurses (k = 6) and trained ward nurses (k = 109), 2396 skin sites of 362 hospital patients. Results: study 1: PU diagnosis (yes/no): po = 1.00, κ = 1.00 (all skin sites); classification across all categories: po = 0.98, κ = 0.97 (95% CI 0.93–1.00) (two disagreements concerning normal skin and blanching erythema); study 2: PU diagnosis (yes/no): po = 0.97, κ = 0.77 (95% CI 0.72–0.82); classification across all categories: po = 0.79, κ = 0.63 (95% CI 0.61–0.66) (508 disagreements ranging from one to three categories). κ and 95% CIs recalculated.

Vanderwee et al. (2006), Belgium. Blanchable versus non-blanchable erythema (2 categories). Two independent assessments of six skin sites per patient by a researcher (k = 1) and trained nurses (k = 16), comparing the finger method and the transparent disc method (applied in random order); pressure points with erythema in geriatric hospital patients (n = 503); a sample size calculation was conducted. Results: finger method: agreement between researcher and nurses po = 0.92 (range 0.83–1.00), κ = 0.69 (range 0.48–1.00); transparent disc method: po = 0.92 (range 0.83–1.00), κ = 0.72 (range 0.44–1.00); agreement between the two methods: po = 0.97 (95% CI 0.95–0.98), κ = 0.88 (95% CI 0.83–0.94); more erythemas were classified as non-blanchable with the transparent disc method (n = 28).

Interrater reliability studies based on images

Fourteen studies measuring interrater reliability based on images were included in the final data synthesis (Table 4). The studies differed considerably regarding the applied PU classification system, the version of the classification, the number and qualification of raters, the sample and the computation of results.

A four-grade NPUAP classification was applied in two studies. Arnold and Watterworth (1995) reported that agreement among seven registered nurses could not be increased by training (po = 0.7), whereas agreement between two PU experts rating 50 photographs was much higher (Noonan et al. 2006). The NPUAP system including a fifth category 'unstageable' was investigated in three studies: κ-values ranged from 0.56 (Hart et al. 2006) to 0.75 (Groeneveld et al. 2004), indicating moderate to substantial agreement. Buckley et al. (2005) reported proportions of agreement of home health care nurses with experts ranging from 39% to 100%. Interrater reliability for the classification PU (yes/no) according to the NPUAP among six trained, experienced research nurses was 0.69 (ICC) (Localio et al. 2006), which can be labelled as substantial agreement.

The EPUAP system was investigated in six studies. Studies including large sample sizes were conducted by Defloor and Schoonhoven (2004) and Defloor et al. (2006). Interrater reliability among PU experts (κ = 0.80, Defloor & Schoonhoven 2004) was much higher than among nurses who participated in a congress for wound care. Interrater reliability between trained staff nurses and data collectors was high (κ = 0.75, Gunningberg 2006). Reported proportions of agreement ranged from 85% (Verdu 2003) to 62% (Russell & Reynolds 2001).

The Stirling classification was investigated in two studies (Healey 1995, Russell & Reynolds 2001). Interrater reliability for the two-digit version as well as for the one-digit version was low: κ = 0.15 and κ = 0.22, respectively. Proportions of agreement for the two-digit version were 30% (Russell & Reynolds 2001) and 39% (Healey 1995). Interrater reliability of the Torrance (κ = 0.29, po = 0.60) and Surrey classifications (κ = 0.37, po = 0.67) was slightly higher, indicating fair agreement. The classification applied by Stausberg et al. (2007) was comparable to the NPUAP and EPUAP systems, although the descriptions of the categories differed slightly; the overall agreement on PU grades was po = 0.67 (κ = 0.50) and the agreement for PU diagnosis (yes/no) was po = 0.88 (κ = 0.29), indicating fair to moderate agreement.

The main focus of the synthesised studies was the investigation of interrater reliability or agreement. In only three studies was the examination of interrater reliability part of larger prevalence studies (Groeneveld et al. 2004, Gunningberg 2006, Noonan et al. 2006).

Table 4 Characteristics and findings of interrater reliability studies based on assessment of images included in the final data synthesis (po = proportion of agreement, κ = kappa, ICC = intraclass correlation coefficient)

Arnold & Watterworth (1995), USA. NPUAP (4 categories). Registered nurses (k = 7) assessed PU slides (n = 16) before and after training; comparison with expert ratings. Results: overall agreement with expert ratings before training po = 0.68, after training po = 0.71. po recalculated.

Healey (1995), UK. Torrance (5), Stirling 1-digit (5), Stirling 2-digit (15) and Surrey (4). Three groups of nurses (Surrey k = 35, Torrance k = 37, Stirling k = 37) each assessed PU photographs (n = 10) using one scale; only 79 nurses graded all photographs. Results: Torrance: individual-category agreement κ = 0.17 to 0.60, overall κ = 0.29, po = 0.60; Stirling 2-digit: κ = 0.02 to 0.46, overall κ = 0.15, po = 0.39; Stirling 1-digit: κ = 0.12 to 0.50, overall κ = 0.22, po = 0.59; Surrey: κ = 0.18 to 0.64, overall κ = 0.37, po = 0.67; highest reliability for rating the most severe PUs.

Russell & Reynolds (2001), UK. Stirling 2-digit (15) and EPUAP (4). Tissue viability nurses (k = 27), district nurses (k = 24), acute hospital nurses (k = 25) and EPUAP members (k = 21) assessed PU photographs (n = 12); comparison with an expert panel. Results: mean absolute difference from the expert panel (precision index): 0.36 ± 0.15 (Stirling 2-digit), 0.49 ± 0.15 (EPUAP); agreement between expert panel and all nurses: po = 0.30 (Stirling 2-digit), po = 0.62 (EPUAP); significant lack of consensus in the acute hospital nurse group.

Verdu (2003), Spain. EPUAP, category 'not known' added (5 categories). Nurses with a similar level of experience assessed three PU photograph cases, with (experimental group, k = 32) or without (control group, k = 34) a decision tree for grading. Results: proportions of accurate choices, experimental group: 0.78 (case I), 0.66 (case II), 0.44 (case III); control group: 0.85 (case I), 0.53 (case II), 0.62 (case III); no statistically significant differences between groups (chi-square tests); overall: 0.82 (case I), 0.59 (case II), 0.53 (case III).

Defloor & Schoonhoven (2004), Europe. EPUAP, categories 'normal skin', 'blanchable erythema' and 'incontinence lesion' added (7 categories). Independent assessments of 56 photographs showing different skin alterations (with application of a transparent disc) by EPUAP trustees from the UK, Italy, the Netherlands, Denmark, Belgium and Ireland (k = 9), PU researchers (k = 7), staff nurses (k = 20) and PU nurses (k = 17); comparison with ratings from the EPUAP trustees. Results: EPUAP trustees: po = 0.84; PU researchers: multirater κ = 0.80 (range 0.60–0.95); staff nurses: multirater κ = 0.80 (range 0.48–0.98); PU nurses: multirater κ = 0.78 (range 0.42–0.92); total: multirater κ = 0.80; deviation from the EPUAP trustee ratings 5.9%; nine PU photographs wrongly labelled as incontinence lesions, 21 incontinence lesion photographs not detected. po recalculated.

Groeneveld et al. (2004), Canada. NPUAP, category 'unable to stage' added (5 categories). Trained staff members and third- and fourth-year nursing students (k not stated) independently assessed PU slides (n = 25). Results: κ = 0.75 (multirater). Part of a prevalence study.

Buckley et al. (2005), USA. NPUAP, category 'cannot be staged' added (5 categories). Registered home care nurses (k = 33) and WOC nurses (k = 4) simultaneously and independently assessed wound photographs including case studies (n = 10); comparison with WOC nurse ratings. Results: complete agreement among WOC nurses except for one PU; agreement of home care nurses with WOC nurse ratings: po = 0.39 to 1.00 (range), po = 0.73 (mean).

Briggs (2006), UK. EPUAP, categories 'normal skin', 'blanchable erythema', 'moisture lesion', 'combined lesions' and 'don't know' added (9 categories). Registered nurses with a few months to five years of experience as qualified nurses (k = 52) assessed the same set of 20 PU and skin lesion photographs on the PUCLAS CD-ROM before and after study of the CD-ROM; comparison with 'correct' responses. Results: correct responses before education: po = 0.02 (16–20 correct answers), 0.15 (11–15), 0.44 (6–10), 0.39 (1–5); after education: po = 0.08 (16–20), 0.56 (11–15), 0.35 (6–10), 0.02 (1–5). po recalculated.

Defloor et al. (2006), Europe. EPUAP, categories 'normal skin', 'blanchable erythema', 'incontinence lesion' and 'don't know' added (8 categories). Phase 1: nurses attending a wound care congress and familiar with the EPUAP system (k = 473) assessed 56 photographs showing different skin alterations; comparison with expert ratings. Phase 2: nurses from a university hospital (k = 86) assessed the same set of photographs twice. Results: phase 1: multirater κ = 0.37, average κ = 0.50 (95% CI 0.49–0.52); disagreement with experts po = 0.55 (grade 1) and po = 0.44 (incontinence lesions); phase 2, first assessment: multirater κ = 0.38, average κ = 0.51 (95% CI 0.49–0.54); second assessment: multirater κ = 0.43, average κ = 0.55 (95% CI 0.53–0.58). An intrarater reliability study was included.

Gunningberg (2006), Sweden. EPUAP (4 categories). Trained staff nurses and data collectors (k = 28) assessed PU photographs (n = 10). Results: mean κ = 0.75. Part of a prevalence study.

Hart et al. (2006), USA. NPUAP, category 'unstageable' added (5 categories). Partly trained staff nurses and wound/skin care nurses from 48 hospitals across the US (k = 256); study 1: web-based assessment of photographs of various skin lesions (PUs, venous, arterial and diabetic foot ulcers; n = 7); study 2: web-based assessment of PU photographs with (version 1) and without (version 2) wound descriptors (n = 17 each); comparison with expert panel ratings. Results: study 1: overall agreement with expert ratings for wound identification κ = 0.56 (SD = 0.22), for PU (yes/no) κ = 0.84 (SD = 0.25); study 2: κ = 0.72 (SD = 0.22) with wound descriptors, κ = 0.56 (SD = 0.17) without; nurses with wound/continence/ostomy care certification had higher κ-values for PU identification and grading.

Noonan et al. (2006), USA. NPUAP (4 categories). PU experts (k = 2) independently assessed PU photographs (n = 50). Results: po = 0.90. Part of a prevalence study.

Localio et al. (2006), USA. PU according to NPUAP (yes/no) (2 categories). Trained, experienced research nurses (k = 6) assessed the same set of 160 PU and skin lesion photographs twice. Results: agreement among all six raters across both assessments: 64%; among five raters: 20%; among four raters: 10%; among three raters: 5%; ICC = 0.69. Agreement recalculated.

Stausberg et al. (2007), Germany. Grades 0 (no PU) to 4 (severe damage) (5 categories). PU experts (k = 7) independently assessed PU photographs of the foot/heel and buttock/hip regions (n = 100). Results: agreement per photograph: 33% (seven raters), 20% (six raters), 29% (five raters); agreement on grade: po = 0.67, mean κ = 0.50 (95% CI 0.45–0.54); agreement on PU (yes/no): po = 0.88, mean κ = 0.29 (95% CI 0.23–0.36).

Discussion

A main finding of our systematic review was the heterogeneity of the studies, which was the reason why a meaningful data synthesis and comparison between study results was almost impossible. We will discuss the five main sources of heterogeneity in detail: quality of studies, assessment methods, qualification and training of raters, properties of the classification systems and sample characteristics, and the process of PU classification.

Quality of studies

The literature search revealed more than 50 studies that had examined some kind of interrater reliability of PU classifications. When looking at these studies in detail, it became evident that relevant information for the interpretation of results was missing in many cases; therefore, we selected only 24 studies for the final data synthesis. One can argue that the investigation of interrater reliability is not the main concern when planning and conducting PU incidence or prevalence studies. However, single statements in the methods section, such as 'Interrater reliability of the research staff in staging lesions was checked monthly by one of the investigators and ranged from 95 to 100% throughout the study' (Bergstrom et al. 1998), do not provide sufficient information to understand what was done by whom and how many persons and patients were involved. The thorough investigation of interrater reliability is essential in every study or clinical trial to estimate the amount of error inherent in the obtained results (Dunn 2004). It is, therefore, vital to report the measurement process and all relevant information.

Our initial literature screening also revealed that information about interrater reliability was completely missing in many studies applying PU classifications, which raises questions about the validity of the study conclusions. Inadequate or missing documentation of interrater reliability is not unique to PU research; it has also been reported in other fields of health care research (Mulsant et al. 2002). On the other hand, we identified high-quality studies which put the main focus on the investigation of interrater reliability of PU classifications and stated all relevant information.

Assessment methods

The classification of studies according to the assessment method seemed to be the most appropriate. Examination of the patients' or residents' skin in real nursing practice is not comparable to the assessment of PU pictures or photographs (Russell 2002b, Defloor et al. 2006). It is assumed that artificial assessment conditions provide an estimate of raters' knowledge concerning the instrument but do not reveal anything about the skills of raters in actually conducting the skin inspection, eliciting and interpreting clinical information and administering the tool (Kobak et al. 2004).


It is unclear whether photographs are harder or easier to assess than skin or wounds under practical conditions (Defloor & Schoonhoven 2004). A real skin assessment provides the observer with much more information: a real wound is three-dimensional and observers are able to change perspective; additionally, they can examine the texture or smell of the wound and compare the tissue with other skin sites. Despite these advantages, the abundance of information may be harder to interpret than images of unequivocal quality. When comparing the two groups of analysed studies, the results are ambiguous: in both groups interrater reliability varies from very low to almost perfect. However, the data suggest that interrater reliability based on images can be increased when additional wound descriptors are provided (Hart et al. 2006), which indicates that PU assessment based on photographs alone is less precise. The opposite case could be shown, too: interrater reliability for the EPUAP system after assessment of real patients was low (Pedley 2004), whereas interrater reliability for photograph ratings using the same classification system was much higher (Defloor & Schoonhoven 2004). However, when the same picture set was used again, the obtained interrater reliability coefficients were low again (Defloor et al. 2006).

Qualification and training of raters

The preparation and training of raters are important determinants of measurement accuracy and of the level of interrater reliability (Kraemer 1979, Suen 1988, Streiner & Norman 2003). Our final data synthesis revealed that all raters who conducted a real skin examination (Table 3) were trained, experienced or specialised in chronic wounds or PUs; it is therefore difficult to evaluate the impact of training or experience on interrater reliability in this group. Although all raters were trained or experienced, their levels of experience and knowledge obviously differed, but our data are not sufficient to draw any meaningful conclusions about this.

Findings from studies examining interrater reliability based on images are particularly valuable because a comparison between trained and untrained raters was possible (Table 4). For instance, applying the NPUAP classification, Arnold and Watterworth (1995) measured the agreement of registered nurses with expert ratings before and after training and showed that agreement could not be enhanced. On the other hand, Briggs (2006) demonstrated that training can improve the quality of ratings. These results indicate that the type of training seems to play an important role.

(Defloor & Schoonhoven 2004, Hart et al. 2006, Noonan et al. 2006) and lower qualification to lower interrater reliability (Arnold & Watterworth 1995, Defloor et al. 2006). However, the opposite case can also be shown. Interrater reliability among seven PU experts was only moderate after assessment of 100 photographs (Stausberg et al. 2007). There was no study that randomly selected the raters. Therefore results are not generalisable. Regarding our research question it became evident that in most studies the population of persons working in daily practice using PU classifications (e.g. ward nurses) was not well represented.

Classification systems and sample characteristics

In the final data synthesis we identified at least six classification systems that were applied in interrater reliability studies. Irrespective of the study methods, the degree of interrater reliability for the most widely used EPUAP and NPUAP classifications ranged from fair to almost perfect. The interrater reliability for the Stirling, Surrey and Torrance scales was always lower and can be labelled as slight to moderate. Irrespective of the strength of the compared agreement estimates, they are not comparable. The first reason is the varying number of categories, because agreement coefficients depend on the number of categories (Maclure & Willet 1987, Rigby 2000). Even if one only compared results of the same classification system with each other, our data show that the actual number of categories may still differ. The calculation of agreement indices for the NPUAP system, for example, was based on four (Arnold & Watterworth 1995, Lyder et al. 1999) or five (Groeneveld et al. 2004, Buckley et al. 2005) categories. The EPUAP system was applied using four (Russell & Reynolds 2001, Gunningberg 2006), five (Bours et al. 1999, Halfens et al. 2001, Verdu 2003, Pedley 2004), seven (Defloor & Schoonhoven 2004), eight (Defloor et al. 2006) or even nine (Briggs 2006) categories. The reasons for these varying numbers were the different aims and purposes of some studies. For instance, Defloor and Schoonhoven (2004) aimed to differentiate between pressure ulcers and moisture lesions as well as to measure interrater reliability of the EPUAP system. However, increasing numbers of categories lead to a decreasing probability of agreement between raters (Rigby 2000). Study results support this assumption (Healey 1995, Russell & Reynolds 2001), but the opposite case could be shown as well (Pedley 2004).

The second reason why study results are not comparable is the sample characteristics. Values of reliability coefficients depend on the prevalence of the trait, which is in fact unknown or 'hidden' (Shrout 1998, Dunn 2004, Shoukri 2004). Interrater reliability coefficients are measures of relative agreement: although the absolute agreement may be high, the reliability coefficient can be low due to very low or very high prevalence rates of the trait (Streiner & Norman 2003). As a consequence, no single coefficient is able to reflect the actual degree of interrater agreement adequately. Therefore, when evaluating interrater reliability, κ-values should be reported together with the observed proportions of agreement (po-values) and vice versa. In more than half of the studies included in the final data synthesis only one coefficient was reported.

A further consequence of the coefficients' dependence on prevalence is that detailed information about the sample or targets is necessary. For instance, whether or not normal skin was assessed may influence the results. The values of calculated coefficients may also be misleading. For example, Bours et al. (1999) reported almost perfect interrater reliability between staff nurses and a researcher in a hospital (Table 3) when applying the EPUAP system. Among all observations (n = 674) only 17 PUs were rated as grade 1 and three PUs as grade 2; the remaining observations were rated as 'no PU'. Consequently, the high interrater reliability coefficient contains no information about the rating of PUs of grades 3 and 4. It is impossible to evaluate the interrater reliability or agreement of a five-category classification when in fact only three categories were observed in the sample. Gawron (1994), for instance, reported po-values of 0.90–1.00 for the NPUAP system with five categories, but the sample consisted of two hospital patients; it is very unlikely that these two patients showed all five categories of PUs. These examples reflect not only methodological flaws of interrater reliability studies but also a statistical property of the κ-coefficient: the value of κ does not indicate what was actually measured. In a strict sense the κ-statistic is only appropriate for binary data (Donner & Eliasziw 1997, Kraemer et al. 2002).

The last reason why study results are not comparable is the classification system itself. Although all grading tools have much in common, the definitions of the PU grades differ and therefore the operational definitions of the categories differ too. The main difference between PU classification systems is the definition of grade 1 PUs (Table 1). Finally, apart from one study (Vanderwee et al. 2006), the subjects or targets were not randomly selected, which raises questions about the transferability of the results.
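To make the prevalence effect concrete, the following minimal sketch works through a hypothetical example; the counts are invented for illustration and are not taken from any of the reviewed studies. Two raters classify 100 observations as 'no PU' or 'grade 1 PU' and agree on 96 of them, yet Cohen's κ is only moderate because almost all observations fall into the 'no PU' category and chance agreement is therefore already very high. The helper accepts an arbitrary k x k cross-table, since the agreement expected by chance also changes with the number of categories.

def agreement_and_kappa(table):
    # table[i][j] = number of subjects placed in category i by rater A and category j by rater B
    n = sum(sum(row) for row in table)
    k = len(table)
    p_o = sum(table[i][i] for i in range(k)) / n                            # observed agreement
    row_marg = [sum(table[i]) / n for i in range(k)]                        # rater A marginals
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]   # rater B marginals
    p_e = sum(row_marg[i] * col_marg[i] for i in range(k))                  # agreement expected by chance
    return p_o, (p_o - p_e) / (1 - p_e)

# Hypothetical 2 x 2 cross-table ('no PU' vs. 'grade 1 PU'), 100 observations:
# both raters 'no PU' on 94, both 'grade 1 PU' on 2, disagreement on the remaining 4.
table = [[94, 2],
         [2, 2]]
p_o, kappa = agreement_and_kappa(table)
print(round(p_o, 2), round(kappa, 2))   # 0.96 and approximately 0.48

Despite 96% raw agreement, κ is about 0.48 ('moderate' in the terminology of Landis and Koch (1977)), because the agreement expected by chance is already 0.92 at this prevalence. Reporting po alone would overstate, and reporting κ alone would understate, how well the raters actually performed, which is why both should be given.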

Application of tools

Results of the final data synthesis indicate that the process of PU diagnosis and grading seems to be important. Lyder et al. (1999) showed that agreement was higher when one research nurse specialised in skin assessment for people with darker skin tones conducted the assessment than in the case of 'normal' skin assessment. For the distinction between blanchable (no PU) and nonblanchable erythema (PU grade 1) several authors used a transparent disc which was pressed lightly on the PU area. Based on the reviewed data there is no evidence that a transparent disc can enhance interrater reliability, irrespective of whether a real skin inspection was conducted (Halfens et al. 2001, Vanderwee et al. 2006) or the ratings were based on images (Defloor & Schoonhoven 2004, Defloor et al. 2006).

Limitations

This study had limitations. To gain as many interrater reliability data on PU classifications as possible, the inclusion criteria were set very broadly. However, the use of κ-coefficients as agreement measures requires that all raters be treated symmetrically, and in the majority of the included studies this condition was not fulfilled: one or more of the investigated raters could be regarded as a standard (e.g. a researcher or PU expert) against which a second group of raters (e.g. ward nurses) was compared. In these cases the κ-statistic is no longer appropriate, because it can be assumed that the standard is more precise (Fleiss et al. 2003). Consequently, many studies investigated diagnostic accuracy rather than interrater reliability. On the other hand, it cannot be assumed that the standards used in the different studies are the same and have the same precision, so investigating diagnostic accuracy instead of interrater reliability would be difficult as well; if a 'true' standard is not available, the concept of accuracy becomes redundant (Dunn 2004). Further, interrater reliability estimates of PU classifications are limited by the inclusion of categories like 'incontinence lesion' or 'don't know'. Such categories may be useful in nursing practice, but the obtained coefficients then no longer refer specifically to the PU classification system. We also did not address the validity of the classifications. Our focus was on interrater reliability, which is a measure of relative precision (Dunn 2004), and high interrater reliability does not necessarily indicate validity (Streiner & Norman 2003, Polit & Beck 2004). To date, there is an ongoing debate about how to define PUs and about which skin alterations are in fact PUs and which are not (James 1998, Russell 2002a, Parish & Witkowski 2004, Sharp 2004, Defloor et al. 2005, Houwing et al. 2007, National Pressure Ulcer Advisory Panel 2007).
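As a minimal illustration of the symmetry problem described above (the ratings below are hypothetical and are not taken from any included study): κ treats both raters as interchangeable, whereas, once one rater is taken as a standard, the sensitivity and specificity of the second rater's judgements become the natural summaries, and those are only meaningful if the standard can really be trusted.

# Hypothetical binary ratings (1 = PU present, 0 = no PU) for ten skin sites.
expert = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]   # rater treated as the 'standard'
nurse  = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]   # rater compared against the standard

# Symmetric view: Cohen's kappa between two interchangeable raters.
n = len(expert)
p_o = sum(e == r for e, r in zip(expert, nurse)) / n
prev_expert, prev_nurse = sum(expert) / n, sum(nurse) / n
p_e = prev_expert * prev_nurse + (1 - prev_expert) * (1 - prev_nurse)
kappa = (p_o - p_e) / (1 - p_e)                                    # about 0.52

# Asymmetric view: accuracy of the nurse when the expert defines the 'truth'.
tp = sum(e == 1 and r == 1 for e, r in zip(expert, nurse))
fn = sum(e == 1 and r == 0 for e, r in zip(expert, nurse))
fp = sum(e == 0 and r == 1 for e, r in zip(expert, nurse))
tn = sum(e == 0 and r == 0 for e, r in zip(expert, nurse))
sensitivity, specificity = tp / (tp + fn), tn / (tn + fp)          # 0.67 and 0.86
print(round(kappa, 2), round(sensitivity, 2), round(specificity, 2))

The symmetric κ says nothing about which rater is closer to the truth; the accuracy view does, but only under the assumption that the expert's grading is correct, which is precisely the assumption the reviewed studies cannot guarantee.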

Due to the lack of approved and accepted instruments for the evaluation of interrater reliability studies, we had to set our own quality criteria. Although every criterion has a sound theoretical foundation, the instrument itself was not formally validated. Accepted guidelines for the quality evaluation of interrater reliability studies are urgently required. Finally, our findings may be limited by a possible language bias: we only considered studies in English and German and may have missed relevant publications in other languages.

Conclusion

Due to the heterogeneity of the studies it was impossible to draw a meaningful comparison. The impact of the classification systems, their number of categories, the qualification and training of raters, the properties of the rated subjects and the application methods on the degree of interrater reliability is unclear. There is therefore a need for well-designed interrater reliability studies comparing at least two different classification systems, applied by a representative sample of raters to comparable samples of residents or patients in clinical practice. The results of such studies should be computed using adequate and comparable statistical methods.

Relevance to clinical practice

A high level of interrater reliability is a prerequisite for consistent and accurate PU classification. There is presently not enough evidence to recommend one particular PU classification system for use in daily practice.

Contributions

Study design: JK; data analysis: JK, KR; manuscript preparation: JK, KR, RH, TD.

References

Allcock N, Wharrad H & Nicolson A (1994) Interpretation of pressure-sore prevalence. Journal of Advanced Nursing 20, 37–45.
Allman RM, Goode PS, Patrick MM, Burst N & Bartolucci AA (1995) Pressure ulcer risk factors among hospitalized patients with activity limitation. Journal of the American Medical Association 273, 865–870.
Allman RM (1997) Pressure ulcer prevalence, incidence, risk factors and impact. Clinics in Geriatric Medicine 13, 421–436.
Alves MA (2006) Evaluating the Consistency of Pressure Ulcer Grading by Nurses in the Netherlands. Poster session, Dominican University of California. Available at: http://www.dominican.edu/query/ncur/display_ncur.php?id=2851 (accessed 11 January 2008).
Arnold N & Watterworth B (1995) Wound staging: can nurses apply classroom education to the clinical setting? Ostomy Wound Management 41, 40–44.
Audigé L, Bhandari M & Kellam J (2004) How reliable are reliability studies of fracture classifications? Acta Orthopaedica Scandinavica 75, 184–194.
Banks V (1998) The classification of pressure sores. Journal of Wound Care 7, 21–23.
Barbenel JC, Jordan MM, Nicol SM & Clark MO (1977) Incidence of pressure-sores in the greater Glasgow health board area. The Lancet 310(8037), 548–550.
Bergquist S & Frantz R (1999) Pressure ulcers in community-based older adults receiving home health care. Advances in Wound Care 12, 339–351.
Bergquist S (2001) Subscales, subscores, or summative score: evaluating the contribution of Braden scale items for predicting pressure ulcer risk in older adults receiving home health care. Journal of Wound Ostomy and Continence Nursing 28, 279–289.
Bergstrom N, Braden B, Kemp M, Champagne M & Ruby E (1996) Multi-site study of incidence of pressure ulcers and the relationship between risk level, demographic characteristics, diagnoses and prescription of preventive interventions. Journal of the American Geriatrics Society 44, 22–30.
Bergstrom N, Braden B, Kemp M, Champagne M & Ruby E (1998) Predicting pressure ulcer risk: a multisite study of the predictive validity of the Braden scale. Nursing Research 47, 261–269.
Bland JM & Altman DG (1986) Statistical methods for assessing agreement between two methods of clinical measurement. The Lancet 327, 307–310.
Bours G, Defloor T, Wansink S & Clark M (2002) Summary Report on Pressure Ulcer Prevalence Data Collected in Belgium, Italy, Portugal, Sweden and the United Kingdom over the 14th and 15th of November 2001. European Pressure Ulcer Advisory Panel. Available at http://www.epuap.org/review4_2/page8.html (accessed 13 August 2008).
Bours GJJW, Halfens RJG, Lubbers M & Haalboom JRE (1999) The development of a National Registration Form to measure the prevalence of pressure ulcers in the Netherlands. Ostomy Wound Management 45, 28–40.
Braden BJ & Bergstrom N (1994) Predictive validity of the Braden scale for pressure sore risk in a nursing home population. Research in Nursing and Health 17, 459–470.
Briggs S-L (2006) How accurate are RGNs in grading pressure ulcers? British Journal of Nursing 15, 1230–1234.
Brorson S & Hróbjartsson A (2008) Training improves agreement among doctors using the Neer system for proximal humeral fractures in a systematic review. Journal of Clinical Epidemiology 61, 7–16.
Buckley KM, Tran BQ, Adelson LK, Agazio JG & Halstead L (2005) The use of digital images in evaluating homecare nurses' knowledge of wound assessment. Journal of Wound Ostomy and Continence Nursing 32, 307–316.
Buntinx F, Beckers H, De Keyser G, Flour M, Nissen G, Raskin T & De Vet H (1986) Inter-observer variation in the assessment of skin ulceration. Journal of Wound Care 5, 166–170.
Calianno C (2007) Pressure ulcers in acute care: a quality issue. Nursing Management 38, 42–51.
Carlson EV, Kemp MG & Shott S (1999) Predicting the risk of pressure ulcers in critically ill patients. American Journal of Critical Care 8, 262–269.
Dealey C & Lindholm C (2006) Pressure ulcer classification. In Science and Practice of Pressure Ulcer Management (Romanelli M ed). Springer, London, pp. 37–41.
Defloor T & Schoonhoven L (2004) Inter-rater reliability of the EPUAP pressure ulcer classification system using photographs. Journal of Clinical Nursing 13, 952–959.
Defloor T, Schoonhoven L, Fletcher J, Furtado K, Heyman H, Lubbers M, Witherow A, Bale S, Bellingeri A, Cherry G, Clark M, Colin D, Dassen T, Dealey C, Gulasi L, Haalboom J, Halfens R, Hietanen H, Lindholm C, Moore Z, Romanelli M & Soriano JV (2005) Statement of the European Pressure Ulcer Advisory Panel - pressure ulcer classification. Journal of Wound Ostomy and Continence Nursing 32, 302–306.
Defloor T, Schoonhoven L, Vanderwee K, Westrate J & Myny D (2006) Reliability of the European Pressure Ulcer Advisory Panel classification system. Journal of Advanced Nursing 54, 189–198.
Derre B, Grypdonck M & Defloor T (1999) The Development of Nonblanchable Erythema in Intensive Care Patients. Poster session at the Sigma Theta Tau 11th International Nursing Research Congress, London.
Donner A & Eliasziw M (1997) A hierarchical approach to inferences concerning interobserver agreement for multinomial agreement. Statistics in Medicine 16, 1097–1106.
Dunn G (2004) Statistical Evaluation of Measurement Errors: Design and Analysis of Reliability Studies, 2nd edn. Arnold, London.
Edwards L & Banks V (1999) Pressure sore classification grading systems. Journal of Community Nursing 13(10), 28–35.
European Pressure Ulcer Advisory Panel (1998) Pressure Ulcer Treatment Guidelines. Available at: http://www.epuap.org/gltreatment.html (accessed 3 December 2007).
Ferrell BA, Artinian BM & Sessing D (1995) The Sessing scale for assessment of pressure ulcer healing. American Geriatrics Society 43, 37–40.
Feuchtinger J, De Bie R, Dassen T & Halfens R (2006) A 4-cm thermoactive viscoelastic foam pad on the operating room table to prevent pressure ulcer during cardiac surgery. Journal of Clinical Nursing 15, 162–167.
Feuchtinger J, Halfens R & Dassen T (2007) Pressure ulcer risk assessment immediately after cardiac surgery – does it make a difference? A comparison of three pressure ulcer risk assessment instruments within a cardiac surgery population. Nursing in Critical Care 12, 42–49.
Fleiss JL, Levin B & Paik MC (2003) Statistical Methods for Rates and Proportions, 3rd edn. Wiley, New Jersey.
Gajewski BJ, Hart S, Bergquist-Beringer S & Dunton N (2007) Interrater reliability of pressure ulcer staging: ordinal probit Bayesian hierarchical model that allows for uncertain rater response. Statistics in Medicine 26, 4602–4618.
Gawron CL (1994) Risk factors for and prevalence of pressure ulcers among hospitalized patients. Journal of Wound Ostomy and Continence Nursing 21, 232–240.
Graves N, Birrell FA & Whitby M (2005) Modelling the economic losses from pressure ulcers among hospitalized patients in Australia. Wound Repair and Regeneration 13, 462–467.
Groeneveld A, Anderson M, Allen S, Bressmer S, Golberg M, Magee B, Milner M & Young S (2004) The prevalence of pressure ulcers in a tertiary care pediatric and adult hospital. Journal of Wound Ostomy and Continence Nursing 31, 108–120.
Gunningberg L (2006) EPUAP pressure ulcer prevalence survey in Sweden. Journal of Wound Ostomy Continence Nursing 33, 258–266.
Haalboom JRE, van Everdingen JJE & Cullum N (1997) Incidence, prevalence and classification. In The Decubitus Ulcer in Clinical Practice (Parish LC, Witkowski JA & Crissey JT eds). Springer, Berlin, pp. 12–23.
Halfens RJG, Bours GJJW & Van Ast W (2001) Relevance of the diagnosis 'stage 1 pressure ulcer': an empirical study of the clinical course of stage 1 ulcers in acute care and long-term care hospital populations. Journal of Clinical Nursing 10, 748–757.
Hart S, Bergquist S, Gajewski B & Dunton N (2006) Reliability testing of the national database of nursing quality indicators pressure ulcer indicator. Journal of Nursing Care Quality 21, 256–265.
Healey F (1995) The reliability and utility of pressure sore grading scales. Journal of Tissue Viability 5, 111–114.
Healey F (1996) Classification of pressure sores: 2. British Journal of Nursing 5, 567–574.
Houwing RH, Arends JW, Canninga-van Dijk MR, Koopman E & Haalboom JRE (2007) Is the distinction between superficial pressure ulcers and moisture lesions justifiable? A clinical-pathologic study. Skinmed 6, 113–117.
James HM (1998) Classification and grading of pressure sores. Professional Nurse 13, 669–672.
Kobak KA, Engelhardt N, Williams JBW & Lipsitz JD (2004) Rater training in multicenter clinical trials: issues and recommendations. Journal of Clinical Pharmacology 24, 113–117.
Kraemer HC (1979) Ramifications of a population model for kappa as a coefficient of reliability. Psychometrika 44, 461–472.
Kraemer HC, Periyakoil VS & Noda A (2002) Kappa coefficients in medical research. Statistics in Medicine 21, 2109–2129.
Kwong E, Pang S, Wong T, Ho J, Shao-ling X & Li-jun T (2005) Predicting pressure ulcer risk with the modified Braden, Braden and Norton scales in acute care hospitals in Mainland China. Applied Nursing Research 18, 122–128.
Lahmann NA, Halfens RJG & Dassen T (2005) Prevalence of pressure ulcers in Germany. Journal of Clinical Nursing 14, 165–172.
Landis JR & Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33, 159–174.
Localio AR, Margolis DJ, Kagan SH, Lowe RA, Kinosian B, Abbuhl S, Kavesh W, Holmes JM, Ruffin A & Baumgarten M (2006) Use of photographs for the identification of pressure ulcers in elderly hospitalized patients: validity and reliability. Wound Repair and Regeneration 14, 506–513.
Lyder CH, Yu C, Emerling J, Mangat R, Stevenson D, Empleo-Frazier O & McKay J (1999) The Braden scale for pressure ulcer risk: evaluating the predictive validity in Black and Latino/Hispanic elders. Applied Nursing Research 12, 60–68.

Maclure M & Willet WC (1987) Misinterpretation and misuse of the kappa statistic. American Journal of Epidemiology 126, 161–169.
Marrie RA, Ross JB & Rockwood K (2003) Pressure ulcers: prevalence, staging and assessment of risk. Geriatrics Today 6, 134–140.
Mulsant BH, Kastango KB, Rosen J, Stone RA, Mazumdar S & Pollock BG (2002) Interrater reliability in clinical trials of depressive disorders. American Journal of Psychiatry 159, 1598–1600.
National Pressure Ulcer Advisory Panel (1989) Pressure ulcers prevalence cost and risk assessment: consensus development conference statement. Decubitus 2, 24–28.
National Pressure Ulcer Advisory Panel (1997) Draft definition of stage I pressure ulcers: inclusion of persons with darkly pigmented skin. Advances in Skin and Wound Care 10, 16–19.
National Pressure Ulcer Advisory Panel (2007) National Pressure Ulcer Advisory Panel's Updated Pressure Ulcer Staging System. Advances in Skin and Wound Care 20, 269–274.
Nixon J, McElvenny D, Mason S, Brown J & Bond S (1998) A sequential randomised controlled trial comparing a dry viscoelastic polymer pad and standard operating table mattress in the prevention of post-operative pressure sores. International Journal of Nursing Studies 35, 193–203.
Nixon J, Thorpe H, Barrow H, Phillips A, Nelson EA, Mason SA & Cullum N (2005) Reliability of pressure ulcer classification and diagnosis. Journal of Advanced Nursing 50, 613–623.
Noonan C, Quigley S & Curley MAQ (2006) Skin integrity in hospitalized infants and children: a prevalence study. Journal of Pediatric Nursing 21, 445–453.
Parish LC & Witkowski JA (2004) Controversies about the decubitus ulcer. Dermatological Clinics of North America 22, 87–91.
Pedley GE (2004) Comparison of pressure ulcer grading scales: a study of clinical utility and inter-rater reliability. International Journal of Nursing Studies 41, 129–140.
Polit DF & Beck CT (2004) Nursing Research: Principles and Methods, 7th edn. Lippincott Williams & Wilkins, Philadelphia.
Reid J & Morison M (1994) Classification of pressure sore severity. Nursing Times 90, 46–50.
Rigby AS (2000) Statistical methods in epidemiology. V. Towards an understanding of the kappa coefficient. Disability and Rehabilitation 22, 339–344.
Russell LJ & Reynolds TM (2001) How accurate are pressure ulcer grades? An image-based survey of nurse performance. Journal of Tissue Viability 11, 67–75.
Russell L (2002a) Pressure ulcer classification: defining early skin damage. British Journal of Nursing 11, S33–S41.
Russell L (2002b) Pressure ulcer classification: the systems and the pitfalls. British Journal of Nursing 11, S49–S59.
Schoonhoven L, Bousema MT & Buskens E (2007) The prevalence and incidence of pressure ulcers in hospitalised patients in The Netherlands. International Journal of Nursing Studies 44, 927–935.
Schoonhoven L, Defloor T & Grypdonck MHF (2002a) Incidence of pressure ulcers due to surgery. Journal of Clinical Nursing 11, 479–487.
Schoonhoven L, Defloor T, Van der Tweel I, Buskens E & Grypdonck MHF (2002b) Risk indicators for pressure ulcers during surgery. Applied Nursing Research 16, 163–173.
Shea D (1975) Pressure sores: classification and management. Clinical Orthopaedics and Related Research 112, 89–100.
Sharp A (2004) Pressure ulcer grading tools: how reliable are they? Journal of Wound Care 13, 75–77.
Shrout PE (1998) Measurement reliability and agreement in psychiatry. Statistical Methods in Medical Research 7, 301–317.
Shoukri MM (2004) Measures of Interobserver Agreement. Chapman & Hall/CRC, Boca Raton.
Stausberg J, Lehmann N, Kröger K, Maier I & Niebel W (2007) Reliability and validity of pressure ulcer diagnosis and grading: an image-based survey. International Journal of Nursing Studies 44, 1316–1323.
Stotts NA (2001) Assessing a patient with a pressure ulcer. In The Prevention and Treatment of Pressure Ulcers (Morison MJ ed). Mosby, London, pp. 99–115.
Streiner DL & Norman GR (2003) Health Measurement Scales, 3rd edn. Oxford University Press, Oxford.
Suen HK (1988) Agreement, reliability, accuracy and validity: toward a clarification. Behavioral Assessment 10, 343–366.
Vanderwee K, Grypdonck MHF & Defloor T (2005) Effectiveness of an alternating pressure air mattress for the prevention of pressure ulcers. Age and Ageing 34, 261–267.
Vanderwee K, Grypdonck MHF, De Bacquer D & Defloor T (2006) The reliability of two observation methods of nonblanchable erythema, Grade 1 pressure ulcer. Applied Nursing Research 19, 156–162.
Vanderwee K, Clark M, Dealey C, Gunningberg L & Defloor T (2007a) Pressure ulcer prevalence in Europe: a pilot study. Journal of Evaluation in Clinical Practice 13, 227–235.
Vanderwee K, Grypdonck MHF, De Bacquer D & Defloor T (2007b) Effectiveness of turning with unequal time intervals on the incidence of pressure lesions. Journal of Advanced Nursing 57, 59–68.
Vanderwee K, Grypdonck M & Defloor T (2007c) Non-blanchable erythema as an indicator for need for pressure ulcer prevention: a randomized-controlled trial. Journal of Clinical Nursing 16, 325–335.
Verdu J (2003) Can a decision tree help nurses to grade and treat pressure ulcers? Journal of Wound Care 12, 45–50.
Warner U & Hall DJ (1986) Pressure sores: a policy for prevention. Nursing Times 82, 59–61.
Yarkony GM, Kirk PM, Carlson C, Roth EJ, Lovell L, Heinemann A, King R, Lee MY & Betts HB (1990) Classification of pressure ulcers. Archives of Dermatology 126, 1218–1219.