Evaluating physical skills

This article continues our pedagogical series by focusing on evaluation methods for physical skills. As previous articles have detailed, these evaluation schemes can be either formal or informal; their primary purpose is to give the evaluator a proper picture of the student’s abilities, strengths, growth, and points to improve upon. The same picture shows the student where his abilities lie and provides important feedback for continued progress. Finally, it establishes a baseline for future evaluations, providing a reference point against which progress can be measured.

The best method for providing lasting feedback on a student’s progression is a dichotomous evaluation scheme, as this article will present. In a nutshell, while there are many and varied methods of formal and informal evaluation (see Ranks and curricula, part II for more), the simple fact of evaluating physical skills is that a student either can or cannot perform a particular skill or technique. This article will present different evaluation schemes and provide examples of why a dichotomous scheme is preferred for evaluating martial skills.

Percentage scores

Besides the perennial (and widespread) trope of evaluating a student by observing and saying “yup, looks good” (a horribly insufficient method), the most familiar modern scheme is the percentage grade. We were raised on this scheme and are comfortable with it, but it has serious disadvantages for evaluating physical skills and competence, and for providing feedback. An example follows.

1.

Abrazare
Remedy 80%
1st play 75%
2nd play, etc… 60%

In the above example, we see the student performed the remedy quite well, with borderline success in the 2nd play. This, however, raises the question: what, precisely, does 60% mean? 60% of what? Compared to what? It is a comparative scheme, usually used to compare one student to another, or to set a baseline. Further, it provides zero feedback on what, precisely, the student needs to improve upon. Finally, when reviewing progress, are you certain you’ll remember how you graded, and using what criteria?

Similarly, scores based on an arbitrary number don’t provide much feedback and are very subjective in nature: one evaluator may score a student higher than another would without a clear set of guidelines or criteria against which to grade. Such criteria, based on clear objectives (as discussed in previous articles), are needed to keep the evaluation objective.

Numerical scale

2.

Abrazare
Remedy 1 2 3 4 5 6 7 [8] 9 10
1st play 1 2 3 [4] 5 6 7 8 9 10
2nd play, etc… 1 2 3 4 5 [6] 7 8 9 10

What, precisely, do the above numbers mean? 8 of 10 of what? What is there to correct?

Remarks, of course, can aid in feedback, but the result is still very subjective from the observer’s point of view:

3.

Abrazare
Remedy 1 2 3 4 5 6 7 [8] 9 10
Notes: Pretty good, footwork needs work.
1st play 1 2 3 [4] 5 6 7 8 9 10
Notes: Wrong tempo, poor form.
2nd play, etc… 1 2 3 4 [5] 6 7 8 9 10
Notes: Lacks intent.

A better scheme for evaluating skill and competence is a dichotomous scale: the student either can or cannot, pass or fail, to varying degrees. As discussed in my previous article, the number of steps should be even, to avoid “fence sitting,” a common psychological phenomenon in evaluations. An even number of steps forces the evaluator to pick: the candidate can, or cannot, perform as directed.

Dichotomous

4.

Abrazare
Pass Fail
Remedy x
1st play x
2nd play, etc… x

5.

Abrazare
Fail Pass
1 2 3 4
Remedy x
1st play x
2nd play, etc… x

The above examples provide a better scheme for evaluating skills. Better, but still with no feedback and no objectivity. Remarks will, of course, help here.

Grading against criteria

A better scheme still is to define criteria and grade against them. This provides decent feedback to the student as well as objectivity. For instance, in the IAS evaluation scheme, when grading a Scholar, the barometer for passing is “competence.” Referring to the competency scale, we know that competence means:

Competent:  independent evaluation of a situation, autonomy, transfer of technique across situations.

  • Executes techniques in isolation (set plays, simple phrases) against non-compliant partners.
  • Executes technique without prompting, in tempo and using proper body mechanics.
  • Capable of planning an approach (strategically).
  • Can apply tactical decision-making consciously.

6.

Abrazare
Fail Pass
1 2 3 4
Remedy x
1st play x
2nd play, etc… x

Legend:

1 – Does not meet the requirements

2 – Additional work required to meet the requirements

3 – Meets the base requirements

4 – Has exceeded the base requirements.

Where the “requirements” are those outlined for Scholar, i.e.: Competence.

Obviously, the numbers are not needed for this type of scale, but they are kept for reasons that will become clear. The numbers themselves are not what matters, since we are not using a percentile grading system: the scale could run 0 to 3, 1 to 10, or 0 to 9, but limiting it to four steps eases grading.

Further conditions may be applied, for instance:

  • The candidate must pass 2 of 3 techniques to pass.
  • The candidate must pass with a cumulative score of 8. (there are those pesky numbers again)
  • The candidate must pass with a cumulative score of 8, with not more than one score below 3.
  • The candidate must have an average score of 2.5 to pass.
  • The candidate is allowed one “missed” technique.
  • Etc., according to your goals.
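Conditions like these are straightforward to encode and apply consistently. A minimal Python sketch, assuming the 1–4 scale from the legend above and treating any score below 3 as a “missed” technique (the function name and default thresholds are illustrative, not part of any official scheme):

```python
def passes(scores, min_average=2.5, max_missed=1, fail_below=3):
    """Apply a set of pass conditions to a list of per-technique scores (1-4 scale).

    A technique scoring below `fail_below` counts as "missed"; the candidate
    passes if misses stay within `max_missed` and the average meets `min_average`.
    These thresholds are assumptions for illustration; set them to match your goals.
    """
    missed = sum(1 for s in scores if s < fail_below)
    average = sum(scores) / len(scores)
    return missed <= max_missed and average >= min_average

# A candidate scoring 3, 2 and 4 on three techniques: average 3.0, one miss.
print(passes([3, 2, 4]))  # True
# A candidate scoring 3, 2 and 2: two misses, so the average never matters.
print(passes([3, 2, 2]))  # False
```

Adjusting the defaults covers most of the variants listed above; an average-only rule, for instance, is simply `max_missed=len(scores)`.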

The numbers play a role in determining points to work on by graphing them (preferably on a web graph), providing an easy visual reference. The more symmetrical the resulting figure, the better: the goal becomes to both “grow” the figure and make it symmetrical, since asymmetry indicates an imbalance in technique, and thus something to work on. The graphs below provide examples (the individual criteria are fictitious, acting simply as placeholders, but they should show how to proceed with your own models).

[Figure: chart showing an individual technique evaluation.]
[Figure: a dagger evaluation chart, plotting techniques against one another to show where work needs to be done.]
[Figure: an overall system evaluation chart, plotting different evaluated components of a system to determine weaknesses.]

As the examples above show, sums of scores or averages can once again provide an overall picture in graph form. Sums are only comparable when all the evaluations share the same number of criteria (and thus the same maximum score), while averages can be plotted and compared directly. For example:

Abrazare 18/30
Daga 34/60
Spada 56/100
Lanza 9/15

Versus

Abrazare 3.5
Daga 2.5
Spada 3.8
Lanza 2.1
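The difference is easy to see with made-up numbers. A short Python sketch (the criterion counts and scores below are invented for illustration, not the ones behind the figures above):

```python
# Hypothetical raw criterion scores (1-4 scale) for two weapons evaluated
# against different numbers of criteria.
scores = {
    "Abrazare": [3, 2, 4, 3],        # 4 criteria, max possible 16
    "Daga":     [2, 3, 2, 3, 2, 3],  # 6 criteria, max possible 24
}

sums = {w: sum(s) for w, s in scores.items()}
averages = {w: sum(s) / len(s) for w, s in scores.items()}

# Daga's sum (15) beats Abrazare's (12) only because it has more criteria;
# the per-criterion averages (2.5 vs 3.0) reverse the picture.
print(sums)
print(averages)
```

This is why sums only work when every evaluation has the same number of criteria, while averages are always directly comparable.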

This is a simple model, easily implementable to cover a wide array of situations with little extra work or customisation.

Taking the above model further, however, we can expand on feedback, if not objectivity.

7.

Abrazare
Fail Pass
1 2 3 4
Remedy
Footwork x
Mechanics x
Cover x
Structure x
1st play
Footwork x
Mechanics x
Cover x
Structure x
2nd play
Footwork x
Mechanics x
Cover x
Structure x

Legend:

1 – Does not meet the requirements

2 – Additional work required to meet the requirements

3 – Meets the base requirements

4 – Has exceeded the base requirements.

Candidate must have an average of 2.5 per technique, with no more than one missed technique to pass.

Obviously, the feedback is much richer, giving a far better portrait of what the student needs to work on, and it provides a ready reference for comparison at a later date. Remarks, of course, can be added to aid feedback. The first example chart, above, was plotted using this model.

Alternatively, the table could resemble this:

8.

Abrazare
Remedy
Footwork 3
Mechanics 2
Cover 3
Structure 1
1st play
Footwork 1
Mechanics 2
Cover 3
Structure 4
2nd play
Footwork 2
Mechanics 2
Cover 3
Structure 2

Legend:

1 – Does not meet the requirements

2 – Additional work required to meet the requirements

3 – Meets the base requirements

4 – Has exceeded the base requirements.

Candidate must have an average of 2.5 per technique, with no more than one missed technique to pass.

Passing grade is 3 and above.

This is useful for saving space on a page, but lacks the instant visual recognition of a pass or fail position in the chart that the previous example provides.
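Example #8’s numbers make the pass rule easy to check, by hand or by machine. A small Python sketch, assuming a technique counts as “missed” when its per-technique average falls below the 2.5 threshold (how a “miss” is defined is the evaluator’s choice; the definition here is an assumption):

```python
# Criterion scores from example #8 above.
abrazare = {
    "Remedy":   {"Footwork": 3, "Mechanics": 2, "Cover": 3, "Structure": 1},
    "1st play": {"Footwork": 1, "Mechanics": 2, "Cover": 3, "Structure": 4},
    "2nd play": {"Footwork": 2, "Mechanics": 2, "Cover": 3, "Structure": 2},
}

# Per-technique averages: Remedy 2.25, 1st play 2.5, 2nd play 2.25.
averages = {t: sum(c.values()) / len(c) for t, c in abrazare.items()}

# Assumed rule: a technique is "missed" if its average is below 2.5,
# and the candidate may miss at most one technique.
missed = [t for t, avg in averages.items() if avg < 2.5]
candidate_passes = len(missed) <= 1

print(averages)          # Remedy and 2nd play fall below the threshold
print(candidate_passes)  # False: two misses
```

The same per-technique averages are exactly the values one would plot on the web graph discussed earlier.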

Finally, the best tool for providing feedback, but consequently the most onerous to implement, uses informative headings for each criterion. These headings are based on common student mistakes, and should be ordered by severity, from “game stopper” through “ok, but needs work” to “excellent.”

9.

Abrazare
Fail Pass
1 2 3 4
Remedy
Footwork x
1 – No footwork
2 – Unstable footwork, compromises structure
3 – Stable footwork, lacks fluidity
4 – Stable, fluid footwork
Mechanics x
1 – Poor mechanics, no structure
2 – Collapsing structure
3 – Solid mechanics and structure
4 – Structure excellent, collapses opponent

As can be seen, there is more than ample feedback provided, but the implementation time is long, and the scheme’s adaptability to other situations is nil.
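If such a rubric is kept electronically, the descriptive headings can double as the data. A brief Python sketch storing each criterion’s ordered descriptors, so that the ticked description maps directly back to a 1–4 score (the structure and function name are hypothetical; the descriptor wording is taken from the table above):

```python
# Each criterion carries its own descriptors, ordered from "game stopper"
# (score 1) to "excellent" (score 4).
RUBRIC = {
    "Footwork": [
        "No footwork",
        "Unstable footwork, compromises structure",
        "Stable footwork, lacks fluidity",
        "Stable, fluid footwork",
    ],
    "Mechanics": [
        "Poor mechanics, no structure",
        "Collapsing structure",
        "Solid mechanics and structure",
        "Structure excellent, collapses opponent",
    ],
}

def score(criterion, observation):
    """Translate the ticked descriptor back into its 1-4 score."""
    return RUBRIC[criterion].index(observation) + 1

print(score("Footwork", "Stable footwork, lacks fluidity"))  # 3
```

The recorded observation then serves as both the grade and the feedback, which is precisely this scheme’s strength.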

For most needs, example #6 works well for a variety of situations, requires little customisation, and is quickly implementable. Example #7 is a good compromise, providing objectivity, feedback, and an implementation that is easily portable once the criteria have been established. Of course, mixing different evaluation schemes is always an option, providing more feedback for specific categories (e.g. interpretation, tactics, etc.).

The latter suggestion might also require creating evaluation criteria for other areas of evaluation. For instance, what does being “competent” in tactics really mean? It would be best to develop its own grading scheme, e.g.: “applies tactics sparingly,” “reads the tactical situation but fails to act,” “correctly applies tactics to the situation,” “uses superior tactics (thinks outside the box),” or however one, as an evaluator, judges the criteria for the objectives being sought.

For the IAS, we seek to evaluate along several lines: performance, understanding, instructional skill and interpretation. As such, here is an example of evaluation points (which may then be broken down into criteria for individual evaluation).

Quality of execution (QoE) – these are evaluated easily using the grading schemes proposed (6 or 7). This includes:

  • General body mechanics (GBM) – can be evaluated as a part of QoE grading, or on its own.
  • Footwork – as general body mechanics, or can be folded into GBM.
  • Attacking – as general body mechanics, or can be folded into GBM.
  • Defenses/set plays
  • Grappling
  • Tactical

Quality of Interpretation (QoI)

Instructional (teaching ability)

There exists a plethora of criteria or points of evaluation that may be of value to an instructor or group, and you may wish to define different points of evaluation, perhaps including, but not limited to, the following common evaluation points:

  • Theory: a pass/fail test. It’s dichotomous. A 70-80% passing grade is suggested.
  • Set plays. Set plays are static performance of technique, e.g. “do the third master remedy versus a roverso”
  • Applied set plays. Applied set plays are open-ended and don’t define a specific play, but incorporate a decision-making process and tactical considerations, for instance: “defend against a roverso”.
  • Drills. We all have specific drills we want students to learn – these can be evaluated for performance.
  • Cutting or attacking. Attacking in proper tempo, using proper sequence, form, etc.
  • Defence.
  • Flow drills. Quite similar to “drills” above, but evaluation criteria may be different, since the goals of each type of drill vary.
  • Application (freeplay/loose play). In short, sparring evaluation.

There are myriad things any teacher may want to evaluate; I therefore hope the models and examples provided above help instructors in their endeavour to provide objective evaluation and valuable feedback for physical skills. This article will be followed shortly by another on providing different types of feedback, and their application.