2. Literature Survey

In the previous chapter, we explained how retrieval of similar cases relates to CBR, fuzzy logic, and weather prediction; namely: CBR depends on retrieval of similar cases, fuzzy logic enables retrieval of similar cases, and the weather prediction technique of analog forecasting depends on retrieval of similar cases. We also explained why case adaptation and case authoring are regarded as main challenges in CBR system development.

In this chapter, we survey the literature to explain how using a fuzzy k-nearest neighbors based technique for retrieval of similar cases, designed and tuned with the help of domain expert, can help us to exploit large databases of cases and available domain knowledge about similarity, and can help us to avoid difficulties of case adaptation and case authoring.

In section 2.1, we describe the main resources for CBR. In section 2.2 we review how fuzzy logic is used in CBR. In section 2.3, we provide a foundation for the fuzzy k-nearest neighbors (fuzzy k-nn) technique. In section 2.4, we review a number of CBR applications that exemplify the fuzzy k-nn technique. In section 2.5, we review weather prediction papers that use CBR and fuzzy logic.

2.1 Resources for case-based reasoning

The main resources for case-based reasoning are (of course): cases, a method for reasoning, and software. The points of this section are: CBR scales up to take advantage of large databases of cases, one can reason on the basis of similarity alone, and domain knowledge improves the process of determination of similarity, and existing software may or may not be helpful.

2.1.1 Large databases of cases

CBR scales up to take advantage of large databases of cases . Creecy et al. (1992) describe an early successful large scale k-nn system, called "PACE." For the 1990 United States Census, 22 million natural language census returns had to be classified into 232 industry categories and 504 occupation categories. The case base consisted of 132,000 previously classified returns.

Before PACE, census classification required expensive, labor-intensive clerical work. An expert system, called "AIOCS," was developed in 1990 to assist clerks. PACE required 4 person-months to be built, whereas AIOCS required 192 person months. PACE successfully processed 60% of returns whereas AIOCS processed 47%.

The larger the database and the denser the examples close to the case, the better the accuracy of PACE's performance.

Confidence measures were a byproduct of the k-nn approach. PACE generated a nearness measure for each example. If any new example was identical or very similar to a previously seen case, then PACE attached high confidence to its results. If no previously seen case matched closely, PACE reported the closest cases and, additionally, informed the user that the results were dubious.

Gentner and Forbus (1991) describe a model of similarity-based retrieval called MAC/FAC, short for "Many Are Collected, Few Are Chosen." The idea is to exploit large case bases with minimal computational cost. Many potentially similar cases are screened using a simple test, then a few probably similar cases are ranked for similarity using a more detailed test. The MAC part encodes structured representations as content vectors whose dot product yields an estimate of how well the corresponding structural similarities will match. The FAC part performs a more detailed, computationally expensive structural mapping. MAC/FAC inspired many researchers.

Scaling up helps us to avoid the case adaptation problem of CBR. The more cases we evaluate, the better the chance that good analogs exist and that there is less requirement for adaptation.

2.1.2 Domain knowledge about similarity

One can reason on the basis of similarity alone, and domain knowledge improves the process of determination of similarity. Cain et al. (1991) use domain knowledge to influence similarity judgement. A few very simple domain-based rules about which combinations of attributes are more important than others significantly improves CBR performance.

Aamodt and Plaza (1994) identify a trend in CBR research and development. Whereas pioneering work stressed the cognitive science based view of CBR as a plausible, general model of intelligence, more recent work emphasizes the importance of knowledge acquisition in the development CBR systems. Based on this trend, Aamodt and Plaza (1994) urge CBR researchers to aim for "flexible user control [and] total interactiveness of systems." CBR becomes more applicable as it integrates knowledge based techniques. Of particular relevance for this thesis, they explain:

The `indexing problem' is a central and much focussed problem in case-based reasoning. It amounts to deciding what types of indexes to use for future retrieval, and how to structure the search space of indexes. Direct indexes [i.e., examining surface features] skips the latter step, but there is still the problem of identifying what types of indexes to use. This is actually a knowledge acquisition problem...

Watson and Marir (1994) survey CBR research and conclude:

Case-based reasoning will be ready for large-scale problems only when retrieval algorithms are efficient in handling thousands of cases. Unlike database searches that target a specific value in a record, retrieval of cases from the case base must be equipped with heuristics that perform partial matches, since in general there is no existing case that exactly matches the new case.

The fuzzy k-nn method acquires domain-based knowledge about how to perform partial matching and this acquired knowledge guides retrieval of similar cases from large case base (Hansen and Riordan 1998). Reasoning on the basis of similarity helps us to avoid the case authoring problem: using similar, automatically-recorded cases eliminates the dependency on having a person handcraft cases.

2.1.3 Commercial software tools

Hashemi (1999) reviewed numerous recently developed CBR software tools. ³⁶ The tools are described in terms of how they assist system development and how the systems operate. In general, the tools share the following properties: For development, the systems help people to construct case libraries. Libraries are constructed through hierarchical organization of existing cases and through helping people to author new problem-solving cases. For operation, the systems help users to retrieve similar and analogous cases. Retrieval uses rules, object hierarchies (i.e., decision trees), and nearest-neighbor algorithms. Each of the reviewed tools is a sort of production system ³⁷ where, in the context of CBR, the productions are problem-solving cases in a library.

CBR software tools commonly assume that cases divide into hierarchies. For example, early applications dealt with dinner-planning: meats divide into classes such as chicken, fish, and so on (Riesbeck and Schank 1989). For another example, an infection-diagnosing system could classify germs as either "gram positive" or "gram negative" (using a test in which a dye is applied to a sample and the sample is examined to see whether the dye was absorbed). Weather cases do not divide into hierarchies, or distinct classes. For any two distinct weather cases, a third case could occur which would be midway between the first two. If the first two cases are used to define hierarchies, or classes, then classification of the third case would be ambiguous.

It can be difficult to adapt available CBR software tools for specific applications. ³⁸ So rather than using such a tool, we chose to develop a unique fuzzy k-nn system using the C programming language. Our three main reasons for building a system from basic components are:

· Data opportunity. The fuzzy k-nn weather forecasting system is not dependent on a library of prototypical cases. It does not require manual or automatic construction of a library. Instead, it uses existing, huge archives of ready-to-use weather data. Moreover, these archives are continuously growing as new airport weather observations are made.
· Data integrability. An advantage of applying fuzzy logic to data mining is that it will enable us to integrate new forms of data with records of old data. For future work, we plan to combine satellite data and airport weather observations. Operationally, satellite data is one of the most valuable forms of predictive guidance for aviation forecasting, particularly short-term projection of satellite images. The archive of hourly airport surface weather observations, which stretches back 36 years, does not contain accompanying satellite images, but in many instances we can reasonably infer what satellite images would have looked like based on the surface observations. For example, if surface conditions change from clear skies to rain, we can reasonably infer that satellite images would have shown a change from clear to overcast. Past cases can be augmented with such inferred attributes, what would have been shown with satellite data had it existed, and then compared with present cases which have actual satellite data. Thus, we can exploit short-range projections of satellite data to search the archive more intelligently.
· System flexibility. The available weather data and related weather forecasting systems and programs depend mainly on C programs, so we will be able to integrate our system effectively by developing parallel, easily modified code.

We are encouraged by the example of Baldwin and Martin (1995) who applied fuzzy logic to data mining (they call their tool a "fuzzy data browser") and found fuzzy logic to be advantageous in three ways.

· Finding relationships. Their system develops fuzzy rules ³⁹ that describe relationships within the data, based on numerous cases, and can thus evaluate new cases accordingly: "Each rule corresponds to a summary of several `similar' cases in the database into a fuzzy prototypical rule. The value of a new case then corresponds to an interpolation between these fuzzy prototypes." An extension of this is "that it is possible to use the rules to highlight anomalous values." New cases that do not fit previous patterns indicate new relationships that may warrant scrutiny.
· Involving and exploiting intelligent users. "The system can run entirely autonomously, or the user can inject expertise either in ... suggesting attributes which should be target for prediction ... or suggesting attributes which should be used to make the prediction ... or suggesting compound features. ... The browser can be used to explore intuition about the underlying relationships in the data, or left to run autonomously and discover relations that can be presented to the user."
· Combining new forms of data with records of old data. "In many cases, data may not be complete-for example, weather records could include daily temperature extremes, rainfall, cloud cover, etc. If a new machine became available for monitoring atmospheric pollution, these measurements could be added to the database, but the earlier records would not contain this information. If there is a relation between atmospheric pollution and some of the other recorded quantities, rules modeling this relationship could be used to make an intelligent guess as to the pollution level for each record where it was not measured." This data integrability advantage of fuzzy logic will enable us to integrate into new cases new forms of data, such as satellite data, and compare such enriched present cases (composite cases) with past cases.

In this section, we described how, to overcome the case adaptation and case authoring problems of CBR system development (which are described in subsection 1.3.2), three strategies have been proposed: knowledge acquisition, partial matching, and use of very large case bases. The rest of this chapter surveys papers that apply fuzzy logic to use these strategies.

2.2 Fuzzy logic and case-based reasoning

López de Mántaras and Plaza (1997) surveyed over 100 recent CBR papers and concluded that

the use of Fuzzy Logic techniques may be relevant in case representation to allow for imprecise and uncertain values in features, [and] case retrieval by means of fuzzy matching techniques. [Moreover] perhaps the most severe limitation of existing systems [all types of CBR systems] is the feature-value representation that is being used for cases. The consequence is that case-based algorithms cannot be applied to knowledge-rich applications that require much more complex case representations, for example cases with higher-order relations between features.

The CBR process of matching cases can be carried out by the fuzzy logic process of measuring the degree of similarity of cases.

CBR and fuzzy logic both deal with how to determine degree of similarity, but they tend to use different approaches. CBR commonly deals with features, geometry, and structure (Bridge 1998, and Liao et al. 1998), whereas fuzzy logic deals explicitly with uncertainty and ambiguity expressed intentionally by humans when they are asked to describe similarity. Fuzzy words describe uncertainty intentionally and fuzzy sets represent the intended uncertainty.

2.2.1 Fuzzy CBR formalism

Fuzzy CBR is a type of CBR that uses fuzzy methods to represent and compare cases, and to form solutions. The implicit principle of the fuzzy k-nn method is expressed by Dubois et al. (1997) as: "the more similar are the problem description attributes, the more similar are the outcome attributes."

Their paper is relevant for this thesis for two reasons. First, for foundation, it provides a general mathematical formalism with which to take advantage of this principle. The fuzzy k-nn method applied to a weather prediction problem is an actualization of this general formalism. In this subsection, we summarize their formalism in order to provide a basis for the fuzzy k-nn method. Neither we nor Dubois et al. (1997) investigate learning aspects of CBR.

Second, it describes how a fuzzy set framework accommodates two types of problems: deterministic and non-deterministic. As we noted in the Introduction, the weather prediction problem, which is deterministic in theory, is non-deterministic in practice because of uncertainty surrounding weather measurements. Fuzzy sets represent this uncertainty.

Our interpretation of fuzzy CBR formalism is consistent with that given by Dubois et al. (1997). This subsection reviews and summarizes their description, nearly verbatim.

A case is viewed as an n-tuple of precise attributes. The set of attributes is divided into two non-empty disjoint sets: the problem description attributes subset and the solution description attributes subset, which are denoted by S and T respectively.

A case is denoted by a tuple (s, t) where s and t stand for complete sets of precise attribute values of S and T respectively.

To perform case-based reasoning, we relate problems with solutions. We assume that we have a finite set M of stored cases called a case base or memory (M is thus a set of pairs (s, t)), and a current problem description denoted by s₀, for which the precise values of all attributes belonging to S are known. Then case-based reasoning aims to extrapolate an estimate of the value t₀ of the attributes in T for the current problem.

In this setting, the implicit case-based reasoning principle is defined as:

The more similar are the problem description attributes, the more similar are the outcome attributes.

This is the implicit principle that underlies all deterministic CBR. If two items are perfectly similar then they are identical. So in theory, if a problem is well-posed-if a problem-to-solution mapping is many-to-one or one-to-one-then the solution is deterministic. In a deterministic setting, the CBR principle may be expressed with the following constraint

"(s₁, t₁), (s₂, t₂) Î M, S(s₁, s₂) £ T(t₁, t₂)

This means that the similarity of s₁ and s₂ constrains the similarity of t₁ and t₂ at a minimum level. In other words, the problem is well posed. For deterministic CBR to be applicable, cases must map to the solution space in a many-to-one or a one-to-one way. Deterministic CBR is inapplicable when cases map to the solution space in a many-to-many way.

Expressed in terms of fuzzy relations on S and T, the implicit CBR principle can be expressed with the following rule:

The more S-similar s₁ and s₂, the more T-similar t₁ and t₂

where (s₁,t₁,), (s₂, t₂) Î M. A problem in this fuzzy CBR framework is denoted with the 4-tuple <M, S, T, s₀> where the above principle is applicable.

All of the applications described in this chapter (ahead in Section 2.4) and our own application (in Chapter 3) exemplify of the use of this principle. A detailed example of the use of this principle is given in the Appendix B: A Worked-out Example of Fuzzy k-nn Algorithm for Prediction.

In reality, however, problems are often ill-posed. ⁴⁰ In the physical world, attributes are rarely known precisely and with certainty. This makes efforts to develop deterministic CBR futile. This is also why very long-range weather predication remains impossible.

Prediction of the sufficiently distant future is impossible by any method, unless the present conditions are known exactly. In view of the inevitable inaccuracy and incompleteness of weather observations, precise very-long range forecasting would seem to be non-existent [and this conclusion does] not depend on whether the atmosphere is deterministic. (Lorenz 1963)

Long-range prediction of many real, dynamical systems is hampered in this way. Given enough time, chaos defeats determinism in natural systems across all scales, ranging from molecular to astronomical. ⁴¹

To deal with non-deterministic problems, Dubois et al. (1997) suggest to relax the constraints of the above-described formalism and to accept that "case-based reasoning can only lead to cautious conclusions." Indeed, when attempting long-range prediction using realistic, dynamical, analogous temporal cases, we should always qualify our conclusions. We should recognize the margin of error. Examining ensembles of cases, such as those shown in the snowboard analogy (Figure 5, page 29), enables us to determine appropriate margins of error to qualify our results.

To relax the constraints, Dubois et al. (1997) reword the principle slightly and use fuzzy sets to modify the constraint.

the more similar s₁ and s₂, the more possible t₁ and t₂ are similar

Yager (1997) also argues for a unified view of fuzzy set theory and CBR, and goes so far as to contend that: "The reasoning process used in FSM [fuzzy systems modeling] and CBR are the same." The main distinction between fuzzy modeling and CBR identified by Yager is that

fuzzy modeling is generally used in environments in which the required solutions are numeric values, whereas the case-based methodology has a more ambitious agenda regarding the domain of the possible solution. This more ambitious agenda comes at the price of not always having available the necessary operations to combine solutions.

Thus fuzzy CBR is a subset of CBR, a type of CBR that uses fuzzy methods to represent and compare cases, and to form solutions

2.2.2 Numerous conditions and partial matching

Fuzzy CBR combines numerous conditions and partial matching in queries. As the number of conditions specified increases, the chance of turning up a good match increases. The same principle underlies a Bayesian case-matching scheme described by Kontkanen et al. (1998). The basic idea in their scheme is to use prior and posterior probabilities of certain cases to adapt cases retrieved from a case base. Prior probability describes the likelihood of a feature prior to the query. Posterior probability describes the likelihood of a feature after the query. For example, at Halifax the prior probability of rain at any given hour may be near 10%, but the posterior probability of rain an hour after rain was observed may be near 90%. Kontkanen et al. (1998) explain how to use such information to weight cases appropriately when making solutions. But both types of probability have their problems. Using prior probability turns up many cases of low similarity, where as posterior probability turns up few cases, or no cases, of high similarity. Kontkanen et al. (1998) explain the necessity of having a "soft" metric for dealing with cases that may or may not have good matches in the case base. Their proposed future work is to further develop and test a "soft constraint approach."

Their system uses cases that are continuous feature vectors which, as they note, are the commonest type of vector in CBR prediction research. They further explain how CBR distance metrics are themselves, in essence, restricting assumptions on the problem domain:

in many traditional CBR systems, the algorithms typically use a distance function (e.g., Euclidean distance) for the feature vectors in order to determine the most relevant data items for the task in question. The use of a specific distance function implicitly assumes that the distance function is relevant with respect to the problem domain probability distribution, and hence restricts the set of distributions considered.

One commonly used simplifying assumption is that all the attributes are independent. This makes integration of probabilities simple and feasible. Each attribute brings its own independent probability distribution into the equation. They formulate a Bayesian similarity metric which exploits posterior probability.

Their experiment was to repeatedly select cases at random from the case base, remove some features, introduce small random modifications to the rest of the features of the case, and use the resulting partial scrambled case as a query on the case base. They found that the Bayesian similarity score produced better results than a simple Hamming distance similarity score. Original cases could be recognized in the case base based on only a few perturbed original attributes.

As they note in their conclusion, basic "Bayesian probability theory can be used as a formalization for the intuitively appealing CBR paradigm." They cite one of the advantages of the Bayesian scheme is that it forces one to explicitly recognize all the assumptions made about the problem domain. Simple distance metrics can conceal assumptions about dependence or independence of features, assumptions that may or may not be correct.

Their conclusion is reasonable and hardly contentious. Basically they conclude that probabilistic methods, carefully used, are useful for CBR. We contend that fuzzy k-nn offers similar opportunities to complete partial cases by querying databases. Moreover, the fuzzy k-nn enables us to explicitly express assumed knowledge of similarity and assumptions about dependence or independence of features acquired from a domain expert. Assumptions about dependence and independence of features are determined by how fuzzy sets are operated upon. For example, the similarity of two case's winds can be computed as a single value based upon the similarity of two interdependent wind variables (direction and speed), while the similarity of two case's humidities can be computed as another single value based upon the independent variable of humidity.

Bosc and Pivert (1992) explain, in formal mathematical terms, how fuzzy sets enable flexible querying of databases. Fuzzy sets enable imprecise queries on a database. Two situations in which imprecise query conditions are useful:

· When the user is imprecise, fuzzy sets model the imprecision.
· When a prespecified number of responses is desired, discriminating margins of fuzzy sets enable elements in the database to be ranked according to degree of similarity.

Bosc and Pivert (1992) suggest, for a research topic, "improvement of the discriminating capability of the fuzzy sets based approach in two extreme cases: no element is selected (null degree) and a too large number of elements which have received a degree equal to 1."

Bosc and Pivert (1992) used trapezoidal fuzzy sets. Such fuzzy sets under-utilize the discriminating power of fuzzy sets. Two elements whose memberships both equal 0 or both equal 1 are seen as equivalent even though they may differ slightly. However: unimodal fuzzy sets discriminate more effectively than trapezoidal fuzzy sets. In Chapter 3, we show how the problem of having too few or too many matches is avoidable through appropriate fuzzy set design. Non-zero, continuously-varying, unimodal fuzzy sets discriminate continuously between cases and, thereby, equip a similarity measuring function with the properties of a formal metric space. ⁴²

2.2.3 Flexible similarity-measuring framework

Fuzzy set theory is a flexible framework for measuring similarity that can: 1) include or exclude the properties of reflexivity, symmetry, monotonicity and transitivity; and 2) subsume both relative and absolute measures of similarity. These are commonly desired properties of similarity measuring functions in CBR.

In a description of a lattice-valued approach ⁴³ to making similarity-measuring functions, Bridge (1998) identifies four properties that must be addressed in the design of similarity-measuring functions.

· reflexivity: x ~ x · "topmost" similarity ⁴⁴
· monotonicity: continuously increasing or decreasing
· symmetry: x ~ y = y ~ x
· transitivity: x ~ y and y ~ z will imply x ~ z

Designers may or may not want to impose the above conditions. Reflexivity is usually a desired property and monotonicity is often desired. Symmetry may or may not be desired, depending on the intent. For example, we would assert that
rain ~ snow = snow ~ rain, but precipitation ~ snow < snow ~ precipitation. ⁴⁵

Transitivity is usually not desired. For instance, knowing the values of
fog ~ rain and rain ~ snow does not necessarily determine the value of fog ~ snow
because the relationship between fog and snow is special and does not necessarily involve rain at all.

Bridge (1998) claims that the lattice-valued approach to measuring similarity is advantageous because it enables us to address all the four properties simultaneously. Similarly, we claim that fuzzy set theory is advantageous because, as a formalism, it accommodates variations of the four properties (Zimmerman 1991).

Bridge (1998) describes two basic types of similarity functions, absolute and relative. An absolute similarity function takes two representations and returns a Boolean result, either similar or not similar (f_Boolean(x, y) = {0, 1}), whereas a relative similarity function returns a number (f_Relative(x, y) = {0, 1} = R). As Bridge explains, the problem with absolutism is that it does not correspond with "people's intuitive concept of similarity, in which there is a notion of `degrees of similarity.'" And the problem with relativism is that

on occasion, the numbers used are arbitrary. This occurs when similarity function designers need something richer than absolute similarity, i.e., they need degrees of freedom, but they do not need to quantify, or cannot properly quantify, the degrees. Any number used in these circumstances will be contrived.

Fuzzy set theory gives CBR system designers and CBR knowledge acquirers a full range of similarity-measuring functionality. The sets enable similarity-measuring results to be qualified or quantified, depending on whether the results are fuzzified of defuzzified, or results can simply be ranked according to degree of similarity.

Bridge (1998) claims versatility as another advantage of the lattice-valued framework:

We claim that the advantages of our [lattice-valued] metric framework are that: it subsumes absolute and relative measures (i.e., these are instantiations of the framework); it introduces (again as instantiations) many other ways of measuring similarity [such as hedge words] (only a few of which other researchers have reported in the literature); and it allows the easy combination of similarity functions.

Fuzzy set theory also subsumes absolute and relative measures. Absolute measures are achieved by prescribing crisp sets, which are a subset of fuzzy sets. Relative measures are achieved by prescribing graded membership. One of the main recommendations of fuzzy set theory is its facility for operating directly with "hedge" words (e.g., Zadeh 1999, Zimmerman 1991, Cox 1992, and Maner and Joyce 1997).

Common types of variables used to describe features in case-based systems are continuous, ordinal, nominal, and interval-specific. Fuzzy set theory complements CBR by enabling us to represent features in another way, as Main et al. (1996) explain:

A large number of the features that characterize cases frequently consist of linguistic variables which are best represented using fuzzy feature vectors.

2.2.4 Numbers and words

Jeng and Liang (1995) propose a fuzzy logic based approach to retrieval and cite three main advantages of the approach. First, it allows numerical features to be converted into fuzzy terms to simplify the matching process. Most existing CBR methods assume qualitative features, such as "weak or powerful weapon," but provide few options for how to deal with numerical features. Second, it allows multiple indexing of a single case based on a single attribute with different degrees of membership. A single case may be applied in various contexts-thus, it supports the relational database model as opposed to the structured database model. And third, it allows greater flexibility in the retrieval of candidate cases. Queries can be modified qualitatively and linguistically to suit special circumstances. For instance, one may request cases describing items that are old or in a smaller subset of very old, depending on the circumstances.

Jeng and Liang (1995) explain how the a-cut applies to CBR. The a-cut describes the subset of cases that have membership above a prescribed threshold. For instance, similar cases may have membership values in the range [0.0...1.0]. If we specify a = 0.5, then the a-cut of similar cases will be only the cases with membership values, m, such that m ³ 0.5. As Jeng and Liang (1995) point out, "delicate tradeoffs are involved in choosing proper a-cuts." For instance, choosing a high level of a increases efficiency by allowing the system to rule out cases with low membership but this tactic may also limit the retrieval flexibility of the system. We will use the a-cut convention when we describe our fuzzy k-nn weather prediction system in subsection 3.2.

2.3 Foundation for fuzzy k-nearest neighbors technique

This section provides a foundation for the fuzzy k-nn technique. Subsection 2.3.1 reviews the more general k-nn technique. Subsection 2.3.2 reviews the fuzzy k-nn technique. Subsection 2.3.4 describes special properties of the fuzzy k-nn technique.

2.3.1 k-nearest neighbors technique

The foundation of the fuzzy k-nn technique is the k-nn technique. The definition of k-nearest neighbors is trivial: For a particular point in question, in a population of points, the k points in the population that are nearest to the point in question. Finding the k-nearest neighbors reliably and efficiently can be difficult. How "nearness" is best measured and how data is best organized are challenging, non-trivial problems.

The implicit assumption in using any k-nearest neighbors technique is that items with similar attributes tend to cluster together. Hence, if unknown attributes of an observed item are sought, then examination of its neighbors should suggest what those unknown attributes may be. Any k-nearest neighbors technique is effective only to the extent that the assumption of clustering behavior is correct. But clustering behavior varies. For example, wealthy suburb dwellers tend to have wealthy neighbors, whereas wealthy city dwellers tend to have mixed "classes" of neighbors. The k-nearest neighbors method is most frequently used to tentatively classify points when firm class bounds are not established.

The vital part of any nearest neighbors technique is an "appropriate" distance metric, or similarity-measuring function, as Dudani (1976) explains.

It is reasonable to assume that observations which are close together (according to some appropriate metric) will have the same classification. Furthermore, it is also reasonable to say that one might wish to weight the evidence of a neighbor close to an unclassified observation more heavily than the weight of another neighbor which is at a greater distance from the unclassified observation. Therefore, one would like to have a weighting function which varies with the distance between the sample and the considered neighbor in such a manner that the value decreases with increasing sample-to-neighbor distance.

Dudani (1976) proceeds to describe a "distance-weighted k-nearest neighbors rule" which can decide classification. Suppose we wish to label a new unknown point. We can decide its label according to the labels of its neighbors, either according to a distance-weighted scheme or according to a simple majority of its neighbors. In an experiment, Dudani (1976) found that "a lower probability of misclassification was obtained for the distance-weighted k-nearest neighbors rule than for a simple majority k-nearest neighbors rule."

The main problem to solve in developing any k-nn technique is to develop an appropriate metric. This problem occurs in numerous AI settings. Saying that a k-nn technique is "distance-based" leads to two questions:

· How is distance determined?
· How is uncertainty computed?

Kamp et al. (1998) explain how CBR, information retrieval systems, and database management systems (DBMS) all involve problem of finding nearest neighbors and of computing with uncertainty as follows.

Handling of incomplete and vague data is closely intertwined with case-based reasoning. However, also within database research there is a growing interest in handling of incomplete and vague data. [In all these settings] nearest-neighbor queries can be directly used to determine the most similar cases.

Kamp et al. (1998) consider multidimensional access methods and contrast three query types: exact match, range, and nearest-neighbor. They offer a general warning about spatial access methods:

It is often argued, or implicitly assumed that cases are points in d-dimensional space and the retrieval process is built upon this assumption. In our experience, this assumption is wrong. Most often, cases are incompletely described; the values of certain attributes are unknown, or only vague.

Kamp et al. (1998) suggest that the greatest opportunity for the development of CBR systems is scaling up systems and integrating them with existing large databases. Kamp et al. (1998) further suggest that

a topic of future research within intelligent retrieval is the integration of domain knowledge and background knowledge to enhance the semantic of the retrieval. This could be done by considering and integrating techniques from knowledge representation [and] in this area, further research includes finding guidelines for finding the right tradeoff between expressiveness and complexity for different application scenarios, the search for approximations, etc.

Kamp et al. (1998) wrote the summary chapter in a book that compiles recent work by CBR researchers in Germany. The need for better schemes for approximation and dealing with imprecision is mentioned throughout the book. The word fuzzy is strangely absent from this book. Computing with imprecision is what fuzzy logic excels at.

[fuzzy logic has a] tolerance for imprecision which can be exploited to achieve tractability, robustness, low solution cost, and better rapport with reality. (Zadeh 1999).

The fuzzy k-nearest neighbors technique described in the following section is a method that combines the advantageous distance-weighted quality recommended by Dudani (1976) with the need for better retrieval semantics (i.e., deal with vague attributes, incorporate domain knowledge, balance expressivity and complexity) recommended by Kamp et al. (1998).

2.3.2 Fuzzy k-nearest neighbors technique

A fuzzy k-nearest neighbors (fuzzy k-nn) technique is simply a nearest neighbors technique in which the basic measurement technique is fuzzy.

As detailed in the previous section, the term nearest neighbors technique refers to a technique that identifies the closest points to a given point in some multi-dimensional space. The motivation for using nearest neighbors techniques is usually to infer what one of the properties of an item is by examining other properties. The underlying assumption is that similar items cluster.

For a simple example, with a given a weather situation, suppose we want to determine what the precipitation type is, rain or snow. And suppose all we know is that the temperature is
-10ºC and that it is precipitating. Examination of all such weather situations would show that precipitation types were usually snow, rarely some other form of frozen or freezing precipitation, and never rain. So, on that basis, we could infer that the precipitation type for the given situation is probably snow. We could classify the precipitation type as "probably snow." Indeed, nearest neighbors techniques are usually used to enable some form of classification. In this thesis, we are more concerned with case-to-case comparison than with case-to-class comparison.

The fuzzy k-nn technique applied to continuous vectors, achieves automatic, expert-like similarity comparison of complex objects. Each comparable attribute in two comparable vectors is compared with an attribute-specific fuzzy set and the results are aggregated to describe the overall similarity of the two vectors. The fuzzy k-nn technique, applied to time series of continuous vectors, achieves effective analog prediction, as we demonstrated by developing a weather prediction system (Hansen and Riordan 1998). ⁴⁶

Keller et al. (1985) define and describe the fuzzy k-nearest neighbors algorithm in a foundational and theoretical paper. The key points of that article are summarized as follows.

Stored cases are labeled into distinct classes. Given a new case to classify, both crisp and fuzzy k-nn algorithms are made to find k-nn. The algorithms differ in two ways. First, the crisp algorithm uses a distance function, whereas the fuzzy algorithm uses fuzzy set based comparisons. Second, the crisp algorithm assigns the new case to the class that is represented by a majority of its k-nn, whereas the fuzzy algorithm assigns the new case varying degrees of membership to all the classes represented by the k-nn, according to the degree to which the new case matches each of the k-nn.

The fuzzy k-nn algorithm determines the degree of membership of any given continuous vector in any class of continuous vectors, as Keller et al. (1985) explain:

The advantage provided by fuzzy sets is that the degree of membership in a set can be specified, rather than just the binary is or isn't a member. This can be especially advantageous in pattern recognition.

Patterns in real-world data are often ambiguous and therefore difficult to classify into crisp sets.

The fuzzy k-nn algorithm described by Keller et al. (1985) is general. It can be used to classify any kinds of continuous vectors into arbitrary classes. In this thesis, continuous vectors describe time-varying weather information. Each new, unique, present case weather acts as a special class-the "class" which best describes present weather and which is centered on the present case. We want to find in the database k nearest neighbors for such a special class of interest.

Comparing the fuzzy k-nn algorithm to a "crisp" k-nn algorithm, Keller et al. (1985) point out a special advantages of the fuzzy k-nn algorithm:

An incorrectly classified sample will not have a membership in any class close to one while a correctly classified sample does posses a membership in the correct class close to one.

Keller et al. (1985) compared the performance of the fuzzy k-nn nearest neighbors with crisp k-nn algorithm (k-means clustering) and found two advantages in fuzzy k-nn method:

· Fuzzy k-nn classification is more accurate than crisp k-nn classification. ⁴⁷
· Fuzzy k-nn classification solutions include useful confidence measures based on to the resulting memberships.

Two contributions of fuzzy logic to case-based reasoning are that it can improve performance of retrieval in terms of accuracy and that it can increase the interpretability of results of retrieval. The accuracy improvement comes from the avoidance of unrealistic absolute classification. The interpretability increase comes from the overall degree of membership of a case in a class which provides a level of assurance to accompany the classification. For example (and to preview the way we use fuzzy logic in Chapters 3 and 4), if an expert user configures similarity-measuring fuzzy sets, and the thus-equipped similarity-measuring algorithm uses a simple maximum operation to aggregate results of numerous similarity tests, and the algorithm reports that a past case has an overall score of "very similar" to a present case, then the user can be assured that every attribute of the past case is at least "very similar" to the present case in accordance with that user's own expressed understanding of similar.

The algorithm for a classic fuzzy nearest neighbors prototype algorithm, proposed by Keller et al. (1985), is shown in Figure 8.

Let W = {Z₁, Z₂, ..., Z_c} be the set of c prototypes representing c classes.

BEGIN

Input x, vector to be classified.
Initialize i = 1.
DO UNTIL (distance from each prototype to x is computed)

Compute distance from Z_i to x.
Increment i.

END DO UNTIL
Initialize i =1.
DO UNTIL (x assigned membership in all classes)

Compute u_i(x) using (1)
Increment i.

END DO UNTIL

END

where

1 / ½x - Z_i½^2/(m-1)
_________________

u_i(x) =

_c

(1)

å(1 / ½x - Z_j½^2/(m-1))

^{j = 1}

where m determines how heavily the distance is weighted when calculating each neighbor's contribution to the membership value.

and where ½x - Z_i½ represents the membership of x in the class of Z_i

Figure 8. Fuzzy nearest prototype algorithm copied from (Keller et al. 1985).

An item may be defined by a collection of continuous attributes and is thus general. A class may be defined by either a specific case (a prototype), an idealized case, or a group of labeled items, and is thus general. The assumption in using prototypes is that prototypes are complete members of the class that they represent.

In addition to comparing fuzzy and crisp k-nn algorithms, Keller et al. (1985) compare fuzzy and crisp nearest prototype classifiers:

Actually, the only difference is that for the nearest prototype classifier, the labeled samples are a set of class prototypes, whereas in the nearest neighbors classifier we use a set of labeled samples that are not necessarily prototypical. Of course, the nearest prototype classifier could be extended to multiple prototypes representing each class, similar to the k-nearest neighbors routine.

To classify a new case using the nearest prototype classifier,

membership in each class is assigned based only on the distance from the prototype(s) of the class.

Keller et al. (1985) describe the benefits of the fuzzy prototype algorithm:

The fuzzy prototype classifier, while not producing error rates as low as the fuzzy nearest neighbors classifier, is computationally attractive and also produces membership assignments that are desirable.

The membership provides a useful level of confidence of the classification.

Our fuzzy k-nn algorithm, described in Chapter 3, may be viewed as a variation of the above algorithm, a variation that includes some steps to reduce computational cost, as explained in Section 3.3.1 (page 89).

2.3.3 Weather situations are not prototypical

Weather situations cannot be condensed and faithfully represented as distinct prototypes. There are no original models of weather situations upon which all other weather situations are patterned: each situation is unique (as explained in section 1.5.3, page 26).

Therefore, to achieve the benefits of the fuzzy nearest prototype algorithm for analog weather prediction, we will treat the present weather case as the only "prototype" against which all stored cases must be compared. Given a new, unique weather case, and a database of past weather cases, we wish to isolate k analogs for the present case among the past cases. The challenging problem is to determine the degree to which past cases are analogs. The subsequent problems of sorting, weighting, and predicting are relatively simple.

Two basic contrasting approaches are crisp k-nn and fuzzy k-nn. With the crisp k-nn one would in advance specify thresholds (ranges) for membership in the set of analogs. If past cases fall within the ranges they would be classified as analogs, and if past cases fall outside the ranges they would be classified as non-analogs. A problem with this approach is that it is unlikely that it will isolate k analogs-it may produce no matches, fewer than k matches, or more than k matches-because some weather cases are rare while others are common. This is the point at which arbitrary search parameter adjustments may need to be made by a user, and thus algorithm autonomy is lost. To isolate k analogs certain ranges would need to be widened or narrowed. This would require user intervention and perhaps successive attempts. Each such range-adjusting intervention is arbitrary, for instance, should one tighten the "temperature fit" or the "wind direction fit." Empty prescribed bins have been a problem in earlier comparable weather analog forecasting systems (Martin 1972).

The fuzzy k-nn algorithm is based on the ideas that analogs are similar cases and that similar is a fuzzy property. For example, if the present temperature is 10°C, then a temperature of 11°C would be considered very similar, 15°C somewhat similar, and 20°C hardly similar.

The fuzzy k-nn algorithm can run autonomously because no arbitrary classification assignments are made by the algorithm. Rather, the fuzzy k-nn algorithm describes all past cases as having varying degrees of membership in the set of analogs for the present case, sorts them, and the k cases with the highest degrees of membership are the k-nn. With a fuzzy distance metric, one pass through the set of past cases will always isolate k-nn. Fuzzy sets emulate arbitrator experts. For example, for purposes of evaluating similarity, a weather forecasting expert may consider a difference of 5 degrees Celsius in temperature as equivalent to a difference of 10 degrees in wind direction. For case-to-case comparison, triangular fuzzy sets are specified accordingly.

The degree to which the fuzzy k-nn are analogs for the present case make the search results interpretable. If the k-nn are analogs to a high degree, then the current case is common and one can associate high confidence in a prediction based on the k-nn. If the k-nn are analogs to a low degree, then the current case is rare and one can associate low confidence in a prediction based on the k-nn.

2.3.4 Properties of the fuzzy k-nn technique

In subsection 2.3.1, we explained how three aims of research into intelligent retrieval are: to integrate domain knowledge into the retrieval process, to deal sensibly with the uncertainty associated with approximately measured variables, and to find the right balance between expressivity and complexity (Kamp et al. 1998). In subsection 2.3.2, we explained how fuzzy set theory is a framework that combines realistic expressions of domain knowledge with uncertain variables, controls expressivity through the shape of fuzzy sets, and controls complexity through the aggregation of a collection of system-modelling fuzzy statements. In this subsection, we describe three special properties of the fuzzy k-nn technique that help it to perform intelligent retrieval.

First, the fuzzy k-nn technique recognizes family resemblance. Pattern recognition depends on being able to detect and describe similarity between objects. Dreyfus (1992) describes two approaches for describing similarity between objects, family resemblance and class membership, as follows.

Family resemblance differs from class membership in several important ways: classes can be defined in terms of traits even if they have no members, whereas family resemblances are recognized only in terms of real of or imaginary examples. Moreover, whereas class membership is all or nothing, family resemblance allows a spectrum ranging from the typical to the atypical.

The fuzzy k-nn technique does not define classes in terms of specific traits. It does not determine degrees of membership of a weather case in predefined fuzzy sets such as low or high temperature. Rather, it measures the degree to which any given case is similar to other cases in the database. In the sense of (Dreyfus 1992), it determines the degree of family resemblances of the case compared to the k most similar cases in the entire population of cases.

Second, the fuzzy k-nn technique avoids the cluster validity problem-because it is not a "fuzzy clustering technique." The purpose of clustering techniques is to identify structure, or to recognize patterns. ⁴⁸ Whereas the purpose of fuzzy k-nn is simply to identify similarity, not to delineate presumed clusters.

Zimmerman (1991) describes fuzzy clustering and we summarize his description as follows: One begins by assuming that the problem of feature extraction has been solved. Each of n items is characterized by p attributes and the task is to divide the items into c categories, 2 £ c < n, homogeneous subsets called "clusters." The number of clusters, c, is usually not known in advance.

Computation of fuzzy k-nn is a preliminary step in the computation of fuzzy clusters. A point with more near neighbors than any other point is, logically, the center of a cluster. Zimmerman (1991) describes how the centers and memberships are calculated by using an iterative scheme to minimize a summation of matrices. The details of this scheme are not relevant here, so we will forgo them. What is relevant here, about fuzzy clustering, is the question of cluster validity.

Yang (1993) reviewed 103 papers dealing with fuzzy clustering and concluded:

We have already reviewed numerous fuzzy clustering algorithms. But it is necessary to presume the number c of clusters for all these algorithms. In general, the number c should be unknown. Therefore the method to find optimal c is very important. This kind of problem is called cluster validity.

The fuzzy k-nn avoids the difficult cluster validity problem by forgoing the problem of identifying global patterns-it assumes that such patterns exist and forgoes the problem of describing such patterns-and, instead, for a specific case proceeds to identify the most relevant information, local patterns, contained in a case base by identifying k-nn for that case.

Third, the fuzzy k-nn technique avoids the "rule explosion problem"-because it is not a conventional fuzzy rule based system. Rule explosion is a major problem in designing fuzzy rule based systems. The number of rules in a system grows exponentially with the number of input and output variables (Kosko 1997). ⁴⁹ For instance, suppose we have a system where each input is represented by three fuzzy sets. If we have two inputs, then nine fuzzy rules are needed (3²=9). If we have three inputs then twenty-seven fuzzy rules are needed (3³=27). Hence, fuzzy systems do not scale up well for systems with many variables, or dimensions.

The fuzzy k-nn algorithm avoids the rule explosion problem because it does not attempt to abstract the complex behavior of a system into rules. Instead, each observed system state is, in effect, a rule and, in that sense, the ratio of rules to model points is contained at 1:1. A fuzzy rule and a fuzzy similarity-measuring function are contrasted in Figure 9.

a₁ Ú b₁® c₁₁
...
a_i Ú b_j ® c_ij

sim ((a₁, b₁), (a₂, b₂)) = m _a(a₁, a₂) Ú m _b(b₁, b₂)

(a) Fuzzy rule based system. Input variables are a and b, and the output variable is c. Input variable a is represented with i fuzzy sets and variable b is represented with j fuzzy sets. The number of required rules equals the product of i·j. As input variables are added, the number of rules grows exponentially.

(b) Fuzzy similarity-measuring function. (a₁, b₁) and (a₂, b₂) describe two states of a 2-D system. m_a and m_b are two similarity-measuring functions used to compare two attributes. The number of required similarity-measuring operations rules equals the number of variables of the system. As cases are accumulated, the number of operations grows linearly.

Figure 9. Rule explosion in a fuzzy rule base contrasts with rule containment in a fuzzy k-nn similarity-measuring function

A possible contribution of case-based reasoning to the field of fuzzy logic based applications would be to help to avoid the rule explosion problem. For the fuzzy k-nn technique, the number of necessary similarity-measuring functions grows linearly with the number of input dimensions, not exponentially.

2.4 Applications that use fuzzy k-nn techniques

This section surveys numerous applications that exemplify fuzzy k-nn techniques. What all these applications have in common is that, in one way or another, they all perform fuzzy logic based matching and exploit descriptions of salient attributes and combinations of attributes acquired from domain experts using fuzzy words. Fuzzy logic directly translates expert descriptions of salient features into a fuzzy set based similarity-measuring algorithms.

Applications surveyed include weather prediction, mergers and acquisitions, residential property valuation, cash flow forecasting, shoe fashion database retrieval, colour matching in plastics production, criminal profiling, identifying freshwater invertebrates, interpreting electronic nose data, and manufacturing failure analysis.

2.4.1 Weather prediction

We built a fuzzy k-nearest neighbors based weather prediction system (Hansen and Riordan 1998). The fuzzy k-nn method is used to acquire knowledge about what salient features of continuous-vector, unique temporal cases indicate significant similarity between cases. Such knowledge is encoded in a similarity-measuring function and thereby used to retrieve k nearest neighbors from a large database. Predictions for the present case are made from a weighted median of the outcomes of analogous past cases, the k-nn. Past cases are weighted according to their degree of similarity to the present case.

Numerically described attributes are fuzzified into memberships in specific fuzzy sets before being compared. The fuzzy k-nn system fits fuzzy sets to the attributes. Because we search for nearest neighbors for a particular case, the centers of triangular fuzzy sets are centered on the attribute values of that particular case.

2.4.2 Mergers and acquisitions

Bonissone and Ayub (1992) developed a system to predict the outcome of mergers and acquisitions. The system could give guidance about what companies might be worth acquiring. The case base has descriptions of companies where attributes describe various economic conditions. Descriptions of past cases refer to five temporal phases: initial conditions, pre-tender, tender-negotiation, outcome, and long-term results. Descriptions of new cases (i.e., prospective purchases) refer to only initial conditions. The system determines the degree to which new cases match past good purchases and bad purchases.

The motivation for applying CBR, as is typical, is to cope with a domain where the domain theory is weak and where knowledge acquisition is difficult.

The motivation for applying fuzzy methods to CBR is to deal with uncertainty in four phases of the CBR process: semantics of abstract features used to describe cases, evaluation of the similarity measures computed across these features, determination of relevancy and saliency of the similar cases, and the solution-adaptation phase.

The system is divided into two parts, one dealing with domain knowledge, the other dealing with case representation. Domain knowledge is organized into three hierarchies: objects, action, and goals.

Individual abstract features are described by applying domain rules to numerous surface features. This object-oriented strategy of bundling related attributes together reduces the dimensionality and thus the complexity of the problem. ⁵⁰

2.4.3 Residential property valuation

Bonissone and Cheetham (1997) developed a system to valuate residential properties. The case base has descriptions of houses in mortgage packages and descriptions of current market conditions. The case base describes hundreds of thousands of real estate transactions. Mortgage packages are investment tools that may contain up to 1000 mortgages. Real estate markets are volatile. A valuation system for mortgages helps an investor to keep better track of their real estate investment.

Previous efforts to automate valuation failed to capture the intrinsic imprecision in sale comparison. Imprecision surrounds terms in the problem: "find the most similar houses, located close to the subject property, sold not too long ago; and selecting a balanced subset of the most promising comparables to derive the final estimate."

Fuzzy sets to measure these properties were constructed by interviewing experts, asking them to describe their preferences for various properties, and making fuzzy sets accordingly. The system was designed to permit the similarity-measuring fuzzy sets to relax (i.e., widen) if the retrieved set was too small.

Our fuzzy k-nn approach, using triangular sets which taper off asymptotically, avoids the problem of retrieving too small a set. The efficacy of this approach, compared to using support-limited trapezoidal sets, is also implied by the conclusion of Liao et al. (1998): "the more the fuzziness in the case attributes, the more the power of the [fuzzy similarity] measure."

The fuzzy CBR system was tested against three other methods: a statistical formula, a fuzzy-neural net, and a human appraiser. The fuzzy CBR system was more accurate than the first two objective methods and a bit less accurate than a human appraiser.

Bonissone and Cheetham (1997) emphasize the efficacy of the system at producing confidence value assessments to accompany house value estimates. The distribution of the retrieved analogs itself implies how much confidence to place in an estimate.

Bonissone and Cheetham (1997) emphasize the transparency of the system workings. Every step used to determine a house value can be reviewed and readily understood by users. The decisions and weights correspond to intuitively understandable expressions that coincide with the specifications of the users, the interviewed experts. Thus fuzzy CBR achieves a unique form of explanation capability.

2.4.4 Cash flow forecasting

Weber-Lee et al. (1995) combine fuzzy logic and CBR in a cash flow prediction system. The most original aspect of their system, as they point out, is the use of a fuzzy Sugeno integral to calculate the similarity of two financial situations rather than the usual weighted mean approach. ⁵¹ This integral calculates the overall similarity according to the "max of the min" of the individual attribute to attribute similarities.

The "max-min" scheme is the simplest possible aggregation scheme for additive fuzzy systems, and it has the mathematical properties of associatively, reflexivity, symmetry, transitivity (Zimmerman 1991). There are more complicated and sophisticated aggregation schemes, for dealing with additive systems and fuzzy rule bases, ⁵² but Weber-Lee et al. (1995) use the simplifying assumption that overall similarity is as only strong as the weakest individual similarity and their encouraging results support this assumption. We regard weather the same way when we apply the fuzzy k-nn method.

Weber-Lee et al. (1995) make an appealing argument for using CBR to forecast cash flow, an argument which could easily be adapted to the problem for forecasting weather, as follows.

The task of forecasting cash flows when performed by a human being works adequately under similar and sequential contexts. The expert aggregates information [such] as a possible recession and becomes subjectively pessimistic.. After a while, the expert cannot remember if the pessimistic approach used, for instance, 5 years ago, actually turned out to be effective, and if so, how effective. The CBR approach overcomes this shortcoming.

One of the main attributes used by Weber-Lee et al. (1995) for CBR-based prediction is that of season itself. Weather patterns change continually as seasons change, and memories of weather situations fade over time. Weather analogs from similar dates in previous years are better analogs than analogs from more recent dates in previous months. Analog forecasting preserves relevant memories of analogous situations from previous years.

2.4.5 Shoe fashion database retrieval

In an application for the shoe fashion industry, Main et al. (1996) used fuzzy feature vectors to enable retrieval of patterns for shoes similar to actual or idealized shoes. The motive was to rapidly manufacture "new" styles of shoes, to imitate currently popular models, by recycling old similar styles. After testing the fuzzy features approach for case selection, they found that "the cases retrieved matched the current case the closest in at least 95% of the tests." The fuzzy vector based retrieval agreed with experts' judgements of what constituted a high, medium, or low dimension on various parts of shoes

2.4.6 Colour matching in plastics production

Cheetham and Graf (1997) built a system to perform color matching in plastic manufacturing. The goal was to determine an optimal combination of colorants to create a specific color of plastic and to do so at minimal cost. The case base consisted of past "recipes" and results. The new case consisted of a color sample to be matched. The system was used for two years at a number of General Electric Plastics sites and lead to significant cost savings.

Fuzzy logic was used to measure the level of satisfaction with several diverse factors, such as match under different lighting conditions, cost of colorants, and opacity of resultant plastic.

The most important concept in the system is that of a fuzzy preference function. Cases are compared and the differences are recorded with real numbers. An expert specifies thresholds that correspond to linguistic terms describing the quality of match. Fuzzy sets for attribute comparisons are constructed accordingly. Five such fuzzy sets enable the simultaneous comparison of five heterogeneous attributes.

Each of the [five properties] is based on different scales of units. By mapping each of these properties to a global scale through the use of fuzzy preferences and linguistic terms such as Excellent, Good, Fair and Poor, it becomes possible to compare one attribute with another.

This integrated approach to comparison of diverse attributes is similar to the fuzzy k-nn weather prediction system. For example, 10 degrees difference is near for wind direction, 5 degrees difference is near for temperature, and so on.

2.4.7 Criminal profiling

In an application to profile criminals, Lefley and Austin (1997) used fuzzy methods to describe criminals modus operendi (MO). An MO is a learned pattern of criminal behavior which a particular habitual criminal tends to follow and which detectives use to identify the particular criminal. Such behaviors are weakly indicative individually and strongly indicative collectively. Unsolved crimes may be solved by associating known offenders with the crimes and investigating those offenders more closely.

Their algorithm used a fuzzy distance measure of several attributes, forward-chaining logic, and calculated similarity according to a summation of individual attribute results.

In practice, the application was simple. Records of criminals and crimes were obtained and several crimes were presented as "unsolved" (i.e., the criminals were not identified). Students reviewed the records all the records and transferred what they read into a questionnaire, a form which the system could process directly. The values in the form were fuzzy descriptors of criminal attributes. In experiments, they found that most students transferred common written criminal records into the fuzzy forms consistently.

Their system performance was encouraging. "Unsolved" crimes were measured as most similar to solved crimes of the actual offender in 70% of the trials. Their conclusion emphasizes the ease, effectiveness and envisioned portability of the matching system.

The system exploits available expertise. The system questionnaires were designed based on important attributes suggested by criminologists.

2.4.8 Identifying freshwater invertebrates

Winder et al. (1997) identify freshwater invertebrates using an approach which is very similar to that used by Lefley and Austin (1997) to identify criminals. Their system identifies samples of invertebrates based on questions answered using a database of characteristics of species suggested by taxonomists.

Again, knowledge is acquired from experts about salient attributes with which to identify individuals, in this case species rather than criminals. Thereafter system construction and use is straightforward. Construction requires converting the knowledge of salient attributes into fuzzy aggregate matching operations. To use the system, the user fills in a questionnaire.

Winder et al. (1997) also note how their fuzzy CBR approach to measuring similarity aims to associate new cases with similar families. As we noted above (subsection 2.3.4, page 62), fuzzy k-nn is a useful tool for the problem of recognizing family resemblance between items, a challenge for AI described by Dreyfus (1992).

2.4.9 Interpreting electronic nose data

Singh (1999) explains how to reduce the ambiguity of classification of a new point by centering a fuzzy k-nn algorithm on the new point. Using the single fuzzy nearest-neighbor is most effective for two pattern recognition problems, a benchmark problem and a realistic problem. ⁵³ The algorithm labels a new case according to the label of its single nearest neighbor, as measured with fuzzy operations, rather than according to the majority of the labels of its k nearest neighbors (k-nn). The crisp k-nn approach they refer to, basically, draws a hypersphere around the point to be labeled, counts the various labels of points within the hypersphere, and assigns to the point to be labeled the label which describes the majority of the points in the hypersphere. There are three problems with this crisp k-nn approach: it results in confusion if there is no clear majority in the hypersphere, results are sensitive to the radius of the hypersphere, and it works poorly for nonlinear data where clusters of different types often overlap near the point to be labeled. The fuzzy single nearest neighbor approach is better, Singh (1999) claims, because it detects the "truly nearest neighbor." ⁵⁴

2.4.10 Electronics manufacturing diagnosis

Göös et al. (1999) describe a fuzzy monitoring and case-based diagnosis tool for an electronics manufacturing process. The tool provides online quality control during manufacturing of electronics parts. Components of parts are measured during manufacturing. Each part (case) is described by 19 attributes, most of these being measured attributes. Three values are associated with each attribute: a lower threshold for acceptance, and optimal value, and an upper threshold for acceptance. The quality of a part is determined according to the degree to which its attributes collectively are similar or dissimilar to the theoretically optimal part.

The case base consists of descriptions of past low quality parts along with descriptions of the cause and remedy of each such situation. When a new part is diagnosed as of low quality, a case base search is performed. A nearest neighbors algorithm with expert-specified weights is used to determine overall quality of the part. Continuous measurements map to linguistic (fuzzy) descriptions of quality.

Case base searches lead to one of two sorts of useful results. First, either a near neighbor is found and the cause of the defect is generally diagnosed and a remedy found. Or, second, a new sort of defect is identified, the user is alerted to this anomaly, and the user revises the case base accordingly.

Results and user feedback from initial fielding of the system are quite positive (Göös et al. 1999). The method is effective at finding manufacturing defects. Users readily accept the results, appreciate the opportunity to learn from the results, and appreciate the opportunity to, in effect, teach the system when anomalous new defects are detected and the case base needs to be revised.

Tobin et al. (1998) use a fuzzy k-nn classification algorithm to enable a system for semiconductor defect detection. Their algorithm follows closely from the fuzzy k-nn algorithm formulation of Keller et al. (1985). The training set of cases consists of records of defective semiconductors. Each record has 35 measured features (measured with optical instruments) and one of 14 types of classes. Classes describe the type of defect and the corrective measure, and are determined by experts. Their system showed a "similar efficacy to the human counterpart." It worked well in the assembly line environment, allowing for a automatic 100% inspection rate, thus the system was judged useful for quality control.

2.5 CBR and fuzzy logic based weather prediction systems

This following three subsections survey weather prediction systems that use CBR, fuzzy logic, and the combination of both. The section Additional References on Analog Forecasting in Meteorology, at the end of this thesis, lists meteorology articles that describe the challenges of implementing analog forecasting. We do not survey these articles here as they are too far from the thesis subjects of CBR and fuzzy logic.

2.5.1 CBR weather prediction systems

Jones and Roydhouse (1995) are among the few who have applied intelligent retrieval to archived meteorological data and referred to CBR. ⁵⁵ Their motive is the same as ours: to exploit large meteorological archives. Jones and Roydhouse (1995) use a structured database approach for retrieving similar, large-scale weather maps (maps that contain abstract structures, such as a high pressure area or a cold front), whereas we use a relational database approach for retrieving similar, local-scale weather reports (reports that contain related variables, such as cloud ceiling height 200 feet and visibility in fog 0.5 miles).

Jones and Roydhouse (1995) describe a prototype system called Metvuw which is intended to help forecasters predict the evolution of cold fronts over the Tasman Sea, between Australia and New Zealand. Forecasting fronts in this area is difficult because there are few weather observations in that area, thus humans are poorly informed and computer models are poorly initialized. They suggest that examination of the evolution of analogous previous weather situations may help forecasters predict for the present situation.

Forecasters provide Metvuw with high-level descriptions of current large-scale, atmospheric pressure patterns (high and low pressure centers and values) and Metvuw retrieves similar cases (i.e., relevant and analogous cases) from a database of 2500 stored cases. Jones and Roydhouse (1995) use a similarity-measuring algorithm which assesses various salient pressure features suggested by meteorologists. ⁵⁶

To minimize computational cost, Jones and Roydhouse (1995) use a two-stage screening process. The stage selects a few candidate cases using a set of rough tests. The second stage ranks the few selected cases using more a costly but accurate process of similarity assessment. Such a strategy is often referred to in CBR as "MAC/FAC," short for Many Are Collected, Few Are Chosen" (as described in subsection 2.1.1 page 42). We use such a strategy when we implement the fuzzy k-nn system (described in Chapter 3).

Although the Metvuw system showed some promise, its development apparently ceased five years ago. We can only speculate as to why, but perhaps one of the following qualities, which we will try to avoid, contributed to cessation of the system's development.

· Non-autonomy. "Metvuw Workbench assists the retrieval of relevant past cases, but leaves their adaptation and interpretation to the user." (Jones and Roydhouse 1995). Forecasters are already deluged with data to interpret as it is. As Conway (1989) points, systems must be convenient and fast for forecasters if they are to be useful. We envision fuzzy k-nn weather prediction as being able to operate in two modes: autonomous guidance-offering, and user-driven weather archive querying.
· Data-intensive. Jones and Roydhouse (1995) expected that a 10-year data set for Metvuw would require approximately one Gigabyte to store, whereas Hansen and Riordan (1998) found that a 36-year data set required 6 Megabytes to store. Metvuw requires roughly three orders of magnitude more data storage because it its cases consist of images describing a large part of the globe, whereas the cases in (Hansen and Riordan 1998) consist of condensed numerical weather observations for one point, an airport.

Bardossy et al. (1995) describe another algorithm that uses a fuzzy-rule base and which has basically the same purpose as that of (Jones and Roydhouse 1995), that is, classification of atmospheric circulation patterns. Their fuzzy rule based algorithm is better at pattern recognition than that of Metvuw. Bardossy et al. (1995) reliably determine the degrees of membership of each new pressure pattern in a set of prototypical templates of pressure patterns, whereas, Jones and Roydhouse (1995) represent all pressure patterns symbolically with trees and this causes misleading results (e.g., slight deepening of low pressure systems causes totally different tree representation to result). In terms of database strategy, Bardossy et al. (1995) use a more of a relational database strategy and Jones and Roydhouse (1995) use more of a structural database strategy.

2.5.2 Fuzzy weather prediction systems

Fuzzy logic has so far seldom been used to predict weather, even though such an application was proposed at least 19 years ago, ⁵⁷ fuzzy logic itself was first described 35 years ago (Zadeh 1965), and fuzzy logic has during recent few years become a mainstream technique in a variety of environmental domains. It represents linguistically-expressed domain knowledge and operates on diverse forms of continuous data-such types of knowledge and data are typical in environmental problems. Environmental domains where fuzzy logic presently operates effectively include agriculture, climatology, earthquakes, ecology, fisheries, geography, geology, hydrology, meteorology, mining, natural resources, oceanography, petroleum industry, risk analysis, and waste management (Hansen et al. 1999).

Maner and Joyce (1997) built a weather prediction system, called WXSYS. They obtained simple weather prediction rules (i.e., weather lore) from experts and weather almanacs, and implemented these rules in a system using a fuzzy logic rule base. For example, one rule they used is: "Weather will be generally clear when the wind shifts to a westerly direction. The greatest change occurs when the wind shifts from east through south to west."

According to Maner and Joyce (1997), there are three reasons why fuzzy logic seems ideally suited for weather forecasting:

· The phrases used in conventional forecasts are inherently and intentionally fuzzy.
· "Fuzzy logic is known to work in this domain."
· "The weather domain meets the general conditions under which a fuzzy solution is thought to be appropriate."

Fuzzy logic has been used to build expert systems to predict fog and to predict wind. Sujitjorn et al. (1994) and Murtha (1995) separately built systems to predict fog at an airport. Hadjimichael et al. (1996) and Kuciauskas et al. (1998) together built a fuzzy system, called MEDEX, for forecasting gale force winds in the Mediterranean. All of these systems are conceptually based on the classic fuzzy rule base approach to fuzzy systems. ⁵⁸ How they differ is in the particular fuzzy rules elicited from experts. For example, the MEDEX system uses rules of the form "if pressure gradient is very large...then...", and so on.

We built a fuzzy expert system for critiquing marine forecasts, called SIGMAR ⁵⁹ (Hansen 1997). Like the above fuzzy expert systems, expert-specified fuzzy sets are at its core. Unlike the above fuzzy expert systems, it does not process a series of fuzzy rules (e.g., if A and B then C). Instead it measures similarity using fuzzy sets: it measures the similarity between a current valid marine forecast and the actual marine observations directly by using fuzzy sets, rather than, as is usually done, indirectly by using categories (e.g., "Observation in category 1 and forecast in category 2.").

Marine forecasters use observations to determine how the accuracy of a forecast is faring. If unforecast significant conditions develop, then forecasts must be amended as soon as possible. The task of monitoring weather observations and assessing their significance in terms of the current forecast is called "keeping a weather watch." For weather forecasting operations, the task is necessary and important. For weather forecasters, the task of continuously monitoring a never-ending list of ever-changing numbers is challenging during periods of rapidly changing weather and boring during periods of slowly changing weather.

SIGMAR continuously critiques marine forecasts: it automatically monitors a stream of real-time of observations, assesses where and to what degree a forecast is accurate or inaccurate, and reports to forecasters. SIGMAR helps marine forecasters to quickly identify any wind reports that contradict the marine forecast. This helps forecasters to respond quickly in situations where marine forecasts need to be amended.

Actually, fuzzy expert systems and CBR systems for weather prediction overlap. Tag et al. (1996), following the example of Bardossy et al. (1995), used fuzzy logic to automate the recognition of patterns of upper air wind flow. This pattern information was used, in a CBR-like way, as predictive input in a fuzzy expert system (MEDEX, described in the previous subsection).

Clustering techniques can be useful for CBR. Fuzzy clustering has been used to emulate human expert classification of climate (McBratney and Moore 1985) and climatological circulation patterns (Bardossy et al. 1995).

To the best of our knowledge, our current line of work is the only work which combines the three topics of fuzzy logic, CBR and weather prediction in a single system (Hansen and Riordan 1998). Given a present incomplete weather case to predict for, we used a fuzzy k-nn algorithm to find similar past weather cases in a huge weather archive to make predictions from.

Granted, the individual three methods are, by themselves, basic: using fuzzy sets to measure similarity is a basic application of fuzzy set theory, k-nearest neighbors is a basic CBR method, and analog forecasting is a primitive weather prediction technique. But when these three methods were combined into one system, with an expert's knowledge of what features are salient and how, with the knowledge encoded as fuzzy sets, and the system was provided with a huge archive of weather observations, the results were encouraging. The system's prediction accuracy was measured with standard meteorological statistics and compared to a benchmark prediction technique, persistence. In realistic simulations, the system was significantly more accurate. This following three chapters build upon the work begun in (Hansen and Riordan 1998).

36 The review by Hashemi (1999) describes eleven CBR software tools: Case-1 by Astea International, ART*Enterprise and CBR-2 (CBR Express, CasePoint, Generator, and Tester) by Inference Corporation, CasePower by Inductive Systems Inc., Eclipse by Haley Enterprises, ESTEEM by Esteem Software Inc., KATE by Acknosoft, ReCall by Isoft, ReMind by Cognitive Systems, S3-Case and CBR Works-4 by TechInno.

37 A production system consists of a collection of productions (rules), a working memory of facts and an algorithm known as forward chaining for producing new facts from old. A rule becomes eligible to "fire" when its conditions match some set of elements currently in working memory. A conflict resolution strategy determines which of several eligible rules (the conflict set) fires next. A condition is a list of symbols which represent constants, which must be matched exactly; variables which bind to the thing they match and "<> symbol" which matches a field not equal to symbol. (Definition from the Free Online Dictionary of Computing, http://wombat.doc.ic.ac.uk/foldoc, downloaded Oct. 27, 1999).

38 "At the risk of being ostracized by the CBR community," Laight posted the following list of CBR downsides to the AI-CBR mailing list in response to a request for downsides, on July 20, 1999. The list, informally posted by Graham Laight to the AI-CBR E-mail list, is the nearest thing to a consensus that we have found so far-none of the many CBR-savvy recipients of Laight's e-mail challenged this list. ("There are currently 660 registered members (as of May 1999) [of the AI -CBR mailing list]," according to Ian Watson, the maintainer of the list (membership number downloaded from http://www.ai-cbr.org/theindex.html on Sept. 25, 1999.)

CBR downsides

High cost of CBR tools.

Absence of a "standard" CBR tool, in the sense that MS Word is word processing standard.

Cost and difficulty of implementing a CBR system.

Difficulty of changing, or adapting, a system once it is implemented.

You may find it just as easy (and successful) to build your application with a normal database package (or even a spreadsheet!), unless your requirements make a good fit with a CBR package. You may even find a document search type application meets your needs.

Cost of maintaining the case base, especially if the scope of the case base changes in any way.

Information in the case base may become outdated.

Each case may have to be carefully prepared by an expert.

Depending on the application, it may be possible to get the information [relevant for problem the application is meant for] in other ways.

39 Baldwin and Martin (1995) regard fuzzy rules which condense data and "fuzzy prototypical cases" as synonymous. An example of a fuzzy rules is "if A and B then C," where A, B, and C are fuzzy sets.

40 The snowboard analogy in Figure 5 illustrates how many-to-one mapping switches to many-to-many mapping over time.

41 Simple three-body systems can have complex orbits, depending on the initial conditions (Lorenz 1993).

42 The critical part of the fuzzy k-nn technique, which we describe in Chapter 3, is a similarity measuring function called sim. The complement of similarity is dissimilarity (1.0 - sim) and this dissimilarity operates like a distance function in a formal metric space. Kasriel (1971) defines a distance function and metric space as follows:

Let d be a nonnegative real-valued function defined on X × X that satisfies the following: For all x, y and z in X,

d(x, y) = 0 if and only if x = y
d(x, y) = d(y, x)
d(x, y) + d(y, z) ³ d(x, z) (triangle inequality).

The function d is called a distance function or a metric for X and (X, d) is called a metric space.

When one uses trapezoidal fuzzy sets, property (a) may be violated.

43 The lattice is a graphical way of designing a similarity measure in which the edges of the graph correspond to similarity-measuring operations. Bridge (1998) defines a lattice as a "partially ordered set of values that satisfies certain properties." The set members are similarity-describing variables which may be in different forms, such as Booleans, numbers, or "hedge words" (e.g., {very, quite, fairly, barely}).

44 The symbol "~" represents the similarity-measuring function, as in (Bridge, 1998). We understand x ~ y to mean "the similarity of x to y." By "topmost," Bridge means the highest possible value of similarity. In a normalized fuzzy set, this equates to 1.0.

45 If asked, "Is precipitation snow?" we would say maybe. If asked, "Is snow precipitation?" we would say yes.

46 To compare vector time series, we used matrix representations: rows for attributes, columns for time steps. (Hansen and Riordan 1998)

47 The fuzzy k-nn algorithm achieves improved accuracy through the avoidance of unrealistic absolute classification during algorithm execution. Keller et al. (1985) claim, "The advantage is that no arbitrary assignments are made by the algorithm." By "not arbitrary," they mean: "The advantage provided by the fuzzy sets is that the degree of membership in a set can be specified, rather than just the binary is or isn't a member."

48 Bezdek and Pal (1992) offer a large collection of papers which describe how to use fuzzy models for pattern recognition.

49 In addition to an exponential increase in number of rules as the numbers of input and output increase, there is a linear increase in number of rules as the expressed precision of input and output dimensions increases. For example, an input may be represented by two fuzzy sets (low and high) or more precisely by thee fuzzy sets (low, medium, and high).

50 We used such an object-oriented strategy in (Hansen and Riordan 1998) and use it again in this thesis. For example, wind is an abstract attribute of a weather case that is composed of surface attributes such as wind speed, wind direction, and "wind run."

51 Note that Weber-Lee et al. (1995) refer to the weighted mean approach as it is used for individual similar case recognition, not for subsequent solution composition. In our system, we will use a sort of weighted mean to compose solutions base on an analog ensemble of weather cases.

52 The Standard Additive Model (SAM) or Center of Gravity (COG) are commonly used fuzzy aggregation schemes (Kosko 1997), schemes used to fuzzify and defuzzify system throughput. COG and SAM are commonly used to compute the output of fuzzy rule bases consisting of 1000's of rules. Figure 9 on page 64 compares the SAM approach and the "max-of-the-min" approach.

53 The two pattern recognition problems attempted by Singh (1999) are detection of spirals in a commonly attempted benchmark problem, and the realistic problem of identification of different blends of coffee according to data collected from an "electronic nose." Singh (1999) does not refer to (Keller et al. 1985) but the basic approach of the fuzzy k-nn algorithm is the same. Singh (1999) does not deal with temporal aspects of the spiral data.

54 Singh (1999) argues, somewhat rhetorically, and with an idealized diagram, for the superiority of a single fuzzy nearest neighbor approach over a non-fuzzy k-nn approach, but tests the single fuzzy nearest neighbor approach against a neural network. The results are encouraging, but still leave us to wonder, if considering only nearest neighbor approaches, what the effects are of reducing k-to as low as k=1? (Singh (1999) intuitively selects k=1.) and what are the effects using of fuzzy measures versus crisp measures. We will examine the effects of varying k and of varying fuzziness in our experiments (Section 4.2, page 102).

55 Jones and Roydhouse (1995) cite two of their own previous related work. We have found no subsequent papers by these authors describing related work.

56 Salient pressure features suggested by meteorologists include: locations of pressure mimima, aligned overlap of minima-bounding rectangles, aspect ratio, density, area, and intensity (Jones and Roydhouse 1995).

57 Silvert (1981) proposed the use of fuzzy logic to formulate and evaluate predictions and proposed its use for verification of weather predictions. [The same "novel" idea occurred to me 15 years later in 1996.] He identified four types of prediction: "sharp (`this horse will win the race'), fuzzy (`this horse will do well'), conditional (`this horse will do well if it doesn't rain'), and probabilistic (`odds on this horse are...')".

Note that the fuzzy type of prediction can, through design, emulate the other types of prediction. If "do well" equates to "winning," then the prediction is sharp. If additional conditions are added, then the prediction is conditional. If "do well" equates to "odds of winning over 90%," then the prediction is probabilistic.

We acknowledge the simmering (cooling?) debate between proponents of probability and proponents of fuzzy logic over the distinct merits of each methodology. However, we point out that probability and fuzzy logic are two distinct means to the same end-the formation of expectation-and thus complementary. Probability forms expectation based on frequencies of past events. Fuzzy logic forms expectation on based on vague rules; these rules can take into account past events, as we do in this thesis, and thus, fuzzy logic can be made to emulate probability. In any case, current indications are that both methodologies will persist, coexist, and provide complementary approaches for problem solving (Bezdek 1994).

58 The fuzzy rule base approach to expert systems is well explained by Zimmerman (1991). Kosko (1997) refers to the rule base as a "fuzzy associative memory" and describes the process of rule resolution as firing all rules partially and in parallel and take a balanced average. Viot (1993) describes a fuzzy rule based system balance an inverted pendulum (a benchmark problem for fuzzy systems) and convincingly demonstrates how simple the system is by providing compilable C code for the system on one page.

59 SIGMAR is short for Significant Information Generated from Marine Area Reports.

	1 / ½x - Z_i½^2/(m-1) _________________
u_i(x) =	_c	(1)
	å(1 / ½x - Z_j½^2/(m-1))
	^{j = 1}

a₁ Ú b₁® c₁₁ ... a_i Ú b_j ® c_ij	sim ((a₁, b₁), (a₂, b₂)) = m _a(a₁, a₂) Ú m _b(b₁, b₂)
(a) Fuzzy rule based system. Input variables are a and b, and the output variable is c. Input variable a is represented with i fuzzy sets and variable b is represented with j fuzzy sets. The number of required rules equals the product of i·j. As input variables are added, the number of rules grows exponentially.	(b) Fuzzy similarity-measuring function. (a₁, b₁) and (a₂, b₂) describe two states of a 2-D system. m_a and m_b are two similarity-measuring functions used to compare two attributes. The number of required similarity-measuring operations rules equals the number of variables of the system. As cases are accumulated, the number of operations grows linearly.