Fuzzy Case-based prediction of ceiling and visibility

Hansen, B. and Riordan, D., 1998: Fuzzy case-based prediction of ceiling and visibility, First Conference on Artificial Intelligence, American Meteorological Society, 118-123.

FUZZY CASE-BASED PREDICTION OF CEILING AND VISIBILITY

Bjarne Hansen
Maritimes Weather Centre, Environment Canada
Queen Square, 19th Floor
45 Alderney Drive
Dartmouth, Nova Scotia
Canada B2Y 2N6
E-mail: bjarne@cs.dal.ca

Denis Riordan
Faculty of Computer Science, Dalhousie University,
P.O. Box 1000, Halifax, N.S.,
Canada B3J 2X4
E-mail: riordon@cs.dal.ca

1. ABSTRACT

Operational meteorology involves the application of accumulated human expertise. When developing intelligent systems for operational meteorologists, three aspects of forecasting should be considered, namely: forecasting relies on a variety of techniques, experts convey their knowledge with fuzzy terms, and data are often imprecise. No single artificial intelligence (AI) technique is best for all meteorological applications. When AI methods are hybridized in a system, the system can inherit the strengths of its constituent AI methods. In concept, the process of forecasting can be best described and copied by using an assortment of AI techniques, or by adopting a hybrid-AI approach. Accordingly, a new type of weather forecasting system has been developed, one that combines fuzzy logic (FL) with case-based reasoning (CBR). FL gives us a way to incorporate linguistically expressed relationships into systems. CBR lets us efficiently identify the one hundred most analogous and applicable cases out of a library containing over 300,000 cases. A hybrid FL-CBR intelligent system, developed to help predict ceiling and visibility at airports, is described in this paper.

2. INTRODUCTION

Meteorologists use a variety of methods to forecast. The methods are often described with imprecise words; this frustrates designers of intelligent systems who want to automate forecasting techniques. Consider the problem of predicting ceiling and visibility at an airport. A forecaster reasons effectively with rules-of-thumb, based on experience, such as: "If ceiling and visibility are low and wind veers to westerly then conditions will improve." The words low, westerly, and improve are imprecise.

Zadeh (1996) asserts that fuzzy logic lets people "compute with words" (CW). He says the approach is "a necessity when the available information is too imprecise to justify the use of numbers, and second, when there is a tolerance for imprecision which can be exploited to achieve tractability, robustness, low solution cost, and better rapport with reality." Developers of intelligent systems refer to FL as "elicitation logic" because of its facility for capturing the essence of an expert's advice.

Meteorological data can be interpreted effectively with FL-based systems. Experts classify daily atmospheric circulation patterns (CP) by examining isobaric analyses and marking areas of high and low pressure. Bardossy et al. (1995) have developed a system that classifies CP using fuzzy sets designed to represent high and low. They explain how a fuzzy rule-based approach allows them to "produce a semi-automated classification that combines the expert knowledge of the meteorologist and the speed and objectivity of the computer." The system-generated CP was evaluated to determine whether it could be used for precipitation modelling and it was found that "the information content of the fuzzy classification as measured by precipitation-related indices is similar to that of existing subjective classifications." Fuzzy logic can also be used to interpret real-time weather data more effectively (Hansen, 1997).

Main et al. (1996) explain how fuzzy logic applies to case-based reasoning. They say, "One of the main tasks involved in the design of case-based systems is determining the features that make up a case and finding a way to index these cases in a case-base for efficient and correct retrieval." They list common types of variables used to describe features in case-based systems: Boolean, continuous, and multi-valued (ordinal, nominal, and interval-specific). They further explain how fuzzy variables allow one to represent features in another way: "A large number of the features that characterize cases frequently consist of linguistic variables which are best represented using fuzzy feature vectors." After testing fuzzy features in case selection, they found that "the cases retrieved matched the current case the closest in at least 95% of the tests."

Aviation forecasting skill increases as one accumulates detailed knowledge of local climatology. Consider how forecasting expertise develops for a particular site, say, along a west-facing coast. At the novice stage, one reasons from basic physics and applies rules of thumb. One might forecast improving conditions due to a wind shift to westerly. However, the climatology of this site suggests that such a wind shift tends to precede deteriorating conditions. Forecasting skill increases as one assimilates this site-specific information on how different weather situations tend to play out at the site. This information is contained in the climate archive.

When experienced forecasters are faced with an unusual problem, they may be reminded of similar past cases and base a forecast on how those cases turned out - this is reasoning by analogy or analog forecasting. The essence of a case-based system is the archive of prototypical cases. The archive for Halifax Intl. Airport consists of 315,576 hourly observations. Analog forecasting relies on the assumption that huge weather archives contain only a small number of nearly analogous, prototypical cases. These cases contain the information that is most pertinent to the current case. Environment Canada (EC) uses an analog forecasting method to prepare public forecasts for days 3, 4, and 5 (Soucy, 1991). Development of the method has been restrained due to three problems: the difficulty of defining weather case analogs, the lack of a systematic way to identify an analog, and the computational cost of searching enormous databases. This paper describes how some of these difficulties can be overcome with a hybrid-AI system.

3. METHOD

Basic meteorological concepts are defined in sections 3.1 and 3.2. The application of FL and CBR is described in sections 3.3 - 3.6.

3.1 Definitions

cig - value of cloud ceiling in meters

vis - value of horizontal visibility in kilometers

alternate and VFR - two flying categories (section 4.1)

ob - one hourly observation from an airport

time of day - proximity of an hour to that day's time of sunrise and sunset (relates to diurnal effects)

time of year - Julian day (relates to seasonal effects)

wind run - vector sum of wind reports during recent hours (relates to origin of air); expressed with distance and direction

time-zero - denotes hour that marks the beginning of a 12-hour forecast period

case - 25 consecutive obs (span 24 hours); middle ob referred to as time-zero; a case is described according to the properties listed in table 1

future obs - 12 hourly obs after time-zero

present case - a case composed of an airport's previous 12 obs, present ob (ob for time-zero), and next 12 future obs

past case - a case drawn from an airport's archive
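The wind run defined above can be illustrated with a short sketch. It is not the authors' code. It assumes that each report is an hourly (direction, speed) pair, that direction is the direction the wind blows from, and that speed is in knots, so that one hour of wind contributes that many nautical miles of run. The function name `wind_run` is hypothetical.

```python
import math

def wind_run(reports):
    """Vector-sum hourly wind reports into (distance, direction).

    reports: list of (direction_deg, speed_knots) pairs, one per hour.
    Direction is taken as the direction the wind blows FROM (assumed).
    Returns (distance in nautical miles, direction in degrees).
    """
    # Each hourly report contributes speed * 1 h of run along its direction.
    east = sum(s * math.sin(math.radians(d)) for d, s in reports)
    north = sum(s * math.cos(math.radians(d)) for d, s in reports)
    distance = math.hypot(east, north)
    if distance < 1e-9:
        # Calm or exactly cancelling winds: direction is undefined.
        return 0.0, 0.0
    direction = math.degrees(math.atan2(east, north)) % 360.0
    return distance, direction
```

Three hours of westerly wind at 10 knots give a run of about 30 nautical miles from 270 degrees, while equal northerly and southerly reports cancel to a run of zero.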

3.2 Case Library

The past case archive consists of all regular hourly weather observations from Halifax International Airport, Nova Scotia from 1961 through 1996. Special observations that occurred between the regular hourly obs have not been included.

Cases are indexed according to the unique hour of their time-zero ob. The objective is to identify analogous past cases for any given case. The similarity between past cases and the present case is determined by using fuzzy sets.

3.3 Comparing Data with Fuzzy Sets

The similarity of two elements can be expressed with fuzzy membership functions. For instance, suppose one wants to determine the degree to which a wind belongs in the fuzzy set of moderate winds. EC defines moderate winds as winds in the range of 15 to 20 knots. A fuzzy set can be constructed across this range (Figure 1a). The degree to which a wind speed is thought of as moderate is interpreted as follows:

µ(18) = 1.0 implies 18 knots is definitely moderate

µ(22) = 0.8 implies 22 knots is nearly moderate

µ(32) = 0.0 implies 32 knots is not moderate
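As a minimal sketch, the three membership values above can be reproduced with a trapezoidal fuzzy set. The core of 15 to 20 knots comes from the EC definition; the outer limits of 5 and 30 knots are assumptions chosen so that µ(22) = 0.8 and µ(32) = 0.0, and the actual shape of the set in Figure 1a may differ. The function names are hypothetical.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)   # rising edge
    return (d - x) / (d - c)       # falling edge

def moderate(speed_knots):
    # Core 15-20 kt is from the text; the 5 kt and 30 kt feet are assumed.
    return trapezoid(speed_knots, 5.0, 15.0, 20.0, 30.0)
```

With these assumed limits, `moderate(18)`, `moderate(22)`, and `moderate(32)` evaluate to 1.0, 0.8, and 0.0 respectively.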

Fuzzy sets can be designed to be more discriminating: the "narrower" a set is, the more discriminating it is. A fuzzy set for determining whether a wind direction is near 180° can be designed to admit wind directions that are particularly close to 180° and to exclude winds that are far from 180° (Figure 1b). In the analog system, fuzzy sets have been designed to enable the inter-comparison of any two weather cases according to the properties of cases and obs (Table 1). The procedure for developing cig and vis predictions based on analogous cases is described schematically below (Figure 2).
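The effect of set width on discrimination can be sketched for wind direction. The triangular shape and the half-widths used here are assumptions for illustration, not the sets shown in Figure 1b. The one point that any implementation has to handle is that direction is circular, so 350° and 10° are only 20° apart.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute angle between two directions, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)

def near_direction(direction_deg, target_deg, half_width_deg):
    """Triangular membership: 1 at the target, 0 at half_width_deg away."""
    diff = angular_difference(direction_deg, target_deg)
    return max(0.0, 1.0 - diff / half_width_deg)
```

A wind from 195° has membership 0.5 in a set centred on 180° with a half-width of 30°, but 0.75 when the half-width is 60°: the narrower set is the more discriminating one.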

3.4 Retrieving and Adapting Analogous Cases

The FL-based comparison process for individual weather elements was extended to the comparison of entire weather observations and to the comparison of sets of weather observations. Fuzzy sets were designed to selectively filter the archive for past cases most similar to the present case (Figure 2a). Cases consisted of three types of obs: past, present and future (Figure 2b). Analogous cases were refined from the archive and adapted into forecasts (Figure 2c).

(1) Adapt fuzzy filter. Fuzzy sets can be supported anywhere along the x-axis. Fuzzy sets are designed to maximize near the present case. In the design, words can be represented with fuzzy sets. The degree to which any other case is perceived as similar to the present case is determined by calculating the degree to which its elements are members of the fuzzy sets of the present case.

(2) Ob-to-ob evaluation of time-zero fit. Select potential cases by comparing the cases' time-zero observations. A precondition for any past case to be analogous is that its time-zero ob should be similar to the present time-zero ob. Only 1-5% of the hourly observations in the archive are quite similar to the present case's time-zero ob. Similarity is determined according to cloud amount and height, visibility, wind direction and speed, time of day, and time of year. Dissimilar obs are marked as having no potential to index an analog and are excluded from case-to-case evaluation.

(3) Case-to-case evaluation of fit. All corresponding obs in two cases are tested for similarity. Strongest similarity is sought between obs that are most proximate to time-zero. A case-to-case comparison consists of one hundred and twenty-three element-to-element comparisons (screening out obs in step 2 results in much lower computing costs). The similarity of a past case is derived from the overall similarity of its corresponding obs. After all ob-to-ob comparisons are made, the overall similarity is tempered according to the similarity of the past case's time of day, time of year, and wind run characteristics. After potential cases are evaluated in this way, their similarity to the present case is represented by a fuzzy membership function µ.

(4) Select the top one hundred cases. Cases are sorted according to values of µ.

(5) Predict ceiling and visibility. The 30th percentile values of the analogous past cases' future obs are used. The analogs are collected and their cig and vis values are sorted in hourly sets, one set for each hour after time-zero. The distribution of cig and vis within each set can be examined and values of cig and vis at any percentile level can be determined. After testing the system with a range of percentile values, cig and vis predictions were made based on the values of cig and vis at the 30th percentile in the analogous cases.
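Steps (2) to (4) can be sketched as follows. This is an illustration under stated assumptions, not the system described in the paper. It compares only three elements, uses triangular sets centred on the present values with hypothetical half-widths (`WIDTHS`), combines elements with a fuzzy AND (minimum), and weights obs by closeness to time-zero with an assumed 1/(1 + lag) weighting. The tempering by time of day, time of year, and wind run is omitted, and the screening threshold is also an assumption.

```python
import heapq

# Hypothetical half-widths of the fuzzy sets centred on the present values
# (cig in meters, vis in kilometers, wind speed in knots).
WIDTHS = {"cig": 300.0, "vis": 5.0, "wind_speed": 10.0}

def element_similarity(past, present, width):
    """Triangular membership centred on the present value."""
    return max(0.0, 1.0 - abs(past - present) / width)

def ob_similarity(past_ob, present_ob):
    """Fuzzy AND (minimum) over elements; using min is an assumption."""
    return min(element_similarity(past_ob[k], present_ob[k], w)
               for k, w in WIDTHS.items())

def find_analogs(archive, present_obs, k=100, screen=0.5):
    """Return up to k (mu, time_zero_index) pairs, most similar first.

    archive: list of hourly obs (dicts) in time order.
    present_obs: known obs of the present case, oldest first,
                 ending with the time-zero ob.
    """
    n_known = len(present_obs)
    scored = []
    # A candidate time-zero needs n_known - 1 obs before it and 12 after it.
    for i in range(n_known - 1, len(archive) - 12):
        # Step 2: cheap screen on the time-zero obs alone.
        if ob_similarity(archive[i], present_obs[-1]) < screen:
            continue
        # Step 3: compare corresponding obs, weighting those near time-zero most.
        total = weight_sum = 0.0
        for lag in range(n_known):
            weight = 1.0 / (1.0 + lag)
            total += weight * ob_similarity(archive[i - lag], present_obs[-1 - lag])
            weight_sum += weight
        scored.append((total / weight_sum, i))
    # Step 4: keep the k cases with the highest mu.
    return heapq.nlargest(k, scored)
```

Because most archive obs fail the time-zero screen, the costlier case-to-case loop runs for only a small fraction of candidates, which is the saving noted in step (3).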

(2c) Flowchart of analog forecasting method.

Figure 2. Predictions based on analogous cases.

3.5 Simulations

Simulations were conducted with a 36-year long hourly archive of observations. The simulations were designed to be realistic. At the start of every simulation, a 24-hour period of obs was randomly chosen from the archive to represent a present case. During the rest of the simulation that case was completely removed from the archive so as to prevent it from matching with itself.

Forecasts took the form of twelve specific values of both cig and vis for each hour after the present time-zero. Forecast values of cig and vis tended to be smoother than individual cases' series of cig and vis; this smoothness is a result of basing forecast values on the 30th percentile of one hundred actual values. Forecasts were simulated at the rate of about one per minute.
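The percentile step that produces these twelve values can be sketched as below. The paper does not say which percentile definition was used; the nearest-rank definition here is an assumption, as are the function names.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def forecast_series(analog_futures, p=30):
    """Build one forecast value per hour from the analogs' future obs.

    analog_futures: one list per analog, each holding that analog's
    future values (cig or vis) for the hours after time-zero.
    """
    hours = len(analog_futures[0])
    return [percentile([future[h] for future in analog_futures], p)
            for h in range(hours)]
```

Taking a percentile across one hundred analogs at each hour, rather than copying any single analog, is what makes the forecast series smoother than the individual cases.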

3.6 Value of Hints

In case-to-case evaluation, elements in the ob pairs leading up to and including the present time-zero ob can be directly compared (they represent known values). During early experiments, past cases were selected without regard to elements after the present time-zero. Prediction accuracy was low compared to persistence. When missed forecasts were examined, it was found that when trends in weather elements changed sharply after time-zero, the ceiling and visibility forecasts often went awry. In forecasting terms: when wind shifts occur, ceiling and visibility change. In the forecasting thought process for Halifax, strong consideration is given to near-term wind (Halifax Intl. is situated 30 km north of the Atlantic coast near the top of gently sloping terrain).

It was assumed that a forecasting system would have very limited effectiveness if it were to completely disregard near-term wind. To test this assumption, the case-to-case evaluation of fit was modified. Past and present cases were allowed to match according to the characteristics of the winds after their respective time-zeros. In an operational setting, Numerical Weather Prediction (NWP) offers forecasters fuzzy hints about upcoming events. NWP could be adapted to provide hints about near-term wind for case selection.

Various forms of hints could be adapted to guide the case selection. Meteorologists base their predictions on clues such as: "radiation fog occurred last night", "airmass is unchanged for 24 hours", "NWP suggests.", and so on. These clues could be interpreted as fuzzy properties of a case and evaluated in the same way as the other properties describing a case (listed in Table 1).

4. RESULTS

The results are based on 6000 simulations.

4.1 Performance Measures

Ceiling and visibility together imply a flying category. Prediction skill is determined according to the accuracy of predicted flying categories. Three flying categories are defined (in metric) as follows:

Three forecast-actual combinations were recorded: hits, false alarms, and misses. A hit occurs when an event is forecast and it happens. A false alarm occurs when an event is forecast and it does not happen. A miss occurs when an event is not forecast and it does happen. From the frequencies of these events, three values are calculated: Probability of Detection (POD), Frequency of Hits (FOH), and False Alarm Ratio (FAR). The term "reliability" is synonymous with FOH. Values are calculated as follows:

POD = hits / (hits + misses)

FAR = false alarms / (hits + false alarms)

FOH = hits / (hits + false alarms)
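The three scores can be computed directly from paired forecast and observed events. The sketch below is illustrative: the function names are hypothetical, and returning `None` for a zero denominator is a design choice made here, not something specified in the paper.

```python
def count_outcomes(forecast, observed):
    """Count (hits, false_alarms, misses) from paired boolean sequences."""
    hits = sum(1 for f, o in zip(forecast, observed) if f and o)
    false_alarms = sum(1 for f, o in zip(forecast, observed) if f and not o)
    misses = sum(1 for f, o in zip(forecast, observed) if not f and o)
    return hits, false_alarms, misses

def verification_scores(hits, false_alarms, misses):
    """Return (POD, FAR, FOH); a score is None when its denominator is zero."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None
    pod = ratio(hits, hits + misses)
    far = ratio(false_alarms, hits + false_alarms)
    foh = ratio(hits, hits + false_alarms)
    return pod, far, foh
```

For made-up counts of 60 hits, 20 false alarms, and 40 misses, this gives POD = 0.6, FAR = 0.25, and FOH = 0.75. Since FAR and FOH share a denominator, FAR = 1 - FOH whenever both are defined.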

Persistence forecasts were prepared during the analog simulations so that the effectiveness of the analog system could be evaluated. To forecast persistence, one takes the known values of ceiling and visibility at time-zero and predicts that they will remain identical for the next twelve hours. The analog system was made to forecast persistence for the first hour and to switch to the case-based method after one hour. Persistence forecasting can be quite effective in the short-term. Dalleville and Dagostaro (1995) compared the skill of persistence-based forecasts with the skill of forecasts produced locally by the National Weather Service and found that "persistence forecasts appeared to have higher skill than the local forecasts for the 3-hour projection." When skill was considered for the six-hour projection, neither method was clearly superior; persistence had a higher Critical Success Index, but locally produced forecasts had a higher Heidke skill score.
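The persistence baseline, and the way the analog system defers to persistence for the first hour, can be sketched as follows. The function names are hypothetical, and the sketch assumes the analog forecast is a list with one value per hour after time-zero.

```python
def persistence_forecast(time_zero_value, hours=12):
    """Predict that the time-zero value holds for every forecast hour."""
    return [time_zero_value] * hours

def blended_forecast(time_zero_value, analog_values):
    """Use persistence for the first hour, analog-based values afterwards."""
    return [time_zero_value] + list(analog_values[1:])
```

The same functions apply to either element, called once with the cig series and once with the vis series.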

4.2 Interpretation of Results

The accuracy of simulated forecasts was verified in two ways. Firstly, the average skill scores for the first six hours of forecasts were calculated for each month (Table 2).

Secondly, the hourly scores for the first twelve hours were plotted (Figure 3).

The experimental results suggest that the proposed analog forecasting method has the potential to be more effective than persistence for short-term prediction of ceiling and visibility. Simulated analog forecasts were more reliable and had fewer false alarms than persistence-based forecasts for the first six hours of the forecast period. The analog method also generated more durable forecasts: the FAR of "below alternate" remained below 50% for twelve hours with the analog method, while the FAR of the persistence-based forecasts rose above 50% after eight hours.

5. SUMMARY

The database of hourly climatological observations contains an immense amount of information that may be used to improve forecasting. Our tool provides a prototype mechanism for rapidly searching this database for analogous cases that can be used for the prediction of ceiling and visibility.

The effectiveness of a prototype fuzzy case-based prediction system was evaluated through a comparison with persistence-based forecasts after a series of simulations. Forecasts were examined for skill in the six-hour projection period. Two criteria were considered: the reliability of forecasts of alternate flying conditions, and the false alarm ratio of forecasts of below alternate conditions. The fuzzy case-based method produced forecasts that were 2% more reliable and which had 4% fewer false alarms than persistence-based forecasts.

The work described here is part of an ongoing study. The authors are combining knowledge from their respective fields, artificial intelligence and meteorology. The immediate aim of the study is to adapt current artificial intelligence techniques for meteorology. The ultimate aim is to develop more intelligent systems for weather prediction.

6. ACKNOWLEDGEMENTS

We wish to thank our colleagues in Environment Canada for their continued support of our research. In particular, we thank Allan MacAfee for his contribution of ideas and for his critical review of this paper.

7. REFERENCES

Bardossy, A., Duckstein, L., and Bogardi, I. - "Fuzzy rule-based classification of atmospheric circulation patterns", International Journal of Climatology, Vol. 15, 1995, pp. 1087 - 1097

Dalleville, J.P., and Dagostaro, V.J. - "The accuracy of ceiling and visibility forecasts produced by the National Weather Service", Preprints, Sixth Conference on Aviation Weather Systems, Dallas, Texas, American Meteorological Society, 1995, pp. 213 - 218

Hansen, B. - "SIGMAR: A fuzzy expert system for critiquing marine forecasts", AI Applications, Vol. 11, No. 1, 1997, pp. 59 - 68

Main, J., Dillon, T.S., and Khosla, R. - "Use of fuzzy feature vectors and neural networks for case retrieval in case based systems", NAFIPS 1996 Biennial Conference of the North American Fuzzy Information Processing Society, IEEE, New York, NY, 1996, pp. 438 - 443

Soucy, D. - "Revised users guide to days 3-4-5 automated forecast composition program", CMC Technical Document, No. 37, Canadian Meteorological Centre, Environment Canada, Montreal, 1991

Zadeh, L.A. - "Fuzzy Logic = Computing with Words", IEEE Transactions on Fuzzy Systems, Vol. 4, No. 2, May 1996, pp. 103 - 111