Home

Looking for some tried and true Jaxon recipes? You’re in the right place.

On this page you will find recipes for:

Specification Design: Reframing

Situation

The prediction target is a continuous numerical value such as star rating or price movement (a regression problem). In order to use Jaxon, and to bucket outcomes, it is necessary to reframe the prediction as a classification problem.


Remedy

Model the outcomes as a discrete probability distribution. In the spirit of calculus, a continuous data stream (i.e. the possible numerical outcomes) can be placed into arbitrarily granular buckets. Consider reframing a 1-10 star review into 3 classes:

90L.png

90R.png

This allows a choice of granularity (how wide is each class “bucket”), as well as informed decisions about where to focus the model. For ranges that require granular discernment, narrower buckets can be used in combination with wider buckets for less interesting ranges; alternatively, buckets can be sized by area (there’s that calculus again!) in order to arrive at evenly-sized classes. Long tails and rare extreme events can be captured by bounding the first and last bucket on only one side (e.g. x > 12 in the above illustration).


Tradeoffs

Pros:

Cons:


Troubleshooting: what if the bucketing strategy is poor?

A sharp peak in the adjusted distribution:

91.png

A flat curve in the distribution:

92.png


Reframing classification to classification

Reframing can be applied to a priori classification problems as well. Consider navigating a class hierarchy and deciding at which granularity to frame a problem. For example, if classifying recipes into cuisines, is a lower-granularity class of Asian cuisine sufficient for the problem, or is it a better idea to instead use several higher-granularity classes such as Chinese, Japanese, Korean, and Thai? This process may continue (Schezuan, Dim Sum, Sushi, Yakitori, etc.).

93.png

For more on class hierarchies and tradeoffs, see the Cascade solution.

Back to the top ↑


Specification Design: Time Series Reframing

Situation

Time series, in which values accumulate over time, require a window definition to have context. For example, cumulative rainfall or aggregate investment returns over what time period.


Remedy

Select useful time windows over which to standardize. Predict aggregate rainfall in the next 15 minutes or maximum price over the next 15 minutes.

Back to the top ↑


Specification Design: Cascade

Situation

Different aspects of a complex problem are logically composed of multiple classifiers, due to factors such as different training datasets/features or focus. Examples include large class hierarchies or especially handling of rare events. A single model is likely to overvalue the common events, neglecting the rare events of interest. Overweighting the rare events is likely to be suboptimal because of the associated degradation of accuracy for the common events. A naive 2 tier hierarchy suffers from compounding error rates (Tier 1 error X Tier 2 error).


Back to the top ↑


Specification Design: One Versus Rest


Back to the top ↑


Metrics: F-Beta


Back to the top ↑


Dataset Manipulation: Skewed Data