Cookbook

Looking for some tried-and-true Jaxon recipes? You’re in the right place.

On this page you will find recipes for:

Specification Design: Reframing

Situation

The prediction target is a continuous numerical value, such as a star rating or a price movement (a regression problem). To use Jaxon and to bucket outcomes, the prediction must first be reframed as a classification problem.

Remedy

Model the outcomes as a discrete probability distribution. In the spirit of calculus, the continuous range of possible numerical outcomes can be divided into buckets of arbitrary granularity. For example, consider reframing a 1-10 star review into 3 classes:

Negative (1-3 stars)
Neutral (4-6 stars)
Positive (7-10 stars)
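
The three-class reframing above can be sketched in Python as a plain mapping from rating to class label. The function name star_to_class is hypothetical (not part of Jaxon); it only encodes the bucket boundaries listed.

```python
def star_to_class(stars: int) -> str:
    """Hypothetical helper: bucket a 1-10 star rating into one of three classes."""
    if not 1 <= stars <= 10:
        raise ValueError(f"rating out of range: {stars}")
    if stars <= 3:
        return "Negative"  # 1-3 stars
    if stars <= 6:
        return "Neutral"   # 4-6 stars
    return "Positive"      # 7-10 stars


ratings = [1, 3, 4, 6, 7, 10]
print([star_to_class(r) for r in ratings])
# ['Negative', 'Negative', 'Neutral', 'Neutral', 'Positive', 'Positive']
```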

This allows a choice of granularity (how wide each class “bucket” is), as well as informed decisions about where to focus the model. Ranges that require fine discernment can use narrower buckets, while less interesting ranges can use wider ones; alternatively, buckets can be sized by area under the distribution curve (there’s that calculus again!) so that each class holds roughly the same share of examples. Long tails and rare extreme events can be captured by bounding the first and last buckets on only one side (e.g. x > 12 in the illustration above).
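
Sizing buckets by area amounts to placing the bucket edges at quantiles of the observed values. Below is a minimal sketch using only the Python standard library (statistics.quantiles and bisect); the helper names equal_area_edges and bucket_index are hypothetical, not a Jaxon API, and the data is a toy range of integers.

```python
import bisect
import statistics
from collections import Counter


def equal_area_edges(values, n_buckets):
    """Interior cut points that split `values` into n_buckets equal-area buckets."""
    # statistics.quantiles returns n_buckets - 1 cut points.
    return statistics.quantiles(values, n=n_buckets, method="inclusive")


def bucket_index(x, edges):
    """Return the bucket holding x, where bucket i covers edges[i-1] < x <= edges[i].

    The first bucket (x <= edges[0]) and the last bucket (x > edges[-1]) are
    bounded on one side only, so extreme values always land in a bucket.
    """
    return bisect.bisect_left(edges, x)


values = list(range(1, 101))  # toy data: the integers 1..100
edges = equal_area_edges(values, 4)
counts = Counter(bucket_index(v, edges) for v in values)
print(edges)                        # [25.75, 50.5, 75.25]
print(sorted(counts.items()))       # [(0, 25), (1, 25), (2, 25), (3, 25)]
print(bucket_index(10_000, edges))  # 3 (the open-ended top bucket)
```

On skewed real data the quantile edges bunch up where values are dense, which yields narrower buckets in exactly those ranges.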

Tradeoffs

Pros:

Access to Jaxon platform techniques targeting classification tasks.
Control over the probability density, which often aligns better with the business actions derived from a prediction (e.g. does it matter whether a review is 1 or 2 stars? Either way, the action is to post an apology from management). See also: quantile regression.
The resulting probability density codifies a Bayesian prior, which may be useful for downstream work.
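
If bucketed training labels are available, one simple estimate of that prior is the empirical frequency of each class. The sketch assumes labels are plain strings; class_prior is a hypothetical helper and not a description of how Jaxon computes or uses a prior.

```python
from collections import Counter


def class_prior(labels):
    """Empirical class frequencies, usable as a simple prior over the buckets."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


labels = ["Negative", "Neutral", "Positive", "Positive"]
print(class_prior(labels))
# {'Negative': 0.25, 'Neutral': 0.25, 'Positive': 0.5}
```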

Cons:

Any discretization (bucketing) necessarily reduces precision.

Troubleshooting: what if the bucketing strategy is poor?

A sharp peak in the adjusted distribution:

The model is likely to suffer from the class-imbalance issues inherent in heavily skewed classification.
Remedy: create more buckets around the peak and fewer on the tail(s).
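
The remedy can be checked by counting examples per bucket before and after re-bucketing. In this sketch the ratings are made up for illustration (100 reviews peaked at 8 stars), and bucket_counts and max_share are hypothetical helpers; the share of the largest bucket is used as a rough imbalance signal.

```python
import bisect


def bucket_counts(values, edges):
    """Count values per bucket; bucket i holds edges[i-1] < x <= edges[i]."""
    counts = [0] * (len(edges) + 1)
    for x in values:
        counts[bisect.bisect_left(edges, x)] += 1
    return counts


def max_share(counts):
    """Fraction of all examples that fall in the largest bucket."""
    return max(counts) / sum(counts)


# Hypothetical ratings sharply peaked at 8 stars (100 reviews in total).
ratings = [1] * 5 + [4] * 5 + [7] * 20 + [8] * 50 + [9] * 15 + [10] * 5

before = bucket_counts(ratings, [3, 6])    # <=3, 4-6, >=7
after = bucket_counts(ratings, [6, 7, 8])  # <=6, 7, 8, >=9
print(before, max_share(before))  # [5, 5, 90] 0.9
print(after, max_share(after))    # [10, 20, 50, 20] 0.5
```

Splitting the peak into single-star buckets and merging the sparse low tail into one bucket drops the largest class from 90% to 50% of the data.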

A flat curve in the distribution:

This might be OK! A flat distribution is ideal, as long as you care equally about all classes.
But make sure the flatness either reflects the true distribution or is an intentional transformation to focus on specific sub-ranges.
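
One way to quantify "flat" is normalized Shannon entropy of the class counts: 1.0 for perfectly balanced classes, approaching 0 as one class dominates. This metric is an assumption chosen for illustration (not something the Jaxon platform is known to report), and normalized_entropy is a hypothetical helper. A high score only says the classes are balanced; whether that balance is genuine or intentional is still a judgment call.

```python
import math


def normalized_entropy(counts):
    """Shannon entropy of the class distribution divided by its maximum, log(K)."""
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]  # empty classes contribute 0
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(counts))


print(round(normalized_entropy([25, 25, 25, 25]), 3))  # 1.0 (flat)
print(round(normalized_entropy([5, 5, 90]), 3))        # 0.359 (sharp peak)
```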

Reframing classification to classification

Reframing can also be applied to problems that are classification problems from the outset. Consider navigating a class hierarchy and deciding at which granularity to frame a problem. For example, when classifying recipes into cuisines, is a single coarse class such as Asian sufficient for the problem, or is it better to use several finer-grained classes such as Chinese, Japanese, Korean, and Thai? This process may continue further down the hierarchy (Szechuan, Dim Sum, Sushi, Yakitori, etc.).
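
Choosing a granularity can be expressed as walking each label up a child-to-parent map. The hierarchy below is a hypothetical example (including the made-up Pasta, Italian, and European branch), and coarsen is an illustrative helper, not a Jaxon API.

```python
# Hypothetical child -> parent links for a small cuisine hierarchy.
PARENT = {
    "Szechuan": "Chinese",
    "Dim Sum": "Chinese",
    "Sushi": "Japanese",
    "Yakitori": "Japanese",
    "Chinese": "Asian",
    "Japanese": "Asian",
    "Pasta": "Italian",
    "Italian": "European",
}


def coarsen(label, steps):
    """Walk `steps` levels up the hierarchy, stopping early at a root label."""
    for _ in range(steps):
        if label not in PARENT:
            break
        label = PARENT[label]
    return label


recipes = ["Szechuan", "Sushi", "Dim Sum", "Pasta"]
print([coarsen(r, 1) for r in recipes])  # ['Chinese', 'Japanese', 'Chinese', 'Italian']
print([coarsen(r, 2) for r in recipes])  # ['Asian', 'Asian', 'Asian', 'European']
```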

For more on class hierarchies and tradeoffs, see the Cascade solution.

Specification Design: Time Series Reframing
