Cheat Sheet

Want to know the absolute basics of getting started using the Jaxon Platform? Look no further.

<aside> <img src="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/2cf43123-2bd7-488e-8ebc-2c03c954877c/0_JAXON_Logo_Mark_2.jpg" alt="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/2cf43123-2bd7-488e-8ebc-2c03c954877c/0_JAXON_Logo_Mark_2.jpg" width="40px" /> For more detailed information, see our Jaxon U page.

</aside>

What to do:

Import some data and set up a specification.
Add a few labels (at least 10 per class; 100 is better) if not already in place.
Split your original dataset into train and test sets using SmartSplit. The exact ratio for the labeled examples is a judgment call and not critical; 80/20 is popular. Keep all the unlabeled examples in the training set.

<aside> <img src="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/5a7c72a7-36c9-4639-8726-126aa4481af0/0_JAXON_Logo_Mark_2.jpg" alt="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/5a7c72a7-36c9-4639-8726-126aa4481af0/0_JAXON_Logo_Mark_2.jpg" width="40px" /> Already pre-split? Unless you REALLY know the exact provenance and characteristics of every example (you probably don’t), merge the splits back together and then run SmartSplit on the merged dataset.

</aside>
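
To make the split rule concrete, here is a minimal Python sketch of the idea: only the labeled examples are divided 80/20, and every unlabeled example stays in the training set. This is a plain random split for illustration, not SmartSplit itself, whose internals are not described here. The function name `split_labeled`, the dict-based example format, and the use of `None` for a missing label are all assumptions made for this sketch.

```python
import random


def split_labeled(examples, test_fraction=0.2, seed=0):
    """Randomly split labeled examples; keep unlabeled ones in train.

    Illustrative only: this is NOT Jaxon's SmartSplit. Each example is
    assumed to be a dict whose "label" is None when it is unlabeled.
    """
    labeled = [ex for ex in examples if ex["label"] is not None]
    unlabeled = [ex for ex in examples if ex["label"] is None]

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(labeled)

    n_test = round(len(labeled) * test_fraction)
    test = labeled[:n_test]
    # Unlabeled examples never go into the test set.
    train = labeled[n_test:] + unlabeled
    return train, test


# Hypothetical toy dataset: 10 labeled and 5 unlabeled examples.
examples = [
    {"text": f"doc {i}", "label": "pos" if i % 2 else "neg"} for i in range(10)
]
examples += [{"text": f"raw {i}", "label": None} for i in range(5)]

train, test = split_labeled(examples)
print(len(train), len(test))  # 13 2
```

The test set ends up with labeled examples only, which is what you need to score a model against it.
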
Go straight to the Neural tab. Create a 1-stage training schedule:
1. Model: RoBERTa
2. Augment: checked
  
  <aside> <img src="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/7d07bd92-6708-4730-a25c-0efd9732d528/0_JAXON_Logo_Mark_2.jpg" alt="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/7d07bd92-6708-4730-a25c-0efd9732d528/0_JAXON_Logo_Mark_2.jpg" width="40px" /> Do not check augment if you have more than 10 classes in your spec.
  
  </aside>
3. Epochs: 100
  1. That’s a made-up number. A reasonable rule of thumb is 20,000 / num_labeled_examples. Jaxon includes early stopping, so this is just a maximum training budget.
4. Learning Rate (LR): 1e-6
  1. Also a made-up number. Unfortunately, ML practice involves a fair bit of trial and error, and each case is unique. If things go off the rails and you get a poor result, try again with a lower learning rate.
When training is finished, create an ensemble consisting of only that model.
Use the ensemble to synthetically label the training dataset. (This infers labels for the unlabeled examples.)
Export the new dataset.
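
The epoch rule of thumb from the training schedule above (20,000 / num_labeled_examples) can be worked through with a few lines of Python. This is only a sketch of the arithmetic: the helper name `max_epochs`, rounding up, and the floor of one epoch are assumptions added here, not Jaxon behavior.

```python
import math


def max_epochs(num_labeled_examples, budget=20_000):
    """Epoch budget from the cheat sheet's rule of thumb.

    Rounding up and never returning less than 1 are assumptions of this
    sketch. Early stopping may end training well before this maximum.
    """
    if num_labeled_examples <= 0:
        raise ValueError("need at least one labeled example")
    return max(1, math.ceil(budget / num_labeled_examples))


for n in (50, 200, 1000):
    print(n, max_epochs(n))
# 50 400
# 200 100
# 1000 20
```

With 200 labeled examples the rule gives 100 epochs, which matches the example value in the schedule.
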