<aside> <img src="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/2cf43123-2bd7-488e-8ebc-2c03c954877c/0_JAXON_Logo_Mark_2.jpg" alt="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/2cf43123-2bd7-488e-8ebc-2c03c954877c/0_JAXON_Logo_Mark_2.jpg" width="40px" /> For more detailed information, see our Jaxon U page.
</aside>
Import some data and set up a specification.
Label a few examples (at least 10 per class; 100 is better) if they are not already labeled.
Split your original dataset into train and test sets using SmartSplit. The exact ratio for the labeled examples is a judgement call and not critical; 80/20 is a popular choice. Keep all the unlabeled examples in the training set.
<aside> <img src="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/5a7c72a7-36c9-4639-8726-126aa4481af0/0_JAXON_Logo_Mark_2.jpg" alt="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/5a7c72a7-36c9-4639-8726-126aa4481af0/0_JAXON_Logo_Mark_2.jpg" width="40px" /> Already pre-split? Unless you REALLY know the exact provenance and characteristics of every example (you probably don’t), merge the splits back together and run SmartSplit on the combined dataset.
</aside>
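To make the split rule concrete, here is a rough Python sketch of an 80/20 split that keeps every unlabeled example in the training set. This is not SmartSplit itself, just a plain per-class random split for illustration; the `split_labeled` name and the `{"text", "label"}` example format are assumptions, with `label` set to `None` for unlabeled examples.

```python
import random
from collections import defaultdict


def split_labeled(examples, test_fraction=0.2, seed=0):
    """Per-class random split; unlabeled examples (label None) all go to train.

    Illustrative only: a stand-in for the idea behind the split, not SmartSplit.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    train, test = [], []

    for ex in examples:
        if ex["label"] is None:
            train.append(ex)  # unlabeled examples never enter the test set
        else:
            by_label[ex["label"]].append(ex)

    for group in by_label.values():
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])

    return train, test
```

With 10 labeled examples per class and the default 80/20 ratio, each class contributes 2 examples to the test set and 8 to the training set.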
Go straight to the Neural tab. Create a 1-stage training schedule:
Model: RoBERTa
Augment: checked
<aside> <img src="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/7d07bd92-6708-4730-a25c-0efd9732d528/0_JAXON_Logo_Mark_2.jpg" alt="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/7d07bd92-6708-4730-a25c-0efd9732d528/0_JAXON_Logo_Mark_2.jpg" width="40px" /> Do not check Augment if you have more than 10 classes in your spec.
</aside>
Epochs: 100
Learning Rate (LR): 1e-6
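If it helps to see the settings in one place, the stage above can be written down as a small Python record. This is only a sketch for note-taking: `StageConfig` and `quickstart_stage` are hypothetical names, not part of any Jaxon API, and the values simply mirror the settings listed here.

```python
from dataclasses import dataclass


@dataclass
class StageConfig:
    """Hypothetical record of the single training stage in this guide."""
    model: str = "RoBERTa"
    augment: bool = True
    epochs: int = 100
    learning_rate: float = 1e-6


def quickstart_stage(num_classes: int) -> StageConfig:
    # Augment stays checked unless the spec has more than 10 classes.
    return StageConfig(augment=num_classes <= 10)
```

For example, `quickstart_stage(3)` keeps Augment on, while `quickstart_stage(11)` turns it off.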
When training is finished, create an ensemble consisting of only that model.
Use the ensemble to synthetically label the training dataset. (This infers labels for the unlabeled examples.)
Export the new dataset.
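Conceptually, the synthetic labeling step fills in a predicted label wherever one is missing. A minimal Python sketch of that idea, assuming a hypothetical `predict` function standing in for the trained ensemble and the same `{"text", "label"}` example format (neither comes from Jaxon itself):

```python
from typing import Callable, Dict, List


def synthetically_label(
    examples: List[Dict], predict: Callable[[str], str]
) -> List[Dict]:
    """Return copies of the examples with labels inferred where missing.

    `predict` is a placeholder for the trained ensemble. Existing labels
    are kept as-is; a `synthetic` flag marks which labels were inferred.
    """
    labeled = []
    for ex in examples:
        if ex["label"] is None:
            labeled.append({**ex, "label": predict(ex["text"]), "synthetic": True})
        else:
            labeled.append({**ex, "synthetic": False})
    return labeled
```

Keeping a `synthetic` flag is a design choice in this sketch: it lets you tell inferred labels apart from human ones after export.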
© Copyright Jaxon, Inc. 2023 All rights reserved.