<aside> <img src="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/f801b6d7-fd88-4362-abb8-fdc9a9c104dd/JAXON_Logo_Mark_on_blue.jpg" alt="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/f801b6d7-fd88-4362-abb8-fdc9a9c104dd/JAXON_Logo_Mark_on_blue.jpg" width="40px" /> Users must have selected a Project for the Datasets tab to become available.
</aside>
Before importing a dataset file into Jaxon, view the file externally and note the following:
["label-1", "label-2", "label-n"]
<aside> <img src="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/464d32fd-5052-4ad8-a6f3-915251eaf8f4/JAXON_Logo_Mark_on_blue.jpg" alt="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/464d32fd-5052-4ad8-a6f3-915251eaf8f4/JAXON_Logo_Mark_on_blue.jpg" width="40px" /> Currently, Jaxon supports datasets in CSV, TSV, JSON, XML, and XLS/XLSX formats, either zipped or unzipped, as long as all files within a folder are of the same type.
</aside>
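Before zipping or uploading a folder, it can help to confirm locally that every file in it has the same type. The following is a minimal sketch of such a check, not part of Jaxon itself: the function names are hypothetical, and the Excel extensions are assumed to be `.xls` and `.xlsx`.

```python
from pathlib import Path

# Extensions assumed from the supported formats listed above.
SUPPORTED = {".csv", ".tsv", ".json", ".xml", ".xls", ".xlsx"}


def folder_extensions(folder):
    """Return the set of lowercase file extensions found under `folder`."""
    return {p.suffix.lower() for p in Path(folder).rglob("*") if p.is_file()}


def is_homogeneous(folder):
    """True when every file under `folder` shares one supported extension."""
    extensions = folder_extensions(folder)
    return len(extensions) == 1 and extensions <= SUPPORTED
```

Calling `is_homogeneous("my_dataset_folder")` on a folder that mixes, say, CSV and JSON files returns `False`, which suggests separating the files before import.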
<aside> <img src="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/52b51b4e-6d42-40df-8e55-1ccfaa283217/JAXON_Logo_Mark_on_blue.jpg" alt="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/52b51b4e-6d42-40df-8e55-1ccfaa283217/JAXON_Logo_Mark_on_blue.jpg" width="40px" /> Jaxon will attempt to automatically identify these characteristics, but for best results, we recommend verifying the information.
</aside>
<aside> <img src="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/c3ac0905-ae30-4517-9601-8f7205162597/JAXON_Logo_Mark_on_blue.jpg" alt="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/c3ac0905-ae30-4517-9601-8f7205162597/JAXON_Logo_Mark_on_blue.jpg" width="40px" /> The dataset import may take a few minutes to complete. Once imported, the dataset becomes available in the Datasets tab.
</aside>
Note that for multi-label datasets, each value in the Labels column must be a Python list.
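One way to prepare such a column is to write each row's labels as a Python list literal and to verify that it parses back into a list before importing. The sketch below assumes a CSV file with hypothetical column names `text` and `labels`; the helper names are illustrative and not part of Jaxon.

```python
import ast
import csv
import io

# Hypothetical example rows; the column names are assumptions.
rows = [
    {"text": "Great battery, poor screen", "labels": ["battery", "screen"]},
    {"text": "Arrived late", "labels": ["shipping"]},
]


def write_multilabel_csv(rows):
    """Serialize rows to CSV text, storing each label set as a list literal."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["text", "labels"])
    writer.writeheader()
    for row in rows:
        # repr() of a list of strings yields a Python list literal.
        writer.writerow({"text": row["text"], "labels": repr(row["labels"])})
    return buffer.getvalue()


def parse_labels(cell):
    """Parse one Labels cell and check that it is a list of strings."""
    value = ast.literal_eval(cell)
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError(f"not a list of strings: {cell!r}")
    return value
```

Reading the generated text back with `csv.DictReader` and passing each `labels` cell to `parse_labels` is a quick way to catch malformed cells, such as a bare comma-separated string, before upload.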
From here, you can define the Specification and then use the dataset in the rest of the Jaxon Platform.
Once a dataset has been imported, it can be copied. This function creates an exact duplicate of the original dataset.
Fill out the pop-up that appears and select Submit.
Once the dataset has been copied, the new dataset will become available in the Datasets tab.
Any available dataset in the Datasets tab can be split into two smaller sets. The split ratios for labeled and unlabeled rows are controlled independently. A dataset can be split either before or after creating a Specification. If it is split before, both resulting datasets receive labeled and unlabeled examples. If it is split after, the user can steer unlabeled examples according to a user-provided ratio.
In most cases, splitting a dataset creates two new datasets while preserving the original. However, if the Flatten feature is used, only one new dataset is created, and the original is still preserved.
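To illustrate what independently controlled ratios mean, here is a minimal sketch of a split that shuffles labeled and unlabeled rows separately and cuts each group at its own ratio. It is an illustration only and does not reproduce Jaxon's implementation; the function name, the `label` key, and the convention that an unlabeled row has `None` as its label are all assumptions.

```python
import random


def split_dataset(rows, labeled_ratio, unlabeled_ratio, label_key="label", seed=0):
    """Split rows into two new lists without modifying the original list.

    `labeled_ratio` is the share of labeled rows sent to the first set;
    `unlabeled_ratio` is the share of unlabeled rows sent to the first set.
    """
    rng = random.Random(seed)
    labeled = [r for r in rows if r.get(label_key) is not None]
    unlabeled = [r for r in rows if r.get(label_key) is None]

    def cut(group, ratio):
        shuffled = group[:]  # copy so the caller's data stays untouched
        rng.shuffle(shuffled)
        k = round(len(shuffled) * ratio)
        return shuffled[:k], shuffled[k:]

    labeled_a, labeled_b = cut(labeled, labeled_ratio)
    unlabeled_a, unlabeled_b = cut(unlabeled, unlabeled_ratio)
    return labeled_a + unlabeled_a, labeled_b + unlabeled_b
```

With `labeled_ratio=0.8` and `unlabeled_ratio=1.0`, the first set receives 80% of the labeled rows and all of the unlabeled rows, so the second set contains labeled rows only.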
Fill out the pop-up that appears and select Submit.
<aside> <img src="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/61489a86-9843-4008-b925-e76f37dfbf58/JAXON_Logo_Mark_on_blue.jpg" alt="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/61489a86-9843-4008-b925-e76f37dfbf58/JAXON_Logo_Mark_on_blue.jpg" width="40px" /> The Train set can contain both labeled and unlabeled examples, but the Test set should not contain any unlabeled examples.
</aside>
Once the dataset has been split and/or flattened, the new dataset(s) will become available in the Datasets tab.
SmartSplit is a proprietary means of splitting a dataset into training and holdout datasets in a way that avoids covariate drift and other latent differences between those datasets. Specifically, it aims to improve upon the standard baseline of random sampling with a predetermined percentage split, such as the common 80/20 rule of thumb.
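For reference, the random-sampling baseline mentioned above can be sketched as follows, together with one simple way to see how far the two resulting sets differ in label proportions. This is not SmartSplit, whose method is proprietary and not described here; label proportions are only one of many possible differences between sets, and the function names and the `label` key are assumptions.

```python
import random
from collections import Counter


def random_split(rows, train_fraction=0.8, seed=0):
    """Baseline: shuffle the rows and cut at a fixed percentage."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    k = round(len(shuffled) * train_fraction)
    return shuffled[:k], shuffled[k:]


def label_proportions(rows, label_key="label"):
    """Return the share of rows carrying each label."""
    counts = Counter(r[label_key] for r in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def max_proportion_gap(train, holdout, label_key="label"):
    """Largest absolute difference in any label's share between two sets."""
    p = label_proportions(train, label_key)
    q = label_proportions(holdout, label_key)
    return max(abs(p.get(label, 0.0) - q.get(label, 0.0)) for label in set(p) | set(q))
```

A large value from `max_proportion_gap` after a purely random split, particularly on a small dataset, is one symptom of the kind of difference between training and holdout sets that a more careful split tries to avoid.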
The Flatten option removes examples from over-represented classes and flattens the distribution. SmartSplit must also be enabled to use Flatten. SmartSplit ensures that the examples that are discarded from the over-represented classes do not introduce bias into the resulting flattened dataset.
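The general idea of flattening a class distribution can be illustrated with plain random undersampling, where every class is reduced to the size of the smallest one. This is a naive sketch rather than Jaxon's Flatten: it does not include the SmartSplit step that selects which examples to discard, and the function name and the `label` key are assumptions.

```python
import random
from collections import defaultdict


def flatten_by_undersampling(rows, label_key="label", seed=0):
    """Return a new list in which every class has as many rows as the smallest class."""
    if not rows:
        return []
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    # The smallest class sets the target size for all classes.
    target = min(len(group) for group in by_label.values())
    flattened = []
    for group in by_label.values():
        # Randomly keep `target` rows; the remainder is discarded.
        flattened.extend(rng.sample(group, target))
    return flattened
```

Because the discarded rows are chosen uniformly at random here, this sketch offers no protection against the selection bias that the SmartSplit requirement is meant to address.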
If the Flatten option is checked, the pop-up changes and only one dataset is created. Fill out the pop-up and select Submit.
Any two datasets can be merged into a combined set. The merge function combines columns with the same header into a single column. Columns with non-matching names are kept as they are.
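The column behavior described above can be sketched with rows represented as dictionaries. This is an illustration under assumptions, not Jaxon's merge implementation: the function name is hypothetical, and filling absent values with `None` is a choice made for this sketch, since the source does not state how missing cells are handled.

```python
def merge_datasets(first, second):
    """Stack two lists of row dictionaries into one combined dataset.

    Columns that share a header end up in a single combined column.
    Columns present in only one dataset are kept, with None for the
    rows that came from the other dataset.
    """
    all_rows = first + second
    columns = []
    for row in all_rows:
        for name in row:
            if name not in columns:
                columns.append(name)  # preserve first-seen column order
    merged = [{name: row.get(name) for name in columns} for row in all_rows]
    return columns, merged
```

For example, merging a dataset with columns `text` and `label` into one with columns `text` and `source` gives three columns, where `text` holds values from both datasets.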
<aside> <img src="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/1f078cad-3845-4c56-9805-912d2a4c01e2/JAXON_Logo_Mark_on_blue.jpg" alt="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/1f078cad-3845-4c56-9805-912d2a4c01e2/JAXON_Logo_Mark_on_blue.jpg" width="40px" /> If your data has any labels, make sure the Specification has been assigned and locked before merging datasets for best results.
</aside>