Performance
============
 
Palettization reduces the size of each weight value, so less data has to be moved during prediction. 
This can lower latency in memory-bottlenecked models. Note that this latency advantage is available only when 
palettized weights are loaded in their compressed form and decompressed “just in time” for computation. Starting with `iOS17/macOS14`, this is
more likely to happen for models running primarily on the Neural Engine backend.

For the `per_grouped_channel` palettization mode added in `iOS18/macOS15`, you may see a drop in runtime performance as 
the number of LUTs used to represent a weight tensor increases. Typically, a group size of 8 or 16 gives good accuracy while still increasing speed over an uncompressed model.
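
To make the trade-off concrete, the sketch below counts how many LUTs a single weight tensor needs for a given group size. It is a back-of-the-envelope estimate, not a description of the Core ML storage format: the channel count is hypothetical, and it assumes one LUT per group of channels, scalar palettization, and `float16` LUT entries.

```python
import math


def lut_count(num_channels: int, group_size: int) -> int:
    """Number of LUTs when every `group_size` channels share one LUT."""
    return math.ceil(num_channels / group_size)


def lut_bytes(num_channels: int, group_size: int, nbits: int,
              bytes_per_entry: int = 2) -> int:
    """Total LUT storage, assuming 2**nbits float16 entries per LUT."""
    return lut_count(num_channels, group_size) * (2 ** nbits) * bytes_per_entry


channels = 256  # hypothetical layer width, for illustration only
for group_size in (256, 16, 8):
    print(
        f"group_size={group_size:>3}: "
        f"{lut_count(channels, group_size):>2} LUTs, "
        f"{lut_bytes(channels, group_size, nbits=4)} bytes of LUT data"
    )
```

Under these assumptions, halving the group size doubles the number of LUTs, which is why smaller groups can cost runtime performance even as they improve accuracy.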


## Performance Benchmarks 

In the table below, we provide runtime performance benchmarks on several models, palettized using `coremltools.optimize` APIs. 

### Methodology

The training-time compressed models were obtained by fine-tuning the `float32` PyTorch models, with weights initialized from the checkpoints linked in the [Model Info](#model-info) table, and using methods from `coremltools.optimize.torch` to perform compression. The datasets used for fine-tuning the models are also linked in the same table, along with the accuracy metric being reported. We used fine-tuning recipes and standard data augmentations that are commonly used in the literature for the task at hand. 

Similarly, the post-training compressed models were obtained by compressing the converted `float16` Core ML models, with pre-trained weights, using methods from the `coremltools.optimize.coreml` module. 
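
A minimal sketch of the post-training flow is shown below, assuming the `palettize_weights` API in `coremltools.optimize.coreml` with a k-means `OpPalettizerConfig`. The function name, file paths, and default bit width are placeholders, and the exact settings used for the benchmarked models are not reproduced here.

```python
def palettize_post_training(mlpackage_path: str, output_path: str, nbits: int = 6):
    """Palettize all supported weights of a converted Core ML model with k-means.

    `mlpackage_path` and `output_path` are hypothetical paths supplied by the caller.
    """
    # Imported lazily so this helper can be defined without coremltools installed.
    import coremltools as ct
    import coremltools.optimize.coreml as cto

    # Load the converted float16 Core ML model.
    mlmodel = ct.models.MLModel(mlpackage_path)

    # One global config: every supported op uses an nbits k-means palette.
    op_config = cto.OpPalettizerConfig(mode="kmeans", nbits=nbits)
    config = cto.OptimizationConfig(global_config=op_config)

    compressed = cto.palettize_weights(mlmodel, config)
    compressed.save(output_path)
    return compressed
```

For example, `palettize_post_training("Model.mlpackage", "Model6Bit.mlpackage", nbits=6)` would write a 6-bit palettized copy of a model stored at the (hypothetical) path `Model.mlpackage`.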

The trained and compressed models and the `coremltools.optimize.torch` config files used for compression can be downloaded by clicking the respective links embedded in the model and config names.
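
The downloaded config files can be fed back into a fine-tuning loop. The sketch below assumes the `DKMPalettizer` and `DKMPalettizerConfig` classes from `coremltools.optimize.torch.palettization`. The function name, its arguments, and the bare-bones training loop are illustrative placeholders, not the recipe used to produce the benchmarked models.

```python
def fine_tune_with_dkm(model, data_loader, loss_fn, make_optimizer,
                       config_path: str, num_epochs: int = 1):
    """Fine-tune a PyTorch model with differentiable k-means palettization.

    `make_optimizer` is a caller-supplied callable that builds an optimizer
    from an iterable of parameters; `config_path` points to a YAML config.
    """
    # Imported lazily so this helper can be defined without coremltools/torch.
    from coremltools.optimize.torch.palettization import (
        DKMPalettizer,
        DKMPalettizerConfig,
    )

    config = DKMPalettizerConfig.from_yaml(config_path)
    palettizer = DKMPalettizer(model, config)

    # Insert palettization layers, then build the optimizer on the prepared model.
    prepared_model = palettizer.prepare()
    optimizer = make_optimizer(prepared_model.parameters())

    for _ in range(num_epochs):
        for inputs, targets in data_loader:
            optimizer.zero_grad()
            loss = loss_fn(prepared_model(inputs), targets)
            loss.backward()
            optimizer.step()
            palettizer.step()  # advance the palettizer's internal schedule

    # Replace palettization layers with the final palettized weights.
    return palettizer.finalize()
```

The optimizer is created after `prepare()` so that it tracks the parameters of the prepared model rather than those of the original one.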

The latency numbers were captured using the Xcode **Performance** tab, using the `median` statistic. Compute unit selection is `all` unless otherwise noted. The latency numbers are sensitive to the device state and may vary across build versions. 

- Device: iPhone 14 Pro (A16), unless otherwise mentioned
- iOS build: iOS17 
- Xcode: Xcode 15

### Model Info

| Model Name                    | Task                 | Pre-trained Weights                                                                | Dataset                                                                                                                         | Accuracy Metric    |
|-------------------------------|----------------------|------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------|--------------------|
| MobileNetv2-1.0               | Image Classification | [Torchvision](https://download.pytorch.org/models/mobilenet_v2-b0353104.pth)       | [ImageNet](https://pytorch.org/vision/main/generated/torchvision.datasets.ImageNet.html)                                        | Top-1 Accuracy (%) |
| MobileNetv3-small             | Image Classification | [Torchvision](https://download.pytorch.org/models/mobilenet_v3_small-047dcff4.pth) | [ImageNet](https://pytorch.org/vision/main/generated/torchvision.datasets.ImageNet.html)                                        | Top-1 Accuracy (%) |
| ResNet50                      | Image Classification | [Torchvision](https://download.pytorch.org/models/resnet50-0676ba61.pth)           | [ImageNet](https://pytorch.org/vision/main/generated/torchvision.datasets.ImageNet.html)                                        | Top-1 Accuracy (%) |
| MobileViTv2-1.0               | Image Classification | cvnets                                                                             | [ImageNet](https://pytorch.org/vision/main/generated/torchvision.datasets.ImageNet.html)                                        | Top-1 Accuracy (%) |
| CenterNet (ResNet34 backbone) | Object Detection     | Torchvision [backbone](https://download.pytorch.org/models/resnet34-b627a593.pth)  | [MS-COCO](https://pytorch.org/vision/main/generated/torchvision.datasets.CocoDetection.html#torchvision.datasets.CocoDetection) | mAP                |

### Results 

| Model Name                                                                                                                                                                      | Config                                                                                                                                         | Optimization Algorithm | Compression Ratio | Latency in ms (per batch) |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------|------------------------|-------------------|---------------------------|
| [MobileNetv2-1.0](https://ml-assets.apple.com/coreml/quantized_models/uncompressed/MobileNetV2Alpha1.mlpackage.zip)                                                             | Float16                                                                                                                                        | n/a                    | 1.0               | 0.48                      |
| [MobileNetv2-1.0](https://ml-assets.apple.com/coreml/quantized_models/training_time_compressed/palettized/MobileNetV2Alpha1ScalarPalettization2Bit.mlpackage.zip)               | [2 bit](https://ml-assets.apple.com/coreml/quantized_models/training_time_compressed/palettized/MobileNetV2Alpha1ScalarPalettization2Bit.yaml) | Differentiable K-Means | 5.92              | 0.47                      |
| [MobileNetv2-1.0](https://ml-assets.apple.com/coreml/quantized_models/training_time_compressed/palettized/MobileNetV2Alpha1ScalarPalettization4Bit.mlpackage.zip)               | [4 bit](https://ml-assets.apple.com/coreml/quantized_models/training_time_compressed/palettized/MobileNetV2Alpha1ScalarPalettization4Bit.yaml) | Differentiable K-Means | 3.38              | 0.45                      |
| [MobileNetv2-1.0](https://ml-assets.apple.com/coreml/quantized_models/post_training_compressed/palettized/MobileNetV2Alpha1ScalarPalettization6Bit.mlpackage.zip)               | 6 bit                                                                                                                                          | K-Means                | 2.54              | 0.48                      |
| [MobileNetv2-1.0](https://ml-assets.apple.com/coreml/quantized_models/post_training_compressed/palettized/MobileNetV2Alpha1ScalarPalettization8Bit.mlpackage.zip)               | 8 bit                                                                                                                                          | K-Means                | 1.97              | 0.45                      |
| [MobileNetv3-small](https://ml-assets.apple.com/coreml/quantized_models/uncompressed/MobileNetV3Small.mlpackage.zip)                                                            | Float16                                                                                                                                        | n/a                    | 1.0               | 0.13                      |
| [MobileNetv3-small](https://ml-assets.apple.com/coreml/quantized_models/training_time_compressed/palettized/MobileNetV3SmallScalarPalettization2Bit.mlpackage.zip)              | [2 bit](https://ml-assets.apple.com/coreml/quantized_models/training_time_compressed/palettized/MobileNetV3SmallScalarPalettization2Bit.yaml)  | Differentiable K-Means | 5.82              | 0.13                      |
| [MobileNetv3-small](https://ml-assets.apple.com/coreml/quantized_models/training_time_compressed/palettized/MobileNetV3SmallScalarPalettization4Bit.mlpackage.zip)              | [4 bit](https://ml-assets.apple.com/coreml/quantized_models/training_time_compressed/palettized/MobileNetV3SmallScalarPalettization4Bit.yaml)  | Differentiable K-Means | 3.47              | 0.13                      |
| [MobileNetv3-small](https://ml-assets.apple.com/coreml/quantized_models/post_training_compressed/palettized/MobileNetV3SmallScalarPalettization6Bit.mlpackage.zip)              | 6 bit                                                                                                                                          | K-Means                | 2.6               | 0.13                      |
| [MobileNetv3-small](https://ml-assets.apple.com/coreml/quantized_models/post_training_compressed/palettized/MobileNetV3SmallScalarPalettization8Bit.mlpackage.zip)              | 8 bit                                                                                                                                          | K-Means                | 1.93              | 0.13                      |
| [ResNet50](https://ml-assets.apple.com/coreml/quantized_models/uncompressed/ResNet50.mlpackage.zip)                                                                             | Float16                                                                                                                                        | n/a                    | 1.0               | 1.52                      |
| [ResNet50](https://ml-assets.apple.com/coreml/quantized_models/training_time_compressed/palettized/ResNet50ScalarPalettization2Bit.mlpackage.zip)                               | [2 bit](https://ml-assets.apple.com/coreml/quantized_models/training_time_compressed/palettized/ResNet50ScalarPalettization2Bit.yaml)          | Differentiable K-Means | 7.63              | 1.43                      |
| [ResNet50](https://ml-assets.apple.com/coreml/quantized_models/training_time_compressed/palettized/ResNet50ScalarPalettization4Bit.mlpackage.zip)                               | [4 bit](https://ml-assets.apple.com/coreml/quantized_models/training_time_compressed/palettized/ResNet50ScalarPalettization4Bit.yaml)          | Differentiable K-Means | 3.9               | 1.41                      |
| [ResNet50](https://ml-assets.apple.com/coreml/quantized_models/post_training_compressed/palettized/ResNet50ScalarPalettization6Bit.mlpackage.zip)                               | 6 bit                                                                                                                                          | K-Means                | 2.65              | 1.37                      |
| [ResNet50](https://ml-assets.apple.com/coreml/quantized_models/post_training_compressed/palettized/ResNet50ScalarPalettization8Bit.mlpackage.zip)                               | 8 bit                                                                                                                                          | K-Means                | 1.99              | 1.4                       |
| [CenterNet (ResNet34 backbone)](https://ml-assets.apple.com/coreml/quantized_models/uncompressed/CenterNetResNet34.mlpackage.zip)                                               | Float16                                                                                                                                        | n/a                    | 1.0               | 6.85                      |
| [CenterNet (ResNet34 backbone)](https://ml-assets.apple.com/coreml/quantized_models/training_time_compressed/palettized/CenterNetResNet34ScalarPalettization2Bit.mlpackage.zip) | [2 bit](https://ml-assets.apple.com/coreml/quantized_models/training_time_compressed/palettized/CenterNetResNet34ScalarPalettization2Bit.yaml) | Differentiable K-Means | 7.71              | 6.37                      |
| [CenterNet (ResNet34 backbone)](https://ml-assets.apple.com/coreml/quantized_models/training_time_compressed/palettized/CenterNetResNet34ScalarPalettization4Bit.mlpackage.zip) | [4 bit](https://ml-assets.apple.com/coreml/quantized_models/training_time_compressed/palettized/CenterNetResNet34ScalarPalettization4Bit.yaml) | Differentiable K-Means | 3.94              | 6.67                      |
| [CenterNet (ResNet34 backbone)](https://ml-assets.apple.com/coreml/quantized_models/post_training_compressed/palettized/CenterNetResNet34ScalarPalettization6Bit.mlpackage.zip) | 6 bit                                                                                                                                          | K-Means                | 2.65              | 6.71                      |
| [CenterNet (ResNet34 backbone)](https://ml-assets.apple.com/coreml/quantized_models/post_training_compressed/palettized/CenterNetResNet34ScalarPalettization8Bit.mlpackage.zip) | 8 bit                                                                                                                                          | K-Means                | 2.0               | 6.85                      |
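
As a sanity check on the Compression Ratio column, the sketch below compares the ResNet50 rows with the ideal ratio for n-bit palettization of a `float16` baseline, which is 16 / n. The ideal figure ignores LUT storage and any weights left uncompressed. That these account for the gap is an assumption, not something measured here.

```python
FLOAT16_BITS = 16


def ideal_ratio(nbits: int) -> float:
    """Upper bound on compression when each float16 weight becomes an nbits index."""
    return FLOAT16_BITS / nbits


# Compression ratios for ResNet50, copied from the Results table.
measured_resnet50 = {2: 7.63, 4: 3.9, 6: 2.65, 8: 1.99}

for nbits, measured in measured_resnet50.items():
    print(f"{nbits}-bit: ideal {ideal_ratio(nbits):.2f}, measured {measured}")
```

Every measured ratio in the table sits at or below this bound, with the smaller MobileNet models falling furthest from it.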
