πŸ₯ž Type for NNC

Type for neural network computations

Overview

F16 and F32 are two floating-point precision formats commonly used in machine learning.

F16, also known as half-precision, uses 16 bits to represent a floating-point number.

F32, also known as single-precision, uses 32 bits.
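
The two formats differ in how their bits are split between sign, exponent, and mantissa: F32 (IEEE 754 binary32) uses 1 sign bit, 8 exponent bits, and 23 mantissa bits, while F16 (IEEE 754 binary16) uses 1 sign bit, 5 exponent bits, and 10 mantissa bits. The plain C sketch below (illustrative only, not NNC-specific) unpacks the binary32 fields of a value to make the layout concrete.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
	/* binary32: 1 sign bit | 8 exponent bits | 23 mantissa bits
	   binary16: 1 sign bit | 5 exponent bits | 10 mantissa bits */
	float f = 0.1f;
	uint32_t bits;
	memcpy(&bits, &f, sizeof(bits)); /* reinterpret the float's 32 bits */
	printf("0.1f in binary32: sign=%u exponent=%u mantissa=0x%06x\n",
		bits >> 31, (bits >> 23) & 0xff, bits & 0x7fffff);
	printf("binary32 occupies %zu bytes, binary16 occupies 2 bytes\n", sizeof(float));
	return 0;
}
```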

Differences

  • Range:

    • F16 has a smaller range than F32, it can represent smaller and larger numbers with less precision

  • Memory usage:

    • F16 requires less memory than F32, which can be beneficial for training and deployment of larger models or training with larger mini-batches

  • Speed

    • Some hardware can run operations in F16 faster than in F32 which can result in faster training and inference times.

Conclusion

In general, F32 is the standard type for neural network computations, but deep learning is trending towards F16 because neural networks usually tolerate lower-precision arithmetic well. However, some computations, such as large reductions, should still be carried out in F32 for numeric stability.
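
As a concrete illustration of why large reductions are kept in F32 (again a hedged sketch using `_Float16`, not NNC code): once an F16 accumulator reaches 2048, the gap between adjacent F16 values is 2.0, so adding 1.0 no longer changes the sum, while an F32 accumulator keeps the exact total.

```c
#include <stdio.h>

int main(void)
{
	_Float16 sum16 = (_Float16)0.0f;
	float sum32 = 0.0f;
	int i;
	for (i = 0; i < 5000; i++) {
		sum16 += (_Float16)1.0f; /* stalls at 2048: 2048 + 1 rounds back to 2048 */
		sum32 += 1.0f;           /* stays exact: integers up to 2^24 fit in F32 */
	}
	printf("F16 accumulator: %.1f\n", (float)sum16); /* 2048.0 */
	printf("F32 accumulator: %.1f\n", sum32);        /* 5000.0 */
	return 0;
}
```

This is the same principle mixed-precision training follows: store weights and activations in F16 where it is safe, but accumulate partial sums and other large reductions in F32.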
