Abstract

The cross-entropy and its related terms from information theory (e.g. entropy, Kullback–Leibler divergence) are used throughout artificial intelligence and machine learning. This includes many of the major successes, both current and historic, where they commonly appear as the natural objective of an optimisation procedure for learning model parameters, or distributions over them. This paper presents a novel derivation of the differential cross-entropy between two 1D probability density functions represented as piecewise linear functions. Implementation challenges are resolved and experimental validation is presented, including a rigorous analysis of accuracy and a demonstration of using the presented result as the objective of a neural network. Previously, this cross-entropy had to be approximated via numerical integration, or an equivalent scheme, for which calculating gradients is impractical. Machine learning models with high parameter counts are optimised primarily with gradients, so if piecewise linear density representations are to be used then the presented analytic solution is essential. This paper contributes the necessary theory for the practical optimisation of information theoretic objectives when dealing with piecewise linear distributions directly. Removing this limitation expands the design space for future algorithms.
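As a point of reference for the abstract's remark that this quantity previously had to be approximated numerically, the sketch below estimates the differential cross-entropy H(p, q) = -∫ p(x) log q(x) dx between two 1D piecewise linear densities with the trapezoidal rule. It is not the paper's analytic solution; the knot values, grid size, and function name cross_entropy_numeric are illustrative assumptions.

    import numpy as np

    # Knots (x, density) of two piecewise linear PDFs on [0, 1]; the values
    # are illustrative and chosen so each density integrates to 1, and q is
    # strictly positive wherever p is non-zero.
    p_knots_x, p_knots_y = np.array([0.0, 0.5, 1.0]), np.array([0.5, 1.5, 0.5])
    q_knots_x, q_knots_y = np.array([0.0, 0.5, 1.0]), np.array([1.2, 1.0, 0.8])

    def cross_entropy_numeric(px, py, qx, qy, n=100_000):
        """Approximate H(p, q) = -integral of p(x) log q(x) dx numerically."""
        xs = np.linspace(px[0], px[-1], n)
        p = np.interp(xs, px, py)   # piecewise linear interpolation of p
        q = np.interp(xs, qx, qy)   # piecewise linear interpolation of q
        return -np.trapz(p * np.log(q), xs)

    print(cross_entropy_numeric(p_knots_x, p_knots_y, q_knots_x, q_knots_y))

Such an approximation is differentiable only through the sampled grid, which is the impracticality for gradient-based optimisation that the paper's closed-form result removes.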
Original language: English
Article number: 2041
Number of pages: 31
Journal: Transactions on Machine Learning Research
Volume: 2024
Publication status: Published - 10 Apr 2024
