We use a sample of 53 galaxy clusters at 0.03 < z < 0.1, with masses derived from the caustic technique and velocity dispersions computed from an average of 208 galaxies per cluster, to investigate the scaling between richness, mass and velocity dispersion. We find a tight scaling between richness and mass, with an intrinsic scatter of only 0.19 dex in mass and a slope of one, i.e. clusters that have twice as many galaxies are twice as massive. When richness is measured without any knowledge of the cluster mass or linked parameters (such as r(200)), it predicts mass with an uncertainty of 0.29 +/- 0.01 dex. As a mass proxy, richness competes favourably both with direct measurements of mass given by the caustic method, which typically have 0.14 dex errors (versus 0.29 dex), and with X-ray luminosity, which offers a similar 0.30 dex uncertainty. The similar performance of X-ray luminosity and richness in predicting cluster masses is confirmed using cluster masses derived from velocity dispersions via a relation fixed by numerical simulations. These results suggest that cluster masses can be reliably estimated from simple galaxy counts, at least at the redshifts and masses explored in this work. This has important applications in the estimation of cosmological parameters from optical cluster surveys, because in current surveys clusters detected in the optical range outnumber those detected in X-ray by at least one order of magnitude. Our analysis is robust from an astrophysical perspective, because the adopted masses are among the most hypothesis-parsimonious estimates of cluster mass, and from a statistical perspective, because our Bayesian analysis accounts for terms that are usually neglected, such as the Poisson nature of galaxy counts, the intrinsic scatter and uncertain errors. The data and code used for the stochastic computation are provided in the paper.
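
To give a feel for the quoted uncertainties, the short sketch below converts a scatter in dex into the corresponding multiplicative factor in mass, and shows what a slope of one implies for the mass ratio of two clusters. It is only an illustration and is not the code distributed with the paper; the function names and the assumption of a pure power law between mass and richness are ours.

```python
def dex_to_factor(scatter_dex: float) -> float:
    """Convert a scatter in dex (log10 units) into a multiplicative factor."""
    return 10.0 ** scatter_dex


def mass_ratio(richness_ratio: float, slope: float = 1.0) -> float:
    """Mass ratio implied by an assumed power law, mass proportional to richness**slope."""
    return richness_ratio ** slope


# Uncertainties quoted in the abstract, in dex.
QUOTED_DEX = {
    "intrinsic scatter of the richness-mass scaling": 0.19,
    "mass predicted from richness": 0.29,
    "caustic mass errors": 0.14,
    "mass predicted from X-ray luminosity": 0.30,
}

# Same quantities expressed as multiplicative factors in mass.
FACTORS = {label: dex_to_factor(dex) for label, dex in QUOTED_DEX.items()}

for label, factor in FACTORS.items():
    print(f"{label}: {QUOTED_DEX[label]:.2f} dex = factor of {factor:.2f}")

# With slope one, doubling the richness doubles the mass.
print("mass ratio for twice the richness:", mass_ratio(2.0))
```

Under these assumptions, an uncertainty of 0.29 dex corresponds to knowing the mass to within a factor of roughly two.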