We re-visit the complexity of polynomial time pre-processing (kernelization) for the d-Hitting Set problem. This is one of the most classic problems in Parameterized Complexity by itself, and, furthermore, it encompasses several other of the most well-studied problems in this field, such as Vertex Cover, Feedback Vertex Set in Tournaments (FVST) and Cluster Vertex Deletion (CVD). In fact, d-Hitting Set encompasses any deletion problem to a hereditary property that can be characterized by a finite set of forbidden induced subgraphs. With respect to bit size, the kernelization complexity of d-Hitting Set is essentially settled: there exists a kernel with O(kd) bits (O(kd) sets and O(kd−1) elements) and this it tight by the result of Dell and van Melkebeek [STOC 2010, JACM 2014]. Still, the question of whether there exists a kernel for d-Hitting Set with fewer elements has remained one of the most major open problems in Kernelization. In this paper, we first show that if we allow the kernelization to be lossy with a qualitatively better loss than the best possible approximation ratio of polynomial time approximation algorithms, then one can obtain kernels where the number of elements is linear for every fixed d. Further, based on this, we present our main result: we show that there exist approximate Turing kernelizations for d-Hitting Set that even beat the established bit-size lower bounds for exact kernelizations – in fact, we use a constant number of oracle calls, each with “near linear” (O(k1+ϵ)) bit size, that is, almost the best one could hope for. Lastly, for two special cases of implicit 3-Hitting set, namely, FVST and CVD, we obtain the “best of both worlds” type of results – (1 + ϵ)-approximate kernelizations with a linear number of vertices. In terms of size, this substantially improves the exact kernels of Fomin et al. [SODA 2018, TALG 2019], with simpler arguments.