Subgroup discovery (SGD) is a local knowledge discovery method. It identifies subsets of a set of elements that 'stand out' with respect to some property of the elements.
How does it differ from clustering, then? My current understanding (need to check and update): compared to global methods like decision trees, regression, or compressed sensing, SGD is a local method, meaning it does not attempt to partition all elements of the parent set into subsets; rather, it tries to find subgroups that are of high quality with respect to the desired property.
It has been used to identify subgroup properties that contribute to one of the two crystal structures of 82 octet binaries (Ghiringhelli et al. 2015). SGD finds two subgroups whose elements have either a zinc blende structure or a rock salt structure (Goldsmith et al. 2017; Boley et al. 2017). For a review of SGD, see Atzmueller (2015).
New (~2024–25): GitHub Repo with a Python implementation; also see Lopez-Martinez-Carrasco et al. (2024). (I tried a Java implementation a few years ago; it was useful but a bit clunky.)
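As an illustration of what running SGD in Python looks like, here is a minimal sketch using the open-source `pysubgroup` package (my example, not necessarily the repo linked above). The calls (`ps.BinaryTarget`, `ps.create_selectors`, `ps.SubgroupDiscoveryTask`, `ps.BeamSearch`, `ps.WRAccQF`) follow its README as I remember it, so treat the exact signatures as assumptions to verify against the current docs.

```python
# Sketch: subgroup discovery with pysubgroup (pip install pysubgroup).
# Dataset loader and quality function are assumptions taken from the
# package's README; verify against the current documentation.
import pysubgroup as ps
from pysubgroup.datasets import get_titanic_data

data = get_titanic_data()                    # pandas DataFrame
target = ps.BinaryTarget('Survived', True)   # target property of interest
searchspace = ps.create_selectors(data, ignore=['Survived'])
task = ps.SubgroupDiscoveryTask(
    data, target, searchspace,
    result_set_size=5,   # keep the 5 best subgroups
    depth=2,             # conjunctions of at most 2 selectors
    qf=ps.WRAccQF())     # weighted relative accuracy quality function
result = ps.BeamSearch().execute(task)
print(result.to_dataframe())
```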
Given: Sample $S \subseteq P$, target variable $y: P \rightarrow \{a, b, c, \cdots\}$, and features $x_j: P \rightarrow X_j$
Define: Propositions $\Pi_x = \{\pi_1, \cdots, \pi_k\}$, selection language $\mathcal{L}_x = \{\sigma : \sigma(i) = \pi_{j_1}(i) \wedge \cdots \wedge \pi_{j_t}(i)\}$
Optimize: $f(Q) = \textrm{cov}(Q)^{\gamma}\, [\textrm{eff}(Q)]_{+}$ where $Q = \{i \in S : \sigma(i) = \textrm{True}\}$ (extension), $\textrm{cov}(Q) = |Q|/|S|$ (coverage), $\textrm{eff}(Q) = \frac{H_y(S) - H_y(Q)}{H_y(S)}$ (effect), and $H_y(Q) = -\sum_v p_Q(y=v) \log p_Q(y=v)$ (entropy).
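To make these definitions concrete, here is a small self-contained sketch that enumerates conjunctions of propositions and scores each extension $Q$ with $f(Q)$. The toy data, threshold propositions, and $\gamma = 0.5$ are illustrative assumptions, and the exhaustive enumeration stands in for the proper branch-and-bound or beam search used in real implementations.

```python
# Minimal sketch of the optimization above: exhaustive search over
# conjunctions of propositions, scored by f(Q) = cov(Q)^gamma * [eff(Q)]_+
# with an entropy-based effect. Data and propositions are toy assumptions.
from collections import Counter
from itertools import combinations
from math import log2

def entropy(labels):
    """Shannon entropy H_y of a list of target values."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def quality(Q_labels, S_labels, gamma=0.5):
    """f(Q) = cov(Q)^gamma * max(eff(Q), 0)."""
    if not Q_labels:
        return 0.0
    cov = len(Q_labels) / len(S_labels)
    H_S = entropy(S_labels)
    eff = (H_S - entropy(Q_labels)) / H_S if H_S > 0 else 0.0
    return cov**gamma * max(eff, 0.0)

# Toy sample S: each element has features x1, x2 and a target label y.
S = [
    {"x1": 1.0, "x2": 0.2, "y": "a"},
    {"x1": 0.9, "x2": 0.8, "y": "a"},
    {"x1": 0.1, "x2": 0.9, "y": "b"},
    {"x1": 0.2, "x2": 0.7, "y": "b"},
    {"x1": 0.8, "x2": 0.1, "y": "a"},
    {"x1": 0.3, "x2": 0.6, "y": "b"},
]
S_labels = [e["y"] for e in S]

# Propositions Pi_x: simple threshold predicates on the features.
propositions = {
    "x1>0.5": lambda e: e["x1"] > 0.5,
    "x2>0.5": lambda e: e["x2"] > 0.5,
    "x1<=0.5": lambda e: e["x1"] <= 0.5,
    "x2<=0.5": lambda e: e["x2"] <= 0.5,
}

# Selection language L_x: conjunctions of up to t propositions;
# enumerate all selectors and keep the best-scoring extension Q.
best = (0.0, None)
for t in (1, 2):
    for combo in combinations(propositions.items(), t):
        Q_labels = [e["y"] for e in S if all(p(e) for _, p in combo)]
        score = quality(Q_labels, S_labels)
        if score > best[0]:
            best = (score, " AND ".join(name for name, _ in combo))

print(f"best subgroup: {best[1]}  (f(Q) = {best[0]:.3f})")
```

On this toy sample the single proposition `x1>0.5` already isolates a pure-'a' subgroup covering half of $S$, so the search reports it with $f(Q) = (1/2)^{0.5} \approx 0.707$.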