Real-world face hallucination is a challenging image translation problem. Real-world low-resolution (LR) images undergo various unknown transformations that are hard to model with traditional image degradation procedures. To address this issue, this paper proposes a novel pipeline consisting of a style Variational Autoencoder (styleVAE) and a super-resolution (SR) network equipped with an attention mechanism. To obtain realistic low-quality images paired with high-resolution (HR) images, we design the styleVAE to transfer the complex nuisance factors of real-world LR images to the generated LR images. We also use mutual information (MI) estimation to obtain better style information. In addition, global and local attention residual blocks are proposed to learn long-range dependencies and local texture details, respectively. Notably, styleVAE is plug-and-play and can therefore improve the generalization and robustness of our SR method as well as other SR methods. Extensive experiments demonstrate, both quantitatively and qualitatively, that our method is effective and generalizable.
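
To make the idea of a global attention residual block concrete, the following is a minimal sketch of how such a block can capture long-range dependencies. It is not the architecture used in this paper. It assumes plain scaled dot-product self-attention over flattened spatial positions, with identity query, key, and value projections where a trained block would use learned layers. The function name `global_attention_residual` and the residual weight `gamma` are hypothetical names chosen for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention_residual(features, gamma=0.5):
    """Illustrative global attention residual block (assumed form).

    features: list of N position vectors, each a list of C floats
              (a feature map flattened over its spatial positions).
    gamma:    hypothetical weight on the attention branch.
    Returns a list with the same shape: x + gamma * attention(x).
    """
    channels = len(features[0])
    scale = math.sqrt(channels)
    output = []
    for query in features:
        # Compare this position with every position in the map, so the
        # receptive field is global rather than a local neighborhood.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in features
        ]
        weights = softmax(scores)
        # Attention output: weighted average of all position vectors.
        attended = [
            sum(w * value[c] for w, value in zip(weights, features))
            for c in range(channels)
        ]
        # Residual connection keeps the original features intact.
        output.append([x + gamma * a for x, a in zip(query, attended)])
    return output


if __name__ == "__main__":
    feature_map = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in global_attention_residual(feature_map):
        print(row)
```

Because every position attends to every other position, the cost grows quadratically with the number of positions. A local attention block would presumably restrict the comparison to a small neighborhood to focus on texture detail.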