Deep forgery detection on video data has attracted remarkable research attention in recent years due to its potential in defending forgery attacks. However, existing methods either only focus on the visual evidence within individual images, or are too sensitive to fluctuations across frames. To address these issues, this paper propose a novel model, named Bita-Net, to detect forgery faces in video data. The network design of Bita-Net is inspired by the mechanism of how human beings detect forgery data, i.e. browsing and scrutinizing, which is reflected by the two-pathway architecture of Bita-Net. Concretely, the browsing pathway scans the entire video at a high frame rate to check the temporal consistency, while the scrutinizing pathway focuses on analyzing key frames of the video at a lower frame rate. Furthermore, an attention branch is introduced to improve the forgery detection ability of the scrutinizing pathway. Extensive experiment results demonstrate the effectiveness and generalization ability of Bita-Net on various popular face forensics detection datasets, including FaceForensics++, CelebDF, DeepfakeTIMIT and UADFV.