A Preliminary Exploration of an Intelligent Triage System for Images and Videos on Social Media Based on Large Models and Object Recognition

Advancing Trustworthy Information Ecosystems

Project Overview

This project addresses the proliferation of fake content on social media, where images and short videos have become central to information dissemination. Falling technical barriers to producing deepfakes, AI-generated content (AIGC), and localized alterations have made deceptive content both industrialized and harder to detect. The societal harms range from financial market manipulation and public opinion distortion to personal defamation and the deliberate tarnishing of national images. Platform mechanisms for identifying and flagging AI-generated content lag behind: their accuracy and coverage are limited, and they often lack clear user warnings, leaving the average user unable to judge authenticity.

Our research focuses on authenticity verification for AI-generated images and videos, aiming to systematically identify manipulated content produced by deep generative models. The scope covers static images (deep learning-based deepfake image authenticity analysis) and dynamic videos (multimodal deepfake detection and video authenticity verification, including cross-modal consistency analysis between video and audio, with particular attention to Asian facial features). We aim to establish a rigorous problem framework and research methodology that clearly delimits the boundaries and objectives of fake-content detection, laying a solid theoretical and empirical foundation for a controllable, trustworthy intelligent content ecosystem.

Research Progress

Current Progress: 25% (Preparation Phase)

  • Completed initial literature review and technical roadmap design.
  • Established cloud server infrastructure.
  • Initiated dataset collection for text and multimodal samples.
  • Developed preliminary image recognition module for synthetic feature detection.

Core Modules

Deepfake Image Authenticity Detection (D-ImageNet)

This module detects forgery features in AI-generated images, particularly face swapping and facial attribute editing. It employs the D-ImageNet model, which integrates YOLOv9 and SAM (Segment Anything Model) for precise localization and segmentation of key regions. Detection combines frequency-domain analysis (via the Fast Fourier Transform) to capture synthetic texture anomalies, PRNU (Photo-Response Non-Uniformity) residual analysis to distinguish real from fake images, and edge-consistency analysis to identify boundary artifacts and lighting inconsistencies. Multi-scale feature fusion and spatial attention mechanisms improve feature-extraction efficiency and detection robustness.
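As a rough illustration of the frequency-domain and PRNU steps, the sketch below computes an azimuthally averaged log-power spectrum and a simple noise residual. The function names, the 64-bin resolution, and the Gaussian denoiser are illustrative assumptions, not D-ImageNet's actual pipeline; production PRNU analysis typically uses a wavelet denoiser and camera-fingerprint correlation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def radial_spectrum(gray: np.ndarray, n_bins: int = 64) -> np.ndarray:
    """Azimuthally averaged log-power spectrum of a 2-D grayscale image.

    Generative upsampling often leaves periodic artifacts that appear as
    peaks or an abnormal decay in the high-frequency bins of this profile.
    """
    f = np.fft.fftshift(np.fft.fft2(gray))
    power = np.log1p(np.abs(f) ** 2)

    h, w = gray.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h // 2, x - w // 2)
    bins = np.minimum((r / r.max() * n_bins).astype(int), n_bins - 1)

    # Mean log-power per radial bin -> 1-D spectral signature.
    totals = np.bincount(bins.ravel(), weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(bins.ravel(), minlength=n_bins)
    return totals / np.maximum(counts, 1)

def noise_residual(gray: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Crude PRNU-style residual: the image minus a denoised copy.

    A Gaussian blur stands in for the wavelet denoiser used in real
    PRNU pipelines; the residual's statistics (or its correlation with
    a camera fingerprint) feed the real-vs-fake decision.
    """
    return gray - gaussian_filter(gray.astype(np.float64), sigma)
```

In a full pipeline, the spectral signature and residual statistics could serve as auxiliary inputs alongside the learned CNN features.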

Multimodal Deepfake Detection & Video Authenticity Verification (D-VideoNet)

Built on the D-VideoNet framework, this module targets video forgery types such as facial reenactment and lip-sync manipulation through cross-modal consistency verification. D-VideoNet standardizes input interfaces and uses a fused representation-learning mechanism to jointly model image, audio, and video temporal features. Frame analysis applies I3D (Inflated 3D) convolutional networks and temporal attention mechanisms to detect inter-frame structural anomalies, face-swapping edge artifacts, and unnatural motion patterns. Audio-visual synchronization verification integrates the SyncNet and Wav2Lip algorithms to extract lip movements, facial muscle changes, and speech features (via a CNN-LSTM network over Mel spectrograms and prosody features), achieving high-precision consistency discrimination.
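A minimal PyTorch sketch of such an audio branch and a per-frame consistency check follows. The class name, layer sizes, and the cosine-similarity scoring rule are illustrative assumptions rather than D-VideoNet's actual architecture; the lip embeddings would come from a separate visual encoder (e.g., SyncNet-style) not shown here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioBranch(nn.Module):
    """Hypothetical CNN-LSTM over Mel-spectrogram frames.

    Input: Mel spectrogram of shape [B, 1, n_mels, T].
    Output: per-frame audio embeddings of shape [B, T, embed].
    """

    def __init__(self, n_mels: int = 80, hidden: int = 128, embed: int = 256):
        super().__init__()
        # The 2-D convolutions pool only the frequency axis so the time
        # axis stays aligned with the video frames.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )
        self.lstm = nn.LSTM(64 * (n_mels // 4), hidden,
                            batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, embed)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        feat = self.cnn(mel)                    # [B, 64, n_mels/4, T]
        b, c, m, t = feat.shape
        feat = feat.permute(0, 3, 1, 2).reshape(b, t, c * m)  # time-major
        out, _ = self.lstm(feat)                # [B, T, 2*hidden]
        return self.proj(out)                   # [B, T, embed]

def sync_score(audio_emb: torch.Tensor, lip_emb: torch.Tensor) -> torch.Tensor:
    """Per-frame cosine similarity between audio and lip embeddings.

    Sustained low scores would flag potential lip-sync manipulation.
    """
    return F.cosine_similarity(audio_emb, lip_emb, dim=-1)  # [B, T]
```

In a complete system, these frame-level scores would be aggregated, for example by a temporal attention layer, into a clip-level authenticity decision.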

Project Timeline

  • Apr-Jun 2025: Literature survey, technical design, dataset preparation, initial image module development.
  • Jul-Dec 2025: Video and audio analysis module development, system integration, and preliminary testing of multimodal consistency.
  • Jan-Feb 2026: System refinement, user interface development (browser plugin, application prototype, API), and blockchain integration for tamper-proof logging.
  • Mar-Apr 2026: Pilot deployment in news media and social media platforms, user feedback collection, and public awareness campaigns.