Sentic-Kinesis Research: Unified World Modeling
Research focused on the intersection of mechanistic interpretability and physical world modeling. Exploring the integration of causal reasoning encoders with velocity-based flow matching to enable granular, vector-driven control over emotional and kinetic trajectories in generative media.
Core Thesis
Transitioning from static generative denoising to Velocity-Based World Modeling. Our mission is to deliver high-fidelity, unified image/video generation optimized for consumer hardware (8GB–12GB VRAM) by integrating mechanistic interpretability with physical trajectory reasoning.
The Reasoning-First Stack (R&D Roadmap)
- Encoder Strategy: Gemma-4 E4B. Investigating methods to project the model's internal 'Thinking Mode' (<|channel>thought) into a dense semantic map for latent priming.
- Steering Layer: Emotion and Emotion-Deflection Probes. Proposes exposing the verified E4B emotional geometry (PC1 = Valence, PC2 = Arousal) directly as inference-time steering vectors.
  Planned implementation of a three-tier control system:
  - Coordinate-Based Presets: Integration of a 171-concept Look-Up Table (LUT) for automated manifold calibration.
  - Subtext Injection: Dual-channel deflection logic to phase-shift internal intent against visual output for micro-expression synthesis.
  - Mechanistic Emotional Sliders: Direct residual stream injection of PC1 (Valence) and PC2 (Arousal) eigenvectors for manual fine-tuning.
- Base Model: Cosmos-Predict 2.5-2B (Unified Flow Matching). Intended to leverage its native physics awareness and 3D RoPE (rotary position embeddings over time, height, and width) for spatiotemporal consistency.
- VAE: Native (Wan 2.1, 4x temporal and 8x8 spatial compression). Prioritizes unified video/image output, avoiding the "lobotomization" effects of overriding it with a static-image VAE.
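The preset and slider tiers can be sketched as a single additive edit to one residual-stream vector. This is a minimal sketch, assuming PC1 and PC2 are already available as plain vectors of the model's hidden size; the LUT entries, the `scale` parameter, and the function names are hypothetical placeholders, not the 171-concept table or any Gemma API, and the subtext-injection tier is not modeled.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

# HYPOTHETICAL placeholder presets: concept -> (valence, arousal) coordinates.
# The real 171-concept LUT is not reproduced here.
PRESET_LUT: Dict[str, Tuple[float, float]] = {
    "calm": (0.6, -0.5),
    "joy": (0.8, 0.6),
    "dread": (-0.7, 0.4),
}


def unit(v: Sequence[float]) -> Vector:
    """Scale a direction vector to unit length so slider values are comparable."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction vector must be non-zero")
    return [x / norm for x in v]


def steer(hidden: Sequence[float], pc1: Sequence[float], pc2: Sequence[float],
          valence: float, arousal: float, scale: float = 1.0) -> Vector:
    """Add scale * (valence * PC1 + arousal * PC2) to one residual-stream vector."""
    if not (len(hidden) == len(pc1) == len(pc2)):
        raise ValueError("hidden, pc1 and pc2 must share the hidden size")
    d1, d2 = unit(pc1), unit(pc2)
    return [h + scale * (valence * a + arousal * b)
            for h, a, b in zip(hidden, d1, d2)]


def steer_from_preset(hidden: Sequence[float], pc1: Sequence[float],
                      pc2: Sequence[float], concept: str,
                      d_valence: float = 0.0, d_arousal: float = 0.0,
                      scale: float = 1.0) -> Vector:
    """Tier 1 (preset lookup) plus tier 3 (manual slider offsets on top)."""
    valence, arousal = PRESET_LUT[concept]
    return steer(hidden, pc1, pc2, valence + d_valence, arousal + d_arousal, scale)
```

Normalizing the directions means a slider value of 1.0 moves the activation one unit along each axis regardless of how the components were stored; a usable `scale` would still need per-layer calibration against the model's typical activation norms.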
Technical Pillars
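- Velocity Trajectory Priming (vₜ = ε - x): Planned use of filtered reasoning states to define the physical "intent" of the motion path. Under the standard rectified-flow interpolation xₜ = (1 - t)·x + t·ε, vₜ = ε - x is the constant velocity along the straight path from the clean sample x to the noise ε. This targets the native flow-matching formulation of Cosmos-Predict to keep motion aligned and prevent static-pose artifacts.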
- Neural Steering via Interpretability: Moving beyond prompt engineering. Building on Anthropic's interpretability findings, the framework aims to hook into the established Valence/Arousal axes of Gemma-4 and expose them as a mathematical coordinate system for "mood" control.
- High-Noise Bias Conditioning: Following NVIDIA's 4.1 ablation studies, the framework will sample training timesteps from a shifted logit-normal distribution. Biasing training toward the high-noise ceiling forces the model to attend to the reasoning vectors, so the "physics roadmap" is not ignored by the decoder when pixel-level information is at its minimum.
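The training-side arithmetic behind velocity priming and high-noise biasing can be sketched in a few lines. This is a minimal sketch, assuming the rectified-flow convention xₜ = (1 - t)·x + t·ε (t = 1 is pure noise) and a logit-normal timestep sampler whose mean is shifted upward; the `mean`/`std` defaults and all function names are hypothetical, not Cosmos-Predict's actual training configuration, and vectors are plain Python lists to keep it dependency-free.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def sigmoid(u: float) -> float:
    """Numerically stable logistic function."""
    if u >= 0.0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def sample_timestep(rng: random.Random, mean: float = 1.0, std: float = 1.0) -> float:
    """Logit-normal sample t = sigmoid(u), u ~ N(mean, std).

    A positive `mean` (an assumed value) shifts probability mass toward t -> 1,
    i.e. toward the high-noise end under this convention.
    """
    return sigmoid(rng.gauss(mean, std))


def make_training_pair(x: Sequence[float], eps: Sequence[float],
                       t: float) -> Tuple[Vector, Vector]:
    """Return the noised input x_t = (1 - t) * x + t * eps and the target v = eps - x."""
    x_t = [(1.0 - t) * xi + t * ei for xi, ei in zip(x, eps)]
    v = [ei - xi for xi, ei in zip(x, eps)]
    return x_t, v


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Flow-matching regression loss: mean squared error on the velocity."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def euler_step(x_t: Sequence[float], v: Sequence[float],
               t_from: float, t_to: float) -> Vector:
    """Integrate dx/dt = v from t_from to t_to (t_to < t_from moves toward data)."""
    dt = t_to - t_from
    return [xi + dt * vi for xi, vi in zip(x_t, v)]
```

Because the target velocity is constant along each straight path, one Euler step with the exact v from any t back to 0 recovers x; a learned v only approximates this, so sampling in practice integrates over several steps.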
References & Foundational Research
Status: Architecture Phase. No implementation until the hardware/stack convergence is finalized.