We propose a novel approach to investigating how problem-solving strategies, identified from response time and eye-tracking data, affect individuals' performance on the Object Assembly (OA) task. To assess spatial reasoning ability and problem-solving strategy jointly, we applied the Multimodal Joint-Hierarchical Cognitive Diagnosis Model (MJ-DINA) to the performance of young students (aged 6 to 14) on 17 OA items. The MJ-DINA model consists of three sub-models: a Deterministic Inputs, Noisy "and" Gate (DINA) model for estimating spatial ability, a lognormal model for response times (RT), and a Bayesian Negative Binomial (BNF) model for fixation counts. Within the DINA model, we estimated five spatial cognitive attributes aligned with the problem-solving process: encoding, falsification, mental rotation, mental displacement, and intractability recognition. The model fits the data adequately: Gelman-Rubin convergence statistics are near 1.00, and posterior predictive p-values fall between 0.05 and 0.95 for the DINA, log RT, and BNF sub-models, indicating reliable parameter estimation. Our findings indicate that individuals with faster processing speeds and fewer fixation counts, a group we label Reflective-Scanner, outperformed the other three identified problem-solving strategy groups. In particular, sufficient eye movement was a key factor contributing to better performance on spatial reasoning tasks. Additionally, training individuals to master the falsification attribute was the most effective way to improve their spatial task performance. This research has implications for developing teaching methods that are tailored to individuals' problem-solving strategies and aim to improve their spatial ability.
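The abstract names the three measurement components and the two fit diagnostics without giving their formulas. The sketch below writes out common textbook versions of each: the standard DINA item response function, a van der Linden-style lognormal response-time density, a mean/dispersion negative binomial pmf for fixation counts, the classic (non-split) Gelman-Rubin R-hat, and a posterior predictive p-value. These parameterizations are assumptions for illustration; the exact MJ-DINA specification, including how the sub-models are linked through higher-order person parameters, may differ. All function names and parameter values below are hypothetical.

```python
import math
from statistics import mean, variance


def dina_prob(alpha, q, slip, guess):
    """P(X_ij = 1) under DINA: eta = 1 only if the person masters every
    attribute the item requires (q_jk = 1); then 1 - slip, otherwise guess."""
    eta = all(a == 1 for a, need in zip(alpha, q) if need == 1)
    return 1.0 - slip if eta else guess


def lognormal_rt_logpdf(t, beta, tau, a):
    """Log density of RT t, assuming log T ~ Normal(beta - tau, sd = 1/a):
    beta = item time intensity, tau = person speed, a = item time
    discrimination (assumed van der Linden-style parameterization)."""
    z = a * (math.log(t) - (beta - tau))
    return math.log(a) - math.log(t) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z


def negbin_logpmf(y, mu, r):
    """Log pmf of a fixation count y under a negative binomial with mean mu
    and size r (variance mu + mu**2 / r); an assumed parameterization."""
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def gelman_rubin(chains):
    """Classic potential scale reduction factor for m chains of n draws each."""
    m, n = len(chains), len(chains[0])
    chain_means = [mean(c) for c in chains]
    grand = mean(chain_means)
    b = n / (m - 1) * sum((cm - grand) ** 2 for cm in chain_means)  # between-chain
    w = mean(variance(c) for c in chains)                            # within-chain
    var_hat = (n - 1) / n * w + b / n
    return math.sqrt(var_hat / w)


def posterior_predictive_p(pairs):
    """pairs = [(T_obs_s, T_rep_s), ...], one per posterior draw s; returns
    the share of draws where the replicated discrepancy >= the observed one."""
    return sum(rep >= obs for obs, rep in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical profile over (encoding, falsification, mental rotation,
    # mental displacement, intractability recognition) and a hypothetical item.
    alpha = [1, 1, 0, 1, 0]
    q = [1, 1, 0, 1, 0]
    p_correct = dina_prob(alpha, q, slip=0.1, guess=0.2)
    # Assumes the three responses are conditionally independent given the
    # person parameters, so their log-likelihoods add.
    loglik = (math.log(p_correct)
              + lognormal_rt_logpdf(t=35.0, beta=3.5, tau=0.2, a=1.5)
              + negbin_logpmf(y=40, mu=math.exp(3.6), r=5.0))
    print(p_correct, loglik)
```

Under this reading, an R-hat near 1.00 means the between-chain spread adds little to the within-chain variance, and a posterior predictive p-value near 0 or 1 would flag a discrepancy the model systematically under- or over-predicts, which is why the 0.05 to 0.95 range is read as adequate fit.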