Design concept evaluation is a key step in new product development, with a significant impact on a product’s success and total cost over its life cycle. This paper is motivated by two limitations of the state of the art in concept evaluation: (1) existing concept evaluation methods, such as quality function deployment, draw on a limited amount and diversity of user feedback and insights; and (2) subjective concept evaluation methods require significant manual effort, which in turn may limit the number of concepts considered for evaluation. A deep multimodal design evaluation (DMDE) model is proposed in this paper to bridge these gaps by providing designers with an accurate and scalable prediction of new concepts’ overall and attribute-level desirability based on large-scale user reviews of existing designs. The attribute-level sentiment intensities of users are first extracted and aggregated from online reviews. A multimodal deep regression model is then developed to predict the overall and attribute-level sentiment values from two sets of features: those extracted from orthographic product images via a fine-tuned ResNet-50 model, and those extracted from product descriptions via a fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model. The two feature sets are aggregated using a novel self-attention-based fusion model. The DMDE model adds a data-driven, user-centered loop to the concept development process to better inform concept evaluation. Numerical experiments on a large dataset from an online footwear store indicate promising performance by the DMDE model, with an MSE loss of 0.001 and over 99.1% accuracy.
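
To make the fusion step described above concrete, the following is a minimal illustrative sketch, not the architecture used in this paper. It assumes the image and text encoders have already produced feature vectors of equal dimension, treats them as two tokens, applies scaled dot-product self-attention with identity query/key/value projections (a simplification), mean-pools the result, and feeds it to linear regression heads. All function names, the toy 4-dimensional vectors, and the head weights are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention_fuse(tokens):
    """Fuse modality tokens with scaled dot-product self-attention.

    Assumption: identity query/key/value projections stand in for the
    learned projections a trained fusion model would use.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    attended = []
    for q in tokens:
        # Attention weights of this token over all modality tokens.
        weights = softmax([dot(q, k) / scale for k in tokens])
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)]
        )
    # Mean-pool the attended tokens into one fused vector.
    return [sum(tok[i] for tok in attended) / len(attended) for i in range(d)]

def predict_sentiments(fused, heads, biases):
    # One linear regression head per target (overall or attribute-level).
    return [dot(w, fused) + b for w, b in zip(heads, biases)]

# Toy stand-ins for image (ResNet-50) and text (BERT) embeddings.
image_features = [0.2, -0.1, 0.4, 0.0]
text_features = [0.1, 0.3, -0.2, 0.5]

fused = self_attention_fuse([image_features, text_features])

# Hypothetical, untrained head weights for two sentiment targets.
heads = [[0.5, 0.5, 0.5, 0.5], [1.0, 0.0, -1.0, 0.0]]
biases = [0.0, 0.1]
scores = predict_sentiments(fused, heads, biases)
```

In a trained model, the projections and regression heads would be learned jointly by minimizing a regression loss such as MSE against the aggregated sentiment intensities; the sketch only shows the forward data flow.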