Recent work in engineering design incorporates customer preferences through choice models. Obtaining good estimates of the choice model parameters associated with product attributes requires a high-quality choice set, yet choice set data are often unavailable. This research proposes a methodology that uses online data and customer reviews to construct customer choice sets when neither the actual choice sets nor customer sociodemographic data are available. The methodology consists of three main parts: clustering the products based on their attributes, clustering the customers based on their reviews, and constructing the choice sets using a sampling probability scenario that relies on the product and customer clusters. The proposed scenario, called Normalized, multiplies the product cluster and customer cluster fractions to obtain the sampling probability distribution. Two utility functions are proposed: a linear combination of product attributes only, and a function that also includes interactions between product attributes and customer reviews. The methodology is applied to a data set of laptops. The Normalized scenario performs significantly better than the baseline scenario, Random, in predicting the test set data. Moreover, including customer reviews in the utility function significantly increases the predictive ability of the model. The results show that constructing choice sets from product attribute data and customer reviews yields choice models with higher predictive ability than randomly constructed choice sets.
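
To make the Normalized scenario concrete, the following is a minimal sketch of how a sampling distribution might be built from cluster fractions and then used to draw a choice set. The abstract states only that the product cluster and customer cluster fractions are multiplied, so the precise definition of each fraction here is an assumption: the product cluster fraction is taken as the share of catalog products in a cluster, and the customer cluster fraction as the share of a customer cluster's purchases that fall in that product cluster. The function names, the toy data, and the uniform fallback when all remaining weights are zero are likewise hypothetical.

```python
import random
from collections import Counter


def normalized_weights(product_clusters, purchases, customer_clusters):
    """Return {customer_cluster: {product_cluster: probability}}.

    ASSUMPTION: weight(c, k) = (share of catalog products in cluster k)
                             * (share of cluster-c purchases that fall in k),
    renormalized over k. The source only says two fractions are multiplied.
    """
    n_products = len(product_clusters)
    product_frac = {k: n / n_products
                    for k, n in Counter(product_clusters.values()).items()}

    # Count purchases per (customer cluster, product cluster) pair.
    counts = {}
    for customer, product in purchases:
        c = customer_clusters[customer]
        k = product_clusters[product]
        counts.setdefault(c, Counter())[k] += 1

    weights = {}
    for c, by_k in counts.items():
        total = sum(by_k.values())
        raw = {k: product_frac[k] * by_k.get(k, 0) / total for k in product_frac}
        norm = sum(raw.values())
        weights[c] = {k: v / norm for k, v in raw.items()}
    return weights


def sample_choice_set(chosen, customer_cluster, weights, product_clusters, size, rng):
    """Return the chosen product followed by up to size - 1 distinct alternatives."""
    cluster_size = Counter(product_clusters.values())
    # Spread each cluster's probability evenly over the products in that cluster.
    pool = {p: weights[customer_cluster].get(k, 0.0) / cluster_size[k]
            for p, k in product_clusters.items() if p != chosen}
    alternatives = []
    while pool and len(alternatives) < size - 1:
        items = list(pool)
        w = [pool[p] for p in items]
        if sum(w) == 0:
            pick = rng.choice(items)  # hypothetical fallback: uniform draw
        else:
            pick = rng.choices(items, weights=w, k=1)[0]
        alternatives.append(pick)
        del pool[pick]
    return [chosen] + alternatives


# Toy data (hypothetical): four products in two clusters, three customers in two clusters.
product_clusters = {"A": 0, "B": 0, "C": 1, "D": 1}
customer_clusters = {"u1": 0, "u2": 0, "u3": 1}
purchases = [("u1", "A"), ("u2", "B"), ("u2", "C"), ("u3", "D")]

weights = normalized_weights(product_clusters, purchases, customer_clusters)
choice_set = sample_choice_set("A", 0, weights, product_clusters, 3, random.Random(0))
```

Under these assumptions, customers in a cluster are more likely to be shown alternatives from the product clusters that their cluster tends to buy from, whereas the Random baseline would correspond to drawing every non-chosen product with equal probability.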