Predicting the heating load of a building is critical for efficient system operation and cost reduction. Besides its time-series structure, building load data also carries geographical context. Traditional time series models struggle to represent the temporal and spatial relations in the load data simultaneously. Moreover, dependencies across long time series are notoriously hard to describe in the conventional paradigm. This paper proposes a CNN-LSTM algorithm based on the attention mechanism (AT-CNN-LSTM), combining CNN-LSTM's capacity to capture temporal and spatial features concurrently with the attention mechanism's ability to model long-term dependence. The heating load of a university in Xi'an is adopted as a case study. Single CNN and LSTM models, as well as models based on the attention mechanism, were used for comparison. The prediction results showed that the CNN-LSTM model was more precise than a single CNN or LSTM model, and that the global capture ability of the attention mechanism further increased the accuracy. Compared with the CNN-LSTM model, the AT-CNN-LSTM model exhibited a 1.2% improvement in goodness-of-fit R², a 25.9% drop in RMSE, a 25.4% decrease in CV-RMSE, and a 26.1% decline in MAE. Compared with the single models, the AT-CNN-LSTM model improved R² by 15.8% on average, reduced RMSE by 31.3%, lowered CV-RMSE by 31.5%, and decreased MAE by 32.4%. These findings provide a basis for selecting a high-precision prediction model for building load forecasting.
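
The four metrics reported above can be computed as in the following minimal sketch. It assumes the standard textbook definitions (CV-RMSE taken as RMSE divided by the mean measured load, expressed as a percentage), which this section does not spell out, so the paper's exact formulas may differ. The function names `evaluate_forecast` and `relative_change` and the sample load values are hypothetical and used only for illustration; they are not data from the case study.

```python
import numpy as np


def evaluate_forecast(y_true, y_pred):
    """Return R2, RMSE, CV-RMSE (%) and MAE for a load forecast.

    Assumes standard definitions; the paper's own formulas are not
    given in this section.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    mae = float(np.mean(np.abs(residuals)))
    # CV-RMSE normalises RMSE by the mean measured load (assumed definition).
    cv_rmse = rmse / float(np.mean(y_true)) * 100.0

    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - np.mean(y_true)) ** 2))
    r2 = 1.0 - ss_res / ss_tot

    return {"R2": r2, "RMSE": rmse, "CV-RMSE": cv_rmse, "MAE": mae}


def relative_change(baseline, candidate):
    """Percentage change of a metric relative to a baseline model."""
    return (candidate - baseline) / baseline * 100.0


# Hypothetical measured and predicted heating loads (illustrative only).
measured = [100.0, 120.0, 140.0, 160.0]
predicted = [110.0, 115.0, 135.0, 165.0]
metrics = evaluate_forecast(measured, predicted)
print(metrics)
```

A comparison such as "a 25.9% drop in RMSE" would then correspond to calling `relative_change` with the baseline model's RMSE and the candidate model's RMSE, where a negative result indicates a reduction.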