Shuo Yang

I have worked in machine learning under Prof. Sriraam Natarajan for six years and received my PhD from the School of Informatics and Computing at Indiana University in 2017. My research focuses on machine learning and its applications in medical domains, including knowledge-based learning, cost-sensitive learning, dynamic probabilistic models, continuous-time probabilistic logic models and statistical relational learning in hybrid domains.



Shuo Yang, Mohammed Korayem, Khalifeh AlJadda, Trey Grainger and Sriraam Natarajan, Combining content-based and collaborative filtering for job recommendation system: A cost-sensitive statistical relational learning approach, Knowledge-Based Systems 136C (2017) pp. 37-45. [Impact Factor: 4.529] Code

Shuo Yang, Fabian Hadiji, Kristian Kersting, Shaun Grannis and Sriraam Natarajan, Modeling Heart Procedures from EHRs: An Application of Exponential Families, IEEE International Conference on Bioinformatics and Biomedicine (IEEE BIBM), 2017. [Acceptance rate: 19.0%]

Nandini Ramanan, Shuo Yang, Shaun Grannis and Sriraam Natarajan, Discriminative Boosted Bayes Networks for Learning Multiple Cardiovascular Procedures, IEEE International Conference on Bioinformatics and Biomedicine (IEEE BIBM), 2017. [Acceptance rate (short): 19.6%]

Shuo Yang, Somdeb Sarkhel, Saayan Mitra and Viswanathan Swaminathan, Personalized Video Recommendations for Shared Accounts, IEEE International Symposium on Multimedia (ISM), 2017.

Shuo Yang, Tushar Khot, Kristian Kersting and Sriraam Natarajan, Learning Continuous-Time Bayesian Networks in Relational Domains: A Non-Parametric Approach, 30th AAAI Conference on Artificial Intelligence (AAAI), 2016. [Acceptance rate: 26%, oral presentation] Code

Haley MacLeod, Shuo Yang, Kim Oakes, Kay Connelly and Sriraam Natarajan, Identifying Rare Diseases from Behavioural Data: A Machine Learning Approach, First IEEE Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE), 2016. Code

Shuo Yang, Kristian Kersting, Greg Terry, Jeffrey Carr and Sriraam Natarajan, Modeling Coronary Artery Calcification Levels From Behavioral Data in a Clinical Study, Artificial Intelligence in Medicine (AIME), 2015.

Shuo Yang, Tushar Khot, Kristian Kersting, Gautam Kunapuli, Kris Hauser and Sriraam Natarajan, Learning from Imbalanced Data in Relational Domains: A Soft Margin Approach, International Conference on Data Mining (ICDM), 2014. [(short) Overall acceptance rate: 19.53%] Code

Shuo Yang and Sriraam Natarajan, Knowledge Intensive Learning: Combining Qualitative Constraints with Causal Independence for Parameter Learning in Probabilistic Models, European Conference on Machine Learning (ECMLPKDD), 2013. [Acceptance rate: 27.7%]

Shuo Yang and Desong Bian, Automatic Detection of T-wave End in ECG Signals, International Symposium on Intelligent Information Technology Application, 2008.

Shuo Yang and Desong Bian, Automatic Detection of QRS Onset in ECG Signals, IEEE International Symposium on IT in Medicine and Education, 2008.


Knowledge-based Learning

In many domains, a considerable number of factors influence the target variable, so the dimension of the parameter space for probabilistic models is exponential in the number of variables, and a large number of training samples is required to guarantee reasonable prediction accuracy. In this project, we proposed a way to incorporate domain knowledge about the independence of causal influence and qualitative constraints, which greatly improves prediction performance by reducing the dimension of the feature space as well as constraining the search space.
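To make the parameter-count argument concrete, here is a minimal sketch that uses noisy-OR, one common form of independence of causal influence. The choice of noisy-OR and the names `full_cpt_rows` and `noisy_or` are assumptions for illustration only; the project description does not say which form was used.

```python
def full_cpt_rows(n_parents):
    """Rows in a full conditional probability table for a binary
    child with n binary parents: one row per parent configuration."""
    return 2 ** n_parents


def noisy_or(activation_probs, parent_states, leak=0.0):
    """P(child = 1 | parents) under a noisy-OR model (illustrative assumption).

    activation_probs[i] is the probability that parent i alone turns the
    child on; parent_states[i] is 1 if parent i is active, else 0.
    Only one parameter per parent is needed (plus an optional leak term).
    """
    p_child_off = 1.0 - leak
    for q, active in zip(activation_probs, parent_states):
        if active:
            # Each active parent independently fails to trigger the child.
            p_child_off *= 1.0 - q
    return 1.0 - p_child_off
```

With 10 binary parents, `full_cpt_rows(10)` gives 1024 rows, while the noisy-OR model needs only 10 activation probabilities. Qualitative constraints (for example, monotonicity) would further restrict the values those parameters may take.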


Cost-Sensitive Learning

In this project, we consider the problem of incorporating domain knowledge about the different weights of positive and negative samples. One motivation is the class imbalance found in many relational domains, where the classifier boundary can easily be dominated by the majority class and overfit to its outliers. Hence, it is essential to steer the training process toward the minority class by assigning different costs to false positives and false negatives. Beyond the requirements imposed by such data properties, there are also practical demands in certain domains, such as diagnosis in medicine, quality checking in manufacturing, and recommendation prediction in recommender systems.
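The idea of assigning different costs to false negatives and false positives can be sketched with a weighted log loss. This is a generic illustration, not the soft-margin method developed in the project; the function name and the `fn_cost` and `fp_cost` parameters are hypothetical.

```python
import math


def cost_sensitive_log_loss(y_true, p_pred, fn_cost=5.0, fp_cost=1.0):
    """Average log loss with separate costs per class (illustrative only).

    fn_cost scales the penalty for under-predicting positive examples;
    fp_cost scales the penalty for over-predicting negative examples.
    """
    eps = 1e-12  # keep probabilities away from 0 and 1 before taking logs
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        if y == 1:
            total += -fn_cost * math.log(p)
        else:
            total += -fp_cost * math.log(1.0 - p)
    return total / len(y_true)
```

Setting `fn_cost` above `fp_cost` makes a missed minority-class (positive) example more expensive than a false alarm, which pushes the learned boundary away from the majority class. With both costs set to 1.0, the function reduces to the standard log loss.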


Sequence Data Mining

In most realistic domains, variables transition between their possible states over time. The data are generated by dynamic processes, with multiple observations at different time points. Dynamic models are needed to model such transition intensities over time.
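As a small illustration of a transition intensity, the sketch below estimates the intensities from one fully observed trajectory, assuming a homogeneous continuous-time Markov process. Under that assumption the maximum-likelihood estimate is the number of transitions from state i to state j divided by the total time spent in i. The function name and input format are hypothetical.

```python
from collections import defaultdict


def estimate_intensities(trajectory, end_time):
    """Estimate transition intensities from one observed trajectory.

    trajectory: list of (time, state) pairs sorted by time, where each
    pair records the moment the process entered that state.
    end_time: time at which observation stopped.
    Returns {(from_state, to_state): transitions per unit of dwell time}.
    """
    dwell = defaultdict(float)   # total time spent in each state
    counts = defaultdict(int)    # number of observed i -> j transitions
    for (t, s), (t_next, s_next) in zip(trajectory, trajectory[1:]):
        dwell[s] += t_next - t
        counts[(s, s_next)] += 1
    last_time, last_state = trajectory[-1]
    dwell[last_state] += end_time - last_time  # censored final stay
    return {
        pair: n / dwell[pair[0]]
        for pair, n in counts.items()
        if dwell[pair[0]] > 0
    }
```

For a trajectory that enters "healthy" at time 0, "sick" at time 2 and "healthy" again at time 3, observed until time 5, the process spends 4 time units healthy and 1 unit sick, so the estimated intensity is 0.25 for healthy to sick and 1.0 for sick to healthy.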


Office Location

  • 919 E 13th St
  • Bloomington, IN 47408


  • shuoyang (at)
