Jeremy J. Yang
Informatics Ph.D. candidate, Cheminformatics Track, Data Science Minor
Indiana University School of Informatics & Computing
Integrative Data Science Lab
I am a graduate student pursuing a PhD in
Informatics (cheminformatics track), advised
by Prof. David Wild, who leads the
Integrative Data Science Lab (IDSL)
in the Indiana University School of
Informatics and Computing.
The IDSL focuses on the development of algorithms and tools for large-scale
integrative data mining of drug discovery, chemical & biological data,
with emphasis on semantic web technologies such as in the projects
I have also been involved in
a startup data science based company founded by professors David Wild and
I am also a research scientist in the
University of New Mexico
Translational Informatics Division,
focused on biomolecular and biomedical data science, and on developing
computational and informatics tools to support this research, including
screening informatics support for the
Illuminating the Druggable
Genome (IDG), and DrugCentral.
See our public web apps.
- DrugCentral 2018: an update, Ursu O, et al., Nucleic Acids Research, doi:10.1093/nar/gky963, 29 October 2018.
- Unexplored therapeutic opportunities in the human genome, TI Oprea et al., Nature Reviews Drug Disc (2018), doi:10.1038/nrd.2018.14.
- Drug target ontology to classify and integrate drug discovery data, Lin et al., Journal of Biomedical Semantics (2017) 8:50 DOI 10.1186/s13326-017-0161-x.
- "PIBAS FedSPARQL: a web-based platform for integration and exploration of bioinformatics datasets", Djokic-Petrovic et al., J Biomed Semantics (2017), 8:42, doi:10.1186/s13326-017-0151-z.
- Formalizing drug indications on the road to therapeutic intent, SJ Nelson et al., J Am Med Inform Assoc 2017, doi: 10.1093/jamia/ocx064.
- TIN-X: Target Importance and Novelty Explorer, DC Cannon, JJ Yang, SL Mathias, O Ursu, S Mani, A Waller, SC Schürer, LJ Jensen, LA Sklar, CG Bologa, TI Oprea, Bioinformatics, 2017, btx200, doi: 10.1093/bioinformatics/btx200.
- Pharos: Collating protein information to shed light on the druggable genome, Nguyen et al., Nucl Acids Res (2016), DOI:10.1093/nar/gkw1072.
- DrugCentral: online drug compendium, Ursu O, Holmes J, Knockel J, Bologa CG, Yang JJ, Mathias SL, Nelson SJ, Oprea TI, Nucleic Acids Res. (2016), DOI: 10.1093/nar/gkw993.
- Badapple: promiscuity patterns from noisy evidence, Yang JJ, Ursu O, Lipinski CA, Sklar LA, Oprea TI, Bologa CG, J. Cheminfo. 8:29 (2016), DOI: 10.1186/s13321-016-0137-3.
- Novel Phenotypic Outcomes Identified for a Public Collection of Approved Drugs from a Publicly Accessible Panel of Assays, Lee et al., (2015) PLoS ONE 10(7): e0130796. DOI: 10.1371/journal.pone.0130796.
- BioAssay Research Database (BARD): chemical biology and probe-development enabled by structured metadata and result types, Howe, et al., Nucleic Acids Res. 2015 Jan; 43:D1163-70. DOI: 10.1093/nar/gku1244.
- An Overview of the Challenges in Designing, Integrating, and Delivering BARD: A Public Chemical-Biology Resource and Query Portal for Multiple Organizations, Locations, and Disciplines, de Souza, et al., J Biomol Screen January 17, 2014, DOI: 10.1177/1087057113517139.
- The CARLSBAD Database: A Confederated Database of Chemical Bioactivities, S. L. Mathias, J. Hines-Kay, J. J. Yang, G. Zahoransky-Kohalmi, C. G. Bologa, O. Ursu and T. I. Oprea, Database, 2013, bat044, DOI: 10.1093/database/bat044.
- Associating Drugs, Targets and Clinical Outcomes into an Integrated Network Affords a New Platform for Computer-Aided Drug Repurposing, Tudor I. Oprea, Sonny Kim Nielsen, Oleg Ursu, Jeremy J. Yang, Olivier Taboureau, Stephen L. Mathias, Irene Kouskoumvekaki, Larry A. Sklar, Cristian G. Bologa, J. Mol. Info., 30 (2-3), 100-111, 2011.
- Analysis and hit filtering of a very large library of compounds screened against Mycobacterium tuberculosis, Ekins, et al., Mol. BioSyst., 2010, 6, 2316-2324.
- Bibliological data science and drug discovery, ACS National Meeting in Philadelphia, Aug 21, 2016.
- The Language Diversity of Computing, UNM Biomedical Info Seminar Series, Oct 15, 2015.
- Molecular scaffolds are special and useful guides for discovery, ACS National Meeting, Sept. 8, 2013, Indianapolis, IN.
- The BADAPPLE promiscuity plugin for BARD: Evidence-based promiscuity scores, ACS National Meeting, Sept. 9, 2013, Indianapolis, IN.
- How am I supposed to organize a protein database when I can't even organize my address book?, CINF Flash session, ACS National Meeting, March 25, 2012, San Diego, CA.
- Cheminformatics Software Development Case Studies, guest lecture given via webcast to SOIC I571 "Chemical Information Technology" class on Oct 24, 2011.
- UNMCMD Screening Informatics, guest lecture given via webcast to SOIC I571 "Chemical Information Technology" class on Nov 21, 2011.
- Applications in Biocomputing, UNM Cyberinfrastructure Day, April 22, 2010.
- Open Phenotypic Drug Discovery Resource, Open PHACTS: Linking life science data, Feb 18-19, 2016, Vienna, Austria.
- Development of a Screening Informatics System at the UNM Center for Molecular Discovery, ACS National Meeting, March 26, 2012, San Diego, CA.
- CARLSBAD: Confederated Annotated Research Libraries of Small-molecule Biological Activity Data, OpenEye CUP meeting, Santa Fe, NM, 2012.
- UNM Division of Biocomputing public web applications: Computational tools for cheminformatics and molecular discovery, ChemAxon US User Group Meeting, Boston, September 13-15, 2010.
- INFO_I-590, "Topics in Informatics: Data Science for Drug
Discovery, Health and Translational Medicine".
This innovative data science course was
introduced by Prof. Wild in 2013 and updated for 2017
by Prof. Joanne Luciano, assisted by JT Wolohan and myself as associate instructor.
- INFO_I-590, "Applied Data Science".
The classic data science workflow is the framework for understanding
how to apply data wrangling, semantics, machine learning and other skills
in realistic scenarios. Developed by
Prof. Joanne Luciano in 2017, assisted by JT Wolohan, Kaicheng Yang,
and myself as associate instructor.
- INFO_I-590, "Real World Data Science".
Developed by Prof. Wild, Prof. Luciano, and industry partner Sara Bigelow,
in collaboration with Lilly, with de-identified clinical trials datasets,
employing analysis tools KNIME and Tableau. Associate instructor, spring 2018.
"It is easy to lie with statistics. It is hard to tell the truth without statistics." - Andrejs Dunkels