AJ Piergiovanni

I am a PhD student in Computer Science at Indiana University advised by Dr. Michael Ryoo. I work on computer vision, machine learning, and robotics. I am particularly interested in activity detection tasks, and especially enjoy applications to sports videos.

I am currently an intern at Google Brain.

In 2015, I received a BS in Computer Science and Mathematics from Rose-Hulman Institute of Technology. In my free time, I climb mountains.

ajpiergi@indiana.edu | GitHub | Google Scholar


  • AJ Piergiovanni, A. Angelova and M. S. Ryoo, "Learning Differentiable Grammars for Continuous Data", arXiv:1902.00505, February 2019. [arXiv]
  • AJ Piergiovanni, A. Angelova, A. Toshev, and M. S. Ryoo, "Evolving Space-Time Neural Architectures for Videos", arXiv:1811.10636, November 2018. [arXiv]
  • AJ Piergiovanni and M. S. Ryoo, "Representation Flow for Action Recognition", arXiv:1810.01455, October 2018. [arXiv] [project page]
  • AJ Piergiovanni, A. Wu, and M. S. Ryoo, "Learning Real-World Robot Policies by Dreaming", arXiv:1805.07813, May 2018. [arXiv] [project page]
  • AJ Piergiovanni and M. S. Ryoo, "Unseen Action Recognition with Multimodal Learning", arXiv:1806.08251, June 2018. [arXiv]
  • AJ Piergiovanni and M. S. Ryoo, "Fine-grained Activity Recognition in Baseball Videos", CVPR Workshop on Computer Vision in Sports (CVsports), June 2018. [arXiv] [github code]
  • AJ Piergiovanni and M. S. Ryoo, "Temporal Gaussian Mixture Layer for Videos", arXiv:1803.06316, March 2018. [arXiv]
  • AJ Piergiovanni and M. S. Ryoo, "Learning Latent Super-Events to Detect Multiple Activities in Videos", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018. [arXiv] [github code]
  • AJ Piergiovanni*, C. Fan*, and M. S. Ryoo, "Learning Latent Sub-events in Activity Videos Using Temporal Attention Filters", the 31st AAAI Conference on Artificial Intelligence (AAAI), February 2017. [arXiv] [github code]


MLB-YouTube: A dataset for activity recognition in continuous and segmented baseball videos. Also included are dense text annotations from the commentators, enabling video captioning and the learning of video-language relationships.