I am a Senior Research Scientist at Google DeepMind, where I work on generative modeling, self-supervised learning, and multimodal learning in the video domain. I am interested in effectively incorporating the time dimension to learn more general representations, with the goal of compositional generalization. I received my PhD from the Computer Science & Engineering Department at the University of Michigan, Ann Arbor, under the supervision of Professor Honglak Lee. During my PhD, I mainly focused on building models for future frame prediction using self-supervised and supervised approaches. I also contributed to building world models that were successfully applied in model-based reinforcement learning.
Fun Facts: I played for my national basketball team (I am originally from Ecuador). I was also the second-best scorer in the nation in a national championship I played in back in the day. I was part of a team that beat the media's projected champion during a championship in Quito (the guy who was the best scorer in the national championship played for the other team :P). Let's have a Curry-range 3-point shootout. Ok, I'll stop now ...
ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders. New!
In European Conference on Computer Vision (ECCV), 2024
|
Text Prompting for Multi-Concept Video Customization by Autoregressive Generation. New!
In AI4CC Workshop at CVPR, 2024
|
StoryBench: A Multifaceted Benchmark for Continuous Story Visualization
In Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track, 2023
|
Phenaki: Variable Length Video Generation From Open Domain Textual Description
In International Conference on Learning Representations (ICLR), 2023
|
RiCS: A 2D Self-Occlusion Map for Harmonizing Volumetric Objects (Best Paper Award, Runner-up)
In CVPR Workshop in AI for Content Creation, 2022
|
Task-Generic Hierarchical Human Motion Prior using VAEs
In International Conference on 3D Vision (3DV), 2021
|
Contact-Aware Retargeting of Skinned Motion
In International Conference on Computer Vision (ICCV), 2021
|
Stochastic Scene-Aware Motion Prediction
In International Conference on Computer Vision (ICCV), 2021
|
Single-image Full-body Human Relighting
In Eurographics Symposium on Rendering (EGSR), 2021
|
Contact and Human Dynamics from Monocular Video (Spotlight)
In European Conference on Computer Vision (ECCV), 2020
(Spotlight acceptance rate: 5%) |
High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks
In Advances in Neural Information Processing Systems (NeurIPS), 2019
|
Unsupervised Learning of Object Structure and Dynamics from Videos
In Advances in Neural Information Processing Systems (NeurIPS), 2019
|
Learning Latent Dynamics for Planning from Pixels
In Proceedings of the 36th International Conference on Machine Learning (ICML), 2019
|
MT-VAE: Learning Motion Transformations to Generate Multimodal Human Dynamics
In European Conference on Computer Vision (ECCV), 2018
|
Hierarchical Long-term Video Prediction without Supervision
In Proceedings of the 35th International Conference on Machine Learning (ICML), 2018
|
Neural Kinematic Networks for Unsupervised Motion Retargetting (Oral Presentation)
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018
(Oral acceptance rate: 2.1%) |
Learning to Generate Long-term Future via Hierarchical Prediction
In Proceedings of the 34th International Conference on Machine Learning (ICML), 2017
|
Decomposing Motion and Content for Natural Video Sequence Prediction
In International Conference on Learning Representations (ICLR), 2017
|
Improving Object Detection with Deep Convolutional Networks via Bayesian Optimization and Structured Prediction (Oral Presentation)
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015
(Oral acceptance rate: 3.3%) |
Who Do I Look Like? Determining Parent-Offspring Resemblance via Genetic Features
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014
|